Draft: Fix/kvm qmpbackup dangling bitmap autorecovery
Currently, if qmpbackup is stopped abruptly (for example by a server reboot), subsequent backup runs fail with an error that does not recover without operator intervention:
main: Connecting QMP socket: [<PARTITION>/var/qmp_socket]
show_vm_state: VM is in state: [running]
show_version: Qemu version: [10.2.0] []
show_name: VM Name: [kvm0]
main: Backup target directory: <PARTITION>/srv/backup/kvm
main: Auto backup mode set to: inc
main: Enabling compress option for backup operation.
get_uuid: Current Backup UUID: [88367741-c591-49fe-81c0-16b10781480a] for folder [<PARTITION>/srv/backup/kvm]
get_block_devices: Excluding device with raw format from backup: [ide0-cd0:<PARTITION>/srv/boot-image-url-select-repository/6b6604d894b6d861e357be1447b370db]
main: Partial backup found in [<PARTITION>/srv/backup/kvm/virtio0], possible broken backup chain. Execute new full backup
main: Version: 0.47 Arguments: /opt/slapgrid/edc8c193b66b0d4d83a14259f3c319af/bin/qmpbackup --socket <PARTITION>/var/qmp_socket backup --compress --target <PARTITION>/srv/backup/kvm --include virtio0 --level auto
main: Connecting QMP socket: [<PARTITION>/var/qmp_socket]
show_vm_state: VM is in state: [running]
show_version: Qemu version: [10.2.0] []
show_name: VM Name: [kvm0]
main: Backup target directory: <PARTITION>/srv/backup/kvm
main: Auto backup mode set to: inc
main: Enabling compress option for backup operation.
get_uuid: Current Backup UUID: [88367741-c591-49fe-81c0-16b10781480a] for folder [<PARTITION>/srv/backup/kvm]
get_block_devices: Excluding device with raw format from backup: [ide0-cd0:<PARTITION>/srv/boot-image-url-select-repository/6b6604d894b6d861e357be1447b370db]
save_info: Saved image info: [<PARTITION>/srv/backup/kvm/virtual.qcow2.config]
create: No node name set for [<PARTITION>/srv/virtual.qcow2], falling back to device name: [virtio0]
create: Create target backup image: [<PARTITION>/srv/backup/kvm/virtio0/INC-1761647026-virtual.qcow2.partial], virtual size: [107374182400]
create: Create fleece image: [<PARTITION>/srv/INC-1761647026-virtio0.fleece.qcow2], virtual size: [107374182400]
prepare_target_devices: Attach backup target devices to virtual machine
main: Error executing backup: Duplicate nodes with node-name='qmpbackup-block197'
remove_snapshot_access_devices: Removing cbw devices from virtual machine
main: Unable to cleanup: Failed to find node with node-name='qmpbackup-block197-snap'
The operator can solve it by:
- sshing to the machine
- using qemu-img info -U srv/virtual.qcow2 and reading the bitmap
- stopping the kvm service
- using qemu-img bitmap --remove srv/virtual.qcow2 <BITMAP>
- removing the partial files of the bad backup run
- starting the service
- initiating the backup
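
The bitmap lookup and removal from the steps above could be scripted roughly as follows. This is only a sketch: it assumes that `qemu-img info --output=json` reports qcow2 bitmaps under `format-specific.data.bitmaps`, and the helper names (`list_bitmaps`, `read_image_info`, `remove_bitmap`) are hypothetical, not part of qmpbackup.

```python
import json
import subprocess
from typing import List


def list_bitmaps(info: dict) -> List[str]:
    """Return the bitmap names found in parsed `qemu-img info` JSON output."""
    # Assumption: qcow2 bitmaps are listed under format-specific -> data -> bitmaps.
    data = info.get("format-specific", {}).get("data", {})
    return [bitmap["name"] for bitmap in data.get("bitmaps", [])]


def read_image_info(image: str) -> dict:
    """Run `qemu-img info -U --output=json` on the image and parse the result."""
    result = subprocess.run(
        ["qemu-img", "info", "-U", "--output=json", image],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def remove_bitmap(image: str, bitmap: str) -> None:
    """Remove one bitmap; the kvm service must be stopped first, as listed above."""
    subprocess.run(["qemu-img", "bitmap", "--remove", image, bitmap], check=True)
```

With the service stopped, `for name in list_bitmaps(read_image_info("srv/virtual.qcow2")): remove_bitmap("srv/virtual.qcow2", name)` would remove every bitmap, which is only correct when all of them are known to be dangling.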
On start of the KVM service, the autorecovery shall:
- check if there is a dangling bitmap
  - this can be very hard to do, as there might be a bitmap which is not dangling, so reading etc/notifier/feeds/exporter might be required
- if there is, remove it
- check for dangling .partial files (this might be better done in the qmpbackup wrapper)
- the next backup shall then be a full backup that completes successfully
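
The .partial cleanup step could look roughly like the sketch below. It assumes that leftover files keep the `.partial` suffix seen in the log above and live somewhere under the backup target directory. The function names are hypothetical, and whether this belongs in the service start or in the qmpbackup wrapper is left open.

```python
from pathlib import Path
from typing import List


def find_partial_files(target: Path) -> List[Path]:
    """Return all *.partial files below the backup target directory, sorted."""
    return sorted(p for p in target.rglob("*.partial") if p.is_file())


def remove_partial_files(target: Path, dry_run: bool = True) -> List[Path]:
    """Delete leftover *.partial files and return the list of matches.

    With dry_run=True (the default) nothing is deleted, so the result can be
    logged or reviewed before the real cleanup is enabled.
    """
    partials = find_partial_files(target)
    if not dry_run:
        for path in partials:
            path.unlink()
    return partials
```

Calling `remove_partial_files(Path("srv/backup/kvm"))` only reports what would be removed; passing `dry_run=False` performs the deletion. This must not run while a backup is in progress, since an active run also writes a .partial file.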