milkey-mouse / backup-vm

Back up a full image of a libvirt-based VM using Borg

License: MIT License

Python 100.00%

backup-vm's People

rugk krbvroc1 m-beno albialbi grmrgecko w3bservice heartshare magma1447 ceeedevops

backup-vm's Issues

RAM snapshot backup

In order for libvirt to allow restoration, backup and restore operations would have to store the full state of the snapshot in the backup, so that it can be tacked back onto the VM and switched to. I think this may be out of scope or not feasible with the current libvirt API.

Right now --memory is completely broken (it doesn't even parse correctly); the flag should probably be taken out until this feature is implemented.

Integration into existing backup process

How is it intended to be integrated into an existing backup process? E.g. in my script a usual backup is done. If I wanted to include VM images inside of it, I would need to dump these and then backup everything.

But this script runs the backup itself, so is it designed to have a separate repo just for VM images? (In the usual use case, I think, I want to back up both regular data and VMs.) Or may I just back up the VMs into the same repo, but with a different archive name?

generate tarball to feed to borg import-tar?

See borgbackup/borg#3731 (comment). Would only work if borgbackup/borg#2233 happens.
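A rough sketch of the tarball side, assuming a borg version that provides import-tar: Python's standard tarfile module can stream one disk image as a single tar member into a pipe. Tar headers store each member's size before its data, so the image size has to be known up front (for a block device, its size in bytes). write_image_tar is a hypothetical helper name, not existing backup-vm code.

```python
import io
import tarfile
from typing import BinaryIO


def write_image_tar(out: BinaryIO, name: str, image: BinaryIO, size: int) -> None:
    """Stream one disk image into an uncompressed tar archive written to `out`.

    Tar headers record the member size before the data, so `size` must be
    known in advance (e.g. the byte size of the block device or image file).
    """
    info = tarfile.TarInfo(name=name)
    info.size = size
    info.mode = 0o600
    # "w|" is tarfile's streaming mode: it never seeks, so `out` may be a pipe
    # such as sys.stdout.buffer.
    with tarfile.open(fileobj=out, mode="w|") as tar:
        # addfile copies exactly info.size bytes from `image`.
        tar.addfile(info, image)
```

A hypothetical pipeline would call write_image_tar(sys.stdout.buffer, "vda.raw", image_file, image_size) and pipe the result into borg import-tar REPO::ARCHIVE -.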

Save temp snapshots in temp dir

That does not look good:

backup-vm/backup-vm.py

Lines 365 to 367 in 12b57ff

	# we probably can't write the temporary snapshot to the same directory
	# as the original disk, so use the default libvirt images directory
	disk.snapshot_path = os.path.join("/var/lib/libvirt/images", filename)

Wouldn't it be better to save the temporary snapshots in the temp dir (/tmp), which is usually intended for that purpose?
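A minimal sketch of making the location configurable instead of hard-coded. The snapshot_dir option and the snapshot_path helper are hypothetical; the file name pattern matches the <domain>-<target>-tempsnap.qcow2 paths that show up in the virsh domblklist output in other issues. Keep in mind that the overlay receives every write the guest makes while the backup runs, so whatever directory is chosen needs enough free space and must be writable by QEMU.

```python
import os
from typing import Optional

# The directory backup-vm currently hard-codes for temporary snapshots.
DEFAULT_SNAPSHOT_DIR = "/var/lib/libvirt/images"


def snapshot_path(domain: str, target: str, snapshot_dir: Optional[str] = None) -> str:
    """Path of the temporary overlay for one disk.

    snapshot_dir is a hypothetical user option (e.g. tempfile.gettempdir());
    None keeps the current default directory.
    """
    directory = snapshot_dir or DEFAULT_SNAPSHOT_DIR
    # Same naming as the tempsnap files seen in virsh domblklist output.
    return os.path.join(directory, "{}-{}-tempsnap.qcow2".format(domain, target))
```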

Block copy still active: disk 'vda' not ready for pivot yet

I am having some issues with backup-vm. The error below happens to me fairly often. It seems to be more frequent for some VMs than others, which could depend on the load/IO in the guest, I guess.

Most VMs have worked on the first try. Some have required 2-3 tries. But one has not worked at all yet, after 5-6 tries. It has, however, finished vda once or twice, but then got stuck on vdb instead, so it's a bit random on that one as well. Since I plan to run backups automatically, daily or weekly, this is a bit of an issue.

backup progress: 100%
libvirt: error code 83: block copy still active: disk 'vda' not ready for pivot yet
Traceback (most recent call last):
  File "/usr/local/bin/backup-vm", line 11, in <module>
    load_entry_point('backup-vm==0.1.dev30+gf2d6dfd', 'console_scripts', 'backup-vm')()
  File "/usr/local/lib/python3.7/dist-packages/backup_vm-0.1.dev30+gf2d6dfd-py3.7.egg/backup_vm/backup.py", line 54, in main
    borg_failed = multi.assimilate(args.archives)
  File "/usr/local/lib/python3.7/dist-packages/backup_vm-0.1.dev30+gf2d6dfd-py3.7.egg/backup_vm/snapshot.py", line 175, in __exit__
    self.blockcommit(disks_to_backup)
  File "/usr/local/lib/python3.7/dist-packages/backup_vm-0.1.dev30+gf2d6dfd-py3.7.egg/backup_vm/snapshot.py", line 105, in blockcommit
    if self.dom.blockJobAbort(disk.target, libvirt.VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT) < 0:
  File "/usr/lib/python3/dist-packages/libvirt.py", line 784, in blockJobAbort
    if ret == -1: raise libvirtError ('virDomainBlockJobAbort() failed', dom=self)
libvirt.libvirtError: block copy still active: disk 'vda' not ready for pivot yet

I can't tell whether the issue is actually in backup-vm or in libvirt. I found this old, but recently fixed, bug in Ubuntu; the fix is only for older releases, though. The package libvirt no longer exists in either Ubuntu or Debian.
https://launchpad.net/ubuntu/+source/libvirt/1.3.1-1ubuntu10.29

When the above happens I can see this:

virsh blockjob pgc-srtm-01 vda --info
Active Block Commit: [100 %]

virsh domblklist pgc-srtm-01
Target Source
vda /var/lib/libvirt/images/pgc-srtm-01-vda-tempsnap.qcow2
vdb /dev/pgc-kvm-04/pgc-srtm-01_srtm

I can fix the state by running these two commands:

virsh blockjob pgc-srtm-01 vda --abort
virsh blockcommit pgc-srtm-01 vda --active --verbose --pivot

However, it sometimes leaves the qcow2 file in /var/lib/libvirt/images/ and still references it in the XML. Simply removing the qcow2 file, editing the XML, and running virsh define on it again seems to work fine.

My VMs use LVM logical volumes as the storage back end. The system is Debian Stable (Buster), fully upgraded.

I have also experienced the error message seen in issue #15. Not sure if it's related, but I feel that it could be.

Since I can reproduce it, is there anything I can do to provide more information?

Otherwise, thank you for great software. I am really hoping to get it working in my environment as well. It does almost everything I wish for!

Makefile

Great project! And it would be even better with a Makefile.

How to pass additional commands to script?

AFAIK, since you call borg directly in the script, it may be hard to pass additional options to it. E.g. you cannot configure --compression, --chunker-params, etc.

Can't we use this in a more flexible way? Or is there no reason to configure it like that? (If so, why?)

setuptools_scm versioning

https://pypi.org/project/setuptools_scm/
https://pypi.org/project/setuptools_scm_git_archive/

Execution fails for empty CD drives

If there is a CD drive with no .iso mounted, the script fails:

  File "/usr/bin/backup-vm", line 11, in <module>
    load_entry_point('backup-vm==0.1.dev17+gce32c59.d20171207', 'console_scripts', 'backup-vm')()
  File "/usr/lib/python3.6/site-packages/backup_vm-0.1.dev17+gce32c59.d20171207-py3.6.egg/backup_vm/backup.py", line 28, in main
    all_disks = set(parse.Disk.get_disks(dom))
  File "/usr/lib/python3.6/site-packages/backup_vm-0.1.dev17+gce32c59.d20171207-py3.6.egg/backup_vm/parse.py", line 187, in get_disks
    yield from {d for d in map(cls, tree.findall("devices/disk")) if d.type is not None}
  File "/usr/lib/python3.6/site-packages/backup_vm-0.1.dev17+gce32c59.d20171207-py3.6.egg/backup_vm/parse.py", line 187, in <setcomp>
    yield from {d for d in map(cls, tree.findall("devices/disk")) if d.type is not None}
  File "/usr/lib/python3.6/site-packages/backup_vm-0.1.dev17+gce32c59.d20171207-py3.6.egg/backup_vm/parse.py", line 163, in __init__
    if len(xml.find("source").attrib.items()) >= 1:
AttributeError: 'NoneType' object has no attribute 'attrib'

The issue is that the following line:

backup-vm/backup_vm/parse.py

Line 162 in ce32c59

if len(xml.find("source").attrib.items()) >= 1:

looks for attributes on the "source" element in the XML, which does not exist in the CD-ROM disk block when no .iso file is loaded.
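A minimal sketch of the guard, assuming the domain XML string comes from dom.XMLDesc(0). iter_backup_disks and SOURCE_ATTR are illustrative names, not backup-vm's actual code; the point is to check for a missing <source> element before touching its attributes.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple

# Which <source> attribute holds the path, for the two disk types tried so far.
SOURCE_ATTR = {"file": "file", "block": "dev"}


def iter_backup_disks(domain_xml: str) -> Iterator[Tuple[str, str]]:
    """Yield (target, source path) for disks that actually have media attached."""
    tree = ET.fromstring(domain_xml)
    for disk in tree.findall("devices/disk"):
        source = disk.find("source")
        target = disk.find("target")
        # An empty CD-ROM drive has no <source> element, so find() returns
        # None; skip the disk instead of crashing on None.attrib.
        if source is None or target is None:
            continue
        attr = SOURCE_ATTR.get(disk.get("type"))
        path = source.get(attr) if attr else None
        if path:
            yield target.get("dev"), path
```

A CD-ROM drive with an .iso loaded would still be yielded here; skipping device='cdrom' entirely is another reasonable choice.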

backup fails with internal error from libvirt: block name doesn't match

For a couple of days now I've been using "backup-vm" for some qemu/libvirt VMs, mostly successfully so far.
Today, a backup failed with the following error:

starting backup
libvirt: error code 1: internal error: qemu block name '/dev/vg_data01/mail2-system' doesn't match expected '/var/lib/libvirt/images/mail2-sda-tempsnap.qcow2'
Traceback (most recent call last):
  File "/usr/local/bin/backup-vm", line 11, in <module>
    load_entry_point('backup-vm==0.1.dev28+g442ce38', 'console_scripts', 'backup-vm')()
  File "/usr/local/lib/python3.6/site-packages/backup_vm-0.1.dev28+g442ce38-py3.6.egg/backup_vm/backup.py", line 54, in main
    borg_failed = multi.assimilate(args.archives)
  File "/usr/local/lib/python3.6/site-packages/backup_vm-0.1.dev28+g442ce38-py3.6.egg/backup_vm/snapshot.py", line 175, in __exit__
    self.blockcommit(disks_to_backup)
  File "/usr/local/lib/python3.6/site-packages/backup_vm-0.1.dev28+g442ce38-py3.6.egg/backup_vm/snapshot.py", line 81, in blockcommit
    | libvirt.VIR_DOMAIN_BLOCK_COMMIT_SHALLOW) < 0:
  File "/usr/local/lib64/python3.6/site-packages/libvirt.py", line 701, in blockCommit
    if ret == -1: raise libvirtError ('virDomainBlockCommit() failed', dom=self)
libvirt.libvirtError: internal error: qemu block name '/dev/vg_data01/mail2-system' doesn't match expected '/var/lib/libvirt/images/mail2-sda-tempsnap.qcow2'

The first backup of this VM, a day earlier, completed without errors, so either the first backup left the VM in some state that caused problems during the next run, or there was some non-deterministic (e.g. timing-dependent) issue in the second run.

The VM (called mail2) has three disks (LVM logical volumes):

sda  /dev/vg_data01/mail2-system
sdb  /dev/vg_data01/mail2-swap
sdc  /dev/vg_data01/mail2-data

After the failed backup, the VMs disks were in the following state:

[root@sabavm1 ~]# virsh domblklist mail2
Target     Source
------------------------------------------------
sda        /var/lib/libvirt/images/mail2-sda-tempsnap.qcow2
sdb        /var/lib/libvirt/images/mail2-sdb-tempsnap.qcow2
sdc        /dev/vg_data01/mail2-data

I then tried to remove the snapshots manually, but only sdb succeeded:

[root@sabavm1 ~]# virsh blockcommit mail2 sda --verbose --pivot
error: internal error: unable to find backing name for device drive-scsi0-0-0-0

[root@sabavm1 ~]# virsh blockcommit mail2 sdb --verbose --pivot
Block commit: [100 %]
Successfully pivoted

Next, I shut the VM down and started it again. After that, I was able to remove the snapshot, and the disks were back to normal:

[root@sabavm1 ~]# virsh blockcommit mail2 sda --verbose --pivot
Block commit: [100 %]
Successfully pivoted
[root@sabavm1 ~]# virsh domblklist mail2
Target     Source
------------------------------------------------
sda        /dev/vg_data01/mail2-system
sdb        /dev/vg_data01/mail2-swap
sdc        /dev/vg_data01/mail2-data

I wonder if this is an issue with libvirt and/or qemu (I have libvirt version 4.0.0 and qemu 2.9.0) or with "backup-vm".
What could I do to debug things further?

Automatic restore script

Retry block commit in case of failure

Retry the commit step up to 3 times, waiting 5 seconds between tries; the commit sometimes fails on my machine.
Perhaps also add a backup-vm --retry-merge domain option or something similar that runs domBlockJobAbort on all existing disks.
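A minimal sketch of the retry, with 3 attempts and a 5 second delay as the defaults. retry is a hypothetical helper, not existing backup-vm code; it could wrap the blockJobAbort(..., VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT) call that fails with "not ready for pivot yet" in the traceback above.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(action: Callable[[], T], attempts: int = 3, delay: float = 5.0,
          retry_on: Tuple[Type[BaseException], ...] = (Exception,)) -> T:
    """Call action() up to `attempts` times, sleeping `delay` seconds after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise  # out of tries: surface the last error to the caller
            time.sleep(delay)
    raise AssertionError("unreachable")


# Hypothetical use in the pivot step (requires libvirt-python):
#   retry(lambda: dom.blockJobAbort(disk.target, libvirt.VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT),
#         retry_on=(libvirt.libvirtError,))
```

Restricting retry_on to libvirt.libvirtError keeps unrelated bugs from being silently retried.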

Python 3.5 required?

The README says that Python >= 3.4 is required; however, with 3.4 I get this error:

  File "setup.py", line 22
    lines = [*self.format_readme(f)]
             ^
SyntaxError: can use starred expression only as assignment target

I am by no means an expert with Python, quite the opposite, and may therefore be wrong. But my Google skills say that this syntax requires Python 3.5.
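That matches PEP 448: unpacking inside a list display, as in [*gen], was added in Python 3.5. A spelling that also works on 3.4 is list(...). The generator below is a hypothetical stand-in for the real format_readme method, and the snippet avoids the typing module because it only arrived in 3.5.

```python
def format_readme(lines):
    """Hypothetical stand-in for the format_readme generator in setup.py."""
    for line in lines:
        yield line.rstrip("\n")


# Python 3.5+ only (PEP 448 unpacking); a SyntaxError on 3.4:
#     lines = [*format_readme(f)]
# Equivalent list that also works on Python 3.4:
lines = list(format_readme(["a\n", "b\n"]))
```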

Dump disk images using qemu

If a VM has complex disk chains (e.g. you want to back up snapshots that were already created, or a qcow2 image already has a backing disk before the backup runs), not all of the VM's content is actually backed up, only the topmost overlay image.

It should be possible to run qemu-img recursively on the disks in the domain to make sure no image depends on another image that should also be backed up. This option should have a CLI flag, because in some setups (e.g. a common fresh Debian install as the base image, with overlay images containing different software) it would be annoying to end up with many copies of the base image.

A better solution might be to read the disks the same way as qemu itself, which could probably be accomplished with a simple qemu-img convert -O raw <image> -. (The image should always be exported as raw regardless of input format because borg will do its own deduplication & compression.)
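To find out which images a disk actually depends on, qemu-img can report the whole chain: qemu-img info --backing-chain --output=json prints a JSON array with one object per image, starting from the top overlay. A minimal sketch; parse_backing_chain and backing_chain are hypothetical helper names, and a chain with more than one entry means the disk has backing images.

```python
import json
import subprocess
from typing import List


def parse_backing_chain(info_json: str) -> List[str]:
    """Filenames from `qemu-img info --backing-chain --output=json`, top overlay first."""
    return [image["filename"] for image in json.loads(info_json)]


def backing_chain(image_path: str) -> List[str]:
    # Requires qemu-img on PATH; check=True raises CalledProcessError on failure.
    result = subprocess.run(
        ["qemu-img", "info", "--backing-chain", "--output=json", image_path],
        check=True, stdout=subprocess.PIPE, universal_newlines=True,
    )
    return parse_backing_chain(result.stdout)
```

If the guest is running, newer QEMU versions lock the image, so qemu-img info may additionally need -U (--force-share).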

Test with other disk backing types

libvirt has disk types other than the file and block that I've tried. See the libvirt docs (specifically the section on the source attribute).
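A sketch of how the parser could cover more of them. The attribute names follow the libvirt domain XML documentation for the <source> element of each disk type (file, dev, dir, protocol/name, pool/volume). Only file and block have been tested with backup-vm, and describe_source is a hypothetical name. Unsupported types fail loudly instead of being silently skipped.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# <source> attributes that identify the storage, per <disk type="...">.
# A subset of the types libvirt supports; only "file" and "block" are tested.
SOURCE_ATTRS = {
    "file": ("file",),
    "block": ("dev",),
    "dir": ("dir",),
    "network": ("protocol", "name"),
    "volume": ("pool", "volume"),
}


def describe_source(disk: ET.Element) -> Optional[Dict[str, Optional[str]]]:
    disk_type = disk.get("type")
    if disk_type not in SOURCE_ATTRS:
        raise NotImplementedError("unsupported disk type: {!r}".format(disk_type))
    source = disk.find("source")
    if source is None:
        return None  # e.g. an empty CD-ROM drive
    return {attr: source.get(attr) for attr in SOURCE_ATTRS[disk_type]}
```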

Auto-update usage in README.rst

When restore-vm (#1) and borg-multi (#6) are both in master, remember to update README.rst with the new content.
Perhaps implement a setup.py build_usage command or something, similar to borg? I don't think GitHub renders files pulled in with RST's include directive, so the command would have to edit the document in place. One approach would be to put comments in the README before and after where the usage snippets should be auto-inserted, and to generate the snippets by iterating through the entry_points.
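A sketch of the marker approach. The marker text (.. begin-usage: NAME and .. end-usage: NAME) is a hypothetical convention; a line starting with ".. " that isn't a directive, hyperlink target, footnote, or substitution is an RST comment, so the markers don't appear in the rendered README. The generated body would typically be the --help output of each entry point.

```python
import re


def replace_between_markers(text: str, name: str, new_body: str) -> str:
    """Replace whatever sits between the hypothetical RST comment markers
    ".. begin-usage: NAME" and ".. end-usage: NAME", keeping the markers."""
    start = ".. begin-usage: {}\n".format(name)
    end = ".. end-usage: {}".format(name)
    # Non-greedy match from the start marker up to (not including) the end marker.
    pattern = re.compile(re.escape(start) + r".*?(?=" + re.escape(end) + ")", re.DOTALL)
    if not pattern.search(text):
        raise ValueError("markers for {!r} not found".format(name))
    # Blank lines around the body keep the comment markers valid RST.
    replacement = start + "\n" + new_body.strip("\n") + "\n\n"
    return pattern.sub(lambda m: replacement, text, count=1)
```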

Alternative backup engines

I'm currently evaluating a couple of other backup engines that implement public-key crypto, e.g. https://github.com/dpc/rdedup, and I'd like to adapt backup-vm to work with them.

This could either be a fork which shares some of the same code (but no longer supports borg), or a version of backup-vm which supports both backup systems (probably more work in the short term, but better in the long term). It's not really clear to me which would be preferable.

Any thoughts?
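For the shared-code option, one possible shape is a small engine interface where each backend only has to build the command that reads a disk image from stdin. A sketch under that assumption: the class names are hypothetical, borg create REPO::ARCHIVE - reads the archive contents from stdin, and the rdedup command line should be double-checked against rdedup's own help.

```python
import abc
from typing import List


class BackupEngine(abc.ABC):
    """Hypothetical interface: build the command that stores one image read from stdin."""

    @abc.abstractmethod
    def store_command(self, name: str) -> List[str]:
        ...


class BorgEngine(BackupEngine):
    def __init__(self, repo: str) -> None:
        self.repo = repo

    def store_command(self, name: str) -> List[str]:
        # "borg create REPO::ARCHIVE -" archives whatever arrives on stdin.
        return ["borg", "create", "{}::{}".format(self.repo, name), "-"]


class RdedupEngine(BackupEngine):
    def store_command(self, name: str) -> List[str]:
        # Assumption: "rdedup store NAME" reads data from stdin; the repository
        # location is configured separately (verify with `rdedup --help`).
        return ["rdedup", "store", name]
```

The rest of backup-vm (snapshotting, reading disks, progress) would then stay engine-agnostic.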

Add unit tests

Tests should probably mock libvirt instead of relying on the actual library. I'm leaning towards the built-in unittest module instead of nose or py.test, since it's supported on all Python versions backup-vm is targeting (and because of my irrational bias towards included libs).

As of now anything that makes it into master passes the "rigorous" "test" of my daily backup script; master will be unreliable until v1.0.
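A minimal sketch of what a mocked test could look like with the built-in unittest and unittest.mock modules. disk_targets is a hypothetical helper standing in for real backup-vm code; the Mock replaces a libvirt domain object, so only XMLDesc needs to be faked and no hypervisor is required.

```python
import unittest
import xml.etree.ElementTree as ET
from unittest import mock


def disk_targets(dom):
    """Hypothetical helper under test: target names of all disks in a domain."""
    tree = ET.fromstring(dom.XMLDesc(0))
    return [t.get("dev") for t in tree.findall("devices/disk/target")]


class DiskTargetsTest(unittest.TestCase):
    def test_reads_targets_from_domain_xml(self):
        # A Mock stands in for a libvirt virDomain object.
        dom = mock.Mock()
        dom.XMLDesc.return_value = (
            "<domain><devices>"
            "<disk type='file'><target dev='vda'/></disk>"
            "<disk type='block'><target dev='vdb'/></disk>"
            "</devices></domain>"
        )
        self.assertEqual(disk_targets(dom), ["vda", "vdb"])
        dom.XMLDesc.assert_called_once_with(0)
```

Saved as a test module, this runs with python3 -m unittest.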

Move multi-archive handling to separate script

The multiple-archive support is probably going to be useful to more people than just the VM aspect anyway, so separating the part of the script that automatically launches multiple borg instances, calculates total progress percentage, deduplicates prompts, etc. might be good. backup-vm could still call it via subprocess. It could be called borg-multi or something.
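A sketch of the fan-out part only (prompt deduplication is not covered): the same byte stream is written to the stdin of several processes, e.g. one borg create REPO::ARCHIVE - per archive, while a single progress value is reported. fan_out and print_progress are hypothetical names; print_progress imitates the "backup progress: 100%" line seen in the output above.

```python
import subprocess
import sys
from typing import Callable, Iterable, List, Optional, Sequence


def print_progress(fraction: float) -> None:
    sys.stderr.write("\rbackup progress: {:.0%}".format(fraction))
    sys.stderr.flush()


def fan_out(chunks: Iterable[bytes], total_size: int,
            commands: Sequence[Sequence[str]],
            on_progress: Optional[Callable[[float], None]] = print_progress) -> List[int]:
    """Feed every chunk to each command's stdin and return their exit codes."""
    procs = [subprocess.Popen(list(cmd), stdin=subprocess.PIPE) for cmd in commands]
    done = 0
    try:
        for chunk in chunks:
            for proc in procs:
                # Raises BrokenPipeError if a process has already exited.
                proc.stdin.write(chunk)
            done += len(chunk)
            if on_progress is not None and total_size:
                on_progress(done / total_size)
    finally:
        for proc in procs:
            proc.stdin.close()  # EOF tells each process the stream is complete
    return [proc.wait() for proc in procs]
```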

Backup failed with permission denied; now snapshot fails

First: great project, thanks for publishing it! Now my issue:

I tried to back up a live VM and it failed with this error:

libvirt.libvirtError: internal error: unable to execute QEMU command 'block-commit': Could not reopen file: Permission denied

Then I manually deleted the snapshot image. Now it fails with this:

libvirt: error code 1: internal error: unable to execute QEMU command 'transaction': Error: Trying to create an image with the same filename as the backing file 
Failed to create domain snapshot

I was root the entire time. Not sure what to do now.
