Automated incremental off-site database backups using ZFS send
Something that I've fought with for a long time is how to make good nightly off-site backups of MySQL and PostgreSQL. Having recently switched back to my favourite *nix for server use, FreeBSD, I am now using ZFS on my server. (Boy is it fast. But that's a post for some other time.)
One of the advantages of using ZFS is that it supports serialising and deserialising file systems or snapshots (zfs send and zfs receive), so you can transport them elsewhere and restore them, possibly on the same system to a new pool. One such use is remote backups of databases.
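As a rough illustration of that serialise-and-restore idea, here is a minimal Python sketch that only builds the shell pipeline for pulling a snapshot over ssh, either in full or as an increment on top of an earlier snapshot. The function name pull_command, the host db.example.org and the dataset names tank/db and backup/db are all made up for the example; the only real commands involved are zfs send (with its -i option for incrementals), zfs receive and ssh.

```python
import shlex


def pull_command(remote_host, remote_snapshot, local_fs, base_snapshot=None):
    """Build a pipeline that serialises a snapshot on remote_host and
    restores it into local_fs on the machine running the pipeline.

    All names are hypothetical; nothing is executed here.
    """
    send = ["zfs", "send"]
    if base_snapshot is not None:
        # -i sends only the changes between base_snapshot and remote_snapshot.
        send += ["-i", base_snapshot]
    send.append(remote_snapshot)
    # The whole send command is passed to ssh as a single quoted argument.
    remote = ["ssh", remote_host, shlex.join(send)]
    receive = ["zfs", "receive", local_fs]
    return shlex.join(remote) + " | " + shlex.join(receive)


# Full copy of one snapshot:
print(pull_command("db.example.org", "tank/db@daily.0", "backup/db"))
# Only the changes since the previous night's snapshot:
print(pull_command("db.example.org", "tank/db@daily.0", "backup/db",
                   base_snapshot="tank/db@daily.1"))
```

The incremental form is what keeps the nightly transfer small: only blocks that changed between the two snapshots go over the wire.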
In the past, I've tried nightly full dumps using mysqldump or pg_dump and then rsyncing them along with the rest of my data for nightly backups. This meant that I had an ever-growing quantity of data to ship. Other ideas I had included using diff to generate incremental deltas which could be sequentially patched to generate the latest database dump. This approach could work well, and would only require two full dumps to be kept around (today's and yesterday's)... but I never got around to it. And it would still require periodic full dumps to be copied over, since you really don't want to rely on trusting that many patches will apply cleanly. Plus, depending on your workload, the diffs could be as big as or bigger than the full dump, since diff is line-based. Any modification to one column in every row of a big database would result in a huge delta - for example, adding a new column.
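To make the line-based problem concrete, here is a small Python sketch using the standard difflib module on a made-up dump of 1000 INSERT statements. The table and the helper name delta_lines are invented for the illustration, and difflib stands in for the diff utility; it is not what I actually ran.

```python
import difflib


def delta_lines(old_dump, new_dump):
    """Count the lines a line-based diff would have to carry
    (removed lines plus added lines) between two dumps."""
    matcher = difflib.SequenceMatcher(a=old_dump, b=new_dump, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changed += (i2 - i1) + (j2 - j1)
    return changed


# A hypothetical dump: one INSERT statement per row.
old = [f"INSERT INTO users VALUES ({i}, 'user{i}');" for i in range(1000)]

# Changing a single row gives a tiny delta.
one_row = list(old)
one_row[500] = "INSERT INTO users VALUES (500, 'renamed');"

# Adding a column touches every line, so every line is removed and re-added.
new_column = [line.replace(");", ", NULL);") for line in old]

print(delta_lines(old, one_row))     # 2
print(delta_lines(old, new_column))  # 2000
```

In the second case the delta carries twice as many lines as the dump itself, which is the worst case described above.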
Back to ZFS. My idea is simple: use Ralf S. Engelschall's snapshot tools for FreeBSD to keep about a week's worth of nightly snapshots (plus some weekly ones), and send only the nightly ones to a remote site every night. The trick with remote backups is that there's a chance that a night will be missed. ZFS doesn't do all the necessary magic to determine whether snapshots were renamed or which ones haven't yet been applied.
That's where my simple little sync_remote_snapshots.pl script comes in. This little script is intended to run from the receiver ("local"). It uses ssh as the transport, which you'll have to set up to work without passphrases while the script is running (ideally using ssh-agent).
The script starts by listing the local and remote ZFS snapshots (using zfs list -t snapshot -r zpool/fsname), parses the output, then matches up the local and remote snapshots by timestamp. Matching by timestamp is necessary because the snapshot numbers won't match up between the two machines after the next snapshot is run: the FreeBSD snapshot script renumbers the snapshots so that the lowest-numbered one, @daily.0, is always the most recent daily snapshot. If two days go by, then @daily.0 on the local end would be for two days prior.
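The matching step can be sketched in Python roughly as follows. This is not the Perl script itself, just an illustration under a few assumptions: the listing is taken with zfs list -H -p -t snapshot -o name,creation (tab-separated, creation time as seconds since the epoch) instead of the default human-readable output, only one file system is listed, and a received snapshot reports the same creation time as its source. The function names and sample data are made up.

```python
def parse_snapshots(listing):
    """Parse tab-separated `name<TAB>creation` lines into a dict mapping
    creation time (int) to the snapshot name after the '@'."""
    by_time = {}
    for line in listing.splitlines():
        if not line.strip():
            continue
        name, creation = line.split("\t")
        by_time[int(creation)] = name.split("@", 1)[1]
    return by_time


def match_by_timestamp(local_listing, remote_listing):
    """Return (pairs, missing): pairs is a list of (local_name, remote_name)
    for snapshots present on both sides, oldest first; missing lists the
    remote names that have not been received yet, oldest first."""
    local = parse_snapshots(local_listing)
    remote = parse_snapshots(remote_listing)
    pairs = [(local[t], remote[t]) for t in sorted(local.keys() & remote.keys())]
    missing = [remote[t] for t in sorted(remote.keys() - local.keys())]
    return pairs, missing


# The receiver last synced two nights ago; the sender has since rotated twice.
local_listing = "backup/db@daily.1\t50\nbackup/db@daily.0\t100\n"
remote_listing = ("tank/db@daily.3\t50\ntank/db@daily.2\t100\n"
                  "tank/db@daily.1\t200\ntank/db@daily.0\t300\n")

pairs, missing = match_by_timestamp(local_listing, remote_listing)
print(pairs)    # [('daily.1', 'daily.3'), ('daily.0', 'daily.2')]
print(missing)  # ['daily.1', 'daily.0']
```

The example shows the mismatch: what the receiver calls daily.0 is daily.2 on the sender, and the two newest remote snapshots are still missing locally.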
In order to be able to run the incremental backup, we need to first rename all the local snapshots just like the remote side does, purge any old local snapshots we don't want any more, and then actually do the backup.
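Those three steps (purge, rename, send) can be sketched as a Python function that only plans the commands rather than running them. Again this is an illustration, not the Perl script: plan_sync, the sync-tmp.* temporary names, the host and the dataset names are all invented, and renaming through temporary names is simply one way I'd avoid collisions such as daily.0 becoming daily.1 while the old daily.1 still exists. It assumes at least one snapshot is already shared by both sides and that the receiving file system hasn't been modified since its last snapshot.

```python
def plan_sync(local, remote, local_fs, remote_fs, remote_host):
    """Plan the commands to run on the receiver.

    local and remote map creation time -> snapshot name (the part after '@').
    Returns a list of shell command strings; nothing is executed.
    """
    common = sorted(local.keys() & remote.keys())
    if not common:
        raise ValueError("no common snapshot; an initial full send is needed")
    commands = []

    # 1. Purge local snapshots that the sender no longer has.
    for t in sorted(local.keys() - remote.keys()):
        commands.append(f"zfs destroy {local_fs}@{local[t]}")

    # 2. Rename local snapshots to the sender's names, in two passes through
    #    temporary names so no rename hits a name that is still in use.
    moves = [(t, local[t], remote[t]) for t in common if local[t] != remote[t]]
    for t, old, _ in moves:
        commands.append(f"zfs rename {local_fs}@{old} {local_fs}@sync-tmp.{t}")
    for t, _, new in moves:
        commands.append(f"zfs rename {local_fs}@sync-tmp.{t} {local_fs}@{new}")

    # 3. Pull each missing snapshot as an increment on the previous one,
    #    starting from the newest snapshot both sides share.
    base = remote[common[-1]]
    for t in sorted(remote.keys() - local.keys()):
        if t < common[-1]:
            continue  # older than the newest shared snapshot; skip it
        new = remote[t]
        send = f"zfs send -i {remote_fs}@{base} {remote_fs}@{new}"
        commands.append(f"ssh {remote_host} {send} | zfs receive {local_fs}")
        base = new
    return commands


local = {50: "daily.1", 100: "daily.0"}
remote = {50: "daily.3", 100: "daily.2", 200: "daily.1", 300: "daily.0"}
for command in plan_sync(local, remote, "backup/db", "tank/db", "db.example.org"):
    print(command)
```

For the sample data this plans four renames followed by two incremental pulls (daily.2 to daily.1, then daily.1 to daily.0), after which the local snapshot names match the sender's again.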
This script takes care of all that. It also handles the case where the snapshot transfer failed mid-way (loss of network, out of space, etc.), leaving the local snapshot numbers not starting at zero.
There are definitely situations it doesn't cover, but it does what I need and uses way less bandwidth than rsyncing compressed nightly dumps.
