25 Oct 2009 16:12
TAGS: backup btrfs linux snapshot
Your home directory is where the most important data is stored.
But every now and then you simply run rm -Rf ~ and all your precious data is gone. Backups, you say?
So let's think about how to do backups. cp -a /home/quake /my/distant/location? With my 24 GB home directory, that would take hours. cp -a /home/quake /home/backups/quake/date? Better, this takes only a few minutes, but wait: I have about 120 GB of disk space, which means I can keep no more than 5 backups.
What to do about this? There are two possibilities. Either you cut the backup down to only the important data (but figuring out which data is important takes time and may not be practical), or you move to a smarter solution, like incremental backups. Or snapshots.
With btrfs as the filesystem for my home directory, I chose to take a snapshot of it every hour. A snapshot takes between 0 and 1 second to complete and uses almost no disk space. Why? Btrfs is a copy-on-write filesystem, which means that cloning a volume is instant: the same data simply becomes available under two locations. Only when you modify one of the two does btrfs make a real copy of the modified fragment and apply the change to that copy.
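The limit of 5 is plain integer division. As a small sketch (max_full_copies is a name made up for this illustration), the following computes an upper bound on how many full copies of a directory fit on a disk:

```shell
#!/bin/sh
# Hypothetical helper: upper bound on the number of full copies that fit.
# $1 = disk size in GB, $2 = directory size in GB
max_full_copies() {
    echo $(( $1 / $2 ))
}

# 120 GB of disk, 24 GB home directory
max_full_copies 120 24
```

In practice the number is lower still, because the original directory and everything else on the disk need room too.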
OK. How to do it.
First create a btrfs filesystem (you'll need a recent kernel and the btrfs userspace tools: btrfs-progs, packaged as btrfs-tools in Ubuntu):
# mkfs.btrfs /dev/sda7
(sda7 is the partition for my /home directory)
Then mount it somewhere other than /home; let's use /vol as an example:
# mount /dev/sda7 /vol
Create some volumes on that filesystem: home, quake, snapshots:
# btrfsctl -S home /vol
# btrfsctl -S quake /vol
# btrfsctl -S snapshots /vol
The volumes are accessible as subdirectories of /vol:
# ls -la /vol
drwx------ 1 root root 36 1970-01-01 01:00 .
drwxr-xr-x 24 root root 4096 2009-10-25 15:04 ..
drwx------ 1 root root 20 2009-10-25 15:51 home
drwx------ 1 root root 11488 2009-10-25 16:17 quake
drwx------ 1 root root 76 2009-10-25 15:40 snapshots
But you can also mount them separately:
# mount /dev/sda7 /home -o subvol=home
# mkdir /home/quake
# mount /dev/sda7 /home/quake -o subvol=quake
# chown quake:quake /home/quake /vol/quake /vol/snapshots
# chmod 0755 /home/ /home/quake
Now you're ready to take snapshots. First, populate the /home/quake directory:
$ mkdir /home/quake/abcd
$ mkdir /home/quake/dddd
$ mkdir /home/quake/abcd/eeee
$ echo testtest > /home/quake/testfile
Aaaaaand, make snapshots!
$ btrfsctl -s /vol/snapshots/quake-`date +%Y%m%d-%H%M` /vol/quake
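To repeat this every hour, the command can be wrapped in a small script. This is only a sketch: snapshot_path, take_snapshot and the RUN variable are names invented here, and it assumes the snapshot tool is btrfsctl, the same tool used to create the volumes above. Without RUN=1 it only prints the command it would run.

```shell
#!/bin/sh
# Hypothetical helper: build a path like /vol/snapshots/quake-20091025-1653.
# $1 = snapshots directory, $2 = name prefix
snapshot_path() {
    printf '%s/%s-%s\n' "$1" "$2" "$(date +%Y%m%d-%H%M)"
}

# Hypothetical wrapper: $1 = snapshots directory, $2 = name prefix, $3 = volume.
# Prints the command unless RUN=1 is set (an assumption made for this sketch
# so that it is safe to try out on any machine).
take_snapshot() {
    target=$(snapshot_path "$1" "$2")
    if [ "${RUN:-0}" = "1" ]; then
        btrfsctl -s "$target" "$3"
    else
        echo "btrfsctl -s $target $3"
    fi
}

# Dry run: shows the command for the current date and time
take_snapshot /vol/snapshots quake /vol/quake
```

Called from an hourly cron job with RUN=1 set, this would give one snapshot per hour, each with its own timestamp in the name.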
I figured out that it's quite important to point to /vol/quake and not /home/quake. At first it seems that the two are exactly the same, but /home/quake can have other filesystems mounted under it (like .gvfs for GNOME virtual file systems), while /vol/quake contains "pure data". When I snapshot /home/quake with filesystems mounted under it, the filesystem freezes for me (btrfs is still experimental, they say). So, as noted above, it's better to snapshot the pure data directory.
Now /vol/snapshots/quake-20091025-1653 (or whatever your date is) and /vol/quake should list the same files, and the "cloning" operation should be instant no matter how much data you have. From now on, modifying the contents of /vol/quake should not change anything in /vol/snapshots/quake-20091025-1653 (but of course the changes do show up in /home/quake).
Also, the snapshot doesn't really take any disk space as long as /vol/quake stays unchanged. Once you change a file in /vol/quake, btrfs has to keep both versions of the changed data, and this is when additional space is allocated.
To sum up, here is a table of the ways to get the same contents in two directories:
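To see whether a directory has other filesystems mounted under it, you can look through the mount table. A minimal sketch, assuming a Linux-style /proc/mounts where the second field is the mount point (mounts_under is a made-up name):

```shell
#!/bin/sh
# Hypothetical helper: print mount points located below a directory.
# $1 = directory, $2 = mount table to read (defaults to /proc/mounts)
mounts_under() {
    # Keep lines whose mount point (field 2) starts with "<directory>/"
    awk -v dir="$1" 'index($2, dir "/") == 1 { print $2 }' "${2:-/proc/mounts}"
}

# Example: mounts_under /home/quake
```

If this prints nothing for a directory, there is nothing mounted below it, which is what you would expect for /vol/quake.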
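The claim that the snapshot and the live volume start out identical and diverge only after a change can be checked with a recursive diff. A small sketch (same_tree is a name made up here):

```shell
#!/bin/sh
# Hypothetical helper: succeed only if two directory trees have the same
# files with the same contents.
same_tree() {
    diff -r "$1" "$2" > /dev/null 2>&1
}

# Example (use whatever snapshot name your date produced):
#   same_tree /vol/quake /vol/snapshots/quake-20091025-1653 && echo identical
```

Right after taking the snapshot this should report the trees as identical. After editing a file in /vol/quake it should no longer do so, because the snapshot keeps the old version.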
|method||file copy||symbolic link||hard link||bind-mount||btrfs' clone|
|how||cp -a dir1 dir2||ln -s dir1 dir2||ln dir1 dir2||mount -o bind dir1 dir2||btrfs-bcp dir1 dir2|
|takes disk space||yes||no||no||no||no (only the difference)|
|points to the same data (changing file in dir1 changes it in dir2)||no||yes||yes||yes||no|
|notes||-||-||not on Linux (Mac only?)||-||btrfs-bcp not distributed in Ubuntu's btrfs-tools|
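The "points to the same data" row can be tried out on a single file without root access. The sketch below covers only the copy, the symbolic link and the hard link (made on a file, since Linux does not allow hard links to directories). Bind mounts and btrfs clones need root and a btrfs volume, so they are left out. demo_same_data is a made-up name.

```shell
#!/bin/sh
# Hypothetical demo: change the original file and see which of the
# other names show the new content.
demo_same_data() {
    work=$(mktemp -d) || return 1
    echo old > "$work/orig"
    cp -a "$work/orig" "$work/copy"       # independent copy
    ln -s "$work/orig" "$work/symlink"    # symbolic link
    ln "$work/orig" "$work/hardlink"      # hard link to the same file
    echo new > "$work/orig"               # overwrite the original in place
    for name in copy symlink hardlink; do
        printf '%s: %s\n' "$name" "$(cat "$work/$name")"
    done
    rm -rf "$work"
}

demo_same_data
# prints:
# copy: old
# symlink: new
# hardlink: new
```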
More notes on btrfs and snapshotting:
- What btrfs-bcp does at the directory level can be done with snapshots at the volume level (as described in this post).
- Snapshots in btrfs are not removable yet. You can empty them and reclaim the space taken by the saved differences from the starting point, but a few bytes are still taken by the leftover directory, which cannot be removed. Deleting snapshots is to be implemented in the stable version of btrfs.
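With hourly snapshots the list grows quickly, so it helps to know which ones are old enough to empty. Because the names end in a YYYYMMDD-HHMM timestamp, sorting them alphabetically also sorts them by time. The sketch below only lists candidates and deletes nothing. old_snapshots and its arguments are made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: list all but the newest N snapshots of one prefix.
# $1 = snapshots directory, $2 = name prefix, $3 = number of snapshots to keep
old_snapshots() {
    for path in "$1/$2"-*; do
        # Skip the unexpanded pattern when nothing matches
        if [ -e "$path" ]; then
            basename "$path"
        fi
    done | sort | awk -v keep="$3" '
        { names[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print names[i] }
    '
}

# Example: old_snapshots /vol/snapshots quake 24
```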