Home Directory Snapshots

25 Oct 2009 16:12

Your home directory is where the most important data is stored.

But from time to time you simply run rm -Rf ~ and all your precious data is gone. Backups, you say?

So let's think about how you do backups. cp -a /home/quake /my/distant/location? On my 24 GB home directory? It would take hours. cp -a /home/quake /home/backups/quake/date? Better, this will take a few minutes, but wait, I have about 120 GB of disk space, which means I can have no more than 5 backups.
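The limit of 5 is plain integer division of disk size by home directory size. A minimal sketch of that arithmetic, where max_full_copies is a made-up helper name and the numbers are the ones from this post (it is an upper bound, since anything else on the disk only lowers it):

```shell
#!/bin/sh
# Hypothetical helper: how many full copies of a directory fit on a disk.
# Both sizes are in whole GB; the result is rounded down.
max_full_copies() {
    disk_gb=$1
    home_gb=$2
    echo $(( disk_gb / home_gb ))
}

# 120 GB of disk, 24 GB home directory
max_full_copies 120 24
```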

What to do about this? There are two possibilities. Either you cut the backup down to only the important data (but figuring out which data is important takes time and is easy to get wrong), or you move to a smarter solution, like incremental backups. Or snapshots.

Having btrfs as the filesystem for my home directory, I chose to make a snapshot of it each hour. It takes between 0 and 1 second to complete and uses almost no disk space. Why? Btrfs is a copy-on-write filesystem, which means cloning a filesystem is instant, as it only makes the same data available under two locations. Modifying one of the two then makes a real copy of just the modified fragment and changes that copy.

OK. How to do it.

First create a btrfs filesystem (you'll need a recent kernel and btrfs-utils):

# mkfs.btrfs /dev/sda7

(sda7 is the partition for my /home directory)

Then mount it somewhere other than /home, let's use /vol as an example:

# mount /dev/sda7 /vol

Create some volumes on that filesystem: home, quake, snapshots:

# btrfsctl -S home /vol
# btrfsctl -S quake /vol
# btrfsctl -S snapshots /vol

The volumes are accessible as subdirectories of /vol:

# ls -la /vol
drwx------  1 root  root     36 1970-01-01 01:00 .
drwxr-xr-x 24 root  root   4096 2009-10-25 15:04 ..
drwx------  1 root  root     20 2009-10-25 15:51 home
drwx------  1 root  root  11488 2009-10-25 16:17 quake
drwx------  1 root  root     76 2009-10-25 15:40 snapshots

But you can also mount them separately:

# mount /dev/sda7 /home -o subvol=home
# mkdir /home/quake
# mount /dev/sda7 /home/quake -o subvol=quake

Fix permissions:

# chown quake:quake /home/quake /vol/quake /vol/snapshots
# chmod 0755 /home/ /home/quake

Now you're ready to take snapshots. First populate the /home/quake directory:

$ mkdir /home/quake/abcd
$ mkdir /home/quake/dddd
$ mkdir /home/quake/abcd/eeee
$ echo testtest > /home/quake/testfile

Aaaaaand, make snapshots!

$ btrfsctl -s /vol/snapshots/quake-`date +%Y%m%d-%H%M` /vol/quake

I figured out that it's quite important to point to /vol/quake and not /home/quake. At first it seems that they are exactly the same, but /home/quake can have other filesystems mounted under it (like .gvfs for GNOME virtual file systems), while /vol/quake contains "pure data". When I snapshot /home/quake with filesystems mounted under it, the filesystem freezes for me (btrfs is still experimental, they say). So, as noted above, it's better to snapshot the pure data directory.
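For the hourly snapshot mentioned at the start, something like the following could be run from cron. This is only a sketch: SNAP_DIR, SOURCE, snapshot_name, take_snapshot and DRY_RUN are names made up here, and it assumes btrfsctl -s (the snapshot counterpart of the btrfsctl -S calls used for the volumes) is the snapshot command on your system.

```shell
#!/bin/sh
# Sketch of an hourly snapshot job. SNAP_DIR, SOURCE and the function
# names are illustrative; the paths are the ones used in this post.
SNAP_DIR=${SNAP_DIR:-/vol/snapshots}
SOURCE=${SOURCE:-/vol/quake}      # the "pure data" path, not /home/quake

# Build a snapshot path such as /vol/snapshots/quake-20091025-1653.
# An optional argument overrides the timestamp (handy for testing).
snapshot_name() {
    stamp=${1:-$(date +%Y%m%d-%H%M)}
    echo "$SNAP_DIR/$(basename "$SOURCE")-$stamp"
}

# Take the snapshot. Set DRY_RUN=1 to only print the command.
take_snapshot() {
    target=$(snapshot_name)
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "btrfsctl -s $target $SOURCE"
    else
        btrfsctl -s "$target" "$SOURCE"
    fi
}
```

Add a take_snapshot call at the end of the script and point an hourly crontab entry at it. Try it with DRY_RUN=1 first to see what would be executed.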

Now /vol/snapshots/quake-20091025-1653 (or whatever your date is) and /vol/quake should list the same files, and the "cloning" operation should be instant no matter how much data you have. From now on, modifying the contents of /vol/quake should not change anything in /vol/snapshots/quake-20091025-1653 (but of course it should show up in /home/quake).
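One way to check both claims is to compare the two trees before and after a change. A small sketch, with same_tree being a made-up helper built on plain diff -r:

```shell
#!/bin/sh
# same_tree is a hypothetical helper: it succeeds when two directory
# trees contain the same names and the same file contents.
same_tree() {
    diff -r "$1" "$2" >/dev/null 2>&1
}
```

Right after the snapshot, same_tree /vol/quake /vol/snapshots/quake-20091025-1653 should succeed. After you edit /vol/quake/testfile it should fail, because the snapshot still holds the old contents.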

Also, the snapshot doesn't really take any disk space as long as /vol/quake stays unchanged. Once you change a file in /vol/quake, btrfs really has to keep two copies of the changed data, and this is when additional space is allocated.
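To watch this happen you can note the used space before and after a snapshot, and again after changing a file. A rough sketch, where used_kb is a made-up helper that reads the "Used" column of POSIX df output (the numbers df reports for btrfs may only be approximate):

```shell
#!/bin/sh
# used_kb is a hypothetical helper: print the used space, in KB, of the
# filesystem that contains the given path (third column of df -Pk).
used_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $3 }'
}
```

For example, store before=$(used_kb /vol), take a snapshot, then compare with $(used_kb /vol). The difference should stay close to zero until files in /vol/quake start to change.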

To sum up, here is a table listing the ways to have the same contents in two directories:

method                file copy        symbolic link    hard link     bind-mount               btrfs' clone
how                   cp -a dir1 dir2  ln -s dir1 dir2  ln dir1 dir2  mount -o bind dir1 dir2  btrfs-bcp dir1 dir2
time                  long             instant          instant       instant                  instant
takes disk space      yes              no               no            no                       no (only the difference)
points to same data*  no               yes              yes           yes                      no

* changing a file in dir1 changes it in dir2

Notes:
hard link: not on Linux (Mac only?)
btrfs' clone: btrfs-bcp not distributed in Ubuntu's btrfs-tools
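The "points to the same data" row is easy to try out for the first two columns without root or btrfs. A small sketch in a scratch directory (demo_dir and the file names are made up for this example):

```shell
#!/bin/sh
# Compare two columns of the table in a scratch directory: a file copy
# keeps its own data, a symbolic link points at the same data.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/dir1"
echo original > "$demo_dir/dir1/file"

cp -a "$demo_dir/dir1" "$demo_dir/copy"     # file copy
ln -s "$demo_dir/dir1" "$demo_dir/link"     # symbolic link

echo changed > "$demo_dir/dir1/file"        # modify the original

copy_content=$(cat "$demo_dir/copy/file")   # the copy keeps the old data
link_content=$(cat "$demo_dir/link/file")   # the link sees the new data
echo "copy: $copy_content, link: $link_content"

rm -rf "$demo_dir"
```

A btrfs snapshot behaves like the copy here (the old contents stay), but is created as fast as the link.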

More notes on btrfs and snapshotting:

  • What btrfs-bcp does at the directory level can be done with snapshots at the volume level (as described in this post).
  • Snapshots in btrfs are not removable yet. You can empty them and reclaim the space taken by the differences saved since the snapshot was made, but the snapshot directory itself cannot be removed and still takes a few bytes. Deleting snapshots is to be implemented in the stable version of btrfs.
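With hourly snapshots the list grows quickly, so you may want to pick the old ones to empty. A sketch that only lists candidates, assuming the quake-YYYYmmdd-HHMM naming from this post (such names sort chronologically); old_snapshots is a made-up helper name:

```shell
#!/bin/sh
# old_snapshots is a hypothetical helper: print the names of all
# snapshots in a directory except the newest KEEP ones. It relies on
# names sorting chronologically and containing no newlines.
old_snapshots() {
    dir=$1
    keep=$2
    total=$(ls -1 "$dir" | wc -l)
    drop=$(( total - keep ))
    [ "$drop" -gt 0 ] || return 0
    ls -1 "$dir" | sort | head -n "$drop"
}
```

old_snapshots /vol/snapshots 24 would print everything older than the last 24 snapshots. It deliberately does not delete anything, so review the list before emptying those directories by hand.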



