Home Directory Snapshots

25 Oct 2009 16:12

Your home directory is where the most important data is stored.

But from time to time you just rm -Rf ~ and all your precious data is gone. Backups, you say?

So let's think about how to do backups. cp /home/quake /my/distant/location? On my 24 GB home directory? That would take hours. cp /home/quake /home/backups/quake/date? Better: this takes a few minutes, but wait, I have about 120 GB of disk space, which means I can keep no more than 5 backups.

What to do about it? There are two possibilities: either you reduce the amount of data to back up to only the important data (but figuring out what is important takes time and may go wrong), or you move to a smarter solution, like incremental backups. Or snapshots.
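Incremental backups can be had with plain coreutils: start each new backup as a hard-linked copy of the previous one, so unchanged files share disk space and only changed files are stored again. A minimal sketch (GNU cp assumed; the paths are throwaway temp directories purely for illustration):

```shell
# Each backup starts as a hard-linked clone of the previous one (cp -al),
# so unchanged files cost no extra space; only changed files are re-copied.
SRC=$(mktemp -d); DST=$(mktemp -d)
echo v1 > "$SRC/file"
cp -a  "$SRC" "$DST/backup-1"            # first backup: a full copy
cp -al "$DST/backup-1" "$DST/backup-2"   # next backup: hard links, ~no space
echo v2 > "$SRC/file"                    # the file changes...
rm "$DST/backup-2/file"                  # break the shared link first, then
cp -a "$SRC/file" "$DST/backup-2/file"   # copy only what changed
cat "$DST/backup-1/file"                 # still v1
cat "$DST/backup-2/file"                 # v2
```

Tools like rsync with --link-dest automate exactly this pattern.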

Having btrfs as the filesystem for my home directory, I chose to make a snapshot of it every hour. It takes between 0 and 1 second to complete and uses almost no disk space. Why? Btrfs is a copy-on-write filesystem, which means cloning a filesystem is instant: it only makes the same data available in two locations. Modifying one of the two then makes a real copy of just the modified fragment and changes that copy.
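The same copy-on-write behaviour can be seen at the single-file level with GNU cp's --reflink option (a sketch: --reflink=auto falls back to a normal copy on filesystems without CoW support, so it runs anywhere, but only on btrfs is the clone actually instant and space-free):

```shell
# On btrfs, --reflink clones the file instantly, sharing its blocks;
# writing to the clone then copies only the modified fragments.
f=$(mktemp)
echo original > "$f"
cp --reflink=auto "$f" "$f.clone"
echo changed > "$f.clone"   # modifying the clone leaves the source alone
cat "$f"                    # original
cat "$f.clone"              # changed
```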

OK, here's how to do it.

First create a btrfs filesystem (you'll need a recent kernel and btrfs-utils):

# mkfs.btrfs /dev/sda7

(sda7 is partition for my /home directory)

Then mount it somewhere else than /home, let's use /vol as an example:

# mount /dev/sda7 /vol

Create some volumes on that filesystem: home, quake, snapshots:

# btrfsctl -S home /vol
# btrfsctl -S quake /vol
# btrfsctl -S snapshots /vol

The volumes are accessible as the subdirectories of /vol:

# ls -la /vol
drwx------  1 root  root     36 1970-01-01 01:00 .
drwxr-xr-x 24 root  root   4096 2009-10-25 15:04 ..
drwx------  1 root  root     20 2009-10-25 15:51 home
drwx------  1 root  root  11488 2009-10-25 16:17 quake
drwx------  1 root  root     76 2009-10-25 15:40 snapshots

But you can mount them separately:

# mount /dev/sda7 /home -o subvolume=home
# mkdir /home/quake
# mount /dev/sda7 /home/quake -o subvolume=quake
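To make these mounts survive a reboot, matching /etc/fstab entries might look like this (a sketch, using the same subvolume option as the mount commands above):

/dev/sda7  /home        btrfs  subvolume=home   0  0
/dev/sda7  /home/quake  btrfs  subvolume=quake  0  0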

Fix permissions:

# chown quake:quake /home/quake /vol/quake /vol/snapshots
# chmod 0755 /home/ /home/quake

Now you're ready to take snapshots. First, populate the /home/quake directory:

$ mkdir /home/quake/abcd
$ mkdir /home/quake/dddd
$ mkdir /home/quake/abcd/eeee
$ echo testtest > /home/quake/testfile

Aaaaaand, make snapshots!

$ btrfsctl -s /vol/snapshots/quake-`date +%Y%m%d-%H%M` /vol/quake

I figured out that it's quite important to point at /vol/quake and not /home/quake. At first they seem exactly the same, but /home/quake can have other filesystems mounted under it (like .gvfs for GNOME virtual file systems), while /vol/quake contains pure data. When I took snapshots of /home/quake with filesystems mounted under it, the filesystem froze for me (btrfs is still experimental, they say). So, as noted above, it's better to snapshot the pure data directory.
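To run this every hour from cron, the snapshot command can be wrapped in a tiny script. A minimal sketch (the snap function and script name are my own; the btrfsctl call is the one from this post, and the cp -a line is only a stand-in so the sketch can be tried without a btrfs volume):

```shell
# Crontab entry:  0 * * * * /usr/local/bin/snap-home.sh
snap() {  # snap <volume> <snapshot-dir>
    name=$(basename "$1")-$(date +%Y%m%d-%H%M)
    # On btrfs, as described in this post:
    #   btrfsctl -s "$2/$name" "$1"
    # Stand-in so the sketch runs on any filesystem:
    cp -a "$1" "$2/$name"
    echo "$2/$name"
}

# Demo on throwaway directories (use /vol/quake and /vol/snapshots for real):
vol=$(mktemp -d); snaps=$(mktemp -d)
echo testtest > "$vol/testfile"
made=$(snap "$vol" "$snaps")
cat "$made/testfile"   # testtest
```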

Now, /vol/snapshots/quake-20091025-1653 (or whatever your date is) and /vol/quake should list the same files, and the "cloning" operation should be instant no matter how much data you have. But from now on, modifying the contents of /vol/quake should not change anything in /vol/snapshots/quake-20091025-1653 (though of course it should in /home/quake).

Also, the snapshot doesn't really take any disk space as long as you keep the /vol/quake directory unchanged. Once you change a file in /vol/quake, the filesystem needs to keep two real copies of it, and that is when additional space is allocated.
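Because the snapshot names carry a %Y%m%d-%H%M stamp, they sort chronologically, so a keep-the-newest-N rotation is a short pipe. A sketch on dummy directories (point SNAPDIR at /vol/snapshots for real use; GNU head assumed):

```shell
KEEP=5
SNAPDIR=$(mktemp -d)   # stand-in for /vol/snapshots
for t in 1400 1500 1600 1700 1800 1900 2000; do
    mkdir "$SNAPDIR/quake-20091025-$t"
done
# Timestamped names sort chronologically; drop everything but the newest 5.
ls "$SNAPDIR" | sort | head -n -"$KEEP" | while read -r old; do
    rm -rf "$SNAPDIR/$old"   # for real btrfs snapshots this only clears
done                         # their contents (see the notes below)
ls "$SNAPDIR" | wc -l        # the 5 newest remain
```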

To sum up, here is a comparison of the ways to get the same contents in two directories:

  • file copy (cp -a dir1 dir2): slow; takes disk space; changing a file in dir1 does not change it in dir2
  • symbolic link (ln -s dir1 dir2): instant; takes no disk space; changing a file in dir1 changes it in dir2
  • hard link (ln dir1 dir2): instant; takes no disk space; changing a file in dir1 changes it in dir2; not possible on Linux (Mac only?)
  • bind-mount (mount -o bind dir1 dir2): instant; takes no disk space; changing a file in dir1 changes it in dir2
  • btrfs clone (btrfs-bcp dir1 dir2): instant; takes disk space only for the difference; changing a file in dir1 does not change it in dir2; btrfs-bcp is not distributed in Ubuntu's btrfs-tools

More notes on btrfs and snapshotting:

  • What can be done with btrfs-bcp at the directory level can be done with snapshots at the volume level (as described in this post)
  • Snapshots in btrfs are not removable yet. You can clear their contents and reclaim the space taken by the difference from the starting point, but a few bytes are still used by the snapshot directory itself, which cannot be removed. Deleting snapshots is to be implemented in the stable version of btrfs.

Comments: 1

0x600DF00D

07 Oct 2009 19:44

This note exists for two reasons.

First, I promised I would write something on the blog if I enjoyed today.

Second, I must pay tribute to my girl, Marta.

Marta, you are very good at predicting things. You predicted two events in Californication (two hits out of two tries: 100% accuracy).

Besides, you know my record collection better than I do. Yes, it does include the Rapid Eye Movement album by our beloved Riverside.

I'm glad we're together.

Comments: 2

SSD FS-es Benchmark Results

02 Oct 2009 17:39

As promised, I benchmarked some Linux filesystems on my solid-state disk.

Introduction

I wanted to benchmark the following filesystems:

  • ext2, ext3
  • ext4
  • xfs
  • reiserfs, reiser4
  • nilfs
  • btrfs
  • zfs (via fuse)

NILFS2 didn't even manage to finish the Bonnie++ test, which means this filesystem is not ready to use yet (though it promises very nice features). The other filesystem not benchmarked is reiser4, because the Ubuntu kernel doesn't support it; I would have needed to patch the kernel, and I wasn't happy about that.

The images here show the results of the standard Bonnie++ tests. The command used to run them was:

bonnie++ -d /dir/on/ssd/partition -n 200:200

The -n parameter was tuned so that each test returned some values. With the default setting I got many "++++" values, indicating that the test completed so fast that Bonnie++ was unable to measure the performance.

An explanation of the test names can be found in the Bonnie++ documentation.

Before each test, the filesystem was created on the prepared partition (25 GB) and some data (the same for each test, about 10 GB) was copied to it to simulate a "used" filesystem.

As it turned out, I was not able to disable write caching by running hdparm -W0 /dev/sda. It just stated:

/dev/sda:
 setting drive write-caching to 0 (off)
 write-caching =  1 (on)

Write caching is probably a good thing anyway (and it's enabled by default), so I have no problem with that.

All tests were run twice, but the results were nearly identical, so I removed the second set of results for each filesystem.

For each test, bigger is better, with values given in thousands of operations per second.

The best filesystem

As some suggest, the preferred I/O scheduler for an SSD is "noop", which means there is no I/O scheduling in the kernel: we rely on the scheduling logic in the hardware (which, for various reasons, is believed to be good in SSDs) and profit from having no software queuing overhead.
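Switching the scheduler doesn't need a reboot; it's a per-disk sysfs knob. A minimal sketch (sda is an assumption; the write requires root, so the snippet only reports what it would do otherwise):

```shell
SCHED=/sys/block/sda/queue/scheduler
if [ -w "$SCHED" ]; then
    cat "$SCHED"           # current scheduler is shown in [brackets]
    echo noop > "$SCHED"   # takes effect immediately; not persistent
    cat "$SCHED"
else
    echo "not root (or no sda): would write 'noop' to $SCHED"
fi
```

To make the choice persistent, the elevator=noop kernel boot parameter sets the default scheduler for all disks.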

Let's then compare how well filesystems perform with this scheduler chosen:

noop-tests1.png

noop-tests2.png

noop-tests3.png

noop-tests4.png

This benchmark was performed for all filesystems but NILFS2 and reiser4.

Random seeks

When it comes to random seeks (very important for low-latency systems), ext4 is the best, with reiserfs and xfs achieving almost the same result. Btrfs is next (10% slower), then ext3 and ext2, with zfs at the end being 6 times worse than the best.

Creation and deletion of files

Ext4 is the fastest at creating files (both sequentially and randomly), while btrfs is the fastest at deleting them, with the small exception of ext2 being 7 times faster than everything else at deleting files sequentially. On the other hand, ext2's ability to delete files in random order is pretty bad. Comparing only btrfs and ext4, both are fast; the difference is about 10% one way or the other. Ext3 performs pretty well in this test, reiserfs reaches about half the performance of ext4/btrfs, while xfs and zfs are really slow.

Read/write

Reading and writing data gives pretty equal benchmark results across the filesystems. The worst results come from zfs and ext2 (especially in random reads, which are vital in modern computer use).

As per-character reading/writing is not so important, let's concentrate on the remaining tests. As you can see, btrfs is clearly the best, taking first place in 3 of 5 tests and coming really close to first in the other two. Reiserfs and ext4 also perform really well in this area.

Semi-summary

At this point it's clear that when it comes to performance on an SSD, there are two filesystems to consider the best: btrfs and ext4. Reiserfs is slightly worse, but the most mature of the three. Xfs and ext3 are just OK but don't perform equally well in all tests, while ext2 and zfs (on FUSE) are not an option at all.

Let's then compare the features of the three:

Limits                 reiserfs  ext4                     btrfs
max file name length   4 KB      256 B                    255 B
max file size          8 TB      16 GB to 16 TB           16 EB
                                 (depends on block size)
max volume size        16 TB     1 EB                     16 EB

Features                         reiserfs  ext4  btrfs
checksums (error checking)       no        yes   yes
snapshots (like Time Machine)    no        no    yes
mirroring/striping at FS layer   no        no    yes
compression                      no        no    yes

So it seems btrfs is full of new features compared to (old) reiserfs and (new) ext4, with only a small performance penalty in some areas, while even being faster in others.

The remaining tests presented here cover only the ext4, reiserfs and btrfs filesystems.

What is the best scheduler, then?

Having chosen the best filesystems, let's see which scheduler works best with each.

ext4

Comparison of scheduler performance for the ext4 filesystem:

ext4-tests1.png

ext4-tests2.png

ext4-tests3.png

ext4-tests4.png

As we see, cfq is the best in 5 tests, significantly worse in two (random creation of files and random seeks) and nearly as good as the best in the rest. Deadline and noop perform much the same (with noop being better at creating files randomly and deadline better at creating files sequentially).

reiserfs

Scheduler performance for the reiserfs filesystem:

reiserfs-tests1.png

reiserfs-tests2.png

reiserfs-tests3.png

reiserfs-tests4.png

For reiserfs, again, cfq does its job really well, with only random reads being significantly slower than under the deadline and noop schedulers.

btrfs

Let's now see which scheduler is best for the btrfs filesystem:

btrfs-tests1.png

btrfs-tests2.png

btrfs-tests3.png

btrfs-tests4.png

This time not cfq but the deadline scheduler is the man! In random seeks, where cfq is generally weaker, it is about 30% worse than the best, the deadline scheduler. Deadline is worse than noop in only one test, and only slightly. Cfq is slightly better than deadline in 4 tests, but in the rest, deadline wins.

Ultimate comparison of btrfs, reiserfs and ext4

Now that we know which scheduler runs best with each filesystem, let's compare Bonnie++ results for the perfect tandems:

  • btrfs with deadline scheduler on underlying disk
  • ext4 with cfq
  • reiserfs with cfq

best-tests1.png

best-tests2.png

best-tests3.png

best-tests4.png

Random seeks

Choosing cfq scheduler on any of the tested filesystems degraded random seek performance. This is why btrfs with deadline scheduler is better than its competitors.

Creation and deletion of files

This time, without ext2 setting the bar so high, you can see the differences in creating and deleting files on the three filesystems I tested. Btrfs is much faster (about 2 times) than ext4 at deleting files, while ext4 is a bit faster at creating files randomly and significantly faster (about 50%) at creating files sequentially. Reiserfs is about 2 times slower than the slower of the other two in each test.

Read/write

In read/write tests btrfs, ext4 and reiserfs perform almost equally well, with btrfs being slightly better than the latter two filesystems.

Summary

The ext4 and btrfs filesystems perform really well on SSDs, making their users really happy with the speed they get in everyday computer use.

With ext4 being the default filesystem for Ubuntu 9.10, if you have an SSD you'll notice that it boots really fast:

  • from Grub to GDM in 8 seconds
  • from GDM to GNOME in 5 seconds
  • OpenOffice launches the first time in 2-3 seconds (second time is 0 seconds)

With btrfs being equally fast as (or even faster than) ext4, the features it delivers are amazing:

  • snapshots (you can take a snapshot of the filesystem and later roll back to it, or just explore historical versions of files)
  • compression
  • mirroring/striping: things usually done at the block-device level, now incorporated into the filesystem
  • nice internal structures and algorithms (copy on write, B-trees, …)
  • integrated volume management

On the other hand, btrfs as of version 0.19 still has an experimental disk format, which means it may be incompatible with future kernels. Usually, though, kernel developers write code that either preserves the old format or converts the filesystem to the new one on first mount (after which it's no longer possible to mount such a partition from an older kernel). Also, if I understand correctly, the biggest changes between 0.18 and 1.0 were just applied in 0.19, so this will probably be the final format of btrfs partitions.

It's clear that btrfs is Linux's answer to Sun's ZFS, which due to its incompatible license can't be incorporated into the kernel (which is why we only have a FUSE port available).

Having said all this, it's time for me to migrate my /home (and maybe the system partition too) to btrfs!

You can download raw benchmark data in ODS format.

Comments: 7

PKS Information Line

01 Oct 2009 11:55

Submitted via: http://www.pks.bydgoszcz.pl/kontakt.php?pom=3

Dear Sir or Madam,

I would like to file a complaint regarding the telephone information service of PKS Bydgoszcz.

On 30 September 2009, between 23:40 and 23:55, while at the bus station in Bydgoszcz, we called the information line (number *720-84-00) from a mobile phone. We asked about the earliest connection from the Bydgoszcz station to Toruń. The man told us that the next such connection would not be until after six o'clock the following morning, which was untrue. The nearest departure from that station was in fact at 23:55 (operated by PKS Mława, from stand no. 12 at the Bydgoszcz bus station), and the next at 01:40 (to Łódź via Toruń, from stand no. 11 at the Bydgoszcz bus station).

The phone call cost 5 złoty, a cost about which there was no information on the sticker on the glass at the entrance to the station building. There is also no such information on the PKS Bydgoszcz website at: http://www.pks.bydgoszcz.pl/kontakt.php .

The costs incurred as a result of being misinformed amount to 5 złoty (the cost of the call to the information line), frayed nerves and standing in the cold for over an hour. The costs could have been much higher if, based on the information provided, we had decided to stay overnight in Bydgoszcz or hire other means of transport.

To settle the matter amicably, I propose a refund of the cost of the call, and that consequences be drawn for the person who misled us and thereby significantly worsened our perception of your company.

Regarding the refund of the costs incurred (we will provide an account number), please contact us at the e-mail address given in the contact form.

The text of this complaint will also be available at: http://piotr.gabryjeluk.pl/dev:informacja-pks until a reply is received, which will be a sign that the matter is being handled.

Thank you in advance for a favorable resolution of this request.

Comments: 2

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License