0x600DF00D

07 Oct 2009 19:44

This note exists for two reasons.

First, I promised I'd write something on the blog if I enjoyed today.

Second, I must pay tribute to my girl, Marta.

Marta, you are very good at predicting things. You predicted two events in Californication (two out of two shots: 100% accuracy).

Besides, you know my CD collection better than I do. Yes, it does include the Rapid Eye Movement album by our beloved Riverside.

I'm glad we're together.


SSD FS-es Benchmark Results

02 Oct 2009 17:39

As promised, I benchmarked some Linux filesystems on my solid-state disk.

Introduction

I wanted to benchmark the following filesystems:

  • ext2, ext3
  • ext4
  • xfs
  • reiserfs, reiser4
  • nilfs
  • btrfs
  • zfs (via FUSE)

NILFS2 didn't even manage to finish the Bonnie++ test, which means this filesystem is not yet ready for use (though it promises very nice features). The other filesystem that was not benchmarked is reiser4, because the Ubuntu kernel doesn't support it; I would have had to patch the kernel, and I wasn't keen on that.

The images below show the results of the standard Bonnie++ tests. The command used was:

bonnie++ -d /dir/on/ssd/partition -n 200:200

The -n parameter was tuned so that every test returned an actual value. With the default settings I got many "++++" results, indicating the test ran so fast that Bonnie++ was unable to calculate the performance.

An explanation of the test names can be found in the Bonnie++ documentation.

Before each test, the filesystem was created on the prepared partition (25 GB) and the same data set (about 10 GB) was copied onto it to simulate a "used" filesystem.
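For illustration, one round of that setup might look like the following sketch (the partition /dev/sda3 and the seed-data location /srv/seed are hypothetical names; the mkfs command changes per filesystem):

# create a fresh filesystem on the 25 GB test partition (ext4 shown)
mkfs.ext4 /dev/sda3

# mount it and copy the same ~10 GB data set onto it
mount /dev/sda3 /mnt/test
cp -a /srv/seed/. /mnt/test/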

As it turned out, I was not able to disable write caching by running hdparm -W0 /dev/sda. Instead it reported:

/dev/sda:
 setting drive write-caching to 0 (off)
 write-caching =  1 (on)

Write caching is probably a good thing anyway (and it's enabled by default), so I have no problem with that.

All tests were run twice, but the results were nearly identical, so I dropped the second run's results for each filesystem.

For each test, bigger is better, with values given in thousands of operations per second.

The best filesystem

As some suggest, the preferred I/O scheduler for an SSD is "noop", which means no I/O scheduling is done in the kernel: we rely on the scheduling logic in the hardware (which, for various reasons, is believed to be good in SSDs) and avoid the software overhead of queuing.
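For reference, the scheduler can be checked and switched at runtime like this (assuming the SSD is sda; this requires root and does not survive a reboot):

# the scheduler in square brackets is the active one
cat /sys/block/sda/queue/scheduler

# switch this disk to noop
echo noop > /sys/block/sda/queue/scheduler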

Let's compare how well the filesystems perform with this scheduler selected:

noop-tests1.png

noop-tests2.png

noop-tests3.png

noop-tests4.png

This benchmark was performed for all filesystems but NILFS2 and reiser4.

Random seeks

When it comes to random seeks (very important for low-latency systems), ext4 is the best, with reiserfs and xfs scoring almost the same. Btrfs is next (10% slower), then ext3 and ext2, with zfs at the end being 6 times worse than the best.

Creation and deletion of files

Ext4 is the fastest at creating files (both sequentially and randomly), while btrfs is the fastest at deleting them, with the small exception of ext2 being 7 times faster than everything else at sequential deletion. On the other hand, ext2's ability to delete files in random order is pretty bad. Comparing only btrfs and ext4, both are fast; the difference is about 10% to one side or the other. Ext3 performs pretty well in this test, reiserfs reaches about half the performance of ext4/btrfs, while xfs and zfs are really slow.

Read/write

Reading and writing of data is pretty even across the filesystems in terms of benchmark results. The worst results come from zfs and ext2 (especially in random reads, which are vital in modern computer use).

As per-character reading/writing is not that important, let's concentrate on the remaining tests. As you can see, btrfs is clearly the best, taking first place in 3 of 5 tests and coming really close to first in the other two. Reiserfs and ext4 also perform really well in this area.

Semi-summary

At this point it's clear that when it comes to performance on an SSD, there are two filesystems to consider the best: btrfs and ext4. Reiserfs is slightly worse, but the most mature of the three. Xfs and ext3 are just OK, but don't perform equally well in all tests, while ext2 and zfs (on FUSE) are not an option at all.

Let's then compare the features of the three:

Limits                          reiserfs   ext4                      btrfs
max file name                   4 KB       256 B                     255 B
max file size                   8 TB       16 GB to 16 TB            16 EB
                                           (depends on block size)
max volume size                 16 TB      1 EB                      16 EB

Features                        reiserfs   ext4   btrfs
checksum (error check)          no         yes    yes
snapshots (like Time Machine)   no         no     yes
mirroring/striping on FS layer  no         no     yes
compression                     no         no     yes

So it seems btrfs is full of new features compared to (old) reiserfs and (new) ext4, with only a small performance penalty in some areas, while even being faster in others.

The remaining tests cover only the ext4, reiserfs and btrfs filesystems.

What is the best scheduler, then?

Having chosen the best filesystems, let's see which scheduler works best with each of them.

ext4

A comparison of scheduler performance for the ext4 filesystem:

ext4-tests1.png

ext4-tests2.png

ext4-tests3.png

ext4-tests4.png

As we can see, cfq is the best in 5 tests, significantly worse in two (random file creation and random seeks), and nearly as good as the best in the rest. Deadline and noop perform pretty much the same (noop is better at creating files randomly, deadline at creating them sequentially).

reiserfs

Scheduler performance for the reiserfs filesystem:

reiserfs-tests1.png

reiserfs-tests2.png

reiserfs-tests3.png

reiserfs-tests4.png

For reiserfs, again, cfq does its job really well, with only random reads being significantly slower than under the deadline and noop schedulers.

btrfs

Let's now see which scheduler is best for the btrfs filesystem:

btrfs-tests1.png

btrfs-tests2.png

btrfs-tests3.png

btrfs-tests4.png

This time not cfq but the deadline scheduler is the winner! In random seeks, where cfq is generally weaker, it is about 30% worse than the best, the deadline scheduler. Deadline loses to noop in only one test, and only by a slight margin. Cfq is slightly better in 4 tests, but in the rest deadline wins.

Ultimate comparison of btrfs, reiserfs and ext4

Now that we know which scheduler runs best with each filesystem, let's compare the Bonnie++ results for the perfect pairings:

  • btrfs with deadline scheduler on underlying disk
  • ext4 with cfq
  • reiserfs with cfq

best-tests1.png

best-tests2.png

best-tests3.png

best-tests4.png

Random seeks

Choosing the cfq scheduler degraded random-seek performance on each of the tested filesystems. This is why btrfs with the deadline scheduler beats its competitors here.

Creation and deletion of files

This time, without ext2 setting the bar so high, you can see the differences in creating and deleting files on the three filesystems I tested. Btrfs is much faster (about 2 times) than ext4 at deleting files, while ext4 is a bit faster at creating files randomly and significantly faster (about 50%) at creating them sequentially. Reiserfs is about 2 times slower than the slower of the two in each test.

Read/write

In the read/write tests, btrfs, ext4 and reiserfs perform almost equally well, with btrfs slightly ahead of the other two.

Summary

The ext4 and btrfs filesystems perform really well on SSDs, making users really happy with the speed they get in normal computer use.

With ext4 being the default filesystem in Ubuntu 9.10, on an SSD you'll notice that it boots really fast:

  • from Grub to GDM in 8 seconds
  • from GDM to GNOME in 5 seconds
  • OpenOffice launches in 2-3 seconds the first time (0 seconds the second time)

With btrfs being as fast as ext4 (or even faster), the set of features it delivers is amazing:

  • snapshots (you can make a snapshot of the filesystem and then roll back to it, or just browse historical versions of files; see the sketch after this list)
  • compression
  • mirroring/striping — things usually done at the block-device level, now incorporated into the filesystem
  • nice internal structures and algorithms (copy on write, B-trees, …)
  • integrated volume management
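To give a taste of the first two features, here's a minimal sketch, assuming a btrfs volume on /dev/sda3 mounted at /mnt/data (hypothetical names) and btrfs-progs with the btrfs(8) tool:

# mount with transparent compression enabled
mount -o compress /dev/sda3 /mnt/data

# take a snapshot (cheap, thanks to copy on write); it appears as a subvolume you can browse or roll back to
btrfs subvolume snapshot /mnt/data /mnt/data/snap-20091002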

On the other hand, btrfs as of version 0.19 still has an experimental disk format, which means it may be incompatible with future kernels. Usually, though, kernel developers write code that either preserves the old format or converts the filesystem to the new one on first mount (after which such a partition can no longer be mounted from an older kernel). Also, if I understand correctly, the biggest changes between 0.18 and 1.0 were just applied in 0.19, so this will probably be the final format of btrfs partitions.

It's clear that btrfs is the Linux answer to Sun's ZFS, which due to its incompatible license can't be incorporated into the kernel (which is why only a FUSE port is available).

With all this said, it's time for me to migrate my /home (and maybe the system partition too) to btrfs!
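A rough sketch of such a migration, assuming a spare partition /dev/sda4 for the new /home (a hypothetical name) and that /home is not in use during the copy, e.g. from a rescue system:

mkfs.btrfs /dev/sda4
mount /dev/sda4 /mnt/newhome
cp -a /home/. /mnt/newhome/
# then point the /home entry in /etc/fstab at the new partition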

You can download the raw benchmark data in ODS format.


PKS Information Line

01 Oct 2009 11:55

Submitted via: http://www.pks.bydgoszcz.pl/kontakt.php?pom=3

Dear Sir or Madam,

I would like to file a complaint regarding the PKS Bydgoszcz telephone information service.

On 30 September 2009, between 23:40 and 23:55, while at the bus station in Bydgoszcz, we called the information line (number *720-84-00) from a mobile phone. We asked about the earliest connection from the Bydgoszcz station to Toruń. The man on the line told us that the next such connection would not be until after six o'clock the following morning, which was untrue. The nearest connection from that station departed at 23:55 (operated by PKS Mława, from bay 12 of the Bydgoszcz bus station), and the next at 01:40 (to Łódź via Toruń, from bay 11 of the Bydgoszcz bus station).

The phone call cost 5 złoty, and there was no information about this cost on the sticker on the glass at the entrance to the station building. Such information is also missing from the PKS Bydgoszcz website at http://www.pks.bydgoszcz.pl/kontakt.php .

The costs incurred as a result of being misinformed: 5 złoty (the cost of the call to the information line), frayed nerves, and standing in the cold for over an hour. The costs could have been much higher had we decided, based on the information we were given, to stay overnight in Bydgoszcz or hire other transport.

To settle the matter amicably, I propose a refund of the cost of the call, and that consequences be drawn for the person who misled us and thereby significantly worsened our perception of the Company.

Regarding the refund (we will provide an account number), please contact us at the e-mail address given in the contact form.

The text of this complaint will also be available at http://piotr.gabryjeluk.pl/dev:informacja-pks until a response is received, which will be a sign that the matter is being handled.

Thank you in advance for a favorable resolution of this request.


Cleaning Up

17 Sep 2009 16:42

Some of you following the Wikidot code on GitHub may have noticed that it's nicely split into templates, php, web and conf directories. But that's only the first impression.

Maintaining Wikidot is a bit more complex, because files uploaded to sites are located in web, side by side with static Wikidot PHP and JavaScript files. Also, for historical reasons, there are web/files--common and web/files--local directories, which map to /common--* and /local--* URLs; in fact, files--local is never served directly by the web server (permissions need to be checked first).

Also, some time ago we made the static files versioned, so that we can apply more aggressive HTTP caching to them (reducing average page load time) and still be able to fix bugs in them without waiting a few days for the cache to expire. In the current model, the URL of a static file contains a version hash, for example: http://static.wikidot.com/v--b44e0ce810ee/common--javascript/WIKIDOT.js (notice the b44e0ce810ee). The whole of static.wikidot.com is now hosted on Amazon's CloudFront, which means you get static Wikidot files from a server near your location, not always from the USA.
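How the hash is derived isn't described above; one simple way to get such a version string is a content checksum, as in this hypothetical sketch:

# derive a 12-character version hash from the file content, to be embedded in the static URL
sha1sum web/common--javascript/WIKIDOT.js | cut -c1-12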

This all became quite complicated, so we decided to make things really clear and simple in the source code. The primary rule: keep the source code (updatable from git) separate from files uploaded by users or generated by Wikidot. The second rule: keep files that are generated automatically during installation (not at runtime) separate both from persistent files (like user uploads) and from the source code.

And finally, there needs to be some place for logs and a place for temporary data (we need it to generate some random cool stuff, but once generated, the files are deleted).

So we end up with something like this:

  • WIKIDOT_ROOT
    • data/
      • avatars/ — user avatars
      • sites/ — site files (both generated thumbnails and uploaded files)
    • generated/
      • static/ — generated static files. This dir can be served directly by a fast non-PHP web server for static.wikidot.com in case we don't want CloudFront anymore
    • tmp/ — temporary files including Smarty compiled versions of templates. Content of this dir can be safely removed
    • logs/ — Wikidot logs
    • everything else — comes from git and is never changed by the application

The application needs write access to data/, tmp/ and logs/. The generated/ dir needs to be writable by whoever installs or upgrades the application.
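For example (assuming the web server runs as www-data, the Ubuntu default, and that we're inside WIKIDOT_ROOT):

# runtime directories writable by the web server
chown -R www-data: data tmp logs
# generated/ stays owned by the account that performs installs and upgrades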

Wikidot's persistent data is now ONLY the database and the data/ directory, so it's easy to back up and restore the application (if you have enough time to make a full backup).
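So a full backup can be as simple as this sketch (assuming a PostgreSQL database named wikidot; the names are illustrative):

pg_dump wikidot > wikidot-db.sql
tar czf wikidot-data.tar.gz data/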

There is still one exception to this nice scheme: the php/db/base directory, which is autogenerated during installation from XML database definition files. But the cleanup is not over; I'm still working on this.

The nice thing about this work is that it doesn't require a lot of code changes, because directory paths are usually stored in one (at most two) places in the application, so this kind of total reorganization of the directory structure doesn't break things. As such, it is very much worth doing. In the end we get a clean internal file structure, and it's clear which files you can safely remove, which you can restore from git (and thus experiment on a little: in case of a crash, just re-download the application), which constitute the "state" of Wikidot, and where to look for logs.

All this is also very important because we aim to open the current Wikidot.com source, and as such we want it to be nice code.

