YAML and PHP

30 Jul 2009 19:03

There are 3 main YAML implementations for PHP:

  • Syck (native C library bindings to PHP)
  • Symfony YAML (pure PHP)
  • Spyc (pure PHP again)

Here is how they compare.

What the hell is YAML?

Have you heard about XML or JSON? Like them, YAML is a way to store (read and write) structured data: arrays (a.k.a. lists), dictionaries (a.k.a. hash maps) and atomic values like strings and numbers. The structures can be nested to describe near-real-life objects, for example:

---
Piotr Gabryjeluk:
  company: Wikidot Inc.
  university: Nicolaus Copernicus University, Toruń, Poland
  lives_in: Toruń, Poland
  hobbies:
  - basketball
  - playing the guitar

Which translates to PHP:

<?php
$data = array('Piotr Gabryjeluk' => array(
  'company' => 'Wikidot Inc.',
  'university' => 'Nicolaus Copernicus University, Toruń, Poland',
  'lives_in' => 'Toruń, Poland',
  'hobbies' => array('basketball', 'playing the guitar')
));

So you see YAML is quite nice even when you need to write it yourself.

YAML has a specification (see http://yaml.org), so once we have a standard YAML parser and a standard YAML dumper we can send arrays from one machine to another, and the result should be the same array that was sent.

PHP

So let's see what the choices are if you want to play with YAML in PHP.

Syck

This is the fastest and most complete YAML dumper and loader library available. It is a binding to a C library and is available in PECL. It is also available as a regular package in the Ubuntu repository, so you can install it with a simple:

aptitude install php5-syck

In some shared hosting environments installing an extension could be a problem, in which case you need a pure PHP solution.

Spyc

This was the first PHP YAML implementation I saw. It is both a dumper and a loader, and it seemed to work fine, but then I found some bugs that stopped me from using it as the one and only YAML loader and dumper for Wikidot.

It has one really nice property, which helps when you want your users to enter YAML to define things (like we do for forms): it is quite forgiving about syntax, ignores the things that don't fit and still parses the rest.

Unfortunately, as I said, the Spyc dumper has bugs, so when you first dump an array and then load it with Spyc you get something different (for example, multiple newlines are treated as one). Not good. Also, as a loader it does not understand the full YAML specification (which is quite huge, BTW).

Symfony YAML

This one is pure PHP as well, so you don't need special rights to use it on a PHP-enabled machine.

Its loader does not understand the full YAML specification, so for example you can't load documents dumped by Syck. The dumper is good.

Summary

feature                                 | Syck          | Spyc                                  | Symfony YAML
type of library                         | PHP extension | pure PHP library                      | pure PHP library
speed                                   | fast          | slow                                  | slow
loader: YAML support                    | full          | bad                                   | not bad
loader: if YAML is corrupted            | exception     | tries to do its best to load the rest | exception
dumper: YAML human-readable             | more-or-less  | yes                                   | more-or-less if set properly
dumper: YAML conforms to spec           | yes           | no                                    | yes
loads Syck's dumper output correctly    | yes           | no                                    | no
loads Symfony's dumper output correctly | yes           | no                                    | yes

Verdict: loader

Syck is the winner in loading YAML. If you cannot use Syck, use Symfony YAML. If you need to parse user input (which should be human-readable and human-writable, and only similar to YAML), use Spyc.

Actually, this is a nice combination for loading:

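<?php
// load_yaml() is just an illustrative wrapper name, so that "return" has somewhere to return from
function load_yaml($string) {
    try {
        // if syck is available, use it
        if (extension_loaded('syck')) {
            return syck_load($string);
        }
        // if not, use the symfony YAML parser
        $yaml = new sfYamlParser();
        return $yaml->parse($string);
    } catch (Exception $e) {
        // if the YAML document is not correct, fall back to the forgiving Spyc
        return Spyc::YAMLLoadString($string);
    }
}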

This way the fastest library is used if possible, then the best pure-PHP one, and if loading fails because the document was badly written (by a human being, for example), you fall back to Spyc.

Verdict: dumper

In my opinion the Symfony YAML dumper is the best of the three in terms of usability, portability and interoperability, because its output can be read by both itself and Spyc.

However, if you dump YAML often, use the (hell of a lot faster) Syck for both loading and dumping. The generated YAML won't be readable by Symfony YAML or Spyc, but that is because they don't follow the specification (so not Syck's problem, in fact).

Also note that the output of any valid JSON dumper is readable by standard YAML 1.2 loaders, because JSON is a subset of YAML 1.2. So if you use it for data exchange (and not for talking to humans), any fast JSON dumper can be used.
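To make the JSON point concrete, here is a minimal sketch of dumping the example record from the top of this post as JSON. It is written in Python with only the standard library (in PHP the built-in json_encode plays the same role). The function name dump_for_exchange is made up for this illustration, and loading the result back with an actual YAML loader is not shown, because that would need a third-party library.

```python
import json

# The example record from the beginning of the post.
data = {
    "Piotr Gabryjeluk": {
        "company": "Wikidot Inc.",
        "university": "Nicolaus Copernicus University, Toruń, Poland",
        "lives_in": "Toruń, Poland",
        "hobbies": ["basketball", "playing the guitar"],
    }
}


def dump_for_exchange(value):
    """Serialize value to JSON text (hypothetical helper, not part of any library).

    Because JSON is a subset of YAML 1.2, a YAML 1.2 loader should accept
    this text as well.
    """
    # ensure_ascii=False keeps "Toruń" as-is instead of escaping the non-ASCII letter.
    return json.dumps(value, ensure_ascii=False)


document = dump_for_exchange(data)
print(document)
```

Keep in mind that the subset claim is about YAML 1.2. Loaders written against older versions of the specification may still choke on some JSON documents, so test with the loader you actually use.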

July News

28 Jul 2009 15:41

Some of you may be used to me posting more often than I have lately.
Some of you may wonder why I stopped blogging.

Brussels

The last month was full of adventures. It started on the 1st of July with me going to Brussels to meet our friend and talk about a Wikipedia-like site about art. We're going to help this man build the most complete site about art using Wikidot software!

BTW, this was the first flight of my life. Quite a strange feeling, but generally fine.

Forms

I was working on some nice technical and UI improvements to Wikidot that are crucial for the art site (but really, really nice for Wikidot as well), like forms for entering, editing and viewing structured data on wiki pages.

Search issues

That week was also spent on some massive Wikidot.com search engine tweaks. A stupid one-line bug, which meant the indexing script was not exporting a proper LC_ALL environment variable, caused many sites that used Asian or East European languages not to be indexed (most notably the great ИСТОРИЈСКА БИБЛИОТЕКА). At first we thought that we could re-index the broken sites, but our re-indexing mechanism was way too slow (it would have taken weeks for all the broken sites).
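For the curious, this is the kind of one-liner that was missing, sketched in Python. It only shows the general idea of exporting LC_ALL explicitly for a child process. The helper name run_with_locale and the locale value en_US.UTF-8 are assumptions for this example, not what the real indexing script uses.

```python
import os
import subprocess
import sys


def run_with_locale(command, locale_name="en_US.UTF-8"):
    """Run command with LC_ALL exported explicitly.

    Both the helper and the default locale name are assumptions made for
    this illustration.
    """
    # Copy the current environment and add or override LC_ALL for the child.
    env = dict(os.environ, LC_ALL=locale_name)
    return subprocess.run(command, env=env, check=True,
                          capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in for the indexing script: a child process that reports the
    # LC_ALL value it was started with.
    child = [sys.executable, "-c", "import os; print(os.environ.get('LC_ALL'))"]
    print(run_with_locale(child).stdout.strip())
```

Without the explicit export, a script started from cron typically inherits a minimal environment, which is how non-ASCII text can end up being handled in the wrong locale.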

Pieter then challenged me. He said he could index the whole of Wikidot in 6 hours. I thought it was not even possible, but then I started working on it and managed to index the whole of Wikidot in less than 2 hours, without indexing tags at first. Then, with tags, it took 2 hours and 10 minutes or so. That was damn fast!

Inspired by this, and by an accidental disk-full error on the /var partition of our webserver (this is why we keep user-uploaded files and other important things on separate disks), I also rewrote the incremental indexer to work in a similar way to the whole-Wikidot re-indexer.

search-api reindex

If you care about some technical details:

  • all search operations are issued through search-api, a separate program that can:
    • re-index whole Wikidot
    • queue indexing page/thread
    • queue deleting page/thread
    • queue re-indexing site
    • flush queue
  • search-api is written in Python
  • search-api uses PyLucene - the native Java Lucene library bound to CPython objects with PyJCC. Compiled to native code (like C programs) with the GNU Compiler for Java, this binding performs better than Lucene running on Sun's Java.
  • before being rewritten in pure Python, search-api was written in Bash and was a wrapper around:
    • java -jar searchApiHelper.jar search "phrase-to-search"
    • php search-api-helper.php flush
  • search-api also takes care of file locking to ensure that:
    • only one process tries to modify the index
    • items are added to queue one-after-another
    • when doing some big index modification (read: a full re-index) the queue is not flushed (so that after the re-index all changes are applied to the new index)
    • when flushing the queue takes longer and cron tries to run more flushing processes, they simply end (so only one process flushes the queue at a time)
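The last point (extra flushing processes simply end) can be sketched with a non-blocking file lock. This is a minimal illustration of the technique in Python using the standard fcntl module, not the actual search-api code. The function name flush_queue_once and the lock file name are made up, and it works only on Unix-like systems.

```python
import fcntl
import os
import tempfile


def flush_queue_once(lock_path, flush):
    """Run flush() only if no other process holds the lock.

    Returns True if flush() ran and False if another flush was in progress.
    Hypothetical helper for illustration.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # Exclusive, non-blocking lock: fail at once instead of waiting.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            # Someone else is already flushing, so this process simply ends.
            return False
        try:
            flush()
            return True
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


if __name__ == "__main__":
    # The lock file name is an assumption made for this example.
    lock = os.path.join(tempfile.gettempdir(), "search-api-flush.lock")
    ran = flush_queue_once(lock, lambda: print("flushing queue"))
    print("flushed" if ran else "another flush is running, exiting")
```

A full re-index could take the same lock for its whole run, which would keep the queue from being flushed until the new index is ready.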

Union of Rock Festival

Right after the week spent in a nice hotel in Brussels I went to Węgorzewo in Mazury (Poland's biggest lake district) to have fun at a rock music festival. Unfortunately, the music was not very impressive, so I mainly enjoyed the atmosphere of the camping area.

The weather was not great. It was wet everywhere, the ground was covered in 20 centimeters of mud and it was hard to walk around without getting dirty. But during my first day there I learned how to do that.

Improved workflow at Wikidot

Some of you noticed that we recently started to work more efficiently, but this is not quite true. In fact we work as efficiently as before, but we are better organized and have better priorities for tasks. We also keep track of what we do, so we can later tell what we've done. For us this means a little more work "documenting" our work (so maybe we work even less efficiently than before?), but for the outside world we make more noise (in a positive sense) around it. So basically, people know what we do, what we are going to do and when they can expect changes, and most importantly they understand why some feature request is being postponed. This is (and was) because we have more important things to do, but before they couldn't tell.

Squark turned into a professional project manager who manages our time. pieterh decided to talk to the Community and listen to their complaints (he reads or at least skims every post on the Community forums). He tells Łukasz what needs to be done, and Łukasz knows when we will have time to do it. This way communication inside Wikidot improved. Also, we (michal-frackowiak and I) no longer look at the Community forums (which some of you may regret), but this allows us to concentrate on our work.

The work continues

As I mentioned before, we want to introduce a great feature to Wikidot: forms. For now the implementation concentrates on the open source version of the Wikidot software (once it's ready, working and tested, we'll copy the feature to the Wikidot.com service).

aptitude install wikidot

As forms are a huge change, I started to prepare good ground for them: I closed the most important bugs in Wikidot open source, and I'm about to start making Ubuntu packages for it to allow even simpler installation on Debian-based systems. Right now the installation involves only 6 dead-easy steps and in fact can be done by copying and pasting a few commands.

Yesterday's party

Yesterday I went to meet some friends from my old school days in the heart of the city. It was meant to be a meeting for "a beer or two" but evolved into beer and dancing till morning. That was the first time I took a morning bus (and not even the first one) home right after partying.

It was great fun, and I met great folks.

Summary

I hope that with this long blog post (divided into friendly sections ;) ) I have made up for the long period of not posting anything here.

Black Clouds & Silver Linings

04 Jul 2009 14:38

I got this disc unexpectedly. I wanted to prepare myself a bit for that moment, but it was a surprise and a kind gesture from the person closest to me. "So that you have something to listen to on the trip."

I ripped the songs to my computer and to my Ogg player and started listening in bed before going to sleep, the day before the trip.

Good heavy riffs, melodic passages and suddenly, yuck, what is this? Some awful vocals, without emotion or expression, and a moment later the sound of a classical guitar. Rock bottom. A disaster.

I come back to the disc during the trip. I can't stand it, I keep skipping tracks. In the end I listen to a different album.

At work I listen to this music in the background. I pick out interesting fragments. 5 minutes, 10 minutes, and then rubbish again.

As for the disc, its level is even: every song is equally hopeless. As for the songs: practically every one is uneven. It starts off nicely, meaty, manly, strong, metal, expressive, and then out of nowhere some pop crap shows up.

I feel sorry for those good fragments scattered across the six songs, because Dream Theater is still a good band, and those, let's say, 20 minutes could add up to one good song worth buying the album for (just as the track A Change of Seasons makes the album A Change of Seasons worth buying). But because of this fatal mixing of "good shit" with "just shit", this record is not worth buying.

Dream Theater: a big minus for you. I hope that at the concert you won't try to promote this record, because that makes no sense at all.

PS: a big minus for the vocal section too. On the whole album I did not find a single minute of interesting, lively and expressive vocals, of which every other record by the band is full.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License