Piotr Gabryjeluk blog

YAML and PHP

30 July 2009

There are 3 main YAML implementations for PHP:

  • Syck (native C library bindings to PHP)
  • Symfony YAML (pure PHP)
  • Spyc (pure PHP again)

Here is a comparison of the three.

What the hell is YAML

Have you heard of XML or JSON? Like them, YAML is a way to store (read and write) structured data: arrays (a.k.a. lists), dictionaries (a.k.a. hash maps) and atomic values like strings and numbers. The structures can be nested to describe near-real-life objects, for example:

---
Piotr Gabryjeluk:
  company: Wikidot Inc.
  university: Nicolaus Copernicus University, Toruń, Poland
  lives_in: Toruń, Poland
  hobbies:
  - basketball
  - playing the guitar

Which translates to PHP:

<?php
$data = array('Piotr Gabryjeluk' => array(
  'company' => 'Wikidot Inc.',
  'university' => 'Nicolaus Copernicus University, Toruń, Poland',
  'lives_in' => 'Toruń, Poland',
  'hobbies' => array('basketball', 'playing the guitar')
));

So you see YAML is quite nice even when you need to write it yourself.

YAML has a specification (see http://yaml.org), so once we have a standard YAML parser and a standard YAML dumper we can send arrays from one machine to another, and the result should be the same array as was sent.

PHP

So let's see what the choices are if you want to play with YAML in PHP.

Syck

This is the fastest and most complete YAML dumper and loader library available. It is a PHP binding to a C library and is available from PECL. It is also available as a regular package in the Ubuntu repository, so you can install it simply with:

aptitude install php5-syck

In some shared hosting environments installing an extension could be a problem, so there you need a pure PHP solution.

Spyc

This was the first PHP YAML implementation I saw. It is both a dumper and a loader and it seemed to work fine, but then I found some bugs that stopped me from using it as the one and only YAML loader and dumper for Wikidot.

It has one really nice feature, useful when you want your users to enter YAML to define things (like we do for forms): it is quite forgiving about syntax. It ignores things that don't fit and still parses the rest.

Unfortunately, as I stated before, the Spyc dumper is buggy, so when you first dump an array and then load it with Spyc you get something different (for example, multiple newlines are treated as one). Not good. Also, as a loader it does not understand the full YAML specification (which is quite huge, BTW).
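A quick way to catch this kind of problem is a round-trip check: dump a value, load the result, and compare it with the original. The sketch below is only an illustration. `survives_round_trip` is a made-up helper, and the closures are stand-ins so that it runs without any YAML library: a JSON pair plays the lossless dumper/loader, and a deliberately lossy loader squeezes repeated newlines, mimicking the behaviour described above. With the real libraries installed you would pass their dump and load functions instead.

```php
<?php
// Hypothetical helper: true if $data comes back unchanged after dump + load.
function survives_round_trip($dump, $load, $data) {
    return $load($dump($data)) === $data;
}

$data = array('note' => "line one\n\n\nline two");

// Stand-in lossless pair (JSON), used only so the sketch runs anywhere.
$json_dump = function ($value) { return json_encode($value); };
$json_load = function ($text) { return json_decode($text, true); };

// Stand-in lossy loader: collapses runs of newlines in every string.
$lossy_load = function ($text) {
    $value = json_decode($text, true);
    array_walk_recursive($value, function (&$item) {
        if (is_string($item)) {
            $item = preg_replace('/\n+/', "\n", $item);
        }
    });
    return $value;
};

var_dump(survives_round_trip($json_dump, $json_load, $data));  // bool(true)
var_dump(survives_round_trip($json_dump, $lossy_load, $data)); // bool(false)
```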

Symfony YAML

This one is pure PHP as well, so you don't need special rights to use it on a PHP-enabled machine.

Its loader does not understand the full YAML specification, so for example you can't load documents dumped by Syck. The dumper is good.

Summary

                                       | Syck          | Spyc                                  | Symfony YAML
type of library                        | PHP extension | pure PHP library                      | pure PHP library
speed                                  | fast          | slow                                  | slow
loader: YAML support                   | full          | bad                                   | not bad
loader: if YAML is corrupted           | exception     | tries to do its best to load the rest | exception
dumper: YAML human-readable            | more-or-less  | yes                                   | more-or-less if set properly
dumper: YAML conforms to spec          | yes           | no                                    | yes
loads Syck's dumper output correctly   | yes           | no                                    | no
loads Symfony's dumper output correctly | yes          | no                                    | yes

Verdict: loader

Syck is the winner in loading YAML. If you cannot use Syck, use Symfony YAML. If you need to parse user input (which should be human-readable and human-writable, and only similar to YAML), use Spyc.

Actually, this is a nice combination for loading:

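<?php
// Wrapped in a function so the return statements are valid;
// the name load_yaml is only illustrative.
function load_yaml($string) {
    try {
        // if syck is available use it
        if (extension_loaded('syck')) {
            return syck_load($string);
        }
        // if not, use the symfony YAML parser
        $yaml = new sfYamlParser();
        return $yaml->parse($string);
    } catch (Exception $e) {
        // if the YAML document is not correct, fall back to the forgiving Spyc
        return Spyc::YAMLLoadString($string);
    }
}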

This way, the fastest library is used if possible, then the best pure-PHP one, and if loading fails because the document was badly written (by a human being, for example), you fall back to Spyc.

Verdict: dumper

In my opinion the Symfony YAML dumper is the best of the three in terms of usability, portability and interoperability, because its output can be read by both itself and Spyc.

However, if you dump YAML often, use the (hell of a lot faster) Syck for both loading and dumping. The generated YAML won't be readable by Symfony YAML or Spyc, but this is because they don't follow the specification (so not Syck's problem, in fact).

Also note that any valid JSON dumper output is readable by standard YAML 1.2 loaders, because JSON is a subset of YAML 1.2. So if you use it for data exchange (and not for talking to humans), any fast JSON dumper can be used.
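To make the last point concrete, here is what dumping with PHP's built-in json_encode could look like for (a shortened version of) the array from the beginning of this post. The output is JSON, and per the paragraph above it should also be loadable by a YAML 1.2 loader. That part is not exercised here: the sketch only dumps the array and checks that it round-trips through json_decode.

```php
<?php
$data = array('Piotr Gabryjeluk' => array(
    'company' => 'Wikidot Inc.',
    'lives_in' => 'Toruń, Poland',
    'hobbies' => array('basketball', 'playing the guitar'),
));

// json_encode is built into PHP, so no YAML library is needed to dump.
$dumped = json_encode($data);
echo $dumped, "\n";

// Loading it back as JSON gives the same array.
var_dump(json_decode($dumped, true) === $data);
```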


PHP as FastCGI backend and Lighttpd

15 June 2009

Wikidot + Lighttpd + PHP5

At Wikidot we use PHP5 as a FastCGI backend to Lighttpd, a light and fast web server. It works like this:

  • there are a few hundred php5-cgi processes (the name says cgi, but they also support FastCGI mode) running and waiting to be used
  • a lighttpd process (only one needed!) manages the network connections to all the clients and, once a request is ready, serves a static file or forwards the request to one of the PHP backend processes.

We used to use the internal Lighttpd FastCGI process manager, meaning that the lighttpd process itself used to start the PHPs.

Problems

We encountered the known problem of 500 (server-side) errors appearing after some random time, especially under high traffic. The typical message appearing in Lighttpd's error.log was:

<some date>: (mod_fastcgi.c.2494) unexpected end-of-file (perhaps the fastcgi process died): pid: ...

There are plenty of reports on this in both Lighttpd's and PHP's forums, bug trackers and even some blogs.

Workarounds

We managed to write some hacky scripts that detected the situation and restarted the backends when needed. The reaction was so quick, that almost no-one noticed the error, but damn, this is not how WE solve problems.

A blind try

We decided to give spawn-fcgi a shot. What is it? It is a program that spawns FastCGI backends (independently of the Lighttpd server). Why try it? I've read somewhere that it works more reliably than the internal Lighttpd spawner. What's interesting is that this program comes from the lighttpd package, so we stay in the family anyway. It's mainly intended for running the FastCGI backends as a different user than the webserver user, or on different machine(s) than the webserver machine. Naturally, this can be used for some smart load balancing.

The only problem we encountered with this solution was an internal limit on the number of processes a single spawner can start, which was 256 (hardcoded, fixed in later versions). But at the same time we decided to build a few FastCGI bridges (each spawning ~200 PHPs) anyway, so that was no longer a problem for us.

What was quite surprising (though honestly, I deeply believed it would happen): our problems with 500 server errors and PHP disappeared. This configuration has been working for about 2 weeks now with absolutely no hacky scripts involved and no restarting needed. Cool.
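The arithmetic behind splitting the backends across several bridges is simple. Below is a minimal sketch, assuming one TCP port per bridge. The function name, the total of 600 backends, the port numbers and the php5-cgi path are all made up for illustration, and the spawn-fcgi flags in the printed command (-a, -p, -C, -f) should be checked against the help output of your version.

```php
<?php
// Hypothetical helper: split $total PHP backends into bridges of at most
// $perBridge children each, one TCP port per bridge.
function plan_bridges($total, $perBridge, $basePort) {
    $plan = array();
    for ($i = 0; $total > 0; $i++) {
        $children = min($perBridge, $total);
        $plan[] = array('port' => $basePort + $i, 'children' => $children);
        $total -= $children;
    }
    return $plan;
}

// Example numbers only: 600 backends, 200 per bridge, ports from 9000 up.
foreach (plan_bridges(600, 200, 9000) as $bridge) {
    // Prints one spawn-fcgi command line per bridge (flags are an assumption).
    printf("spawn-fcgi -a 127.0.0.1 -p %d -C %d -f /usr/bin/php5-cgi\n",
        $bridge['port'], $bridge['children']);
}
```

Each printed port would then be listed as a separate backend in Lighttpd's fastcgi.server setting.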

Why I wrote this

I wrote this short note just for the record and to let other people know that using spawn-fcgi instead of Lighttpd's internal FastCGI spawner might solve their problems with PHP (FastCGI) and 500 internal server errors.

Hope this helps someone.


About Zend Framework

11 March 2009

Some time ago I had warm words for Zend Framework. It turns out things are not as rosy as they seem. The key phrase here is:

64-bit

On a 64-bit system there are many problems with Zend Framework. I'll list a few:

Zend_Search_Lucene

Even code as simple as this, run on a 64-bit system, causes infinite loops and exceeds the memory limit:

<?php
 
require_once("Zend/Search/Lucene.php");
$index = Zend_Search_Lucene::open('/path/to/index');

Of course, the first thing we do to use an index is open it, so this module (Zend_Search_Lucene) becomes completely unusable.

Interestingly, the problem is reported in the ZF bug tracker. I worked out what needs to be done to fix it and posted a (more or less) ready diff to the bug tracker, but nobody cared about either the bug or the fix.

Zend_Db

One of the more important parts of Zend Framework is the database access layer. Unfortunately, on a 64-bit system the framework has some problems with limiting results using the limit method. Asking for records starting from record 0 generated a query for me that ended with:

LIMIT 98382101, 20;

It should be:

LIMIT 0, 20;

Silly thing. Maybe they fixed it in a newer version, maybe not. I didn't dig into it.

Zend_XmlRpc_Server

Recently, while working on the Wikidot API, I ran into a nasty, hidden bug in the XML-RPC server component of Zend Framework.

Everything seems to work, but calling system.methodHelp or system.methodSignature from an XML-RPC client ends with an error saying the requested method does not match the signatures of known methods. On 32 bits everything works.

Summary

Zend Framework may seem cool (it did to me), but be very careful when moving code from 32 bits (e.g. on a laptop) to 64 bits (e.g. on a server). There are A LOT of bugs, including really annoying ones related to Zend_Db.
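Before moving code between machines, it may help to check which kind of PHP build you are on. A minimal sketch using only the built-in constants PHP_INT_SIZE and PHP_INT_MAX; note that it reports the integer width of the PHP build and does not detect any of the bugs described above:

```php
<?php
// PHP_INT_SIZE is the size of an integer in bytes: 4 on builds with
// 32-bit integers, 8 on builds with 64-bit integers.
$bits = PHP_INT_SIZE * 8;
echo "This PHP build uses {$bits}-bit integers (max " . PHP_INT_MAX . ").\n";

// A value just above the 32-bit range stays an int only with 64-bit
// integers; with 32-bit integers it silently becomes a float.
var_dump(is_int(2147483647 + 1));
```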


Working On wdLite

3 February 2009

A few days ago I started working on wdLite — a lite version of Wikidot.

The primary aim of this project is to make installation dead simple and server requirements really small.

Server requirements

wdLite should be installable on:

  • Apache with PHP5 (no safe mode or other limitations) and PostgreSQL on Linux boxes

PHP and PostgreSQL should already be configured to work with each other. You should have a PostgreSQL database and "user/password/database"-based access to it. Wikidot will create the tables, but won't create the database. You should either create it as root or have root create it for you beforehand (this process might be automatic on webhosting services with PHP/PostgreSQL).

This configuration should cover a whole bunch of virtual hosting providers.

Installation

The installation process should be no harder than this:

  • Get a zip or check out the newest version from the repository
  • Upload the directory to the server
  • Adjust directory permissions
  • Go to install.php script with your browser
  • Supply mail and PostgreSQL credentials
  • Choose your wiki name and create users
  • Enjoy your new wiki

What are the differences between "full" Wikidot installation and wdLite

Limitations

  • only one wiki
  • no page revision diffs
  • more limited page size
  • lower security (especially for IE users)
  • works only with Apache
  • some features disabled or non-working
  • memcached disabled
  • karma disabled
  • notifications disabled

Better than full version, because

  • works with Apache
  • works on any HTTP port (not only 80)
  • works within any directory (also in user directory accessible like http://myserver.com/~quake/something/really/deep/wikidot)
  • easier installation with a web interface
  • no root-access needed
  • works well with GMail to send mails from the service
  • easily installable on Ubuntu
  • the easiest method to start developing with Wikidot
  • no need to manually compile additional software

Current work progress

I'm about to pre-release this software, to let you test it.

I have to:

  • create a list of things that need to work before a final 1.0 release.
  • redirect / to /?/
  • create install.php

Things that work already:

  • logging in/out
  • displaying pages
  • editing pages
  • saving pages
  • some basic modules
  • uploading and displaying files
  • navigation (links are rewritten from absolute to relative)

How does it work

wdLite is based on Wikidot OpenSource. It contains wikidot, index.php and a bunch of helper scripts. The index.php file is a hacky PHP script that

  • converts URLs like http://some.server.com/some/url/?/front:page to http://www.some.server.com/front:page
  • tricks the Wikidot software into thinking that the Wikidot domain is some.server.com and the main wiki is www.some.server.com
  • sets a bunch of system variables Wikidot relies on, like $_SERVER['REQUEST_URI'], $_SERVER['QUERY_STRING']
  • runs the proper one of the Wikidot scripts
  • … or serves (more like redirects to) a static file
  • catches the script output
  • runs some transformations on the caught output (like converting the links from http://www.some.server.com/some:other-page to ?/some:other-page)
  • sends the data back to browser
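The two URL transformations from the list can be sketched in a few lines. This is not the wdLite source, just an illustration: both function names are hypothetical, and the host name is the example one used above.

```php
<?php
// Hypothetical: turn the part after "?" (e.g. "/front:page") into the
// URL the unmodified Wikidot code expects to see.
function to_wikidot_url($queryString, $wikiHost) {
    $page = ltrim($queryString, '/');
    return 'http://' . $wikiHost . '/' . $page;
}

// Hypothetical: rewrite absolute links in the caught output back into
// relative "?/page" links, so navigation stays inside wdLite.
function rewrite_links($html, $wikiHost) {
    return str_replace('http://' . $wikiHost . '/', '?/', $html);
}

echo to_wikidot_url('/front:page', 'www.some.server.com'), "\n";
// http://www.some.server.com/front:page

$html = '<a href="http://www.some.server.com/some:other-page">other</a>';
echo rewrite_links($html, 'www.some.server.com'), "\n";
// <a href="?/some:other-page">other</a>
```

Catching the output in the first place could be done with PHP's output buffering functions (ob_start and ob_get_clean); whether wdLite does it that way is an assumption.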

WARNING, ACHTUNG
The script is more a dirty hack than a version of Wikidot, but this is intentional. We don't want to maintain too many versions of Wikidot. Having this dirty script "only using, without modifying" the Wikidot software makes it quite independent of changes in Wikidot. This means the same wdLite script will work with a newer version of Wikidot (= less maintenance work).
WARNING, ACHTUNG



Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License