tags: brussels dev party python search union-of-rock wikidot
28 Jul 2009 15:41
Some of you may be used to me posting more often than I have lately.
Some of you may wonder why I stopped blogging.
Last month was full of adventures. It started on the 1st of July with me going to Brussels to meet our friend and talk about a Wikipedia-like site about art. We're going to help this man build the most complete site about art using Wikidot software!
BTW, this was the first flight of my life. Quite a strange feeling, but generally fine.
I was working on some nice technical and UI improvements to Wikidot that are crucial for the art site (but really, really nice for Wikidot as well, like forms for entering, editing and viewing structured data on wiki pages).
That week was also spent on some massive Wikidot.com search engine tweaks. A stupid one-line bug, the indexing script not exporting the proper LC_ALL environment variable, caused many sites using Asian or East European languages not to be indexed (most notably the great ИСТОРИЈСКА БИБЛИОТЕКА). At first we thought we could simply re-index the broken sites, but our re-indexing mechanism was way too slow (it would have taken weeks for all the broken sites).
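For illustration only, here is a minimal Python sketch of that kind of fix: making sure LC_ALL is explicitly set in the environment of a child indexing process. The function name `run_with_locale` and the default locale value are assumptions made for this example; this is not the actual Wikidot indexing script.

```python
import os
import subprocess
import sys


def run_with_locale(command, locale_name="en_US.UTF-8"):
    """Run `command` with LC_ALL explicitly set in the child's environment.

    `locale_name` is an assumed value; use whatever UTF-8 locale
    is actually installed on the machine.
    """
    # Copy the current environment and override only LC_ALL.
    env = dict(os.environ, LC_ALL=locale_name)
    return subprocess.run(
        command, env=env, check=True, capture_output=True, text=True
    )


if __name__ == "__main__":
    # The child process simply reports the LC_ALL value it received.
    child = [sys.executable, "-c", "import os; print(os.environ['LC_ALL'])"]
    print(run_with_locale(child).stdout.strip())
```

Without the explicit `env=`, the child only inherits whatever the parent happens to have set, which under cron is often very little.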
Pieter then challenged me. He said he could index the whole of Wikidot in 6 hours. I thought it wasn't even possible, but then I started working on it and managed to index the whole of Wikidot in less than 2 hours, without indexing tags at first. Then, with tags, it took 2 hours and 10 minutes or so. That was damn fast!
Inspired by this and by an accidental disk-full error on the /var partition of our webserver (this is why we keep user-uploaded files and other important things on separate disks), I also rewrote the incremental indexer to work in a similar way to the whole-Wikidot re-indexer.
If you care about some technical details:
- all search operations are issued through search-api, a separate program that can:
  - re-index the whole of Wikidot
  - queue indexing of a page/thread
  - queue deletion of a page/thread
  - queue re-indexing of a site
  - flush the queue
- search-api is written in Python
- search-api uses PyLucene - the Java Lucene library bound to CPython objects with PyJCC. Compiled with GNU Java Compiler to native code (like C programs), this binding performs better than using Lucene with Sun's Java.
- before being rewritten in pure Python, search-api was a Bash script wrapping:
  - java -jar searchApiHelper.jar search "phrase-to-search"
  - php search-api-helper.php flush
- search-api also takes care of file locking to ensure that:
  - only one process tries to modify the index at a time
  - items are added to the queue one after another
  - during a big index modification (read: a full re-index) the queue is not flushed (so that after the re-index all queued changes are applied to the new index)
  - when flushing the queue takes longer than usual and cron starts more flushing processes, they simply exit (so only one process flushes the queue at a time)
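The locking rules above can be sketched roughly like this. It is a minimal illustration using `fcntl.flock` from the Python standard library (Unix only), not the real search-api code: the names `index_lock`, `enqueue` and `flush_queue`, the tab-separated queue format and the file paths are all assumptions made for the example.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def index_lock(lock_path):
    """Try to take the exclusive index lock without waiting.

    Yields True if we hold the lock, False if another process
    (a flusher or a full re-index) already holds it.
    """
    with open(lock_path, "a") as lock_file:
        try:
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            yield False
            return
        try:
            yield True
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def enqueue(queue_path, action, item_id):
    """Append one change to the queue; writers go one after another."""
    with open(queue_path, "a", encoding="utf-8") as queue_file:
        fcntl.flock(queue_file, fcntl.LOCK_EX)  # blocks until it is our turn
        queue_file.write(f"{action}\t{item_id}\n")
        queue_file.flush()
        # the lock is released when the file is closed


def flush_queue(queue_path, lock_path, apply_change):
    """Apply all queued changes, or give up at once if the index is busy."""
    with index_lock(lock_path) as acquired:
        if not acquired:
            return False  # someone else is flushing or re-indexing: just end
        with open(queue_path, "a+", encoding="utf-8") as queue_file:
            fcntl.flock(queue_file, fcntl.LOCK_EX)  # keep writers out meanwhile
            queue_file.seek(0)
            for line in queue_file:
                action, item_id = line.rstrip("\n").split("\t", 1)
                apply_change(action, item_id)
            queue_file.truncate(0)  # the queue is now empty
        return True
```

In this sketch a full re-index would hold `index_lock` for its whole run, so any flusher started by cron in the meantime returns `False` immediately and the queued changes wait until the new index is in place.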
Union of Rock Festival
Right after a week spent in a nice hotel in Brussels, I went to Węgorzewo in Mazury (Poland's largest lake district) to have fun at a rock music festival. Unfortunately, the music was not very impressive, so I mainly enjoyed the atmosphere at the campsite.
The weather was not great. It was wet everywhere, the ground was covered in 20 centimeters of mud and it was hard to walk around without getting dirty. But by the end of the first day I had learned how to do it.
Improved workflow at Wikidot
Some of you have noticed that we recently started to work more efficiently, but this is not quite true. In fact we work as efficiently as before; we are just better organized and prioritize tasks better. We also keep track of what we do, so we can later tell what we've done. For us this means a little more work "documenting" our work (so maybe we are even less efficient than before?), but to the outside world we make more noise (in a positive sense) about it. So basically, people know what we do, what we are going to do and when they can expect changes, and most importantly, they understand why some feature request is being postponed. The reason is (and always was) that we have more important things to do, but before, they had no way of knowing that.
Squark turned into a professional project manager who manages our time. pieterh decided to talk to the Community and listen to their complaints (he reads or at least skims every post on the Community forums). He tells Łukasz what needs to be done, and Łukasz knows when we will have time to do it. This way communication inside Wikidot has improved. Also, we (michal-frackowiak and I) no longer follow the Community forums (which some of you may regret), but this lets us concentrate on our work.
The work continues
As I mentioned before, we want to introduce a great feature to Wikidot: forms. For now the implementation concentrates on the open source version of the Wikidot software (once it's ready, working and tested, we'll copy the feature to the Wikidot.com service).
aptitude install wikidot
As forms are a huge change, I started preparing the ground for them by closing the most important bugs in Wikidot open source, and I'm about to start making Ubuntu packages for it to allow even simpler installation on Debian-based systems. Right now the installation involves only 6 child-easy steps and in fact can be done by copying and pasting a few commands.
Yesterday I went to meet some friends from my old school days in the heart of the city. It was meant to be a meeting for "a beer or two" but evolved into beer and dancing till morning. That was the first time I took a morning bus (not even the first one) home straight after partying.
It was great fun, and I met great folks.
I hope this long blog post (divided into friendly sections ;) ) makes up for the long period of not posting anything here.