23 Jun 2009 18:25
TAGS: dev lucene search wikidot
As some of you know, some time ago I worked on a new Wikidot search. It is used for the Search all sites page.
We did this work because searching the whole Wikidot database was far from fast. At first we used Google Custom Search Engine to solve the problem, but we wanted to be independent of it. We also wanted to include search results that are accessible to the person searching but not to search engines (such as private sites).
This worked really well. The only problem was the long time it took to display the results: about 20 seconds, which was far better than the previous search but much worse than Google. This was strange, because based on earlier tests we had calculated that the average search should take about a second or two.
It became clear when we noticed that only 20-30 searches were performed per day. The index is quite big, and it needs to be cached in RAM to perform well. With so few searches, the index was probably not staying cached between them.
Today I ran more tests under heavy load, and it seems Lucene can handle a large number of queries. When users search often, the index is partially or even fully cached by the filesystem and searches are really quick!
But the main problem we had to solve today was the slow local search, a.k.a. "search this site". Moreover, many concurrent queries were degrading database performance (not only for searching), so we decided to enable Lucene for local searches as well.
I must say it works really nicely: it is fast and has a nice set of syntax tricks. For example, you can search for pages with something in their tags. Just search for youtube tags:embed. This searches for pages matching youtube (in tags, title or content) and with embed in tags. If no such pages are found, partial matches are also returned: pages matching youtube (but without embed in tags) or pages with embed in tags (but not matching youtube).
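To make the youtube tags:embed example more concrete, here is a minimal Python sketch of the idea: every clause is optional, and pages matching more clauses come first. This is only a toy model, not Lucene's actual scoring (which is more sophisticated), and the function names, the page layout and the sample pages are all made up for illustration.

```python
def parse(query):
    """Split a query such as 'youtube tags:embed' into (field, term) clauses."""
    clauses = []
    for token in query.lower().split():
        field, sep, term = token.partition(":")
        # A token without a 'field:' prefix applies to every field (field is None).
        clauses.append((field, term) if sep else (None, token))
    return clauses


def matches(page, field, term):
    """True if the term appears as a whole word in the given field (or any field)."""
    fields = [field] if field else ["tags", "title", "content"]
    return any(term in page.get(f, "").lower().split() for f in fields)


def search(pages, query):
    """Return page names, best match first; pages matching nothing are dropped."""
    clauses = parse(query)
    scored = []
    for page in pages:
        # Toy score (an assumption): the number of clauses the page satisfies.
        score = sum(matches(page, f, t) for f, t in clauses)
        if score:
            scored.append((score, page["name"]))
    # Stable sort: higher scores first, ties keep their input order.
    scored.sort(key=lambda pair: -pair[0])
    return [name for _, name in scored]


# Hypothetical sample pages.
PAGES = [
    {"name": "video-howto", "title": "Embedding video",
     "content": "Paste a YouTube link", "tags": "embed video"},
    {"name": "news", "title": "YouTube news",
     "content": "Nothing about embedding", "tags": "news"},
    {"name": "iframe", "title": "Iframes",
     "content": "Generic iframe help", "tags": "embed"},
    {"name": "about", "title": "About", "content": "Contact us", "tags": "meta"},
]

print(search(PAGES, "youtube tags:embed"))
```

In this sketch the page that matches both youtube and the embed tag is listed first, and the partial matches follow it.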
To sum up, the new search is faster, gives more accurate results, reduces the load on the database (which was the main goal) and allows nicer syntax than the old one.

Hi Piotr:
I haven't delved too deeply yet, but it seems the new search doesn't prioritize well.
With multiple search terms, the results seem to reflect any of the terms, rather than first showing items that contain all of the terms.
youth theater or "youth theater" return results that contain youth or theater, when I want items that contain both.
Hi Scott,
The new search should put results with both youth and theater at the very top (but still allow pages containing only one of them further down). If you want to explicitly search for youth AND theater, try +youth +theater. The plus sign means the term is required in the search result.
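As a rough illustration of the difference between youth theater and +youth +theater, here is a small Python sketch. It is a toy model under my own assumptions (whole-word matching, score equal to the number of matched terms), not the way Lucene really ranks results, and the sample pages are invented.

```python
def search(pages, query):
    """Rank pages for a query; terms prefixed with '+' are required."""
    required, optional = [], []
    for token in query.lower().split():
        if token.startswith("+"):
            required.append(token[1:])
        else:
            optional.append(token)

    results = []
    for name, text in pages.items():
        words = set(text.lower().split())
        # A page missing any required term is skipped entirely.
        if not all(term in words for term in required):
            continue
        # Toy score (an assumption): how many query terms the page contains.
        score = sum(term in words for term in required + optional)
        if score:
            results.append((score, name))
    # Stable sort: higher scores first, ties keep their input order.
    results.sort(key=lambda pair: -pair[0])
    return [name for _, name in results]


# Hypothetical sample pages.
PAGES = {
    "drama-club": "Youth theater workshops every week",
    "sports": "Youth football league",
    "opera": "City theater season schedule",
}

print(search(PAGES, "youth theater"))    # page with both terms first, then the rest
print(search(PAGES, "+youth +theater"))  # only the page containing both terms
```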
Piotr Gabryjeluk
http://community.wikidot.com/forum/t-165068/search-module-doesn-t-work
Search engine does not work!
Vir bonus miser vocari, at esse non potest miser.
It still does not work…. :(:(:(:(
When will this start to work???????
Hi again,
We have to reindex a lot of pages, and this takes some time, as Pieter explains. Your sites will be available for search in a few days.
Sorry again for the inconvenience.
Piotr Gabryjeluk
http://community.wikidot.com/forum/t-165068/search-module-doesn-t-work#post-527269