Wikidot API
tags: dev php python wikidot xmlrpc
22 Jan 2009 18:57
A few days ago I started working on the Wikidot API. The API will be a standardized way to access the Wikidot.com service programmatically (i.e. without a browser) to retrieve, create and update information stored on Wikidot, including site browsing, page editing and commenting.
Put simply, this will allow people to write applications that connect to Wikidot.com and perform actions on behalf of the user running the application.
Technically, the Wikidot.com API is an XML-RPC service exporting methods from a few specially designed classes.
To connect to an XML-RPC service, you must know its endpoint, which is a regular URL (http:// or https://) address. We decided to use HTTPS to secure the channel from the very start.
The operations we are going to support are:
Browse
- site.categories
- site.pages
- page.get
The methods above are already implemented. Using these API calls you can retrieve almost all the data you have stored on Wikidot.com sites!
Modify
- page.save
This will be the basic method to update the content on your site. We plan several other methods, but this is the one that is the most important.
Comments
- page.comments
- page.comment
They will be used to get and post comments on a given page. Using the reply_to parameter, you can reply to a particular comment.
Forum
- forum.groups
- forum.categories
- forum.threads
- forum.post
This bunch of methods is going to give you full access to the forums you have started on Wikidot.
How to use the API
We haven't yet enabled API access on the main Wikidot.com server, but testing the API with the Python XML-RPC library is as easy as this:
>>> from xmlrpclib import ServerProxy
>>> s = ServerProxy('SOME-URL')
>>> s.system.listMethods()
['system.listMethods', 'system.methodHelp', 'system.methodSignature', 'system.multicall', 'site.pages', 'site.categories', 'page.get']
>>> print s.system.methodHelp('site.pages')
Get pages from a site
Argument array keys:
 site: site to get pages from
 category: category to get pages from (optional)
>>> s.site.categories({'site': 'gamemaker'})
['project', 'rpg', '_default', 'action', 'admin', 'badge', 'beginner', 'contests', 'error', 'event', 'example', 'forum', 'gamemaker', 'gml', 'gmlcode', 'gmupload', 'help', 'ide', 'include', 'mamber', 'member', 'nav', 'portal', 'resource', 'search', 'system', 'talk', 'template', 'tutorial', 'video-tutorial', 'wiki', 'challenge', 'howto', 'recent-changes', 'scratch-pad', 'helpdesk', 'helprequest', 'default', 'testimonial', 'testimonials']
>>> [p['name'] for p in s.site.pages({'site': 'gamemaker', 'category': 'badge'})]
['cool', 'flux', 'c-team', 'gml', 'member', 'madman', 'not-a-noob', 'f-madman', 'start', 'break-it']
>>> print s.page.get({'site': 'gamemaker', 'page': 'badge:cool'})['source']
[[table style="width:98%;margin-right:auto;margin-left:auto;margin-bottom:1%;"]][[row]][[cell style="width:360px;"]]
[[div class="error-block"]]
Included page "include:cool-badge" does not exist ([/include:cool-badge/edit/true create it now])
[[/div]]
The Cool Badge is given to people who make something really cool.
[[/cell]]
[[cell style="vertical-align:top;border:1px solid #ddd;padding:1%;"]]
+++ Display Code
[[table class="code"]][[row]][[cell]]
@@[[include include:cool-badge member=member name]]@@
[[/cell]][[/row]][[/table]]
+++ Tag
{{cool-badge}}
[[/cell]][[/row]][[/table]]
[[table style="border:1px solid #ddd;padding:1%;margin-right:auto;margin-left:auto;margin-bottom:1%;width:98%;"]][[row]][[cell]]
++ Earn It
* Program something really cool using gml
* Make a really cool game
+++ Tips
* Make sure you post your examples and games on the [[[forum:start|forum]]]. Otherwise no one can see it and nominate you for the cool badge.
[[/cell]][[/row]][[/table]]
[[table style="border:1px solid #ddd;padding:1%;margin-right:auto;margin-left:auto; width:98%;"]][[row]][[cell]]
++ Members Who Have Earned the Cool Badge
[[module ListPages category="member" order="titleAsc" tag="cool-badge" perPage="100" separate="false"]]
* %%linked_title%%
[[/module]]
[[/cell]][[/row]][[/table]]
>>>
A few words of explanation:
- first we import the ServerProxy class from the XML-RPC library,
- then we construct the ServerProxy object s, supplying the endpoint URL (SOME-URL in this case, as we haven't yet decided what the URL is going to be)
- we can see a list of methods by calling system.listMethods on the ServerProxy object
- we get a help message for a method by calling system.methodHelp
- then we get the categories of the site gamemaker (yeah, it's a part of the wikicomplete.info)
- then we call the site.pages method (specifying the site and category parameters), but instead of displaying the whole list of structures that describe pages, we only display their names
- calling page.get returns an array with the information about a page, including:
  - wiki source, array key: source
  - generated HTML, array key: html
  - array with various meta-data, array key: meta
- we call page.get, passing as the argument an array that specifies the site and page name, and get the page object, but display only what's stored under the source array key
As you can see, playing with this is really easy, as is browsing the available methods and using them.
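If you want to try the same kind of session before the real endpoint exists, here is a minimal sketch using Python 3, where xmlrpclib lives on as xmlrpc.client. Everything on the server side is an assumption made up for illustration: the PAGES data, the my-site name and the local stand-in server are not part of the Wikidot service. Only the method name site.pages and its site/category keys come from the session above.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical stand-in data; the real service would read this from Wikidot.
PAGES = {
    "my-site": [
        {"name": "start", "category": "_default"},
        {"name": "cool", "category": "badge"},
        {"name": "flux", "category": "badge"},
    ]
}


def site_pages(params):
    """Get pages from a site

    Argument array keys:
      site: site to get pages from
      category: category to get pages from (optional)
    """
    pages = PAGES[params["site"]]
    category = params.get("category")
    if category is not None:
        pages = [p for p in pages if p["category"] == category]
    return pages


# Port 0 lets the operating system pick any free local port.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_introspection_functions()  # adds system.listMethods and friends
server.register_function(site_pages, "site.pages")
threading.Thread(target=server.serve_forever, daemon=True).start()

host, port = server.server_address
s = ServerProxy("http://%s:%d/" % (host, port))

methods = s.system.listMethods()
help_text = s.system.methodHelp("site.pages")
badge_names = [p["name"] for p in s.site.pages({"site": "my-site", "category": "badge"})]

print(methods)
print(help_text)
print(badge_names)

server.shutdown()
server.server_close()
```

The help text returned by system.methodHelp is simply the docstring of the registered function, which is one cheap way to keep the documentation next to the code.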
Why XML-RPC
We've chosen this protocol because it makes it easy to develop both the server and the client in almost any programming language. It also gives some flexibility in the arguments passed and the values returned.
We use the XML-RPC struct type as the argument and return value type, which client (and server) libraries map to an associative array or dictionary. Each API method takes a set of required and optional parameters, which are simply values stored in the struct passed to the method.
For example site.pages gets a struct with the following keys:
- site (site name to get pages from) — required
- category (category to get pages from) — optional
This means you have to create an associative array (when using PHP) or a dictionary (when using Python) and pass it as the method argument:
# PHP
$pages = $server->site->pages(array("site" => "my-site", "category" => "my-category"));
# Python
pages = server.site.pages({"site": "my-site", "category": "my-category"})
In other programming languages, you'll end up with something similar. You can almost always create the array/dictionary in place, so having this convention is not a big deal.
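To make the required/optional idea concrete, here is a rough sketch of how a method could check the struct it receives. This is not how the Wikidot server does it. The check_params helper and the fault codes 1 and 2 are made up for the example; only the site and category keys come from the description above.

```python
from xmlrpc.client import Fault


def check_params(params, required, optional=()):
    """Return a copy of the struct, or raise an XML-RPC Fault if it is malformed."""
    missing = [key for key in required if key not in params]
    if missing:
        # Fault code 1 is an arbitrary choice for this sketch.
        raise Fault(1, "missing required keys: " + ", ".join(missing))
    unknown = [key for key in params if key not in required and key not in optional]
    if unknown:
        # Fault code 2 is an arbitrary choice for this sketch.
        raise Fault(2, "unknown keys: " + ", ".join(sorted(unknown)))
    return dict(params)


# A struct with only the required key passes the check.
ok = check_params({"site": "my-site"}, required=("site",), optional=("category",))

# Leaving out the required key produces a Fault that the client library re-raises.
try:
    check_params({"category": "my-category"}, required=("site",), optional=("category",))
    error = None
except Fault as fault:
    error = fault.faultString

print(ok)
print(error)
```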
Applications
I'm working on filesystem-based access to a Wikidot site (using FUSE and Python).
We plan to have a Wikidot application for the iPhone.
A save-it-directly-on-wikidot plugin would be a nice thing for various text editors (and probably other applications).
And probably there are billions of other ways to use this API we're not even aware of. If you have any, feel free to leave a comment.
Hand-made jewelry
tags: fiancee hand-made jewelry wire-wrapping
19 Jan 2009 14:33
I would like to show you a few pieces of jewelry my fiancée has made.
Enjoy!
We're looking for your comments!
Bridging Python And PHP
tags: dev php python xmlrpc
11 Jan 2009 10:48
Imagine you have a PHP-based application (like Wikidot). Now you want to extend it using Python. Of all the ways to do it, I'll show you how to achieve this using the XML-RPC protocol.
Background
XML-RPC is a client-server protocol for remote procedure calls.
On the server, this works by taking a bunch of functions from your application and exporting them over HTTP.
On the client, this works by connecting to an XML-RPC server, finding out what functions it offers and constructing a so-called server proxy: an object with a method for every function exported by the XML-RPC server.
Calling a method of the server proxy connects to the server over HTTP, passes the arguments and transports the result back to the client. So basically this works as if you had a remote object available locally.
The data encoding between client and server is defined in the XML-RPC specification and is a language based on XML (but you never actually touch it; the libraries convert the XML to objects).
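If you are curious what the libraries hide from you, Python's own XML-RPC module can show it. This is a small sketch using Python 3's xmlrpc.client (the successor of xmlrpclib). The method name myclass.repeat and the message are just sample values.

```python
from xmlrpc.client import dumps, loads

# Encode a call to myclass.repeat with one string argument.
request_xml = dumps(("Hello RPC service",), methodname="myclass.repeat")
print(request_xml)

# Decoding gives back the argument tuple and the method name.
params, method = loads(request_xml)

# A response is encoded the same way, as a one-element tuple.
response_xml = dumps(("Hello RPC service",), methodresponse=True)
(result,), _ = loads(response_xml)
print(result)
```

The printed request is a methodCall document with a methodName element and one param per argument, and that is all that ever travels over the wire.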
Overview
We want to run an XML-RPC server exposing a class in PHP and an XML-RPC client in Python to communicate with the XML-RPC server.
Traditionally we would need an HTTP server for the PHP XML-RPC server, because HTTP is used as the XML-RPC transport. But if you dig a bit into the specification, you'll discover that no HTTP-specific parts of the protocol are used. HTTP is just a line to transport the XML data.
So you may wonder if it's possible to use XML-RPC with a transport other than HTTP. In short, yes. But you may need to hack around the XML-RPC libraries (because they usually assume you'll want to use HTTP).
PHP XML-RPC server
First, you need a class that you want to expose with PHP XML-RPC:
class MyClass {
    /**
     * @param string $input
     * @return string
     */
    public function repeat($input) {
        return $input;
    }
}
Notice I've set the parameter and return type in phpdoc.
Now let's expose this class with the Zend Framework XML-RPC implementation.
You need to download Zend Framework first, let's say to the /path/to/zf directory.
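<?php
class MyClass {
    /**
     * @param string $input
     * @return string
     */
    public function repeat($input) {
        return $input;
    }
}

// Make the Zend Framework library directory visible to require_once.
set_include_path(get_include_path() . PATH_SEPARATOR . '/path/to/zf/library');
require_once "Zend/XmlRpc/Server.php";

$server = new Zend_XmlRpc_Server();
// Expose MyClass under the "myclass" XML-RPC namespace.
$server->setClass('MyClass', 'myclass');
// Process the incoming request and print the XML-RPC response.
echo $server->handle();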
The set_include_path line adds the /path/to/zf/library directory to the PHP include path, so you can import the Zend_XmlRpc_Server class (located in the /path/to/zf/library/Zend/XmlRpc/Server.php file).
Then an instance of Zend_XmlRpc_Server is created and MyClass is attached as the class for the myclass XML-RPC namespace. This means the repeat method can be called via XML-RPC as myclass.repeat.
Place the file on your server and make it available under some URL, for example:
http://your-server.com/myclass.php
This URL is then a fully valid XML-RPC server endpoint for XML-RPC clients.
Python client
With the XML-RPC server running, we can connect to it using an XML-RPC library in any programming language around.
In Python, to call the remote procedure myclass.repeat on the XML-RPC endpoint http://your-server.com/myclass.php, you would do the following:
from xmlrpclib import ServerProxy

server = ServerProxy('http://your-server.com/myclass.php')
print server.myclass.repeat('Hello RPC service')
Running this code:
# python xmlrpc-test.py
gives you:
Hello RPC service
Under the hood:
- the Python script makes a connection to http://your-server.com/myclass.php
- your webserver runs the myclass.php script
  - the $server->handle() line processes the data received
    - chooses a class and a method to run (this would be MyClass and repeat)
    - passes the arguments (a string 'Hello RPC service') to the method
    - gets the return value
    - passes it back to the client wrapped in the XML-RPC protocol
- Python gets the XML reply and converts it back to a simple string ('Hello RPC service')
- and prints it on the console
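The server-side steps above can be sketched in a few lines of Python 3. This is only an illustration of the idea, not what Zend_XmlRpc_Server actually does internally. The NAMESPACES table and this handle function are names invented for the example, and MyClass here is a Python twin of the PHP class.

```python
from xmlrpc.client import dumps, loads


class MyClass:
    def repeat(self, text):
        return text


# Hypothetical registry, playing the role of $server->setClass('MyClass', 'myclass').
NAMESPACES = {"myclass": MyClass()}


def handle(request_xml):
    # Decode the request into an argument tuple and a method name.
    params, method_name = loads(request_xml)
    # Choose the object and the method to run, e.g. "myclass" and "repeat".
    namespace, _, function = method_name.partition(".")
    target = NAMESPACES[namespace]
    if function.startswith("_"):
        raise ValueError("private methods are not exposed")
    method = getattr(target, function)
    # Pass the arguments and get the return value.
    result = method(*params)
    # Wrap the return value in an XML-RPC response.
    return dumps((result,), methodresponse=True)


request_xml = dumps(("Hello RPC service",), methodname="myclass.repeat")
response_xml = handle(request_xml)

# The client converts the XML reply back into a plain string.
(reply,), _ = loads(response_xml)
print(reply)
```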
Omitting the HTTP protocol
You probably run both the Python and the PHP scripts on the same machine, so the HTTP part is quite useless and just an additional point of failure.
As I already stated, HTTP is only a transport and you can replace it (at some cost) with another transport.
I came up with the idea of using stdin/stdout as the transport: Python executes a PHP script (through the command line interface) and passes the XML-RPC request to the script's stdin. PHP then has to read the XML-RPC request from stdin instead of from the HTTP request.
This means two modifications in server and client code.
First the server:
<?php
class MyClass {
    /**
     * @param string $input
     * @return string
     */
    public function repeat($input) {
        return $input;
    }
}

set_include_path(get_include_path() . PATH_SEPARATOR . '/path/to/zf/library');
require_once "Zend/XmlRpc/Server.php";
require_once "Zend/XmlRpc/Request/Stdin.php";

$server = new Zend_XmlRpc_Server();
$server->setClass('MyClass', 'myclass');
// Read the XML-RPC request from stdin instead of from the HTTP request body.
echo $server->handle(new Zend_XmlRpc_Request_Stdin());
The only change is passing an instance of Zend_XmlRpc_Request_Stdin to $server->handle(). That is all that's needed. The Zend Framework developers already anticipated such a use.
Then, the client part.
xmlrpclib allows passing a custom transport in case you want to implement a proxy or something similar. We'll make a transport that, instead of making an HTTP connection, runs a PHP script, passes the request to its stdin and gets the response from its stdout:
from xmlrpclib import Transport, Server
from subprocess import Popen, PIPE

class LocalFileTransport(Transport):
    class Connection:
        # Stands in for the HTTP connection object that Transport expects.
        def setCmd(self, cmd):
            self.cmd = Popen(['php', cmd], stdin=PIPE, stdout=PIPE)
        def send(self, content):
            # Write the request and close stdin so PHP sees the end of input.
            self.cmd.stdin.write(content)
            self.cmd.stdin.close()
        def getreply(self):
            # Pretend the "HTTP" exchange succeeded.
            return 200, '', []
        def getfile(self):
            return self.cmd.stdout

    def make_connection(self, host):
        return self.Connection()
    def send_request(self, connection, handler, request_body):
        # handler is the path part of the URL, used here as the script path.
        connection.setCmd(handler)
    def send_content(self, connection, request_body):
        connection.send(request_body)
    def send_host(self, connection, host):
        pass
    def send_user_agent(self, connection):
        pass

server = Server('http://host.com/path/to/the/php/script/myclass.php', transport = LocalFileTransport())
print server.myclass.repeat('Hello XML-RPC with no HTTP service')
Notes:
- host.com in the URL is completely ignored, use whatever value you want
- /path/to/the/php/script/myclass.php in URL is passed as the PHP script to run
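The Transport internals changed in later Python versions, so here is a rough sketch of the same idea for Python 3's xmlrpc.client, where overriding the request method is enough. StdioTransport is a name made up for this example. So that the sketch runs without PHP installed, it talks to a tiny Python stand-in script instead of myclass.php, and that stand-in relies on SimpleXMLRPCDispatcher._marshaled_dispatch, an underscore-prefixed helper from the standard library. It also assumes POSIX-style absolute paths.

```python
import os
import subprocess
import sys
import tempfile
from xmlrpc.client import ServerProxy, Transport


class StdioTransport(Transport):
    """Send each XML-RPC request to a script's stdin and read the reply from stdout."""

    def __init__(self, interpreter):
        super().__init__()
        self.interpreter = interpreter  # for the PHP server this would be ["php"]

    def request(self, host, handler, request_body, verbose=False):
        # host is ignored; handler is the path part of the URL, used as the script path.
        completed = subprocess.run(
            self.interpreter + [handler],
            input=request_body,
            stdout=subprocess.PIPE,
            check=True,
        )
        parser, unmarshaller = self.getparser()
        parser.feed(completed.stdout)
        parser.close()
        return unmarshaller.close()


# Stand-in for myclass.php: reads a request from stdin, writes the response to stdout.
STAND_IN_SERVER = """
import sys
from xmlrpc.server import SimpleXMLRPCDispatcher
dispatcher = SimpleXMLRPCDispatcher()
dispatcher.register_function(lambda text: text, "myclass.repeat")
sys.stdout.buffer.write(dispatcher._marshaled_dispatch(sys.stdin.buffer.read()))
"""

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as script:
    script.write(STAND_IN_SERVER)

try:
    server = ServerProxy("http://ignored-host" + script.name,
                         transport=StdioTransport([sys.executable]))
    result = server.myclass.repeat("Hello XML-RPC with no HTTP service")
    print(result)
finally:
    os.unlink(script.name)
```

To point it at the PHP server from above, you would pass ["php"] as the interpreter and put the path to myclass.php in the URL, just like in the original client.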
What to do next?
With this simple skeleton in place, you can now extend MyClass (but give it a more fitting name first!). You can also attach more classes to the XML-RPC server using different namespaces:
$server->setClass('SomeClass', 'some');
$server->setClass('MyClass', 'my');
$server->setClass('YourClass', 'your');
Only public methods are exposed to the XML-RPC clients, so you can hide some logic inside of private or protected methods and only expose what you need from given classes.
This solution is a quick way to use some of your well-working PHP code in your fancy new and elegant Python application. It can help if you want to build a filesystem with Python-FUSE but want the data to come from a PHP application.
Did it help you?
I hope this helps someone. Feel free to comment.
Wikidot is BIG
tags: dev high load lucene search wikidot
10 Jan 2009 12:48
As you may know, I'm implementing a new search engine for Wikidot.
This seemed quite easy at first, given the nice Lucene implementation in PHP that is included in Zend Framework, and indeed during tests it was fast, simple and powerful. But it was tested on about 100,000 documents (a document is a Wikidot page or forum thread), and we have about 2,500,000 documents in Wikidot now. This is where the problems begin.
After indexing roughly 1,800,000 documents, the indexing process ran into memory problems (a 500 MB memory limit was not enough in SOME cases).
Even earlier I realized that the search times weren't good enough. This is why I implemented the searching part in Java, which is the native platform for the Lucene indexer. This sped things up.
Do you think indexing a document in less than a second is fast? I thought this was a good result. Indexing a document takes about 0.2 s when the index holds only a small number of documents. But with 400,000 documents already in the index, adding another document takes about 0.4 s. And even with this "good" indexing time (below a second), indexing the whole of Wikidot would take at least a few days.
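The "few days" estimate is easy to check with the numbers above. The sketch below assumes the per-document time stays constant for the whole run, which is optimistic, since the time clearly grows as the index fills up.

```python
TOTAL_DOCUMENTS = 2_500_000  # documents in Wikidot, as quoted above
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_index(seconds_per_document, documents=TOTAL_DOCUMENTS):
    """Days needed to index everything at a constant per-document time."""
    return seconds_per_document * documents / SECONDS_PER_DAY


best_case = days_to_index(0.2)    # speed measured with a nearly empty index
slower_case = days_to_index(0.4)  # speed measured with 400,000 documents indexed

print("%.1f days at 0.2 s, %.1f days at 0.4 s" % (best_case, slower_case))
```

So even in the best case we are looking at almost six days, and closer to twelve at the slower rate.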
This leads me to the conclusion that Wikidot is really BIG.
A similar situation applied to user-uploaded files. We hit a filesystem limit of about 32,000 subdirectories in a single directory. Since all user-uploaded files were stored in a one-directory-per-wiki structure, this became a problem once we had more than 32,000 wikis.
Replicating this structure to another machine (also known as live backup of user-uploaded files) was also quite a challenge, because we reached the limit on directory watches in the kernel-level filesystem-monitoring system (inotify).
All this shows that things that seem easy are not necessarily easy at Wikidot's scale, which pushes against the limits of nearly every piece of software we use. But it is also a great chance to really test those projects and see how they react to such a high load.