Svn2github problems

12 Dec 2014 18:31

I'm the one behind the svn2github.com service. The service was started a few years ago to help me and my team start a PHP project that we wanted to host in Git. The PHP libraries we wanted to use were hosted in Subversion. Since Composer was not yet very popular, we decided to link the libs into the repo using git submodules.

Given how simple git-svn is to use, and how easy it is to have a Git repo on GitHub, we thought, hey, there must be an automatic mirroring tool to clone publicly available SVN repos to GitHub. The service would probably be called svn2github.com.

No such service existed, and we struggled looking for one. But since this was so easy to set up, I decided I could do it myself! So I did.

Svn2github.com operated with only minor administration on my side, and at some point I realized there were over 500 repos mirrored by it, which was fine.

But recently the server that svn2github is hosted on started to experience some problems, mostly with I/O throughput, and I traced them directly to the svn2github processes.

Basically, with so many repositories stored on disk, the periodic task that just runs git svn rebase, and git push if there were any changes, can be challenging simply because of the number of I/O operations it needs. Also, at some point the (most important) data just stops fitting in the cache, so every filesystem operation has to go to the disk.
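To make that periodic task concrete, here is a minimal sketch in Python. It is not the actual svn2github code: the names run_git and update_mirror, the remote name "origin" and the idea of comparing HEAD before and after the rebase are all my assumptions for illustration.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence

# A runner takes git arguments and a repo path, and returns git's stdout.
Runner = Callable[[Sequence[str], Path], str]


def run_git(args: Sequence[str], repo: Path) -> str:
    """Run `git <args>` inside the repo and return its trimmed stdout."""
    result = subprocess.run(
        ["git", *args], cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def update_mirror(repo: Path, run: Runner = run_git) -> bool:
    """Fetch new SVN revisions and push only if HEAD moved.

    Returns True when something was pushed. Assumes the GitHub remote
    is called "origin" (hypothetical, not confirmed for the real service).
    """
    before = run(["rev-parse", "HEAD"], repo)
    run(["svn", "rebase"], repo)  # pull new SVN revisions into the Git repo
    after = run(["rev-parse", "HEAD"], repo)
    if after == before:
        return False  # nothing new in SVN, so skip the push
    run(["push", "origin", "HEAD"], repo)
    return True
```

Even when nothing changed, each run still touches the repository on disk, which is where the I/O cost adds up across hundreds of mirrors.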

The problem became apparent to me because of the other services that I run on the same hardware, mainly the database. It became terribly slow, meaning all the other apps took forever to do even a basic task.

I needed to suspend the svn2github operation to let the more important services continue to run, but I planned to bring this useful service back to life. As often happens in such cases, I want to add more features along the way and make the updates smarter, so they don't consume as much server "life" as they did.

The first step though is restarting svn2github, which means you can now add more SVN repos to be mirrored to GitHub and the repos will be synchronized with one small exception. Any repository that contains more than 2000 files (including the .git files) will not be automatically updated.
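A check like this could be sketched as follows in Python. This is only an illustration under my own assumptions: the names FILE_LIMIT and exceeds_file_limit are hypothetical, and the real service may count files differently.

```python
import os
from pathlib import Path

FILE_LIMIT = 2000  # threshold from the post; the constant name is hypothetical


def exceeds_file_limit(repo: Path, limit: int = FILE_LIMIT) -> bool:
    """Return True if the repo holds more than `limit` files.

    Files under .git are counted too, since os.walk visits every
    subdirectory. The walk stops as soon as the limit is passed, so
    huge repositories are not scanned in full.
    """
    seen = 0
    for _root, _dirs, files in os.walk(repo):
        seen += len(files)
        if seen > limit:
            return True
    return False
```

A mirror for which this returns True would be skipped by the automatic update and only refreshed on request.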

I'll update the GitHub descriptions of those "paused" mirrors and if you want them to be "resumed", I'll ask you to contact me and let me know. This way the service will continue to work for the small repos (which are the majority), which don't cause so much trouble for the machine, while the big repos would be only updated when requested (I assume most of them were needed "once" and now no-one really needs them in place).

Happy SVN mirroring! See you on svn2github.com!

UPDATE: Some svn2github stats

To give an idea of the scale of this project, here are some stats:

Repositories with fewer than 2000 files each (including the .git files):

Number of them: 482
Total size of them on disk: 19G
Total number of files on disk: 954321
The biggest one: 635M (DevIL)
The smallest one: just 208k (aszip)

Repositories with over 2000 files each:

Number of them: 231
The total size: 308G (took 133m58.527s to compute that)
The biggest one: 42G (testingazuan)


Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License