12 Dec 2014 18:31
TAGS: github svn svn2github
I'm the one behind the svn2github.com service. The service was started a few years ago to help me and my team start a PHP project that we wanted to host in Git. The PHP libraries we wanted to use were hosted in Subversion. Since Composer was not very popular yet, we decided to link the libraries into our repo using git submodules.
Given how simple git-svn is to use and how easy it is to host a Git repo on GitHub, we thought: hey, there must be an automatic mirroring tool that clones publicly available SVN repos to GitHub. Such a service would probably be called svn2github.com.
No such service existed, and we struggled to find one. But since this was so easy to set up, I decided I could do it myself. So I did.
Svn2github.com ran with only minor administration on my part. At some point I realized it was mirroring over 500 repos, which was fine.
But recently the server that svn2github is hosted on started to experience problems, mostly with I/O throughput, and I traced them directly to the svn2github processes.
Basically, with so many repositories stored on disk, the periodic task that runs git svn rebase, followed by git push if there were any changes, can be challenging simply because of the number of I/O operations it needs. Also, at some point the (most important) data stops fitting in the cache, and every filesystem operation has to go to the disk.
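For illustration, here is a minimal Python sketch of what one pass of such a periodic task could look like for a single mirror. It is not the actual svn2github code. The function names, the remote name `origin`, and the idea of comparing HEAD before and after the rebase to detect changes are all assumptions made for this example.

```python
import subprocess
from pathlib import Path


def head_commit(repo: Path) -> str:
    """Return the commit hash that HEAD points to in the given repo."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def sync_mirror(repo: Path, run=subprocess.run, head=head_commit) -> bool:
    """Fetch new SVN revisions and push only if HEAD moved.

    `run` and `head` are injectable so the logic can be exercised
    without a real repository. Returns True when a push happened.
    """
    before = head(repo)
    run(["git", "svn", "rebase"], cwd=repo, check=True)
    if head(repo) == before:
        return False  # nothing new upstream, so skip the push
    # Assumes the GitHub remote is called "origin" (hypothetical setup).
    run(["git", "push", "origin", "HEAD"], cwd=repo, check=True)
    return True
```

Even when nothing changed, each pass still has to touch the repository on disk, which is where the I/O cost described above comes from once this runs over hundreds of mirrors.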
The problem became apparent to me because of the other services I run on the same hardware, mainly the database. It became terribly slow, meaning all the other apps took forever to do even a basic task.
I had to suspend svn2github to let the more important services keep running, but I planned to bring this useful service back to life. As often in such cases, I want to add more features while doing that and make the updates smarter, so they don't consume as much of the server's "life" as they did.
The first step, though, is restarting svn2github. This means you can now add more SVN repos to be mirrored to GitHub, and the repos will be synchronized, with one small exception: any repository that contains more than 2000 files (including the .git files) will not be automatically updated.
I'll update the GitHub descriptions of those "paused" mirrors. If you want one of them "resumed", please contact me and let me know. This way the service will continue to work for the small repos (which are the majority and don't cause much trouble for the machine), while the big repos will only be updated when requested (I assume most of them were needed "once" and now no one really needs them kept up to date).
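As a rough illustration of the 2000-file rule, here is a small Python sketch of how a mirror directory could be checked. It is only an assumption about how such a check might be written, not the code the service runs. The names `FILE_LIMIT` and `exceeds_file_limit` are made up for this example.

```python
import os
from pathlib import Path

FILE_LIMIT = 2000  # threshold from the post; files under .git count too


def exceeds_file_limit(repo: Path, limit: int = FILE_LIMIT) -> bool:
    """Return True if the repo directory holds more than `limit` files.

    The walk stops as soon as the limit is passed, so a huge
    repository is not scanned in full just to be skipped.
    """
    count = 0
    for _dirpath, _dirnames, filenames in os.walk(repo):
        count += len(filenames)
        if count > limit:
            return True
    return False
```

A scheduler could call `exceeds_file_limit(path)` once per mirror and leave out of the automatic updates any mirror for which it returns True.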
Happy SVN mirroring! See you on svn2github.com!
UPDATE: Some svn2github stats
To give an idea of the scale of this project, here are some stats:
Repositories with fewer than 2000 files each (including the .git files):
Number of them: 482
Total size of them on disk: 19G
Total number of files on disk: 954321
The biggest one: 635M (DevIL)
The smallest one: just 208k (aszip)
Repositories with over 2000 files each:
Number of them: 231
The total size: 308G (took 133m58.527s to compute that)
The biggest one: 42G (testingazuan)
Love the project!
Keep up the great work.
Love it
1 Please show errors instead of hanging, for example when the repo needs authentication.
2 Allow providing a username and possibly a password (they must be on a whitelist), because some services need authentication and don't allow login without a password.
Hi dfssdfgsd,
Agreed on the first point.
The idea of svn2github is to clone only open source code. If the source is protected by a user name and password, it's not open.
Cheers,
Piotr
Piotr Gabryjeluk
Not always. Sometimes it is. Try to add http_://_qeforge._qe-forge.org_/svn_/q-e/trunk/espresso (remove underscores)
It is FOSS, but it causes a hang.
Why?
Maybe it needs auth even though it is FOSS.
In fact it does need auth (I added a quote, but for some unknown reason it is not displayed, so the next paragraph is the quote).
This project allows anonymous checkouts of its source code. Use the command below and use 'anonymous' as the username and a blank password to checkout the code.
Please could you take a look to see why svn2github/agg isn't updating?
It's a small project (~20MB, <400 files) but it is 24 revisions behind the source.
Thanks.
Hello.
I mirrored some projects and something went wrong: the service was unable to detect the trunk and branches. I also found other mirrored repositories with similar issues:
DoubleCommander
kitty
ida-x86emu
virtualbox
1541UltimateII
This didn't mirror at all
winswitch
I tried many other repositories, but I don't remember them all. Sorry.
Kind regards.
Is there any way to claim SVN commits with my GitHub account?
timofonic, what URLs are you trying to clone? I just tried svn://svn.code.sf.net/p/doublecmd/code/trunk and it worked OK (cloned to http://github.com/svn2github/doublecmd-test). It took a lot of time, but it worked.
Piotr Gabryjeluk