21 August 2010

Speed up your Subversion repository

Normally, I am satisfied with the performance of Subversion. I used to work on a 20 GB repository with about 20,000 revisions, shared by 20 developers on a not-so-fast local network. Day-to-day operations such as update and commit finished in a few seconds.

Now I work with 30 developers on a 4 GB repository with about 18,000 revisions on a very fast network, and the performance sucks compared to my former experience. A checkout of a 120 MB working copy takes more than 10 minutes. An update (without any changes) takes at least 40 seconds.

So what’s the problem here? I have some guesses but am not yet sure about the real reasons. Here are my suggestions:

  • Use the FSFS mode (I hope nobody is using Berkeley DB anymore?)
  • Use sharding. This affects how the repository is stored on the server: at most 1000 revision files are put in one directory, then a new directory is created. This helps a lot if the underlying file system performs worse as the number of files per directory grows.
  • Pack your repository regularly. Normally, each revision is stored as a single file. If you have a sharded repository, svnadmin pack converts each full shard into one big file. This saves space and helps the OS reduce I/O and improve caching.
  • Use the svn:// protocol. The http and https protocols are just a tunnel for WebDAV/DeltaV, which is quite a chatty protocol: each file needs a full roundtrip from the client. On high-latency networks this can be a real bottleneck. The svn protocol is much faster and consumes fewer CPU cycles. On my test machine a complete checkout of a 120 MB working copy took on average 5 min 20 s over https:// and only 4 min over svn://.
  • Check your commit hooks. Perhaps you have installed some expensive commit hooks. On Windows, try using RunDetached to prevent Subversion from waiting for the hook to finish.
  • Beware of polling build agents. If you are doing Continuous Integration, there will be some mechanism in place to detect changes in the repository. The most efficient one is a post-commit hook, but CruiseControl.NET and TeamCity, for example, use an inefficient polling approach that basically runs “svn log” and parses the output. Doing this over https every second from forty build agents can easily drive the load on the repository server to 100%. A more efficient polling mechanism would store the last revision seen and query only the newest revision of the repository. This is only implemented by some custom plugins.
  • Monitor I/O load. When people think of performance, they still think of CPU performance. But for a Subversion repository, I/O is the limiting resource. An update or checkout does many small reads in different areas of the repository, so the average access time is the most important factor. If your repository runs on a virtual machine, make sure it is located on a physical drive that is reserved exclusively for this purpose. Use an SSD or at least a 10,000 RPM drive.
  • Store your repository on an Intel solid-state disk. The disk should be reserved exclusively for the repository; no other application should touch it. This is the simplest and most effective way to improve performance.
  • Optimize your working copy. Change the layout so you can do partial updates. Try svn update --set-depth exclude to exclude parts you don’t need in your day-to-day work. Remove files you don’t need.
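
To make the sharding and packing tips a bit more concrete, here is a minimal Python sketch of how revision numbers map to shard directories. It assumes the usual sharded FSFS layout (revision files under db/revs/<shard>/<revision>) and the shard size of 1000 mentioned above. The function names are my own and not part of any Subversion API.

```python
# Sketch of how a sharded FSFS repository lays out its revision files.
# SHARD_SIZE mirrors the "at most 1000 files per directory" figure from the text;
# a real repository may be configured with a different value.
SHARD_SIZE = 1000


def shard_of(revision: int, shard_size: int = SHARD_SIZE) -> int:
    """Return the index of the shard directory that holds a revision."""
    return revision // shard_size


def revision_path(revision: int, shard_size: int = SHARD_SIZE) -> str:
    """Return the path of an unpacked revision file, relative to the repository root."""
    return f"db/revs/{shard_of(revision, shard_size)}/{revision}"


def full_shards(youngest: int, shard_size: int = SHARD_SIZE) -> int:
    """Count the completely filled shards, i.e. the ones that packing can merge.

    Revisions are numbered from 0, so a repository whose youngest revision
    is `youngest` holds youngest + 1 revisions in total.
    """
    return (youngest + 1) // shard_size


if __name__ == "__main__":
    print(revision_path(17999))  # last revision of shard 17
    print(full_shards(18000))    # shards 0..17 are full, shard 18 is still open
```

For a repository of about 18,000 revisions like the one described above, that means roughly 18 directories with 1000 small files each before packing, and roughly 18 large pack files afterwards.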

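The cheaper polling mechanism from the build agent tip could look roughly like the following Python sketch. It is not how CruiseControl.NET or TeamCity work. It only illustrates the idea of remembering the last revision seen and asking the server for nothing but its newest revision. The class and function names are hypothetical, and the parser assumes the English plain-text output of svn info with its "Revision:" line.

```python
import re
import subprocess
from typing import Callable, Optional


def parse_revision(svn_info_output: str) -> int:
    """Extract the revision number from the plain-text output of `svn info`."""
    # Assumes an English locale, where the line reads "Revision: <number>".
    match = re.search(r"^Revision:\s*(\d+)\s*$", svn_info_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Revision:' line found in svn info output")
    return int(match.group(1))


def head_revision(url: str) -> int:
    """Ask the server for the newest revision of `url` with a single `svn info` call."""
    result = subprocess.run(
        ["svn", "info", url],
        check=True, capture_output=True, text=True,
    )
    return parse_revision(result.stdout)


class RevisionPoller:
    """Remembers the last revision seen and reports only newer ones."""

    def __init__(self, fetch: Callable[[], int], last_seen: int = 0) -> None:
        self._fetch = fetch          # callable returning the current head revision
        self.last_seen = last_seen   # newest revision already handled

    def poll(self) -> Optional[range]:
        """Return the range of new revisions, or None if nothing changed."""
        head = self._fetch()
        if head <= self.last_seen:
            return None
        new_revisions = range(self.last_seen + 1, head + 1)
        self.last_seen = head
        return new_revisions
```

A build agent would create one poller, for example RevisionPoller(lambda: head_revision("https://svn.example.com/repo/trunk")) with a placeholder URL, and trigger a build only when poll() returns a range. Polling every second is still wasteful, so a longer interval or a post-commit hook remains the better choice.
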
Note: All tips were written at the time of Subversion 1.6 and improve server performance. Subversion 1.7 will improve the local working copy, which in theory should also improve client performance.