21 August 2010

Speed up your Subversion repository

Normally, I am satisfied with the performance of Subversion. I used to work on a 20GB repository with about 20,000 revisions, shared by 20 developers on a not-so-fast local network. Everyday operations like update and commit were done in a few seconds.

Now I work with 30 developers on a 4GB repository with about 18,000 revisions on a super fast network. And the performance sucks compared to my former experience. A checkout of a 120MB working copy takes more than 10 minutes. Updating (without any changes) takes at least 40 seconds.

So what’s the problem here? I have some guesses but am not yet sure about the real reasons. Here are my suggestions:

  • Use the FSFS mode (I hope nobody is using Berkeley DB anymore?)
  • Use sharding. This affects how the repository is stored on the server. At most 1000 files are put in one directory; then another directory is created. This helps a lot if the underlying file system performs worse with an increasing number of files per directory.
  • Pack your repository regularly. Normally, each revision is stored as a single file. If you have a sharded repository, svnadmin pack will convert each full shard into one big file. This saves space and helps the OS to reduce I/O and improve caching.
  • Use the svn:// protocol. The http and https protocols are just a tunnel for WebDAV/DeltaV, which is quite a chatty protocol: each file needs a full round trip from the client. On high-latency networks this can be a real bottleneck. The svn protocol is much faster and consumes fewer CPU cycles. On my test machine a complete checkout of a 120MB working copy took on average 5min 20s over https:// and only 4min over svn://.
  • Check your commit hooks. Perhaps you have installed some expensive commit hooks. On Windows, try using RunDetached to prevent Subversion from waiting for the hook to finish.
  • Beware of polling build agents. If you are using Continuous Integration there will be some kind of mechanism in place to detect changes in the repository. The most efficient one is a post-commit hook, but CruiseControl.NET and TeamCity, for example, use an inefficient polling approach that basically runs “svn log” and parses the output. Doing this over https every second from forty build agents can easily drive the load on the repository server up to 100%. A more efficient polling mechanism would be to store the latest known revision and query only for the newest revision of the repository. This is only implemented by some custom plugins.
  • Monitor I/O load. When people think of performance they still think of CPU performance. But for a Subversion repository, I/O is the limiting resource. An update or checkout operation does many small reads on different areas of the repository, so the average access time is the most important factor. If your repository is running on a virtual machine, make sure that the repository is located on a physical drive that is exclusively reserved for this purpose. Use an SSD or at least a 10,000 RPM drive.
  • Store your repository on an Intel Solid State Disk. The disk should be exclusively reserved for use by the repository. No other application should touch it. This is the simplest and most effective way to improve performance.
  • Optimize your working copy. Change the layout so you can do partial updates. Try to use svn update --set-depth exclude to exclude parts you don’t need in your day-to-day work. Remove files you don’t need.
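
The more efficient polling idea from the build agent tip can be sketched in a few lines of Python. This is only an illustration, not how CruiseControl.NET or TeamCity work: RevisionPoller and get_head_revision are hypothetical names, and the callable is assumed to return the youngest revision number of the repository (on the server it could wrap something like svnlook youngest).

```python
from typing import Callable, List


class RevisionPoller:
    """Remembers the last revision seen and reports only newer ones."""

    def __init__(self, get_head_revision: Callable[[], int], last_seen: int = 0):
        # get_head_revision is a hypothetical callable supplied by the caller;
        # it is assumed to return the youngest revision number of the repository.
        self._get_head_revision = get_head_revision
        self._last_seen = last_seen

    def poll(self) -> List[int]:
        """Return the revision numbers committed since the previous poll."""
        head = self._get_head_revision()
        if head <= self._last_seen:
            return []  # nothing new: one cheap query, no log output to parse
        new_revisions = list(range(self._last_seen + 1, head + 1))
        self._last_seen = head
        return new_revisions


if __name__ == "__main__":
    # Simulated server whose head revision advances between polls.
    heads = iter([18000, 18000, 18003])
    poller = RevisionPoller(lambda: next(heads), last_seen=18000)
    for _ in range(3):
        print(poller.poll())
```

The build agent then only asks for details of the revisions returned by poll(), instead of fetching and parsing the log on every cycle.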

Note: All tips were written at the time of Subversion 1.6 and aim at server performance. Subversion 1.7 will improve the local working copy, which should in theory also improve client performance.

08 August 2010

Requirements for a Dependency Injection Container

Recently I was asked by a coworker about my requirements for a DI Container as part of a poll among all developers. My first reaction was to answer with the famous Ford quote “If I’d asked people what they wanted, they would have said faster horses.” This was because I personally realized the benefits of using a DI container only after working with one in a real project. Before this experience I wasn’t really able to give reasons why I should use one at all. Sure, I wanted to try one out, because I had the feeling it could be useful, but stating requirements was beyond me.

Today I have worked with Spring.NET and much more with Unity. I know StructureMap and Autofac (but Castle Windsor is still on my list :-). I believe that DI containers should be provided by the .NET framework (and sooner or later will be) just like the collection classes. No big up-front requirements analysis is needed because a DI container is no longer rocket science. Just start using one that is accepted by the community. If you haven’t used one, you won’t know what a DI container can do for you. If you have used only one, you will repeat its features as requirements. If you know more than one, you will list the features you love most.

This is my list of important and useful features:

  • Container setup should be possible in code with a readable and fluent API. Use explicit XML configuration only as a last resort (too many bad experiences with Spring.NET XML configuration). Setup in code allows IntelliSense and checking at compile time. Most setups will be done in test code, not in production code!
  • Wiring dependencies should be possible by convention or by attributes. Use explicit wiring only as a last resort (bad maintainability).
  • Understandable error messages and diagnostic help when something goes wrong while constructing/resolving a type.
  • Constructor and property injection must be possible; event and method injection would be nice to have.
  • Nested Container. That means you can create a container that inherits from an existing one and add or overwrite some mappings or strategies. Useful for test code.
  • Extensibility: it should be possible to implement autofaking or automocking strategies (described here and here) which are extremely useful for unit testing.
  • Lifetime of objects should be configurable in different ways (free, singleton, container bound, thread bound, …).
  • If object lifetime can be bound to the container lifetime the disposal of the container should also dispose all contained objects.
  • Automatic factories. The possibility to inject not a single object but a generic factory, say Func<T>, without explicit configuration.
  • The container should have at least two distinct interfaces, one for configuring the container and one for resolving/constructing types.
  • Static Service Locator Facade (with override possibility) for working with legacy code.
  • Partial construction if you have no control over object creation (for frameworks like WPF or ASP.NET) but still want to use your container to inject some dependencies.
  • Should have no interface, or only a very tedious one, for specifying constructor parameters at resolve time. Reason: if you need that, you are not using your DI container as intended.
  • … (to be continued) …
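
To make a few of these points concrete (fluent setup in code, nested containers, container-bound lifetime with disposal), here is a minimal sketch of a toy container. It is written in Python rather than .NET for brevity and does not mirror the API of Unity, Autofac or any other real container; Container, register, resolve, create_child, dispose and the example classes are all made-up names for illustration.

```python
from typing import Any, Callable, Dict, Optional, Set


class Container:
    """Toy DI container: code-based setup, child containers, bound lifetimes."""

    def __init__(self, parent: Optional["Container"] = None):
        self._parent = parent
        self._factories: Dict[Any, Callable[["Container"], Any]] = {}
        self._singletons: Set[Any] = set()
        self._instances: Dict[Any, Any] = {}

    def register(self, key, factory, singleton: bool = False) -> "Container":
        """Map a key (usually a type) to a factory; returns self for chaining."""
        self._factories[key] = factory
        if singleton:
            self._singletons.add(key)
        else:
            self._singletons.discard(key)
        return self

    def create_child(self) -> "Container":
        """Nested container: inherits all mappings and may override them."""
        return Container(parent=self)

    def resolve(self, key):
        owner: Optional["Container"] = self
        while owner is not None and key not in owner._factories:
            owner = owner._parent
        if owner is None:
            name = getattr(key, "__name__", repr(key))
            raise KeyError(f"Cannot resolve {name}: no registration found")
        factory = owner._factories[key]
        if key in owner._singletons:
            # Container-bound lifetime: built once, owned by the registering
            # container, and wired with that container's own mappings.
            if key not in owner._instances:
                owner._instances[key] = factory(owner)
            return owner._instances[key]
        # Transient objects are wired from the resolving container,
        # so overrides in a child container take effect.
        return factory(self)

    def dispose(self) -> None:
        """Dispose every container-bound instance that has a dispose() method."""
        for instance in reversed(list(self._instances.values())):
            dispose = getattr(instance, "dispose", None)
            if callable(dispose):
                dispose()
        self._instances.clear()


class Clock:
    def now(self) -> str:
        return "real time"


class FakeClock(Clock):
    def now(self) -> str:
        return "12:00"


class Greeter:
    def __init__(self, clock: Clock):
        self.clock = clock

    def greet(self) -> str:
        return f"Hello at {self.clock.now()}"


if __name__ == "__main__":
    root = Container()
    root.register(Clock, lambda c: Clock(), singleton=True)
    root.register(Greeter, lambda c: Greeter(c.resolve(Clock)))

    # Test code overrides a single mapping in a nested container.
    test_scope = root.create_child().register(Clock, lambda c: FakeClock())
    print(root.resolve(Greeter).greet())
    print(test_scope.resolve(Greeter).greet())
```

The nested container only replaces the Clock mapping; the Greeter registration is inherited from the parent, which is what makes this feature handy in test code.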