14 November 2012

How to find deadlocks in an ASP.NET application almost automatically


This is a quick how-to for finding deadlocks in an IIS/ASP.NET application running on a production server with .NET4 or .NET 4.5.

A deadlock bug inside your ASP.NET application is very ugly. And if it manifests only on some random production server of your web farm, maybe you feel like doom is immediately ahead. But with some simple tools you can catch and analyze those bugs.

These are the tools you need:
  • ProcDump from SysInternals   
  • WinDbg from Microsoft (Available as part of the Windows SDK (here is even more info)) 
  • sos.dll (part of the .NET framework)
  • sosext from STEVE'S TECHSPOT (Copy it into your WinDbg binaries folder)
ProcDump will be needed on the server where the deadlock occurs. All the other tools are only needed on your developer machine. Because WinDbg doesn't need any installation you can also prepare an emergency USB stick (or file share) with all the necessary tools.

If you think a deadlock occurred do the following:
  1. Connect to the Server
  2. Open IIS Manager 
  3. Open Worker Processes 
  4. Select the application pool that is suspected to be deadlocked
  5. Verify that you indeed have a deadlock, see the screenshot below
  6. Notice the <Process-ID> (see screenshot)
  7. Create a dump with procdump <Process-ID> -ma
    There are other tools, like Task Manager or Process Explorer, that could dump but only ProcDump is smart enough to create 32bit dumps for 32bit processes on a 64bit OS.
  8. Copy the dump and any available .pdb (symbol) files to your developer machine. 
  9. Depending on the bitness of your dump start either WinDbg (X86) or WinDbg (X64)
  10. Init the symbol path (File->Symbol File Path ...)
    SRV*c:\temp\symbols*http://msdl.microsoft.com/download/symbols
  11. File->Open Crash Dump
  12. Enter the following commands in the WinDbg Command Prompt and wait
  13. .loadby sos clr
  14. !load sosex
  15. !dlk
You should now see something like this:

0:000> .loadby sos clr
0:000> !load sosex
This dump has no SOSEX heap index.
The heap index makes searching for references and roots much faster.
To create a heap index, run !bhi
0:000> !dlk
Examining SyncBlocks...
Scanning for ReaderWriterLock instances...
Scanning for holders of ReaderWriterLock locks...
Scanning for ReaderWriterLockSlim instances...
Scanning for holders of ReaderWriterLockSlim locks...
Examining CriticalSections...
Scanning for threads waiting on SyncBlocks...
*** WARNING: Unable to verify checksum for mscorlib.ni.dll
Scanning for threads waiting on ReaderWriterLock locks...
Scanning for threads waiting on ReaderWriterLocksSlim locks...
*** WARNING: Unable to verify checksum for System.Web.Mvc.ni.dll
*** ERROR: Module load completed but symbols could not be loaded for System.Web.Mvc.ni.dll
Scanning for threads waiting on CriticalSections...
*** WARNING: Unable to verify checksum for System.Web.ni.dll
*DEADLOCK DETECTED*
CLR thread 0x5 holds the lock on SyncBlock 0126fa70 OBJ:103a5878[System.Object]
...and is waiting for the lock on SyncBlock 0126fb0c OBJ:103a58d0[System.Object]
CLR thread 0xa holds the lock on SyncBlock 0126fb0c OBJ:103a58d0[System.Object]
...and is waiting for the lock on SyncBlock 0126fa70 OBJ:103a5878[System.Object]
CLR Thread 0x5 is waiting at System.Threading.Monitor.Enter(System.Object, Boolean ByRef)(+0x17 Native)
CLR Thread 0xa is waiting at System.Threading.Monitor.Enter(System.Object, Boolean ByRef)(+0x17 Native)


1 deadlock detected.


Now you know that the managed threads 0x5 and 0xa are waiting on each other. With the !threads command you get a list of all threads. The Id Column (in decimal) is the managed thread id. To the left the WinDbg number is written. With  ~[5]e!clrstack command you can see the stacktrace of CLR thread 0x5. Or just use ~e*!clrstack to see all stacktraces. With this information you should immediately see the reason for the deadlock and start fixing the problem..


Deadlocked Requests visible in IIS Worker Process

Automate the Deadlock Detection

If you are smart, create a little script that automates step 2 to 7. We use this powershell script for checking every minute for a deadlock situation:


param($elapsedTimeThreshold, $requestCountThreshold)
Import-Module WebAd*
$i = 1
$appPools = Get-Item IIS:\AppPools\*
while ($i -le 5) {
ForEach ($appPool in $appPools){
 $count = ($appPool | Get-WebRequest | ? { $_.timeElapsed -gt $elapsedTimeThreshold }).count
 if($count -gt $requestCountThreshold){
$id = dir IIS:\AppPools\$($appPool.Name)\WorkerProcesses\ | Select-Object -expand processId
$filename = "id_" +$id +".dmp"
$options = "-ma"
$allArgs = @($options,$id, $filename)
procdump.exe $allArgs
 }
}
Start-Sleep -s 60
}


3 comments:

  1. Excellent!!! Thanks!
    Just a little detail - in the script provided you probably have deleted the line incrementing $i by mistake.

    ReplyDelete
  2. This comment has been removed by the author.

    ReplyDelete
  3. Thank you so much for this. Identified the cause of an issue that's plagued us for a good number of days through following this process. Thank you! Dankeschön Herr Christian!

    ReplyDelete