Exception Hunter 1.0 – Find crashes before they happen

In many ways this has been a very long time coming… think Daikatana, but without the offensive advertising campaign, the disgruntled employees, or the wild expenditure on, uh, what I’ll loosely term software development related expenses. And nobody involved, at any point, has owned (or even driven) a Ferrari. As a matter of fact, whilst the protracted timescales are almost identical, thankfully that’s pretty much where the similarities end.

I first remember talking about this with Lionel at the office Christmas party in 2004, around six months after I’d joined Red Gate, one month after Lionel joined, and when we still had only half a dozen developers in total. Lionel and James discussed it again maybe a year later, this time going some way towards working out how you might go about doing it, and in between times there were more casual discussions involving various people.

But at the time Red Gate’s focus was largely SQL, so it never got off the ground.

Then in mid-2007 Red Gate formed the .NET Developer Tools division. We still had only one developer tool, ANTS Profiler, but Exception Hunter’s time had come.

So what is it?

Put simply, it’s a tool that allows you to analyse your code and find the unhandled exceptions before they ever happen. But so what?

So this:

Before .NET 2.0 came along, if your application threw an unhandled exception somewhere other than the main thread, you might get away with it. You might never even know it had happened. Maybe your user would quit the application before anything visibly went wrong, or maybe your application would slowly edge its way into a world of utter agony in which it behaves more and more strangely until it eventually conks out entirely, by which time you’ve really no way of finding out why. It’s a debugging nightmare. Back in the mists of time I was asked to fix a bug in SQL Packager that turned out to be caused by exactly this: deep in the bowels of the application a background thread was being terminated by an unhandled exception, leaving the application in an inconsistent state.

With .NET 2.0 the situation changed radically. Bam! Suddenly your application that worked most of the time under .NET 1.1 doesn’t work for any length of time at all, because there’s a background thread that regularly throws unhandled exceptions, and in .NET 2.0 that means your application is terminated, or in ASP.NET your worker process is killed and a new one spawned. And unless you’ve set up a delegate to listen for one of the unhandled exception events, such as AppDomain.UnhandledException or Application.ThreadException, you’re probably not going to get any useful information either.

By now we’re all pretty used to this behaviour, and as a number of other authors and speakers have pointed out, it’s actually a good thing. And it’s certainly a lot easier to debug, assuming you do yourself the favour of setting up these event listeners and logging the exception information (including inner exceptions).
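As a rough illustration of the kind of listener described above, here is a minimal C# sketch. The class and method names (CrashLogging, Describe, Install) are made up for this example, and it only covers AppDomain.UnhandledException; a Windows Forms application would typically hook Application.ThreadException as well. It writes to the console purely for brevity, so treat the logging destination as an assumption.

```csharp
using System;
using System.Text;

static class CrashLogging
{
    // Flattens an exception and its whole chain of inner exceptions into one string.
    public static string Describe(Exception ex)
    {
        var sb = new StringBuilder();
        for (Exception e = ex; e != null; e = e.InnerException)
        {
            sb.AppendLine(e.GetType().FullName + ": " + e.Message);
            sb.AppendLine(e.StackTrace); // null for an exception that was never thrown
        }
        return sb.ToString();
    }

    // Registers a last-chance handler. From .NET 2.0 onwards the process is
    // still terminated afterwards; this only gives you the chance to log first.
    public static void Install()
    {
        AppDomain.CurrentDomain.UnhandledException += (sender, args) =>
        {
            // ExceptionObject is typed as object, so check before casting.
            var ex = args.ExceptionObject as Exception;
            Console.Error.WriteLine(ex != null
                ? Describe(ex)
                : Convert.ToString(args.ExceptionObject));
        };
    }

    static void Main()
    {
        Install();

        // Demonstrate the formatting on a caught exception rather than crashing for real.
        try
        {
            throw new InvalidOperationException("background work failed",
                new ArgumentException("bad input"));
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine(Describe(ex));
        }
    }
}
```

Walking the InnerException chain matters because the outermost exception is often just a wrapper, and the inner one is what tells you what actually went wrong.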

However, what this completely fails to address is that when your application suddenly goes unexpectedly KERBLAM!!! out there in the field when it’s being used by a real customer, it’s also extremely embarrassing. You really don’t want that to happen. Many of us will have sat on both sides of that fence. It annoys your customers. It causes stress and hassle for your support team, who are probably going to have to deal with somebody foaming at the mouth, if not actually spitting burning jet fuel. And it soaks up time from your developers and testers, who are quite likely working on different projects by now, so the knock-on is that it hurts those new projects as well.

And it takes time, time during which your customer isn’t likely to get any happier. We tried an experiment recently, which was to see if we could get a patch out the door to a customer within a few days, even as a private release with extremely limited QA time assigned to it. We failed miserably. We failed because it’s not realistic to get the three or four people we needed to do everything required in a very short period of time when they’re all working on different projects, all of which have problems of their own. In reality what we ended up doing was rolling up perhaps a dozen fixes of varying levels of severity into a private release. Each of us involved set aside an entire day to do the work we needed to do, and in total the whole process took around four weeks. And we’re a pretty small company in the grand scheme of things. For larger companies the situation is clearly going to be worse.

And whilst the “time is money” analogy can be taken too far, in this case time really is money. Lost time equates to lost, or delayed, sales. Too much lost time can equate to a failure to consolidate a product’s position in the market before other competing products are released. That sounds pretty extreme, but it can happen, and it does happen all the time. How many software products are delayed because they’re plagued by stability problems? Wouldn’t it be great if you could find and fix these issues before a tester even touches a piece of functionality, never mind a user?

The point is, you definitely want this stuff fixed before it goes out the door. It’s also entirely unrealistic to expect that your testers are going to find absolutely every scenario in which an unhandled exception can bring your application down before you release, even if they’re really good. Trust me, I know. At Red Gate the developer to tester ratio on every project we’ve run since I’ve been here has been somewhere between 1:1 and 2:1, and 1:1 is pretty common, even if for only part of the project. Some tools do even better: 3:4 in one case.

But still, you can’t find everything, and still your application can crash embarrassingly once it’s been released to the world at large, which is where Exception Hunter comes in.

Exception Hunter is a compile time analysis tool that analyses your assemblies and dives down into their dependencies to find out which exceptions your code can throw. It works by allowing you to pick the methods you want to analyse, shows you which exceptions they throw, and allows you to drill down through the stack traces to find their exact origin. Once you’ve found the line of code you’re interested in you can open up your source at that exact line in Visual Studio for editing. At this point you might choose one of the following courses of action:

  • Catch and handle the exception by performing some alternative logic to do what you need,
  • Wrap the exception in a domain specific exception type that you’ve created, or in some other different but more appropriate exception type, populate it with an informative message if you think you might report it to a user later, and then rethrow it,
  • Wrap your code in a try block, then add a finally block to perform resource clean-up, then add XML documentation comments to your code indicating that the exception is thrown,
  • Choose to do nothing in that method, but instead deal with the exception or perform clean-up at a higher level.
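To make the first three options concrete, here is a small C# sketch built around reading a file. Everything in it (SettingsException, SettingsReader, and the method names) is hypothetical and invented for illustration; it is not taken from Exception Hunter or from any Red Gate code.

```csharp
using System;
using System.IO;

// Hypothetical domain-specific exception type, for illustration only.
public class SettingsException : Exception
{
    public SettingsException(string message, Exception inner) : base(message, inner) { }
}

public static class SettingsReader
{
    // Option 1: catch and handle by doing something else (here, fall back to a default).
    public static string ReadOrDefault(string path, string fallback)
    {
        try
        {
            return File.ReadAllText(path);
        }
        catch (FileNotFoundException)
        {
            return fallback;
        }
    }

    // Option 2: wrap in a more appropriate type with an informative message,
    // keeping the original as the inner exception, and throw that instead.
    public static string ReadRequired(string path)
    {
        try
        {
            return File.ReadAllText(path);
        }
        catch (IOException ex)
        {
            throw new SettingsException("Could not read settings file '" + path + "'.", ex);
        }
    }

    // Option 3: let the exception propagate, but clean up in finally and document it.
    /// <summary>Reads the first line of the given file.</summary>
    /// <exception cref="IOException">The file could not be opened or read.</exception>
    public static string ReadFirstLine(string path)
    {
        StreamReader reader = null;
        try
        {
            reader = new StreamReader(path);
            return reader.ReadLine();
        }
        finally
        {
            if (reader != null) reader.Dispose();
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // A file name that should not exist inside the (existing) temp directory.
        string missing = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".cfg");

        Console.WriteLine(SettingsReader.ReadOrDefault(missing, "defaults"));

        try
        {
            SettingsReader.ReadRequired(missing);
        }
        catch (SettingsException ex)
        {
            Console.WriteLine(ex.Message + " Caused by: " + ex.InnerException.GetType().Name);
        }
    }
}
```

The fourth option is simply the absence of any of this in the method itself: the caller further up the stack takes on the catch or the clean-up instead.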

You might also find that the exception isn’t particularly interesting. For example, you’re 100% confident that the values you provide to a method will always be valid, and that therefore the ArgumentOutOfRangeException that is nominally thrown will never occur, so you never need to worry about it. That’s fine: you can and should make value judgements like that, but at least you’re now in a position to do so in an informed manner, rather than guessing and hoping.

You can also use Exception Hunter when you do get a crash report. So, you get the (probably obfuscated) stack trace back from the customer, or from your website logs, and figure out what’s going on, but you might legitimately wonder if anything else can go wrong. It’s a great time to run Exception Hunter over the offending code to see if there’s anything else that might catch you out. I did exactly this when fixing a problem with the meta-data cache in SQL Prompt, and managed to find three more potential problems in less than five minutes. I was then able to fix these before they caused any trouble.

Exception Hunter works with code written with every version of .NET from 1.1 right through to 3.5. It can analyse code originally written in C#, VB, and Managed C++. It also provides more limited support for J#.

You can find more information about Exception Hunter, including a demonstration video, at:

http://www.red-gate.com/products/exception_hunter/index.htm

You’ll find the support forum release announcement at:

http://www.red-gate.com/messageboard/viewtopic.php?t=6108