Exceptionally Mysterious

If you are adept in puzzling over the cryptic messages in your stack trace, when a .NET exception is thrown, you may be surprised when they start talking complete nonsense. Is it really possible that two threads could build the stack trace on the same exception object at the same time?

One of the lovely things about .NET is its exception handling – it allows a method being called to signal that something has gone wrong, and better yet, it provides a stack trace showing exactly where that error occurred. Great for debugging!

Since it’s neither practical nor sensible to conduct testing with a debugger attached, a lot of our applications log any exceptions to a file on disk, allowing the developers to look back after a crash has happened, and dis­sect what went wrong. Usually, this works well…

Nigel earnestly ponders the stack trace

The other day, a rather confused Nigel wandered over with an interesting problem: he had a log of an excep­tion where the stack trace showed a very different path through the code than the message that accompanied it. We checked for all the obvious stupidities (over-zealous use of copy and paste), and ruled out the possibility of it being a side-effect of our obfuscator by reproducing the bug in a debug build.

One thing that raised alarm bells was that two exceptions, both with the same message but different stack traces, were happening at almost exactly the same time: race condition alert! We tried to blame it on the logg­ing framework, but to no avail: everything was correctly synchronised as necessary. Back to the drawing board.

In an attempt to reproduce it more quickly, I wrote a little test application:

This worked exactly as you’d expect: “Exception in M1” was always followed by a stack trace showing Method1Callee() on top, followed by Method1(), and likewise for Method2.

With the strangeness continuing, the next step was to use the same cause of the exception as Nigel was seeing in his application: a failed SqlConnection.Open() call. So, the test application was modified:

Here’s the sort stack trace we expected to get:

But here’s what we actually got, in quite a few cases:

Surely not? Method2 never calls itself, and certainly not Method1. And SqlConnection.Open calling back into Method2? Madness!

And in any case, why was this example showing the behaviour, but my initial application that was throwing ArgumentExceptions not? Time to drag Andrew, master of all that is deep and dark in the .NET Framework (well, one of developers of ANTS Profiler 4, which kind of requires that kind of knowledge!)

… master of all that is deep and dark in the .NET Framework…

Much head scratching later, he suggested that the same stack trace was getting filled in on multiple threads simultaneously, resulting in the garbage we were seeing. But the exception’s only being used on one thread, we said!

.NET said otherwise. We added some code to the catch block that stored the exception object in a list of “previously seen exceptions”, and then did a reference comparison (Object.ReferenceEquals()) of the newly thrown exception with all the ones we’d seen before. And then we saw it…

The exceptions being thrown in SqlConnection.Open() were being re-used between multiple calls. At this point, everything fell into place. Of course the code that builds up the stack trace string isn’t thread safe – why would it need to be? What we were seeing was the result of two threads concurrently building the stack trace on the same exception object, leading to an interesting mix of the two real stack traces.

I’m no expert in how the SQL Native Client library works, or whether it’s some artefact of some unmanaged interop somewhere, but this just seems wrong to me. If it’s deliberate re-use of exception objects, it seems like plain bad programming practice. If it’s an artefact of some COM calls somewhere, well, that raises the question of where else this might happen. I go away from this somewhat less confident that my stack traces are telling the truth in the future!