.NET Oddities #2

I had a feeling I’d be writing quite a few entries like this, but I must confess that even I would have been surprised if you’d told me I’d be writing #2 a mere two days after #1. So what is it this time?

Well, it’s StackOverflowException. Jeff Richter has this to say about StackOverflowException in chapter 18 of his excellent book, Applied Microsoft .NET Framework Programming:

The CLR throws this exception when the thread has used all its stack space. Your application can catch the exception, but finally blocks won’t execute since they would require additional stack space and none is available. Also, whilst a catch block might catch this exception (to log some information to help debugging), it should never swallow this exception. The reason is that the application is in an undefined state now because its finally blocks didn’t execute. Any catch block that catches StackOverflowException should rethrow it — let the CLR terminate the process. If the stack overflow occurs within the CLR itself, your application code won’t be able to catch the StackOverflowException exception and none of your finally blocks will execute. In this case, the CLR will connect a debugger to the process or, if no debugger is installed, just kill the process.

Incidentally, if you’re a .NET developer, and particularly if you work with C#, and you don’t have a copy of this book, I’d strongly recommend you get one. It makes for an invaluable reference, and often does a superb job of explaining why things are the way they are, along with some of the problems with the .NET framework.

I think it’s basically fair to say that I agree with everything Jeff says above. It all makes perfect sense. However, there’s one thing about the .NET implementation of StackOverflowException that really gets my goat: it doesn’t provide any kind of stack trace, which makes it extremely difficult to debug. Now I know what you’re thinking: from what Jeff says above, it’s probably fair to assume that because the thread is out of stack space there’s no room left to build the stack trace, and you’re probably right. But like most of you, I’m working to a deadline, so I don’t care whether or not there’s stack space available. I want that information, I need that information, and I want it regardless of the hoops the CLR has to jump through to give it to me, especially when having it could save me hours of debugging time (see below).

You might be thinking I’m being slightly unreasonable at this point, but let’s take a look at a really simple example written in Java:

package stackoverflowtest;

public class StackOverflowTest
{
    private static void ThisWillBlowUp2()
    {
        ThisWillBlowUp();
    }
  
    private static void ThisWillBlowUp()
    {
        ThisWillBlowUp2();
    }
  
    public static void main(String[] args)
    {
        ThisWillBlowUp();
    }  
}

And here’s the output on standard error when I run it in an old beta of NetBeans 4.0:

Exception in thread "main" java.lang.StackOverflowError
    at stackoverflowtest.StackOverflowTest.ThisWillBlowUp2(StackOverflowTest.java:7)
    at stackoverflowtest.StackOverflowTest.ThisWillBlowUp(StackOverflowTest.java:12)
    at stackoverflowtest.StackOverflowTest.ThisWillBlowUp2(StackOverflowTest.java:7)
    at stackoverflowtest.StackOverflowTest.ThisWillBlowUp(StackOverflowTest.java:12)
    … (goes on for about another thousand lines)
    …
    at stackoverflowtest.StackOverflowTest.ThisWillBlowUp2(StackOverflowTest.java:7)
    at stackoverflowtest.StackOverflowTest.ThisWillBlowUp(StackOverflowTest.java:12)

Magic! That’s exactly what I want: I’ve no idea how they do it, but they do do it. And that’s going to make my life a whole lot easier as a developer if this exception ever occurs. So why can’t the CLR do it? Answer again: I don’t know. What I do know is that in the face of this, the whole “it doesn’t have any stack space left” argument is starting to sound like an excuse rather than a plausible reason, since this is presumably also true for the JVM.
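For what it’s worth, here is a minimal sketch of one way to poke at this from inside Java itself. It catches the StackOverflowError a few frames up, by which point the stack has unwound and there is room to work again, and then compares how deep the recursion actually went with how many frames the trace recorded. The class and method names are my own invention for illustration. I believe the number of recorded frames is capped on HotSpot (via the -XX:MaxJavaStackTraceDepth option, if memory serves), which would fit the "about another thousand lines" above, but treat that as an assumption rather than gospel.

```java
class StackTraceDepthDemo {
    // Counts how many times recurse() was entered before the JVM gave up.
    private static int depth = 0;

    private static void recurse() {
        depth++;
        recurse();
    }

    // Runs the runaway recursion and hands back the error the JVM raised.
    // By the time the catch block runs, the stack has unwound to this frame.
    static StackOverflowError overflow() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            return e;
        }
        throw new IllegalStateException("unreachable: recurse() never returns normally");
    }

    public static void main(String[] args) {
        StackOverflowError error = overflow();
        StackTraceElement[] frames = error.getStackTrace();
        System.err.println("recursion depth reached:  " + depth);
        System.err.println("frames recorded in trace: " + frames.length);
        if (frames.length > 0) {
            System.err.println("top frame: " + frames[0]);
        }
    }
}
```

If the recorded frame count comes out far smaller than the recursion depth, that suggests the JVM only keeps the most recent slice of the stack, which is still plenty to see which methods are calling each other in circles.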

Back in .NET land, with no stack trace to go on, the only way to figure out what was going on in my own case was to step through the code in the debugger, progressively setting breakpoints throughout my test case and the code it invoked (I was testing some code I’d just written with NUnit). This took a while: a good couple of hours, in fact. The StackOverflowException turned out to be masking another exception, thrown elsewhere in a finally block by an apparently unrelated chunk of code squirreled away in a Dispose() method buried deep within our application. Needless to say, it was a particularly bad piece of code, and I can’t imagine what my mind was on when I wrote it, but I was unbelievably frustrated by the time I managed to find it.
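I’m not going to reproduce the offending code here, but the general hazard of one exception hiding another during cleanup is easy to sketch. The following is a Java analogue rather than my actual .NET code, and every name in it is hypothetical: faultyCleanup() simply stands in for a misbehaving Dispose() method. It shows an exception thrown from a finally block replacing the one thrown from the try block, so the caller never sees the original problem.

```java
class MaskedExceptionDemo {
    // Hypothetical cleanup routine standing in for a faulty Dispose() method.
    private static void faultyCleanup() {
        throw new IllegalStateException("cleanup failed");
    }

    // Returns the message of whichever exception actually reaches the caller.
    static String whatTheCallerSees() {
        try {
            try {
                throw new IllegalArgumentException("original problem");
            } finally {
                // This exception replaces the one thrown in the try block.
                faultyCleanup();
            }
        } catch (RuntimeException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Prints "caller sees: cleanup failed"; the original problem is lost.
        System.err.println("caller sees: " + whatTheCallerSees());
    }
}
```

When the exception that surfaces also arrives without a stack trace, there is very little left to point you back at the real culprit.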

If I’d had a stack trace I could have figured all that out in about a minute rather than two hours. I guess my point here is that the more concrete information you have, the easier debugging becomes, and yet sometimes .NET hides the key information you need to solve a problem. The reasons for this are unclear, particularly when you consider that a big part of .NET’s raison d’être is to make our lives as developers easier and more productive. I imagine we can hope for some improvement in future versions. At any rate, let’s hope so.