Debugging the debugging experience
By Clive Tong, March 30th 2012
The CLR is a canny beast that takes liberties with optimising your code using several different mechanisms, which can make debugging retail builds of software an ungainly experience. I’ve been doing some investigation into how to make this process as similar as possible to “normal” debugging, which I’d like to share with you. But first, let’s look at those liberties the CLR is taking.
First, there’s the ahead-of-time ngen process. If we’re not jitting the IL, then there’s nothing waiting for a quick result, and so we can apply more of the traditional compiler optimisations to generate tighter code. The results are saved as dll files which can be loaded later into the process.
Second, there’s the JIT compiler itself, which can see if there’s a debugger attached to the process at the point that the code is jitted. If there is a debugger present, then the JIT compiler can decide to generate code which is more debuggable. In this case, it makes sense that the CLR tries to keep a more accurate mapping between the machine code and the initial IL, which is, in turn, related to the original source code using a pdb file.
Unfortunately, there is a tradeoff between optimisation and one’s ability to debug the running code. In the old days, the amount of effort a compiler spent improving the generated code was controlled using compiler flags, which in turn were set very differently between retail and debug builds.
Most of the information about the optimisations that the CLR applies is sparsely threaded through independent blog posts and MSDN documentation. For quite some time I’ve wanted to sit down and look at the effects of various techniques which people have described as giving the user the best debugging experience. With Reflector VSPro, we really need to know this kind of information – our users typically want to attach to a retail build of a product and debug it, and we’d like to keep the experience as close as possible to the initial debugging experience you get with a debug build.
With that in mind, this post describes just some of the experiments I did to get confidence with the steps that are needed to keep the debugging experience good. Without a definitive description of the workings of the CLR, experiments are all that we have and, as with all experiments, you can develop a model that doesn’t accurately reflect what’s happening. I welcome any feedback.
In case you just want the result and don’t want to read any further, the debugging experience seems to be best if you have the environment variable COMPLUS_ZAPDISABLE set from the time that anything gets jitted, and you also have a suitable ini file placed against one or more assemblies (by suitable I mean “contains suitable control flags to turn off the optimisations”). I’d naturally still recommend that you read on, as there are deeper details which are worth knowing!
I fully intend to get these settings into Reflector VSPro, so that it’s the tool rather than the user that sets up the necessary files and environment. At the very least, we intend to have the tool give the user enough information for them to do the work themselves (if Reflector can’t automate the process). Of course, user intervention is always going to be needed in the case of, for example, an IIS web application which is going to be started before a debugger is attached to it.
Initial Experiments with Optimization
For some initial investigation, we’ll start with the following simple C# console application:
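Something along these lines does the job (the names here are illustrative – any small helper method plus a never-ending loop will do):

```csharp
using System;

class Program
{
    // Small enough that the JIT will happily inline it when optimising.
    static void PrintMessage()
    {
        Console.WriteLine("Hello");
    }

    static void Main()
    {
        // Never-ending loop: gives us time to attach a debugger after the fact,
        // and burns enough CPU for the process to stand out in Process Explorer.
        while (true)
        {
            PrintMessage();
        }
    }
}
```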
The small method is designed to provoke inlining by the jit compiler, as a way to tell if such optimisations have been disabled. The never-ending loop gives us time to retrospectively attach, while at the same time taking enough processor cycles for the application to show up in Process Explorer.
We’ll compile this code into both debug and release builds. In the case of a debug build we get the following IL:
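For the small method described above, the IL is shaped like this (an ildasm-style reconstruction for a helper of that shape; exact offsets and names will vary):

```
.method private hidebysig static void PrintMessage() cil managed
{
  .maxstack 8
  IL_0000:  nop
  IL_0001:  ldstr      "Hello"
  IL_0006:  call       void [mscorlib]System.Console::WriteLine(string)
  IL_000b:  nop
  IL_000c:  ret
}
```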
And the following attributes on the Assembly:
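For a debug build, the C# compiler emits a DebuggableAttribute with optimisations disabled and edit-and-continue enabled:

```csharp
[assembly: Debuggable(DebuggableAttribute.DebuggingModes.Default |
                      DebuggableAttribute.DebuggingModes.DisableOptimizations |
                      DebuggableAttribute.DebuggingModes.IgnoreSymbolStoreSequencePoints |
                      DebuggableAttribute.DebuggingModes.EnableEditAndContinue)]
```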
The NOPs are present to allow a better mapping, via the pdb file, back to the original source code. I believe it is this long DebuggableAttribute that turns off some of the optimisations (as we’ll soon see).
In the release build, we get the following code:
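The same method in the release build loses the NOPs (again a reconstruction; offsets will vary):

```
.method private hidebysig static void PrintMessage() cil managed
{
  .maxstack 8
  IL_0000:  ldstr      "Hello"
  IL_0005:  call       void [mscorlib]System.Console::WriteLine(string)
  IL_000a:  ret
}
```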
And the following attributes:
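Here the DebuggableAttribute is much shorter – optimisations stay enabled, and only the sequence-point handling is specified:

```csharp
[assembly: Debuggable(DebuggableAttribute.DebuggingModes.IgnoreSymbolStoreSequencePoints)]
```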
We first run the release version, letting it get into the code which does the printing so that we can be sure the methods are jitted. We can then attach using windbg and look at the loaded modules.
I believe the CLR currently doesn’t re-jit methods – this has been added as a feature in .NET 4.5 so that an attached profiler can demand the re-jit of methods. However, the CLR doesn’t yet do the kinds of work that a Java HotSpot compiler does using on-stack replacement and adaptive compilation to specialise the code, depending on dynamic call patterns. What it does do is optimise interface dispatch, using dynamically generated stubs after a slow path has been executed a number of times, but that doesn’t affect our example here.
In Windbg, we see the following modules have been loaded:
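The full module list is long, but the interesting line is the native image of mscorlib. Filtered down with the lm command, it looks something like this (I’ve elided the load addresses and the hash directory, which vary between machines):

```
0:003> lm m mscorlib*
start    end        module name
...      ...        mscorlib_ni   C:\Windows\assembly\NativeImages_v4.0.30319_32\mscorlib\...\mscorlib.ni.dll
```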
The mscorlib.ni.dll is the result of the ngen processing of the assembly.
If we do the same thing with the debug version, letting it run and then attaching, we see that the following modules were loaded:
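This time the filtered listing shows the IL image loaded from the GAC instead of the native image (addresses elided as before; the version directory is the one for .NET 4 on my machine):

```
0:003> lm m mscorlib*
start    end        module name
...      ...        mscorlib   C:\Windows\Microsoft.NET\assembly\GAC_32\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
```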
We see that the debug version of our compiled method doesn’t load the ngen version of mscorlib. I’ll attempt to show below that this has led to a better debugging experience.
In the release case, we can attach with Visual Studio after the warm-up period described above. Using the disassembler pane, we see that the code for our method looks like this:
We can see that the small method which calls WriteLine has been inlined, and that the stack trace doesn’t show the call into Console.WriteLine.
If we now run the Debug version, we see that there is no inlining happening, and mscorlib is less heavily optimised, so the stack trace is more accurate. This lack of optimisation is essential for being able to accurately debug with a minimum of hassle.
The big question is this: can we take the Release build and get the same debugging experience?
First, we can try setting COMPLUS_ZAPDISABLE, which prevents the system loading ngen’ed assemblies. This needs to be set before the point that the loading is triggered by the jit.
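For a process launched by hand, a console session is the easiest way to arrange this (ConsoleApplication1.exe stands in for whatever your release build is called):

```
set COMPLUS_ZAPDISABLE=1
ConsoleApplication1.exe
```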
This leads to the following set of loaded modules, which I got by letting the release build run and then attaching windbg:
And if we attach from Visual Studio, we see the following:
We can see that the ngen’ed image hasn’t been loaded, but inlining has still taken place.
The second trick is to set up an ini file to control the optimization for jitted code. In this case, we’ll be setting it against the top level application – it has the same name as the file but with the exe or dll extension replaced with ini. Into it, we can put some configuration options; one saying that we’d like to track the machine-code-to-IL relationship, and another to turn off optimisations.
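The two options live in a [.NET Framework Debugging Control] section, so for an application called ConsoleApplication1.exe, the file ConsoleApplication1.ini contains:

```ini
[.NET Framework Debugging Control]
GenerateTrackingInfo=1
AllowOptimize=0
```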
Adding this .ini file has a dramatic effect on the debugging experience.
Notice that no inlining has happened inside the user code, and we now get a faithful stacktrace for mscorlib.
Back to the lab!
As a further experiment, we can unset the COMPLUS_ZAPDISABLE, which leads to the ngen’ed mscorlib being reloaded, though the inlining has still been turned off:
If we take a small example project, the COMPLUS_ZAPDISABLE noticeably changes the behaviour so that there are many more places where we can effectively set the next statement in the code. For a simple project using the in-built WebDev webserver, if the devenv.exe that launches it was launched in a COMPLUS_ZAPDISABLE environment, then stepping into something like mscorlib is a lot better. Unfortunately, the experience doesn’t become “full” until you put an .ini file next to the mscorlib to ensure proper stepping and ‘set next statement’ behaviour. Hence we seem to need both the environment variable and the ini file to get the best results.
We should also point out that, in the above experiments, we have been improving the behaviour when we attach the debugger after the application has been running for a while, and so jitting has already taken place. If you are debugging into a project by launching it from Visual Studio, then the fact that the debugger is attached when the jitting happens also improves the debugging experience. The extra complication in that scenario is that there is a hosting process which is used by Visual Studio to provide debugging – turning this off can also be useful.
I was also interested to see how to improve the experience for debugging a web application running under IIS, so I wrote a simple web application with a simple button on the main page, and the following code behind. Console.WriteLine is a common theme in these examples because it gives us an easy way to get into mscorlib.dll, and the WriteLine method makes a call to another method, so we can see if this call gets inlined (allowing us to see if optimisation is happening).
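The handler was something along these lines (Button1_Click is just the name the designer generates; the message text is illustrative):

```csharp
protected void Button1_Click(object sender, EventArgs e)
{
    // Loop forever so that the hosting w3wp.exe starts burning CPU,
    // and call into mscorlib so that there is framework code to step into.
    while (true)
    {
        Console.WriteLine("Hello from the code behind");
    }
}
```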
The loop is handy, because we can easily identify the w3wp.exe process that is hosting the application by seeing which one starts consuming CPU after the button is hit in the ASP.NET application.
A little help from Microsoft
I was a little worried about tweaking the setting for the w3wp.exe process, which is used by IIS to host the CLR running the web application. However, while looking around, I came across a script that Microsoft provides to aid with the profiling of .NET Web Applications:
c:\Program Files (x86)\Microsoft Visual Studio 8\Team Tools\Performance Tools\VSPerfCLREnv.cmd
This also sets various environment variables to control loading of the CLR Profiler around a re-launch of the application. The CLR loads a profiler dll as a COM component, if directed to do so at startup time by environment variable settings.
The script also controls the environment variables for services using the following part of the registry.
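The key in question is the machine-wide environment block that services inherit, so the variable can be added from an elevated prompt with reg.exe (the service then needs restarting to pick it up):

```
reg add "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Environment" /v COMPLUS_ZAPDISABLE /t REG_SZ /d 1
```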
With this in mind, I went into the registry, added a setting of COMPLUS_ZAPDISABLE to 1, and then killed the w3wp.exe process, and the svchost process that was its parent. I did this using the process explorer tool from SysInternals, which allows you to easily see the relationship between running processes, and which also allows you to see the environment variable settings for a given process. This tool makes it easy to check that the changes have taken effect.
When I hit the web page again, w3wp.exe is re-launched and process explorer shows that the setting has taken effect:
Attaching windbg to the w3wp shows that there is no ngen loading for the IIS process:
Using Process Monitor, you can also see the process probing for .ini files next to all of the assemblies that it jits. Going into the GAC and placing a suitable .ini file against the relevant assembly, in this case mscorlib.dll, allowed me to later attach with Visual Studio and get a good debugging experience – stepping was accurate, variable values were always available, and ‘set next statement’ worked effectively.
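For the .NET 4 mscorlib, that means dropping a file like this next to the IL image in the GAC (the version directory name may differ on your machine):

```ini
; C:\Windows\Microsoft.NET\assembly\GAC_32\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.ini
[.NET Framework Debugging Control]
GenerateTrackingInfo=1
AllowOptimize=0
```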
Whereas in the case of the simple windows forms application, placing an .ini file against the top level assembly seemed to affect the behaviour of the jit for the loaded modules, in the case of the w3wp.exe, the only .ini files that are probed are the ones that are next to the loaded assemblies (including those that are dynamically generated by ASP.NET).
The final question is whether we need to set the debug property in the Web.Config:
<compilation debug="true" targetFramework="4.0" />
And I don’t yet have a particularly good answer for that, though I imagine it affects at least the setting of the compiler when ASP.NET generates the application assemblies. Setting it and then attaching windbg to the resulting w3wp.exe shows that the ngen images are being loaded:
… So it certainly doesn’t turn off all optimization. I’ll need to do more work to find out exactly what it does do.
In summary, if you want to get a better debug experience for debugging retail builds, you need to make sure that COMPLUS_ZAPDISABLE is set, and probably place .ini files in the right place (which is most likely next to all of the assemblies your process is going to load). This needs to be done in advance, or at least before the assembly is loaded in order to get the best experience.
Hopefully you’ll be able to use this method right away, and we’re working on getting as much of this automated in .NET Reflector as possible. And as I said at the start, this is all rather experimental, so all feedback is welcome.