The 5 stages of solving real-life .NET memory problems
Stage 1: Getting started – the 4 main types of .NET memory problem
by Beth Aitman
15 October, 2013
This series introduces five stages for solving memory problems in .NET. We start with understanding symptoms and getting set up, then walk through the four main types of memory issue we hear about from the community and our customers. We'll present some of the theory behind fixing the issues, as well as details of how people have solved them in practice.
In this article, we're giving an introduction to the types of problem you might encounter, and the general troubleshooting workflow that will help.
Symptoms and setup
Before you start investigating, it helps to have a clear idea of how you know you've got a memory problem. What are the symptoms? Are you getting OutOfMemoryExceptions? Or do you just have a feeling your application is using more memory than you'd like? The more specific the symptoms, the easier it'll be to track down the problem in your code.
Reproduction steps are also really useful. It's possible to use ANTS Memory Profiler for an open-ended investigation of what's using memory, but it's much easier to identify a real problem if you know roughly when or under what conditions it occurs. It's even better if you can identify a particular part of your application that you think is associated with the problem.
An ideal example might be: when looking at how much memory your application is using in Task Manager, the memory used goes up when you do a particular task repeatedly, and doesn't go down after you've finished the task. That would be a great place to start investigating, but the steps don't need to be as specific as that example. You can use any set of steps that you can follow to reliably reproduce the problem.
Good profiling – why comparison?
Instead of looking at what's using the most memory, you can look at what's growing the most
The strategy we recommend is comparing the memory your application uses at different times. You do this by taking snapshots, which gather information about everything that's using memory in your application at a point in time.
The idea is to start at a point that can act as a baseline for memory usage – a point you hope the usage will stay at or return to after it completes a specific task – and compare that baseline to a point when the problem has definitely manifested itself. The advantage here is that it cuts out a lot of the background noise.
Looking at the memory your application is using at a single point in time doesn't give you much information. You can look at the classes that are using the most memory, but normally that just reflects how your application works. For example, if your application is a word processor, it's not surprising that it holds on to a lot of strings. So unless you have a very severe memory problem, the classes that are using the most memory are likely to be related to the central functions of your application.
What's the alternative? Comparing the memory usage at two different points in time. This gives you a very different set of information, because instead of looking at what's using the most memory, you can look at what's growing the most.
If you've got steps to reproduce the problem, and you take a snapshot before you start reproducing (not when the application is loading, but when it's fully started up) and after you've reproduced it, you can see what's grown the most during that time. If you expect your application to go back to baseline usage after it finishes the task, and it doesn't, this gives you a lot of information.
You expect the memory your application uses to be roughly the same in both snapshots, which means that anything that's grown larger is worth investigating as the source of your problem.
You don't just have to compare to a baseline of the app in a stable state. Taking snapshots during your reproduction steps can also give you information, and might help pin down exactly when in those steps the problem occurs.
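You can sketch the same before-and-after idea in plain code, outside the profiler. In this minimal sketch, GC.GetTotalMemory stands in for taking a snapshot, and DoSuspectTask is a hypothetical placeholder for whatever your reproduction steps do:

```csharp
using System;

class BaselineCheck
{
    // Hypothetical stand-in for the task in your reproduction steps.
    public static void DoSuspectTask()
    {
        var buffer = new byte[1024 * 1024];
        buffer[0] = 1;   // used only within the task, so collectible afterwards
    }

    static void Main()
    {
        // Baseline "snapshot": force a full collection first so the
        // figure is as stable as possible.
        long baseline = GC.GetTotalMemory(forceFullCollection: true);

        for (int i = 0; i < 100; i++)
            DoSuspectTask();

        // Second "snapshot": if the task holds on to nothing, this should
        // be close to the baseline. A figure that keeps climbing across
        // repeated runs is exactly what the snapshot comparison would flag.
        long after = GC.GetTotalMemory(forceFullCollection: true);
        Console.WriteLine($"Growth: {after - baseline} bytes");
    }
}
```

This only sees the managed heaps, of course; the profiler's snapshots capture far more detail, but the comparison principle is the same.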
Starting a profiling session
Start ANTS Memory Profiler and set up the session. Start profiling, and let your application load into a state where it's functional.
Tips to get the best data
This is where repro steps are useful, but it's important to wait until your application is ready to go. Taking a snapshot isn't very helpful until your application is in a stable state, because lots of things are still changing. Wait for your application to be in a useful baseline state. If there's a state you expect your application to return to in between tasks, that should be your baseline.
Once your application has fully loaded, click Take Memory Snapshot. This'll be your baseline snapshot, which all your later snapshots will be compared against. Then, go through your reproduction steps. You might want to take a few snapshots during the steps, to try to pinpoint exactly what's triggering the problem; if you do, keep a note of the step you were at when you took each snapshot, so you know what you're comparing.
Types of memory problem
Once you've got your data, this is what to look for. There are four broad types of memory problem, each with its own symptoms.
Large Object Heap (LOH) fragmentation
This occurs when the free gaps between objects on the Large Object Heap are too small to fit new objects into, even though the total free space may be large. It can cause performance degradation and outright crashes. Your application may be throwing OutOfMemoryExceptions at unpredictable times, despite there apparently being plenty of free memory. This is one of the more frustrating issues to encounter.
ANTS Memory Profiler tells you if it looks like you've got a problem here.
You may well have a problem if most of the LOH is being taken up by free space, and if that free space is in small chunks. It's worth comparing the size of the largest fragment with the total free space: the smaller that biggest fragment is, the more likely you are to have a problem.
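For background, objects of 85,000 bytes or more are allocated on the LOH, which by default is never compacted – that's why the gaps appear in the first place. You can see the threshold in action, because LOH objects report the highest GC generation from the moment they're allocated. A quick sketch:

```csharp
using System;

class LohThreshold
{
    static void Main()
    {
        // Arrays of 85,000 bytes or more go to the Large Object Heap;
        // smaller ones go to the ordinary generational heap.
        var small = new byte[80_000];
        var large = new byte[90_000];

        // Ordinary objects start in generation 0; LOH objects report
        // the top generation (2) from birth.
        Console.WriteLine(GC.GetGeneration(small));  // 0
        Console.WriteLine(GC.GetGeneration(large));  // 2
    }
}
```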
How to investigate
Once you have profiling results, filter them by Objects on Large Object Heap, to see what the objects causing the problem are.
Often the problem is churn: repeatedly allocating and discarding short-lived large objects. Here's an article that gives an idea of what can cause problems: The Dangers of the Large Object Heap.
One possible solution is to avoid creating large objects by breaking down big classes into smaller ones. Also consider using LOH compaction in .NET 4.5.1, although that can have a serious performance overhead in some situations.
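If you're on .NET 4.5.1 or later, the compaction option mentioned above is a one-line setting on GCSettings. A minimal sketch – note that it's a one-shot request, not a permanent mode:

```csharp
using System;
using System.Runtime;

class LohCompaction
{
    static void Main()
    {
        // Simulate some LOH churn: a large, now-dead object.
        var large = new byte[200_000];
        large = null;

        // Available from .NET 4.5.1: ask the GC to compact the LOH during
        // the next blocking full collection. This is the operation with the
        // potential performance overhead mentioned above, so use it sparingly.
        GCSettings.LargeObjectHeapCompactionMode =
            GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect();

        // The setting resets itself once the compacting collection has run.
        Console.WriteLine(GCSettings.LargeObjectHeapCompactionMode);  // Default
    }
}
```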
Managed memory leak
To find a managed memory leak, look at those classes that have grown the most when the problem has manifested itself. There's no quick trick to spotting which class is the problem; you need to systematically go through those classes until you find one being referenced that's not being used any more.
Make sure you have a baseline snapshot, and then take another when the Bytes in all Heaps counter rises substantially on the timeline. At this point, you're looking for any suspicious classes.
How to investigate
Basically, work through the suspicious classes, examining what's holding on to them using the instance retention graph, until you find a reference that should be broken.
You can use filters to build a shortlist of classes that look suspicious. For example, ANTS Memory Profiler lets you filter the class list and instance categorizer by reference, narrowing it down to only objects kept in memory by disposed objects or event handlers. Undisposed event handlers are a common cause of managed memory leaks.
Or you can filter by New objects or Survivors in growing classes. Are there any unexpected classes that shouldn't be creating new objects or hanging around? Don't forget to look at the classes at the top of the Class list when you sort by Size diff or Instance diff.
Then, for each suspicious class, in turn:
- Show the Instance Categorizer graph.
- If the class is one of yours, switch to the All references mode, so you can see the objects this class references.
- Check how the class is being kept in memory by looking at classes displayed to the left of your selected class. Check especially for any event handlers referencing your selected class.
- If a path looks incorrect, switch to the Instance Retention Graph to show how an instance is referenced along that path.
Once you find a chain of references that looks like it shouldn't be there, it's time to dig into your code and see how it can be broken. Breaking the chains that keep unwanted objects in memory should clear up a managed memory leak.
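The event-handler case is easy to reproduce in miniature. In this sketch (Publisher, Subscriber, and ClearSubscribers are illustrative names, not anything from the profiler), subscribing keeps the subscriber reachable through the publisher's delegate, and removing the handler is exactly the kind of reference-breaking described above:

```csharp
using System;

class Publisher
{
    public event EventHandler SomethingHappened;

    // Breaking the reference chain: dropping the delegate frees subscribers.
    public void ClearSubscribers() => SomethingHappened = null;
}

class Subscriber
{
    public void OnSomething(object sender, EventArgs e) { }
}

class Program
{
    public static WeakReference Subscribe(Publisher p)
    {
        var s = new Subscriber();
        p.SomethingHappened += s.OnSomething;   // never unsubscribed
        return new WeakReference(s);
    }

    static void Main()
    {
        var publisher = new Publisher();
        WeakReference weak = Subscribe(publisher);

        GC.Collect();
        // The publisher's event delegate still references the subscriber,
        // so it survives collection even though nothing else uses it.
        Console.WriteLine(weak.IsAlive);        // True

        publisher.ClearSubscribers();
        GC.Collect();
        Console.WriteLine(weak.IsAlive);        // False
    }
}
```

In the instance retention graph, this shows up as the subscriber being held in memory via the publisher's event delegate – the tell-tale pattern the event-handler filter surfaces.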
Unmanaged memory leak
3rd party components can consume unmanaged memory in ways that are not always obvious
At a high level, unexpected crashes and OutOfMemoryExceptions can easily come from unmanaged memory problems. In particular, if you're working with a web app, then crashes under heavy load or with multiple users, and performance degradation over time, can be red flags. In the memory profiler, you'll probably see on the timeline that the number of private bytes increases more quickly than the number of bytes in the .NET heaps.
How to investigate
If, when you look at the summary screen, the unmanaged section of the left-hand pie chart is large, try a new profiling session with unmanaged memory profiling enabled. Unmanaged profiling can be slow, so only use it once you've seen that you have high unmanaged usage. Quite often a large unmanaged section is mostly the CLR itself, which is normal, but it can also indicate a problem – which is why we'd generally recommend checking for managed leaks first.
However, it's worth keeping in mind that 3rd party components and an awful lot of legacy code can consume unmanaged memory in ways that are not always obvious just from the .NET memory footprint. So if you know that you're using 3rd party components (graphical libraries are a frequent offender) then you may want to skip straight to checking for unmanaged issues if you see a big unmanaged slice of the pie.
To investigate, look at the unmanaged modules bar chart and see what's growing. You can also go to largest classes and order by unmanaged size. If you're seeing classes that look like part of a component, or code you know to handle unmanaged memory, then from here it's pretty simple – you just follow the process for troubleshooting a managed leak.
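To illustrate why unmanaged usage is invisible to the managed heaps, here's a sketch of a wrapper class that owns native memory (NativeBuffer is a hypothetical name; Marshal.AllocHGlobal and GC.AddMemoryPressure are real .NET APIs). The managed object is a few bytes, so the real allocation only ever shows up in private bytes:

```csharp
using System;
using System.Runtime.InteropServices;

class NativeBuffer : IDisposable
{
    private IntPtr _buffer;
    private readonly int _size;

    public NativeBuffer(int bytes)
    {
        _buffer = Marshal.AllocHGlobal(bytes);   // unmanaged allocation
        _size = bytes;
        // Tell the GC this small wrapper carries unmanaged weight,
        // so collections are scheduled more sensibly.
        GC.AddMemoryPressure(bytes);
    }

    public void Dispose()
    {
        if (_buffer != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(_buffer);
            GC.RemoveMemoryPressure(_size);
            _buffer = IntPtr.Zero;
        }
        GC.SuppressFinalize(this);
    }

    // Safety net: without this, a forgotten Dispose leaks the native block
    // permanently — the GC reclaims the wrapper but knows nothing about
    // the HGlobal allocation behind it.
    ~NativeBuffer() => Dispose();
}
```

Typical usage is `using (var buf = new NativeBuffer(1024 * 1024)) { /* ... */ }` — forgetting the using (or the Dispose call) on a class like this, or on a 3rd party equivalent, is a classic source of the growing-private-bytes pattern above.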
General high memory usage
This is the open-ended investigation. If you just want to reduce your application's memory usage, it pays to be systematic.
Look through the largest classes, work out what's holding them in memory using the Instance retention graph, and keep going until you find things that aren't needed any more at the time of the snapshot. Go back to your code and break those references. It's a bit like the investigation strategy for managed leaks, but without the helpful tell-tale symptoms.
If the issue isn't a managed or unmanaged leak or LOH fragmentation, you really do need to know your code. It's time to work through the largest classes and make sure you're only referencing them when you really need to.
Trying the filters in ANTS Memory Profiler can help you eliminate possibilities, but this kind of investigation is always going to take time. We'll aim to cover this in a bit more detail and give some examples of situations where this has a happy ending later in the series.
This has been a brief overview of the troubleshooting strategies we recommend for sorting out .NET memory problems. In future articles, we'll go into more detail about managed and unmanaged leaks and LOH fragmentation, and cover some practical scenarios.
Generally, the best way to save time is to have clear reproduction steps for the issue, to take a series of snapshots after activities that affect performance or cause known problems, and to really know your code.
In the next part of the series, we'll go a bit deeper into Large Object Heap fragmentation.
ANTS Memory Profiler will drastically cut the time you spend finding and fixing memory leaks.
With ANTS Memory Profiler, you can:
- Profile managed and unmanaged code
- Optimize your C# and VB.NET code's memory usage
- Create better-performing applications