Ricky Leeks presents:

How to avoid automatic garbage collection

Five tips for avoiding automatic garbage collection

The garbage collector is a brilliant piece of software engineering, but sometimes collections occur at inconvenient times. If you need to write high-performance, real-time code, you generally want to avoid GC collections when execution speed is critical.

Of course, sometimes GC collections are unavoidable, in which case it is best to minimize the duration of individual collections. While there is some overlap between optimizing to minimize GC frequency and optimizing to minimize GC latency, the two goals can also conflict.

Understanding a bit more about how .NET's memory management works can help you with both goals, but generally the best rewards come from using memory analysis tools like ANTS Memory Profiler to understand what is causing garbage collections to run.

1. Know the difference between value types and reference types

There are two kinds of data types in .NET: value types and reference types. The difference between them ultimately comes down to how they store their data. An instance of a value type holds its data within itself, whereas an instance of a reference type holds a reference to the memory location of its data. A value type cannot be null and exists for as long as the object or stack frame that contains it does. As such, a value type has, at worst, a negligible effect on garbage collection. In fact, value types often do not exist within the GC's heap at all; local variables of a value type, for example, typically live on the stack.

By using value types where appropriate, you can both help prevent GC allocations and make the GC's collections speed along more quickly.
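The difference in storage shows up most clearly when a variable is copied. The sketch below is only an illustration; PointValue, PointReference, and the demo class are hypothetical names, not part of .NET.

```csharp
using System;

// Hypothetical value type: the data lives inside the variable itself.
public struct PointValue
{
    public int X;
    public int Y;
}

// Hypothetical reference type: the variable holds a reference to a heap object.
public class PointReference
{
    public int X;
    public int Y;
}

public static class ValueVersusReferenceDemo
{
    // Copying a struct copies its data, so changing the copy
    // leaves the original untouched.
    public static int CopyStructAndReadOriginal()
    {
        PointValue original = new PointValue { X = 1, Y = 2 };
        PointValue copy = original;   // copies the data; typically no heap allocation
        copy.X = 99;
        return original.X;            // still 1
    }

    // Copying a class variable copies only the reference, so both
    // variables point at the same heap object.
    public static int CopyReferenceAndReadOriginal()
    {
        PointReference original = new PointReference { X = 1, Y = 2 }; // heap allocation
        PointReference alias = original;  // second reference to the same object
        alias.X = 99;
        return original.X;                // now 99
    }
}
```

Only the PointReference instance is something the GC has to track and eventually collect.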

2. Consider a struct instead of a class

When creating a type, especially one that you are going to use in an array or a generic collection, consider making it a struct. When a GC collection does run, it only checks reference types to see if they are still reachable.

This is only a small benefit for individual struct instances, but the benefit grows quickly with arrays. With an array of structs, the GC just looks to see if the array itself is still a live object, since the structs stored inside it are not separate heap objects. (If the structs contain reference-type fields, the GC still has to follow those references.) There are several downsides to structs, though, so don't get too carried away.

Nonetheless, when you have a data structure that will be instantiated many thousands of times, or one for which an object pool would be impractical, a struct can often be a big benefit in terms of GC performance and minimizing automatic GC collections.
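As a rough sketch of why arrays magnify the difference, compare the number of heap objects behind each of the two arrays below. ParticleStruct, ParticleClass, and ParticleArrays are hypothetical names used only for this example.

```csharp
using System;

// Hypothetical particle data, once as a struct and once as a class.
public struct ParticleStruct { public float X, Y, Speed; }
public class ParticleClass { public float X, Y, Speed; }

public static class ParticleArrays
{
    // One heap allocation: the array stores the particle data inline.
    public static ParticleStruct[] CreateStructs(int count)
    {
        return new ParticleStruct[count];
    }

    // count + 1 heap allocations: the array of references plus one
    // object per element, each of which the GC must track.
    public static ParticleClass[] CreateClasses(int count)
    {
        var particles = new ParticleClass[count];
        for (int i = 0; i < particles.Length; i++)
        {
            particles[i] = new ParticleClass();
        }
        return particles;
    }

    // Array elements can be updated in place, with no copying or allocation.
    public static void Advance(ParticleStruct[] particles, float deltaTime)
    {
        for (int i = 0; i < particles.Length; i++)
        {
            particles[i].X += particles[i].Speed * deltaTime;
        }
    }
}
```

One of the downsides is visible here too: in-place updates like the one in Advance work on arrays, but the indexer of a List<T> returns a copy of the struct, so the same statement would not compile against a list.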

3. Design classes so that instances are reusable and then use object pools where appropriate

Regardless of the platform, most managed objects, excluding arrays and generic collections, tend to be small, so reusing (rather than recreating) objects can be a useful trick. Doing so keeps the GC's internal allocation counter from increasing, which helps prevent an automatic GC collection from running. Write each type you are likely to reuse so that every overload of its constructor has an equivalent public method that reinitializes an existing instance. Then use a generic collection type such as List<T> to pool the instances.

Bear in mind that object pools can be very expensive in terms of GC latency if you are unable to eliminate GC collections during time-critical code, because every pooled object is a live object that the GC must examine when a collection does run.
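A minimal sketch of this pattern, assuming a hypothetical Projectile type and a deliberately simple, single-threaded pool, might look like this:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reusable type: Reset mirrors the constructor, so an
// existing instance can be reinitialized instead of replaced.
public class Projectile
{
    public float X { get; private set; }
    public float Y { get; private set; }

    public Projectile(float x, float y)
    {
        Reset(x, y);
    }

    public void Reset(float x, float y)
    {
        X = x;
        Y = y;
    }
}

// Hypothetical pool backed by a List<T>. Not thread-safe.
public class ProjectilePool
{
    private readonly List<Projectile> _available;

    public ProjectilePool(int initialCapacity)
    {
        _available = new List<Projectile>(initialCapacity);
    }

    public int AvailableCount
    {
        get { return _available.Count; }
    }

    public Projectile Rent(float x, float y)
    {
        int last = _available.Count - 1;
        if (last < 0)
        {
            // Pool is empty, so this call does allocate.
            return new Projectile(x, y);
        }

        // Taking the last item avoids shifting the remaining elements.
        Projectile item = _available[last];
        _available.RemoveAt(last);
        item.Reset(x, y);
        return item;
    }

    public void Return(Projectile item)
    {
        _available.Add(item);
    }
}
```

Filling the pool up front, outside the time-critical section, means Rent should rarely need to fall back to allocating a new instance.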

4. Use collection constructor overloads that take an initial capacity for collection types that have them

Internally, List<T>, Dictionary<TKey, TValue>, and most of the other generic collections use one or more arrays along with a tracking variable to identify how many items of an internal array hold valid data. Whenever adding an item to the collection would exceed the capacity of the internal array, a new, larger array (typically about double the previous capacity) is created and the existing items are copied into it, increasing the GC's internal allocation counter.

If you know approximately how many elements will ever be in that collection at one time, you can use one of the constructor overloads to create the internal array(s) with a specific initial size. Assuming you set your initial capacity high enough, that internal array will never need to be resized, meaning you can avoid generating new allocations.
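The effect can be observed through the Capacity property of List<T>. The helper below is a hypothetical illustration that counts how often the internal array is replaced while items are added.

```csharp
using System;
using System.Collections.Generic;

public static class CapacityDemo
{
    // Adds itemsToAdd integers to the list and counts how many times
    // its capacity changed, i.e. how many new internal arrays were allocated.
    public static int CountResizes(List<int> list, int itemsToAdd)
    {
        int resizes = 0;
        int lastCapacity = list.Capacity;

        for (int i = 0; i < itemsToAdd; i++)
        {
            list.Add(i);
            if (list.Capacity != lastCapacity)
            {
                resizes++;
                lastCapacity = list.Capacity;
            }
        }

        return resizes;
    }
}
```

Calling CapacityDemo.CountResizes(new List<int>(1000), 1000) reports no resizes, whereas passing new List<int>() reports several. Dictionary<TKey, TValue> offers a similar constructor overload that takes an initial capacity.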

5. Be aware of common operations that generate allocations

There are a few common operations in .NET that generate GC allocations in a way that might not be obvious. Two prime examples are the boxing of value types and most string operations.

Boxing occurs when a value type is converted, implicitly or explicitly, to Object or to an interface type. While interfaces cannot be instantiated, the CLR treats any variable of an interface type as a reference. A reference type container (a 'box') therefore needs to be allocated on the heap to hold a copy of the underlying value. Simply by knowing about boxing, you become more aware of the memory implications of your code.
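To make the conversions to Object and to an interface concrete, here is a small sketch. BoxingDemo and its methods are hypothetical names; the generic method shows one common way to call interface members without a box.

```csharp
using System;

public static class BoxingDemo
{
    // Each conversion of a value type to object allocates a separate box,
    // so the two references below point at different heap objects.
    public static bool BoxedCopiesAreDistinctObjects()
    {
        int number = 42;
        object first = number;   // box allocated here
        object second = number;  // second box allocated here
        return !ReferenceEquals(first, second);
    }

    // The parameter is of interface type, so passing an int boxes it.
    public static int CompareViaInterface(IComparable<int> left, int right)
    {
        return left.CompareTo(right);
    }

    // With a generic constraint the value stays a value type,
    // so the call needs no box.
    public static int CompareGeneric<T>(T left, T right) where T : IComparable<T>
    {
        return left.CompareTo(right);
    }
}
```

BoxingDemo.CompareViaInterface(5, 3) and BoxingDemo.CompareGeneric(5, 3) return the same result, but only the first allocates.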

Strings are a strange thing in .NET. They are reference types and thus are managed by the GC, yet they are immutable and, because of the way they are stored, not subject to reuse. As such, virtually all operations that involve transforming a string will in some way result in GC allocations.

One way to minimize allocations caused by string operations is to use System.Text.StringBuilder. Unfortunately, few .NET APIs that accept a string also accept a StringBuilder. Nevertheless, you can use StringBuilder to compose strings from many different parts, and can reuse StringBuilder instances to cut down on allocations.
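As an illustration of reusing a StringBuilder, the hypothetical formatter below keeps one instance for its whole lifetime and clears it before each use, rather than concatenating strings or creating a new builder per call.

```csharp
using System;
using System.Text;

// Hypothetical helper that composes a label from several parts.
public class ScoreLabelFormatter
{
    // One builder, created once with room for a typical label.
    private readonly StringBuilder _builder = new StringBuilder(64);

    public string Format(string playerName, int score)
    {
        // Clear keeps the internal buffer; on frameworks older than
        // .NET 4.0, setting Length = 0 has the same effect.
        _builder.Clear();
        _builder.Append(playerName);
        _builder.Append(": ");
        _builder.Append(score);

        // The final string is still an allocation, but the intermediate
        // strings that repeated concatenation would create are avoided.
        return _builder.ToString();
    }
}
```

This does not eliminate allocations altogether, since ToString must still produce a new string, but it removes the temporary strings created along the way.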