The .NET garbage collector is, in general, a wonderful thing. It has freed countless programmers from the drudgery both of writing allocation and de-allocation code, and also of tracking down heap corruption and memory leaks (due to bad pointers or de-allocation errors). It allows programmers to simply write code and focus on function without normally needing to care about how memory is actually managed. This tends to make it much easier for junior programmers to write correct (or at least functional) code, thereby reducing training costs and the time necessary to do things like code reviews.
However, for all that the garbage collector is a brilliant piece of software engineering, sometimes collections still occur at inconvenient times. If you need to write high performance, real-time code, in addition to measuring performance using tools like ANTS Performance Profiler, you generally want to try to avoid GC collections during times where execution speed is critical. For example: in an XNA game, you want to avoid collections during gameplay to prevent visual glitches in your animations; in a real time financial analysis and trading application, you want to avoid collections between when you begin analyzing incoming data and when you finish executing an algorithmically-controlled trading decision based on that data. Of course, sometimes GC collections at inconvenient times are unavoidable, and in those situations it is best to instead try to minimize the duration of individual collections (See: http://blogs.msdn.com/b/shawnhar/archive/2007/07/02/twin-paths-to-garbage-collector-nirvana.aspx). This article focuses mainly on avoiding collections, and while there is some overlap between optimizing to minimize GC frequency and optimizing to minimize GC latency, the two goals can also collide. This is certainly the case with object pooling, which can be very helpful with reducing frequency, but is almost guaranteed to increase latency (especially with non-generational GCs).
As you’re no doubt already aware, the GC runs automatically when certain defined conditions are met, and the specific conditions vary depending on the common language runtime in which the program is running. The .NET 4.0 Framework CLR has different conditions than the .NET 3.5 Compact Framework CLR, for instance. Nonetheless, every CLR that I am familiar with runs a collection when a certain number of bytes has been allocated since the previous collection, and it’s this triggered collection that we will consider.
Note: Generational GCs are more complicated, as we saw in Clive Tong’s article, The Top 5 .NET Memory Management Misconceptions. The essence of a generational GC is that objects which survive a collection of their current generation are promoted to the next generation (if any). The higher generations are collected less frequently, such that the GC typically runs more quickly (it only needs to examine all the objects which are at or below the generation for which it is running a collection). For more, see: http://msdn.microsoft.com/en-us/library/ee787088.aspx and http://blogs.msdn.com/b/abhinaba/archive/2011/06/08/wp7-mango-mark-sweep-collection-and-how-does-a-generational-gc-help.aspx. Non-generational GCs maintain all objects at the same level such that every collection entails an examination of all live objects.
The .NET Framework 4.0 CLRs that run on Windows OSs have generational GCs, with algorithms that will vary the number of bytes that trigger an automatic collection in order to increase performance. On the other hand, the CLRs that run on the Xbox 360 and on Windows Phone 7.0 both run automatic collections whenever 1 MB has been allocated since the last collection. To be more specific, the Windows Phone Mango CLR has a generational GC that collects Gen0 whenever 1 MB has been allocated to Gen0, and collects Gen1 whenever 5 MB has been promoted to that generation (see: http://blogs.msdn.com/b/abhinaba/archive/2011/06/14/wp7-mango-the-new-generational-gc.aspx).
Understanding a bit more about how .NET’s memory management works can help you with both paths. These tips and techniques should help to both increase your understanding of what your code is doing, and to design your code to run faster with a minimum of extra work.
1. Know the difference between value types and reference types
There are two kinds of data types in .NET: value types and reference types (see http://msdn.microsoft.com/en-us/library/ms173104.aspx, specifically the material beginning at the “The Common Type System” subheading). The difference between these two kinds ultimately comes down to how they store their data. An instance of a value type holds its data within itself, whereas an instance of a reference type holds a reference to the memory location of its data. Both value types and reference types have a default value, which is assigned to an instance when it is created in certain situations.
Typical situations where the default value is automatically assigned include static variables, member variables of structs when the struct is instantiated with the default constructor, and member variables of classes when a value is neither assigned in the constructor nor explicitly specified in the field’s declaration.
For value types, the default value is the result of zeroing the type’s member variables (e.g. for integers it’s zero, for booleans it’s false, for enumerations it’s zero (normally the first member of the enum), and for structs it’s the default value of each member of the struct). For reference types, the default value is null. For those who are curious, the value of null in .NET is a zeroed-out reference (the size of a reference depends on the runtime platform’s native memory address size and is thus only known at runtime).
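As a concrete illustration, here is a minimal sketch showing how those defaults appear in practice; the PlayerScore and ScoreBoard types are hypothetical and exist only for this example.

public struct PlayerScore            // hypothetical value type
{
    public int Points;               // defaults to 0
    public bool IsFinal;             // defaults to false
}

public class ScoreBoard              // hypothetical reference type
{
    // Neither field is assigned in a constructor or a field initializer, so:
    public PlayerScore BestScore;    // defaults to a zeroed PlayerScore (0, false)
    public string OwnerName;         // defaults to null (a zeroed reference)
}

public static class DefaultValueDemo
{
    public static void Main()
    {
        var board = new ScoreBoard();
        System.Console.WriteLine(board.BestScore.Points);    // 0
        System.Console.WriteLine(board.OwnerName == null);   // True
        System.Console.WriteLine(default(int));              // 0
        System.Console.WriteLine(default(string) == null);   // True
    }
}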
So why does this matter? In truth, the difference is normally not too important, although sometimes you might trip over some peculiarities of value types (e.g. if you have a struct as a property, you cannot directly set any of its public fields or properties since the property’s get method returns a copy of the struct, instead of a reference to the struct). However, when writing to avoid or minimize GC collections, the difference becomes very important.
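As a quick aside, here is a minimal sketch of that struct-as-a-property peculiarity; the Point and Sprite types are hypothetical, and the commented-out line is the one the compiler rejects (error CS1612), since the getter hands back a copy rather than the stored struct.

public struct Point
{
    public int X;
    public int Y;
}

public class Sprite
{
    public Point Position { get; set; }
}

public static class StructPropertyDemo
{
    public static void Main()
    {
        var sprite = new Sprite();

        // sprite.Position.X = 10;    // error CS1612: the getter returns a copy,
        //                            // so the write would be silently lost.

        // Instead, copy the struct out, modify it, and assign it back.
        var position = sprite.Position;
        position.X = 10;
        sprite.Position = position;
    }
}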
A value type either exists or does not exist. It cannot be null (the Nullable<T> struct simulates nullity by maintaining a bool that specifies whether or not the struct has a value and by overriding the Equals method inherited from System.Object to handle checking for null; so it is never actually null, it just tests as equal to null in the appropriate circumstances), and it exists for as long as the object that contains it does. As such, a value type has, at worst, a negligible effect on garbage collection. In fact, value types often do not exist within the GC’s heap at all. For example, a value type instance that is created within a method is not managed by the GC and, as such, does not increase the GC’s internal allocation counter. On the other hand, if the value type is a struct, and that struct has a reference type as a member, then that reference type (should it be instantiated) is still managed by the GC. However, the struct itself, when locally scoped in an ordinary method, is not subject to the GC’s purview (see http://blogs.msdn.com/b/ericlippert/archive/2010/09/30/the-truth-about-value-types.aspx).
Further, even when a value type is stored on the GC’s heap, e.g. when you create an array of value types, the GC does not need to check to see if those value type instances still exist when it does a collection, since a value type cannot be null; the mere fact that the array still exists guarantees that they exist (and when the array is no longer reachable, they are also thereby not reachable, as their value nature precludes a reference to them existing in such circumstances).
References to value types are possible, but never ones that could survive the object that contains the value type itself. An example would be a method like: public void AddHalf(ref double value) { value += 0.5; }. The ref double parameter is legal (and saves 4 bytes in the method call in a 32-bit program, since references are 4 bytes and doubles are 8), but it’s impossible for an object that contains the referenced double (such as an array) to be collected so long as the reference to the double exists.
So when the GC, in the course of doing a collection, comes across an array of one million floats, it just needs to check whether the array itself is still reachable. By contrast, for an array of one million class instances, it needs to check whether the array itself is reachable, and then it needs to check each and every one of those one million elements to see if they are valid references or are null. The same holds true for generic collections. By using value types where appropriate, you can both help prevent GC allocations and help the GC’s collections complete more quickly.
2. Consider a struct instead of a class.
When creating a type, especially one that you are going to use in an array or a generic collection, consider making it a struct. Since structs are value types, they cannot be null, and the GC does not manage them (though it does manage any reference types that are members of the struct). When a GC collection does run, it only checks reference types to see if they are still reachable. This is only a small benefit for individual struct instances, but the benefit grows quickly with arrays, and it can significantly reduce collection times when an array is composed of struct instances rather than class instances.
In my tests, I’ve come across one possible exception. Where you have a struct that contains a reference type as a member and a class with the same members, if most class instances in an array are null then the .NET Framework 4.0 x86 CLR’s GC will be faster on the array of classes. If there are no reference types within the definition or if even a small number (less than one-tenth) of the class instances are non-null, then the struct array retains its advantage.
With an array of class instances, the GC has to check each item in that array to see if it is a live object or not (the same is true for generic collections, which use an internal array). With an array of structs, the GC just looks to see if the array itself is still a live object, since structs cannot be null (this is true even for Nullable<T> structs, which just use an internal tracking mechanism to determine nullity). So that is potentially thousands or even millions of items that the GC does not need to examine when a collection runs!
There are several downsides to structs, though, so don’t get too carried away. They cannot inherit from classes or even from other structs, so while they can implement interfaces, the lack of inheritance can lead to a fair bit of code duplication and can increase your development time as a result. They are also, as the term ‘value type’ implies, copied by value rather than passed by reference when used as method parameters or assigned to / retrieved from arrays, collections, and properties. As such, when they have a large number of members, the cost of copying begins to add up. Lastly, they cannot have destructors (also known as finalizers), so they are a poor choice for items that need to implement IDisposable. Nonetheless, when you have a data structure of which many thousands of instances will be created, or for which it would be impractical to create an object pool, a struct can often be a big benefit in terms of GC performance and minimizing automatic GC collections.
The XNA Framework serves as a great example of appropriately using structs in preference to classes. To the extent that it was possible, the types which are commonly created and used in gameplay code, such as Vector2, GamePadState, and Color, have been implemented as structs made up of other value types. Types that aren’t commonly created during performance-critical code, or which need to be classes to implement IDisposable properly, such as BoundingFrustum and Texture2D, are implemented as classes.
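To make the idea concrete, here is a rough sketch of the same approach applied to gameplay data; the Particle type and ParticleSystem class are hypothetical, not XNA types. Because every member of the struct is a value type, the ten-thousand-element array below is a single object as far as the GC is concerned.

public struct Particle               // hypothetical gameplay type
{
    public float X;                  // all members are value types, so an
    public float Y;                  // array of Particle holds its data inline
    public float VelocityX;
    public float VelocityY;
    public float Age;
}

public class ParticleSystem
{
    // One allocation and one object for the GC to examine, regardless of length.
    private readonly Particle[] particles = new Particle[10000];

    public void Update(float elapsedSeconds)
    {
        for (int i = 0; i < particles.Length; i++)
        {
            particles[i].X += particles[i].VelocityX * elapsedSeconds;
            particles[i].Y += particles[i].VelocityY * elapsedSeconds;
            particles[i].Age += elapsedSeconds;
        }
    }
}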
3. Design classes so that instances are reusable and then use object pools where appropriate.
Regardless of the platform, most managed objects, excluding arrays and generic collections, tend to be small. Because of this, keeping instances of objects around when you are done with them, and reusing them rather than creating new instances every time, can be a useful trick. Doing so prevents the GC’s internal allocation counter from increasing, which helps prevent an automatic GC collection from running. Design classes you are likely to instantiate many times so that each overload of the constructor has an equivalent public method that you can call to reinitialize an existing instance. Then use a generic collection type such as a List<T> to pool object instances.
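A minimal sketch of that pattern follows; the Enemy class and its members are hypothetical. Each constructor overload has a matching public Initialize method that puts the instance into the same state, so a pooled object can be reinitialized instead of replaced. An instance of this class is what you would check out of the pool shown further down, then hand back when you are done with it.

public class Enemy
{
    public float Health { get; private set; }
    public float X { get; private set; }
    public float Y { get; private set; }

    public Enemy()
    {
        Initialize();
    }

    public Enemy(float health, float x, float y)
    {
        Initialize(health, x, y);
    }

    // Equivalent public methods that let a pooled instance be reused.
    public void Initialize()
    {
        Initialize(100f, 0f, 0f);
    }

    public void Initialize(float health, float x, float y)
    {
        Health = health;
        X = x;
        Y = y;
    }
}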
My own coding tends to be on platforms that use the .NET Compact Framework. The Xbox 360 and Windows Phone 7 both use versions of the .NET CF, and both automatically run a collection every time that 1 MB of GC-managed memory has been allocated since the previous collection. On the .NET Framework, the GC is more flexible about when it runs a collection, and also has separate heaps for large objects (the “LOH”) and for small objects, along with a generational GC which introduces efficiencies that make GC collections less intrusive. Object pools can also be very expensive if you are unable to eliminate GC collections during time-critical code, since you are keeping around more objects that the GC will need to examine (this expense is especially a factor on platforms without a generational GC).
The following is an example of an object pool.
/// <summary>
/// A basic generic object pool. Objects are removed from the pool when requested
/// and re-added when returned.
/// </summary>
/// <typeparam name="T">
/// The type of object to pool. This must be a class and have a parameterless
/// constructor.
/// </typeparam>
public class ObjectPool<T> where T : class, new()
{
    /// <summary>
    /// The pool itself.
    /// </summary>
    private System.Collections.Generic.List<T> pool;

    /// <summary>
    /// A lock for synchronizing access to the pool.
    /// </summary>
    private readonly System.Object _poolLock;

    /// <summary>
    /// Returns the count of items currently in the pool.
    /// </summary>
    public int Count
    {
        get
        {
            lock (_poolLock)
            {
                return pool.Count;
            }
        }
    }

    /// <summary>
    /// Creates the object pool with <paramref name="initialCount"/> number
    /// of instances created and added to the pool. The pool will have an
    /// initial capacity of <paramref name="initialCapacity"/>.
    /// </summary>
    /// <param name="initialCapacity">
    /// How large the initial capacity of the pool should be. Set this high
    /// enough that you won't exceed it, but not so high as to needlessly
    /// waste memory.
    /// </param>
    /// <param name="initialCount">
    /// How many object instances to create initially.
    /// </param>
    public ObjectPool(int initialCapacity, int initialCount)
    {
        _poolLock = new System.Object();
        pool = new System.Collections.Generic.List<T>(initialCapacity);
        for (int i = 0; i < initialCount; i++)
        {
            pool.Add(new T());
        }
    }

    /// <summary>
    /// Gets an instance of <typeparamref name="T"/> from the pool. If the pool is
    /// empty, a new instance of <typeparamref name="T"/> is created and returned
    /// to you instead.
    /// </summary>
    /// <returns>An unused instance of <typeparamref name="T"/>.</returns>
    public T GetObject()
    {
        lock (_poolLock)
        {
            if (pool.Count == 0)
            {
                System.Diagnostics.Debug.WriteLine("Had to create a new item.");
                return new T();
            }

            // Remove from the end of the pool to prevent an internal rearrangement.
            var item = pool[pool.Count - 1];
            pool.RemoveAt(pool.Count - 1);
            return item;
        }
    }

    /// <summary>
    /// Return (or add) an existing instance of <typeparamref name="T"/> to the pool.
    /// </summary>
    /// <param name="item">
    /// The instance of <typeparamref name="T"/> to return to the pool.
    /// </param>
    /// <exception cref="System.ArgumentNullException">
    /// Thrown when <paramref name="item"/> is null.
    /// </exception>
    /// <exception cref="System.InvalidOperationException">
    /// Thrown when <paramref name="item"/> is already in the object pool.
    /// </exception>
    /// <remarks>
    /// Do not return an object that you are still using. This will likely lead
    /// to the same object being "checked out" twice which will cause bugs. Also if
    /// <typeparamref name="T"/> implements <see cref="System.IDisposable"/>, do not
    /// return an object that has been disposed.
    /// </remarks>
    public void ReturnObject(T item)
    {
        if (item == null)
        {
            throw new System.ArgumentNullException("item");
        }

        lock (_poolLock)
        {
            if (!pool.Contains(item))
            {
                pool.Add(item);
            }
            else
            {
                throw new System.InvalidOperationException("The pool already contains this item.");
            }
        }
    }

    /// <summary>
    /// If <typeparamref name="T"/> implements <see cref="System.IDisposable"/>, this
    /// will call Dispose on all objects in the pool.
    /// </summary>
    /// <remarks>
    /// If <typeparamref name="T"/> does not implement
    /// <see cref="System.IDisposable"/>, this will do nothing. You should only
    /// call this method when you are completely finished with the pool, as you
    /// will otherwise get disposed objects back when calling
    /// <see cref="GetObject"/>.
    /// </remarks>
    public void DisposeAllObjects()
    {
        if (typeof(System.IDisposable).IsAssignableFrom(typeof(T)))
        {
            foreach (var item in pool)
            {
                (item as System.IDisposable).Dispose();
            }
        }
    }
}
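As a brief usage sketch, the following shows the pool being combined with the hypothetical Enemy class from tip 3; the pool sizes are illustrative only. Instances are checked out, reinitialized via their public Initialize method, and returned when no longer needed instead of being left for the GC.

public static class ObjectPoolUsageDemo
{
    private static readonly ObjectPool<Enemy> enemyPool =
        new ObjectPool<Enemy>(initialCapacity: 64, initialCount: 32);

    public static void SpawnAndRecycle()
    {
        // Take an instance out of the pool and reinitialize it rather than
        // allocating a new Enemy with 'new'.
        Enemy enemy = enemyPool.GetObject();
        enemy.Initialize(health: 50f, x: 10f, y: 20f);

        // ... use the enemy during gameplay ...

        // When it is no longer needed, hand it back instead of letting it
        // become garbage.
        enemyPool.ReturnObject(enemy);
    }
}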
4. Use collection constructor overloads that take an initial capacity for collection types that have them.
Most of the generic collections have constructor overloads that take an ‘int initialCapacity’ parameter. Internally, List<T>, Dictionary<TKey, TValue>, and the other generic collections use one or more arrays, along with a tracking variable to identify how many items of the internal array hold valid data. Whenever adding an item to the collection would exceed the capacity of the internal array, a new array that is double the previous capacity is created. The existing array’s elements are then copied to the new array, and the new array replaces the old internal array. All of this generates allocations that increase the GC’s internal allocation counter.
When you first create a collection using its parameterless constructor with the .NET Framework 4.0 CLR, the internal array’s length is 0, and the first addition causes it to resize to hold 4 elements. These are all implementation details, including the fact that there is an internal array at all, and they could change in the future, so you should not rely on them. The advice itself (use initial capacity constructors) is part of the public interface, though, so following it is safe.
If you know approximately how many elements will ever be in that collection at one time, you can use one of the constructor overloads that tells the collection to create its internal array(s) with a specific initial size, rather than its default size. If you know the exact number of elements your collection will hold, use that. If not, use an estimate with a small bit of extra capacity, just in case. If you don’t have any idea, consider using System.Diagnostics.Debug.WriteLine to write out the name, current Count, and current Capacity of the collection whenever you have just added an item to it. Then run your program as normal and make note of the debugging output. This will give you a good estimate of each collection’s capacity needs, which you can then use to decide on an initial capacity.
Assuming you set your initial capacity high enough, that internal array will never need to be resized, meaning you can avoid generating new allocations. However, setting the initial capacity to be far in excess of your needs will just cause wasted memory (especially with collections of value types), and will cause GC collections to take longer when they do occur.
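Putting both pieces of advice together, here is a brief sketch; the HighScores class and the numbers are purely illustrative. It shows a collection created with an explicit initial capacity, plus temporary Debug.WriteLine instrumentation you can use to estimate what that capacity should be.

using System.Collections.Generic;
using System.Diagnostics;

public class HighScores
{
    // If you expect roughly 900 scores, asking for 1000 up front means the
    // internal array should never need to be replaced (the parameterless
    // constructor would instead start empty and keep doubling: 4, 8, 16, ...).
    private readonly List<int> scores = new List<int>(1000);

    public void AddScore(int score)
    {
        scores.Add(score);

        // Temporary instrumentation: run the program normally, note the debug
        // output, and then remove this line once you have your estimate.
        Debug.WriteLine("scores - Count: " + scores.Count.ToString() +
                        ", Capacity: " + scores.Capacity.ToString());
    }
}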
5. Be aware of common operations that generate allocations.
There are a few common operations in .NET that generate GC allocations in a way that might not be obvious. The two prime targets are the boxing of value types and most string operations. We’ll look at each in turn.
Boxing of value types refers to the act of casting a value type (such as an integer or a float) to a reference type (such as System.Object or an interface). One place this is commonly found is in string operation methods that take a ‘params object[]’ argument such as String.Format and Console.WriteLine. Since every System.Object has a ToString method, you can cast anything (including value types) to Object, and then pass them to a method that uses ToString to get that object’s string representation. Boxing also occurs when explicitly casting a value type to an interface. While interfaces cannot be instantiated, the CLR treats them as being reference types whenever you create an instance of an interface (through casting an existing data item that implements that interface).
There is also a corresponding ‘unbox’ CIL instruction for casting a boxed value type from Object back to its value type. This is commonly found when a method takes in a parameter as an Object instance and then casts it to its appropriate value type. Unbox does not generate GC allocations.
For value types, a reference type container (a ‘box’) needs to be created to hold the underlying value type when it is cast to Object or to an interface. If you use a disassembler tool such as ildasm.exe or .NET Reflector®, you can look for the Common Intermediate Language (“CIL”) ‘box’ instruction to find instances where you are boxing value types. It’s not always possible to eliminate boxing, and it may not be worthwhile to do so even when it is possible, but simply by knowing what it is and how to spot it, you are more aware of the memory implications of your code.
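Below is a small sketch of where a box shows up in everyday code and one way to sidestep it; the variable names are arbitrary and the example is only illustrative.

public static class BoxingDemo
{
    public static void Main()
    {
        int frameCount = 60;

        // The int is boxed here: String.Format takes 'params object[]', so a
        // reference-type box is allocated on the heap to hold the value.
        string a = string.Format("Frames: {0}", frameCount);

        // Calling ToString on the value type first avoids the box (the
        // resulting string is still a new allocation, of course).
        string b = "Frames: " + frameCount.ToString();

        // Explicitly casting a value type to an interface also boxes it.
        System.IComparable boxed = (System.IComparable)frameCount;

        System.Console.WriteLine(a);
        System.Console.WriteLine(b);
        System.Console.WriteLine(boxed);
    }
}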
Strings are a strange thing in .NET. They are reference types and thus are managed by the GC. However they are immutable and are not subject to re-use due to the way they are stored (see http://blogs.msdn.com/b/ericlippert/archive/2011/07/19/strings-immutability-and-persistence.aspx). As such, virtually all operations that involve transforming a string will in some way result in GC allocations. String concatenation creates a new string, for instance, as does trimming whitespace with Trim and its cousins. Most other string operations (e.g. almost any one that returns a string) will generate allocations. Transforming objects into their string equivalents by using the ToString method also normally generates GC allocations. They’re hard to avoid.
One way to minimize allocations caused by string operations is to use System.Text.StringBuilder. StringBuilder is to the String class as List<T> is to an array. By this I mean that it uses an internal array that it automatically resizes when needed (using the same “create an array that is twice as big, copy over the data, and replace the existing array with the new array” mechanism). Unfortunately, not many classes in .NET that take a string will also take a StringBuilder. Nevertheless, you can use StringBuilder to compose strings from many different parts, and can reuse StringBuilder instances to cut down on allocations. It’s not often a large gain, but it can help save some allocations in code that performs a lot of string operations.
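Here is a short sketch of that reuse pattern; the HudText class, the capacity of 64, and the text are arbitrary choices for illustration.

using System.Text;

public class HudText
{
    // Reused across frames; sized so that it should not need to grow.
    private readonly StringBuilder builder = new StringBuilder(64);

    public string BuildScoreLine(int score, int lives)
    {
        builder.Length = 0;          // reset the builder without allocating
        builder.Append("Score: ");
        builder.Append(score);       // Append(int) overload, so no boxing
        builder.Append("  Lives: ");
        builder.Append(lives);

        // ToString still allocates the final string, but the intermediate
        // concatenations have been avoided.
        return builder.ToString();
    }
}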
Wrap Up
Automatic memory management is by no means a new invention (John McCarthy, the inventor of LISP, first came up with the concept in 1959). Nonetheless, it’s a wonderful thing for programmers, and is something that will likely find its way into many more venues as computing power continues to grow. Knowing a bit about how it works is something I think every .NET programmer could benefit from. I hope this has helped shed a bit of light on it for you.
Bonus Tip – 6. Force a GC collection at convenient times.
Using the GC class, you can force a collection by calling its static Collect method. In ordinary programming, it’s usually best to let the GC handle such matters itself. However, if you have a situation where a GC collection would be bad, e.g. during the middle of a level in a game, it usually makes sense to force a collection right after you have finished loading that level’s content (right before going into gameplay). Doing so will reset the GC’s internal allocation counter, and delay the next collection. If you have managed to limit, but not eliminate, garbage generation during gameplay, then you might find that your time-sensitive operation will complete before you pass the magic allocation mark at which an automatic collection occurs. So, if you can identify good places within your code to force a GC collection, consider doing it and measuring the performance results. A forced collection might be just the thing you need to stave off an unwanted automatic collection.
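A minimal sketch of that idea follows; the Level class and its method names are hypothetical placeholders for whatever your loading code looks like.

public class Level
{
    public void LoadContentAndPrepare()
    {
        LoadTexturesAndSounds();   // hypothetical: heavy, allocation-filled setup

        // Force a collection now, while a pause is invisible to the player, so
        // that the GC's allocation counter starts gameplay back at zero.
        System.GC.Collect();
    }

    private void LoadTexturesAndSounds()
    {
        // ... content loading that generates lots of temporary allocations ...
    }
}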