02 May 2012


.NET Memory Management and Finalization

In this excerpt from his new book, Practical Performance Profiling: Improving the Efficiency of .NET Code, Jean-Philippe Gouigoux discusses the Dispose mechanism and the finalization process in the context of .NET Garbage Collection.

The System.GC class provides access to several methods related to memory management. Unless you have a great deal of experience in .NET memory management and know precisely what you are doing, it is strongly recommended that you simply forget about its very existence.

There is a lot of temptation for beginner developers to explicitly call the Collect method at the end of a process which they know is memory-intensive. Of all the cases that the author is aware of since the early versions of .NET, none has ever shown that calling the GC mechanism explicitly could give better results in memory consumption or performance than simply letting the GC do its job.

Of course, in use cases where the usage does not vary at all, explicit garbage collection can allow control of memory use in one’s application, but this comes with a high price tag:

Sooner or later, the use of the application will change, and the memory consumption will vary.
Explicit GC calls at regular intervals have the natural consequence of increasing the number of garbage collections. In the end, this increases processing time.
As a general rule, there is no need to limit memory consumption of an application as long as the system keeps enough memory available. In the case of a system with a high memory pressure, the GC will adapt by running passes more frequently than on a system well equipped with RAM.

In short, it is essential to let the GC do its job, and not try to improve memory use by directing the collection process in any way.
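As a rough illustration of the GC doing its job unprompted, the sketch below allocates a large number of short-lived buffers and reads GC.CollectionCount before and after, without ever calling GC.Collect. The class and method names (GcObserver, CountAutomaticGen0Collections) are hypothetical and exist only for this example; the exact number of collections depends on the machine and the runtime settings.

```csharp
using System;

public static class GcObserver
{
    // Storing each buffer in a static field makes sure the allocation
    // really happens on the managed heap.
    private static byte[] _lastBuffer;

    // Allocates short-lived buffers and returns how many generation 0
    // collections the runtime started on its own in the meantime.
    public static int CountAutomaticGen0Collections(int allocations, int bufferSize)
    {
        int before = GC.CollectionCount(0);

        for (int i = 0; i < allocations; i++)
        {
            // The previous buffer becomes garbage on every iteration
            _lastBuffer = new byte[bufferSize];
        }

        return GC.CollectionCount(0) - before;
    }
}
```

Calling GcObserver.CountAutomaticGen0Collections(1000000, 1024) allocates roughly one gigabyte in total, yet the process never holds more than a small part of it at once, because the runtime collects the dead buffers whenever it judges it useful.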

We can have much more impact by helping the GC to recycle memory efficiently, by taking care of resources as explained below.

The question of high memory use
A question is often asked by beginner developers: is it normal that such and such a .NET process uses so much memory? The feeling that the CLR does not release enough of the memory used is very common, but one must understand that, as long as memory is available, it is normal for .NET to use it. And why would it do any differently? As long as the OS can provide it with memory, there is no sense in .NET limiting its use of it: running the GC more often would take time and thus slow down the application. The only important point to check is that the same process can also run in a smaller memory set.

Releasing external resources when the GC is fired

Firstly, let us make it clear that we are only talking about external resources here, such as connections to a database, memory spaces not controlled by managed code (typically in COM interoperability), or any other resources that are not controlled directly by the CLR. Indeed, contrary to what happens with C++, an object in .NET does not need to worry about whether the managed objects it references should be recycled or not. The CLR will check whether those objects are still referenced by anything else, and if not, it will free them as well.

By contrast, in the examples below, it is important to release resources correctly. In the case of a database connection, this means calling the closing method on the corresponding API. But in most cases, as below, this operation is explicit, and there is no need to wait for the end of the object’s life to close the connection.

using System;
using System.Data;
using System.Data.SqlClient;

namespace FreeingResources
{
    class Program
    {
        static void Main(string[] args)
        {
            SqlConnection Connection = new SqlConnection();
            SqlCommand Command = new SqlCommand("SELECT * FROM TEST", Connection);
            IDataReader Reader = null;
            try
            {
                Connection.Open();
                // CloseConnection ties the closing of the connection to the reader
                Reader = Command.ExecuteReader(CommandBehavior.CloseConnection);
                if (Reader.Read())
                    Console.WriteLine(Reader.GetString(0));
            }
            finally
            {
                // Closing the reader also closes the connection
                if (Reader != null && !Reader.IsClosed)
                    Reader.Close();
            }
        }
    }
}

Listing 1

In the example above, the CommandBehavior.CloseConnection parameter used in the ExecuteReader method guarantees that the connection closing operation will be called automatically upon closure of the associated reader.

By contrast, we can imagine a .NET object for which we would need to initialize a connection during construction, and to close the connection only when the object is at the end of its life. To do so, there exists a way of informing the CLR that it should execute an action when freeing an object. Typically, this works like this:

using System;
using System.Data;
using System.Data.SqlClient;

namespace FreeingResources
{
    public class Tracer
    {
        SqlConnection Connection = null;

        public Tracer()
        {
            Connection = new SqlConnection();
            Connection.Open();
        }

        // Finalizer: executed by the CLR when the GC frees the instance
        ~Tracer()
        {
            Connection.Close();
        }

        public void Log()
        {
            SqlCommand Command = new SqlCommand("UPDATE TABLE SET nb = nb + 1",
                Connection);
            Command.ExecuteNonQuery();
        }
    }
}

Listing 2

Obviously, this example is over-simplified: keeping the connection open throughout the life of this object would only make sense if it was destined to be called extremely frequently on the Log() method. In the more plausible case of the method being called irregularly, it would definitely be better to open the connection and close it at the end of the function call.

This would remove the need to deal with closing the connection upon disposing of the instance, and would also free database connections for other uses, making the code more capable of handling high loads. But this is not the end of the matter, and one should remember that performance handling is often about choosing where to strike the balance between two extremes. In this example, one could argue that opening and closing the connection at each call takes processing time and slows the process down. In particular, opening a database connection is a heavy operation, which involves starting a new thread, calculating authorization levels, and several other complex operations.

So, how does one choose? Quite simply, by knowing the mechanisms used in database connection management. In practice, ADO.NET will pool the connections to SQL Server, bringing better performance even if they are opened and closed frequently. When the Close instruction is called on an ADO.NET connection, the underlying object that deals with the actual database connection is in fact not abandoned, but only deactivated, and marked as available for another user. If the object is then taken from the pool, the opening of a connection is much less complex, since the object exists and the code only has to reactivate it for another use, usually only having to re-authorize it.
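The pooling idea described above can be pictured with a deliberately tiny model. This is not the ADO.NET implementation: PhysicalLink and TinyPool are hypothetical names invented for this sketch, the pool is single-threaded, and it has no size limit or timeout. It only shows why a "close" that returns the object to a pool makes the next "open" cheap.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an expensive physical database connection.
public sealed class PhysicalLink
{
}

// Hypothetical, single-threaded pool used only to illustrate the principle.
public sealed class TinyPool
{
    private readonly Stack<PhysicalLink> _idle = new Stack<PhysicalLink>();

    // Number of links that really had to be built from scratch
    public int PhysicalLinksCreated { get; private set; }

    // "Open": reuse an idle link when one exists, otherwise build a new one
    public PhysicalLink Rent()
    {
        if (_idle.Count > 0)
            return _idle.Pop();

        PhysicalLinksCreated++;
        return new PhysicalLink();
    }

    // "Close": the link is not destroyed, only marked as available again
    public void Return(PhysicalLink link)
    {
        _idle.Push(link);
    }
}
```

With this model, renting and returning a link a hundred times in a row builds only one physical link, which is the reason why opening and closing a pooled connection at each call is usually affordable.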

In short, since we have no need to deal with the object finalizer, we can write:

using System;
using System.Data;
using System.Data.SqlClient;

namespace FreeingResources
{
    public class Tracer
    {
        SqlConnection Connection = null;

        public Tracer()
        {
            Connection = new SqlConnection();
        }

        public void Log()
        {
            SqlCommand Command = new SqlCommand("UPDATE TABLE SET nb = nb + 1",
                Connection);
            try
            {
                // The connection is only held for the duration of the call
                Connection.Open();
                Command.ExecuteNonQuery();
            }
            finally
            {
                Connection.Close();
            }
        }
    }
}

Listing 3

Early release of resources

The method described above (releasing a resource upon object recycling) still has a major drawback: if the resource is precious, it is a waste to wait minutes or even hours for the GC to release it.

This is the reason behind yet another .NET mechanism: the IDisposable interface. Implementing this interface forces a class to have a Dispose() method, allowing the class instances to release resources as soon as the developer calls the method, whether it be explicitly or through the using keyword. Let us take an example:

using System;
using System.Data;
using System.Data.SqlClient;

namespace FreeingResources
{
    public class Tracer : IDisposable
    {
        SqlConnection Connection = null;

        public Tracer()
        {
            Connection = new SqlConnection();
            Connection.Open();
        }

        public void Log()
        {
            SqlCommand Command = new SqlCommand("UPDATE TABLE SET nb = nb + 1",
                Connection);
            Command.ExecuteNonQuery();
        }

        #region IDisposable Members

        // Called by the user of the class to release the connection early
        public void Dispose()
        {
            Connection.Close();
        }

        #endregion
    }
}

Listing 4

The user of such an object would write code that calls the method like this:

using System;
using System.Data;
using System.Data.SqlClient;

namespace FreeingResources
{
    class Program
    {
        static void Main(string[] args)
        {
            // Dispose() is called automatically when leaving the block
            using (Tracer Logger = new Tracer())
            {
                Logger.Log();
            }
        }
    }
}

Listing 5

For readers who are not used to the using keyword, the code above is essentially equivalent to this:

using System;
using System.Data;
using System.Data.SqlClient;

namespace FreeingResources
{
    class Program
    {
        static void Main(string[] args)
        {
            Tracer Logger = new Tracer();
            try
            {
                Logger.Log();
            }
            finally
            {
                // Executed even if Log() throws an exception
                Logger.Dispose();
            }
        }
    }
}

Listing 6

By the use of Dispose the caller guarantees that the resources will be released as soon as possible.

Combining both operations

At this point in the evolution of our example code, something is still missing: what happens if the caller does not use the Dispose mechanism, by forgetting to include the using keyword or to call the equivalent method? Resources will not be released, even when the GC recycles the object, and there will be a resource leak.

It is thus necessary to apply both of the mechanisms we have described above, in a combined way:

using System;
using System.Data;
using System.Data.SqlClient;

namespace FreeingResources
{
    public class Tracer : IDisposable
    {
        SqlConnection Connection = null;

        public Tracer()
        {
            Connection = new SqlConnection();
            Connection.Open();
        }

        // Safety net if the caller forgets to call Dispose()
        ~Tracer()
        {
            Connection.Close();
        }

        public void Log()
        {
            SqlCommand Command = new SqlCommand("UPDATE TABLE SET nb = nb + 1",
                Connection);
            Command.ExecuteNonQuery();
        }

        #region IDisposable Members

        public void Dispose()
        {
            Connection.Close();
        }

        #endregion
    }
}

Listing 7

This way, the Dispose mechanism can be called explicitly to release the associated resource as soon as possible, but if for some reason this is overlooked, the GC will eventually call the finalizer. This will be done later, but it is still better than never.

Nonetheless, a seasoned developer will notice the code duplication: the finalizer and the Dispose function use the same code, which is contrary to a well-known best practice. As a result, we should combine the resource freeing code, like this:

using System;
using System.Data;
using System.Data.SqlClient;

namespace FreeingResources
{
    public class Tracer : IDisposable
    {
        SqlConnection Connection = null;

        public Tracer()
        {
            Connection = new SqlConnection();
            Connection.Open();
        }

        ~Tracer()
        {
            FreeResources();
        }

        // Single place where the resource is released
        private void FreeResources()
        {
            Connection.Close();
        }

        public void Log()
        {
            SqlCommand Command = new SqlCommand("UPDATE TABLE SET nb = nb + 1",
                Connection);
            Command.ExecuteNonQuery();
        }

        #region IDisposable Members

        public void Dispose()
        {
            FreeResources();
        }

        #endregion
    }
}

Listing 8

We are getting there, but there are still a few potential problems we have to deal with:

If Dispose is called explicitly, there is no use for the finalizer anymore, because we know it will not do anything: the resource has already been freed.
We should make sure that calling the method to free resources several times will not cause any problems.
We should take into account the fact that, when Dispose is called, the Dispose method for other managed resources should be called as well. Generally, the CLR takes care of this by using the finalizer, but in this case, we have to do it ourselves.

The final code is:

using System;
using System.Data;
using System.Data.SqlClient;

namespace FreeingResources
{
    public class Tracer : IDisposable
    {
        SqlConnection Connection = null;

        public Tracer()
        {
            Connection = new SqlConnection();
            Connection.Open();
        }

        ~Tracer()
        {
            FreeResources(false);
        }

        private bool Disposed = false;

        private void FreeResources(bool Disposing)
        {
            // If the object has already released its resources,
            // there is no need to continue
            if (Disposed) return;

            if (Disposing)
            {
                // This is where the Dispose would be called if there
                // were managed resources in this class
            }

            Connection.Close();

            // To avoid coming back to this code several times
            Disposed = true;
        }

        public void Log()
        {
            SqlCommand Command = new SqlCommand("UPDATE TABLE SET nb = nb + 1",
                Connection);
            Command.ExecuteNonQuery();
        }

        #region IDisposable Members

        public void Dispose()
        {
            FreeResources(true);

            // The following line tells the GC that there is no use
            // in calling the finalizer when it recycles the current object
            GC.SuppressFinalize(this);
        }

        #endregion
    }
}

Listing 9

This code structure is known as the “Dispose” pattern, and is quite a standard form. Despite all the effort we have put into it, it is still not 100% complete. If we want to take care of all the possible situations, we should add one more safety feature: once Dispose has been called, the object cannot have its Log method called. A traditional modification is to set the connection to null, and then check its value in Log or any method that could use it.
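The safety feature mentioned in the previous paragraph could look like the following minimal sketch. To keep it runnable without a database, the connection is replaced by a plain object; GuardedTracer, the _resource field and the Calls counter are hypothetical names introduced for this illustration only. Throwing ObjectDisposedException is one common choice for signalling the misuse, not something the pattern imposes.

```csharp
using System;

public class GuardedTracer : IDisposable
{
    // Hypothetical stand-in for the connection held by Tracer
    private object _resource = new object();

    // Number of successful calls to Log(), for demonstration purposes
    public int Calls { get; private set; }

    public void Log()
    {
        // Once Dispose has run, the resource is null and Log must refuse to work
        if (_resource == null)
            throw new ObjectDisposedException("GuardedTracer");

        Calls++;
    }

    public void Dispose()
    {
        // Setting the field to null can safely be repeated,
        // so calling Dispose several times causes no problem
        _resource = null;
    }
}
```

A caller that keeps using the instance after the end of a using block then gets an immediate, explicit exception instead of an obscure failure on a closed connection.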

Further details can be found by searching for “Dispose” and “Pattern” on the internet. There are numerous discussions on side-effects and how to avoid them, memory performance of each variant of the pattern, etc. The goal of this article is not to provide the reader with a state-of-the-art summary of these discussions, but to show the link between this pattern and the performance of an application. If it is not correctly implemented, there are risks of massively reducing the access to unmanaged resources.

A last note

It is essential to stress that high memory use by a process does not mean that the process is unable to release that memory. This is a common misunderstanding of .NET memory management. As long as the OS does not restrict the CLR in its memory consumption, .NET has no reason whatsoever to run the GC at the risk of generating a drop in performance in the application.

It is perfectly normal for an application to grow in memory up until it reaches hundreds of megabytes. Even if one pass of the GC could make this drop to ten megabytes, as long as no other process needs memory, the CLR should not sacrifice even a small percentage of its time to freeing this memory. This is the origin of the reputation of .NET and Java as “memory hogs”. In fact, they are only using available resources as much as possible, while still maintaining a process to release them as much and as quickly as possible should the operating system ask for them.

Application In real life

A developer in my team created an application that processed XML in bulk. Each file was a few hundred kilobytes at most, and the corresponding instance of XmlDocument around one megabyte. The developer, who was watching memory consumption out of curiosity, was alarmed by the fact that it was growing consistently, for each file processed, and asked me whether he should cancel the process before reaching an OutOfMemoryException. After growing to 700 megabytes or so, it suddenly dropped to around 100 megabytes, and this cycle repeated itself like clockwork until the end of the application. This case is a good example of how .NET works: on this machine, which had 2 gigabytes of RAM and almost no other active applications, it would have been counter-productive to have more GC activity, since the whole process would have taken a few more minutes, whereas reducing peak memory use would have made no difference at all. It is also revealing about the difficulty of grasping the GC mechanism for a developer who has not had it explained, which can cause performance issues, as explained above.
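A comparable experiment can be sketched in a few lines, under the assumption that synthetic XML built in memory is an acceptable substitute for the real files. XmlBatch, BuildSampleXml and Process are hypothetical names, and the document size and reporting interval are arbitrary. The figures printed by GC.GetTotalMemory(false) will differ from the ones in the anecdote, since they depend on the machine, the runtime version and the GC settings.

```csharp
using System;
using System.Text;
using System.Xml;

public static class XmlBatch
{
    // Builds a synthetic XML document with the requested number of elements
    public static string BuildSampleXml(int elements)
    {
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 0; i < elements; i++)
            sb.Append("<item id=\"").Append(i).Append("\">value</item>");
        return sb.Append("</root>").ToString();
    }

    // Parses the same XML again and again, printing the size of the managed
    // heap at regular intervals. No explicit collection is ever requested.
    public static int Process(int documents, int reportEvery)
    {
        string xml = BuildSampleXml(5000);
        int processed = 0;

        for (int i = 1; i <= documents; i++)
        {
            // Each XmlDocument becomes garbage at the end of the iteration
            XmlDocument doc = new XmlDocument();
            doc.LoadXml(xml);
            processed++;

            if (i % reportEvery == 0)
            {
                // false: read the current value without forcing a collection
                Console.WriteLine("{0} documents, {1:N0} bytes on the managed heap",
                    i, GC.GetTotalMemory(false));
            }
        }

        return processed;
    }
}
```

Running something like XmlBatch.Process(2000, 100) and watching the printed values is a simple way to see the heap grow and shrink by itself, rather than concluding too early that memory is leaking.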
