Subterranean IL: ThreadLocal revisited

Last year, I looked at the ThreadLocal type as it exists in .NET 4. In .NET 4.5, this type has been completely rewritten. In this post, I’ll be looking at how the new ThreadLocal works in .NET 4.5. I won’t cover every implementation detail; instead, I’ll concentrate on how the type works as a whole. Again, it’s recommended you have the type open in a decompiler.

No More Generics!

The most obvious change is the lack of generic classes – it no longer uses generic instantiations to store individual thread static variables. Instead, it uses a design similar to that of ConcurrentBag: each thread holds its values in a thread static array, each instance of ThreadLocal is assigned its own index into those arrays, and linked lists connect the items across the thread static arrays to allow access from any thread.

The important variables here are the thread static field ts_slotArray and the instance fields m_idComplement and m_linkedSlot. Each thread has its own ts_slotArray instance, and each instance of ThreadLocal has its own slot index into those arrays, held in bitwise-complemented form in m_idComplement (I’m ignoring the fact that ThreadLocal is generic for now; each generic instantiation of ThreadLocal has its own static variables independent from any other). Every value stored in an instance can be reached through the linked list that starts at m_linkedSlot.

However, these extra links between arrays mean that the value to be stored can’t be put straight into ts_slotArray; an extra type is needed to provide the links. This is where the LinkedSlot type comes in – it provides Next and Previous fields to link between slots in different arrays. This diagram shows how these different fields interact – the arrows represent the Next and Previous references between slots:

[Diagram: ThreadLocal45.png]

Note that the instance of LinkedSlot directly referenced by the m_linkedSlot field is an empty instance that is not stored in any array; it exists only to be the target of another slot’s Previous field, and simplifies the logic in the other methods.
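To make the layout concrete, here is a minimal sketch of the fields discussed above. The field names follow what the decompiler shows, but ThreadLocalLayout, SlotIndex and LinkAfterHead are hypothetical names made up for illustration, and this is a simplified model rather than the framework source.

```csharp
using System;

// Hypothetical, simplified model of the fields described above.
// Not the framework source; locking and volatile access are omitted.
sealed class LinkedSlot<T>
{
    public LinkedSlot<T> Next;        // next slot for the same ThreadLocal, in another thread's array
    public LinkedSlot<T> Previous;    // previous slot, or the empty head node
    public LinkedSlot<T>[] SlotArray; // the per-thread array currently holding this slot
    public T Value;
}

sealed class ThreadLocalLayout<T>
{
    // One array per thread, indexed by each instance's slot index.
    [ThreadStatic]
    public static LinkedSlot<T>[] ts_slotArray;

    // Empty head node: never stored in any array, it only anchors the list.
    public readonly LinkedSlot<T> m_linkedSlot = new LinkedSlot<T>();

    // The slot index, stored as its bitwise complement.
    public readonly int m_idComplement;

    public ThreadLocalLayout(int slotIndex)
    {
        m_idComplement = ~slotIndex;
    }

    public int SlotIndex
    {
        get { return ~m_idComplement; }
    }

    // Inserts a slot directly after the head node, fixing up both directions.
    public void LinkAfterHead(LinkedSlot<T> slot)
    {
        slot.Next = m_linkedSlot.Next;
        slot.Previous = m_linkedSlot;
        if (m_linkedSlot.Next != null)
            m_linkedSlot.Next.Previous = slot;
        m_linkedSlot.Next = slot;
    }
}
```

Because the head node is always present, inserting a slot never has to special-case an empty list, which is the simplification the empty instance buys.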

Setting values

Each instance of ThreadLocal is assigned a unique index by the IdManager class when it is created. When a thread first sets a value in an instance of ThreadLocal, the following happens in SetValueSlow:

  1. If the slot array hasn’t been assigned for this thread (i.e. this is the first time this thread has accessed any instance of ThreadLocal), it creates a new array large enough to hold this instance’s slot index.
  2. If the array isn’t big enough for this instance’s slot index, it is resized to fit, and all the LinkedSlots it contains are updated to point to the new array (in the GrowTable method).
  3. CreateLinkedSlot is called to create a new LinkedSlot instance and store it in the array at the instance’s slot index. It also adds it to the head of the linked list pointed to by m_linkedSlot in this instance of ThreadLocal.

Subsequently, when the value is read or written, the operation goes straight to the entry at the slot index owned by the ThreadLocal being accessed, in the slot array for the accessing thread.
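The three steps and the subsequent fast path can be sketched roughly as below. MiniThreadLocal and its nested Slot class are hypothetical names; the linked list between threads is left out, a simple counter stands in for IdManager (so indexes are never reused), and the array is grown to exactly the size needed, which is an assumption made to keep the sketch short.

```csharp
using System;
using System.Threading;

// Hypothetical miniature of the set/get path. The cross-thread linked list
// and index reuse are deliberately omitted.
sealed class MiniThreadLocal<T>
{
    sealed class Slot { public T Value; }

    [ThreadStatic]
    static Slot[] ts_slotArray;      // one array per thread

    static int s_nextIndex;          // stand-in for IdManager

    readonly int _index;             // this instance's slot index

    public MiniThreadLocal()
    {
        _index = Interlocked.Increment(ref s_nextIndex) - 1;
    }

    public T Value
    {
        get
        {
            Slot[] slots = ts_slotArray;
            Slot slot = (slots != null && _index < slots.Length) ? slots[_index] : null;
            return slot != null ? slot.Value : default(T);
        }
        set
        {
            Slot[] slots = ts_slotArray;
            if (slots != null && _index < slots.Length && slots[_index] != null)
                slots[_index].Value = value;   // fast path: slot already exists
            else
                SetValueSlow(value);           // first write from this thread
        }
    }

    void SetValueSlow(T value)
    {
        Slot[] slots = ts_slotArray;
        if (slots == null)
        {
            // Step 1: first ThreadLocal access on this thread.
            slots = new Slot[_index + 1];
            ts_slotArray = slots;
        }
        else if (_index >= slots.Length)
        {
            // Step 2: the array is too small for this instance's index.
            Array.Resize(ref slots, _index + 1);
            ts_slotArray = slots;
        }
        // Step 3: create the slot and store it at this instance's index.
        slots[_index] = new Slot { Value = value };
    }
}
```

Each thread only ever touches its own array on this path, which is why the common read and write need no locking in this sketch.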

Removing and disposing of ThreadLocals

So that’s what happens when values are set. What about when the thread is no longer running, or the ThreadLocal is disposed? Both require values to be removed or unset in the arrays and untangled from the linked lists; otherwise any values set will just stay there, won’t be collected, and will cause a memory leak.

  1. ThreadLocal.Dispose

    When an instance of ThreadLocal is disposed or finalized, it needs to clear the instances of LinkedSlot in all the referenced slot arrays. Fortunately, this is quite easy to do – it simply iterates through the linked list starting at m_linkedSlot, and clears the entries. Finally, it returns the slot index it was using to the IdManager class to be reused when the next instance of ThreadLocal is created.

  2. Thread exit

    Dealing with a thread exit is harder, as there isn’t a global event that fires whenever a thread exits. Fortunately, a little-known feature of thread statics can be used to clear up the slot array belonging to a thread that has exited.
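The disposal walk from the first case above can be sketched as follows. Node is a hypothetical stand-in for LinkedSlot, DisposeSketch.ClearAll is an invented name, and the locking the real type needs around this walk is omitted.

```csharp
using System;

// Hypothetical stand-in for LinkedSlot, reduced to what the walk needs.
sealed class Node
{
    public Node Next;
    public Node[] SlotArray;   // the per-thread array that holds this node, if any
    public object Value;
}

static class DisposeSketch
{
    // Clears every slot reachable from the empty head node and
    // returns how many array entries were cleared.
    public static int ClearAll(Node head, int slotIndex)
    {
        int cleared = 0;
        for (Node node = head.Next; node != null; node = node.Next)
        {
            Node[] array = node.SlotArray;
            if (array == null)
                continue;               // no array holds this node any more

            node.SlotArray = null;
            array[slotIndex] = null;    // drop the array's reference to the node
            node.Value = null;          // let the stored value be collected
            cleared++;
        }
        head.Next = null;               // detach the whole list from the instance
        return cleared;
    }
}
```

After a walk like this, the slot index no longer has any live entries in any thread's array, so it is safe to hand it back for reuse.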

Detecting thread exits

Normal static fields, once the type has been initialized, stay around until the AppDomain exits. That means that any object being referenced by a static field won’t be collected until the field is explicitly cleared.

However, thread static fields are different. The CLR keeps track of which threads are active, and which have exited. It can link this to the various values stored in a thread static field. This means that any value set on a thread static field belonging to a thread that has exited, and that isn’t referenced by anything else, is eligible for garbage collection, and will be collected the next time the garbage collector runs.
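This behaviour can be observed with a small experiment: store an object with a finalizer in a thread static field, let the thread exit, then force a collection. ExitMarker and ThreadStaticDemo are hypothetical names, and since garbage collection timing is not guaranteed, the count printed may vary between runtimes and runs; treat this as a sketch rather than a reliable test.

```csharp
using System;
using System.Threading;

// Hypothetical marker type: it exists only so we can see its finalizer run.
sealed class ExitMarker
{
    public static int Finalized;   // number of finalizers that have run so far

    ~ExitMarker()
    {
        Interlocked.Increment(ref Finalized);
    }
}

static class ThreadStaticDemo
{
    [ThreadStatic]
    static ExitMarker ts_marker;

    public static void Run()
    {
        // The worker assigns its own copy of the thread static field, then exits.
        var worker = new Thread(() => { ts_marker = new ExitMarker(); });
        worker.Start();
        worker.Join();

        // Nothing else references the marker, so once the thread has gone
        // the collector is free to finalize it. Timing is up to the runtime.
        GC.Collect();
        GC.WaitForPendingFinalizers();

        Console.WriteLine("Finalizers run: " + Volatile.Read(ref ExitMarker.Finalized));
    }
}
```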

This feature is exploited by ThreadLocal to clear up the slot arrays of exited threads. This is primarily performed by the FinalizationHelper class, which is created and assigned to a thread static field when the slot array is first created and assigned.

FinalizationHelper

This class exists only for its finalizer. When a thread exits, the instance of FinalizationHelper assigned to that thread’s ts_finalizationHelper field becomes eligible for collection. If and when the garbage collector next runs, it picks up this instance and runs its finalizer. The finalizer removes any non-empty slots from the linked lists of active ThreadLocal instances, unless the instance was created to track all values; in that case the values must be kept so that a later call to ThreadLocal.Values can return every value ever set on that instance.
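From the outside, the effect of keeping those values is easy to see with the public API. The sketch below uses the real ThreadLocal<T>(bool trackAllValues) constructor and Values property from .NET 4.5; ValuesDemo and SumAcrossThreads are hypothetical names for illustration.

```csharp
using System;
using System.Linq;
using System.Threading;

static class ValuesDemo
{
    public static int SumAcrossThreads()
    {
        // Values throws InvalidOperationException unless the instance
        // was created with trackAllValues set to true.
        using (var counter = new ThreadLocal<int>(trackAllValues: true))
        {
            var threads = new Thread[3];
            for (int i = 0; i < threads.Length; i++)
            {
                int amount = i + 1;   // each thread stores 1, 2 or 3
                threads[i] = new Thread(() => counter.Value = amount);
                threads[i].Start();
            }
            foreach (Thread t in threads)
                t.Join();

            // The writing threads have all exited, but their values are
            // still reachable through the instance.
            return counter.Values.Sum();
        }
    }
}
```

The calling thread never touches counter.Value here, so only the three worker values contribute to the sum.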

Conclusion

So there we are; the upgraded ThreadLocal. It’s an improvement on the old version, in that it allows access to all the values ever set on an instance of ThreadLocal, it doesn’t fall back on the thread local data store, and it doesn’t pollute the namespace with thousands of generic instances of holder classes. Much better!