The concurrent collections, located in the System.Collections.Concurrent
namespace, were introduced in .NET 4 as thread-safe collections that could be used without locking, and there are plenty of posts and articles out on the interwebs giving an overview of the collections and how to use them, so I won't go over that again here.
Instead, I'll be focusing these posts on how the concurrent collections are implemented: how they achieve thread-safety, and the principles behind their implementation. Not only because I find that sort of thing quite interesting, but also to give you ideas about how you might use these principles in your own code.
Note, however, that writing bulletproof thread-safe collections is hard. Really hard. Think carefully about whether you can use one of the existing collections before trying to write or adapt your own.
What is a ‘thread-safe’ collection?
First of all, we need to understand what we mean by 'thread-safe'. Well, let's start with the repository of all human knowledge – Wikipedia:

> A piece of code is thread-safe if it only manipulates shared data structures in a manner that guarantees safe execution by multiple threads at the same time

OK, well, as an example, if `m_Collection` is some sort of 'thread-safe' `ICollection<T>`, what will be the result of these two threads running concurrently?
| Thread 1 | Thread 2 |
|---|---|
| `m_Collection.Add(m_Item);` | `bool removed = m_Collection.Remove(m_Item);` |
The answer depends on exactly which order Thread 1 and Thread 2 run with respect to each other. So, whether `m_Item` is in `m_Collection` after these two threads have run depends entirely on the whim of the operating system. Thread 2 could try and remove an item that's not in the collection, setting `removed` to false, then Thread 1 adds the item to the collection. Or Thread 1 could run first, adding the item, which is then immediately removed by Thread 2, setting `removed` to true. Thread 1 could then merrily carry on executing, assuming `m_Item` is in `m_Collection`, not knowing it's been immediately removed by Thread 2.
That, however, is an implementation detail of whoever wrote the code for Thread 1 and Thread 2. The important thing is that, after these two threads have run the above code in some order, `m_Collection` will either have the item in it, or not; it won't be in some corrupted half-state where some internal data structures think it has the item and some don't, so that (for example) `m_Collection.Contains(m_Item)` returns false but the enumerator still returns the item. So, I propose that this is what is meant by a thread-safe collection:
A thread-safe collection is one that can be modified by multiple threads at the same time without corrupting itself.
Achieving concurrency
There is a very simple way of writing a thread-safe collection: implement `ICollection<T>` by wrapping a `List<T>`, locking on the backing list object in every method call (if you're unclear about what locks are, I'll be covering them below). This means that only one thread can access and modify the backing list at once, so thread-safety is maintained. However, this isn't a very good solution; many threads could be blocked by one costly list resize operation. This solution doesn't have very good concurrency.
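As a sketch of this naive approach (the class and member names here are my own, not from the BCL, and only a few of the `ICollection<T>` members are shown):

```csharp
using System.Collections.Generic;

// Naive 'thread-safe' collection: lock the backing list in every
// method, so only one thread can touch it at a time.
public class LockedCollection<T> {
    private readonly List<T> m_List = new List<T>();

    public void Add(T item) {
        lock (m_List) { m_List.Add(item); }
    }

    public bool Remove(T item) {
        lock (m_List) { return m_List.Remove(item); }
    }

    public int Count {
        get { lock (m_List) { return m_List.Count; } }
    }
}
```

Every caller serialises on the same lock, so one slow operation in one thread (a list resize, say) stalls every other thread using the collection.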
Ideally, we want collections that don’t keep threads waiting. As we’ll see, two of the collections (ConcurrentStack and ConcurrentQueue) achieve thread-safety using no locks at all, and the other collections minimise the chances of two threads blocking on the same lock.
Locking and synchronization primitives
You can't just create a lockless thread-safe data structure out of nowhere; you need some underlying support from the hardware, operating system and runtime to achieve this. Before we look at the collections themselves, we need to understand these primitives and what guarantees they provide.
- Atomic types
Some types in .NET are specified as being ‘atomic’. That is, if you change the value of a variable of an atomic type, it is guaranteed to appear to change value instantaneously; another thread won’t be able to see the variable ‘half-changed’. Object references and primitive types of 4 bytes or shorter are always atomic (ints, chars, booleans, floats etc), but 8-byte primitives are only atomic when running on a 64-bit process. The atomicity of object references is especially important to lockless collections, as we’ll see in later posts.
- Volatile variables
I discussed volatility in a previous post. To recap, marking a variable as volatile means that:
- The JIT compiler won’t cache the variable in a cpu register; it will always read and write to it using the original memory location. This is important when multiple threads are changing a variable at the same time, so that any changes made to the variable in one thread are immediately picked up by other threads.
- A read or write to a volatile location introduces a memory barrier at that point, so that other reads and writes won’t be reordered with respect to each other. This is important later on, as we’ll see.
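As a small sketch of the first point (the names here are illustrative), a typical use of `volatile` is a stop flag set by one thread and read in a loop by another:

```csharp
public class Worker {
    // Without 'volatile', the JIT could cache m_Stopped in a register,
    // and the loop in DoWork might never observe Stop()'s write.
    private volatile bool m_Stopped;

    public void Stop() {
        m_Stopped = true;
    }

    public void DoWork() {
        while (!m_Stopped) {
            // perform a unit of work
        }
    }
}
```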
- Locks
Mutual-exclusion locks are one of the more basic synchronization primitives. Each object has a lock associated with it, and by locking on an object you only allow one thread to have ‘ownership’ of that lock at a time. This allows only one thread to execute the section of code protected by that lock at a time.
When the currently executing thread ‘releases’ the lock, another thread (and only one other thread) is then allowed to take control of the lock and execute the protected code. Any other threads waiting to take the lock are blocked and cannot continue executing until it is their turn to take the lock.
C# has special syntax for locking on an object:
```csharp
private readonly object m_LockObj = new object();

public void SynchronizedMethod() {
    lock (m_LockObj) {
        // protected code
    }
}
```

C# actually compiles this method to something like this:
```csharp
public void SynchronizedMethod() {
    bool lockTaken = false;
    object lockObj = m_LockObj;
    try {
        System.Threading.Monitor.Enter(lockObj, ref lockTaken);
        // protected code
    }
    finally {
        if (lockTaken)
            System.Threading.Monitor.Exit(lockObj);
    }
}
```

This uses the `System.Threading.Monitor` class to implement the lock. The call to `Monitor.Enter` blocks the thread until it can take control of the lock, and the lock is released by `Monitor.Exit`.
- System.Threading.Interlocked
`Interlocked` provides various methods for performing operations that wouldn't normally be atomic in an atomic fashion. For example, `Interlocked.Read` allows an 8-byte `long` to be read as an atomic operation (remember, 8-byte primitives are only atomic on 64-bit processes), `Interlocked.Add` allows you to perform `a = a + b` (aka `a += b`) atomically, `Interlocked.Decrement` performs `a = a - 1` (aka `--a`) atomically, etc.

The most important of these is `Interlocked.CompareExchange`. This family of methods performs the following operation atomically (using the generic overload as an example):

```csharp
public static T CompareExchange<T>(ref T location, T value, T comparand)
        where T : class {
    T originalValue = location;
    if (location == comparand)
        location = value;
    return originalValue;
}
```

This might not seem like a particularly useful atomic operation, but it is crucial to the lockless collections, as we'll see.
- System.Threading.SpinWait
Introduced in .NET 4, this structure encapsulates the idea of a ‘spinwait’:
```csharp
while (!variableThatShouldBeTrueToContinue) {}
```

This keeps the thread continually spinning round the while loop, repeatedly checking whether another thread has set `variableThatShouldBeTrueToContinue` to true. However, performing such a spinwait gives no guarantees that the thread that is meant to set `variableThatShouldBeTrueToContinue` will be given the chance to execute; the operating system could, if it wishes, simply keep on executing the spinwait and not switch to other threads that actually change this variable. `System.Threading.SpinWait` gets round this problem by, as well as simply spinning, occasionally calling `Thread.Sleep` and `Thread.Yield`. This will respectively encourage and force the operating system to execute other threads, giving a chance for the spinning condition to be satisfied.
Using the concurrent collections
At this point it’s also worth pointing out that using a thread-safe collection will not make the rest of your code thread-safe and free of race conditions; it is not a panacea. As in the example at the top of this post, using a thread-safe collection does not stop a race condition between adding and removing the same item from the collection. Using a thread-safe collection simply means you have one less thing to worry about when writing threaded code. You still have the rest of your code to worry about!
That's it for introductions; in the next post, we'll look at the simplest of the concurrent collections, `ConcurrentStack`.