The first concurrent collection we’ll look at is ConcurrentStack<T>. This is conceptually the same as System.Collections.Generic.Stack<T>, but is geared towards occasional concurrent modifications.

Now, in these posts I won’t be looking to explain what every method does; just like my other explorations of various collection types, I’ll be concentrating on the core implementation and concepts used by the collections, and glossing over the more sordid implementation details and scaffolding required to support an ICollection.

So, without further ado, let’s get started:
So, without further ado, let’s get started:
ConcurrentStack
There are three basic operations you can perform on a stack:

- Push: pushes a new value onto the top of the stack
- TryPeek: peeks at the top value without removing it. In ConcurrentStack, this is quite a simple method.
- TryPop: pops the top value off the stack

ConcurrentStack supplements these with PushRange and TryPopRange for pushing and popping an array of values, ToArray, and the standard ICollection methods.
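As a quick illustration of these operations, here's a minimal usage sketch against the real System.Collections.Concurrent API:

```csharp
using System;
using System.Collections.Concurrent;

class StackDemo
{
    static void Main()
    {
        var stack = new ConcurrentStack<int>();

        // Push: add values onto the top of the stack
        stack.Push(1);
        stack.Push(2);
        stack.Push(3);

        // TryPeek: look at the top value without removing it
        stack.TryPeek(out int top);
        Console.WriteLine(top);    // 3

        // TryPop: remove and return the top value
        stack.TryPop(out int popped);
        Console.WriteLine(popped); // 3

        // A second peek now sees the value underneath
        stack.TryPeek(out int next);
        Console.WriteLine(next);   // 2
    }
}
```

Note that the Try\* methods return false rather than throwing when the stack is empty, which is why they come in try/out form.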
The stack is implemented as a singly-linked list, with nodes represented by the ConcurrentStack<T>.Node private class:

```csharp
private class Node
{
    internal T m_value;
    internal Node m_next;
}
```
The top of the stack is referenced by the volatile m_head variable. An empty stack is represented by a null value in m_head. For example, if you push the integers 1, 2 and 3 onto the stack in that order, the nodes will look like this:

If you then pop a single value off the stack, m_head will point at the ‘2’ node.
Pushing values
So, there are three steps required to push a new value onto the stack:

1. Create a new Node object to hold the new value
2. Set the node’s m_next field to point to the current head node
3. Change m_head to point to the newly-created node
And, straight away, we’ve got the possibility of a race condition. Say Thread 1 is pushing the value ‘4’ onto the stack. It’s just executed step 2:
Now, Thread 2 is scheduled on the CPU and is pushing the value ‘5’. Thread 2 executes steps 1, 2 and 3:
Thread 1 then executes step 3:
The node successfully pushed by Thread 2 has disappeared from the stack.
So, what we want to do is perform step 3 only if the head hasn’t changed in the meantime. If it has been changed, then go back to step 2 to use the updated head node as the value of m_next. Furthermore, this all has to be done atomically, so other threads can’t see the intermediate state.

This is where Interlocked.CompareExchange comes into play. If you recall from my previous post, CompareExchange performs the following as an atomic operation:
```csharp
public static T CompareExchange<T>(ref T location, T value, T comparand)
    where T : class
{
    T origValue = location;
    if (location == comparand)
        location = value;
    return origValue;
}
```
Or, in words:

I think location has the same value as comparand. If it does, replace it with value. Either way, tell me what location actually was.
This allows us to do the ‘check and assign a new head, else try again’ operation atomically, like so:
```csharp
public void Push(T item)
{
    Node node = new Node(item);
    do
    {
        node.m_next = m_head;
    } while (Interlocked.CompareExchange(ref m_head, node, node.m_next) != node.m_next);
}
```
Or, in words:

1. Create a new node
2. Set the node’s next field to the current head
3. Atomically: “I think the current head is the value of the node’s next field. If it is, replace it with the new node.” If the head wasn’t actually the same as the next field, go back to 2.
This has eliminated the race condition we had between steps 2 and 3. Only if the head hasn’t been changed is the new node pushed onto the stack as the new head. This loop repeats until CompareExchange successfully sets the head to the new node.

This can easily be generalised to a range of values in PushRange: we simply change step 1 to construct the linked list of nodes holding the values being pushed, then use the m_next field of the tail node in step 2, and the top node of the new list as the new head in step 3, like so:
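As a sketch, the core of PushRange might look like this. This is a simplified reconstruction for illustration, with its own minimal Node scaffolding; it is not the actual BCL source, which also validates its arguments and tries a fast path first:

```csharp
using System;
using System.Threading;

class SketchStack<T>
{
    private class Node
    {
        internal T m_value;
        internal Node m_next;
        internal Node(T value) { m_value = value; }
    }

    private Node m_head; // volatile in the real implementation

    public void PushRange(T[] items)
    {
        if (items == null || items.Length == 0) return;

        // Step 1: build the chain of new nodes locally. After this loop,
        // 'head' holds the last array element (the new top of the stack)
        // and 'tail' holds the first (the bottom of the new chain).
        Node tail = new Node(items[0]);
        Node head = tail;
        for (int i = 1; i < items.Length; i++)
        {
            Node node = new Node(items[i]);
            node.m_next = head;
            head = node;
        }

        // Steps 2 & 3: point the tail at the current head, then atomically
        // swap the whole chain in as the new head, retrying on contention.
        do
        {
            tail.m_next = m_head;
        } while (Interlocked.CompareExchange(ref m_head, head, tail.m_next) != tail.m_next);
    }

    public bool TryPop(out T item)
    {
        Node topNode;
        do
        {
            topNode = m_head;
            if (topNode == null) { item = default(T); return false; }
        } while (Interlocked.CompareExchange(ref m_head, topNode.m_next, topNode) != topNode);
        item = topNode.m_value;
        return true;
    }
}
```

Pushing { 1, 2, 3 } this way leaves 3 on top, the same order a loop of individual Push calls would produce; the whole range appears on the stack in a single atomic swap of m_head.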
So that’s pushing values onto the stack. What about removing them?
Popping values
As you might expect, popping items from the stack can be done in a similar way to pushing them:

```csharp
Node topNode;
do
{
    topNode = m_head;
} while (Interlocked.CompareExchange(ref m_head, m_head.m_next, topNode) != topNode);
```
However, it’s not quite as simple as that. For one thing, the stack may be empty, so m_head might be null. For another, you need to be able to pop several items off the stack as a single atomic operation, which requires navigating several m_next references at once. We need to be slightly more clever about it.

Let’s cover checking for an empty stack first. Checking for null is easy, right? if (m_head == null) return false; So we put this at the start of the pop method:
```csharp
public bool TryPop(out T item)
{
    if (m_head == null)
    {
        item = default(T);
        return false;
    }
    Node topNode;
    do
    {
        topNode = m_head;
    } while (Interlocked.CompareExchange(ref m_head, m_head.m_next, topNode) != topNode);
    item = topNode.m_value;
    return true;
}
```
However, again, we’ve got a race condition. The null check only happens once. If another thread were to empty the stack while the current thread is inside the while loop, this leaves us with a NullReferenceException lying in wait when we try and get m_head.m_next.

The simple way to fix this is to do the null check inside the loop, and read the next node from the topNode snapshot rather than m_head itself, eliminating the race condition:
```csharp
public bool TryPop(out T item)
{
    Node topNode;
    do
    {
        topNode = m_head;
        if (topNode == null)
        {
            item = default(T);
            return false;
        }
    } while (Interlocked.CompareExchange(ref m_head, topNode.m_next, topNode) != topNode);
    item = topNode.m_value;
    return true;
}
```
Sorted. So, what about popping multiple values off the stack? Again, like the null check, this has to be performed inside the while loop, using the ‘snapshot’ of the head we’ve taken in topNode:
```csharp
public int TryPopRange(T[] items, int count)
{
    int i;
    Node topNode;
    Node curNode; // declared outside the loop so the while condition can see it
    do
    {
        topNode = m_head;
        curNode = topNode;
        i = 0;
        while (i < count && curNode != null)
        {
            items[i] = curNode.m_value;
            curNode = curNode.m_next;
            i++;
        }
    } while (Interlocked.CompareExchange(ref m_head, curNode, topNode) != topNode);
    return i;
}
```
In fact, this idea of taking the current head, doing some work on it, then atomically swapping it if it hasn’t been changed in the meantime can be generalised:
- Take an atomic snapshot of the current state
- Do some work on the snapshot locally to the current thread
- Atomically make the work visible to all other threads, if the snapshot you took is still a valid representation of the state
- If it isn’t, go back to the start and try again
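That four-step idiom can be captured in a small generic helper. This is a sketch of the pattern only, not anything that exists in the BCL:

```csharp
using System;
using System.Threading;

static class LockFree
{
    // 1. take an atomic snapshot of the state;
    // 2. compute a new state locally, invisible to other threads;
    // 3. publish it with CompareExchange only if the snapshot is still current;
    // 4. otherwise, loop round and try again.
    public static void Update<T>(ref T location, Func<T, T> transform)
        where T : class
    {
        T snapshot, updated;
        do
        {
            snapshot = location;
            updated = transform(snapshot);
        } while (Interlocked.CompareExchange(ref location, updated, snapshot) != snapshot);
    }
}
```

With a helper like this, pushing a node is just a call such as LockFree.Update(ref m_head, head => NewNode(item, head)), where NewNode is a hypothetical factory that sets m_next to the old head.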
This is, at its core, how ConcurrentStack works.
Performing atomic operations
So, what underlying aspects of the system let us perform these operations atomically?
- Object references are atomic. This lets us take an atomic snapshot of the current state simply by storing the value of m_head in a local variable; m_head will always be pointing at a valid node (or null), it won’t be ‘half-changed’.
- Interlocked.CompareExchange is atomic. This ensures the check-and-swap of the head won’t be afflicted by race conditions with other threads.
- m_head is a volatile variable. Because m_head is marked as volatile, any changes to that variable by one thread are immediately visible to all other threads on the system; there’s no caching to get in the way. Any threads that are attempting to modify m_head will be forced to try again, rather than overwriting the changes already made.
Gory implementation details
Now, I said at the start that I wouldn’t be covering gory implementation details of these collections. However, in this case, there are some important considerations.
If you decompile ConcurrentStack, you’ll notice there’s a Push method and a PushCore method (and similarly for TryPop and TryPopRange). This is a small but significant performance detail: ConcurrentStack assumes there aren’t going to be any thread collisions. The public methods all optimistically attempt to perform the operation once; only if another thread gets in the way do the corresponding Core methods get executed. It is these that contain the full loop logic, along with calls to SpinWait.SpinOnce to try and get the other threads out of the way before they try again.
In fact, TryPopCore goes one step further, and implements a randomised exponential backoff to reduce the chances of several threads all waiting the same amount of time. It does mean that, although ConcurrentStack is lockless, and so eliminates the risk of deadlock, performance will suffer if there are many threads all modifying the stack at the same time, as each method may have to try several times before its operation succeeds.
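Put together, the optimistic fast path and the backed-off retry loop look something like this. This is a sketch of the overall shape with its own minimal scaffolding; the method names mirror the decompiled ones, but the bodies are heavily simplified compared to the real code, which (for instance) also special-cases single-processor machines:

```csharp
using System;
using System.Threading;

class SpinningStack<T>
{
    private class Node
    {
        internal T m_value;
        internal Node m_next;
        internal Node(T value) { m_value = value; }
    }

    private Node m_head; // volatile in the real implementation

    public void Push(T item)
    {
        Node node = new Node(item);
        node.m_next = m_head;
        // Optimistic fast path: assume no other thread interferes.
        if (Interlocked.CompareExchange(ref m_head, node, node.m_next) == node.m_next)
            return;
        // Contention: fall back to the full loop with spinning.
        PushCore(node);
    }

    private void PushCore(Node node)
    {
        SpinWait spin = new SpinWait();
        do
        {
            spin.SpinOnce();
            node.m_next = m_head;
        } while (Interlocked.CompareExchange(ref m_head, node, node.m_next) != node.m_next);
    }

    public bool TryPop(out T item)
    {
        Random random = new Random();
        int backoff = 1;
        SpinWait spin = new SpinWait();
        while (true)
        {
            Node topNode = m_head;
            if (topNode == null)
            {
                item = default(T);
                return false;
            }
            if (Interlocked.CompareExchange(ref m_head, topNode.m_next, topNode) == topNode)
            {
                item = topNode.m_value;
                return true;
            }
            // Contention: spin for a random number of iterations, doubling
            // the upper bound on each failed attempt (randomised exponential
            // backoff), so colliding threads don't all retry in lockstep.
            int spins = random.Next(1, backoff + 1);
            for (int i = 0; i < spins; i++)
                spin.SpinOnce();
            if (backoff < 4096) backoff *= 2;
        }
    }
}
```

The uncontended case therefore costs a single CompareExchange, and the spinning machinery is only paid for when a collision actually happens.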
That’s the essence of how ConcurrentStack is implemented. Next time, I’ll take a look at the second lockless concurrent collection, ConcurrentQueue, which goes about achieving thread-safety in quite a different way.
Footnote
I’m now on twitter! If you want to get in touch, comment on the series, or suggest other things I could look at, my username is @simonmcooper