{"id":3403,"date":"2011-09-16T18:51:00","date_gmt":"2011-09-16T18:51:00","guid":{"rendered":"https:\/\/test.simple-talk.com\/uncategorized\/the-net-dictionary\/"},"modified":"2016-07-28T10:50:34","modified_gmt":"2016-07-28T10:50:34","slug":"the-net-dictionary","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/blogs\/the-net-dictionary\/","title":{"rendered":"The .NET Dictionary"},"content":{"rendered":"<p>To many people, <code>System.Collections.Generic.Dictionary&lt;TKey,TValue&gt;<\/code> is just a useful collection. In this post, I&#8217;ll be looking inside that collection and see how it really works.<\/p>\n<p><code>Dictionary<\/code> is based on a hashtable; for the rest of this post, I&#8217;ll assume you know roughly how a hashtable works. The <a href=\"http:\/\/en.wikipedia.org\/wiki\/Hashtable\">Wikipedia article<\/a>, as the source of all knowledge algorithmical, provides a good overview. It will also help if you&#8217;ve got the class open in Reflector so you can see what&#8217;s going on yourself.<\/p>\n<h4>The basics<\/h4>\n<p>The core of the <code>Dictionary<\/code> type is a standard hashtable, straight out of an algorithms book. However, it&#8217;s interesting design comes from how it deals with hash collisions, using a variant of <em>chaining<\/em>.<\/p>\n<p>When you add an item to a hash table, the hash code of that item determines which slot in a backing array the item is stored. In the perfect case, every item would have a separate hash code, and so every item would have a separate slot in the array. To find an item, you simply hash it and go to that slot in the backing array.<\/p>\n<p>However, two or more items can hash to the same hash code, and you get a hash collision. When this happens, and you&#8217;re using chaining, the extra item is stored in a linked list hanging off the array slot. How does the .NET Dictionary handle this?<\/p>\n<h4>Storing entries<\/h4>\n<p>To store its data, the Dictionary uses an array of <code>Entry<\/code> structs: <\/p>\n<pre>private struct Entry {\n    public TKey key;\n    public TValue value;\n    public int hashCode;\n    public int next;\n}<\/pre>\n<p><code>key<\/code> and <code>value<\/code> should be self-explanatory. <code>hashCode<\/code> provides a fast lookup to the original hash code for the functions that need it, and <code>next<\/code> is used when dealing with hash collisions.<\/p>\n<p>So, the backing array is a <code>Entry[]<\/code> called <code>entries<\/code>. However, where the entries go in this array <em>isn&#8217;t<\/em> determined by the hash code. There&#8217;s actually a separate array of ints called <code>buckets<\/code> alongside <code>entries<\/code>. How is this used?<\/p>\n<h4>Hashing items<\/h4>\n<p>When adding an item, the hash code is generated modulo the current array size, and that determines the slot the item is stored in. However, that slot is not the one in <code>entries<\/code>, it is actually the one in <code>buckets<\/code>. The value in <code>buckets<\/code> at the hashed index is then the index of the slot in <code>entries<\/code> that the data is actually stored at, and that is simply assigned to the next free slot in the array.<\/p>\n<p>As an example, say you&#8217;ve got a Dictionary with a capacity of 5:<\/p>\n<p> <img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/blogbits\/simon.cooper\/Dictionary1.png\" alt=\"Dictionary1.png\" \/><\/p>\n<p>You add an item to it, that has a hash code of 3, modulo the size of the dictionary. The actual item goes in the first available slot in <code>entries<\/code>, and the slot at index 3 in <code>buckets<\/code> then stores the index at which the item is stored in <code>entries<\/code>; in this case, 0:<\/p>\n<p> <img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/blogbits\/simon.cooper\/Dictionary2.png\" alt=\"Dictionary2.png\" \/><\/p>\n<p>You then add two more items to it, with hash codes 0 and 2, respectively:<\/p>\n<p> <img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/blogbits\/simon.cooper\/Dictionary3.png\" alt=\"Dictionary3.png\" \/><\/p>\n<p>Nice and simple. But, you then add a fourth item that also hashes to 3. Now, we&#8217;ve already added an item with a hash code of 3; we&#8217;ve got a hash collision, so the Dictionary will have to chain it. However, it doesn&#8217;t create a separate linked list, it stores the index of the next item in the chain in the <code>next<\/code> value of the <code>Entry<\/code> struct like so:<\/p>\n<p> <img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/blogbits\/simon.cooper\/Dictionary4.png\" alt=\"Dictionary4.png\" \/><\/p>\n<p>If we were to add another item, also with a hash code of 3, the chain would be extended:<\/p>\n<p> <img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/blogbits\/simon.cooper\/Dictionary5.png\" alt=\"Dictionary5.png\" \/><\/p>\n<p>This means, that to search for an item with a particular hash code, you go to the <code>Entry<\/code> pointed to by the <code>buckets<\/code> value at that hash code, and follow the <code>next<\/code> pointers until you find the item you need, or the null pointer (signified by a <code>next<\/code> of -1; see the <code>FindEntry<\/code> method).<\/p>\n<h4>Removing items<\/h4>\n<p>So that&#8217;s adding items. What about removing items?<\/p>\n<p>Any item in the dictionary can be removed at any time, which is different to adding items, where they simply get put on the end of the <code>entries<\/code> array. If you&#8217;ve got lots of additions and removals, this can lead to a lot of wasted space as items in the middle of the array get removed, and new items get added to the end. Well, this is where a variable called <code>freeList<\/code> comes into play.<\/p>\n<p>When an item is removed, the slot it occupied gets added to the <em>freelist chain<\/em>. This is a chain of entries, indexed by a single <code>freeList<\/code> variable, which marks array slots where additional items can go. This is chain is also linked by the <code>next<\/code> pointer in the <code>Entry<\/code> structure.<\/p>\n<p>So, this is what it would look like if you removed the blue item:<\/p>\n<p> <img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/blogbits\/simon.cooper\/Dictionary6.png\" alt=\"Dictionary6.png\" \/><\/p>\n<p>then the orange item:<\/p>\n<p> <img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/blogbits\/simon.cooper\/Dictionary7.png\" alt=\"Dictionary7.png\" \/><\/p>\n<p>Then, if another item is added, instead of adding to the end of the array (and causing a full rehash of the contents), it can simply use the slot pointed to by the <code>freeList<\/code> variable:<\/p>\n<p> <img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/blogbits\/simon.cooper\/Dictionary8.png\" alt=\"Dictionary8.png\" \/><\/p>\n<p>Using this indirect mechanism means that the internal arrays don&#8217;t need to be resized, and everything re-hashed, until you actually have more items than you can store in the <code>entries<\/code> array. It also doesn&#8217;t cause much performance degredation when the dictionary gets full, which the non-generic <code>System.Collections.Hashtable<\/code> had problems with.<\/p>\n<h4>Some more random observations&#8230;<\/h4>\n<ol>\n<li><code>Clear<\/code> doesn&#8217;t reset the size of the arrays; if you clear a dictionary with 100,000 items in it, the backing arrays will still be sized to store 100,000 items, potentially wasting a lot of space.<\/li>\n<li>\n<p>The dictionary enumerator simply iterates down the <code>entries<\/code> array. This is why, when you&#8217;re just adding items to a dictionary, they are returned in the order they were added. Only when you remove items and the freelist becomes involved are items returned in non-consecutive order.<\/p>\n<p>However, this is very much an implementation detail, and can change in future versions of .NET; do not rely on this behaviour in your own code! Always treat the the enumeration order of any hashtable as random.<\/p>\n<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>To many people, System.Collections.Generic.Dictionary&lt;TKey,TValue&gt; is just a useful collection. In this post, I&#8217;ll be looking inside that collection and see how it really works. Dictionary is based on a hashtable; for the rest of this post, I&#8217;ll assume you know roughly how a hashtable works. The Wikipedia article, as the source of all knowledge algorithmical,&#8230;&hellip;<\/p>\n","protected":false},"author":186659,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[2],"tags":[],"coauthors":[],"class_list":["post-3403","post","type-post","status-publish","format-standard","hentry","category-blogs"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/3403","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/186659"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=3403"}],"version-history":[{"count":1,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/3403\/revisions"}],"predecessor-version":[{"id":25366,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/3403\/revisions\/25366"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=3403"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=3403"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=3403"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=3403"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}