{"id":2753,"date":"2009-01-16T10:32:00","date_gmt":"2009-01-16T10:32:00","guid":{"rendered":"https:\/\/test.simple-talk.com\/uncategorized\/how-big-is-a-string-in-net\/"},"modified":"2017-10-20T14:35:39","modified_gmt":"2017-10-20T14:35:39","slug":"how-big-is-a-string-in-net","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/blogs\/how-big-is-a-string-in-net\/","title":{"rendered":"How big is a string in .NET?"},"content":{"rendered":"<p class=\"MsoNormal\">Typically, on 32-bit .NET, the size of an object is 8 bytes for the object header plus the sum of the sizes of its fields.\u00a0 Consider this simple object:<\/p>\n<pre class=\"lang:tsql decode:true\">\u00a0\u00a0\u00a0 class ThreeFields\r\n\u00a0\u00a0\u00a0 {\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 double d;\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 object o;\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 int i;\r\n\u00a0\u00a0\u00a0 }<\/pre>\n<p class=\"MsoNormal\">The size of a ThreeFields object is 8 bytes (for the header) + 8 bytes (for the double) + 4 bytes (for the object pointer) + 4 bytes (for the integer) = 24 bytes.\u00a0 But what about a string?<\/p>\n<p class=\"MsoNormal\">A string is composed of:<\/p>\n<ol>\n<li class=\"MsoNormal\">An 8-byte object header (a 4-byte SyncBlock and a 4-byte type descriptor).<\/li>\n<li class=\"MsoNormal\">An int32 field for the length of the string (this is returned by String.Length).<\/li>\n<li class=\"MsoNormal\">An int32 field for the number of chars in the character buffer.<\/li>\n<li class=\"MsoNormal\">The first character of the string in a System.Char.<\/li>\n<li class=\"MsoNormal\">The rest of the string in a character buffer, ending with a null terminator char.\u00a0\u00a0 If you don&#8217;t believe me about the null terminator, have a poke around with Reflector.\u00a0 String.AppendInPlace() demonstrates this nicely.\u00a0 One reason this is done is to aid unmanaged interop.\u00a0 Otherwise, every time you marshal a string you&#8217;d need to copy 
it and add the &#8221;<\/li>\n<\/ol>\n<p class=\"MsoNormal\">So the size of a string is 18 + (2 * number of characters) bytes.\u00a0 (In reality, another 2 bytes are sometimes added as padding to ensure 32-bit alignment, but I&#8217;ll ignore that).\u00a0 Two bytes are needed for each character, since .NET strings are UTF-16.\u00a0\u00a0 Thus, a string like this:<\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 String mySimpleString = &#8220;Hello&#8221;;<\/p>\n<p class=\"MsoNormal\">\u00a0<\/p>\n<p class=\"MsoNormal\">will be 18 + (2 * 5) = 28 bytes.\u00a0 Splendid.\u00a0 Now consider this snippet of code.\u00a0 It creates a String and a StringBuilder with the same contents.<\/p>\n<p class=\"MsoNormal\">\u00a0<\/p>\n<p class=\"MsoNormal\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 String atCompile = &#8220;123456789&#8221;;<br \/>\n \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 StringBuilder buildingMyString = new StringBuilder(&#8220;123&#8221;);<br \/>\n \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 buildingMyString.Append(&#8220;456&#8221;);<br \/>\n \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 buildingMyString.Append(&#8220;789&#8221;);<br \/>\n \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 String fromStringBuilder = buildingMyString.ToString();<\/p>\n<p> This creates two strings, atCompile and fromStringBuilder, which both read &#8220;123456789&#8221;.\u00a0 How big are atCompile and fromStringBuilder?\u00a0 You might think:<\/p>\n<p class=\"MsoNormal\">\u00a0<\/p>\n<p class=\"MsoNormal\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 18 + (2 * 9) = 36 bytes<\/p>\n<p class=\"MsoNormal\">The String atCompile is 36 bytes, as expected. 
fromStringBuilder has the same contents, but is 50 bytes.\u00a0 Eh?\u00a0 What&#8217;s going on there?<\/p>\n<p class=\"MsoNormal\">\u00a0<\/p>\n<p class=\"MsoNormal\">This weird behaviour is down to how System.StringBuilder does a ToString().\u00a0 Like most people, I believed it allocated a new string and then copied the contents of the StringBuilder in.\u00a0 In reality, it just returns a reference to the string underpinning the StringBuilder.\u00a0 StringBuilders work by using strings backed with char buffers that grow in size in powers of two.\u00a0 A string does not need to be backed by a char buffer matching the string length; &#8216;expansion room&#8217; is permitted.\u00a0 In this case, fromStringBuilder is backed by a 16-character buffer, so is 18 + (2*16) = 50 bytes, as observed.<\/p>\n<p class=\"MsoNormal\">\u00a0<\/p>\n<p class=\"MsoNormal\">But what happens if the StringBuilder is then edited?\u00a0 Doesn&#8217;t the String we just got from ToString() then become invalid?\u00a0\u00a0 It would, if the StringBuilder kept writing to the same buffer.\u00a0 Instead, when you do the next append, StringBuilder copies the existing contents to a new string, and uses this new string from then on.\u00a0 The String we got via ToString() continues to point to the String that has been discarded by the StringBuilder.\u00a0 This String is still backed by an over-sized char[].<\/p>\n<p class=\"MsoNormal\">\u00a0<\/p>\n<p class=\"MsoNormal\">I presume Microsoft made this design decision because it is a common idiom to create a StringBuilder, append to it, ToString() it, and then never use it again.\u00a0 Copying all those bytes from the StringBuilder to a String would be a waste.\u00a0 Even if you then append to the StringBuilder after doing your ToString(), the resulting copy of the StringBuilder&#8217;s underlying string takes no more time than copying it during the ToString() would have.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Typically the size of an object is 8 bytes for the object header plus the sum of the fields.\u00a0 Consider this 
simple object: \u00a0\u00a0\u00a0 class ThreeFields \u00a0\u00a0\u00a0 { \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 double d; \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 object o; \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 int i; \u00a0\u00a0\u00a0 } The size of a ThreeFields object is 8 bytes (for header) + 8 bytes (for the&#8230;&hellip;<\/p>\n","protected":false},"author":95472,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[2],"tags":[],"coauthors":[8264],"class_list":["post-2753","post","type-post","status-publish","format-standard","hentry","category-blogs"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/2753","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/95472"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=2753"}],"version-history":[{"count":4,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/2753\/revisions"}],"predecessor-version":[{"id":74752,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/2753\/revisions\/74752"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=2753"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=2753"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=2753"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=2753"}]
,"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}