Typically, on 32-bit .NET, the size of an object is 8 bytes for the object header plus the sum of the sizes of its fields. Consider this simple object:
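A class along these lines would fit. The field names are placeholders of mine; only the field types matter for the arithmetic.

```csharp
// Illustrative sketch: the field names are assumptions, the field types
// are the ones the size calculation relies on.
public class ThreeFields
{
    public double Value;      // 8 bytes
    public object Reference;  // 4-byte object pointer on 32-bit .NET
    public int Count;         // 4 bytes
}
```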
The size of a ThreeFields object is 8 bytes (for the header) + 8 bytes (for the double) + 4 bytes (for the object pointer) + 4 bytes (for the integer) = 24 bytes. But what about a string?
A string is composed of:
- An 8-byte object header (a 4-byte SyncBlock and a 4-byte type descriptor).
- An int32 field for the length of the string (this is returned by String.Length).
- An int32 field for the number of chars in the character buffer.
- The first character of the string in a System.Char.
- The rest of the string in a character buffer, ending with a null terminator char. If you don’t believe me about the null terminator, have a poke around with Reflector; String.AppendInPlace() demonstrates this nicely. One reason this is done is to aid unmanaged interop: otherwise, every time you marshal a string you’d need to copy it and add the null terminator.
So a string size is 18 + (2 * number of characters) bytes: 8 for the header, 4 + 4 for the two int32 fields, and 2 for the null terminator. (In reality, another 2 bytes are sometimes added as padding to ensure 32-bit alignment, but I’ll ignore that.) Two bytes are needed for each character, since .NET strings are UTF-16. Thus, a string like so:
String mySimpleString = "Hello";
will be 18 + (2 * 5) = 28 bytes. Splendid. Now consider this snippet of code. It creates a String and a StringBuilder with the same contents.
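String atCompile = "123456789";
StringBuilder buildingMyString = new StringBuilder("123");
// Appended so that the contents match atCompile
buildingMyString.Append("456");
buildingMyString.Append("789");
String fromStringBuilder = buildingMyString.ToString();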
This should create two strings: atCompile and fromStringBuilder, which both read “123456789”. How big are atCompile and fromStringBuilder? You might think:
18 + (2 * 9) = 36 bytes
The String atCompile is 36 bytes, as expected. fromStringBuilder has the same contents, but is 50 bytes. Eh? What’s going on there?
This weird behaviour is down to how System.StringBuilder does a ToString(). Like most people, I believed it allocated a new string and then copied the contents of the StringBuilder in. In reality, it just returns a reference to the string underpinning the StringBuilder. StringBuilders work by using strings backed by char buffers whose capacity doubles each time it runs out, starting from a default of 16 chars. A string does not need to be backed by a char buffer matching the string length; ‘expansion room’ is permitted. In this case, fromStringBuilder is backed by a 16-character buffer, so is 18 + (2 * 16) = 50 bytes, as observed.
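The arithmetic used so far can be captured in a small helper. This is only a sketch of the estimate described in this post: the class and method names are made up for illustration, it assumes the 32-bit layout listed earlier, and it ignores the occasional 2 bytes of alignment padding.

```csharp
public static class StringSizeEstimate
{
    // 8-byte header + two 4-byte int32 fields + 2 bytes for the null terminator.
    public const int FixedOverheadBytes = 8 + 4 + 4 + 2;

    // bufferChars is the number of chars the backing buffer can hold,
    // which may be larger than the visible length of the string.
    public static int Bytes(int bufferChars)
    {
        return FixedOverheadBytes + 2 * bufferChars;
    }
}
```

With this, StringSizeEstimate.Bytes(5) gives 28 for "Hello", Bytes(9) gives 36 for atCompile, and Bytes(16) gives 50 for a string sitting in a 16-char buffer.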
But what happens if the StringBuilder is then edited? Doesn’t the String we just got from ToString() then become invalid? It would if the StringBuilder carried on writing into the same buffer, so it doesn’t. On the next append, StringBuilder copies the existing contents to a new string and uses that from then on. The String we got via ToString() continues to point to the string that the StringBuilder has discarded, which is still backed by an over-sized char buffer.
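The visible half of this is easy to check. The sketch below (class and method names are hypothetical) takes a string from ToString(), appends to the StringBuilder afterwards, and returns both values. It only shows that the earlier string is unaffected; it cannot confirm the buffer sharing described above, which is an internal detail.

```csharp
using System.Text;

public static class ToStringSnapshotDemo
{
    // Returns the string taken before the extra append, followed by
    // the StringBuilder contents after it.
    public static string[] Run()
    {
        StringBuilder builder = new StringBuilder("123");
        builder.Append("456789");

        string snapshot = builder.ToString();  // taken before the edit
        builder.Append("0");                   // edit after ToString()

        return new[] { snapshot, builder.ToString() };
    }
}
```

The first element of the result still reads "123456789", while the second reads "1234567890".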
I presume Microsoft made this design decision because it is a common idiom to create a StringBuilder, append to it, ToString() it, and then never use it again. Copying all those bytes from the StringBuilder to a String would be a waste. Even if you then append to the StringBuilder after doing your ToString(), the resulting copy of the StringBuilder’s underlying string takes no more time than copying it during the ToString() would have.