{"id":3269,"date":"2011-03-24T11:00:00","date_gmt":"2011-03-24T11:00:00","guid":{"rendered":"https:\/\/test.simple-talk.com\/uncategorized\/anatomy-of-a-net-assembly-methods\/"},"modified":"2016-07-28T10:50:21","modified_gmt":"2016-07-28T10:50:21","slug":"anatomy-of-a-net-assembly-methods","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/blogs\/anatomy-of-a-net-assembly-methods\/","title":{"rendered":"Anatomy of a .NET Assembly &#8211; Methods"},"content":{"rendered":"<p>Any close look at the method definitions in a .NET assembly has to start off with the method&#8217;s information in the metadata tables &#8211; the <code>MethodDef<\/code>. So let&#8217;s do that.<\/p>\n<h4><code>MethodDef<\/code><\/h4>\n<p>The <code>MethodDef<\/code> entry for the entry point method in my TinyAssembly example used in previous posts has the following bytes:<\/p>\n<pre>D0 20 00 00 00 00 91 00 47 00 0A 00 01 00<\/pre>\n<p> According to the CLR spec, the row is interpreted as follows (multi-byte values are stored little-endian): <\/p>\n<ol>\n<li><code>D0 20 00 00<\/code>: <strong>RVA<\/strong>. The RVA of the method body within the assembly (0x20d0).<\/li>\n<li><code>00 00<\/code>: <strong>ImplFlags<\/strong>. Various flags indicating how the method is implemented. All zeros indicate this is a pure-IL managed method.<\/li>\n<li><code>91 00<\/code>: <strong>Flags<\/strong>. The value 0x0091 combines <code>Private<\/code> (0x0001), <code>Static<\/code> (0x0010) and <code>HideBySig<\/code> (0x0080), so this method is declared <code>private static hidebysig<\/code>. <\/li>\n<li><code>47 00<\/code>: <strong>Name<\/strong>. Offset into the <code>#Strings<\/code> heap (&#8216;Main&#8217;).<\/li>\n<li><code>0A 00<\/code>: <strong>Signature<\/strong>. Offset into the <code>#Blob<\/code> heap of the method&#8217;s signature (its return type &amp; parameter types). I might look at signature encodings in a later post. In this example, the bytes at this offset indicate a method with no parameters and a <code>void<\/code> return type.<\/li>\n<li><code>01 00<\/code>: <strong>ParamList<\/strong>. 
A RID into the <code>ParamDef<\/code> table, marking the first of the method&#8217;s parameter rows. As this method has no parameters, and there is no <code>ParamDef<\/code> table in the assembly, this is essentially ignored.<\/li>\n<\/ol>\n<p>You&#8217;ll notice there&#8217;s no reference to the owning <code>TypeDef<\/code> within a <code>MethodDef<\/code>. In an assembly, associations like this are one-way &#8211; to find which type owns a particular method, you have to scan the <code>TypeDef<\/code> table until you find a type whose method list includes that method.<\/p>\n<h4>Method bodies<\/h4>\n<p>So, a <code>MethodDef<\/code> describes the basic properties of a method &#8211; its name, signature, and implementation details. What about the actual body of a method?<\/p>\n<p>Within the assembly, the method bodies are all located between the CLI header\/strong name hash and the CLR metadata, as you can see:<\/p>\n<p>  <a href=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/blogbits\/simon.cooper\/CLI%20Contents%20annotated.png\"><img decoding=\"async\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/blogbits\/simon.cooper\/CLI%20Contents%20annotated.png\" width=\"500\" alt=\"CLI%20Contents%20annotated.png\" \/><\/a>  <\/p>\n<p>The RVA within a <code>MethodDef<\/code> tells us where the method body can be found. In this case, the RVA is 0x20d0, so the method body comprises the following bytes: <\/p>\n<pre>2E 72 01 00 00 70 28 11 00 00 0A 2A<\/pre>\n<p> How are these bytes interpreted? To start with, we have to understand a bit about CIL opcodes.<\/p>\n<h4>Instruction encoding<\/h4>\n<p>CIL uses a variable-length instruction encoding; each opcode is 1 or 2 bytes long, with all the commonly used instructions using 1 byte. Two-byte opcodes always start with the prefix byte 0xFE. 
After the opcode are bytes representing the argument to that instruction, if there is one.<\/p>\n<p>Many instructions that take an argument come in two versions, a <em>short version<\/em> and a <em>long version<\/em>. For example, the <code>ldarg<\/code> instruction (load argument) is a 2-byte instruction that takes a 2-byte argument with the number of the argument to be loaded. However, very few methods have more than a handful of arguments, so the 1-byte <code>ldarg.s<\/code> instruction takes a single byte with the argument number, which is enough for the first 256 arguments. And, in fact, IL takes this a step further, as there are separate <code>ldarg.0<\/code>, <code>ldarg.1<\/code>, <code>ldarg.2<\/code> and <code>ldarg.3<\/code> instructions that load the first 4 method arguments in only 1 byte.<\/p>\n<p>If we compare the 3 different encodings, we can see how much difference this makes to the size: <\/p>\n<ul>\n<li><code>ldarg 1<\/code><br \/><code>FE 09 01 00<\/code><\/li>\n<li><code>ldarg.s 1<\/code><br \/><code>0E 01<\/code><\/li>\n<li><code>ldarg.1<\/code><br \/><code>03<\/code><\/li>\n<\/ul>\n<p>Since <code>ldarg.1<\/code> is likely to be used many, many times within an assembly, this saves a lot of space.<\/p>\n<p>Within the instruction stream, metadata items are all referenced using a 4-byte <em>token<\/em>, with the table number in the most significant byte and the RID in the other 3 bytes. For example, the IL instruction <\/p>\n<pre>callvirt instance string [mscorlib]System.Object::ToString()<\/pre>\n<p> could get compiled to <\/p>\n<pre>6F 05 00 00 0A<\/pre>\n<p> 0x6f is the encoding for the <code>callvirt<\/code> instruction. The token argument is stored little-endian, so it reads as 0x0A000005: the 5th row of the <code>MemberRef<\/code> table (table number 0x0a), which will have the name and signature of the <code>Object.ToString()<\/code> method.<\/p>\n<h4>Method headers<\/h4>\n<p>Again, there&#8217;s a header at the start of the method body. 
And, in a similar way to the instruction encoding, there are two formats for the header depending on what needs to be specified, a <em>thin<\/em> format and a <em>fat<\/em> format. The two least significant bits of the first byte specify which format is used: <code>10<\/code> for a thin header and <code>11<\/code> for a fat header.<\/p>\n<p>The thin format header takes up a single byte, and is used when a method: <\/p>\n<ul>\n<li>has no local variables<\/li>\n<li>has no exception handling blocks<\/li>\n<li>never has more than 8 items on the stack<\/li>\n<li>has a method body no longer than 63 bytes (not including the header), as the size has to fit in the remaining 6 bits<\/li>\n<\/ul>\n<p> If any of these conditions is not met, the fat header has to be used instead, which is 12 bytes long. This header specifies the size of the method body, the maximum number of items on the stack, a token to the locals signature in the <code>StandAloneSig<\/code> metadata table, and a flag indicating whether there are exception handling tables after the method body.<\/p>\n<p>Thin headers are typically used for simple property getters &amp; setters, which usually do no more than a load field &amp; return or a load argument &amp; store field.<\/p>\n<h4>Decoding method bodies<\/h4>\n<p>We can now decode the method body mentioned above. To recap, the bytes are: <\/p>\n<pre>2E 72 01 00 00 70 28 11 00 00 0A 2A<\/pre>\n<p> So, in order: <\/p>\n<ul>\n<li><code>2E<\/code><br \/> This is the method header. The bits are:\n<pre>2    E\n0010 1110<\/pre>\n<p> The two least significant bits are <code>10<\/code>, indicating this is a thin method header. The remaining six bits, <code>001011<\/code>, give the size of the method body: 11 bytes.<\/p><\/li>\n<li><code>72 01 00 00 70<\/code><br \/><code>ldstr \"Hello World\"<\/code><br \/> 0x72 is the opcode for the <code>ldstr<\/code> instruction, and the argument is a 4-byte token. However, this is no ordinary token; the table number is 0x70, which does not refer to any of the metadata tables. 
In a method body, a token with a table number of 0x70 refers to the <code>#US<\/code> (user string) metadata heap instead, and the lower 3 bytes are the zero-based byte offset within that heap rather than a RID. So this refers to the item in the <code>#US<\/code> heap starting at offset 1 &#8211; the string &#8220;Hello World&#8221;.<\/li>\n<li><code>28 11 00 00 0A<\/code><br \/><code>call void [mscorlib]System.Console::WriteLine(string)<\/code><br \/> 0x28 is the opcode for the <code>call<\/code> instruction, again with a 4-byte token argument. This is a normal token (0x0A000011), and refers to the 17th row of the <code>MemberRef<\/code> table. If we have a look back at the <a href=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/blogbits\/simon.cooper\/CLR%20Metadata%20annotated.png\">metadata tables<\/a>, the 17th row in the <code>MemberRef<\/code> table has the bytes <code>99 00 7E 02 C9 00<\/code>. These three 2-byte columns (Class, Name and Signature) point to the <code>TypeRef<\/code> for <code>System.Console<\/code>, &#8220;WriteLine&#8221; in the <code>#Strings<\/code> heap, and the method signature <code>void(string)<\/code> in the <code>#Blob<\/code> heap. This is everything the CLR needs to know to work out which method to call.<\/li>\n<li><code>2A<\/code><br \/><code>ret<\/code><br \/> 0x2a is the opcode for the <code>ret<\/code> instruction, which takes no arguments.<\/li>\n<\/ul>\n<p>As you can see, this is actually a very simple &#8220;Hello World&#8221; program.<\/p>\n<h4>Conclusion<\/h4>\n<p>I hope this series of posts has given you an insight into how data is actually stored in a .NET assembly. Please do comment or email me if there&#8217;s anything you want me to have a look at in more detail. 
That&#8217;s not the end of the series; the next few posts will have a look at the <a href=\"https:\/\/www.simple-talk.com\/community\/blogs\/simonc\/archive\/2011\/03\/28\/100987.aspx\">DOS<\/a> and <a href=\"https:\/\/www.simple-talk.com\/community\/blogs\/simonc\/archive\/2011\/03\/28\/100993.aspx\">CLR loader<\/a> stubs that are part of every assembly &#8211; why they are there, and what they do.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Any close look at the method definitions in a .NET assembly has to start off with the method&#8217;s information in the metadata tables &#8211; the MethodDef. So lets do that. MethodDef The MethodDef entry for the entrypoint method in my TinyAssembly example used in previous posts has the following bytes: D0 20 00 00 00&#8230;&hellip;<\/p>\n","protected":false},"author":186659,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[2],"tags":[],"coauthors":[],"class_list":["post-3269","post","type-post","status-publish","format-standard","hentry","category-blogs"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/3269","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/186659"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=3269"}],"version-history":[{"count":2,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/3269\/revisions"}],"predecessor-version":[{"id":42012,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/3269\/revisions\/42012"}],"wp:attachment":[{"href":"https:\/\/www.red-ga
te.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=3269"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=3269"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=3269"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=3269"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}