Subterranean IL: Introduction

Today, I’ll be starting a new series of blog posts on ‘Subterranean IL’ – a look at the low-level IL commands available to .NET compilers, what each command does (or at least the more interesting commands) and why each command does what it does. One of the first things I’ll be looking at is the set of IL commands used in a generic method or a method in a generic class. However, to start off, we need to understand the basic data structures and datatypes used by the VES (Virtual Execution System) when executing a method.

The Execution Stack

The execution stack is where all the action happens in a .NET method. Every IL instruction either operates on values currently on the execution stack, or copies values to or from it. However, the stack isn’t generally used to store values throughout a method’s execution; that’s what local variables are for. As an example, here’s a commented outline of the IL corresponding to the following C# method:
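To make the split between the stack and local variables concrete, here is a minimal sketch of a toy interpreter in Python. It is a simulation under stated assumptions, not real IL and not the listing mentioned above: the `run` function and the `PROGRAM` list are hypothetical, the opcode names only borrow IL mnemonics, and 32-bit overflow is ignored.

```python
# Toy model of an execution stack plus local-variable slots.
# Illustration only: this is not how the CLR is implemented.

def run(program, local_count):
    stack = []                        # transient operands live here
    local_slots = [0] * local_count   # values kept across instructions live here
    for op, *args in program:
        if op == "ldc.i4":            # push a constant onto the stack
            stack.append(args[0])
        elif op == "ldloc":           # copy a local variable onto the stack
            stack.append(local_slots[args[0]])
        elif op == "stloc":           # pop the top of the stack into a local
            local_slots[args[0]] = stack.pop()
        elif op == "add":             # pop two operands, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":             # pop two operands, push their product
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "ret":             # the return value is whatever is on top
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode: {op}")
    raise RuntimeError("program ended without ret")

# Hypothetical program, roughly: int x = 2 + 3; return x * x;
PROGRAM = [
    ("ldc.i4", 2),   # stack: 2
    ("ldc.i4", 3),   # stack: 2, 3
    ("add",),        # stack: 5
    ("stloc", 0),    # stack: empty, local 0 = 5
    ("ldloc", 0),    # stack: 5
    ("ldloc", 0),    # stack: 5, 5
    ("mul",),        # stack: 25
    ("ret",),
]

result = run(PROGRAM, 1)  # 25
```

Note how the stack is empty again after `stloc`: the value that needs to survive is parked in a local, and the stack only ever holds the operands of the instruction about to run.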

Execution stack datatypes

There are actually two sets of datatypes used by the CLR – the datatypes used when storing data in the heap, method arguments, local variables, etc, and those stored on the execution stack whilst a program is executing. The latter are quite different to the CTS datatypes, and comprise the following:

  • int32
  • int64
  • native int
  • Float (F)
  • Managed pointer (&)
  • Object reference (O)
  • Value types

There are several things to note about this list:

  1. There are no datatypes less than 4 bytes. This is for performance reasons, as modern computers are optimized to work with 4-byte-aligned values. CTS datatypes that are less than 4 bytes (eg Byte or Int16) are automatically widened to int32 when copied onto the execution stack (sign-extended for signed types such as Int16, zero-extended for unsigned types such as Byte) and truncated when copied back off it.
  2. The integer datatypes are not specified as signed or unsigned; their signedness depends on the instructions that act on them. This means that, for example, casting a uint to an int in C# (in the default unchecked context) actually results in a no-op.
  3. Boolean values turn into int32 values in a similar way to C – false is zero, true is non-zero.
  4. Object references are conceptually the same as unmanaged pointers, except they are severely limited in the operations that can be performed on them in verifiable code.
  5. Object references don’t store information about the type of object they reference, although in verifiable code the operations on O stack elements have to be consistent with the reference type deduced using static analysis.
  6. Single or Double values are represented on the stack using a single datatype that is converted to 32 or 64 bits as necessary, in the same way as 1 and 2-byte integers.
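
The first three points can be sketched in Python by modelling an int32 stack slot as a raw 32-bit pattern. This is a simulation under assumptions, not CLR code: all function names are hypothetical, and `clt` and `clt_un` are merely named after the IL comparison instructions `clt` and `clt.un`.

```python
MASK32 = 0xFFFFFFFF  # an int32 stack slot is modelled as a raw 32-bit pattern

def widen_signed(value, bits):
    """Sign-extend a `bits`-wide pattern to 32 bits (as for SByte or Int16)."""
    low_mask = (1 << bits) - 1
    value &= low_mask
    if value & (1 << (bits - 1)):      # top bit set, so the value is negative
        value |= MASK32 & ~low_mask    # fill the upper bits with ones
    return value

def widen_unsigned(value, bits):
    """Zero-extend a `bits`-wide pattern to 32 bits (as for Byte or UInt16)."""
    return value & ((1 << bits) - 1)

def truncate(slot, bits):
    """Keep only the low `bits` bits when storing to a narrower location."""
    return slot & ((1 << bits) - 1)

def as_signed(slot):
    """Read the 32-bit pattern as a two's-complement signed integer."""
    slot &= MASK32
    return slot - (1 << 32) if slot & 0x80000000 else slot

def as_unsigned(slot):
    """Read the same 32-bit pattern as an unsigned integer."""
    return slot & MASK32

def clt(a, b):
    """Signed less-than; the result is an int32 that is 1 or 0."""
    return int(as_signed(a) < as_signed(b))

def clt_un(a, b):
    """Unsigned less-than on the very same bit patterns."""
    return int(as_unsigned(a) < as_unsigned(b))

slot = widen_signed(0xFF, 8)         # an 8-bit -1 becomes 0xFFFFFFFF on the stack
same_bits = as_unsigned(slot)        # 4294967295: no bits change, only the reading
signed_result = clt(slot, 1)         # 1, because -1 < 1
unsigned_result = clt_un(slot, 1)    # 0, because 4294967295 is not < 1
```

The slot itself never records whether it is signed. Only the choice between `clt` and `clt_un` decides how its bits are read, which is why a signed/unsigned cast needs no instruction at all, and why the comparison result is itself just an int32 holding 1 or 0.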

Now we’ve covered the basics of the execution stack and how it operates, my next post will be looking at what happens when you call a method, and the differences between reference and value types.