Today, I’ll be starting a new series of blog posts on ‘Subterranean IL’ – a look at the low-level IL commands available to .NET compilers, what each command does (or at least the more interesting commands) and why each command does what it does. One of the first things I’ll be looking at is the set of IL commands used in a generic method or a method in a generic class. However, to start off, we need to understand the basic data structures and datatypes used by the VES (Virtual Execution System) when executing a method.
The Execution Stack
The execution stack is where all the action happens in a .NET method. IL instructions all either operate on values currently on the execution stack, or copy values to and from the execution stack. However, the stack isn’t generally used for storage of values throughout a method’s execution; that’s what local variables are for. As an example, here’s a commented outline of the IL corresponding to the following C# method:
    public static int AddFive(int value) {
        int newValue = value + 5;
        return newValue;
    }
    .method public static int32 AddFive(int32 'value') {
        .maxstack 2

        // declare 1 local variable
        .locals init ([0] int32 newValue)

        // int newValue = value + 5
        ldarg.0     // push method argument 0 ('value') onto stack
        ldc.i4.5    // push constant 5 onto stack
        add         // pop two values off stack, push result of addition
        stloc.0     // pop top value off stack
                    // and store in local variable 0 ('newValue')

        // return newValue
        ldloc.0     // push local variable 0 onto stack
        ret         // return from method
                    // with value on stack as the method return value
    }
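To make the push and pop traffic easier to follow, here is a rough C# sketch that mimics the IL for AddFive one instruction at a time. It is only a toy model: the `StackSketch` class, the `Stack<int>` standing in for the execution stack and the `locals` array are all made up for illustration, and the real VES does not work through a managed collection like this.

```csharp
using System;
using System.Collections.Generic;

public static class StackSketch
{
    // Hypothetical toy model of AddFive: each statement mirrors one IL instruction.
    public static int AddFive(int value)
    {
        var stack = new Stack<int>();   // stands in for the execution stack
        var locals = new int[1];        // local variable 0 ('newValue')

        stack.Push(value);              // ldarg.0
        stack.Push(5);                  // ldc.i4.5
        int right = stack.Pop();        // add: pop two values off the stack...
        int left = stack.Pop();
        stack.Push(left + right);       // ...and push the result
        locals[0] = stack.Pop();        // stloc.0
        stack.Push(locals[0]);          // ldloc.0
        return stack.Pop();             // ret: the value left on the stack is returned
    }

    public static void Main()
    {
        Console.WriteLine(AddFive(10)); // prints 15
    }
}
```

Note that the stack never holds more than two values at once, which is what the `.maxstack 2` directive in the IL declares.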
Execution stack datatypes
There are actually two sets of datatypes used by the CLR – the datatypes used when storing data in the heap, method arguments, local variables, etc, and those stored on the execution stack whilst a program is executing. The latter are quite different to the CTS datatypes, and comprise the following:
- int32
- int64
- native int
- Float (F)
- Managed pointer (&)
- Object reference (O)
- Value types
There are several things to note about this list:
- There are no datatypes smaller than 4 bytes. This is for performance reasons, as modern computers are optimized to work with 4-byte-aligned values. CTS datatypes that are smaller than 4 bytes (eg Byte or Int16) are automatically sign-extended or zero-extended to int32 when copied onto the execution stack, and truncated when copied back off it.
- The integer datatypes are not specified as signed or unsigned; their signedness depends on the instructions that act on them. This means that, for example, casting a uint to an int in C# (outside a checked context) actually results in a no-op.
- Boolean values turn into int32 values in a similar way to C: false is zero, true is non-zero.
- Object references are conceptually the same as unmanaged pointers, except they are severely limited in the operations that can be performed on them in verifiable code.
- Object references don’t store information about the type of object they reference, although in verifiable code the operations on O stack elements have to be consistent with the reference type deduced using static analysis.
- Single or Double values are represented on the stack using a single datatype that is converted to 32 or 64 bits as necessary, in the same way as 1 and 2-byte integers.
Now we’ve covered the basics of the execution stack and how it operates, my next post will be looking at what happens when you call a method, and the differences between reference and value types.