Subterranean IL: Introduction

Today, I’ll be starting a new series of blog posts on ‘Subterranean IL’ – a look at the low-level IL commands available to .NET compilers, what each command does (or at least the more interesting commands) and why each command does what it does. One of the first things I’ll be looking at is the set of IL commands used in a generic method or a method in a generic class. However, to start off, we need to understand the basic data structures and datatypes used by the VES (Virtual Execution System) when executing a method.

The Execution Stack

The execution stack is where all the action happens in a .NET method. IL instructions all either operate on values currently on the execution stack, or copy values to and from the execution stack. However, the stack isn’t generally used for storage of values throughout a method’s execution; that’s what local variables are for. As an example, here’s a commented outline of the IL corresponding to the following C# method:

public static int AddFive(int value) {
    int newValue = value + 5;
    return newValue;
}

.method public static int32 AddFive(int32 'value') {
    .maxstack 2

    // declare 1 local variable
    .locals init ([0] int32 newValue)

    // int newValue = value + 5
    ldarg.0     // push method argument 0 ('value') onto stack
    ldc.i4.5    // push constant 5 onto stack
    add         // pop two values off stack, push result of addition
    stloc.0     // pop top value off stack
                // and store in local variable 0 ('newValue')

    // return newValue
    ldloc.0     // push local variable 0 onto stack
    ret         // return from method
                // with value on stack as the method return value
}
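To watch this instruction sequence actually run, one option is to emit it at runtime with System.Reflection.Emit. The following is only a sketch: the class name AddFiveEmitter and the method Build are made up for illustration, and it uses a DynamicMethod rather than a method compiled from an .il file. ILGenerator works out the maximum stack depth itself, so there is no equivalent of the .maxstack line.

```csharp
using System;
using System.Reflection.Emit;

// Hypothetical helper for illustration; not part of the original listing.
public static class AddFiveEmitter
{
    // Builds a delegate whose body is the same six instructions as the IL above.
    public static Func<int, int> Build()
    {
        var method = new DynamicMethod("AddFive", typeof(int), new[] { typeof(int) });
        ILGenerator il = method.GetILGenerator();

        il.DeclareLocal(typeof(int));   // local variable 0 ('newValue')

        il.Emit(OpCodes.Ldarg_0);       // push method argument 0 onto stack
        il.Emit(OpCodes.Ldc_I4_5);      // push constant 5 onto stack
        il.Emit(OpCodes.Add);           // pop two values, push their sum
        il.Emit(OpCodes.Stloc_0);       // pop top value into local variable 0
        il.Emit(OpCodes.Ldloc_0);       // push local variable 0 onto stack
        il.Emit(OpCodes.Ret);           // return with the top of the stack as the result

        return (Func<int, int>)method.CreateDelegate(typeof(Func<int, int>));
    }

    public static void Main()
    {
        Func<int, int> addFive = Build();
        Console.WriteLine(addFive(37)); // prints 42
    }
}
```

Removing the Stloc_0/Ldloc_0 pair should give the same result, since the sum is already on top of the stack when ret executes; the local variable is only there to mirror the C# source.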

Execution stack datatypes

There are actually two sets of datatypes used by the CLR – the datatypes used when storing data in the heap, method arguments, local variables, etc, and those stored on the execution stack whilst a program is executing. The latter are quite different to the CTS datatypes, and comprise the following:

int32
int64
native int
Float (F)
Managed pointer (&)
Object reference (O)
Value types

There are several things to note about this list:

There are no datatypes smaller than 4 bytes. This is for performance reasons, as modern computers are optimized to work with 4-byte-aligned values. CTS datatypes that are smaller than 4 bytes (eg Byte or Int16) are automatically widened to int32 when copied onto the execution stack (sign-extended for signed types such as Int16, zero-extended for unsigned types such as Byte) and truncated when copied back off it.
The integer datatypes are not specified as signed or unsigned; their signedness depends on the instructions that act on them. This means that, for example, casting a uint to an int in C# (in an unchecked context) actually results in a no-op.
Boolean values turn into int32 values in a similar way to C – false is zero, true is non-zero.
Object references are conceptually the same as unmanaged pointers, except they are severely limited in the operations that can be performed on them in verifiable code.
Object references don’t store information about the type of object they reference, although in verifiable code the operations on O stack elements have to be consistent with the reference type deduced using static analysis.
Single or Double values are represented on the stack using a single datatype that is converted to 32 or 64 bits as necessary, in the same way as 1 and 2-byte integers.
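The integer points in the list above can be observed from C# without looking at the IL at all. This is a small sketch showing only the visible effects; the class and method names are invented for illustration, and it makes no claim about exactly which instructions a particular compiler version emits for each cast.

```csharp
using System;

// Hypothetical demo class for illustration.
public static class StackTypesDemo
{
    // uint -> int in an unchecked context: the 32 bits are unchanged,
    // only their interpretation differs.
    public static int Reinterpret(uint value) => unchecked((int)value);

    // A signed 1-byte value is sign-extended when widened to int32.
    public static int WidenSigned(sbyte value) => value;

    // An unsigned 1-byte value is zero-extended when widened to int32.
    public static int WidenUnsigned(byte value) => value;

    // Narrowing back to 1 byte keeps only the low 8 bits.
    public static byte Truncate(int value) => unchecked((byte)value);

    public static void Main()
    {
        Console.WriteLine(Reinterpret(0xFFFFFFFFu)); // prints -1
        Console.WriteLine(WidenSigned(-1));          // prints -1
        Console.WriteLine(WidenUnsigned(0xFF));      // prints 255
        Console.WriteLine(Truncate(0x1FF));          // prints 255
    }
}
```

The same bit pattern 0xFF becomes -1 or 255 depending only on whether it is treated as signed or unsigned when it is widened, which is the practical consequence of the stack itself having no notion of signedness.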

Now we’ve covered the basics of the execution stack and how it operates, my next post will be looking at what happens when you call a method, and the differences between reference and value types.

About the author

Simon Cooper's contributions

Articles

Books

Top topics

Simon Cooper's latest contributions:

C# via Java: Arrays

C# via Java: Primitive types

C# via Java: Introduction
