Anatomy of a .NET Assembly – The DOS stub

Comments 0

Share to social media

The DOS stub at the top of the file is the first thing you notice when you open a .NET assembly in a hex editor. But what do those bytes mean, and what do they do? As I discussed in a previous post, the first 64 bytes are the DOS header, and the next 64 bytes is the stub program. What does that program do?

Real mode & DOS

All modern x86-compatible processors, right up to the latest Pentiums and Athlons, boot up into real mode, the addressing mode used in the original 8086, and the mode used in all DOS programs. In this mode, processor registers are all 16-bits long, and the currently running program is entirely responsible for the 1024kB of physical memory available to it (how times have changed…).

Memory addresses are specified by adding an offset to the (shifted) value of one of four segment registers. These specified the ‘base’ addresses of the stack, code (the program loaded into memory), and data segments within memory (the fourth segment register was an ‘extra’ that could be used as the program saw fit).

When a DOS program is run, the contents of the executable are loaded into memory, processor registers are set to values specified in the DOS header, and the bytes immediately following the header are executed on the processor.

Disassembling the DOS stub

PE%20Headers%20annotated.png

Most of the DOS header consists of the string This program cannot be run in DOS mode.rrn$. In the example .NET assembly above, this string starts at file offset 0x4E. Before that, we have the following bytes:

If we run this through a disassembler, we get the following 16-bit x86 code:

Well, what on earth is that doing? Let’s break it down into stages.

  1. push cs
    pop ds
    These two instructions operate on the stack available to real mode programs; the push instruction is essentially shorthand for ‘Copy the specified value to the location pointed to by the stack pointer and increment the pointer’, while pop does the opposite.

    To start off with, the cs register points to where the program code is loaded into memory. These two instructions are copying the value of the cs (code segment) register to the ds (data segment) register via the stack.

  2. mov dx, 0xe
    This is setting the value of the dx register to the constant 0xe. If you have a look at the stub above, this is the offset at which the text string begins from the start of the code.
  3. mov ah, 0x9
    int 0x21
    Let’s start off with int 0x21. This instruction invokes a software interrupt; interrupt 0x21 is the interrupt number for the DOS API (yes, such a thing exists!). The ah register (the high byte of ax) contains the number of the API function to call.

    If you have a look at the list of function codes, 0x9 corresponds to ‘Write string to STDOUT’, with the string to write to pointed to by ds:dx and terminated by $. As ds has been set the same as cs, and the offset 0xe is in dx, this prints the string This program cannot be run in DOS mode.rrn to the console.

  4. mov ax, 0x4c01
    int 0x21
    This is another DOS function call. The high byte of ax, 0x4c, corresponds to the Exit function, and the low byte, 0x01, specifies the return value. So this stops execution of the program and returns to the DOS prompt, with a return value of 1.

Why is this here?

When the PE file format was introduced in Windows NT 3.1, there was a real possibility users would try to execute PE files in a DOS environment. So the existance of the DOS stub was specified in the PE file specification. When .NET 1.0 was introduced in 2002 for Windows 98, ME, NT, 2000 and XP, there was still a possibility that assemblies would be run under DOS, so the DOS stub was also specified in the CLR specification. For backwards compatibility, this stub still exists in .NET 4 assemblies.

That’s it!

There we go, nice and simple. However, the CLR loader stub, which I’ll be looking at in the next post, is significantly more complicated!

Load comments

About the author

Simon Cooper's contributions