C# via Java: Primitive types

So, what is a primitive type? Loosely following the Incompleteness Theorem, any mathematical system, and therefore any computational system, will always contain things that cannot be defined using the rules of that system; they have to be taken as given. Those givens form the axioms of that system.

For Java and C#, the axioms are the rules of the language and runtime, as defined in the respective specifications, and those rules cannot be inferred from within the language itself. They simply exist as a given.

So, what rules are the axioms of Java and C#? There are several possibilities, given the wide scope of both languages. For the purposes of this post, I’m going to concentrate on the type systems of both languages, and use the primitive types as the axioms. So what are those primitive types?

  • booleans
  • integers
  • floats
  • arrays of primitives

And that’s it. There are several things to note from this definition:

  • Arrays are defined recursively, so you can have an array of arrays of integers.
  • Arrays are a reference type; everything else is a value type.
  • Objects are not a primitive, as an object can be defined using arrays of primitives. As arrays are a reference type, this gives objects, defined using arrays, the semantics of a reference type.
  • Characters are not a primitive either, as those can be defined using integers.
  • Strings can be defined as an array of integers.

Also note that this is not a formal definition – I’m using this definition to learn more about Java and C#, and how they use these primitives to define the rest of the language, not to define and analyse C# and Java using formal type theory or strict mathematics.
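To make the points about characters and strings a little more concrete, here is a small Java sketch. The class and method names (CharsAsIntegers, codeUnits) are my own, purely for illustration; the sketch only shows that a Java char converts freely to and from an integer, and that the content of a string can be viewed as an array of those integers.

```java
import java.util.Arrays;

// Illustrative only: CharsAsIntegers and codeUnits are hypothetical names.
class CharsAsIntegers {
    // Returns the UTF-16 code units of a string as an array of ints.
    static int[] codeUnits(String s) {
        char[] chars = s.toCharArray();
        int[] units = new int[chars.length];
        for (int i = 0; i < chars.length; i++) {
            units[i] = chars[i]; // implicit widening from char to int
        }
        return units;
    }

    public static void main(String[] args) {
        char c = 'A';
        int asInt = c;                   // a char is just a number underneath
        char next = (char) (asInt + 1);  // arithmetic, then back to a char
        System.out.println(asInt + " " + next);               // prints "65 B"
        System.out.println(Arrays.toString(codeUnits("C#"))); // prints "[67, 35]"
    }
}
```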

The primitive value types

In this post, we’ll be starting off with the primitive value types. What are the primitive value types in Java and C#?

Type                       Java      C#
boolean                    boolean   bool
1-byte signed integer      byte      sbyte
1-byte unsigned integer    -         byte
2-byte signed integer      short     short
2-byte unsigned integer    -         ushort
4-byte signed integer      int       int
4-byte unsigned integer    -         uint
8-byte signed integer      long      long
8-byte unsigned integer    -         ulong
4-byte float               float     float
8-byte float               double    double

Within the runtime, these values all have a predefined representation – for the numbers, simply the byte representation of that number, and for boolean values, a 1-byte value containing zero for false, and non-zero for true. As you can see, C# provides signed and unsigned versions of all the various lengths of integers – 1, 2, 4, and 8 bytes – and the C# byte is defined as unsigned. Java only provides signed versions, which means Java’s byte is signed.
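The practical effect of Java’s signed byte can be seen in a short sketch. The names here (SignedByteDemo, toUnsigned) are made up for this example; the masking trick is the usual way to read a Java byte as if it were unsigned, and Byte.toUnsignedInt (available since Java 8) does the same thing.

```java
// Illustrative only: SignedByteDemo and toUnsigned are hypothetical names.
class SignedByteDemo {
    // Reinterprets a signed Java byte as an unsigned value in the range 0..255.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = (byte) 200; // bit pattern 0xC8 does not fit in a signed byte
        System.out.println(b);                     // prints "-56"
        System.out.println(toUnsigned(b));         // prints "200"
        System.out.println(Byte.toUnsignedInt(b)); // prints "200"
    }
}
```

The same bit pattern stored in a C# byte would read back as 200 directly, since that type is unsigned.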

When programming using these types, you need to be able to perform operations on them, such as arithmetic operations or comparisons. Due to the Incompleteness Theorem, these operations cannot be defined using code written in the language itself – these operations are defined outside Java or C#. And so the CLR and JVM can perform mathematical operations and comparisons between instances of the primitive value types without using any external libraries. To accomplish this, there are special commands in IL and Java bytecode to perform these built-in operations.

A selection of these commands is:

Operation                        Java bytecode   IL
Add two 4-byte integers          iadd            add
Multiply two 8-byte floats       dmul            mul
Branch if equal                  if_icmpeq       beq
Load a constant 8-byte integer   ldc2_w          ldc.i8

To access these built-in runtime instructions from Java or C#, the language has special syntax that compiles to these instructions, primarily the mathematical operators + - * / < > and ==. So, for example, the following expression:

compiles to the following IL:

and the following Java bytecode:

All these language mappings, and instructions, are built-in and predefined as axioms in the language and runtime.
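As a small illustration of this mapping on the Java side, here is a sketch with two methods of my own (ArithmeticDemo, add and multiply are hypothetical names). The bytecode in the comments is what I would expect javap -c to show for these methods with a typical javac; the exact listing may differ between compiler versions.

```java
// Illustrative only: ArithmeticDemo and its methods are hypothetical names.
class ArithmeticDemo {
    // Expected bytecode (javap -c), assuming a typical javac:
    //   iload_0
    //   iload_1
    //   iadd
    //   ireturn
    static int add(int a, int b) {
        return a + b;
    }

    // Expected bytecode; each double occupies two local variable slots:
    //   dload_0
    //   dload_2
    //   dmul
    //   dreturn
    static double multiply(double a, double b) {
        return a * b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3));           // prints "5"
        System.out.println(multiply(1.5, 4.0));  // prints "6.0"
    }
}
```

To check this yourself, compile the class and run javap -c ArithmeticDemo on the result.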

Methods on primitive types

However, there are still some operations in the language which don’t map directly onto instructions provided by the runtime, such as toString, parse, or the implementation of a generic Comparable interface. As Java and C# are both object-oriented languages, these methods need to be defined as part of an object of some kind. For 4-byte integers, these methods are defined on System.Int32 in C#, and on java.lang.Integer in Java.

The difference

It’s these objects and methods that are the key to understanding the differences between primitive types in Java and C#. Let’s start off with Java:

java.lang.Integer

Like every other class in Java, java.lang.Integer is a reference type; it contains a single field of the primitive type int. It’s this type that contains the various methods that act on an int, like toString, parseInt, and compareTo, implemented either as a static method that takes or returns an int where appropriate, or as an instance method on java.lang.Integer that operates on the instance’s int field.
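A quick sketch of the two flavours of method may help. The wrapper class and helper names (IntegerMethods, roundTrip, isLess) are invented for this example; Integer.parseInt, Integer.toString, Integer.valueOf and compareTo are the real library methods.

```java
// Illustrative only: IntegerMethods, roundTrip and isLess are hypothetical names.
class IntegerMethods {
    // Uses two static methods: one returns an int, the other takes an int.
    static String roundTrip(String text) {
        int parsed = Integer.parseInt(text); // static, returns a primitive int
        return Integer.toString(parsed);     // static, takes a primitive int
    }

    // Uses an instance method, which needs an Integer object to be called on.
    static boolean isLess(int a, int b) {
        Integer boxed = Integer.valueOf(a);
        return boxed.compareTo(b) < 0;       // b is boxed to match the parameter type
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("42")); // prints "42"
        System.out.println(isLess(7, 9));    // prints "true"
    }
}
```

Note that there is no way to write parsed.toString() on the primitive int itself; the call has to go through Integer.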

Prior to Java 1.5, you had to manually convert between int and java.lang.Integer, using the constructor on Integer or calling the instance method Integer.intValue() to get the contained int value. In 1.5, the compiler inserts these conversions where appropriate as part of the autoboxing feature.
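Here is a minimal sketch of the two styles side by side. BoxingDemo and its methods are hypothetical names of mine. I have used Integer.valueOf for the manual version rather than the constructor, because the Integer constructor is deprecated in recent JDKs.

```java
// Illustrative only: BoxingDemo, sumManual and sumAuto are hypothetical names.
class BoxingDemo {
    // Pre-1.5 style: every conversion is written out by hand.
    static int sumManual(int a, int b) {
        Integer boxedA = Integer.valueOf(a); // int -> Integer
        Integer boxedB = Integer.valueOf(b);
        return boxedA.intValue() + boxedB.intValue(); // Integer -> int
    }

    // 1.5 and later: the compiler inserts the same conversions itself.
    static int sumAuto(int a, int b) {
        Integer boxedA = a;     // autoboxing
        Integer boxedB = b;
        return boxedA + boxedB; // auto-unboxing before the addition
    }

    public static void main(String[] args) {
        System.out.println(sumManual(2, 3)); // prints "5"
        System.out.println(sumAuto(2, 3));   // prints "5"
    }
}
```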

The important point is that, in Java, an int is a pure 4-byte number, operated on by instructions built into the runtime. java.lang.Integer contains all the other operations on integers that can’t be compiled directly to runtime instructions. It’s just like any other reference type in Java. When necessary, you can create an instance of Integer from an int value to pass an integer value to methods expecting an instance of Object or another reference type.
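The last point can be sketched as follows, with hypothetical names of my own (BoxedAsObject, typeNameOf). The method only accepts an Object, so the int has to be wrapped in an Integer before it can be passed.

```java
// Illustrative only: BoxedAsObject and typeNameOf are hypothetical names.
class BoxedAsObject {
    // Accepts any reference type and reports its runtime class.
    static String typeNameOf(Object o) {
        return o.getClass().getName();
    }

    public static void main(String[] args) {
        int n = 42;
        // The int is boxed into an Integer so it can be passed as an Object.
        System.out.println(typeNameOf(n)); // prints "java.lang.Integer"

        Object o = n;          // the box is an ordinary object reference
        int back = (Integer) o; // cast to Integer, then unboxed to int
        System.out.println(back);          // prints "42"
    }
}
```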

System.Int32

Similar to java.lang.Integer, System.Int32 is the type containing all the methods on integer values that don’t map directly onto operations provided by the runtime. But, where Integer is a reference type, System.Int32 is a value type. This has some quite fundamental consequences for what an integer value is in C#. To understand what these are, we need to take a digression into how a value type is represented in .NET.

Value types in C#

An instance of a reference type is assigned its own block of memory on the heap. But a value type borrows memory from whatever contains it. If it is declared as a member of a reference type, it will use a section of the memory that belongs to that reference type on the heap. If it is a local variable being manipulated on the stack, it uses a section of the stack.

If the value type is a member of an outer value type, the inner value type becomes part of the value of the outer value type. For example, the following type definitions:

will result in the following memory layout for instances of type ObjectA on the heap:

Recursive definition?

So, back to System.Int32. If you have a look at this type in a disassembler, you’ll see that its definition is, in IL:

This looks like a recursive definition, violating the .NET rule that a struct cannot contain an instance of itself. But it obviously does work, somehow.

The key is the Incompleteness Theorem. int32 is a built-in primitive type that the CLR itself implements using a 4-byte value. The struct System.Int32 is a (more-or-less) standard value type. A value type is composed of the values of its member fields. System.Int32 is composed of a single 4-byte value. That means that an instance of System.Int32 is also a pure 4-byte value.

This is the key to understanding primitive types in .NET – any 4-byte value in memory can be interpreted either as a primitive int32, which can be manipulated by built-in arithmetic operations, or as an instance of System.Int32, on which the CLR can execute all the methods declared on that type. That change in interpretation can occur without any changes in the program’s memory, or any boxing operations; the CLR simply chooses to see a 4-byte value as a primitive type one instant, and a complex value type the next.

What is a primitive?

While primitive types in Java are simple values, values of primitive types in .NET are both a primitive type value and a complex value type value. Byte values of the correct length can be interpreted either as primitive types or complex types, thanks to the rules determining how value types use memory in the CLR.

As Java does not allow complex value types to be declared, the methods performing operations that aren’t built into the runtime must be declared on a separate reference type, and the primitive types converted to and from this representation using autoboxing where needed. The CLR, by contrast, simply reinterprets a value as a primitive or complex value type.

That’s started us off with the primitives. In the next post, we’ll be looking at arrays, and how Java and C# arrays can be used, and what they represent.