JacksonDunstan.com

Value types like int, structs, and enums seem simple, but much of what we think we know about them just isn’t true. This article explores how value types actually work in C# and uses that knowledge to improve how they’re implemented in the C++ scripting system.

Let’s start off with some misconceptions about how value types work. First up, one common phrase programmers say about C# is “everything extends from Object.” C# goes a long way to make it seem as though this is true. Even Microsoft’s MSDN documentation says as much when it describes ValueType like so:

Provides the base class for value types.

It also shows this under “Inheritance Hierarchy”:

System.Object
  System.ValueType
    System.Enum

Likewise, the documentation for Enum says this:

Provides the base class for enumerations.

Then again, their documentation on value types says this:

All value types are derived implicitly from the System.ValueType.

And this:

Unlike with reference types, you cannot derive a new type from a value type. However, like reference types, structs can implement interfaces.

Adding weight to the interfaces claim, the documentation for Int32 (a.k.a. int) shows all the interfaces it supposedly implements:

public struct Int32 : IComparable, IFormattable, IConvertible, 
	IComparable<int>, IEquatable<int>

To summarize, the official documentation has told us that value types including primitives, structs, and enums ultimately derive from Object and implement interfaces. Unfortunately, none of that is true.

It would be much more accurate to say that value types can be boxed to a class that ultimately derives from Object and can implement interfaces. Let’s see this in action by treating a value type like an Object:

static class Test
{
	static void Foo()
	{
		ReturnHashCode(123);
	}
 
	static int ReturnHashCode(object o)
	{
		return o.GetHashCode();
	}
}

ReturnHashCode takes an Object, but 123 is a primitive of type System.Int32 (a.k.a. int). If it were really true that all value types derived from ValueType which derives from Object, there’d be no need to do anything here. Since that’s not true, 123 gets boxed into an instance of a class that does actually derive from ValueType and Object. We can see this clearly when looking at the IL that this C# gets compiled to:

.method private hidebysig static void Foo() cil managed
{
	.maxstack  8
	IL_0000:  nop
	IL_0001:  ldc.i4.s   123
	IL_0003:  box        [mscorlib]System.Int32
	IL_0008:  call       int32 Test::ReturnHashCode(object)
	IL_000d:  pop
	IL_000e:  ret
}
 
.method private hidebysig static int32 ReturnHashCode(object o) cil managed
{
	.maxstack  1
	.locals init (int32 V_0)
	IL_0000:  nop
	IL_0001:  ldarg.0
	IL_0002:  callvirt   instance int32 [mscorlib]System.Object::GetHashCode()
	IL_0007:  stloc.0
	IL_0008:  br.s       IL_000a
 
	IL_000a:  ldloc.0
	IL_000b:  ret
}

The key line is this one in Foo where the Int32 is boxed:

IL_0003:  box        [mscorlib]System.Int32

This line is also a hint:

IL_0002:  callvirt   instance int32 [mscorlib]System.Object::GetHashCode()

An Int32, by definition, is just 32 bits. For it to support virtual functions such as GetHashCode, it would need a virtual method table which would add size beyond the integer value. It needs to grow in size by the process of boxing to support virtual functions.
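
The same size argument can be checked in C++, which makes for a rough analogy with what boxing has to do. This is only a sketch: PlainInt32, VirtualInt32, and HashOf are made-up names, and the size comparison assumes a typical compiler that stores a hidden virtual table pointer in each object (the C++ standard doesn’t promise a particular layout).

```cpp
#include <cstdint>

// A plain value: nothing but the 32-bit payload
struct PlainInt32
{
	int32_t Value;
};

// Hypothetical "boxed" shape: the same payload plus virtual functions
struct VirtualInt32
{
	int32_t Value;

	explicit VirtualInt32(int32_t value)
		: Value(value)
	{
	}

	virtual ~VirtualInt32() = default;

	// This sketch simply returns the payload as the hash code
	virtual int32_t GetHashCode() const
	{
		return Value;
	}
};

// Without virtual functions the struct is exactly as big as its field
static_assert(sizeof(PlainInt32) == sizeof(int32_t),
	"no overhead without virtual functions");

// Assumption: typical ABIs add a vtable pointer to each object
static_assert(sizeof(VirtualInt32) > sizeof(PlainInt32),
	"virtual dispatch needs extra per-object data");

// Calls through the virtual function, loosely like callvirt in the IL above
int32_t HashOf(const VirtualInt32& boxed)
{
	return boxed.GetHashCode();
}
```

Calling HashOf(VirtualInt32(123)) dispatches through the virtual function, and the extra bytes reported by sizeof are the price of that dispatch.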

This also means that structs can’t implement interfaces, despite what the documentation says. Methods, properties, events, and indexers in interfaces are all implicitly virtual functions. Interface types are also reference types, again by definition. Let’s look at another example to show this:

using System; // for IComparable

static class Test
{
	static void Goo()
	{
		Compare(123, 456);
	}
 
	static int Compare(IComparable a, IComparable b)
	{
		return a.CompareTo(b);
	}
}

Compare takes two IComparable parameters. Since the documentation says that Int32 implements IComparable, there shouldn’t be anything to do. Let’s look at the IL to see what actually happens.

.method private hidebysig static void  Goo() cil managed
{
	.maxstack  8
	IL_0000:  nop
	IL_0001:  ldc.i4.s   123
	IL_0003:  box        [mscorlib]System.Int32
	IL_0008:  ldc.i4     0x1c8
	IL_000d:  box        [mscorlib]System.Int32
	IL_0012:  call       int32 Test::Compare(class [mscorlib]System.IComparable,
												class [mscorlib]System.IComparable)
	IL_0017:  pop
	IL_0018:  ret
}
 
.method private hidebysig static int32 Compare(
	class [mscorlib]System.IComparable a,
	class [mscorlib]System.IComparable b) cil managed
{
	.maxstack  2
	.locals init (int32 V_0)
	IL_0000:  nop
	IL_0001:  ldarg.0
	IL_0002:  ldarg.1
	IL_0003:  callvirt   instance int32 [mscorlib]System.IComparable::CompareTo(object)
	IL_0008:  stloc.0
	IL_0009:  br.s       IL_000b
 
	IL_000b:  ldloc.0
	IL_000c:  ret
}

Again we see the Int32 parameters get boxed:

IL_0001:  ldc.i4.s   123
IL_0003:  box        [mscorlib]System.Int32
IL_0008:  ldc.i4     0x1c8
IL_000d:  box        [mscorlib]System.Int32

Then we see the virtual function call for the interface method:

IL_0003:  callvirt   instance int32 [mscorlib]System.IComparable::CompareTo(object)

So how does boxing solve this problem of value types not actually deriving from Object and not actually implementing interfaces? Well, the boxed type is free to do that. While the actual name and details of the class our value types are boxed to are hidden from us, we can imagine a class like this:

class BoxedInt32
	: ValueType
	, IComparable
	, IFormattable
	, IConvertible
	, IComparable<int>
	, IEquatable<int>
{
	private Int32 Value;
 
	// Boxing calls this to create an instance of the boxed type class
	public BoxedInt32(int value)
	{
		Value = value;
	}
 
	// Implement this to satisfy IComparable
	public int CompareTo(object other)
	{
		if (!(other is Int32))
		{
			throw new ArgumentException("Object must be of type Int32");
		}
 
		return Value.CompareTo((Int32)other);
	}
 
	// ... methods implementing the other interfaces
}

This class fulfills all the requirements. It’s a class, so it’s a reference type. It extends from ValueType, so it ultimately derives from Object. It implements all the interfaces that an Int32 supposedly does. It also holds the actual value type as an Int32 field. So this class is like a reference type version of what the documentation says an Int32 is.

One final detail: notice how the boxed type doesn’t do much work of its own. Instead, its interface functions are implemented by calling the actual function with the same name on the value type. This is possible because non-virtual methods don’t require a virtual method table and therefore don’t add any size to the value type. To show this in action, let’s make another little example:

using System; // for IComparable
using UnityEngine; // for Debug.Log

static class Test
{
	struct StructWithInterface : IComparable
	{
		public int CompareTo(object o)
		{
			Debug.Log("StructWithInterface.CompareTo(object) called!");
			return 0;
		}
	}
 
	static void Bar()
	{
		StructWithInterface swi = new StructWithInterface();
		Compare(swi, 123);
	}
 
	// ... same Compare() as before
}

Here we have a struct that supposedly implements an interface. Let’s look at the IL to see what happens:

.method private hidebysig static void  Bar() cil managed
{
	.maxstack  2
	.locals init (valuetype Test/StructWithInterface V_0)
	IL_0000:  nop
	IL_0001:  ldloca.s   V_0
	IL_0003:  initobj    Test/StructWithInterface
	IL_0009:  ldloc.0
	IL_000a:  box        Test/StructWithInterface
	IL_000f:  ldc.i4.s   123
	IL_0011:  box        [mscorlib]System.Int32
	IL_0016:  call       int32 Test::Compare(class [mscorlib]System.IComparable,
										class [mscorlib]System.IComparable)
	IL_001b:  pop
	IL_001c:  ret
}

First we see the boxing of both the StructWithInterface struct and the Int32 primitive:

IL_000a:  box        Test/StructWithInterface
IL_000f:  ldc.i4.s   123
IL_0011:  box        [mscorlib]System.Int32

Then we run the code and see the debug log message get printed:

StructWithInterface.CompareTo(object) called!

This means that whatever class StructWithInterface got boxed to, its CompareTo was implemented by calling our CompareTo in StructWithInterface.
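
To make the forwarding idea concrete, here’s a rough C++ sketch of the same arrangement. None of this is the scripting system’s actual code: IComparable here is a simplified stand-in that compares against a plain int32_t, and BoxedStructWithInterface is an invented name for the hidden boxed class.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for the IComparable interface
struct IComparable
{
	virtual ~IComparable() = default;
	virtual int32_t CompareTo(int32_t other) const = 0;
};

// The value type has only a non-virtual method, so it stays payload-sized
struct StructWithInterface
{
	int32_t Value;

	// Returns -1, 0, or 1
	int32_t CompareTo(int32_t other) const
	{
		return (Value > other) - (Value < other);
	}
};

static_assert(sizeof(StructWithInterface) == sizeof(int32_t),
	"non-virtual methods add no size");

// Hypothetical boxed class: holds a copy and forwards interface calls to it
class BoxedStructWithInterface : public IComparable
{
public:
	explicit BoxedStructWithInterface(StructWithInterface value)
		: Value(value)
	{
	}

	// The virtual function does no work of its own
	int32_t CompareTo(int32_t other) const override
	{
		return Value.CompareTo(other);
	}

private:
	StructWithInterface Value;
};

// Like the article's Compare(), this only sees the interface
int32_t Compare(const IComparable& a, int32_t b)
{
	return a.CompareTo(b);
}
```

Compare never learns about StructWithInterface, yet the struct’s own CompareTo is what ends up running, which matches the log message we saw.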

So how does this relate to the C++ scripting system? Well, since part 11 we’ve strived to implement all three forms of value types: primitives, enums, and structs. Boxing and unboxing were added in part 17. Still, there were several discrepancies that needed to be addressed for a more accurate representation in C++ of how value types really work in C#.

Let’s start with primitives. Until now, C# primitives were represented as just C++ primitives. An Int32 would turn into int32_t, Single would turn into float, and Byte would turn into uint8_t (Byte is unsigned in C#; SByte is the signed type that matches int8_t). This is very close to being right, but the way they’re boxed needed a tweak. Previously, we’d box primitives by calling a constructor on Object:

int32_t i = 123;
Object o(i);

That worked great as long as we just wanted an Object, but it didn’t give us any way to get a ValueType or any interfaces such as IComparable. To get that, we need to wrap the primitive in its own struct so we can provide conversion operators:

namespace System
{
	struct Int32
	{
		// Just holds one field, so this is still just the size of the field
		int32_t Value;
 
		// Default to 0
		Int32()
			: Value(0)
		{
		}
 
		// Implicitly convert from int32_t primitive to Int32 struct
		Int32(int32_t value)
			: Value(value)
		{
		}
 
		// Implicitly convert from Int32 struct to int32_t primitive
		operator int32_t() const
		{
			return Value;
		}
 
		// Explicitly box to all base classes
		explicit operator Object() const
		{
			return Object(BoxInt32(Value));
		}
		explicit operator ValueType() const
		{
			return ValueType(BoxInt32(Value));
		}
 
		// Explicitly box to all interfaces
		explicit operator IComparable() const
		{
			return IComparable(BoxInt32(Value));
		}
		explicit operator IFormattable() const
		{
			return IFormattable(BoxInt32(Value));
		}
		explicit operator IConvertible() const
		{
			return IConvertible(BoxInt32(Value));
		}
	};
}

This struct gives us the ability to interoperate with the primitive type and to box to any base class or interface:

// Implicitly convert between struct and primitive
Int32 i = 123;
int32_t i2 = i;
 
// Box to any base class or interface
ValueType v = (ValueType)i;
IComparable c = (IComparable)i;

Note that boxing is explicit in the C++ scripting system, unlike in C# where it is implicit. This is intentionally different from C# because boxing causes garbage to be created and it is far too easy to accidentally box in C#. Examples like those above and many cases involving generics are ample proof that even experienced C# programmers will inadvertently trigger GC allocations quite frequently. Forcing an explicit boxing via a cast, similar to unboxing, should provide a little bit of helpful friction and hopefully avoid inadvertent boxing.
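
The friction can be demonstrated at compile time. The sketch below uses trimmed, hypothetical stand-ins: Object is reduced to a bare handle, BoxInt32 just counts calls instead of calling into C#, and BoxExplicitly is a helper invented for the example. The point is only that an explicit conversion operator blocks implicit boxing while still allowing a cast.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for the scripting system's Object: just a handle
struct Object
{
	int32_t Handle;
};

// Counts boxing calls so each allocation is visible
static int32_t g_boxCount = 0;

// Hypothetical stand-in for the boxing call into C#
inline int32_t BoxInt32(int32_t /*value*/)
{
	return ++g_boxCount;
}

// Trimmed copy of the Int32 wrapper struct
struct Int32
{
	int32_t Value;

	Int32(int32_t value)
		: Value(value)
	{
	}

	// Boxing requires a cast
	explicit operator Object() const
	{
		return Object{BoxInt32(Value)};
	}
};

// "Object o = i;" and passing an Int32 where an Object is expected are rejected
static_assert(!std::is_convertible<Int32, Object>::value,
	"no implicit boxing");

// "(Object)i" and "Object o(i);" are allowed
static_assert(std::is_constructible<Object, Int32>::value,
	"explicit boxing is allowed");

// Every box is spelled out at the call site
Object BoxExplicitly(Int32 i)
{
	return static_cast<Object>(i);
}
```

Because the only way to reach BoxInt32 is through a visible cast, a search for casts to Object finds every place that creates garbage.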

Now let’s consider enums. Previously, these were using enum struct as it matched C# enums quite well:

enum struct Name : int32_t
{
	First = 0,
	Middle = 1,
	Last = 2
};

We could use it like this:

// Create and convert enums
Name n = Name::First;
Name n2 = (Name)1;
int32_t i = (int32_t)n;
 
// Box, but only to Object
Object o(n);

This also didn’t provide the ability to box the enum to various base classes and interfaces. So the enum struct was swapped out for a struct with static fields for each enumerator:

struct Name
{
	static const Name First;
	static const Name Middle;
	static const Name Last;
 
	int32_t Value;
 
	// Explicit conversion from the primitive type
	explicit Name(int32_t value)
		: Value(value)
	{
	}
 
	// Explicit conversion to the primitive type
	explicit operator int32_t() const
	{
		return Value;
	}
 
	// Equality and inequality operators
	// (const so they also work on the const enumerators like Name::First)
	bool operator==(Name other) const
	{
		return Value == other.Value;
	}
	bool operator!=(Name other) const
	{
		return Value != other.Value;
	}
 
	// Explicitly box to all base types
	explicit operator Enum() const
	{
		return Enum(BoxName(Value));
	}
 
	explicit operator ValueType() const
	{
		return ValueType(BoxName(Value));
	}
	explicit operator Object() const
	{
		return Object(BoxName(Value));
	}
 
	// Explicitly box to all interface types
	explicit operator IFormattable() const
	{
		return IFormattable(BoxName(Value));
	}
	explicit operator IConvertible() const
	{
		return IConvertible(BoxName(Value));
	}
	explicit operator IComparable() const
	{
		return IComparable(BoxName(Value));
	}
};
 
// Initialize static constants
const Name Name::First(0);
const Name Name::Middle(1);
const Name Name::Last(2);

The enum behaves very similarly to before with the enum struct approach, but it now supports boxing to more types:

// Explicitly convert between enum and primitive
Name n(1);
int32_t i = (int32_t)n;
 
// Overloaded operators make comparison feel natural
if (n == Name::Middle)
{
	String msg = "Middle name: ";
	Debug::Log(msg);
}
 
// Box to any base class or interface
Enum e = (Enum)n;
IComparable c = (IComparable)n;
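
As a sanity check that the swap doesn’t loosen the rules enum struct gave us, here’s a small compile-time comparison. It’s a sketch with the boxing operators left out; OldName and IsMiddle are names invented for the example.

```cpp
#include <cstdint>
#include <type_traits>

// The previous representation: a scoped enum
enum struct OldName : int32_t
{
	First = 0,
	Middle = 1,
	Last = 2
};

// Trimmed copy of the replacement struct (boxing operators omitted)
struct Name
{
	static const Name First;
	static const Name Middle;
	static const Name Last;

	int32_t Value;

	explicit Name(int32_t value)
		: Value(value)
	{
	}

	explicit operator int32_t() const
	{
		return Value;
	}

	bool operator==(Name other) const
	{
		return Value == other.Value;
	}
};

const Name Name::First(0);
const Name Name::Middle(1);
const Name Name::Last(2);

// Both representations are exactly as large as the underlying integer
static_assert(sizeof(OldName) == sizeof(int32_t), "enum struct size");
static_assert(sizeof(Name) == sizeof(int32_t), "struct size");

// Neither converts implicitly to or from the integer
static_assert(!std::is_convertible<int32_t, OldName>::value, "old: no int to enum");
static_assert(!std::is_convertible<OldName, int32_t>::value, "old: no enum to int");
static_assert(!std::is_convertible<int32_t, Name>::value, "new: no int to enum");
static_assert(!std::is_convertible<Name, int32_t>::value, "new: no enum to int");

// Both directions still work with an explicit cast
static_assert(std::is_constructible<Name, int32_t>::value, "new: explicit from int");
static_assert(std::is_constructible<int32_t, Name>::value, "new: explicit to int");

// Comparison against an enumerator reads the same as before
bool IsMiddle(Name n)
{
	return n == Name::Middle;
}
```

One trade-off this sketch doesn’t show: the enumerators are now ordinary const objects rather than compile-time constants.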

Finally, let’s look at structs. The C++ scripting system draws a distinction between “full structs” like Vector3 that can be represented in C++ and “managed structs” like RaycastHit that can’t be, because of a field such as transform. They looked like this:

struct Vector3
{
	float x;
	float y;
	float z;
 
	// ... methods
};
 
struct RaycastHit : ValueType
{
	// ... methods
};

And we’d use them like this:

// Box a "full struct" to an Object (and only Object)
Vector3 v(1.0f, 2.0f, 3.0f);
Object o(v);
 
// Pass a "managed struct" as an Object with no boxing
void Foo(Object o)
{
	o.ToString();
}
RaycastHit r;
Foo(r);

Note that the latter case where no boxing is required is actually incorrect and will either throw an exception or behave incorrectly. That’s because calling ToString will pass the Handle field into C# where it’ll be used to look up the Object in its ObjectStore. However, the Handle for a “managed struct” actually refers to the StructStore for that type of struct. So either the Object won’t be found in the ObjectStore or the wrong Object will be found. We need to fix this!
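
Here’s a toy model of that mix-up. The two maps are hypothetical stand-ins for the C# stores, with made-up contents, and ObjectToString returns an empty string where the real system would presumably throw. The only thing being illustrated is that the same integer means different things in different stores.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the C# ObjectStore
static const std::unordered_map<int32_t, std::string> ObjectStore = {
	{1, "GameObject"},
	{2, "Transform"},
};

// Hypothetical stand-in for the StructStore of RaycastHit.
// It hands out its own handles, independent of ObjectStore.
static const std::unordered_map<int32_t, std::string> RaycastHitStore = {
	{1, "RaycastHit #1"},
	{7, "RaycastHit #7"},
};

// What calling ToString on an Object amounts to: the handle is always
// looked up in ObjectStore. An empty string stands in for "not found".
std::string ObjectToString(int32_t handle)
{
	auto it = ObjectStore.find(handle);
	return it == ObjectStore.end() ? std::string() : it->second;
}
```

Passing the struct handle 1 finds "GameObject", which is the wrong object, and passing the struct handle 7 finds nothing at all. Those are the two failure modes described above.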

We now know that RaycastHit doesn’t really derive from ValueType but instead boxes to a type that derives from ValueType. So we need to adjust the “managed struct” to not derive from ValueType. Boxing operators need to be added to both kinds of structs. So we end up with this:

struct Vector3
{
	float x;
	float y;
	float z;
 
	// Boxing to all base classes
	explicit operator Object() const
	{
		return Object(BoxVector3(*this));
	}
	explicit operator ValueType() const
	{
		return ValueType(BoxVector3(*this));
	}
 
	// ... boxing to all interface types (Vector3 has none)
};
 
struct ManagedType
{
	// C# StructStore handle
	int32_t Handle;
};
 
struct RaycastHit : ManagedType
{
	// Boxing to all base classes
	explicit operator Object() const
	{
		return Object(BoxRaycastHit(Handle));
	}
	explicit operator ValueType() const
	{
		return ValueType(BoxRaycastHit(Handle));
	}
 
	// ... boxing to all interface types (RaycastHit has none)
};

Now we have full boxing support and we’ve fixed the problem of using the wrong kind of handle:

// Box a "full struct" to any base class or interface
Vector3 v(1.0f, 2.0f, 3.0f);
ValueType vt(v);
 
// Box a "managed struct" to any base class or interface
void Foo(Object o)
{
	o.ToString();
}
RaycastHit r;
Object o(r);
Foo(o);

Now that we’re passing the boxed RaycastHit instead of the actual RaycastHit, we’ll be using the boxed struct’s Handle field. That’s the one returned from the boxing function, which actually refers to the ObjectStore in C#. So we’re using the correct handle now and won’t have any incorrect behavior or exceptions as we did before.
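
The fix can be modeled in miniature too. This is a sketch under assumptions: the maps are made-up stand-ins for the C# stores, and this BoxRaycastHit only imitates what the real boxing function is described as doing, namely returning a handle that refers to the ObjectStore.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for the C# stores
static std::unordered_map<int32_t, std::string> ObjectStore = {
	{1, "GameObject"},
};
static const std::unordered_map<int32_t, std::string> RaycastHitStore = {
	{1, "RaycastHit #1"},
};

// Next free ObjectStore handle in this toy model
static int32_t NextObjectHandle = 2;

// Imitation of boxing: read the struct from its own store, put a boxed
// copy into ObjectStore, and return the new ObjectStore handle
int32_t BoxRaycastHit(int32_t structHandle)
{
	const std::string& value = RaycastHitStore.at(structHandle);
	int32_t objectHandle = NextObjectHandle++;
	ObjectStore[objectHandle] = value;
	return objectHandle;
}

// Object methods always look the handle up in ObjectStore
std::string ObjectToString(int32_t objectHandle)
{
	return ObjectStore.at(objectHandle);
}
```

The handle returned by BoxRaycastHit(1) is a different number from the struct handle, and looking it up in ObjectStore finds the boxed RaycastHit rather than the unrelated "GameObject".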

That wraps up the discussion of value types for this week. Hopefully this has been helpful both for understanding how value types and boxing work and for how they’re implemented in the C++ scripting system. As usual, this is all pushed now to the GitHub project so feel free to check it out. If you’ve got any questions or comments, feel free to speak up.

C++ Scripting: Part 28 – Value Types Overhaul
