JacksonDunstan.com

This week we continue to look at the C++ that IL2CPP outputs for C# to get a better understanding of what our C# is really doing. Today we’ll look at how abstract methods work, whether casting to a sealed class is faster than casting to a non-sealed class, and what happens when we create a delegate.

Abstract Methods

Last week we looked at function costs but missed one: abstract methods. Let’s rectify that with a quick test to make sure there are no surprises:

abstract class AbstractClass
{
	public abstract void AbstractMethod();
}
 
class DerivedClass : AbstractClass
{
	public override void AbstractMethod()
	{
	}
}
 
static class TestClass
{
	static void CallAbstractMethod(AbstractClass x)
	{
		x.AbstractMethod();
	}
}

Now let’s look at the C++ that IL2CPP generates for CallAbstractMethod:

extern "C"  void TestClass_CallAbstractMethod_m508825974 (RuntimeObject * __this /* static, unused */, AbstractClass_t1241769765 * ___x0, const RuntimeMethod* method)
{
	{
		AbstractClass_t1241769765 * L_0 = ___x0;
		NullCheck(L_0);
		VirtActionInvoker0::Invoke(4 /* System.Void AbstractClass::AbstractMethod() */, L_0);
		return;
	}
}

This is exactly like calling a virtual method. That might not be surprising, since abstract methods are essentially virtual methods without a base implementation. But last week we found that calling interface methods is different from, and more expensive than, calling virtual methods, so it’s good to confirm that abstract methods don’t carry any such overhead.
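For a mental model of the slot-based call above, here is a minimal C++ sketch. It assumes a simplified, hypothetical class layout: ClassSketch, ObjectSketch, InvokeSlot, and MakeDerivedClassSketch are illustrative names, not IL2CPP’s real types. The point is only that an abstract method gets a vtable slot like any other virtual method, so the call is a load of the class pointer, a load from the slot, and an indirect call.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified model of slot-based virtual dispatch.
// These are NOT IL2CPP's real RuntimeClass/RuntimeObject layouts.
struct ObjectSketch;
using MethodSketch = void (*)(ObjectSketch*);

struct ClassSketch {
    // Every virtual or abstract method is assigned a fixed slot index.
    MethodSketch vtable[8];
};

struct ObjectSketch {
    const ClassSketch* klass;  // first field: the object's runtime class
};

// Models VirtActionInvoker0::Invoke(slot, obj): load the class pointer,
// load the function pointer stored in the slot, then call it indirectly.
void InvokeSlot(std::size_t slot, ObjectSketch* obj) {
    obj->klass->vtable[slot](obj);
}

// Hypothetical override that DerivedClass would place in slot 4.
int g_abstractCalls = 0;
void DerivedClass_AbstractMethod(ObjectSketch*) { ++g_abstractCalls; }

// Builds DerivedClass's vtable: the abstract slot is filled by the override,
// exactly as an ordinary virtual override would be.
ClassSketch MakeDerivedClassSketch() {
    ClassSketch c{};  // value-initialized: all slots start as nullptr
    c.vtable[4] = &DerivedClass_AbstractMethod;
    return c;
}
```

A non-abstract virtual method would go through the identical InvokeSlot path; the only difference is which class supplies the function pointer for the slot.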

Sealed Classes

Four weeks ago we looked at casting but didn’t take sealed classes into account. Is casting to a sealed class type faster than casting to a non-sealed class type? Let’s see!

sealed class SealedClass : AbstractClass
{
	public override void AbstractMethod()
	{
	}
}
 
static class TestClass
{
	static SealedClass CastSealedClass(AbstractClass x)
	{
		return (SealedClass)x;
	}
 
	static SealedClass AsCastSealedClass(AbstractClass x)
	{
		return x as SealedClass;
	}
}

And here’s the C++:

extern "C"  SealedClass_t653026658 * TestClass_CastSealedClass_m2115918270 (RuntimeObject * __this /* static, unused */, AbstractClass_t1241769765 * ___x0, const RuntimeMethod* method)
{
	static bool s_Il2CppMethodInitialized;
	if (!s_Il2CppMethodInitialized)
	{
		il2cpp_codegen_initialize_method (TestClass_CastSealedClass_m2115918270_MetadataUsageId);
		s_Il2CppMethodInitialized = true;
	}
	{
		AbstractClass_t1241769765 * L_0 = ___x0;
		return ((SealedClass_t653026658 *)CastclassSealed((RuntimeObject*)L_0, SealedClass_t653026658_il2cpp_TypeInfo_var));
	}
}
 
extern "C"  SealedClass_t653026658 * TestClass_AsCastSealedClass_m531262610 (RuntimeObject * __this /* static, unused */, AbstractClass_t1241769765 * ___x0, const RuntimeMethod* method)
{
	static bool s_Il2CppMethodInitialized;
	if (!s_Il2CppMethodInitialized)
	{
		il2cpp_codegen_initialize_method (TestClass_AsCastSealedClass_m531262610_MetadataUsageId);
		s_Il2CppMethodInitialized = true;
	}
	{
		AbstractClass_t1241769765 * L_0 = ___x0;
		return ((SealedClass_t653026658 *)IsInstSealed((RuntimeObject*)L_0, SealedClass_t653026658_il2cpp_TypeInfo_var));
	}
}

In the case of the normal (SealedClass)x cast, everything is the same as when casting to a non-sealed class, except that CastclassSealed performs the cast instead of CastclassClass. So let’s look into CastclassSealed:

inline RuntimeObject* CastclassSealed(RuntimeObject *obj, RuntimeClass* targetType)
{
    if (!obj)
        return NULL;
 
    RuntimeObject* result = IsInstSealed(obj, targetType);
    if (result)
        return result;
 
    RaiseInvalidCastException(obj, targetType);
    return NULL;
}

This is also the same as with casting to a non-sealed type except IsInstSealed does the type check instead of IsInstClass. So, once again, let’s go look at it:

inline RuntimeObject* IsInstSealed(RuntimeObject *obj, RuntimeClass* targetType)
{
#if IL2CPP_DEBUG
    IL2CPP_ASSERT((targetType->flags & TYPE_ATTRIBUTE_SEALED) != 0);
    IL2CPP_ASSERT((targetType->flags & TYPE_ATTRIBUTE_INTERFACE) == 0);
#endif
    if (!obj)
        return NULL;
 
    // optimized version to compare sealed classes
    return (obj->klass == targetType ? obj : NULL);
}

The if (!obj) return NULL; part is the same, but the final line has changed from this:

return il2cpp::vm::Class::HasParentUnsafe(obj->klass, targetType) ? obj : NULL;

We previously counted HasParentUnsafe as two CPU cache misses, but now we only read the klass field, which is the object’s first field and likely to be in CPU cache, and compare it against targetType. So casting to a sealed class takes an optimized code path that’s considerably quicker than the one for a non-sealed class. The same is true of the as cast, except that it calls IsInstSealed directly instead of first going through CastclassSealed, so it remains the faster of the two casts.
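To make the difference concrete, here is a minimal C++ sketch of the two checks. It assumes a simplified, hypothetical class record that stores its depth in the hierarchy and an array of its ancestors. The names (ClassInfo, ObjectInfo, HasParent, IsInstClassSketch, IsInstSealedSketch) are illustrative, and IL2CPP’s real HasParentUnsafe may differ in detail. The non-sealed check needs the target’s depth plus an entry from the object’s ancestor array, while the sealed check is a single pointer comparison.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical class record, not IL2CPP's real layout. `ancestors` lists the
// class chain from the root (index 0) down to the class itself
// (index depth - 1).
struct ClassInfo {
    std::size_t depth;
    const ClassInfo* const* ancestors;
};

struct ObjectInfo {
    const ClassInfo* klass;  // first field: the object's runtime class
};

// Non-sealed target: read the target's depth, then one entry of the
// object's ancestor array. Each of these reads may touch a different
// cache line.
bool HasParent(const ClassInfo* klass, const ClassInfo* parent) {
    return klass->depth >= parent->depth &&
           klass->ancestors[parent->depth - 1] == parent;
}

ObjectInfo* IsInstClassSketch(ObjectInfo* obj, const ClassInfo* target) {
    if (!obj) return nullptr;
    return HasParent(obj->klass, target) ? obj : nullptr;
}

// Sealed target: no class can derive from it, so an object is an instance
// only if its class pointer IS the target. One comparison, no extra reads.
ObjectInfo* IsInstSealedSketch(ObjectInfo* obj, const ClassInfo* target) {
    if (!obj) return nullptr;
    return obj->klass == target ? obj : nullptr;
}
```

IsInstSealedSketch would give the wrong answer for a non-sealed target, since an object of a derived class would be rejected. That is why the sealed shortcut is only valid when the target type is sealed, which is exactly what the IL2CPP_DEBUG asserts in IsInstSealed check.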

Creating Delegates

Last week we looked at the cost of calling delegates, but we didn’t look at the cost of creating them. Let’s fill that gap by creating a lambda. The results are the same for anonymous methods (return delegate {};) and method groups (return MyMethod;), so we’ll only look at lambdas here.

static class TestClass
{
	static Action CreateLambdaDelegate()
	{
		return () => { };
	}
}

Here’s the C++ that IL2CPP generates:

extern "C"  Action_t1264377477 * TestClass_CreateLambdaDelegate_m2840398643 (RuntimeObject * __this /* static, unused */, const RuntimeMethod* method)
{
	static bool s_Il2CppMethodInitialized;
	if (!s_Il2CppMethodInitialized)
	{
		il2cpp_codegen_initialize_method (TestClass_CreateLambdaDelegate_m2840398643_MetadataUsageId);
		s_Il2CppMethodInitialized = true;
	}
	{
		Action_t1264377477 * L_0 = ((TestClass_t492893797_StaticFields*)il2cpp_codegen_static_fields_for(TestClass_t492893797_il2cpp_TypeInfo_var))->get_U3CU3Ef__amU24cache0_0();
		if (L_0)
		{
			goto IL_0018;
		}
	}
	{
		intptr_t L_1 = (intptr_t)TestClass_U3CCreateLambdaDelegateU3Em__0_m3090938195_RuntimeMethod_var;
		Action_t1264377477 * L_2 = (Action_t1264377477 *)il2cpp_codegen_object_new(Action_t1264377477_il2cpp_TypeInfo_var);
		Action__ctor_m2994342681(L_2, NULL, L_1, /*hidden argument*/NULL);
		((TestClass_t492893797_StaticFields*)il2cpp_codegen_static_fields_for(TestClass_t492893797_il2cpp_TypeInfo_var))->set_U3CU3Ef__amU24cache0_0(L_2);
	}
 
IL_0018:
	{
		Action_t1264377477 * L_3 = ((TestClass_t492893797_StaticFields*)il2cpp_codegen_static_fields_for(TestClass_t492893797_il2cpp_TypeInfo_var))->get_U3CU3Ef__amU24cache0_0();
		return L_3;
	}
}

Wow, that’s a lot of C++ code for one line of C#! First off, we get the same method initialization overhead that’s present with either type of casting, interface function calls, string literals, throwing exceptions, calling generic methods, or calling methods of generic types. It’s expensive.

Next there’s the call to il2cpp_codegen_static_fields_for:

inline void* il2cpp_codegen_static_fields_for(RuntimeClass* klass)
{
    return klass->static_fields;
}

And there’s the call to get_U3CU3Ef__amU24cache0_0:

inline Action_t1264377477 * get_U3CU3Ef__amU24cache0_0() const { return ___U3CU3Ef__amU24cache0_0; }

These seem really cheap since they just return fields, but the crucial question is whether that memory is likely to be in CPU cache. If this code path runs frequently, the answer is probably “yes,” but it probably isn’t run frequently. If it isn’t, this line likely misses CPU cache twice: once for static_fields and once for ___U3CU3Ef__amU24cache0_0. In that case, this is quite a slow line.

Then an if checks whether an Action was retrieved from the cache. If one was, the next block is skipped and the cached Action is returned. If not, the next block executes, starting with a call to il2cpp_codegen_object_new:

inline RuntimeObject* il2cpp_codegen_object_new(RuntimeClass *klass)
{
    return il2cpp::vm::Object::New(klass);
}

Object::New is in libil2cpp, which is located in the Unity installation directory. There’s far too much code to put into this article, but suffice it to say that it performs very expensive work such as memory allocation and class initialization.

Next, the Action constructor is called. We can’t call this constructor directly in C#, but the C++ that IL2CPP generates can call it to set up the delegate. Here’s how it looks:

extern "C"  void Action__ctor_m2994342681 (Action_t1264377477 * __this, RuntimeObject * ___object0, intptr_t ___method1, const RuntimeMethod* method)
{
	__this->set_method_ptr_0(il2cpp_codegen_get_method_pointer((RuntimeMethod*)___method1));
	__this->set_method_3(___method1);
	__this->set_m_target_2(___object0);
}

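These fill in the Action’s fields: method_ptr gets the native function pointer for the lambda, method gets the lambda’s method metadata, and m_target gets the target object. The target is NULL here because the lambda is a static method of TestClass.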

Finally, the new Action is stored in the static cache field (U3CU3Ef__amU24cache0_0, the mangled form of <>f__am$cache0), then read back from that field and returned.

All of this creation work is very expensive, but it only happens the first time. Subsequent calls just read the Action back from the cache field without any GC allocation, but those reads can still cost two CPU cache misses, which is quite expensive.
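The caching pattern in the generated code can be sketched in C++ as follows. This is a hypothetical model, not IL2CPP’s runtime: ActionSketch stands in for Action, TestClassStatics and TestClassSketch stand in for TestClass’s static field storage and its class, and a counter stands in for GC allocations. It shows the two dependent reads made on every call (the class’s static_fields pointer, then the cache field) and that only the first call allocates.

```cpp
#include <cassert>

// Hypothetical stand-in for System.Action: a native function pointer plus a
// target object (null for a lambda that captures nothing).
struct ActionSketch {
    void (*method_ptr)(void*);
    void* target;
};

// Hypothetical stand-ins for TestClass's static field storage and its class.
struct TestClassStatics {
    ActionSketch* cache0;  // models <>f__am$cache0
};
struct TestClassSketch {
    TestClassStatics* static_fields;
};

TestClassStatics g_statics{nullptr};
TestClassSketch g_testClass{&g_statics};
int g_actionAllocations = 0;  // stands in for GC allocations

void LambdaBody(void*) {}  // body of () => { }

// Mirrors the shape of TestClass_CreateLambdaDelegate: read the cache
// through static_fields, create and store an Action on a miss, then read
// the cache again and return it.
ActionSketch* CreateLambdaDelegateSketch() {
    if (!g_testClass.static_fields->cache0) {
        ++g_actionAllocations;
        // Never freed, like a delegate held by a static field.
        g_testClass.static_fields->cache0 = new ActionSketch{&LambdaBody, nullptr};
    }
    return g_testClass.static_fields->cache0;
}
```

Every call after the first takes only the cache-hit branch, so the remaining cost is the pair of dependent reads rather than any allocation.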

Creating Closure Delegates

The above lambda captured nothing, but what happens if we create a “closure,” where the lambda captures local variables or parameters so that the state of one lambda can differ from the state of another? Can the lambdas still be cached? Let’s find out:

static class TestClass
{
	static Action CreateLambdaClosureDelegate(int i)
	{
		return () => { i++; };
	}
}

Here’s the C++ that gets generated:

extern "C"  Action_t1264377477 * TestClass_CreateLambdaClosureDelegate_m38452069 (RuntimeObject * __this /* static, unused */, int32_t ___i0, const RuntimeMethod* method)
{
	static bool s_Il2CppMethodInitialized;
	if (!s_Il2CppMethodInitialized)
	{
		il2cpp_codegen_initialize_method (TestClass_CreateLambdaClosureDelegate_m38452069_MetadataUsageId);
		s_Il2CppMethodInitialized = true;
	}
	U3CCreateLambdaClosureDelegateU3Ec__AnonStorey0_t3868246035 * V_0 = NULL;
	{
		U3CCreateLambdaClosureDelegateU3Ec__AnonStorey0_t3868246035 * L_0 = (U3CCreateLambdaClosureDelegateU3Ec__AnonStorey0_t3868246035 *)il2cpp_codegen_object_new(U3CCreateLambdaClosureDelegateU3Ec__AnonStorey0_t3868246035_il2cpp_TypeInfo_var);
		U3CCreateLambdaClosureDelegateU3Ec__AnonStorey0__ctor_m2228767834(L_0, /*hidden argument*/NULL);
		V_0 = L_0;
		U3CCreateLambdaClosureDelegateU3Ec__AnonStorey0_t3868246035 * L_1 = V_0;
		int32_t L_2 = ___i0;
		NullCheck(L_1);
		L_1->set_i_0(L_2);
		U3CCreateLambdaClosureDelegateU3Ec__AnonStorey0_t3868246035 * L_3 = V_0;
		intptr_t L_4 = (intptr_t)U3CCreateLambdaClosureDelegateU3Ec__AnonStorey0_U3CU3Em__0_m2209333415_RuntimeMethod_var;
		Action_t1264377477 * L_5 = (Action_t1264377477 *)il2cpp_codegen_object_new(Action_t1264377477_il2cpp_TypeInfo_var);
		Action__ctor_m2994342681(L_5, L_3, L_4, /*hidden argument*/NULL);
		return L_5;
	}
}

Notice that the caching code is gone. Instead, il2cpp_codegen_object_new GC-allocates an instance of a compiler-generated class that holds the captured state. That class looks like this:

struct  U3CCreateLambdaClosureDelegateU3Ec__AnonStorey0_t3868246035  : public RuntimeObject
{
public:
	// System.Int32 TestClass/<CreateLambdaClosureDelegate>c__AnonStorey0::i
	int32_t ___i_0;
 
public:
	inline static int32_t get_offset_of_i_0() { return static_cast<int32_t>(offsetof(U3CCreateLambdaClosureDelegateU3Ec__AnonStorey0_t3868246035, ___i_0)); }
	inline int32_t get_i_0() const { return ___i_0; }
	inline int32_t* get_address_of_i_0() { return &___i_0; }
	inline void set_i_0(int32_t value)
	{
		___i_0 = value;
	}
};

Here we see the int that was captured by the lambda closure. It’s now a field of the class, and a suite of accessors has been generated for it.

Next in the function is a call to construct this class:

extern "C"  void U3CCreateLambdaClosureDelegateU3Ec__AnonStorey0__ctor_m2228767834 (U3CCreateLambdaClosureDelegateU3Ec__AnonStorey0_t3868246035 * __this, const RuntimeMethod* method)
{
	{
		Object__ctor_m297566312(__this, /*hidden argument*/NULL);
		return;
	}
}

This just calls the empty constructor for object:

extern "C"  void Object__ctor_m297566312 (RuntimeObject * __this, const RuntimeMethod* method)
{
	{
		return;
	}
}

Next, there’s an unnecessary null check on the closure class instance that was just created, before its int field is set from the i parameter. Finally, an Action is GC-allocated with il2cpp_codegen_object_new, and its constructor points it at the lambda’s method (U3CU3Em__0) on the closure class, with the closure instance as its target.

So creating a closure is way more expensive than creating a non-closure delegate. This happens any time the closure class needs fields, such as when the lambda reads or writes local variables or parameters of the enclosing method. There’s no caching of delegate instances, and two GC allocations are needed: one for the closure class that’s hidden from us in C# and one for the Action we use to represent it.
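The closure case can be modeled the same way. In this hypothetical C++ sketch, ClosureSketch stands in for the compiler-generated <CreateLambdaClosureDelegate>c__AnonStorey0 class and ClosureActionSketch for Action. A counter stands in for GC allocations, and std::unique_ptr is used only so the sketch frees its own memory. Each call makes two allocations, and there is nothing to cache because each delegate carries its own closure holding its own i.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for <CreateLambdaClosureDelegate>c__AnonStorey0:
// the captured parameter becomes a field.
struct ClosureSketch {
    int i;
};

// Hypothetical stand-in for Action: the lambda's function pointer plus the
// closure instance it operates on (the delegate's target).
struct ClosureActionSketch {
    void (*method_ptr)(ClosureSketch*);
    std::unique_ptr<ClosureSketch> target;
};

int g_closureAllocations = 0;  // stands in for GC allocations

// Body of () => { i++; }: it updates the field on the closure, not a local.
void ClosureBody(ClosureSketch* self) { ++self->i; }

// Mirrors TestClass_CreateLambdaClosureDelegate: no cache, two allocations.
std::unique_ptr<ClosureActionSketch> CreateLambdaClosureDelegateSketch(int i) {
    // Allocation 1: the closure object, initialized from the parameter.
    std::unique_ptr<ClosureSketch> closure(new ClosureSketch{i});
    ++g_closureAllocations;
    // Allocation 2: the delegate, pointing at the lambda body and the closure.
    std::unique_ptr<ClosureActionSketch> action(
        new ClosureActionSketch{&ClosureBody, std::move(closure)});
    ++g_closureAllocations;
    return action;
}

// Invoking the delegate calls the body with the closure as its argument.
void InvokeSketch(const ClosureActionSketch& action) {
    action.method_ptr(action.target.get());
}
```

Invoking one delegate changes only its own closure’s i, which is exactly why a single cached instance couldn’t serve every call.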

Summary

Calling abstract methods is exactly the same as calling virtual methods that aren’t declared on an interface. They’re nowhere near as fast as non-virtual methods, but faster than interface methods. Try to avoid abstract methods in performance-critical code.

Casting to a sealed class type is faster than casting to a non-sealed class type. Add the sealed keyword to any class that has no classes deriving from it and you’ll get a speedup whenever you cast to that type.

Creating delegates that are not closures may involve up to three CPU cache misses, which is quite expensive. The first creation also involves some very expensive GC allocation, but later creations don’t. Avoid creating delegates, even non-closures, where performance counts.

Creating delegates that are closures always involves two expensive GC allocations, which will likely lead to a frame hitch later on and to memory fragmentation. These should definitely be used very sparingly.

IL2CPP Output: Abstract Classes, Sealed Classes, and Delegates
