JacksonDunstan.com

In the first three parts of this series, we focused on setting up a development environment that makes it easy and safe to write our game code in C++. Today’s article takes a step back to assess where we are in terms of performance. Is what we’ve built so far viable, or are the calls between C# and C++ too expensive? To find out we’ll use the existing framework to write some simple performance tests.


There are two kinds of function calls we need to performance test today. First are calls from C++ game code to the C# Unity API. This will likely be the bulk of the calls that go across the C#/C++ boundary. Second are calls from C# to C++ to notify it of Unity events such as MonoBehaviour.Update.

Since we already have both kinds of calls set up, it’s easy to modify our existing code to make it into a pair of performance tests. First, let’s make a lot of calls to the Unity API from C++ and see how long it takes. To do so, we first replace the body of MonoBehaviourUpdate with this:

GameObject go;
Transform transform = go.GetTransform();
Vector3 sum;
for (int i = 0; i < 10000000; ++i)
{
	sum += transform.GetPosition();
}
transform.SetPosition(sum);

Here we’re calling the Transform.position getter property 10 million times. Each call internally consists of several steps:

Call the function pointer for the C# delegate holding TransformGetPosition. Pass in an int32_t object handle.
In C#, TransformGetPosition passes the int object handle to ObjectStore.Get.
ObjectStore.Get indexes into the array of objects via the handle and returns the object.
TransformGetPosition casts the object to Transform.
TransformGetPosition calls the position property getter.
TransformGetPosition returns the Vector3 to C++. It (three floats) is copied.
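The steps above can be sketched entirely in C++ to make the moving parts concrete. This is only a simulation under stated assumptions: FakeTransform, objectStore, ObjectStoreAdd, ObjectStoreGet, and ManagedTransformGetPosition are hypothetical stand-ins for the C# side, not the real bindings, and the real call crosses into managed code through a delegate rather than staying in C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for UnityEngine.Vector3: three floats, returned by value (copied)
struct Vector3
{
	float x, y, z;
};

// Hypothetical stand-in for a managed Transform object
struct FakeTransform
{
	Vector3 position;
};

// Hypothetical stand-in for the C# ObjectStore: the handle is an array index
static std::vector<void*> objectStore;

static int32_t ObjectStoreAdd(void* obj)
{
	objectStore.push_back(obj);
	return static_cast<int32_t>(objectStore.size() - 1);
}

static void* ObjectStoreGet(int32_t handle)
{
	assert(handle >= 0 && static_cast<std::size_t>(handle) < objectStore.size());
	return objectStore[static_cast<std::size_t>(handle)];
}

// Plays the role of the C# TransformGetPosition function:
// look up the object by handle, cast it, read the property, return a copy
static Vector3 ManagedTransformGetPosition(int32_t thisHandle)
{
	FakeTransform* thiz = static_cast<FakeTransform*>(ObjectStoreGet(thisHandle));
	return thiz->position;
}

namespace Plugin
{
	// The function pointer that game code calls through
	Vector3 (*TransformGetPosition)(int32_t thisHandle) = ManagedTransformGetPosition;
}

// Game-side wrapper that only holds the handle
struct Transform
{
	int32_t Handle;

	Vector3 GetPosition()
	{
		return Plugin::TransformGetPosition(Handle);
	}
};
```

Registering a FakeTransform with ObjectStoreAdd and calling GetPosition on a Transform holding the returned handle goes through every step in the list: the pointer call, the lookup, the cast, the property read, and the copy of three floats.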

That’s a fair amount of work, and it’s representative of a typical Unity API call: some are cheaper and some are more expensive. It also doesn’t cause the Unity engine to do anything expensive, so the work we’re measuring is mostly in the C#/C++ communication layer.

To get this to work, a couple of tweaks needed to be made to the code from the previous article. First, we only had the setter for Transform.position so the getter needed to be implemented. It’s really boilerplate work to do so. Here are the parts required:

////////
// C# //
////////
 
// Add a parameter to InitDelegate and Init
IntPtr transformPropertyGetPosition
 
// Declare a new delegate
delegate Vector3 TransformGetPositionDelegate(int thisHandle);
 
// Add a new parameter to the Init call
Marshal.GetFunctionPointerForDelegate(
	new TransformGetPositionDelegate(
		TransformGetPosition)),
 
// Add a new function to call the getter
static Vector3 TransformGetPosition(int thisHandle)
{
	Transform thiz = (Transform)ObjectStore.Get(thisHandle);
	Vector3 obj = thiz.position;
	return obj;
}

/////////
// C++ //
/////////
 
// Declare a function pointer
Vector3 (*TransformGetPosition)(int32_t thisHandle);
 
// Declare the method of the Transform class
Vector3 GetPosition();
 
// Define the method of the Transform class
Vector3 Transform::GetPosition()
{
	return Plugin::TransformGetPosition(Handle);
}
 
// Add a parameter to Init
Vector3 (*transformGetPosition)(int32_t thisHandle),
 
// Save the parameter to the global function pointer
TransformGetPosition = transformGetPosition;

The second addition was needed in order to run this test on an Android device instead of just in the Unity editor. In AOT environments like Android and iOS, you’ll need to add an attribute to each function you want to get a function pointer for. It’s quite simple:

// This namespace has the attribute
using AOT;
 
// This is the attribute to add
// Pass it the Type of the delegate for this function
[MonoPInvokeCallback(typeof(TransformGetPositionDelegate))]
static UnityEngine.Vector3 TransformGetPosition(int thisHandle)
{
	Transform thiz = (Transform)ObjectStore.Get(thisHandle);
	Vector3 obj = thiz.position;
	return obj;
}

Next, set up a C# version of the same test. Just put this somewhere that runs once, like Awake:

GameObject go = new GameObject();
Transform transform = go.transform;
Vector3 sum = default(Vector3);
for (int i = 0; i < 10000000; ++i)
{
	sum += transform.position;
}
transform.position = sum;

It’s just a C# port of the C++ code. Now wrap both of these chunks of code with Stopwatch.StartNew at the start and a read of Stopwatch.ElapsedMilliseconds at the end. I ran the test on this device:
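If you would rather take the measurement on the C++ side instead of wrapping the call in a C# Stopwatch, a minimal sketch using the standard library's steady clock could look like this. MeasureMilliseconds is a hypothetical helper name, not part of the framework from this series.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: runs `work` once and returns the elapsed time
// truncated to whole milliseconds, similar in spirit to
// Stopwatch.StartNew followed by ElapsedMilliseconds
template <typename TWork>
int64_t MeasureMilliseconds(TWork work)
{
	auto start = std::chrono::steady_clock::now();
	work();
	auto end = std::chrono::steady_clock::now();
	int64_t ms = std::chrono::duration_cast<std::chrono::milliseconds>(
		end - start).count();
	assert(ms >= 0); // steady_clock never goes backwards
	return ms;
}
```

The body of the test loop would be passed in as a lambda. steady_clock is used rather than system_clock because it is monotonic, so a clock adjustment during the test can't skew the result.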

LG Nexus 5X
Android 7.1.2
Unity 2017.1.0f3

And here are the results I got:

Language	Time (ms)
C#	391
C++	761

C++ Unity API Performance Graph

C++ was always going to lose this fight. It’s not that C++ is slower than C#, it’s that the C++ version has to do additional work on top of everything the C# version does. That includes calling through function pointers and delegates, copying parameters and return values, and mapping object handles to actual objects. In the end, it takes almost twice as long to make the same calls to the Unity API from C++ as it does from C#.

However, it’s very much worth noting that this test includes 10 million Unity API calls. Most games won’t come anywhere near that many calls in a single frame, or they would already have awful performance in C#. So another way to look at it is in terms of the number of Unity API calls you can do in one millisecond:

C++ Unity API Calls/Millisecond

This chart paints a much more realistic picture than the first one. Here we see that C++ can still make 13,140 Unity API calls in a single millisecond. That’s plenty for all but the most complex games, but games of that complexity probably wouldn’t run on this lowly Android device anyhow.
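The calls-per-millisecond figure is just the call count divided by the measured time. As a small sketch of that arithmetic (CallsPerMillisecond is a hypothetical helper name):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: whole calls completed per millisecond
int64_t CallsPerMillisecond(int64_t totalCalls, int64_t totalMs)
{
	assert(totalMs > 0); // a zero-millisecond measurement can't be converted
	return totalCalls / totalMs;
}

// With the measurements above:
//   CallsPerMillisecond(10000000, 761) is 13140 (C++)
//   CallsPerMillisecond(10000000, 391) is 25575 (C#)
```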

Next, let’s look at the performance of C# calling into C++ code. This is the less-frequently used side as the number of events that C# needs to pass along from the Unity API is probably rather small. At most, there could be a lot of MonoBehaviour.Update or MonoBehaviour.FixedUpdate calls in a complex scene using a lot of attached scripts. Regardless, let’s see how to implement the test code:

/////////
// C++ //
/////////
 
// Modify an existing function into a simple integer adder
DLLEXPORT int32_t MonoBehaviourUpdate(int32_t a, int32_t b)
{
    return a + b;
}

////////
// C# //
////////
 
// Add a C# integer adder for comparison
static int Add(int a, int b){ return a + b; }
 
// Call them both a lot of times
// Use a stopwatch to measure the times

// Time the calls from C# into C++
var sw = System.Diagnostics.Stopwatch.StartNew();
int sum = 0;
for (int i = 0; i < 10000000; ++i)
{
	sum += MonoBehaviourUpdate(i, i);
}
long cppTime = sw.ElapsedMilliseconds;
 
// Time the same number of calls to the C# adder
sw.Reset();
sw.Start();
sum = 0;
for (int i = 0; i < 10000000; ++i)
{
	sum += Add(i, i);
}
long csharpTime = sw.ElapsedMilliseconds;

I ran this test on the same device and got these results:

Language	Time (ms)
C#	0
C++	165

Calling C++ Performance

It’s likely that the compiler inlined the C# function call, or removed the loop entirely since sum is never read afterward, and thus the C# side took virtually no time, hence the measurement of zero milliseconds. This is unfortunate as it’s not the normal case. We all know that C# code does not execute instantaneously. Still, we can analyze the C++ side.
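One common way to keep a micro-benchmark loop from being optimized away is to make its result observable. The sketch below shows the idea in C++ with a volatile accumulator. It is an illustration of the general technique only, not what the test above did, and SumWithObservableResult is a hypothetical name. Note that the adder itself may still be inlined; the volatile only guarantees that the loop's reads and writes actually happen.

```cpp
#include <cassert>
#include <cstdint>

static int32_t Add(int32_t a, int32_t b)
{
	return a + b;
}

// Hypothetical helper: sums Add(i, i) for i in [0, iterations)
// Every update goes through a volatile, so the compiler must perform
// each read and write and can't drop the loop as dead code
int64_t SumWithObservableResult(int32_t iterations)
{
	assert(iterations >= 0);
	volatile int64_t sink = 0;
	for (int32_t i = 0; i < iterations; ++i)
	{
		sink = sink + Add(i, i);
	}
	return sink;
}
```

The accumulator is 64-bit because the sum for 10 million iterations, n * (n - 1), does not fit in 32 bits.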

10 million calls from C# into C++ took 165 milliseconds. That’s several times faster than the Unity API calls because much less work was done. There’s no object handle work or Unity engine work, for example. It’s also fast if we look at it in terms of calls per millisecond. In that sense, we’re able to make 60,606 calls in a single millisecond. Even in a very complex scene full of scripts running Update or FixedUpdate, we’re not likely to use more than 1,000 calls per frame. That should only take about a sixtieth of a millisecond (roughly 0.0165 ms), which is probably negligible compared to the actual work those scripts are doing.
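The per-frame cost can be estimated by scaling the measurement linearly, which assumes every call costs about the same. A minimal sketch of that estimate (MillisecondsForCalls is a hypothetical helper name):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: estimates the time for `calls` calls by scaling
// a measurement of `measuredCalls` calls that took `measuredMs`
double MillisecondsForCalls(int64_t calls, int64_t measuredCalls, double measuredMs)
{
	assert(measuredCalls > 0);
	return measuredMs * static_cast<double>(calls) / static_cast<double>(measuredCalls);
}

// With the measurement above, 1,000 calls per frame:
//   MillisecondsForCalls(1000, 10000000, 165.0) is about 0.0165 ms
```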

So it seems that the overhead of C#/C++ communication probably isn’t going to pose a performance issue for most games. On the positive side, using C++ for game code will probably result in a game-wide performance boost as we cut out the garbage collector and the overhead of IL2CPP-generated code, and gain access to CPU features like SIMD. Whether C++ will be a net performance gain or loss is up to the individual game, but it’s not likely to be a huge loss.

With this confirmation that we’re on the right performance track, we can continue the series next week by making our programming lives easier and eliminating all that boilerplate every time we want to expose a new Unity API function.

#1 by Ed Earl on July 31st, 2017 · Reply

Good stuff. Would be interesting to try this on iOS, too, since the Unity manual says: “Managed-to-unmanaged calls are quite processor intensive on iOS. Try to avoid calling multiple native methods per frame.”

https://docs.unity3d.com/Manual/PluginsForIOS.html

#2 by jackson on July 31st, 2017 · Reply

I think that tip is out of date, but a performance test to verify is a good idea. Thanks!

C++ Scripting: Part 4 – Performance Validation
