C++ Scripting: Part 4 – Performance Validation
In the first three parts of this series, we focused on setting up a development environment that makes it easy and safe to write our game code in C++. Today’s article takes a step back to assess where we are in terms of performance. Is what we’ve built so far viable, or are the calls between C# and C++ too expensive? To find out, we’ll use the existing framework to write some simple performance tests.
Table of Contents
- Part 1: C#/C++ Communication
- Part 2: Update C++ Without Restarting the Editor
- Part 3: Object-Oriented Bindings
- Part 4: Performance Validation
- Part 5: Bindings Code Generator
- Part 6: Building the C++ Plugin
- Part 7: MonoBehaviour Messages
- Part 8: Platform-Dependent Compilation
- Part 9: Out and Ref Parameters
- Part 10: Full Generics Support
- Part 11: Collaborators, Structs, and Enums
- Part 12: Exceptions
- Part 13: Operator Overloading, Indexers, and Type Conversion
- Part 14: Arrays
- Part 15: Delegates
- Part 16: Events
- Part 17: Boxing and Unboxing
- Part 18: Array Index Operator
- Part 19: Implement C# Interfaces with C++ Classes
- Part 20: Performance Improvements
- Part 21: Implement C# Properties and Indexers in C++
- Part 22: Full Base Type Support
- Part 23: Base Type APIs
- Part 24: Default Parameters
- Part 25: Full Type Hierarchy
- Part 26: Hot Reloading
- Part 27: Foreach Loops
- Part 28: Value Types Overhaul
- Part 29: Factory Functions and New MonoBehaviours
- Part 30: Overloaded Types and Decimal
There are two kinds of function calls we need to performance test today. The first is calls from C++ game code to the C# Unity API. These will likely make up the bulk of the calls that cross the C#/C++ boundary. The second is calls from C# to C++ to notify it of Unity events such as MonoBehaviour.Update.
Since we already have both kinds of calls set up, it’s easy to turn our existing code into a pair of performance tests. First, let’s make a lot of calls to the Unity API from C++ and see how long they take. To do so, we replace the body of MonoBehaviourUpdate with this:
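```cpp
// Create a new GameObject and get its Transform
GameObject go;
Transform transform = go.GetTransform();

// Call the Transform.position getter 10 million times
Vector3 sum;
for (int i = 0; i < 10000000; ++i)
{
    sum += transform.GetPosition();
}

// Use the sum so the result isn't discarded
transform.SetPosition(sum);
```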
Here we’re calling the Transform.position property getter 10 million times. Each call internally consists of several steps:
- Call the function pointer for the C# delegate holding TransformGetPosition, passing in an int32_t object handle.
- In C#, TransformGetPosition passes the int object handle to ObjectStore.Get.
- ObjectStore.Get indexes into its array of object via the handle and returns the object.
- TransformGetPosition casts the object to Transform.
- TransformGetPosition calls the position property getter.
- TransformGetPosition returns the Vector3 to C++, copying its three floats.
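To make the per-call work concrete, the steps above can be modeled in plain C++. This is a sketch, not the series’ code: Object, Transform, and ObjectStore here are hypothetical C++ stand-ins for the managed types, and dynamic_cast plays the role of C#’s checked (Transform) cast.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vector3 { float x, y, z; };

// Hypothetical stand-in for C#'s `object`
struct Object { virtual ~Object() = default; };

// Hypothetical stand-in for UnityEngine.Transform
struct Transform : Object { Vector3 position{1.0f, 2.0f, 3.0f}; };

// Hypothetical stand-in for ObjectStore: a handle is an index into an array
class ObjectStore
{
public:
    int32_t Store(Object* obj)
    {
        objects_.push_back(obj);
        return static_cast<int32_t>(objects_.size() - 1);
    }

    Object* Get(int32_t handle) const
    {
        assert(handle >= 0 && static_cast<std::size_t>(handle) < objects_.size());
        return objects_[handle]; // index into the array via the handle
    }

private:
    std::vector<Object*> objects_;
};

ObjectStore store;

// Plays the role of the C# TransformGetPosition function
Vector3 TransformGetPositionImpl(int32_t thisHandle)
{
    Object* obj = store.Get(thisHandle);              // handle -> object
    Transform* thiz = dynamic_cast<Transform*>(obj);  // object -> Transform
    assert(thiz != nullptr);
    return thiz->position; // the three floats are copied back by value
}

// The C++ game code only ever sees this function pointer
Vector3 (*TransformGetPosition)(int32_t thisHandle) = &TransformGetPositionImpl;
```

Each step in the list maps to one line here, which is why even a "simple" property read across the boundary costs more than a direct C# call.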
It’s a fair amount of work, representative of a typical Unity API call. Some calls may be cheaper and some more expensive, but this one seems pretty typical. It also doesn’t make the Unity engine do anything expensive, so the work we’re measuring is mostly in the C#/C++ communication layer.
To get this to work, a couple of tweaks to the code from the previous article were needed. First, we only had the setter for Transform.position, so the getter needed to be implemented. It’s purely boilerplate work. Here are the parts required:
```csharp
////////
// C# //
////////

// Add a parameter to InitDelegate and Init
IntPtr transformPropertyGetPosition

// Declare a new delegate
delegate Vector3 TransformGetPositionDelegate(int thisHandle);

// Add a new parameter to the Init call
Marshal.GetFunctionPointerForDelegate(
    new TransformGetPositionDelegate(
        TransformGetPosition)),

// Add a new function to call the getter
static Vector3 TransformGetPosition(int thisHandle)
{
    Transform thiz = (Transform)ObjectStore.Get(thisHandle);
    Vector3 obj = thiz.position;
    return obj;
}
```
```cpp
/////////
// C++ //
/////////

// Declare a function pointer
Vector3 (*TransformGetPosition)(int32_t thisHandle);

// Declare the method of the Transform class
Vector3 GetPosition();

// Define the method of the Transform class
Vector3 Transform::GetPosition()
{
    return Plugin::TransformGetPosition(Handle);
}

// Add a parameter to Init
Vector3 (*transformGetPosition)(int32_t thisHandle),

// Save the parameter to the global function pointer
TransformGetPosition = transformGetPosition;
```
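Put together, the C++ fragments above form a small pattern: Init stores a pointer handed over from C#, and each Transform method calls through it with the object’s handle. Below is a self-contained sketch of that wiring. The Vector3 operator+= is an assumption (the benchmark loop needs one), and in real use the pointer would come from Marshal.GetFunctionPointerForDelegate rather than a C++ function.

```cpp
#include <cstdint>

struct Vector3
{
    float x, y, z;

    // Assumed helper so `sum += transform.GetPosition()` compiles
    Vector3& operator+=(const Vector3& other)
    {
        x += other.x;
        y += other.y;
        z += other.z;
        return *this;
    }
};

namespace Plugin
{
    // Global function pointer, null until Init runs
    Vector3 (*TransformGetPosition)(int32_t thisHandle) = nullptr;
}

class Transform
{
public:
    explicit Transform(int32_t handle) : Handle(handle) {}
    Vector3 GetPosition();
    int32_t Handle; // handle into the C# ObjectStore
};

Vector3 Transform::GetPosition()
{
    // The target is only known at runtime, so this call can't be inlined
    return Plugin::TransformGetPosition(Handle);
}

// Called once by C# in the real plugin, passing its delegate's pointer
void Init(Vector3 (*transformGetPosition)(int32_t thisHandle))
{
    Plugin::TransformGetPosition = transformGetPosition;
}
```

In a quick test, any captureless lambda with a matching signature can stand in for the C# side, since it converts implicitly to a plain function pointer.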
The second addition was needed to run this test on an Android device instead of just in the Unity editor. In ahead-of-time (AOT) compiled environments like Android and iOS, you’ll need to add an attribute to each function you want to get a function pointer for. It’s quite simple:
```csharp
// This namespace has the attribute
using AOT;

// This is the attribute to add
// Pass it the Type of the delegate for this function
[MonoPInvokeCallback(typeof(TransformGetPositionDelegate))]
static UnityEngine.Vector3 TransformGetPosition(int thisHandle)
{
    Transform thiz = (Transform)ObjectStore.Get(thisHandle);
    Vector3 obj = thiz.position;
    return obj;
}
```
Next, set up a C# version of the same test. Just put this somewhere that runs once, like Awake:
```csharp
GameObject go = new GameObject();
Transform transform = go.transform;
Vector3 sum = default(Vector3);
for (int i = 0; i < 10000000; ++i)
{
    sum += transform.position;
}
transform.position = sum;
```
It’s just a C# port of the C++ code. Now wrap both of these chunks of code with Stopwatch.StartNew at the start and Stopwatch.ElapsedMilliseconds at the end. I ran the test on this device:
- LG Nexus 5X
- Android 7.1.2
- Unity 2017.1.0f3
And here are the results I got:
| Language | Time (ms) |
|---|---|
| C# | 391 |
| C++ | 761 |
C++ was always going to lose this fight. It’s not that C++ is slower than C#; it’s that it has to do additional work on top of the C# work. That includes calling through function pointers and delegates, copying parameters and return values, and mapping object handles to the actual objects. In the end, making the same calls to the Unity API takes almost twice as long from C++ (761 ms) as from C# (391 ms).
However, it’s worth noting that this test makes 10 million Unity API calls. Most games won’t come anywhere near that many calls in a single frame, or they would already have awful performance in C#. So another way to look at the results is in terms of the number of Unity API calls you can make in one millisecond:

| Language | Calls per millisecond |
|---|---|
| C# | 25,575 |
| C++ | 13,140 |

This view paints a much more realistic picture than the raw times. C++ can still make 13,140 Unity API calls in a single millisecond. That’s plenty for all but the most complex games, and games of that complexity probably wouldn’t run on this lowly Android device anyhow.
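The per-millisecond figures are just the 10 million calls divided by the measured milliseconds, and "almost twice as long" is the ratio of the two times. A small sketch of that arithmetic; the function names are ours, for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Whole calls per millisecond (integer division truncates the remainder)
int64_t CallsPerMillisecond(int64_t calls, int64_t elapsedMs)
{
    assert(elapsedMs > 0);
    return calls / elapsedMs;
}

// How many times longer the slower run took than the faster one
double SlowdownFactor(int64_t slowMs, int64_t fastMs)
{
    assert(fastMs > 0);
    return static_cast<double>(slowMs) / static_cast<double>(fastMs);
}
```

With the measured times, CallsPerMillisecond(10000000, 391) gives 25,575 for C#, CallsPerMillisecond(10000000, 761) gives 13,140 for C++, and SlowdownFactor(761, 391) is about 1.95.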
Next, let’s look at the performance of C# calling into C++ code. This is the less frequently used direction, since the number of events that C# needs to pass along from the Unity API is probably rather small. At worst, a complex scene with a lot of attached scripts could generate a lot of MonoBehaviour.Update or MonoBehaviour.FixedUpdate calls. Regardless, let’s see how to implement the test code:
```cpp
/////////
// C++ //
/////////

// Modify an existing function into a simple integer adder
DLLEXPORT int32_t MonoBehaviourUpdate(int32_t a, int32_t b)
{
    return a + b;
}
```
```csharp
////////
// C# //
////////

// Add a C# integer adder for comparison
static int Add(int a, int b)
{
    return a + b;
}

// Call them both a lot of times
// Use a stopwatch to measure the times
var sw = System.Diagnostics.Stopwatch.StartNew();
int sum = 0;
for (int i = 0; i < 10000000; ++i)
{
    sum += Add(i, i); // pure C#
}
long csharpTime = sw.ElapsedMilliseconds;

sw.Reset();
sw.Start();
sum = 0;
for (int i = 0; i < 10000000; ++i)
{
    sum += MonoBehaviourUpdate(i, i); // calls into the C++ plugin
}
long cppTime = sw.ElapsedMilliseconds;
```
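For C# to find MonoBehaviourUpdate by name, the symbol has to be exported from the plugin and must not be C++ name-mangled. The DLLEXPORT macro from the earlier parts isn’t repeated in this article, so the definition below is an assumption of what such a macro typically looks like: extern "C" to turn off name mangling, plus the compiler-specific attribute that exports the symbol.

```cpp
#include <cstdint>

// Assumed DLLEXPORT definition (not necessarily the series' exact macro)
#if defined(_WIN32)
    // MSVC and compatible compilers: export from the DLL
    #define DLLEXPORT extern "C" __declspec(dllexport)
#else
    // GCC/Clang (Android, iOS, macOS, Linux): keep the symbol visible
    #define DLLEXPORT extern "C" __attribute__((visibility("default")))
#endif

// The benchmark's C++ side: a plain integer adder with C linkage
DLLEXPORT int32_t MonoBehaviourUpdate(int32_t a, int32_t b)
{
    return a + b;
}
```

Because of extern "C", the exported symbol is plain MonoBehaviourUpdate, which is what C# looks up when binding the function.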
I ran this test on the same device and got these results:
| Language | Time (ms) |
|---|---|
| C# | 0 |
| C++ | 165 |
It’s likely that the C# Add call was inlined and, since sum is never used afterward, the loop may have been optimized away entirely, hence the measurement of zero milliseconds. This is unfortunate, as it’s not the normal case: C# code does not execute instantaneously. Still, we can analyze the C++ side.
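A zero-millisecond reading usually means the optimizer saw through the benchmark. One common way to keep a micro-benchmark honest is to call through a function pointer the compiler can’t resolve (here, one read through volatile on every iteration) and to return the accumulated result so it can’t be discarded. This is a generic C++ sketch of that technique, not a change to the article’s C# harness; TimeCalls and AddInts are hypothetical names, and steady_clock plays the role of Stopwatch.

```cpp
#include <chrono>
#include <cstdint>

struct TimingResult
{
    int64_t sum;      // returned to the caller, so the loop's work is observable
    double elapsedMs; // wall-clock time for all the calls
};

int32_t AddInts(int32_t a, int32_t b)
{
    return a + b;
}

TimingResult TimeCalls(int32_t iterations)
{
    // Reading the pointer through volatile on each call means the compiler
    // can't assume the callee is AddInts, so it can't inline the call.
    int32_t (*volatile fn)(int32_t, int32_t) = &AddInts;

    int64_t sum = 0; // 64-bit so 10 million iterations don't overflow
    auto start = std::chrono::steady_clock::now();
    for (int32_t i = 0; i < iterations; ++i)
    {
        sum += fn(i, i);
    }
    auto end = std::chrono::steady_clock::now();

    std::chrono::duration<double, std::milli> elapsed = end - start;
    return TimingResult{sum, elapsed.count()};
}
```

The same idea applies on the C# side: make the call target opaque to the compiler and actually use the result, for example by logging sum after the loop.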
10 million calls from C# into C++ took 165 milliseconds. That’s several times (about 4.6x) faster than the Unity API calls because much less work was done: there’s no object handle lookup and no Unity engine work, for example. It’s also fast in terms of calls per millisecond: we’re able to make 60,606 calls in a single millisecond. Even in a very complex scene full of scripts running Update or FixedUpdate, we’re not likely to make more than 1,000 calls per frame. That should take only about 0.017 milliseconds, roughly a sixtieth of a millisecond, which is probably negligible compared to the actual work those scripts are doing.
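The per-frame estimate is proportional arithmetic: divide the measured time by the number of measured calls, then multiply by the expected calls per frame. A sketch, assuming cost scales linearly with call count (the function name is ours):

```cpp
#include <cassert>

// Estimated time for `callsPerFrame` calls, scaled linearly from a
// measurement in which `measuredCalls` calls took `measuredMs` in total.
double EstimatedFrameCostMs(double measuredMs, double measuredCalls, double callsPerFrame)
{
    assert(measuredCalls > 0.0);
    const double msPerCall = measuredMs / measuredCalls;
    return msPerCall * callsPerFrame;
}
```

For the C#-to-C++ measurement, EstimatedFrameCostMs(165.0, 10000000.0, 1000.0) is about 0.0165 ms, roughly 0.1% of a 16.7 ms frame at 60 FPS. The same 1,000 calls in the C++-to-Unity-API direction, EstimatedFrameCostMs(761.0, 10000000.0, 1000.0), come to about 0.076 ms.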
So it seems that the overhead of C#/C++ communication probably isn’t going to pose a performance issue for most games. On the positive side, using C++ for game code will probably result in a game-wide performance boost, since we cut out the garbage collector and the overhead of IL2CPP-generated code and gain access to CPU features like SIMD. Whether C++ is a net performance gain or loss depends on the individual game, but it’s not likely to be a huge loss.
With this confirmation that we’re on the right performance track, we can continue the series next week by making our programming lives easier and eliminating the boilerplate we currently write every time we want to expose a new Unity API function.
#1 by Ed Earl on July 31st, 2017
Good stuff. Would be interesting to try this on iOS, too, since the Unity manual says: “Managed-to-unmanaged calls are quite processor intensive on iOS. Try to avoid calling multiple native methods per frame.”
https://docs.unity3d.com/Manual/PluginsForIOS.html
#2 by jackson on July 31st, 2017
I think that tip is out of date, but a performance test to verify is a good idea. Thanks!