IL2CPP Output for C# 7.3: ref Return Values and Local Variables
Today we continue the series by looking at a pair of powerful, related features in C# 7.3: ref return values and local variables. These enable some great optimizations, so let’s look at the IL2CPP output for them to make sure it’s as good as it looks.
Simple Local ref Variables
Let’s start off the ref enhancements with local variables that are references to other local variables.
static class TestClass
{
    static int TestRefLocal(int x, int y)
    {
        ref int r = ref x;
        r = 30;
        r = ref y;
        r = 40;
        return x + y;
    }
}
This code creates r as a reference to x, not a copy of it. To do this, we declare r as ref int instead of just int and assign it to ref x instead of just x. Omitting either ref is a compiler error, so there’s no need to worry about accidentally making a copy.
We can then assign to the ref local variable, which changes the value it’s a reference to. In this case we start by assigning 30 to r, which refers to x, so x is effectively changed to 30.
Then we reassign r to reference another local variable, y. To do this we simply assign it to ref y. There’s no need to add ref before the r this time and doing so is a compiler error. After this we once again assign to r, but this time it changes y because r now refers to that variable.
Now let’s see the C++ that IL2CPP generates for this in Unity 2018.3.0f2:
extern "C" IL2CPP_METHOD_ATTR int32_t TestClass_TestRefLocal_m1B99E0AA6BD471E95C4DF28A0918787003C3D23A (int32_t ___x0, int32_t ___y1, const RuntimeMethod* method)
{
    {
        *((int32_t*)(&___x0)) = (int32_t)((int32_t)30);
        *((int32_t*)(&___y1)) = (int32_t)((int32_t)40);
        int32_t L_0 = ___x0;
        int32_t L_1 = ___y1;
        return ((int32_t)il2cpp_codegen_add((int32_t)L_0, (int32_t)L_1));
    }
}
The first two lines are rather complex, so let’s unpack them one bit at a time. On the right side of the assignment we see the int literals 30 and 40 cast to int32_t twice. Since int32_t is the same as int on current platforms, this effectively does nothing.
We read the left side of the assignment from right to left. First, we see &___x0 and &___y1. These get the address of the x and y parameters. Then we see (int32_t*) to cast them to pointers, which is unnecessary because they’re both int32_t so the address of them is already an int32_t* pointer. Finally, we see a * that dereferences the pointers so the assignment is to the memory location they point to.
These could have been simplified to just ___x0 = 30 and ___y1 = 40, but it looks like IL2CPP is preserving some of the ref semantics from C#.
At the end of the function is a call to il2cpp_codegen_add to effectively add together two (unnecessary) copies of the parameters.
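For comparison, here is a hand-written sketch of the same logic. It is not IL2CPP output, and the names TestRefLocalWithPointer and TestRefLocalSimplified are made up for illustration. The first version models the C# ref local as a raw pointer, which is a natural fit because a pointer can be re-pointed the way r is reassigned with r = ref y (a C++ reference cannot be rebound). The second version is the simplification described above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical hand-written equivalent of TestRefLocal; not generated by IL2CPP.
// The C# ref local `r` is modeled as a pointer so that it can be re-pointed.
int32_t TestRefLocalWithPointer(int32_t x, int32_t y)
{
    int32_t* r = &x;  // ref int r = ref x;
    *r = 30;          // r = 30;
    assert(x == 30);  // the write went through the pointer to x
    r = &y;           // r = ref y;
    *r = 40;          // r = 40;
    assert(y == 40);  // now the write went through to y
    return x + y;
}

// The same logic with the indirection removed by hand.
int32_t TestRefLocalSimplified(int32_t x, int32_t y)
{
    x = 30;
    y = 40;
    return x + y;
}
```

Both versions overwrite their arguments before reading them, so the result never depends on the inputs. That is the property a C++ optimizer can exploit.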
Now let’s look at the assembly that Xcode 10.1 generates for this C++ in a 32-bit ARM iOS release build. There’s no need to understand it deeply as I’ll explain after each snippet of assembly code in this article. This particular example is quite simple:
movs r0, #70
bx lr
This is a great result for this function! The compiler has correctly realized that the return value is a constant 70, so all it does is return that.
Conclusion: Simple ref local variables result in sub-optimal C++ but the C++ compiler can fix this and emit great machine code.
Local ref Variables From the Ternary Operator
Next, let’s add some complication and conditionally assign to a ref local variable:
static class TestClass
{
    static int TestRefLocalFromTernary(int x, int y)
    {
        ref int r = ref (x > y ? ref x : ref y);
        r = -1;
        return x + y;
    }
}
Here we again make a ref int r, but this time we assign to it based on the result of a ternary operator expression. This adds even more ref keywords to the line since we now must specify ref before the expression as well as ref x, ref y, and ref int r. Omitting any of these will cause a compiler error. After that we can use the ref local just like before by assigning to it.
Now let’s check the C++ for this:
extern "C" IL2CPP_METHOD_ATTR int32_t TestClass_TestRefLocalFromTernary_m99B93CDACECA931AC66C6274AF62995FD2BA83BE (int32_t ___x0, int32_t ___y1, const RuntimeMethod* method)
{
    int32_t* G_B3_0 = NULL;
    {
        int32_t L_0 = ___x0;
        int32_t L_1 = ___y1;
        if ((((int32_t)L_0) > ((int32_t)L_1)))
        {
            goto IL_0008;
        }
    }
    {
        G_B3_0 = (&___y1);
        goto IL_000a;
    }
IL_0008:
    {
        G_B3_0 = (&___x0);
    }
IL_000a:
    {
        *((int32_t*)G_B3_0) = (int32_t)(-1);
        int32_t L_2 = ___x0;
        int32_t L_3 = ___y1;
        return ((int32_t)il2cpp_codegen_add((int32_t)L_2, (int32_t)L_3));
    }
}
This generated a fair bit more code but, goto statements aside, it’s pretty straightforward. At the beginning we see an int32_t* G_B3_0 pointer that’ll come into play shortly. First, copies of the parameters are made and compared. If x is greater than y then we goto a block where G_B3_0 is assigned the address of x. Otherwise, G_B3_0 is assigned the address of y.
At this point we effectively have G_B3_0 as r, so we can proceed to use it. That occurs in the final block of the function which starts with another complex line. On the right we see the literal -1 which is needlessly cast to int32_t. On the left we see G_B3_0/r needlessly cast from int32_t* to int32_t* before being dereferenced with *. So this effectively assigns -1 to the memory that r points to. Then the function ends with the same verbose addition as in the first example.
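Stripped of the goto statements and redundant casts, the generated function boils down to something like the following hand-written sketch. The name TestRefLocalFromTernaryByHand is hypothetical and this is not what IL2CPP emits; it only shows the pointer selection that G_B3_0 performs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical hand-written equivalent of TestRefLocalFromTernary.
int32_t TestRefLocalFromTernaryByHand(int32_t x, int32_t y)
{
    // ref int r = ref (x > y ? ref x : ref y);
    int32_t* r = (x > y) ? &x : &y;
    assert(r == &x || r == &y); // r always aliases one of the two parameters
    *r = -1;                    // r = -1;
    return x + y;
}
```

Because the pointer's target is only known at run time, both parameters need addresses, which means they have to live in memory rather than only in registers.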
Let’s move on and see what this compiles to:
sub sp, #8
mov r2, sp
add r3, sp, #4
strd r1, r0, [sp]
cmp r0, r1
it le
movle r3, r2
mov.w r0, #-1
str r0, [r3]
ldrd r1, r0, [sp]
add r0, r1
add sp, #8
bx lr
This is a fairly literal translation of the C++ with only one nice change. The ternary operator (?:) in C# that became an if plus goto in C++ has been transformed again to a conditional move of either ref x or ref y to r. All of the pointless blocks, redundant casting, and unconditional goto statements have been kindly removed by the C++ compiler.
Conclusion: Assigning to a ref local from a ternary operator expression generates straightforward machine code with only one conditional move.
Returning ref Values
Now let’s try returning a ref value from a function:
static class TestClass
{
    static ref int TestReturnRef(int[] a)
    {
        return ref a[0];
    }
    static int TestCallReturnRef(int[] a)
    {
        ref int r = ref TestReturnRef(a);
        r = 10;
        return r;
    }
}
TestReturnRef returns a reference to the first element of the a managed array rather than a copy of it.
TestCallReturnRef creates a local ref variable and assigns it from ref TestReturnRef(a) instead of a reference to a local variable or parameter as we’ve seen before.
Let’s see what C++ is generated for TestReturnRef:
extern "C" IL2CPP_METHOD_ATTR int32_t* TestClass_TestReturnRef_mD5D5F618F5DF6DCF2B0F2655901BF24670268F77 (Int32U5BU5D_t2B9E4FDDDB9F0A00EC0AC631BA2DA915EB1ECF83* ___a0, const RuntimeMethod* method)
{
    {
        Int32U5BU5D_t2B9E4FDDDB9F0A00EC0AC631BA2DA915EB1ECF83* L_0 = ___a0;
        NullCheck(L_0);
        return (int32_t*)(((L_0)->GetAddressAt(static_cast<il2cpp_array_size_t>(0))));
    }
}
This just has two effective elements: a null check and a call to an accessor that gets a pointer to the element at a given index. The former is simple, but the latter includes some unnecessary casting so it’s a bit harder to read. Let’s look at GetAddressAt:
inline int32_t* GetAddressAt(il2cpp_array_size_t index)
{
    IL2CPP_ARRAY_BOUNDS_CHECK(index, (uint32_t)(this)->max_length);
    return m_Items + index;
}
This is just a simple offset from the pointer to the first element (m_Items) to the element at the given index: index. It includes a bounds-check though, so let’s look at that:
// Performance optimization as detailed here: http://blogs.msdn.com/b/clrcodegeneration/archive/2009/08/13/array-bounds-check-elimination-in-the-clr.aspx
// Since array size is a signed int32_t, a single unsigned check can be performed to determine if index is less than array size.
// Negative indices will map to a unsigned number greater than or equal to 2^31 which is larger than allowed for a valid array.
#define IL2CPP_ARRAY_BOUNDS_CHECK(index, length) \
    do { \
        if (((uint32_t)(index)) >= ((uint32_t)length)) il2cpp::vm::Exception::Raise (il2cpp::vm::Exception::GetIndexOutOfRangeException()); \
    } while (0)
As the comment notes, this macro just does the one if check and throws an exception if the index is out of bounds. Note that the do-while loop doesn’t really loop since its condition is 0. It’s just the usual idiom for making a multi-statement macro behave like a single statement.
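To make the unsigned-comparison trick concrete, here is a small standalone sketch. IndexInBounds, SKETCH_BOUNDS_CHECK, and GetChecked are hypothetical names that are not part of IL2CPP, and std::out_of_range is only a stand-in for il2cpp::vm::Exception::Raise.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Single unsigned comparison: a negative index wraps to a value >= 2^31,
// which can never be less than a non-negative int32_t length.
inline bool IndexInBounds(int32_t index, int32_t length)
{
    assert(length >= 0); // array lengths are never negative
    return static_cast<uint32_t>(index) < static_cast<uint32_t>(length);
}

// Hypothetical macro in the same do { ... } while (0) shape as
// IL2CPP_ARRAY_BOUNDS_CHECK. The wrapper lets `SKETCH_BOUNDS_CHECK(i, n);`
// parse as one statement, even as the unbraced body of an if/else.
#define SKETCH_BOUNDS_CHECK(index, length) \
    do { \
        if (!IndexInBounds((index), (length))) \
            throw std::out_of_range("index out of range"); \
    } while (0)

// Checked element read, loosely analogous to GetAddressAt plus a dereference.
inline int32_t GetChecked(const int32_t* items, int32_t length, int32_t index)
{
    SKETCH_BOUNDS_CHECK(index, length);
    return items[index];
}
```

With a three-element array, GetChecked(data, 3, 2) reads the last element while GetChecked(data, 3, -1) and GetChecked(data, 3, 3) both throw, all from the one comparison.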
Here’s how TestReturnRef ends up looking in assembly:
    push {r4, r7, lr}
    add r7, sp, #4
    mov r4, r0
    cbnz r4, LBB40_2
    movs r0, #0
    bl __ZN6il2cpp2vm9Exception27RaiseNullReferenceExceptionEP19Il2CppSequencePoint
LBB40_2:
    ldr r0, [r4, #12]
    cbnz r0, LBB40_4
    bl __ZN6il2cpp2vm9Exception27GetIndexOutOfRangeExceptionEv
    movs r1, #0
    movs r2, #0
    bl __ZN6il2cpp2vm9Exception5RaiseEP15Il2CppExceptionP19Il2CppSequencePointP10MethodInfo
LBB40_4:
    add.w r0, r4, #16
    pop {r4, r7, pc}
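This is nearly all error-checking. First we see the null check and NullReferenceException, then we see the bounds check and IndexOutOfRangeException. Since the index is the constant 0, the bounds check has been reduced to testing that the array’s length is non-zero. Only at the very end is the actual work: one add instruction. To remove these checks, use the appropriate IL2CPP attributes, such as Il2CppSetOption with Option.NullChecks and Option.ArrayBoundsChecks set to false.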
Next, let’s see the C++ for TestCallReturnRef:
extern "C" IL2CPP_METHOD_ATTR int32_t TestClass_TestCallReturnRef_mD752CDE161171917868820ABCD46C50F4A021471 (Int32U5BU5D_t2B9E4FDDDB9F0A00EC0AC631BA2DA915EB1ECF83* ___a0, const RuntimeMethod* method)
{
    {
        Int32U5BU5D_t2B9E4FDDDB9F0A00EC0AC631BA2DA915EB1ECF83* L_0 = ___a0;
        int32_t* L_1 = TestClass_TestReturnRef_mD5D5F618F5DF6DCF2B0F2655901BF24670268F77(L_0, /*hidden argument*/NULL);
        int32_t* L_2 = L_1;
        *((int32_t*)L_2) = (int32_t)((int32_t)10);
        int32_t L_3 = *((int32_t*)L_2);
        return L_3;
    }
}
This calls the C++ function for TestReturnRef and stores the return value in a local int32_t* pointer. Then we see the usual overcomplicated assignment of 10 through the pointer. Finally, the pointer is dereferenced to get the return value.
Here’s what this looks like in the assembly output:
push {r7, lr}
mov r7, sp
bl _TestClass_TestReturnRef_mD5D5F618F5DF6DCF2B0F2655901BF24670268F77
movs r1, #10
str r1, [r0]
movs r0, #10
pop {r7, pc}
This calls TestReturnRef, which hasn’t been inlined. It writes 10 to the memory pointed to by the returned pointer and returns 10. This is a nice optimization that skips dereferencing the pointer since the result is already known to be 10.
Conclusion: Returning ref values is also implemented in a straightforward way but, as always, watch out for null- and bounds-checks with managed arrays.
Readonly ref
Closely related to returning ref values is returning ref readonly values. These indicate to the caller that they may not change the memory pointed to by the returned reference. Here's how it looks:
class TestClass
{
    static ref readonly int TestReturnReadonlyRef(int[] a)
    {
        return ref a[0];
    }
    static int TestCallReturnReadonlyRef(int[] a)
    {
        ref readonly int r = ref TestReturnReadonlyRef(a);
        return r;
    }
}
The changes here are to add readonly after the ref keyword in the function definition and call site. It can't be added before ref in either location. The caller also can't omit readonly as the function requires this. However, the caller can opt to use ref readonly with a non-readonly ref return value if they don't want to ever write through the reference. Note that this is not possible with ref parameters that've always been in the language.
Let's see how TestReturnReadonlyRef translates to C++:
extern "C" IL2CPP_METHOD_ATTR int32_t* TestClass_TestReturnReadonlyRef_mBA7C632040BADB6B0AE690DCDB463E0EF2F30F55 (Int32U5BU5D_t2B9E4FDDDB9F0A00EC0AC631BA2DA915EB1ECF83* ___a0, const RuntimeMethod* method)
{
    {
        Int32U5BU5D_t2B9E4FDDDB9F0A00EC0AC631BA2DA915EB1ECF83* L_0 = ___a0;
        NullCheck(L_0);
        return ((L_0)->GetAddressAt(static_cast<il2cpp_array_size_t>(0)));
    }
}
This is identical to TestReturnRef except that a redundant cast at the end has been omitted. Let's check to make sure the assembly is the same, too:
    push {r4, r7, lr}
    add r7, sp, #4
    mov r4, r0
    cbnz r4, LBB41_2
    movs r0, #0
    bl __ZN6il2cpp2vm9Exception27RaiseNullReferenceExceptionEP19Il2CppSequencePoint
LBB41_2:
    ldr r0, [r4, #12]
    cbnz r0, LBB41_4
    bl __ZN6il2cpp2vm9Exception27GetIndexOutOfRangeExceptionEv
    movs r1, #0
    movs r2, #0
    bl __ZN6il2cpp2vm9Exception5RaiseEP15Il2CppExceptionP19Il2CppSequencePointP10MethodInfo
LBB41_4:
    add.w r0, r4, #16
    pop {r4, r7, pc}
Yes, this is identical to the assembly for TestReturnRef so let's move on to the C++ for TestCallReturnReadonlyRef:
extern "C" IL2CPP_METHOD_ATTR int32_t TestClass_TestCallReturnReadonlyRef_m516535EE352B6440998A00B5D9F21E6480EDA215 (Int32U5BU5D_t2B9E4FDDDB9F0A00EC0AC631BA2DA915EB1ECF83* ___a0, const RuntimeMethod* method)
{
    {
        Int32U5BU5D_t2B9E4FDDDB9F0A00EC0AC631BA2DA915EB1ECF83* L_0 = ___a0;
        int32_t* L_1 = TestClass_TestReturnReadonlyRef_mBA7C632040BADB6B0AE690DCDB463E0EF2F30F55(L_0, /*hidden argument*/NULL);
        int32_t L_2 = *((int32_t*)L_1);
        return L_2;
    }
}
This function is slightly shorter since it doesn't include the r = 10; line because r is now readonly and that would result in a C# compiler error. Let's see what impact that has on the assembly:
push {r7, lr}
mov r7, sp
bl _TestClass_TestReturnReadonlyRef_mBA7C632040BADB6B0AE690DCDB463E0EF2F30F55
ldr r0, [r0]
pop {r7, pc}
This calls TestReturnReadonlyRef and then dereferences the pointer to get the return value. The dereferencing is necessary here because, unlike in TestCallReturnRef, we hadn't just assigned 10 to r so the compiler doesn't already know the value.
Conclusion: Returning ref readonly or declaring local ref variables readonly provides some nice error-checking against inadvertently changing the memory pointed to by ref variables while not incurring any runtime overhead.
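A rough C++ analogy for this compile-time-only protection is a pointer to const. The sketch below is hand-written with hypothetical names (FirstElementReadonly, ReadFirst). Note that the IL2CPP listing above returns a plain int32_t*, because the C# compiler has already rejected any write through r before IL2CPP runs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical C++ analogue of a ref readonly return: a pointer to const.
inline const int32_t* FirstElementReadonly(const int32_t* items, int32_t length)
{
    assert(items != nullptr && length > 0); // stand-in for the null and bounds checks
    return items;
}

// Analogue of TestCallReturnReadonlyRef: read through the pointer, never write.
inline int32_t ReadFirst(const int32_t* items, int32_t length)
{
    const int32_t* r = FirstElementReadonly(items, length);
    // *r = 10;  // would not compile: r points to const
    return *r;
}
```

As with readonly in C#, the const here only restricts what the source code may do. The returned address is the same one a non-const version would return.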
Storing ref Returns By Value
In today's final example, let's try omitting the ref keyword in the local variable that stores the return value of a function returning ref.
static class TestClass
{
    static int TestCallReturnRefByValue(int[] a)
    {
        int r = TestReturnRef(a);
        return r;
    }
}
Here's the IL2CPP output:
extern "C" IL2CPP_METHOD_ATTR int32_t TestClass_TestCallReturnRefByValue_m79A8BC0DCD1104C33F93B2BE5B24B9DBFD926B18 (Int32U5BU5D_t2B9E4FDDDB9F0A00EC0AC631BA2DA915EB1ECF83* ___a0, const RuntimeMethod* method)
{
    {
        Int32U5BU5D_t2B9E4FDDDB9F0A00EC0AC631BA2DA915EB1ECF83* L_0 = ___a0;
        int32_t* L_1 = TestClass_TestReturnRef_mD5D5F618F5DF6DCF2B0F2655901BF24670268F77(L_0, /*hidden argument*/NULL);
        int32_t L_2 = *((int32_t*)L_1);
        return L_2;
    }
}
This looks just like the C++ for TestCallReturnReadonlyRef. The return value is still stored in an int32_t* and then dereferenced into just an int32_t. This has simply been done for us as syntactic sugar for writing ref int r = ref TestReturnRef(a); then int r2 = r;. Since this C++ is the same, the assembly should be the same too:
push {r7, lr}
mov r7, sp
bl _TestClass_TestReturnRef_mD5D5F618F5DF6DCF2B0F2655901BF24670268F77
ldr r0, [r0]
pop {r7, pc}
Indeed, the generated assembly is identical.
Conclusion: Storing the return value of a function that returns ref into a non-ref local variable is convenient, if error-prone, syntactic sugar for storing it in a ref variable and then dereferencing it into a non-ref variable.
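The error-prone part is that the copy silently stops aliasing the original. Here is a hand-written sketch of the difference using plain pointers. FirstElement, WriteToCopy, and WriteThroughRef are hypothetical names, and the managed-array checks are replaced by a simple assert.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for TestReturnRef without the managed-array checks.
inline int32_t* FirstElement(int32_t* items)
{
    assert(items != nullptr);
    return items;
}

// Like `int r = TestReturnRef(a);` the pointer is dereferenced immediately,
// so later writes to r change only the local copy.
inline int32_t WriteToCopy(int32_t* items)
{
    int32_t r = *FirstElement(items);
    r = 10;
    return r;
}

// Like `ref int r = ref TestReturnRef(a);` writes go through to the array.
inline void WriteThroughRef(int32_t* items)
{
    int32_t* r = FirstElement(items);
    *r = 10;
}
```

Calling WriteToCopy leaves the array untouched, while WriteThroughRef changes its first element. In C# the only visible difference between the two is the pair of ref keywords.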
Conclusion
ref local variables and return values, including ref readonly, are implemented well without any nasty overhead such as method initialization. The end-result assembly is close to optimal in every example. This is great because, unlike features like tuples and pattern matching, this truly adds new capability to the language. We now have the ability to write faster code without resorting to unsafe language features like pointers.
As an example of this, consider the following code:
class World
{
    public int ExpensiveGet(int id) { /* ... */ }
    public void ExpensiveSet(int id, int value) { /* ... */ }
}
int value = world.ExpensiveGet(myId);
world.ExpensiveSet(myId, value + 1);
This code simply wanted to increment some value that could only be found through an expensive (i.e. slow) process. What happens is that the slow process is conducted and then the results of that search are lost and only the value is returned. In order to then set the value, the whole expensive search must be conducted again just to get back to the same point.
Alternatives in previous versions of C# have revolved around passing in a delegate:
class World
{
    public void ExpensiveChange(int id, Func<int, int> f) { /* ... */ }
}
world.ExpensiveChange(myId, val => val + 1);
This conducts the expensive search and, once it's found what it's looking for, calls a delegate with the found value and uses the delegate's return value to set the new value. This avoids two expensive searches, but incurs all the cost of delegates.
Now with C# 7.3 we can make use of ref returns and ref local variables to avoid both the cost of double searches and that of delegates:
class World
{
    public ref int ExpensiveGet(int id) { /* ... */ }
}
ref int r = ref world.ExpensiveGet(myId);
r++;
And, from what we've seen in today's C++ and assembly analysis, we can rest assured that the machine code the CPU actually executes for this will be quite fast.
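To see the saving in numbers, here is a C++ sketch of the same pattern in which a reference return plays the role of the C# ref return. Everything in it is hypothetical: the Entry layout, the linear scan standing in for the expensive search, the searches counter, and the ExpensiveGetRef and IncrementDemo names.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical World; the linear scan stands in for the expensive search
// and `searches` counts how many times it runs.
class World
{
public:
    struct Entry { int id; int value; };

    explicit World(std::vector<Entry> entries) : entries_(std::move(entries)) {}

    int ExpensiveGet(int id) { return Find(id).value; }
    void ExpensiveSet(int id, int value) { Find(id).value = value; }

    // Analogue of `public ref int ExpensiveGet(int id)`.
    int& ExpensiveGetRef(int id) { return Find(id).value; }

    int searches = 0;

private:
    Entry& Find(int id)
    {
        ++searches;
        for (Entry& e : entries_)
        {
            if (e.id == id)
                return e;
        }
        throw std::out_of_range("id not found");
    }

    std::vector<Entry> entries_;
};

// One search with the reference, two with get + set.
inline void IncrementDemo()
{
    std::vector<World::Entry> entries = {{1, 10}, {2, 20}};
    World world(entries);

    int& r = world.ExpensiveGetRef(2);
    ++r;
    assert(world.searches == 1);

    world.ExpensiveSet(2, world.ExpensiveGet(2) + 1);
    assert(world.searches == 3);
    assert(r == 22);
}
```

The reference stays valid here only because the vector is never resized while r is in use. C# ref returns into managed arrays don't have that particular hazard, so treat this as an illustration of the search count rather than of lifetime rules.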