JacksonDunstan.com

Unity 2018.3 officially launched last Thursday and with it comes support for the very latest version of C#: 7.3. This includes four new versions—7.0, 7.1, 7.2, and 7.3—so it’s a big upgrade from the C# 6 that we’ve had since 2018.1. Today we’ll begin an article series to learn what happens when we use some of the new features with IL2CPP. We’ll look at the C++ it outputs and even what the C++ compiles to so we know what the CPU will end up executing. Specifically, we’ll focus on the new tuples feature and talk about creating, naming, deconstructing, and comparing them.

Creating Simple Tuples

Let’s start out by creating a simple tuple with the (1, 2) syntax. Since 1 and 2 have the int type, this tuple has the (int, int) type.

static class TestClass
{
    static (int, int) TestCreateTuple()
    {
        return (1, 2);
    }
}

Now let’s see the C++ that IL2CPP outputs with Unity 2018.3.0f2:

extern "C" IL2CPP_METHOD_ATTR ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186  TestClass_TestCreateTuple_mB7F9180F25D30B4F025CB7FB81F5FE9A3EECC606 (const RuntimeMethod* method)
{
    static bool s_Il2CppMethodInitialized;
    if (!s_Il2CppMethodInitialized)
    {
        il2cpp_codegen_initialize_method (TestClass_TestCreateTuple_mB7F9180F25D30B4F025CB7FB81F5FE9A3EECC606_MetadataUsageId);
        s_Il2CppMethodInitialized = true;
    }
    {
        ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186  L_0;
        memset(&L_0, 0, sizeof(L_0));
        ValueTuple_2__ctor_mCDA3078E87F827C6490EBD90430507642CECC6BF((&L_0), 1, 2, /*hidden argument*/ValueTuple_2__ctor_mCDA3078E87F827C6490EBD90430507642CECC6BF_RuntimeMethod_var);
        return L_0;
    }
}

The signature of the function shows the tuple type that’s been created for us: ValueTuple_2_t.... This is essentially equivalent to ValueTuple<int, int> in C#.

The body of the function starts with the usual method initialization overhead that’s generated any time a generic constructor is called. The first time through, il2cpp_codegen_initialize_method is called, but after that we only pay for checking the s_Il2CppMethodInitialized flag.
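To make that cost model concrete, here’s a minimal C++ sketch of the same guard pattern. The names initialize_method_metadata, create_sum, and init_call_count are made up for illustration and aren’t part of IL2CPP. The real il2cpp_codegen_initialize_method presumably does far more work than bumping a counter.

```cpp
#include <cassert>

// Hypothetical stand-in for the runtime's one-time metadata setup.
static int g_init_calls = 0;
static void initialize_method_metadata() { ++g_init_calls; }

// Mirrors the shape of the generated prologue: a function-local static flag
// guards the expensive call, so later calls only pay for the flag check.
int create_sum()
{
    static bool s_initialized = false;
    if (!s_initialized)
    {
        initialize_method_metadata();
        s_initialized = true;
    }
    return 1 + 2;
}

// How many times the one-time setup has actually run.
int init_call_count() { return g_init_calls; }

// Usage check: three calls trigger exactly one initialization.
void check_single_initialization()
{
    create_sum();
    create_sum();
    create_sum();
    assert(init_call_count() == 1);
}
```

No matter how many times create_sum is called, the setup function runs once. Every call still pays for the load and compare of the flag, which is the overhead we’ll see in the assembly.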

After that, we see the actual work of the function. The ValueTuple_2_t... variable is declared to hold the return type, cleared to all zeroes with memset, the “constructor” ValueTuple_2__ctor_m... is called with the 1 and 2 values we want to store in the tuple, and finally the variable is returned. Note that the “constructor” isn’t a real C++ constructor, but rather a global function whose purpose is to construct an object.

Let’s take a look at the ValueTuple_2_t... type to see what was generated for us:

struct  ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186 
{
public:
    // T1 System.ValueTuple`2::Item1
    int32_t ___Item1_0;
    // T2 System.ValueTuple`2::Item2
    int32_t ___Item2_1;
 
public:
    inline static int32_t get_offset_of_Item1_0() { return static_cast<int32_t>(offsetof(ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186, ___Item1_0)); }
    inline int32_t get_Item1_0() const { return ___Item1_0; }
    inline int32_t* get_address_of_Item1_0() { return &___Item1_0; }
    inline void set_Item1_0(int32_t value)
    {
        ___Item1_0 = value;
    }
 
    inline static int32_t get_offset_of_Item2_1() { return static_cast<int32_t>(offsetof(ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186, ___Item2_1)); }
    inline int32_t get_Item2_1() const { return ___Item2_1; }
    inline int32_t* get_address_of_Item2_1() { return &___Item2_1; }
    inline void set_Item2_1(int32_t value)
    {
        ___Item2_1 = value;
    }
};

This type just contains the two 32-bit integers we’d expect. There are no other fields or base classes, so this is truly optimal.
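One way to check a claim like this is with compile-time assertions. The sketch below uses a hypothetical IntPair struct as a stand-in for the generated ValueTuple_2_t... type. It isn’t the generated code itself, but it has the same two fields in the same order.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for the generated tuple struct.
struct IntPair
{
    int32_t item1;
    int32_t item2;
};

// Two 4-byte fields with no padding, vtable pointer, or base class data.
static_assert(sizeof(IntPair) == 2 * sizeof(int32_t), "no hidden fields or padding");
static_assert(offsetof(IntPair, item1) == 0, "item1 sits at the start");
static_assert(offsetof(IntPair, item2) == sizeof(int32_t), "item2 follows immediately");
static_assert(std::is_trivially_copyable<IntPair>::value, "plain data, safe to memset and memcpy");
static_assert(std::is_standard_layout<IntPair>::value, "C-compatible layout");
```

If any of these assertions failed, the file wouldn’t compile, so there’s no runtime cost to keeping checks like this around.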

Now let’s look at the constructor for this type:

inline void ValueTuple_2__ctor_mCDA3078E87F827C6490EBD90430507642CECC6BF (ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186 * __this, int32_t p0, int32_t p1, const RuntimeMethod* method)
{
    ((  void (*) (ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186 *, int32_t, int32_t, const RuntimeMethod*))ValueTuple_2__ctor_mCDA3078E87F827C6490EBD90430507642CECC6BF_gshared)(__this, p0, p1, method);
}

The first part is a big, ugly cast of a global function, ValueTuple_2__ctor_m..._gshared, to a function pointer. Then the function pointer is called with the two int parameters: 1 and 2. A pointer to the instance (__this) and runtime information for the method come along for the ride.

Let’s see what the constructor looks like:

extern "C" IL2CPP_METHOD_ATTR void ValueTuple_2__ctor_mCDA3078E87F827C6490EBD90430507642CECC6BF_gshared (ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186 * __this, int32_t ___item10, int32_t ___item21, const RuntimeMethod* method)
{
    {
        int32_t L_0 = ___item10;
        __this->set_Item1_0(L_0);
        int32_t L_1 = ___item21;
        __this->set_Item2_1(L_1);
        return;
    }
}

As expected, the constructor simply uses the accessor functions to set Item1 and Item2. These are the names of the fields in a ValueTuple<T1, T2> in C# and they carry over to C++.
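Putting the pieces together, here’s a stripped-down sketch of the whole creation path. IntPair, IntPair_ctor_shared, IntPair_ctor, and create_pair are hypothetical names, and the RuntimeMethod parameter is left out to keep the example small, so treat this as an illustration of the shape rather than what IL2CPP emits.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the generated tuple struct.
struct IntPair
{
    int32_t item1;
    int32_t item2;
};

// The "constructor": an ordinary free function that receives the instance
// explicitly instead of being a real C++ constructor.
void IntPair_ctor_shared(IntPair* self, int32_t a, int32_t b)
{
    self->item1 = a;
    self->item2 = b;
}

// Thin inline wrapper that casts the shared function to the exact pointer
// type and forwards the arguments.
inline void IntPair_ctor(IntPair* self, int32_t a, int32_t b)
{
    using CtorFn = void (*)(IntPair*, int32_t, int32_t);
    reinterpret_cast<CtorFn>(&IntPair_ctor_shared)(self, a, b);
}

// Same sequence as the generated function: declare, zero, construct, return.
IntPair create_pair()
{
    IntPair p;
    std::memset(&p, 0, sizeof(p));
    IntPair_ctor(&p, 1, 2);
    assert(p.item1 == 1 && p.item2 == 2);
    return p;
}
```

The memset is redundant here since both fields are overwritten right away, which is the kind of work we’d hope the C++ compiler removes.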

Finally, let’s look at what the function ends up compiling to with a release build for iOS in Xcode 9.4.1. The listing is 32-bit ARM (Thumb-2) assembly, as the r0 through r7 register names show, but there’s no need to have a deep understanding of ARM assembly to read it.

    push    {r4, r5, r7, lr}
    add     r7, sp, #8
    movw    r5, :lower16:(__ZZ67TestClass_TestCreateTuple_mB7F9180F25D30B4F025CB7FB81F5FE9A3EECC606E25s_Il2CppMethodInitialized-(LPC10_0+4))
    mov     r4, r0
    movt    r5, :upper16:(__ZZ67TestClass_TestCreateTuple_mB7F9180F25D30B4F025CB7FB81F5FE9A3EECC606E25s_Il2CppMethodInitialized-(LPC10_0+4))
LPC10_0:
    add     r5, pc
    ldrb    r0, [r5]
    cbnz    r0, LBB10_2
    movw    r0, :lower16:(L_TestClass_TestCreateTuple_mB7F9180F25D30B4F025CB7FB81F5FE9A3EECC606_MetadataUsageId$non_lazy_ptr-(LPC10_1+4))
    movt    r0, :upper16:(L_TestClass_TestCreateTuple_mB7F9180F25D30B4F025CB7FB81F5FE9A3EECC606_MetadataUsageId$non_lazy_ptr-(LPC10_1+4))
LPC10_1:
    add     r0, pc
    ldr     r0, [r0]
    ldr     r0, [r0]
    bl      __ZN6il2cpp2vm13MetadataCache24InitializeMethodMetadataEj
    movs    r0, #1
    strb    r0, [r5]
LBB10_2:
    movw   r0, :lower16:(L_ValueTuple_2__ctor_mCDA3078E87F827C6490EBD90430507642CECC6BF_RuntimeMethod_var$non_lazy_ptr-(LPC10_2+4))
    movs   r1, #0
    movt   r0, :upper16:(L_ValueTuple_2__ctor_mCDA3078E87F827C6490EBD90430507642CECC6BF_RuntimeMethod_var$non_lazy_ptr-(LPC10_2+4))
    str    r1, [r4, #4]
LPC10_2:
    add    r0, pc
    str    r1, [r4]
    movs   r1, #1
    movs   r2, #2
    ldr    r0, [r0]
    ldr    r3, [r0]
    mov    r0, r4
    bl     _ValueTuple_2__ctor_mCDA3078E87F827C6490EBD90430507642CECC6BF_gshared
    pop    {r4, r5, r7, pc}

This begins with a lot of code for method initialization and then we see the 1 and 2 literals being passed to the constructor function. Since that wasn’t inlined, let’s go take a look at it:

str   r2, [r0, #4]
str   r1, [r0]
bx    lr

All this does is set the Item1 and Item2 fields based on the parameters. The accessor functions have been inlined, so this is now minimal.

Conclusion: Simple tuple types are minimal, containing only the necessary fields. Unfortunately, creating them adds method initialization overhead to a function and involves a function call to set the fields.

Creating Named Tuples

Now let’s try creating a tuple with explicit names. Previously we had Item1 and Item2, but we’ll specify new names instead:

static class TestClass
{
    static int TestCreateTupleWithNames()
    {
        var t = (horizontal: 1, vertical: 2);
        return t.horizontal + t.vertical;
    }
}

In C#, we can access the fields of the tuple using the names we gave them during creation: horizontal and vertical, not Item1 and Item2. Let’s see how this carries over to C++:

extern "C" IL2CPP_METHOD_ATTR int32_t TestClass_TestCreateTupleWithNames_m02420FFEBC2762E1F03DF0C07B0B18A1865C91A1 (const RuntimeMethod* method)
{
    static bool s_Il2CppMethodInitialized;
    if (!s_Il2CppMethodInitialized)
    {
        il2cpp_codegen_initialize_method (TestClass_TestCreateTupleWithNames_m02420FFEBC2762E1F03DF0C07B0B18A1865C91A1_MetadataUsageId);
        s_Il2CppMethodInitialized = true;
    }
    ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186  V_0;
    memset(&V_0, 0, sizeof(V_0));
    {
        ValueTuple_2__ctor_mCDA3078E87F827C6490EBD90430507642CECC6BF((ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186 *)(&V_0), 1, 2, /*hidden argument*/ValueTuple_2__ctor_mCDA3078E87F827C6490EBD90430507642CECC6BF_RuntimeMethod_var);
        ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186  L_0 = V_0;
        int32_t L_1 = L_0.get_Item1_0();
        ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186  L_2 = V_0;
        int32_t L_3 = L_2.get_Item2_1();
        return ((int32_t)il2cpp_codegen_add((int32_t)L_1, (int32_t)L_3));
    }
}

Again, the function body begins with method initialization. After that, we see the same ValueTuple_2_t... type being used as a local variable. That means C++ is using the same ValueTuple<int, int> type as in the first example, even though its fields are named Item1 and Item2 instead of horizontal and vertical. Creation continues in the same way: memset to zero then call the constructor. The constructor called is also the same as in the first example.

Immediately afterward, we see that the calls to get the horizontal and vertical fields have been converted to use the accessors for Item1 and Item2. This makes it apparent that the names we give tuple fields in C# are only syntactic sugar and the real names used are always Item1, Item2, and so forth.

The function wraps up with the odd il2cpp_codegen_add call, which, as we’ve seen before, just expands to the + operator.

Now let’s look at the ARM assembly to see how the function is compiled:

    push    {r4, r7, lr}
    add     r7, sp, #4
    sub     sp, #8
    movw    r4, :lower16:(__ZZ76TestClass_TestCreateTupleWithNames_m02420FFEBC2762E1F03DF0C07B0B18A1865C91A1E25s_Il2CppMethodInitialized-(LPC11_0+4))
    movt    r4, :upper16:(__ZZ76TestClass_TestCreateTupleWithNames_m02420FFEBC2762E1F03DF0C07B0B18A1865C91A1E25s_Il2CppMethodInitialized-(LPC11_0+4))
LPC11_0:
    add     r4, pc
    ldrb    r0, [r4]
    cbnz    r0, LBB11_2
    movw    r0, :lower16:(L_TestClass_TestCreateTupleWithNames_m02420FFEBC2762E1F03DF0C07B0B18A1865C91A1_MetadataUsageId$non_lazy_ptr-(LPC11_1+4))
    movt    r0, :upper16:(L_TestClass_TestCreateTupleWithNames_m02420FFEBC2762E1F03DF0C07B0B18A1865C91A1_MetadataUsageId$non_lazy_ptr-(LPC11_1+4))
LPC11_1:
    add     r0, pc
    ldr     r0, [r0]
    ldr     r0, [r0]
    bl      __ZN6il2cpp2vm13MetadataCache24InitializeMethodMetadataEj
    movs    r0, #1
    strb    r0, [r4]
LBB11_2:
    movw    r0, :lower16:(L_ValueTuple_2__ctor_mCDA3078E87F827C6490EBD90430507642CECC6BF_RuntimeMethod_var$non_lazy_ptr-(LPC11_2+4))
    movs    r1, #1
    movt    r0, :upper16:(L_ValueTuple_2__ctor_mCDA3078E87F827C6490EBD90430507642CECC6BF_RuntimeMethod_var$non_lazy_ptr-(LPC11_2+4))
    movs    r2, #2
LPC11_2:
    add     r0, pc
    ldr     r0, [r0]
    ldr     r3, [r0]
    movs    r0, #0
    strd    r0, r0, [sp]
    mov     r0, sp
    bl      _ValueTuple_2__ctor_mCDA3078E87F827C6490EBD90430507642CECC6BF_gshared
    ldrd    r0, r1, [sp]
    add     r0, r1
    add     sp, #8
    pop     {r4, r7, pc}

This is very similar to the first example. We see the method initialization followed by the call to the constructor function, but this function ends by adding together the two fields. It’s unfortunate that the compiler produced such a literal translation here as there was no need to create the tuple in the first place or perform the addition. It could have just returned 3, or even better inlined all calls to this function with the literal 3.
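For comparison, here’s a small sketch of the folding we were hoping for. It uses a hypothetical IntPair struct and constexpr functions so that everything is visible to the C++ compiler. That visibility is an assumption that doesn’t hold for the generated code, where the constructor call wasn’t inlined, which is likely part of why the optimizer gave up.

```cpp
#include <cstdint>

// Hypothetical stand-in for the named tuple (horizontal: 1, vertical: 2).
struct IntPair
{
    int32_t horizontal;
    int32_t vertical;
};

// Everything is visible and constexpr, so the compiler can evaluate it.
constexpr IntPair make_pair_1_2() { return IntPair{1, 2}; }
constexpr int32_t sum(IntPair p) { return p.horizontal + p.vertical; }
constexpr int32_t test_sum() { return sum(make_pair_1_2()); }

// Proof that the whole computation collapses to a literal at compile time.
static_assert(test_sum() == 3, "folds to the literal 3");
```

This doesn’t show what IL2CPP could emit, only that the computation itself is trivially foldable once the compiler can see through the construction.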

Conclusion: Tuple field names are syntactic sugar. The same type and constructor are used regardless of field names. Using tuples in general sometimes defeats compiler optimizations.

Creating Tuples with Inferred Names

Next we’ll create a tuple from variables, which allows the C# compiler to infer the field names as the same as the variable names:

static class TestClass
{
    static int TestCreateTupleWithInferredNames(int x, int y)
    {
        var t = (x, y);
        return t.x + t.y;
    }
}

Now let’s look at the IL2CPP output:

extern "C" IL2CPP_METHOD_ATTR int32_t TestClass_TestCreateTupleWithInferredNames_mE7F8534AD9E551B268E05DEBB0306B3B72D73A46 (int32_t ___x0, int32_t ___y1, const RuntimeMethod* method)
{
    static bool s_Il2CppMethodInitialized;
    if (!s_Il2CppMethodInitialized)
    {
        il2cpp_codegen_initialize_method (TestClass_TestCreateTupleWithInferredNames_mE7F8534AD9E551B268E05DEBB0306B3B72D73A46_MetadataUsageId);
        s_Il2CppMethodInitialized = true;
    }
    ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186  V_0;
    memset(&V_0, 0, sizeof(V_0));
    {
        int32_t L_0 = ___x0;
        int32_t L_1 = ___y1;
        ValueTuple_2__ctor_mCDA3078E87F827C6490EBD90430507642CECC6BF((ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186 *)(&V_0), L_0, L_1, /*hidden argument*/ValueTuple_2__ctor_mCDA3078E87F827C6490EBD90430507642CECC6BF_RuntimeMethod_var);
        ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186  L_2 = V_0;
        int32_t L_3 = L_2.get_Item1_0();
        ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186  L_4 = V_0;
        int32_t L_5 = L_4.get_Item2_1();
        return ((int32_t)il2cpp_codegen_add((int32_t)L_3, (int32_t)L_5));
    }
}

This is essentially identical to the C++ that was generated when we explicitly gave names to the fields. That makes sense since the names were syntactic sugar anyhow.

Conclusion: Compiler-inferred tuple field names are syntactic sugar just like explicitly-provided tuple field names.

Deconstructing Simple Tuples

Now let’s start to “deconstruct” tuples. This is syntax to extract the tuple’s fields into local variables all in one line. Here’s how it looks:

static class TestClass
{
    static int TestDeconstructTuple()
    {
        (int x, int y) = TestCreateTuple();
        return x + y;
    }
}

Let’s see what kind of C++ is generated for this:

extern "C" IL2CPP_METHOD_ATTR int32_t TestClass_TestDeconstructTuple_mFD8EA59ED69A94BCCE1CAA0FBB579925C1C34FF1 (const RuntimeMethod* method)
{
    int32_t V_0 = 0;
    int32_t V_1 = 0;
    {
        ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186  L_0 = TestClass_TestCreateTuple_mB7F9180F25D30B4F025CB7FB81F5FE9A3EECC606(/*hidden argument*/NULL);
        ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186  L_1 = L_0;
        int32_t L_2 = L_1.get_Item1_0();
        V_0 = L_2;
        int32_t L_3 = L_1.get_Item2_1();
        V_1 = L_3;
        int32_t L_4 = V_0;
        int32_t L_5 = V_1;
        return ((int32_t)il2cpp_codegen_add((int32_t)L_4, (int32_t)L_5));
    }
}

This function doesn’t have method initialization overhead since it didn’t call a generic constructor. Instead, it just calls a function that happens to call a generic constructor, so presumably we’ll take the overhead there.

After the call to TestCreateTuple, we see the same accessors for Item1 and Item2 called to get the tuple’s fields. The return values are stored in local variables: L_2 and L_3. They also get copied, redundantly, to V_0 and V_1 before being finally added together.

So far it looks like deconstructing a tuple is just syntactic sugar for copying its fields one-by-one. Let’s look at the assembly the C++ compiler generates to see how this all boils down into what the CPU actually executes:

push   {r7, lr}
mov    r7, sp
sub    sp, #8
mov    r0, sp
bl     _TestClass_TestCreateTuple_mB7F9180F25D30B4F025CB7FB81F5FE9A3EECC606
ldrd   r0, r1, [sp]
add    r0, r1
add    sp, #8
pop    {r7, pc}

After the call to TestCreateTuple, the fields are loaded out of the tuple, added together, and returned.

This is another missed opportunity by the compiler because TestCreateTuple always returns (1, 2) so the return value of this function is known at compile time to be 3. The compiler still generated code that goes through the motions: create the tuple with method initialization overhead, get the fields, and add them together. This indicates that the complexity of tuples may defeat compiler optimizations like these when constant values are used in real game code.

Conclusion: Deconstructing a tuple is syntactic sugar for individually reading its fields.

Deconstructing Classes

Deconstructing also works with our own class types. All we have to do is provide a Deconstruct method that takes out parameters like this:

class DeconstructableClass
{
    public int X;
    public int Y;
 
    public void Deconstruct(out int x, out int y)
    {
        x = X;
        y = Y;
    }
}

Then we can place it on the right side of the = to deconstruct it:

static class TestClass
{
    static int TestDeconstructClass(DeconstructableClass dc)
    {
        (int x, int y) = dc;
        return x + y;
    }
}

Let’s see what happens when we do this:

extern "C" IL2CPP_METHOD_ATTR int32_t TestClass_TestDeconstructClass_m57AD9127FF5C070409FE98467A541E3FD141C478 (DeconstructableClass_t9072FF25F580F648CD2B2657234A18D85E353D12 * ___dc0, const RuntimeMethod* method)
{
    int32_t V_0 = 0;
    int32_t V_1 = 0;
    int32_t V_2 = 0;
    {
        DeconstructableClass_t9072FF25F580F648CD2B2657234A18D85E353D12 * L_0 = ___dc0;
        NullCheck(L_0);
        DeconstructableClass_Deconstruct_mABF5441DF3A7F5833AAA4F2D1948489E78E6706D(L_0, (int32_t*)(&V_1), (int32_t*)(&V_2), /*hidden argument*/NULL);
        int32_t L_1 = V_1;
        int32_t L_2 = V_2;
        V_0 = L_2;
        int32_t L_3 = V_0;
        return ((int32_t)il2cpp_codegen_add((int32_t)L_1, (int32_t)L_3));
    }
}

The generated C++ begins with a null check of the class and proceeds to call the DeconstructableClass.Deconstruct method we wrote with pointers to local variables as the out parameters. These are then, again redundantly, copied to other local variables before finally being added together.

Let’s look at the Deconstruct method to see what IL2CPP generated for it:

extern "C" IL2CPP_METHOD_ATTR void DeconstructableClass_Deconstruct_mABF5441DF3A7F5833AAA4F2D1948489E78E6706D (DeconstructableClass_t9072FF25F580F648CD2B2657234A18D85E353D12 * __this, int32_t* ___x0, int32_t* ___y1, const RuntimeMethod* method)
{
    {
        int32_t* L_0 = ___x0;
        int32_t L_1 = __this->get_X_0();
        *((int32_t*)L_0) = (int32_t)L_1;
        int32_t* L_2 = ___y1;
        int32_t L_3 = __this->get_Y_1();
        *((int32_t*)L_2) = (int32_t)L_3;
        return;
    }
}

All this does is call the accessors to get X and Y and then assign them through the out parameters, which were turned into pointers in C++.
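Stripped of the temporaries, the pattern boils down to something like the following sketch. Deconstructable and deconstruct_and_add are hypothetical names, and the assert only stands in for the generated NullCheck, which raises a NullReferenceException rather than aborting.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for DeconstructableClass.
struct Deconstructable
{
    int32_t X;
    int32_t Y;
};

// C# 'out' parameters become plain pointers that the callee writes through.
void Deconstruct(const Deconstructable* self, int32_t* x, int32_t* y)
{
    *x = self->X;
    *y = self->Y;
}

// Rough equivalent of: (int x, int y) = dc; return x + y;
int32_t deconstruct_and_add(const Deconstructable* dc)
{
    assert(dc != nullptr); // stands in for the generated NullCheck
    int32_t x = 0;
    int32_t y = 0;
    Deconstruct(dc, &x, &y);
    return x + y;
}
```

Since Deconstruct is tiny and visible to the caller, a C++ compiler is free to inline it and read the two fields directly.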

Finally, let’s look at the assembly for the test function to see how it all came together:

    push    {r4, r7, lr}
    add    r7, sp, #4
    mov    r4, r0
    cbnz    r4, LBB14_2
    movs    r0, #0
    bl    __ZN6il2cpp2vm9Exception27RaiseNullReferenceExceptionEP19Il2CppSequencePoint
LBB14_2:
    ldr    r0, [r4, #8]
    ldr    r1, [r4, #12]
    add    r0, r1
    pop    {r4, r7, pc}

The first part is the null check and the second part is the actual work of the function. The call to Deconstruct has been inlined and we now have just two reads to get the fields of the class. They’re then added together and returned.

Conclusion: Deconstruct methods provide an efficient and terse way to deconstruct classes.

Deconstructing Structs

If we can add a Deconstruct method to a class, surely we can do the same with a struct. Let’s try:

struct DeconstructableStruct
{
    public int X;
    public int Y;
 
    public void Deconstruct(out int x, out int y)
    {
        x = X;
        y = Y;
    }
}
 
static class TestClass
{
    static int TestDeconstructStruct(DeconstructableStruct ds)
    {
        (int x, int y) = ds;
        return x + y;
    }
}

This works, so let’s check the C++:

extern "C" IL2CPP_METHOD_ATTR int32_t TestClass_TestDeconstructStruct_m723954E47E070D42C482C0546A40BA24BA91D40D (DeconstructableStruct_t8DD49D68F229C042C4490DE678EA7018F17C80A4  ___ds0, const RuntimeMethod* method)
{
    int32_t V_0 = 0;
    DeconstructableStruct_t8DD49D68F229C042C4490DE678EA7018F17C80A4  V_1;
    memset(&V_1, 0, sizeof(V_1));
    int32_t V_2 = 0;
    int32_t V_3 = 0;
    {
        DeconstructableStruct_t8DD49D68F229C042C4490DE678EA7018F17C80A4  L_0 = ___ds0;
        V_1 = L_0;
        DeconstructableStruct_Deconstruct_m197F44C639A3201D98D99FB2CFD9FE4DEAEE5AFE((DeconstructableStruct_t8DD49D68F229C042C4490DE678EA7018F17C80A4 *)(&V_1), (int32_t*)(&V_2), (int32_t*)(&V_3), /*hidden argument*/NULL);
        int32_t L_1 = V_2;
        int32_t L_2 = V_3;
        V_0 = L_2;
        int32_t L_3 = V_0;
        return ((int32_t)il2cpp_codegen_add((int32_t)L_1, (int32_t)L_3));
    }
}

This is basically the same as the class version except that it doesn’t have the NullCheck call. Let’s look at the Deconstruct function now:

extern "C" IL2CPP_METHOD_ATTR void DeconstructableStruct_Deconstruct_m197F44C639A3201D98D99FB2CFD9FE4DEAEE5AFE (DeconstructableStruct_t8DD49D68F229C042C4490DE678EA7018F17C80A4 * __this, int32_t* ___x0, int32_t* ___y1, const RuntimeMethod* method)
{
    {
        int32_t* L_0 = ___x0;
        int32_t L_1 = __this->get_X_0();
        *((int32_t*)L_0) = (int32_t)L_1;
        int32_t* L_2 = ___y1;
        int32_t L_3 = __this->get_Y_1();
        *((int32_t*)L_2) = (int32_t)L_3;
        return;
    }
}

This is identical to the class version, so let’s see what assembly was generated:

add   r0, r1
bx    lr

With the null check gone, the generated assembly is truly tiny. It now consists of the bare minimum addition and return.

Conclusion: Deconstructing a struct is free and still provides a terse, error-checked way of extracting its fields.

Deconstructing with Extension Methods

Now let’s try moving the Deconstruct method out of the struct and into a static class as an extension method:

struct NonDeconstructableStruct
{
    public int X;
    public int Y;
}
 
static class NonDeconstructableStructExtensions
{
    public static void Deconstruct(
        this NonDeconstructableStruct nds,
        out int x,
        out int y)
    {
        x = nds.X;
        y = nds.Y;
    }
}

Using it looks exactly like when Deconstruct was inside the struct type as an instance method:

static class TestClass
{
    static int TestDeconstructStructExtension(NonDeconstructableStruct ds)
    {
        (int x, int y) = ds;
        return x + y;
    }
}

This compiles just fine, so let’s see what C++ was generated:

extern "C" IL2CPP_METHOD_ATTR int32_t TestClass_TestDeconstructStructExtension_mFD580D0EB417138F4E03F3391B9FB9063526AC4F (NonDeconstructableStruct_tEA3A22DAAA7EFF04989AB14067DCFC13CC30A054  ___ds0, const RuntimeMethod* method)
{
    int32_t V_0 = 0;
    int32_t V_1 = 0;
    int32_t V_2 = 0;
    {
        NonDeconstructableStruct_tEA3A22DAAA7EFF04989AB14067DCFC13CC30A054  L_0 = ___ds0;
        NonDeconstructableStructExtensions_Deconstruct_m029D9FA1627896EF3E8F38AAE1D469D0B713DBD2(L_0, (int32_t*)(&V_1), (int32_t*)(&V_2), /*hidden argument*/NULL);
        int32_t L_1 = V_1;
        int32_t L_2 = V_2;
        V_0 = L_2;
        int32_t L_3 = V_0;
        return ((int32_t)il2cpp_codegen_add((int32_t)L_1, (int32_t)L_3));
    }
}

This looks the same as before, except that the call to Deconstruct is now in NonDeconstructableStructExtensions. Let’s see it:

extern "C" IL2CPP_METHOD_ATTR void NonDeconstructableStructExtensions_Deconstruct_m029D9FA1627896EF3E8F38AAE1D469D0B713DBD2 (NonDeconstructableStruct_tEA3A22DAAA7EFF04989AB14067DCFC13CC30A054  ___nds0, int32_t* ___x1, int32_t* ___y2, const RuntimeMethod* method)
{
    {
        int32_t* L_0 = ___x1;
        NonDeconstructableStruct_tEA3A22DAAA7EFF04989AB14067DCFC13CC30A054  L_1 = ___nds0;
        int32_t L_2 = L_1.get_X_0();
        *((int32_t*)L_0) = (int32_t)L_2;
        int32_t* L_3 = ___y2;
        NonDeconstructableStruct_tEA3A22DAAA7EFF04989AB14067DCFC13CC30A054  L_4 = ___nds0;
        int32_t L_5 = L_4.get_Y_1();
        *((int32_t*)L_3) = (int32_t)L_5;
        return;
    }
}

This is slightly longer due to some unnecessary variable copying, but essentially the same. Let’s see if any of this affected the final assembly:

add   r0, r1
bx    lr

No, it’s still the same two instructions.

Conclusion: Placing Deconstruct inside or outside the type doesn’t make any difference.

Deconstructing Enums

If Deconstruct can be an extension method, then we should be able to apply it to all kinds of types. Let’s try adding one for an enum:

enum TestEnum
{
}
 
static class TestEnumExtensions
{
    public static void Deconstruct(
        this TestEnum te,
        out int x,
        out int y)
    {
        x = (int)te;
        y = (int)te + 1;
    }
}
 
static class TestClass
{
    static int TestDeconstructEnum(TestEnum te)
    {
        (int x, int y) = te;
        return x + y;
    }
}

This particular example is just to keep things simple and doesn’t represent a good use of deconstructing an enum. There may be good uses though, such as being able to write (float x, float y, float z) = Axis.X and getting the equivalent of float x = 1.0f; float y = 0.0f; float z = 0.0f; due to a switch in Deconstruct that outputs the appropriate values.
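To sketch that idea in the same C++ shape we’ve been reading, with out parameters as pointers, a switch-based Deconstruct might look like this. The Axis enum and its Y and Z members are assumptions for illustration. The C# version would be an extension method with out float parameters.

```cpp
#include <cassert>

// Hypothetical enum; only the X member is named in the text above.
enum class Axis { X, Y, Z };

// Writes the unit vector for the given axis through the out pointers.
void Deconstruct(Axis axis, float* x, float* y, float* z)
{
    assert(x != nullptr && y != nullptr && z != nullptr);
    *x = 0.0f;
    *y = 0.0f;
    *z = 0.0f;
    switch (axis)
    {
        case Axis::X: *x = 1.0f; break;
        case Axis::Y: *y = 1.0f; break;
        case Axis::Z: *z = 1.0f; break;
    }
}
```

Calling Deconstruct(Axis::X, &x, &y, &z) leaves x at 1.0f and the other two at 0.0f, which matches the three assignments described above.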

In the meantime, let’s return to the example and see how it looks in C++:

extern "C" IL2CPP_METHOD_ATTR int32_t TestClass_TestDeconstructEnum_m25C34CEC10F6CE4E36C917ABE450E5A43592788D (int32_t ___te0, const RuntimeMethod* method)
{
    int32_t V_0 = 0;
    int32_t V_1 = 0;
    int32_t V_2 = 0;
    {
        int32_t L_0 = ___te0;
        TestEnumExtensions_Deconstruct_m53305AE31678C7BA3BAA729F3B549598F3DB1D2F(L_0, (int32_t*)(&V_1), (int32_t*)(&V_2), /*hidden argument*/NULL);
        int32_t L_1 = V_1;
        int32_t L_2 = V_2;
        V_0 = L_2;
        int32_t L_3 = V_0;
        return ((int32_t)il2cpp_codegen_add((int32_t)L_1, (int32_t)L_3));
    }
}

Again, this output looks just like the struct version. Let’s see how the Deconstruct function works:

extern "C" IL2CPP_METHOD_ATTR void TestEnumExtensions_Deconstruct_m53305AE31678C7BA3BAA729F3B549598F3DB1D2F (int32_t ___te0, int32_t* ___x1, int32_t* ___y2, const RuntimeMethod* method)
{
    {
        int32_t* L_0 = ___x1;
        int32_t L_1 = ___te0;
        *((int32_t*)L_0) = (int32_t)L_1;
        int32_t* L_2 = ___y2;
        int32_t L_3 = ___te0;
        *((int32_t*)L_2) = (int32_t)((int32_t)il2cpp_codegen_add((int32_t)L_3, (int32_t)1));
        return;
    }
}

Redundant variables aside, this does just what we wrote in C#: output the enum’s integer value to x and add one to output y. Let’s see what assembly we get for this:

movs    r1, #1
orr.w   r0, r1, r0, lsl #1
bx      lr

This is three instructions because we have to add one, but still absolutely minimal instructions for the CPU to execute.
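It may not be obvious why an orr with a shifted operand is the same as x + (x + 1). The sum is 2x + 1, and shifting left by one doubles the value and leaves the lowest bit clear, so OR-ing in 1 adds one without any carry. Here’s a small sketch that compares the two forms. The function names are made up, and unsigned integers are used so that wraparound is well-defined in C++.

```cpp
#include <cassert>
#include <cstdint>

// Literal form of the C#: x = te, y = te + 1, return x + y.
// Unsigned math wraps on overflow instead of being undefined behavior.
uint32_t sum_literal(uint32_t te)
{
    uint32_t x = te;
    uint32_t y = te + 1u;
    return x + y;
}

// What the two data-processing instructions compute: 1 | (te << 1).
uint32_t sum_shift_or(uint32_t te)
{
    return 1u | (te << 1);
}

// Spot-check a few values, including the wraparound edge cases.
void check_equivalence()
{
    const uint32_t samples[] = {0u, 1u, 5u, 0x7FFFFFFFu, 0xFFFFFFFFu};
    for (uint32_t s : samples)
    {
        assert(sum_literal(s) == sum_shift_or(s));
    }
}
```

Both functions return the same value for every 32-bit input, so the compiler is free to pick whichever form is cheapest.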

Conclusion: Extension methods can allow for deconstructing an enum just as quickly and easily as a struct or class.

Deconstructing Primitives

Lastly, let’s even try adding a Deconstruct extension method for a primitive type: int.

static class IntExtensions
{
    public static void Deconstruct(
        this int i,
        out int x,
        out int y)
    {
        x = i;
        y = i + 1;
    }
}
 
static class TestClass
{
    static int TestDeconstructInt(int i)
    {
        (int x, int y) = i;
        return x + y;
    }
}

The code to do this is basically the same as with enum, class, and struct. It’s debatable whether there are any good uses for deconstructing a primitive type, but we’ll set that aside for now as we go through this example. Let’s see how much the C++ varies:

extern "C" IL2CPP_METHOD_ATTR int32_t TestClass_TestDeconstructInt_mB98888AF0CA4300F0FFD1BD29673067F26CEFBFA (int32_t ___i0, const RuntimeMethod* method)
{
    int32_t V_0 = 0;
    int32_t V_1 = 0;
    int32_t V_2 = 0;
    {
        int32_t L_0 = ___i0;
        IntExtensions_Deconstruct_mE01DC9A61BFAF35570CFD0176217F98232D2EC5B(L_0, (int32_t*)(&V_1), (int32_t*)(&V_2), /*hidden argument*/NULL);
        int32_t L_1 = V_1;
        int32_t L_2 = V_2;
        V_0 = L_2;
        int32_t L_3 = V_0;
        return ((int32_t)il2cpp_codegen_add((int32_t)L_1, (int32_t)L_3));
    }
}

So far this looks the same as with struct and enum. Let’s look at the C++ for the Deconstruct extension method:

extern "C" IL2CPP_METHOD_ATTR void IntExtensions_Deconstruct_mE01DC9A61BFAF35570CFD0176217F98232D2EC5B (int32_t ___i0, int32_t* ___x1, int32_t* ___y2, const RuntimeMethod* method)
{
    {
        int32_t* L_0 = ___x1;
        int32_t L_1 = ___i0;
        *((int32_t*)L_0) = (int32_t)L_1;
        int32_t* L_2 = ___y2;
        int32_t L_3 = ___i0;
        *((int32_t*)L_2) = (int32_t)((int32_t)il2cpp_codegen_add((int32_t)L_3, (int32_t)1));
        return;
    }
}

This looks the same as with enum. Here’s how the assembly looks:

movs    r1, #1
orr.w   r0, r1, r0, lsl #1
bx      lr

Again, this is just as with enum.

Conclusion: Extension methods even allow for deconstructing primitives. It’s just as efficient as enum and struct, but might not have any good practical uses.

Tuple Equality

For today’s final example, let’s see how we can use the equality (==) and inequality (!=) operators with tuples:

static class TestClass
{
    static bool TestTupleEquality((int, int) t1, (int, int) t2)
    {
        return t1 == t2;
    }
 
    static bool TestTupleInequality((int, int) t1, (int, int) t2)
    {
        return t1 != t2;
    }
}

Here’s the C++ that IL2CPP generates:

extern "C" IL2CPP_METHOD_ATTR bool TestClass_TestTupleEquality_m2FA8D3F7595837C30A9EF891092F87AF36287C5A (ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186  ___t10, ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186  ___t21, const RuntimeMethod* method)
{
    ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186  V_0;
    memset(&V_0, 0, sizeof(V_0));
    ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186  V_1;
    memset(&V_1, 0, sizeof(V_1));
    {
        ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186  L_0 = ___t10;
        V_0 = L_0;
        ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186  L_1 = ___t21;
        V_1 = L_1;
        ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186  L_2 = V_0;
        int32_t L_3 = L_2.get_Item1_0();
        ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186  L_4 = V_1;
        int32_t L_5 = L_4.get_Item1_0();
        if ((!(((uint32_t)L_3) == ((uint32_t)L_5))))
        {
            goto IL_0021;
        }
    }
    {
        ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186  L_6 = V_0;
        int32_t L_7 = L_6.get_Item2_1();
        ValueTuple_2_t5A24A9AD1EB9E7A9CDB9C168A09E94B94E849186  L_8 = V_1;
        int32_t L_9 = L_8.get_Item2_1();
        return (bool)((((int32_t)L_7) == ((int32_t)L_9))? 1 : 0);
    }
 
IL_0021:
    {
        return (bool)0;
    }
}

This is pretty long compared to the previous functions, but it’s just simple and verbose. To begin with, two tuples are created: V_0 and V_1. Both are memset to zero, just as when we created the tuples directly but unlike when we tested deconstructing a tuple. Then they’re immediately overwritten by the parameters.

Next, Item1 is retrieved from both tuples via its accessor. This is compared for inequality with !(a==b) and goto is used to jump to a block that returns false in that case. Otherwise, Item2 is retrieved and compared with == to return true or false using a conditional (?:) operator. This is far from clean hand-written code, but still rather straightforward in this small example.

Finally, let’s look at the C++ compiler output, translated by the author into pseudo-C#:

eors    r1, r3    # r1 = t1.Item2 ^ t2.Item2; // zero only if equal
eors    r0, r2    # r0 = t1.Item1 ^ t2.Item1; // zero only if equal
orrs    r0, r1    # Z = ((r0 | r1) == 0) ? 1 : 0; // zero only if both equal
mov.w   r0, #0    # r0 = 0;
it      eq        # if (Z == 1) // only if both equal
moveq   r0, #1    #   r0 = 1;
bx      lr        # return r0;

Here we see a more radical transformation than in previous assembly code. The two if statements have been entirely replaced with a more optimal set of instructions. There are no more branches and only one conditional instruction. This should execute much faster than a literal translation of the C++ would.
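The trick is that XOR yields zero only when its inputs are equal, so OR-ing the two XOR results yields zero only when both pairs of fields match. Here’s a sketch that writes both strategies out in C++ for comparison. IntPair and both function names are hypothetical, and this doesn’t claim any particular compiler will turn one form into the other.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the generated tuple struct.
struct IntPair
{
    int32_t item1;
    int32_t item2;
};

// Shape of the generated C++: compare Item1, bail out early, then compare Item2.
bool equals_branchy(IntPair a, IntPair b)
{
    if (a.item1 != b.item1)
    {
        return false;
    }
    return a.item2 == b.item2;
}

// Shape of the assembly: two XORs, one OR, and a single test against zero.
bool equals_branchless(IntPair a, IntPair b)
{
    uint32_t d1 = static_cast<uint32_t>(a.item1) ^ static_cast<uint32_t>(b.item1);
    uint32_t d2 = static_cast<uint32_t>(a.item2) ^ static_cast<uint32_t>(b.item2);
    return (d1 | d2) == 0u;
}

// Both strategies must agree on equal and unequal inputs.
void check_agreement()
{
    const IntPair base = {1, 2};
    const IntPair others[] = {{1, 2}, {3, 2}, {1, 3}, {-1, -2}};
    for (const IntPair& o : others)
    {
        assert(equals_branchy(base, o) == equals_branchless(base, o));
    }
}
```

The branchless form always does the same amount of work, while the branchy form can stop after the first comparison. For two integers, the fixed cost is small enough that avoiding the branch is a reasonable trade.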

The C++ and assembly code for inequality is nearly identical to that of equality, so it’s not shown here.

Conclusion: Tuple equality and inequality are syntactic sugar for directly comparing all fields of the tuples. The resulting assembly may have no branches.

Conclusion

Tuples have provided us a fair amount of syntactic sugar to create structs and name, compare, and extract their fields. The resulting assembly is reasonably good in all cases. In some cases like deconstruction and equality, it’s truly minimal and should execute extremely quickly. Creation in particular is marred by method initialization overhead. Likewise, the C++ compiler can generate some sub-optimal code for usage of a tuple.

Additionally, deconstruction is a flexible tool that we can apply to classes and structs via instance methods. We can also apply it to these types as well as enums, primitives, or anything else via extension methods. Sometimes there aren’t any good use cases for this, but others such as (float x, float y, float z) = someVector3; are quite compelling.

Stay tuned for next week when we’ll continue the series by looking at more new language features in C# 7.3!

#1 by Nate Allan on January 10th, 2019

This type of measurement and analysis can be so informative, rather than endless speculation. Thank you so much for doing this!

#2 by Draugor on July 10th, 2019

uh now i’m interested in knowing what tuples do to List/Dictionaries instead of structs in il2cpp :D
say a Dictionary or a List

#3 by Draugor on July 10th, 2019

apparently comments don’t like generics

i meant a Dictionary with (int, int) as key and int as value or a list of (int,int,int) for example

IL2CPP Output for C# 7.3: Tuples