JacksonDunstan.com

The series continues today by picking up where we left off with pointers. We’ll discuss a popularly-used alternative in C++: references. These are quite different from the various concepts of references in C#!

Pointers

As we saw last week, there is a lot of flexibility in pointers and their closely-associated arrays and strings. Usually, it’s a lot more flexibility than we really want. In the vast majority of cases, we simply want a pointer to refer to a variable. We don’t want that variable to be null, we don’t intend to perform arithmetic on the pointer, and we don’t want to index into it like an array. Consider a function declaration like this:

int GetTotalPoints(Player*);

This makes the reader ask themselves questions like “can the Player pointer be null?” The reader might also wonder “is this a single Player or an array of them?” and “if this is an array, how long can it be?” The answers really depend on the implementation of GetTotalPoints, but we don’t want readers to have to guess or spend their time tracking down and reading the function definition. The function definition might not even be available, such as with a closed-source library.
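
To make the ambiguity concrete, here's a minimal sketch. The Player struct, its Points field, and this particular definition of GetTotalPoints are all assumptions made for illustration since only the declaration is shown above. The point is just that every one of these calls compiles:

```cpp
#include <cassert>

// Hypothetical Player type (assumed for this sketch)
struct Player
{
    int Points;
};

// One possible definition: treats null as "no player" and reads a single Player
int GetTotalPoints(Player* player)
{
    if (player == nullptr)
    {
        return 0;
    }
    return player->Points;
}

int CallWithNull()
{
    return GetTotalPoints(nullptr);
}

int CallWithSingle()
{
    Player player{ 10 };
    return GetTotalPoints(&player);
}

int CallWithArray()
{
    Player team[3] = { { 1 }, { 2 }, { 3 } };

    // The array decays to a pointer to its first element
    // This definition silently ignores the other two players
    return GetTotalPoints(team);

}

// Sanity checks for the three calls
void CheckCalls()
{
    assert(CallWithNull() == 0);
    assert(CallWithSingle() == 10);
    assert(CallWithArray() == 1);
}
```

Nothing in the signature tells the caller which of these three uses the author had in mind.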

Lvalue References

To address these issues, C++ introduces “references” as an alternative to pointers. A reference is like an alias to something, usually backed with a pointer in the compiled code. Here’s how one looks:

int x = 123;
int& r = x; // <-- reference
DebugLog(x, r); // 123, 123

There are several critical aspects of this. First, the syntax for a reference is similar to a pointer except that we add a & instead of a * to the type we want to refer to: int in this case. We can read the resulting int& r as “r is a reference to an int.”

Second, we must initialize the reference when it’s declared. We can’t simply write int& r; or we’ll get a compiler error. This helps avoid undefined behavior since we can’t possibly read or write an uninitialized reference.

Third, the thing we initialize the reference to must be a valid “lvalue.” This is generally thought of as “something with a name.” It includes variables and functions. It also means that a reference can never be null since everything with a name has a non-null memory address in C++.

Fourth, we don’t initialize to &x like we’d do with a pointer and we don’t dereference the reference with *r. We simply use it as an alias. Any mention of r is just like we mentioned x. References are aliases, not objects. A pointer is distinct from what it points to and can be manipulated independently, but a reference cannot. This means there’s no re-assignment of a reference because we can’t actually refer to the reference that way:

int x = 123;
int y = 456;
int& r = x;
 
// This is equivalent to:
//   x = y;
// y is read and written to x
// r remains an alias of x
r = y;
 
DebugLog(x, r); // 456, 456

This is usually easier to reason about since the reference, unlike a pointer, can never change what it refers to as the program runs. We can, however, make a second reference by assigning the first reference to it:

int x = 123;
 
// Alias to x
int& r1 = x;
 
// This is equivalent to:
//   int& r2 = x;
// So this is also an alias to x
int& r2 = r1;
 
DebugLog(r1, r2); // 123, 123
x = 456;
DebugLog(r1, r2); // 456, 456
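
One way to convince ourselves that a reference really is just an alias is to take its address. This is a small sketch with made-up function names, not something from a library:

```cpp
#include <cassert>

// True when both references share the address of the variable they alias
bool ReferenceSharesAddress()
{
    int x = 123;
    int& r1 = x;
    int& r2 = r1;

    // &r1 and &r2 give the address of x, not of some separate reference object
    return &r1 == &x && &r2 == &x;
}

// Contrast with a pointer, which is its own object with its own address
bool PointerIsDistinctObject()
{
    int x = 123;
    int* p = &x;

    // p holds the address of x, but p itself lives somewhere else
    return static_cast<void*>(&p) != static_cast<void*>(&x);
}

// Sanity checks for both cases
void CheckAddresses()
{
    assert(ReferenceSharesAddress());
    assert(PointerIsDistinctObject());
}
```

Both functions return true: mentioning r1 or r2 is the same as mentioning x, even when we apply & to them.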

Because a reference isn’t a distinct object, there’s no such thing as a reference to a reference, pointer to a reference, or array of references.
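
These restrictions can be sketched like so. The disallowed declarations are commented out because each one is a compiler error, and the variable names are just placeholders. Note the space in int& &: without it, int&& means something else entirely.

```cpp
#include <cassert>
#include <type_traits>

int x = 123;
int& r = x;

// Each of these declarations is a compiler error if uncommented:
// int& & rr = r;      // Reference to a reference
// int& * pr = &r;     // Pointer to a reference
// int& ar[1] = { r }; // Array of references

// Taking the address of r is allowed, but it yields a pointer to x: an int*
static_assert(
    std::is_same<decltype(&r), int*>::value,
    "&r is a pointer to int");

bool AddressOfReferenceIsAddressOfReferent()
{
    int* p = &r;
    assert(*p == x);
    return p == &x;
}
```

Since r is only an alias, anything we try to point at it ends up pointing at x instead.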

Here are three alternate ways to initialize a reference:

int& r(x);
int& r = {x};
int& r{x};

They may also be initialized by passing them as an argument using two of the above forms:

void AddOne(int& val)
{
    val += 1;
}
 
int x = 1;
 
AddOne(x);
DebugLog(x); // 2
 
AddOne({x});
DebugLog(x); // 3

Likewise, returning a reference also initializes it:

int nextId = 0;
 
int& GetNextId()
{
    nextId++;
    return nextId;
}
 
int& id = GetNextId();
DebugLog(id); // 1
id = 0; // Reset
DebugLog(nextId); // 0

Now let’s see a reference to a function. These look just like pointers to functions, except that there’s a & instead of a *:

// Reference to a function that takes an int and returns a bool
bool (&r)(int) = MyFunc;

We can use them like this:

// Function to find the index according to some matching function
int FindIndex(int array[5], bool (&matcher)(int))
{
    for (int i = 0; i < 5; ++i)
    {
        if (matcher(array[i]))
        {
            return i;
        }
    }
    return -1;
}
 
bool IsEven(int val)
{
    return (val & 1) == 0;
}
 
// Make a reference to our matching function
bool (&isEven)(int) = IsEven;
 
int array[5] = { 1, 2, 3, 4, 5 };
int index = FindIndex(array, isEven); // Pass reference, not function
DebugLog(index); // 1

Because we can initialize a reference by passing an argument, there really isn’t a need to explicitly declare isEven as a local reference. Instead, we could do this:

// Passing the name of the function initializes the matcher reference argument
int index = FindIndex(array, IsEven);

A local reference is more useful when we don’t know what we want to reference at compile time and we want to use that runtime choice over and over:

// Decide what to alias at runtime
bool (&matcher)(int) = userWantsEvens ? IsEven : IsOdd;
 
// Use the result of that decision over and over
int index1 = FindIndex(array1, matcher);
int index2 = FindIndex(array2, matcher);
 
bool foundInBothArrays = index1 >= 0 && index2 >= 0;
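
The snippet above leaves out a few pieces. Here’s one way to fill them in so it compiles on its own. IsOdd, the contents of the two arrays, and the FoundInBothArrays wrapper are assumptions for this sketch, and userWantsEvens stands in for any value that’s only known at runtime:

```cpp
#include <cassert>

bool IsEven(int val)
{
    return (val & 1) == 0;
}

// Hypothetical counterpart to IsEven
bool IsOdd(int val)
{
    return (val & 1) != 0;
}

int FindIndex(int array[5], bool (&matcher)(int))
{
    for (int i = 0; i < 5; ++i)
    {
        if (matcher(array[i]))
        {
            return i;
        }
    }
    return -1;
}

bool FoundInBothArrays(bool userWantsEvens)
{
    int array1[5] = { 1, 2, 3, 4, 5 };
    int array2[5] = { 2, 4, 6, 8, 10 };

    // Decide what to alias at runtime
    bool (&matcher)(int) = userWantsEvens ? IsEven : IsOdd;

    // Use the result of that decision over and over
    int index1 = FindIndex(array1, matcher);
    int index2 = FindIndex(array2, matcher);
    return index1 >= 0 && index2 >= 0;
}

// Sanity checks for both runtime choices
void CheckMatchers()
{
    assert(FoundInBothArrays(true));
    assert(!FoundInBothArrays(false));
}
```

With these arrays, FoundInBothArrays(true) is true because both contain an even number, while FoundInBothArrays(false) is false because array2 has no odd numbers.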

Here’s a summary of the constraints that references impose compared to pointers:

Must be initialized when declared
Can’t be indexed into to offset a memory address
Not subject to pointer arithmetic
No references to references
No pointers to references
No arrays of references
Can’t be null
Can’t change what it aliases

That seems like a lot of lost flexibility and a lot more rules to live by, but it turns out that satisfying all of these constraints is extremely common. Aside from the last three, these are mostly the constraints that C# references impose on us and they’ve turned out to be quite practical. In practice, C++ references are very heavily used to succinctly convey all of these constraints to readers. Let’s look once more at the function we started with, now using a reference:

int GetTotalPoints(Player&);

It’s now clear that the Player can’t be null because that’s not possible with references. It’s clear that this isn’t an array of Player objects, because that’s also not possible. The & instead of * means that it’s simply an alias for one non-null Player object.
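
As a rough sketch of what this buys us, here’s one possible definition and call. The Player struct and its Points field are assumptions since only the declaration is shown above:

```cpp
#include <cassert>

// Hypothetical Player type (assumed for this sketch)
struct Player
{
    int Points;
};

// One possible definition: there's no null check to write and nothing to index
int GetTotalPoints(Player& player)
{
    return player.Points;
}

int TotalForOnePlayer()
{
    Player player{ 10 };

    // Each of these is a compiler error if uncommented:
    // GetTotalPoints(nullptr); // Not a Player
    // GetTotalPoints(&player); // A Player*, not a Player
    // Player team[3] = { { 1 }, { 2 }, { 3 } };
    // GetTotalPoints(team);    // An array, not a Player

    // No & at the call site: the parameter becomes an alias of player
    int total = GetTotalPoints(player);
    assert(total == player.Points);
    return total;
}
```

The questions a reader had about the pointer version are now answered by the compiler instead of by documentation.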

Rvalue References

So far we’ve seen how references can make an alias for an “lvalue,” which is something with a name. We can also make references to things without a name. These references to “rvalues” were introduced in C++11 and are used quite extensively now.

An rvalue reference has two & after the type it references and is initialized with something that doesn’t have a name:

int&& r = 5;

The literal 5 doesn’t have a name like a variable does. Still, we can reference it and its lifetime is extended to the lifetime of the reference so that the reference never refers to something that no longer exists. It works like this:

{
    // 5 is the rvalue
    // It's not just a temporary on this line
    // Its lifetime is extended to match r
    int&& r = 5;
 
    // 123 is the rvalue, but it's just written to x
    // 123 stops existing after the semicolon
    int x = 123;
 
    // Both the rvalue reference and the variable are still readable
    DebugLog(r, x); // 5, 123
 
    // The temporary that r refers to is still accessible via the alias
    r = 6;
    DebugLog(r, x); // 6, 123
 
    // Don't worry, we didn't overwrite the fundamental concept of 5 :)
    DebugLog(5); // 5
 
// The scope that r is in ends
// r and 5 end their lifetime
// They can no longer be used
}

Lifetime extension is much more important with structs and classes than with primitives like int, but the same rules apply. We’ll go much more into structs and classes later in the series.
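
As a small preview, here’s a sketch of lifetime extension with a struct. It uses constructors, destructors, and a static member, all of which come later in the series, and the Tracker type is entirely made up for this example. It simply counts how many instances are currently alive:

```cpp
#include <cassert>

// Hypothetical type that counts how many instances are currently alive
struct Tracker
{
    static int alive;
    Tracker() { ++alive; }
    Tracker(const Tracker&) { ++alive; }
    ~Tracker() { --alive; }
};
int Tracker::alive = 0;

// Number of Trackers alive after creating a temporary with no reference to it
int AliveWithoutReference()
{
    Tracker{}; // The temporary is destroyed at the semicolon
    return Tracker::alive;
}

// Number of Trackers alive while an rvalue reference refers to a temporary
int AliveWithReference()
{
    Tracker&& r = Tracker{}; // The temporary's lifetime now matches r
    (void)r; // Silence unused-variable warnings
    return Tracker::alive;
}

// Sanity checks for both cases
void CheckLifetimes()
{
    assert(AliveWithoutReference() == 0);
    assert(AliveWithReference() == 1);
    assert(Tracker::alive == 0);
}
```

AliveWithoutReference returns 0 and AliveWithReference returns 1. Once AliveWithReference returns, r’s scope has ended and the count is back to 0.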

The same alternate initialization forms are allowed with rvalue references:

int&& r(5);
int&& r = {5};
int&& r{5};

We can also initialize with function arguments:

void PrintRange(int&& from, int&& to)
{
    for (int i = from; i <= to; ++i)
    {
        DebugLog(i);
    }
}
 
PrintRange(1, 3); // 1, 2, 3

Return values can also initialize rvalue references, but these will become “dangling” references when returning a temporary because its lifetime is not extended past the end of the function call:

Player&& MakePlayer(int id, int health)
{
    // Create a temporary Player
    // Alias it to an rvalue reference
    // Return that alias
    return { id, health };
}
 
// The returned rvalue reference is "dangling"
// It refers to a temporary Player that no longer exists
// It must not be used or undefined behavior will happen
Player&& player = MakePlayer(123, 100);
 
// We'll get garbage when we read from it
DebugLog(player.Id, player.Health); // 17823804, 12850082

It’s important to keep this in mind and only return rvalue references whose lifetime is already going to extend beyond the end of the function call. We’ll see some techniques for doing this later on in the series.
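
In the meantime, one common way to sidestep the problem is to return the Player by value rather than by rvalue reference. This isn’t necessarily one of the techniques covered later, just a minimal sketch. The Player definition is assumed from the Id and Health fields used above:

```cpp
#include <cassert>

// Assumed definition based on the Id and Health fields used above
struct Player
{
    int Id;
    int Health;
};

// Returns a Player by value instead of by rvalue reference
Player MakePlayer(int id, int health)
{
    return { id, health };
}

int HealthOfNewPlayer()
{
    // The returned Player is a temporary here, not inside MakePlayer
    // Its lifetime is extended to match the player reference
    Player&& player = MakePlayer(123, 100);
    assert(player.Id == 123);
    return player.Health;
}
```

Here the reference never dangles, so HealthOfNewPlayer reliably returns 100.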

The same constraints that apply to lvalue references apply to rvalue references:

Must be initialized when declared
Can’t be indexed into to offset a memory address
Not subject to pointer arithmetic
No references to references
No pointers to references
No arrays of references
Can’t be null
Can’t change what it aliases

Additionally, despite the naming similarity, lvalue references are different types than rvalue references. For example, consider trying to call the above PrintRange function with lvalues:

int from = 1;
int to = 3;
 
// Compiler error
// Can't pass int& when int&& is required
PrintRange(from, to);

Passing an argument isn’t special here. No other form of initialization lets us bind an rvalue reference directly to an lvalue of the same type, even something as simple as this:

int x = 123;
 
// Compiler error
// x is an lvalue when int&& requires an rvalue
int&& r = x;

We can, however, initialize an lvalue reference with an rvalue reference when that rvalue reference has a name:

// Compiler error
// 123 is an rvalue when int& requires an lvalue
int& error = 123;
 
int&& rr = 123;
int& lr = rr; // rr has a name, so it's an lvalue
 
DebugLog(rr, lr); // 123, 123
rr = 456;
DebugLog(rr, lr); // 456, 456

The opposite doesn’t work when the lvalue reference has a name, because that makes it not an rvalue:

int x = 123;
int& lr = x;
 
// Compiler error
// lr is an lvalue when int&& requires an rvalue
int&& rr = lr;
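
Function overloading gives us a handy way to check these rules. In this sketch, the BindsToRvalueRef overloads and the functions that call them are made up for illustration. The last one uses std::move from the Standard Library’s utility header, which casts an lvalue so that it can be treated as an rvalue:

```cpp
#include <cassert>
#include <utility>

// Overloads that report which kind of reference the argument was bound to
bool BindsToRvalueRef(int&) { return false; }
bool BindsToRvalueRef(int&&) { return true; }

bool LiteralIsRvalue()
{
    return BindsToRvalueRef(123);
}

bool VariableIsRvalue()
{
    int x = 123;
    return BindsToRvalueRef(x);
}

bool NamedRvalueRefIsRvalue()
{
    int&& rr = 123;
    return BindsToRvalueRef(rr); // rr has a name, so it's an lvalue
}

bool MovedVariableIsRvalue()
{
    int x = 123;
    return BindsToRvalueRef(std::move(x)); // Cast to an rvalue
}

// Sanity checks for all four cases
void CheckOverloads()
{
    assert(LiteralIsRvalue());
    assert(!VariableIsRvalue());
    assert(!NamedRvalueRefIsRvalue());
    assert(MovedVariableIsRvalue());
}
```

Only the literal and the std::move call choose the int&& overload. The variable and the named rvalue reference both choose int&.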

C# References

C# has several types of references. Let’s compare them with C++ references.

First, there’s the ref keyword used to pass function arguments “by reference.” This is pretty close to a C++ lvalue reference as the argument must be an lvalue and acts like an alias for the variable that was passed. There are some differences though. First, C++ uses & instead of ref in the function signature and doesn’t require the ref keyword when calling the function. Second, C# ref arguments can only be references to variables, not functions.

The out and in argument modifiers are also described as enabling pass-by-reference functionality in C#. Arguments marked with out are also like C++ lvalue references with the additional requirement that they must be written to at least once by the function. There isn’t a direct correspondence for this in C++ as the language tends to shy away from requiring at compile time that the write will be done, as is also the case with variable initialization. On the other hand, in arguments are essentially the same as a const lvalue reference in C++. We’ll cover const more in depth later, but for now it can be thought of as an enhanced version of readonly in C#.
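
Here’s a rough sketch of how these three C# modifiers might be approximated in C++. The function names are made up for this example:

```cpp
#include <cassert>

// Like a C# ref argument: the function reads and writes the caller's variables
void Swap(int& a, int& b)
{
    int temp = a;
    a = b;
    b = temp;
}

// Like a C# out argument, except C++ doesn't check that result gets written
bool TryHalve(int value, int& result)
{
    if ((value & 1) != 0)
    {
        // result is left untouched here
        // C# would reject this return for an out argument
        return false;
    }
    result = value / 2;
    return true;
}

// Like a C# in argument: readable through the alias, but not writable
int Doubled(const int& value)
{
    // value = 0; // Compiler error: value is const
    return value * 2;
}

// Sanity checks for all three functions
void CheckModifiers()
{
    int a = 1;
    int b = 2;
    Swap(a, b); // No ref keyword at the call site
    assert(a == 2 && b == 1);

    int half = -1;
    assert(TryHalve(8, half) && half == 4);
    assert(!TryHalve(7, half) && half == 4);

    int x = 21;
    assert(Doubled(x) == 42);
}
```

Notice that none of the calls need a keyword at the call site, which is the main syntactic difference from C#.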

Second, there are ref return values and ref local variables. These are also similar to C++ lvalue references since they create an alias to an lvalue. C++ uses the same & syntax instead of ref in both the function signature for ref returns and the variable declaration for local variables. C# also requires ref at the return statement, but C++ doesn’t.

Third, there are ref and readonly ref structs in C# to force allocating them on the stack by enforcing various restrictions. This meaning of “reference” has no correlation to either lvalue or rvalue references in C++.

Fourth, and finally, there are reference types such as classes, interfaces, delegates, dynamic objects, the object type, and strings. All of these are “managed” types subject to garbage collection. As C++ has no “managed” types or garbage collection, there are also no reference types. Instead, references can be made to any type in C++.

The meaning of those references in C++ is different to that of C# references, though. In C#, they are somewhere in between pointers and C++ references. They’re like pointers in that they are an object, as opposed to an alias. They can be null and they can be reassigned. They’re like references in that no pointer arithmetic is allowed and they can’t be indexed into like an array to offset a memory address.

Another major difference is that managed C# types are subject to garbage collection when there are no more references to them. This implies some behind-the-scenes tracking mechanism to know whether there are any references still available. This is very complicated, sometimes expensive, code that must be thread-safe and deal with esoteric edge cases. C++ references have no such tracking and do not imply any grand resource-management scheme. Besides lifetime extension of rvalue references, which is usually rather brief, there’s no attempt to globally manage all references for any purpose, including deallocation.

Conclusion

C++ references are similar to C++ pointers, C# pointers, and various kinds of C# references, but different in many ways from all of them. Its lvalue references are a unique way of referencing variables as well as functions. Its rvalue references are especially strange as none of these similar concepts offers anything close to the same functionality. As we go on through the series, we’ll see the growing importance and common usage of both kinds of references in many other areas of the language and its Standard Library.

#1 by Ökehan on July 6th, 2020 · Reply

Definitely, waiting for the next one. Thank you.

#2 by Domen on September 10th, 2020 · Reply

Just a typo:
The meaning of those references in C# is different to that of C# references, though.

Thanks for this articles :D
Moving from a Unity C# project to a C++ project and it really helps to ready this kind of materials.

#3 by jackson on September 10th, 2020 · Reply

Glad to hear you’ve found the articles useful! And thanks for pointing out the typo. I’ve updated the article with a fix.

#4 by Dimitar on September 9th, 2021 · Reply

Thank you so much for this series. I’m a (fairly decent) C# developer with intermediate knowledge of Unity. This is so easy to read, it’s exactly on par with my skill level, it’s literally perfect. Thank you again!

#5 by Mathias Sønderskov on January 1st, 2023 · Reply

Still slightly unsure when to use rvalue instead of the value itself? What problem does it solve?

Love the guide!

#6 by jackson on January 10th, 2023 · Reply

Perhaps the main use of rvalues is to indicate that the object being referenced is effectively unowned. This enables “move semantics” which can be much cheaper than making copies. For example, a common idiom is for a constructor to take an object that’s expensive to copy by value and then avoid a second copy by the use of rvalue references:

class Player
{
public:
    Player(std::string name)
        : name(std::move(name))
    {
    }
private:
    std::string name;
};

The call to std::move just typecasts the lvalue reference (std::string&) to an rvalue reference (std::string&&). This means the std::string constructor taking a std::string&& will be called and that constructor can simply copy the pointer to the string’s characters rather than allocating new memory and copying every character. If we didn’t use rvalue references (via std::move) then the std::string copy constructor taking a const std::string& would have been called and it wouldn’t be able to safely make this assumption, resulting in an unnecessary and expensive allocation and copy.

#7 by Yahorie on March 5th, 2024 · Reply

Hello! Really appreciate your work, these articles are helping me a lot!

There’s a little ambiguity in this part, because the name of the function and the name of a reference in an example above is the same. It’s been hard to spot for me, so I decided to let you know.

C++ For C# Developers: Part 8 – References
