C++ For C# Developers: Part 8 – References
The series continues today by picking up where we left off with pointers. We’ll discuss a popularly-used alternative in C++: references. These are quite different from the various concepts of references in C#!
Table of Contents
- Part 1: Introduction
- Part 2: Primitive Types and Literals
- Part 3: Variables and Initialization
- Part 4: Functions
- Part 5: Build Model
- Part 6: Control Flow
- Part 7: Pointers, Arrays, and Strings
- Part 8: References
- Part 9: Enumerations
- Part 10: Struct Basics
- Part 11: Struct Functions
- Part 12: Constructors and Destructors
- Part 13: Initialization
- Part 14: Inheritance
- Part 15: Struct and Class Permissions
- Part 16: Struct and Class Wrapup
- Part 17: Namespaces
- Part 18: Exceptions
- Part 19: Dynamic Allocation
- Part 20: Implicit Type Conversion
- Part 21: Casting and RTTI
- Part 22: Lambdas
- Part 23: Compile-Time Programming
- Part 24: Preprocessor
- Part 25: Intro to Templates
- Part 26: Template Parameters
- Part 27: Template Deduction and Specialization
- Part 28: Variadic Templates
- Part 29: Template Constraints
- Part 30: Type Aliases
- Part 31: Deconstructing and Attributes
- Part 32: Thread-Local Storage and Volatile
- Part 33: Alignment, Assembly, and Language Linkage
- Part 34: Fold Expressions and Elaborated Type Specifiers
- Part 35: Modules, The New Build Model
- Part 36: Coroutines
- Part 37: Missing Language Features
- Part 38: C Standard Library
- Part 39: Language Support Library
- Part 40: Utilities Library
- Part 41: System Integration Library
- Part 42: Numbers Library
- Part 43: Threading Library
- Part 44: Strings Library
- Part 45: Array Containers Library
- Part 46: Other Containers Library
- Part 47: Containers Library Wrapup
- Part 48: Algorithms Library
- Part 49: Ranges and Parallel Algorithms
- Part 50: I/O Library
- Part 51: Missing Library Features
- Part 52: Idioms and Best Practices
- Part 53: Conclusion
Pointers
As we saw last week, there is a lot of flexibility in pointers and their closely-associated arrays and strings. Usually, it’s a lot more flexibility than we really want. In the vast majority of cases, we simply want a pointer to refer to a variable. We don’t want that variable to be null, we don’t intend to perform arithmetic on the pointer, and we don’t want to index into it like an array. Consider a function declaration like this:
int GetTotalPoints(Player*);
This makes the reader ask themselves questions like “can the Player pointer be null?” The reader might also wonder “is this a single Player or an array of them?” and “if this is an array, how long can it be?” The answers really depend on the implementation of GetTotalPoints, but we don’t want readers to have to guess or spend their time tracking down and reading the function definition. The function definition might not even be available, such as with a closed-source library.
Lvalue References
To address these issues, C++ introduces “references” as an alternative to pointers. A reference is like an alias to something, usually backed with a pointer in the compiled code. Here’s how one looks:
int x = 123;
int& r = x; // <-- reference
DebugLog(x, r); // 123, 123
There are several critical aspects of this. First, the syntax for a reference is similar to a pointer except that we add a & instead of a * to the type we want to refer to: int in this case. We can read the resulting int& r as “r is a reference to an int.”
Second, we must initialize the reference when it’s declared. We can’t simply write int& r; or we’ll get a compiler error. This helps avoid undefined behavior since we can’t possibly read or write an uninitialized reference.
Third, the thing we initialize the reference to must be a valid “lvalue.” This is generally thought of as “something with a name.” It includes variables and functions. It also means that a reference can never be null since everything with a name has a non-null memory address in C++.
Fourth, we don’t initialize to &x like we’d do with a pointer and we don’t dereference the reference with *r. We simply use it as an alias. Any mention of r is just as if we had mentioned x. References are aliases, not objects. A pointer is distinct from what it points to and can be manipulated independently, but a reference cannot. This means there’s no re-assignment of a reference because we can’t actually refer to the reference that way:
int x = 123;
int y = 456;
int& r = x;

// This is equivalent to:
//   x = y;
// y is read and written to x
// r remains an alias of x
r = y;
DebugLog(x, r); // 456, 456
This is usually easier to reason about since the reference, unlike a pointer, can never change what it refers to as the program runs. We can, however, make a second reference by assigning the first reference to it:
int x = 123;

// Alias to x
int& r1 = x;

// This is equivalent to:
//   int& r2 = x;
// So this is also an alias to x
int& r2 = r1;

DebugLog(r1, r2); // 123, 123
x = 456;
DebugLog(r1, r2); // 456, 456
Because a reference isn’t a distinct object, there’s no such thing as a reference to a reference, pointer to a reference, or array of references.
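To make those three restrictions concrete, here is a minimal sketch. The function names are hypothetical and exist only for illustration, and the declarations the compiler rejects are left commented out so that the rest compiles.

```cpp
#include <type_traits>

// Taking the address of a reference yields the address of what it aliases.
// The result is an ordinary int*, never a "pointer to a reference."
bool AddressOfReferenceIsAddressOfTarget()
{
    int x = 123;
    int& r = x;
    int* p = &r;
    return p == &x;
}

// Initializing a reference from another reference just makes a second alias.
// The new reference has type int&, not "reference to a reference."
bool ReferenceFromReferenceIsPlainReference()
{
    int x = 123;
    int& r1 = x;
    int& r2 = r1;
    static_assert(std::is_same<decltype(r2), int&>::value, "r2 is an int&");
    r2 = 456; // Writes through to x
    return x == 456 && &r2 == &x;
}

// Each of these declarations is rejected by the compiler:
//   int& & refToRef = r1; // Reference to a reference (note the space)
//   int&* ptrToRef;       // Pointer to a reference
//   int& refs[3];         // Array of references
```

Both functions return true, which shows that every operation on a reference is really an operation on the thing it aliases.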
Here are three alternate ways to initialize a reference:
int& r(x);
int& r = {x};
int& r{x};
References may also be initialized by passing an argument, using two of the above forms:
void AddOne(int& val)
{
    val += 1;
}

int x = 1;
AddOne(x);
DebugLog(x); // 2
AddOne({x});
DebugLog(x); // 3
Likewise, returning a reference also initializes it:
int nextId = 0;

int& GetNextId()
{
    nextId++;
    return nextId;
}

int& id = GetNextId();
DebugLog(id); // 1

id = 0; // Reset
DebugLog(nextId); // 0
Now let’s see a reference to a function. These look just like pointers to functions, except that there’s a & instead of a *:
// Reference to a function that takes an int and returns a bool
bool (&r)(int) = MyFunc;
We can use them like this:
// Function to find the index according to some matching function
int FindIndex(int array[5], bool (&matcher)(int))
{
    for (int i = 0; i < 5; ++i)
    {
        if (matcher(array[i]))
        {
            return i;
        }
    }
    return -1;
}

bool IsEven(int val)
{
    return (val & 1) == 0;
}

// Make a reference to our matching function
bool (&isEven)(int) = IsEven;

int array[5] = { 1, 2, 3, 4, 5 };
int index = FindIndex(array, isEven); // Pass reference, not function
DebugLog(index);
Because we can initialize a reference by passing an argument, there really isn’t a need to explicitly make isEven a local reference. Instead, we could do this:
// Passing the name of the function initializes the matcher reference argument
int index = FindIndex(array, IsEven);
A local reference is more useful when we don’t know what we want to reference at compile time and we want to use that runtime choice over and over:
// Decide what to alias at runtime
bool (&matcher)(int) = userWantsEvens ? IsEven : IsOdd;

// Use the result of that decision over and over
int index1 = FindIndex(array1, matcher);
int index2 = FindIndex(array2, matcher);
bool foundInBothArrays = index1 >= 0 && index2 >= 0;
Here’s a summary of the constraints that references impose compared to pointers:
- Must be initialized when declared
- Can’t be indexed into to offset a memory address
- Not subject to pointer arithmetic
- No references to references
- No pointers to references
- No arrays of references
- Can’t be null
- Can’t change what it aliases
That seems like a lot of lost flexibility and a lot more rules to live by, but it turns out that satisfying all of these constraints is extremely common. Aside from the last three, these are mostly the constraints that C# references impose on us and they’ve turned out to be quite practical. In practice, C++ references are very heavily used to succinctly convey all of these constraints to readers. Let’s look once more at the function we started with, now using a reference:
int GetTotalPoints(Player&);
It’s now clear that the Player can’t be null because that’s not possible with references. It’s clear that this isn’t an array of Player objects, because that’s also not possible. The & instead of * means that it’s simply an alias for one non-null Player object.
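As a rough sketch of how the two signatures differ for whoever implements them, consider the following. The Player fields, the scoring formula, and the name GetTotalPointsFromPointer are all assumptions made up for illustration, not part of any real API.

```cpp
// Hypothetical Player type; the fields are assumptions for illustration
struct Player
{
    int Kills;
    int Assists;
};

// Pointer version: the implementation has to decide what null means
int GetTotalPointsFromPointer(Player* player)
{
    if (player == nullptr)
    {
        // One possible policy; callers can't tell this from the signature
        return 0;
    }
    return player->Kills * 10 + player->Assists * 5;
}

// Reference version: there is always exactly one Player to read
int GetTotalPoints(Player& player)
{
    return player.Kills * 10 + player.Assists * 5;
}
```

A caller writes GetTotalPoints(player) for the reference version and GetTotalPointsFromPointer(&player) for the pointer version. Only the second leaves room for questions about null or arrays.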
Rvalue References
So far we’ve seen how references can make an alias for an “lvalue,” which is something with a name. We can also make references to things without a name. These references to “rvalues” were introduced in C++11 and are used quite extensively now.
An rvalue reference has two & after the type it references and is initialized with something that doesn’t have a name:
int&& r = 5;
The literal 5 doesn’t have a name like a variable does. Still, we can reference it and its lifetime is extended to the lifetime of the reference so that the reference never refers to something that no longer exists. It works like this:
{
    // 5 is the rvalue
    // It's not just a temporary on this line
    // Its lifetime is extended to match r
    int&& r = 5;

    // 123 is the rvalue, but it's just written to x
    // 123 stops existing after the semicolon
    int x = 123;

    // Both the rvalue reference and the variable are still readable
    DebugLog(r, x); // 5, 123

    // The temporary that r refers to is still accessible via the alias
    r = 6;
    DebugLog(r, x); // 6, 123

    // Don't worry, we didn't overwrite the fundamental concept of 5 :)
    DebugLog(5); // 5

    // The scope that r is in ends
    // r and 5 end their lifetime
    // They can no longer be used
}
Lifetime extension is much more important with structs and classes than with primitives like int, but the same rules apply. We’ll go much more into structs and classes later in the series.
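As a small preview of why this matters for structs, here is a sketch using a hypothetical Tracker type that counts how many of its instances are alive. It relies on constructors and destructors, which the series covers later, so treat it as an illustration under that assumption rather than a full explanation.

```cpp
// Hypothetical struct that counts how many instances currently exist
struct Tracker
{
    static int Alive;
    Tracker() { ++Alive; }
    ~Tracker() { --Alive; }
};
int Tracker::Alive = 0;

// An unbound temporary is destroyed at the end of its statement
int CountAfterUnboundTemporary()
{
    Tracker{};
    return Tracker::Alive; // No Tracker exists anymore
}

// A temporary bound to an rvalue reference lives as long as the reference
int CountWhileBoundToReference()
{
    Tracker&& t = Tracker{};
    (void)t; // Silence unused-variable warnings
    return Tracker::Alive; // The temporary is still alive here
}
```

The first function returns 0 and the second returns 1. Once the second function returns, the reference goes out of scope and the temporary is destroyed with it.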
The same alternate initialization forms are allowed with rvalue references:
int&& r(5);
int&& r = {5};
int&& r{5};
We can also initialize with function arguments:
void PrintRange(int&& from, int&& to)
{
    for (int i = from; i <= to; ++i)
    {
        DebugLog(i);
    }
}

PrintRange(1, 3); // 1, 2, 3
Return values can also initialize rvalue references, but these will become “dangling” references when returning a temporary because its lifetime is not extended past the end of the function call:
Player&& MakePlayer(int id, int health)
{
    // Create a temporary Player
    // Alias it to an rvalue reference
    // Return that alias
    return { id, health };
}

// The returned rvalue reference is "dangling"
// It refers to a temporary Player that no longer exists
// It must not be used or undefined behavior will happen
Player&& player = MakePlayer(123, 100);

// We'll get garbage when we read from it
DebugLog(player.Id, player.Health); // 17823804, 12850082
It’s important to keep this in mind and only return rvalue references whose lifetime is already going to extend beyond the end of the function call. We’ll see some techniques for doing this later on in the series.
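In the meantime, one common way to sidestep the problem is to return the Player by value instead of by rvalue reference. This is offered as a sketch and not necessarily the technique the series presents later. The Player struct and SumOfFields function here are hypothetical stand-ins.

```cpp
// Hypothetical Player type with the two fields used in the example above
struct Player
{
    int Id;
    int Health;
};

// Returns the Player itself rather than a reference to a temporary
Player MakePlayer(int id, int health)
{
    return { id, health };
}

int SumOfFields()
{
    // The returned temporary is bound directly to the reference,
    // so its lifetime is extended to match player
    Player&& player = MakePlayer(123, 100);
    return player.Id + player.Health;
}
```

Because the function now returns an object rather than an alias, the usual lifetime extension applies at the call site and SumOfFields reads valid data.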
The same constraints that apply to lvalue references apply to rvalue references:
- Must be initialized when declared
- Can’t be indexed into to offset a memory address
- Not subject to pointer arithmetic
- No references to references
- No pointers to references
- No arrays of references
- Can’t be null
- Can’t change what it aliases
Additionally, despite the naming similarity, lvalue references are different types than rvalue references. For example, consider trying to call the above PrintRange function with lvalues:
int from = 1;
int to = 3;

// Compiler error
// Can't pass int& when int&& is required
PrintRange(from, to);
No other kind of initialization of an rvalue reference is possible with an lvalue, even something as simple as this:
int x = 123;

// Compiler error
// x is an lvalue when int&& requires an rvalue
int&& r = x;
We can, however, assign an rvalue reference to an lvalue reference when that rvalue reference has a name:
// Compiler error
// 123 is an rvalue when int& requires an lvalue
int& error = 123;

int&& rr = 123;
int& lr = rr; // rr has a name, so it's an lvalue
DebugLog(rr, lr); // 123, 123
rr = 456;
DebugLog(rr, lr); // 456, 456
The opposite doesn’t work when the lvalue reference has a name, because that makes it not an rvalue:
int x = 123;
int& lr = x;

// Compiler error
// lr is an lvalue when int&& requires an rvalue
int&& rr = lr;
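Since int& and int&& are different types, a function can be overloaded on them, which gives us a way to see these rules in action. The following sketch uses a hypothetical Describe function that reports which overload the compiler picked.

```cpp
#include <string>

// Chosen when the argument is an lvalue
std::string Describe(int&)
{
    return "lvalue";
}

// Chosen when the argument is an rvalue
std::string Describe(int&&)
{
    return "rvalue";
}

std::string DescribeVariable()
{
    int x = 123;
    return Describe(x); // x has a name
}

std::string DescribeLiteral()
{
    return Describe(123); // 123 has no name
}

std::string DescribeNamedRvalueReference()
{
    int&& rr = 123;
    return Describe(rr); // rr has a name, so it's an lvalue
}
```

DescribeVariable and DescribeNamedRvalueReference both return "lvalue" while DescribeLiteral returns "rvalue". What matters is whether the expression being passed has a name, not how the variable was declared.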
C# References
C# has several types of references. Let’s compare them with C++ references.
First, there’s the ref keyword used to pass function arguments “by reference.” This is pretty close to a C++ lvalue reference as the argument must be an lvalue and acts like an alias for the variable that was passed. There are some differences though. First, C++ uses & instead of ref in the function signature and doesn’t require the ref keyword when calling the function. Second, C# ref arguments can only be references to variables, not functions.
The out and in argument modifiers are also described as enabling pass-by-reference functionality in C#. Arguments marked with out are also like C++ lvalue references with the additional requirement that they must be written to at least once by the function. There isn’t a direct correspondence for this in C++ as the language tends to shy away from requiring at compile time that the write will be done, as is also the case with variable initialization. On the other hand, in arguments are essentially the same as a const lvalue reference in C++. We’ll cover const more in depth later, but for now it can be thought of as an enhanced version of readonly in C#.
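Here is a rough sketch of how these two modifiers might translate. TryParseDigit, Vector2, and Dot are hypothetical names chosen for illustration.

```cpp
// Roughly like a C# 'out' parameter: the result is written via the reference.
// Unlike C#, the compiler doesn't require every code path to write to it.
bool TryParseDigit(char c, int& result)
{
    if (c < '0' || c > '9')
    {
        return false; // result is left untouched
    }
    result = c - '0';
    return true;
}

// Hypothetical two-component vector
struct Vector2
{
    float X;
    float Y;
};

// Roughly like C# 'in' parameters: the arguments are aliased, not copied,
// and can only be read
float Dot(const Vector2& a, const Vector2& b)
{
    // a.X = 0; // Compiler error: a is read-only here
    return a.X * b.X + a.Y * b.Y;
}
```

Note that the failure path of TryParseDigit never writes to result. C# would reject the equivalent out code, but C++ accepts it without complaint.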
Second, there are ref return values and ref local variables. These are also similar to C++ lvalue references since they create an alias to an lvalue. C++ uses the same & syntax instead of ref in both the function signature for ref returns and the variable declaration for local variables. C# also requires ref at the return statement, but C++ doesn’t.
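A minimal sketch of both features together might look like the following. Larger and ResetLarger are hypothetical functions invented for this example.

```cpp
// Like a C# 'ref' return: hands back an alias to one of the arguments
int& Larger(int& a, int& b)
{
    // No 'ref' keyword is needed at the return statement
    return a > b ? a : b;
}

// Zeroes out whichever of the two variables holds the larger value
void ResetLarger(int& x, int& y)
{
    // Like a C# 'ref' local: larger aliases either x or y
    int& larger = Larger(x, y);
    larger = 0;
}
```

Calling ResetLarger with variables holding 10 and 20 leaves the first at 10 and sets the second to 0, since the local reference aliases the caller's variable and not a copy.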
Third, there are ref and readonly ref structs in C# to force allocating them on the stack by enforcing various restrictions. This meaning of “reference” has no correlation to either lvalue or rvalue references in C++.
Fourth, and finally, there are reference types such as classes, interfaces, delegates, dynamic objects, the object type, and strings. All of these are “managed” types subject to garbage collection. As C++ has no “managed” types or garbage collection, there are also no reference types. Instead, references can be made to any type in C++.
The meaning of those references in C++ is different to that of C# references, though. In C#, they are somewhere in between pointers and C++ references. They’re like pointers in that they are an object, as opposed to an alias. They can be null and they can be reassigned. They’re like references in that no pointer arithmetic is allowed and they can’t be indexed into like an array to offset a memory address.
Another major difference is that managed C# types are subject to garbage collection when there are no more references to them. This implies some behind-the-scenes tracking mechanism to know whether there are any references still available. This is very complicated, sometimes expensive, code that must be thread-safe and deal with esoteric edge cases. C++ references have no such tracking and do not imply any grand resource-management scheme. Besides lifetime extension of rvalue references, which is usually rather brief, there’s no attempt to globally manage all references for any purpose, including deallocation.
Conclusion
C++ references are similar to C++ pointers, C# pointers, and various kinds of C# references, but different in many ways from all of them. Its lvalue references are a unique way of referencing variables as well as functions. Its rvalue references are especially strange as none of these similar concepts offers anything close to the same functionality. As we go on through the series, we’ll see the growing importance and common usage of both kinds of references in many other areas of the language and its Standard Library.
#1 by Ökehan on July 6th, 2020 ·
Definitely, waiting for the next one. Thank you.
#2 by Domen on September 10th, 2020 ·
Just a typo:
The meaning of those references in C# is different to that of C# references, though.
Thanks for this articles :D
Moving from a Unity C# project to a C++ project and it really helps to ready this kind of materials.
#3 by jackson on September 10th, 2020 ·
Glad to hear you’ve found the articles useful! And thanks for pointing out the typo. I’ve updated the article with a fix.
#4 by Dimitar on September 9th, 2021 ·
Thank you so much for this series. I’m a (fairly decent) C# developer with intermediate knowledge of Unity. This is so easy to read, it’s exactly on par with my skill level, it’s literally perfect. Thank you again!
#5 by Mathias Sønderskov on January 1st, 2023 ·
Still slightly unsure when to use rvalue instead of the value itself? What problem does it solve?
Love the guide!
#6 by jackson on January 10th, 2023 ·
Perhaps the main use of rvalues is to indicate that the object being referenced is effectively unowned. This enables “move semantics” which can be much cheaper than making copies. For example, a common idiom is for a constructor to take an object that’s expensive to copy by value and then avoid a second copy by the use of rvalue references:
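A minimal sketch of the idiom being described, assuming a hypothetical Player struct with a Name field (not necessarily the code from the original comment):

```cpp
#include <string>
#include <utility>

// Hypothetical type used only to illustrate the idiom
struct Player
{
    std::string Name;

    // Take the string by value, which may cost one copy at the call site,
    // then move it into the field to avoid a second copy
    Player(std::string name)
        : Name(std::move(name))
    {
    }
};
```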
The call to std::move just typecasts the lvalue reference (std::string&) to an rvalue reference (std::string&&). This means the std::string constructor taking a std::string&& will be called and that constructor can simply copy the pointer to the string’s characters rather than allocating new memory and copying every character. If we didn’t use rvalue references (via std::move) then the std::string constructor taking a std::string& would have been called and it wouldn’t be able to safely make this assumption, resulting in an unnecessary and expensive allocation and copy.
#7 by Yahorie on March 5th, 2024 ·
Hello! Really appreciate your work, these articles are helping me a lot!
There’s a little ambiguity in this part, because the name of the function and the name of a reference in an example above is the same. It’s been hard to spot for me, so I decided to let you know.