Searching for a Non-Nullable Reference Type
The inventor of null references has called them his billion dollar mistake. We’ve all felt that pain so many times. After a while it can seem like null references are inevitable. They’re just a built-in sharp edge we have to carefully avoid cutting ourselves on. But is this true? Is there some way we can avoid the possibility of a null reference in the first place? Today we’ll go searching for such a mythical type.
Let’s write a very simple function in C#:
```csharp
void DoDamage(Weapon weapon, Player target)
{
    target.Health -= weapon.Damage;
}
```
Because `Weapon` and `Player` are references (e.g. to classes), they are nullable. This means there are four permutations of null states when this function executes:
1. Neither `Weapon` nor `Player` are null
2. `Weapon` is null but `Player` isn’t
3. `Weapon` isn’t null but `Player` is
4. Both `Weapon` and `Player` are null
Case 1 is the only one that actually works, but the compiler will happily compile code that triggers the other three cases. The runtime will go right ahead and execute this code. The result will probably be a crash since recovery to a valid program state from a `NullReferenceException` falling through the call stack is nigh impossible.
Callers of `DoDamage` are unlikely to directly pass `null` for one or both of the arguments since it’s pretty obvious that these are not optional parameters. It should be rare to see calls like this one, which is the kind of problem that features like nullable reference types can detect at compile time:
```csharp
DoDamage(null, null);
```
Much more likely is that variables will be passed in:
```csharp
DoDamage(weapon, target);
```
These variables are also references and so they are nullable, too. Where did they come from? It’s rare for them to be created right beforehand like this:
```csharp
Weapon weapon = new Weapon(10);
Player target = new Player(100);
DoDamage(weapon, target);
```
Again, it’s much more likely that these variables are themselves arguments just being passed along to `DoDamage`. It’s also common for them to be retrieved from the game state, such as in a `LocalPlayer` or `CurrentWeapon` field. This means these are also variables and are just as nullable as the parameters to `DoDamage`.
What’s happening here is that the nullability of these references is extended through the call stack and the game state. When we get a crash report that indicates `DoDamage` dereferenced null, we usually have a major undertaking on our hands. We need to search not only the call stack but also the game state. How did the game state come to have null variables? That may entail a search back in time to several frames or several minutes of gameplay ago. Hours or even days of productivity can be lost tracking down issues like this. It is not a small problem.
The usual impulse is to put a bandage on the problem by adding a null check where the null dereference happens. There are a few usual ways this is done. Let’s start with the “do nothing” approach:
```csharp
void DoDamage(Weapon weapon, Player target)
{
    if (weapon == null || target == null)
    {
        return;
    }
    target.Health -= weapon.Damage;
}
```
In this approach we check both parameters for null and simply do nothing if either one is. This prevents the crash but still has serious problems:
- Passing null is a bug that is now silently ignored and will now be even harder to fix when players start reporting that their attacks don’t do any damage
- Any logic predicated on the function actually reducing the target’s health is now invalid and may cause a cascade of bugs
- The function is slower because it has to do null checks
- The author of the function has to spend their time writing code to check for null
- The function is less readable since it now includes code besides the function’s core purpose
- There’s usually no good option if the function has to produce an output, such as a function that “gets” or “finds” something
Another approach is to throw an exception when null is detected:
```csharp
void DoDamage(Weapon weapon, Player target)
{
    if (weapon == null)
    {
        throw new ArgumentException("weapon can't be null");
    }
    if (target == null)
    {
        throw new ArgumentException("target can't be null");
    }
    target.Health -= weapon.Damage;
}
```
All we’re really doing here is renaming `NullReferenceException` to `ArgumentException`. Like the “do nothing” approach, we have to write more code and the function is noisier now.
Sometimes asserts are employed in an attempt to improve performance:
```csharp
void DoDamage(Weapon weapon, Player target)
{
    Assert.IsNotNull(weapon, "weapon can't be null");
    Assert.IsNotNull(target, "target can't be null");
    target.Health -= weapon.Damage;
}
```
The assertions will be stripped out of production builds, leaving the original version of the function. That is to say that the runtime will still perform the null checks so it can throw a `NullReferenceException`. We would need to additionally add an attribute to the function to disable these:
```csharp
[Il2CppSetOption(Option.NullChecks, false)]
void DoDamage(Weapon weapon, Player target)
{
    Assert.IsNotNull(weapon, "weapon can't be null");
    Assert.IsNotNull(target, "target can't be null");
    target.Health -= weapon.Damage;
}
```
The assertions will remain in development builds, leaving the equivalent of the version where we threw an `ArgumentException`. This version can be faster with the `Il2CppSetOption` attribute, but it doesn’t make null arguments an impossibility. Dereferencing null will no longer throw an exception but instead directly crash the program.
One final approach is to add comments explaining that callers shouldn’t pass null:
```csharp
/// <param name="weapon">Must not be null</param>
/// <param name="target">Must not be null</param>
void DoDamage(Weapon weapon, Player target)
{
    target.Health -= weapon.Damage;
}
```
There’s no longer any performance penalty or clutter in the function body, but there’s still a need to write comments so the burden’s not completely gone. Much more importantly, there’s a burden on the caller to read these comments. That is very likely to not happen in the real world, regardless of how many exclamation points and ALL CAPS WARNINGS we use. The compiler certainly doesn’t enforce our comments.
These four approaches each have their trade-offs, but none of them solve the core problem: the function accepts null arguments even though that will cause a crash. Can we instead solve the problem by making it impossible to pass null arguments?
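C# provides an alternative reference type: pointers. Instead of managed `Weapon` and `Player` references, we could take unmanaged `Weapon*` and `Player*` pointers. This assumes `Weapon` and `Player` are unmanaged structs and that the code is in an `unsafe` context, since C# doesn’t allow pointers to classes: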
```csharp
unsafe void DoDamage(Weapon* weapon, Player* target)
{
    target->Health -= weapon->Damage;
}
```
This doesn’t help at all because pointers are also nullable. Callers may still pass `null` and our function will still crash when dereferencing those arguments.
That’s all we have for built-in reference types in C#. So let’s try to build our own type that isn’t nullable. Here’s an attempt using a struct:
```csharp
public struct NotNull<T> where T : class
{
    private readonly T m_Val;

    public NotNull(T val)
    {
        if (val == null)
        {
            throw new ArgumentException("value can't be null");
        }
        m_Val = val;
    }

    public T Value
    {
        get
        {
            if (m_Val == null)
            {
                throw new InvalidOperationException("value can't be null");
            }
            return m_Val;
        }
    }
}
```
`NotNull` performs the null checks explicitly rather than leaving them up to the runtime. This allows us to choose the approach we want. We could use the exception approach, as shown, or go with assertions, comments, or doing nothing. Just as none of those approaches solved the problem, neither has `NotNull`.
Note that we need to check for null in `get Value` because we can’t define a default constructor for structs. Users can easily create a `NotNull` with the default (`null`!) value of `m_Val`. Here are a few ways they can do that:
```csharp
NotNull<Weapon> nn1 = new NotNull<Weapon>();
NotNull<Weapon> nn2 = default;
NotNull<Weapon> nn3 = default(NotNull<Weapon>);
NotNull<Weapon>[] nn4 = new NotNull<Weapon>[1]; // nn4[0].m_Val is null
```
So what if we use a class instead of a struct? Classes allow us to prohibit the default constructor, so let’s try that:
```csharp
public class NotNull<T> where T : class
{
    private readonly T m_Val;

    public NotNull(T val)
    {
        if (val == null)
        {
            throw new ArgumentException("value can't be null");
        }
        m_Val = val;
    }

    public T Value
    {
        get { return m_Val; }
    }
}
```
We no longer need to check for null in `get Value` because our constructor can guarantee that it’s never null. Let’s see what happens if we try the same tricks anyway:
```csharp
NotNull<Weapon> nn1 = new NotNull<Weapon>(); // compiler error
NotNull<Weapon> nn2 = default; // nn2 is null
NotNull<Weapon> nn3 = default(NotNull<Weapon>); // nn3 is null
NotNull<Weapon>[] nn4 = new NotNull<Weapon>[1]; // nn4[0] is null
```
Here we see that we’ve just moved the problem around. While the `NotNull` can no longer hold a null value, the `NotNull` itself can be null! Trying to call `get Value` will crash just like trying to use a null value. Our `DoDamage` function is no better off:
```csharp
void DoDamage(NotNull<Weapon> weapon, NotNull<Player> target)
{
    // Crashes if either 'weapon' or 'target' is null
    target.Value.Health -= weapon.Value.Damage;
}
```
C# is out of language tools at this point, so let’s switch gears to that other major game programming language: C++. What options do we have for engines like Unreal that use C++?
There are two main language options: pointers and references. Pointers are just like in C#, so we won’t discuss them again. References, however, behave totally differently from references in C#. A C++ reference is an alias for another variable, not a variable itself. It must be initialized when created and can never be changed afterward. Here’s an example:
```cpp
int i = 123;
int& r = i; // r is an alias for i
r = 456; // equivalent to "i = 456"
DebugLog(i); // 456
DebugLog(r); // equivalent to "DebugLog(i)": 456
```
Once we’re thinking about it as an alias, it makes sense that it can’t be re-assigned:
```cpp
int i = 123;
int& r = i;
int i2 = 456;
r = i2; // equivalent to "i = i2"
int& r2 = i2;
r = r2; // equivalent to "i = i2"
DebugLog(i); // 456
DebugLog(i2); // 456
```
It can never be null because it must alias a variable:
```cpp
int& r1; // compiler error: must be initialized
int& r2 = nullptr; // compiler error: nullptr is not a variable
int& r3 = (int*)0; // compiler error: 0 is not a variable
```
So let’s use one of these for our function:
```cpp
void DoDamage(Weapon& weapon, Player& target)
{
    target.Health -= weapon.Damage;
}
```
It is now impossible for this function to dereference null and crash. That’s because we can’t create a null reference to pass as an argument. Further, we’ve put callers on notice that null is not acceptable because C++ programmers know that references can’t be null. We’ve stated unambiguously that the function requires an actual `Weapon` and an actual `Player` in order to do its job.
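We can ask the compiler to confirm this. The following is a minimal sketch, assuming C++17 and hypothetical one-field definitions of `Weapon` and `Player` (the article doesn’t show the real ones). The `HealthAfterHit` helper is also made up purely for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical minimal types; the real definitions aren't shown in the article
struct Weapon { int Damage; };
struct Player { int Health; };

void DoDamage(Weapon& weapon, Player& target)
{
    target.Health -= weapon.Damage;
}

// The function can be called with actual objects...
static_assert(std::is_invocable_v<decltype(&DoDamage), Weapon&, Player&>);
// ...but not with nullptr or with (nullable) pointers
static_assert(!std::is_invocable_v<decltype(&DoDamage), std::nullptr_t, std::nullptr_t>);
static_assert(!std::is_invocable_v<decltype(&DoDamage), Weapon*, Player*>);

// Hypothetical usage: returns the target's health after one hit
int HealthAfterHit(int damage, int health)
{
    Weapon weapon{damage};
    Player target{health};
    DoDamage(weapon, target);
    return target.Health;
}
```

If any of these `static_assert` lines were false, the program would fail to compile rather than fail at runtime.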
Of course there’s always a caveat to these things. Just like how `private` fields aren’t really private in C# because you could use reflection to reach in and set them, C++ has its own ways to violate compile-time guarantees. One such caveat is that references can “dangle” by referring to an object whose lifetime has ended. Using such a reference is “undefined behavior” and may lead to severe errors such as data corruption or crashes. Here’s how that might look:
```cpp
Weapon& GetWeapon(int damage)
{
    Weapon weapon{damage};
    return weapon; // returns reference to local variable
}

// Refers to local variable in function that's been deallocated by
// popping the Weapon off the stack when GetWeapon returned
Weapon& weapon = GetWeapon(10);
```
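For contrast, here are two common ways to avoid the dangling reference. This is only a sketch: `Inventory`, `MakeWeapon`, `GetEquipped`, and `DanglingFreeDemo` are hypothetical names that don’t come from any real codebase.

```cpp
#include <cassert>

// Hypothetical minimal types for illustration
struct Weapon { int Damage; };
struct Inventory { Weapon Equipped; };

// Option 1: return by value so the caller owns its own Weapon
Weapon MakeWeapon(int damage)
{
    return Weapon{damage};
}

// Option 2: return a reference into an object the caller passed in.
// The reference stays valid for as long as 'inventory' is alive.
Weapon& GetEquipped(Inventory& inventory)
{
    return inventory.Equipped;
}

int DanglingFreeDemo()
{
    Weapon made = MakeWeapon(10);
    Inventory inventory{Weapon{25}};
    Weapon& equipped = GetEquipped(inventory);
    equipped.Damage += made.Damage; // writes to inventory.Equipped via the alias
    return inventory.Equipped.Damage;
}
```

In both cases the object being referred to outlives every use of it, which is the property the `GetWeapon` example lacks.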
Caveats aside, using references like this is extremely common in C++. It turns out that it’s actually quite rare for null to be an acceptable value, so references are almost always the better fit. Still, their nature as aliases can be a bit cumbersome to work with sometimes. For example, we can’t create an array of references because array elements must be variables and not aliases.
This gives rise to a couple of alternatives. First, the Standard Library includes `std::reference_wrapper<T>`. This is a class that contains a reference to a `T`, i.e. a `T&`. Class instances are variables, so they can be used where references can’t. We can’t have an array of `int&` but we can have an array of `reference_wrapper<int>`:
```cpp
int i = 123;
std::reference_wrapper<int> wrapper{i};
std::reference_wrapper<int> a[1] = {wrapper};
DebugLog(a[0]); // 123
```
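The same idea extends to containers. Here’s a small sketch, assuming a hypothetical one-field `Player` and made-up `DamageAll` and `RebindDemo` helpers, that stores wrappers in a `std::vector` and then points one of them at a different object:

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical minimal type for illustration
struct Player { int Health; };

// Hypothetical helper: damages every target in the list
void DamageAll(const std::vector<std::reference_wrapper<Player>>& targets, int damage)
{
    for (Player& target : targets) // each wrapper converts implicitly to Player&
    {
        target.Health -= damage;
    }
}

std::pair<int, int> RebindDemo()
{
    Player a{100};
    Player b{50};
    std::vector<std::reference_wrapper<Player>> targets{a, b};
    DamageAll(targets, 10); // a.Health is 90, b.Health is 40

    // Unlike a plain reference, a wrapper can be made to refer to another object
    targets[0] = std::ref(b);
    targets[0].get().Health -= 5; // b.Health is 35, a is untouched

    return {a.Health, b.Health};
}
```

Note that assigning to the wrapper changes what it refers to, whereas assigning to a plain reference writes through to the aliased variable.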
The class also requires a `T&` in its constructor, so there’s no way to create one with null:
```cpp
// compiler error: no default constructor
std::reference_wrapper<int> wrapper1{};
// compiler error: nullptr isn't a variable
std::reference_wrapper<int> wrapper2{nullptr};
// compiler error: 0 isn't a variable
std::reference_wrapper<int> wrapper3{(int*)0};
```
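Since those lines don’t compile, we can’t put them in a real program. One way to state the same facts in code that does compile is with type traits. This is a sketch assuming C++17; the `IntRef` alias and `ReadThrough` function are names invented for the example:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <type_traits>

using IntRef = std::reference_wrapper<int>;

// None of the "null-ish" ways of creating a wrapper are available
static_assert(!std::is_default_constructible_v<IntRef>);
static_assert(!std::is_constructible_v<IntRef, std::nullptr_t>);
static_assert(!std::is_constructible_v<IntRef, int*>);
// Temporaries are rejected too, which rules out one source of dangling wrappers
static_assert(!std::is_constructible_v<IntRef, int>);
// An actual int variable is the only thing that works
static_assert(std::is_constructible_v<IntRef, int&>);

// Hypothetical usage: reads the int the wrapper refers to
int ReadThrough(IntRef ref)
{
    return ref.get();
}
```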
Another alternative is `gsl::not_null<T>` from the Guidelines Support Library. This is a class that contains a pointer to a `T`, unlike how `reference_wrapper<T>` contains a reference to a `T`. It’s similar to the `NotNull<T>` we tried to create with C#:
```cpp
int i = 123;
gsl::not_null<int*> p{&i};
DebugLog(*p); // 123
```
It’s able to enforce that its pointer isn’t null in a couple of cases:
```cpp
// compiler error: no default constructor
gsl::not_null<int*> p1{};
// compiler error: no nullptr constructor
gsl::not_null<int*> p2{nullptr};
```
However, in other cases it relies on runtime null checks:
```cpp
gsl::not_null<int*> p3{(int*)0}; // runtime assert
int* n = nullptr;
gsl::not_null<int*> p4{n}; // runtime assert
```
This at least guarantees that the null checks will be performed by the `not_null`: we can’t forget them and they never clutter up our code. C++ has no automatic null-checking, so this is an opt-in feature for it. Similar to references, functions taking a `not_null` as a parameter clearly advertise to callers that null is not acceptable:
```cpp
void DoDamage(gsl::not_null<Weapon*> weapon, gsl::not_null<Player*> target)
{
    target->Health -= weapon->Damage;
}
```
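To see how a type like this can combine compile-time and runtime checks, here’s a heavily simplified stand-in. To be clear, `NotNullPtr` is a hypothetical class written for this example and is not the real GSL implementation. In particular, this sketch throws an exception on null where the real `not_null` reports the violation through GSL’s own contract checking.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <type_traits>

// Hypothetical, simplified stand-in for gsl::not_null<T*> (raw pointers only)
template <typename T>
class NotNullPtr
{
public:
    NotNullPtr() = delete;               // compile time: no default construction
    NotNullPtr(std::nullptr_t) = delete; // compile time: literal nullptr is rejected

    // Any other pointer value can only be checked at runtime
    explicit NotNullPtr(T* ptr) : m_Ptr(ptr)
    {
        if (ptr == nullptr)
        {
            throw std::invalid_argument("pointer can't be null");
        }
    }

    T& operator*() const { return *m_Ptr; }
    T* operator->() const { return m_Ptr; }

private:
    T* m_Ptr;
};

static_assert(!std::is_default_constructible_v<NotNullPtr<int>>);
static_assert(!std::is_constructible_v<NotNullPtr<int>, std::nullptr_t>);

// Returns true if constructing from a null pointer variable is rejected
bool RejectsNullAtRuntime()
{
    int* n = nullptr;
    try
    {
        NotNullPtr<int> p{n};
        (void)p;
    }
    catch (const std::invalid_argument&)
    {
        return true;
    }
    return false;
}
```

The deleted constructors catch the mistakes the compiler can see, and the constructor body catches the rest the first time the object is created rather than later when it’s dereferenced.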
One other advantage of `not_null` over references and `reference_wrapper` is that `not_null` objects can be compared directly. That is, `==` compares the pointers they contain to determine if they’re pointing to the same location in memory. With references and `reference_wrapper`, `==` instead compares the variables being referred to because there’s no way to query the mechanism the compiler ends up using to implement the aliasing. To check identity we would have to take the addresses of the referred-to variables ourselves, such as with `&r1 == &r2`. Comparing the variables may be significantly more expensive than a simple pointer comparison:
```cpp
// Create vectors with a million elements each
std::vector<int> v1{};
std::vector<int> v2{};
v1.resize(1000000);
v2.resize(1000000);

// Compare references to them
// Compares a million elements using overloaded operator== on std::vector<int>
std::vector<int>& r1 = v1;
std::vector<int>& r2 = v2;
DebugLog(r1 == r2);

// Compare not_null objects: compares two pointers
gsl::not_null<std::vector<int>*> p1{&v1};
gsl::not_null<std::vector<int>*> p2{&v2};
DebugLog(p1 == p2);
```
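It’s worth spelling out the difference between comparing values and comparing identity when all we have are references. In this sketch, `SameContents` and `SameObject` are hypothetical helper names. Taking the address of a reference yields the address of the variable it aliases, so `std::addressof` gives us something pointer-like to compare:

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

// Value comparison: may walk every element of both vectors
bool SameContents(const std::vector<int>& a, const std::vector<int>& b)
{
    return a == b;
}

// Identity comparison: compares the addresses of the referred-to objects
bool SameObject(const std::vector<int>& a, const std::vector<int>& b)
{
    return std::addressof(a) == std::addressof(b);
}

bool IdentityDemo()
{
    std::vector<int> v1(3);
    std::vector<int> v2(3);
    std::vector<int>& r1 = v1;
    std::reference_wrapper<std::vector<int>> w1{v1};

    return SameContents(v1, v2)      // equal values...
        && !SameObject(v1, v2)       // ...but different objects
        && SameObject(r1, v1)        // r1 aliases v1
        && SameObject(w1.get(), v1); // w1 refers to v1
}
```

This is an explicit, opt-in step, unlike the pointer comparison that `not_null` gives us with a plain `==`.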
As usual, C++ has a lot more options than C#. Crucially, C++ has two ways (references and `std::reference_wrapper<T>`) to create references that are guaranteed to not be null at compile time. Using these advertises to callers that null is not acceptable. There is also `gsl::not_null<T>` for a more C#-style approach of coping with null rather than preventing it.
C# does not have a solution built into the language to prevent the problem. Both references and pointers are inherently nullable. The language doesn’t provide the tools for us to build our own non-nullable type like `std::reference_wrapper<T>`. It’s far too easy to create a `NotNull` struct that actually contains null. A `NotNull` class can itself be null!
As a result, our C# code is constantly fighting a battle against null. We saw four approaches to deal with this, but they all have serious downsides. This is a rather large problem given the ubiquity of references. Unfortunately, even with the nullable reference types introduced in C# 8 we don’t have anywhere near the kinds of guarantees that C++ provides. That’s a pity given that C++ is such a famously “dangerous” language and especially given the emergence of languages like Rust that provide even better safety guarantees without sacrificing runtime performance or usability.
As a bonus, here’s a table comparing reference types in C# and C++:
Language | Name | Example | Nullable? | Assignable? | Comparable? |
---|---|---|---|---|---|
C# | Reference | `Weapon` | Yes | Yes | Yes |
C# | Pointer | `Weapon*` | Yes | Yes | Yes |
C# | NotNull struct | `NotNull<Weapon>` | Yes | Yes | Yes |
C# | NotNull class | `NotNull<Weapon>` | Yes | Yes | Yes |
C++ | Reference | `Weapon&` | No | No | No |
C++ | Pointer | `Weapon*` | Yes | Yes | Yes |
C++ | std::reference_wrapper | `std::reference_wrapper<Weapon>` | No | Yes | No |
C++ | gsl::not_null | `gsl::not_null<Weapon*>` | No | Yes | Yes |
And here’s a little glossary:
- Nullable?: can the reference be null or refer to null?
- Assignable?: can the reference be changed to refer to another object?
- Comparable?: can two references be checked to see if they refer to the same object? (note: not comparison of the values they refer to)
#1 by Jesse on September 27th, 2021 ·
I missed your weekly articles! Now I’m going to read this one and tell you how great it is when I finish.
#2 by John on September 28th, 2021 ·
For C#, why not just use structs entirely if you want to avoid null references? Also, I believe C# 10 now has parameterless constructors for structs.
#3 by jackson on September 28th, 2021 ·
Structs are value types, not reference types, so they can’t be used to refer to another object.
C# 10 does have parameterless constructors for structs. Unfortunately, this doesn’t solve the problem for a couple of reasons. First, the `default(NotNull<Weapon>)` expression would still skip the constructor. Second, the constructor wouldn’t be able to initialize its reference field to a non-null value since it has no parameters to initialize from. In C++ terms, what’s needed is to `delete` the parameterless constructor like `std::reference_wrapper` does. This would eliminate the possibility that users would, intentionally or not, call the parameterless constructor or otherwise construct an object whose reference field hasn’t been initialized to non-null. That doesn’t seem to be in the cards for C# 10.
#4 by Kamikaze on October 4th, 2021 ·
It looks like structs without generated parameterless constructors might be a thing.
#5 by jackson on October 9th, 2021 ·
Thanks for the link. Record structs would indeed help this situation, but could only go as far as `not_null` in C++. That is, as discussed in the article, there would still be a need for runtime null checks as not all arguments to the constructor could be null-checked at compile time. Still, if this proposal is adopted it would be a great tool to add to our C# toolbox!
#6 by Kamikaze on July 30th, 2023 ·
Please come back :(
#7 by Lova Harrison on September 29th, 2023 ·
+1
#8 by a on November 11th, 2023 ·
Where have you been?