The Virtue of Stopping
What do you do when your code finds a bug? We write code to check for null references and out-of-bounds indexes all the time, but what’s the proper way to respond when one of those checks fails? Today we’ll look at three options and see how they pan out.
The Problem
Let’s start by looking at an example of a function that takes a player’s ID and returns the player that has that ID:
    Player FindPlayer(int id)
    {
        foreach (Player player in m_Players)
        {
            if (player.Id == id)
            {
                return player;
            }
        }
        // {the last line of the function}
    }
The important part of this function is the last line. It only executes if the loop didn’t find and return a player. For the sake of example, let’s say that this case is a bug. We want to treat it like indexing into a Dictionary with a key that isn’t contained: a programmer error. We don’t want to treat it like Dictionary.TryGetValue, where the key being contained is optional. It’s also worth noting that we’re talking about errors that originate in the code itself (i.e. bugs), not handling of external factors such as reading corrupt data.
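The difference between the two Dictionary behaviors mentioned above can be seen in a small sketch. The class and method names here are made up for illustration and are not part of the article's game code.

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryLookupDemo
{
    // Indexer: a missing key is treated as a programmer error and throws.
    public static bool IndexerThrowsOnMissingKey(Dictionary<int, string> names, int id)
    {
        try
        {
            _ = names[id];
            return false;
        }
        catch (KeyNotFoundException)
        {
            return true;
        }
    }

    // TryGetValue: a missing key is an expected outcome reported by the return value.
    public static string NameOrDefault(Dictionary<int, string> names, int id)
    {
        return names.TryGetValue(id, out string name) ? name : "(unknown)";
    }

    public static void Main()
    {
        var names = new Dictionary<int, string> { { 1, "Alice" } };
        Console.WriteLine(IndexerThrowsOnMissingKey(names, 2)); // True
        Console.WriteLine(NameOrDefault(names, 2));             // (unknown)
    }
}
```

FindPlayer in this article is meant to behave like the indexer: a missing player is a bug in the caller, not an ordinary result.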
The crucial question here is this: what should the last line of this function look like? We have three main options:
- Work around the bug
- Throw an exception
- Exit the application
Work Around the Bug
This answer to the question takes the form of returning an error value, in this case null:
    Player FindPlayer(int id)
    {
        foreach (Player player in m_Players)
        {
            if (player.Id == id)
            {
                return player;
            }
        }
        Debug.LogError($"Player {id} not found");
        return null;
    }
The error log is optional, but doesn’t fundamentally change the approach.
Regardless of whether an error is logged, the program continues to execute. The caller receives a return value just as they would if the player was found. So now we need to look at an example caller to see what happens after FindPlayer returns:
    int GetPlayerScore(int id)
    {
        Player player = FindPlayer(id);
        return player.Score;
    }
Since null is returned, player is null and player.Score results in a NullReferenceException being thrown. In this case, the effects are felt very soon after the bug occurred on the previous line where an invalid player ID was passed to FindPlayer. In other cases, the null might be stored to an array and only discovered much later on in the program in a completely different part of the codebase.
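To make the delayed case concrete, here is a minimal sketch. BuildRoster and SumScores are hypothetical helpers, not functions from the article, and the Player class is reduced to the two fields the example needs.

```csharp
using System;
using System.Collections.Generic;

public class Player
{
    public int Id;
    public int Score;
}

public static class DelayedNullDemo
{
    static readonly List<Player> m_Players = new List<Player>
    {
        new Player { Id = 1, Score = 10 }
    };

    // Workaround version of FindPlayer (logging omitted): returns null when not found.
    public static Player FindPlayer(int id)
    {
        foreach (Player player in m_Players)
        {
            if (player.Id == id)
            {
                return player;
            }
        }
        return null;
    }

    // Hypothetical helper: a bad ID silently stores a null in the array.
    public static Player[] BuildRoster(int[] ids)
    {
        var roster = new Player[ids.Length];
        for (int i = 0; i < ids.Length; i++)
        {
            roster[i] = FindPlayer(ids[i]);
        }
        return roster;
    }

    // Hypothetical helper: may run much later, far from the code that passed the bad ID.
    public static int SumScores(Player[] roster)
    {
        int total = 0;
        foreach (Player player in roster)
        {
            total += player.Score; // NullReferenceException surfaces here
        }
        return total;
    }
}
```

The exception's stack trace points at SumScores, while the actual mistake was the ID passed to BuildRoster.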
At this point we have another decision to make: what do we do about the NullReferenceException? The approach so far is to keep the program running, so let’s keep with that approach and write this:
    int GetPlayerScore(int id)
    {
        Player player = FindPlayer(id);
        if (player != null)
        {
            return player.Score;
        }
        Debug.LogError($"Player {id} not found");
        return -1;
    }
Again, the Debug.LogError line is optional. The important part is the last line: return -1;. The bug is handled like any other program flow, kicking the can to the next caller.
So let’s take a look at a caller of GetPlayerScore. We’ll write a function to compute the total score of a two-player team. Here’s how it looks:
    int GetTeamScore(int idA, int idB)
    {
        int scoreA = GetPlayerScore(idA);
        int scoreB = GetPlayerScore(idB);
        return scoreA + scoreB;
    }
Simple addition is all that’s required as long as the players are found. However, since GetPlayerScore might not find the player via FindPlayer and instead return -1, this logic is flawed. Players that aren’t found are treated as though they have -1 points. So if one player ID is wrong and the other is right, the return value is one less than the right player’s score. That might even allow for -1 to be returned if the right player has 0 points. If both player IDs are wrong, -2 is returned.
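The arithmetic can be checked with a self-contained version of the three functions. The player data (IDs 1 and 2 with scores 10 and 0) and the wrapper class are assumptions made up for this sketch, and logging is omitted.

```csharp
using System;
using System.Collections.Generic;

public class Player
{
    public int Id;
    public int Score;
}

public static class SentinelScoreDemo
{
    // Example data, not from the article.
    static readonly List<Player> m_Players = new List<Player>
    {
        new Player { Id = 1, Score = 10 },
        new Player { Id = 2, Score = 0 }
    };

    static Player FindPlayer(int id)
    {
        foreach (Player player in m_Players)
        {
            if (player.Id == id)
            {
                return player;
            }
        }
        return null;
    }

    public static int GetPlayerScore(int id)
    {
        Player player = FindPlayer(id);
        if (player != null)
        {
            return player.Score;
        }
        return -1; // sentinel for "not found"
    }

    public static int GetTeamScore(int idA, int idB)
    {
        return GetPlayerScore(idA) + GetPlayerScore(idB);
    }

    public static void Main()
    {
        Console.WriteLine(GetTeamScore(1, 99));  // 9: one less than player 1's score
        Console.WriteLine(GetTeamScore(2, 99));  // -1: a valid player with 0 points plus a bad ID
        Console.WriteLine(GetTeamScore(98, 99)); // -2: both IDs are wrong
    }
}
```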
So let’s continue the approach of detecting and working around errors and change GetTeamScore to this:
    int GetTeamScore(int idA, int idB)
    {
        int scoreA = GetPlayerScore(idA);
        if (scoreA < 0)
        {
            Debug.LogError($"Player A {idA} not found");
            scoreA = 0;
        }
        int scoreB = GetPlayerScore(idB);
        if (scoreB < 0)
        {
            Debug.LogError($"Player B {idB} not found");
            scoreB = 0;
        }
        return scoreA + scoreB;
    }
Now we’ll avoid any negative scores, but may well end up with a 0 even for the winning team, because a team whose player IDs are wrong is indistinguishable from a team that scored no points.
It’s time to explore another approach.
Throw an Exception
This approach throws an exception when the bug is detected:
    Player FindPlayer(int id)
    {
        foreach (Player player in m_Players)
        {
            if (player.Id == id)
            {
                return player;
            }
        }
        throw new ArgumentException(
            $"Player {id} not found",
            nameof(id));
    }
How does that impact the caller? Here’s what it looks like:
    int GetPlayerScore(int id)
    {
        Player player = FindPlayer(id);
        return player.Score;
    }
If FindPlayer throws an exception, the last line of the function won’t be executed because the exception will propagate out of GetPlayerScore into its caller:
    int GetTeamScore(int idA, int idB)
    {
        int scoreA = GetPlayerScore(idA);
        int scoreB = GetPlayerScore(idB);
        return scoreA + scoreB;
    }
Either of these GetPlayerScore calls could throw an exception, but the effect on GetTeamScore is the same: it exits without returning a value as the exception propagates to the next caller.
At some point the exception will either be caught by the game or Unity will catch it. When Unity catches it, it logs an error and continues on with the next frame of the game. This is like a delayed version of the first approach where the code attempts to work around the bug. Unity hopes the bug won’t happen again on the next frame. It has no idea how much damage was caused by the exception falling through an unknown number of functions that only partially executed. Can we be sure that the game is in a valid state after such an event? Can the game code catch the exception and do any better at recovery than Unity did?
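A small sketch of how a half-executed function can leave the game state invalid. TransferPoints, TotalPoints, and RunFrame are hypothetical; RunFrame stands in for a root-level handler that logs the exception and moves on, and the player data is made up.

```csharp
using System;
using System.Collections.Generic;

public class Player
{
    public int Id;
    public int Score;
}

public static class PartialExecutionDemo
{
    // Example data, not from the article.
    static readonly List<Player> m_Players = new List<Player>
    {
        new Player { Id = 1, Score = 10 },
        new Player { Id = 2, Score = 5 }
    };

    static Player FindPlayer(int id)
    {
        foreach (Player player in m_Players)
        {
            if (player.Id == id)
            {
                return player;
            }
        }
        throw new ArgumentException($"Player {id} not found", nameof(id));
    }

    // Hypothetical: moves points between two players in two steps.
    public static void TransferPoints(int fromId, int toId, int amount)
    {
        FindPlayer(fromId).Score -= amount; // step 1 has already happened...
        FindPlayer(toId).Score += amount;   // ...when step 2 throws for a bad ID
    }

    public static int TotalPoints()
    {
        int total = 0;
        foreach (Player player in m_Players)
        {
            total += player.Score;
        }
        return total;
    }

    // Stand-in for a root-level handler: log the exception and carry on.
    public static void RunFrame(Action frame)
    {
        try
        {
            frame();
        }
        catch (Exception e)
        {
            Console.WriteLine($"Logged: {e.Message}");
        }
    }

    public static void Main()
    {
        int before = TotalPoints();
        RunFrame(() => TransferPoints(1, 99, 3));
        // Three points were deducted but never credited to anyone.
        Console.WriteLine(before - TotalPoints()); // 3
    }
}
```

The handler has no way to know that the deduction needs to be undone, so the game carries on with points missing.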
Exit the Application
This approach looks like this:
    Player FindPlayer(int id)
    {
        foreach (Player player in m_Players)
        {
            if (player.Id == id)
            {
                return player;
            }
        }
        Game.Panic($"Player {id} not found");
        return null;
    }
The return null; needs to be present to keep the C# compiler happy, but it’ll never execute. That’s because Game.Panic looks like this:
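    using System.Diagnostics;
    using UnityEngine;
    #if UNITY_EDITOR
    using UnityEditor;
    #endif
    // Both System.Diagnostics and UnityEngine define Debug, so pick Unity's
    using Debug = UnityEngine.Debug;

    public static class Game
    {
        public static void Panic(string message)
        {
            // Stop in the debugger, if one is attached
            Debugger.Break();
            Debug.LogError($"Panic: {message}");
    #if UNITY_EDITOR
            // In the editor, leave play mode
            EditorApplication.isPlaying = false;
    #else
            // In a player build, quit with a non-zero exit code
            Application.Quit(1);
    #endif
        }
    }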
This breaks into the debugger, logs the error, and then exits the application with an error code. FindPlayer and its callers won’t continue to execute. An exception won’t fall through the call stack, and the game will absolutely stop at this point.
Alternatively, panicking can be limited to only debug builds in a similar way to asserts:
    Player FindPlayer(int id)
    {
        foreach (Player player in m_Players)
        {
            if (player.Id == id)
            {
                return player;
            }
        }
        Game.DebugPanic($"Player {id} not found");
        return null;
    }
DebugPanic looks like this:
    public static class Game
    {
        [Conditional("DEVELOPMENT_BUILD")]
        public static void DebugPanic(string message)
        {
    #if DEVELOPMENT_BUILD
            Panic(message);
    #endif
        }
    }
In a debug build, this panics. In a release build, this function does nothing because it’s empty and all calls to it are removed.
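The call removal performed by the Conditional attribute can be observed outside of Unity. In this sketch, EXAMPLE_PANIC_ENABLED is a made-up symbol standing in for DEVELOPMENT_BUILD, and the counters exist only to show what does and doesn't run.

```csharp
using System;
using System.Diagnostics;

public static class ConditionalDemo
{
    public static int MessagesBuilt = 0;
    public static int PanicCalls = 0;

    // EXAMPLE_PANIC_ENABLED is a hypothetical symbol; it is not defined in this file.
    [Conditional("EXAMPLE_PANIC_ENABLED")]
    public static void DebugPanic(string message)
    {
        PanicCalls++;
    }

    static string BuildMessage(int id)
    {
        MessagesBuilt++;
        return $"Player {id} not found";
    }

    public static void Check(int id)
    {
        // With the symbol undefined, the compiler omits this whole call,
        // including the evaluation of its argument.
        DebugPanic(BuildMessage(id));
    }

    public static void Main()
    {
        Check(99);
        Console.WriteLine($"{MessagesBuilt} {PanicCalls}");
    }
}
```

Because the argument is never evaluated when the symbol is undefined, building the error string costs nothing in those builds.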
Let’s see what effect this approach has on the callers:
    int GetPlayerScore(int id)
    {
        Player player = FindPlayer(id);
        return player.Score;
    }
Since FindPlayer won’t return if a bug happens, the player.Score access will never be executed.
Now let’s look at the caller-of-the-caller:
    int GetTeamScore(int idA, int idB)
    {
        int scoreA = GetPlayerScore(idA);
        int scoreB = GetPlayerScore(idB);
        return scoreA + scoreB;
    }
Similarly, GetPlayerScore will never return nor will any exception fall through this function if the bug occurs on either player A or B.
Conclusion
When code detects a bug, we have three main options. We can try to work around it with techniques like returning error codes, but this causes major problems. The bug has a ripple effect through the rest of the codebase as more and more functions need to deal with a scenario that should have never happened in the first place. It turns a bug in one function into a bug that almost all functions need to handle. The codebase becomes littered with if statements to keep checking for bugs that originated in far-flung places. All of this slows down the code and makes it dramatically harder to read and understand. Even when the bug is fixed, all of this litter remains and is unlikely to ever be feasibly cleaned up.
Throwing an exception looks like a better solution because there aren’t so many error codes to handle or error logs to write. The exception will eventually be caught. If it’s caught by the immediate caller, it’s just as good as returning an error code. If it’s caught at some root level, such as by Unity, all of the half-executed functions on the call stack have probably caused serious, unrecoverable damage to the game state. The result will likely be a domino effect where errors on one frame cause errors on the next and the next.
That brings us to the final approach: immediately quit the game. This approach doesn’t have the ripple effect of the workaround approach, nor does it have the domino effect of exceptions. Instead, it focuses on immediately surfacing the bug to the programmer so they can debug it either in a debugger or via the error log.
As much as we want to avoid crashing, intentionally quitting the game is actually a very appealing bug-handling strategy. When we realize that none of these code-based solutions actually solve the bug but do create a huge mess, just stopping the program makes a lot of sense.
#1 by Arnaud Jamin on August 19th, 2019 ·
I would love to use the “Exit” approach, but this does not work when working in a team. In video games, we have designers, level designers, artists, and other programmers using our executable. They need the executable to be stable to perform their job. They usually work on something that is not related to your current work. They will not be able to test their work, or sometimes even work at all, if you exit the game every time there is a potential bug. Sadly, the “Exit” approach reduces the productivity of the team.
#2 by jackson on August 19th, 2019 ·
I definitely understand that it’s important to keep other members of the team productive. I’m definitely not arguing that it’s OK to write bugs or to stop other members of the team from working.
That said, in the article I assume that you’ve written an actual (not potential) bug and some other code in the game has detected it. I argue that none of the bug-handling approaches can fix the bug. I argue that, at best, the game might hobble along enough for a team member to work. I argue that the “workaround” approach causes great damage to the codebase and should be avoided.
The “exception” approach is, however, a decent middle-ground for users that don’t want the game to exit and would prefer to take their chances with the likely broken game state that’s occurred due to the exception falling through the call stack. In the article, I talk about using the preprocessor to control the approach used. I show how it can be used with DEVELOPMENT_BUILD to only exit when bugs are found in development builds, but you can easily swap this out for another symbol to use the “exception” approach when the “exit” approach isn’t desired. Here’s an example:
This allows you to use the preprocessor to choose the bug-handling approach you’d like to take on a user-by-user basis. For example, a programmer might use the “exit” approach while an artist might use the “exception” approach. The “exit” approach can then be a boon to the programmer without being a hindrance to the artist, if you feel that the “exit” approach would have been.
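The example from this comment is not shown above. As a rough sketch of the idea only, and not the comment's original code, a symbol can select between the two approaches. EXIT_ON_BUG and BugPolicy.Report are hypothetical names, and Environment.Exit stands in for the article's Game.Panic so that the sketch doesn't depend on Unity.

```csharp
using System;

public static class BugPolicy
{
    // Hypothetical: define EXIT_ON_BUG for users who want the "exit" approach.
    // Everyone else gets the "exception" approach.
    public static void Report(string message)
    {
#if EXIT_ON_BUG
        Console.Error.WriteLine($"Panic: {message}");
        Environment.Exit(1);
#else
        throw new InvalidOperationException(message);
#endif
    }
}
```

A programmer's build would define the symbol while an artist's build would leave it undefined.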
#3 by ChessMax on August 19th, 2019 ·
There is a problem with Application.Quit. It doesn’t work on iOS.
#4 by jackson on August 19th, 2019 ·
Feel free to use platform-specific code to exit more thoroughly than the example in the article. For example:
#5 by Chris Ender on January 16th, 2020 ·
On iOS you can divide by zero, which immediately ends the app and returns to desktop.
On Android it does not.
#6 by Sergey on August 19th, 2019 ·
If you talk about production run, you can just stop sending error events to your logstash after the exception.
In the Unity editor, you can turn on “pause on error” which will stop an application.
#7 by jackson on August 19th, 2019 ·
Good point about “pause on error.” That can be a good way to stop and inspect the game state. Unfortunately, the exception will have already fallen through the call stack so execution would have already passed the point where the bug was detected and is optimally debuggable. It may be difficult to figure out from that point why the bug occurred. It’s usually better than continuing on to the next frame though, as the bug will be even harder to detect later on in the gameplay.
As for production, stopping logs may prevent the “domino effect” of the exception from filling up the logs with the same error message over and over. However, if that’s the case then the error is happening repeatedly so the game is unlikely to be playable and the game state is likely to be corrupt. That may lead to some really obvious problems such as graphical errors and some really non-obvious ones such as bogus data being reported to a stats database. Still, there’s always room for a middle ground. One approach is to continue the game if an exception occurs on one frame but exit if an exception occurs on two consecutive frames. There are many valid strategies that combine one or more of the “workaround,” “exception,” and “exit” approaches.
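The "two consecutive frames" middle ground could be sketched like this. FrameFaultTracker is a hypothetical class, and how frames are actually run and how the game exits are left to the caller.

```csharp
using System;

public class FrameFaultTracker
{
    private bool m_PreviousFrameFaulted;

    // Runs one frame. Returns true if the game should keep running and
    // false if it should exit because two frames in a row threw.
    public bool RunFrame(Action frame)
    {
        try
        {
            frame();
            m_PreviousFrameFaulted = false;
            return true;
        }
        catch (Exception)
        {
            if (m_PreviousFrameFaulted)
            {
                return false; // second consecutive faulting frame: give up
            }
            m_PreviousFrameFaulted = true;
            return true; // first fault: take our chances on the next frame
        }
    }
}
```

A clean frame resets the tracker, so isolated exceptions are tolerated while a repeating one stops the game.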
#8 by Mike on August 20th, 2019 ·
I totally agree with the sentiment of this article. My personal opinion is, if something goes wrong: Crash, crash immediately and crash hard, so that the error does not go unnoticed. Your approach is a bit more elegant :)
Some people mention cost of experiencing bugs during development, which is a valid point. But that cost has to be weighed against the cost of a bug potentially going unnoticed for a long time. This cost has in my experience been a lot higher.
If it is needed for artists to work on a version that is bug free, they can work on the code in master branch, and we can merge to it when it has been thoroughly tested. (depending on your branching strategy and VCS of course)
#9 by jackson on August 20th, 2019 ·
That’s a good point. There is usually an option to return to a previous build or version of the game in order to restore functionality when a bug inadvertently breaks artists’ workflow.
#10 by Arthur on December 23rd, 2019 ·
With the inline out parameters of C# 7.0, I never return a Player object from the FindPlayer method. I either do bool GetPlayer(out Player player) or, in the case of GetPlayerScore, I return an error enum instead of bool because there might be no player or no record of score for that player or any other imaginable problem. If anything unexpected happens I will log the error to a telemetry server. Throwing an exception is too dangerous because of the reasons you mentioned. Quitting the game is just plain stupid. The player should be able to play the game even if some insignificant part of the game has failed.
After many years of dealing with problems with games that are live in operation, I understood that writing slightly more code is never a problem. Dealing with low ratings or KPIs is.
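A sketch of the pattern this comment describes might look as follows. The id parameter, the ScoreError enum, the nullable Score field, and the player data are all assumptions added for illustration; the comment only gives the bool GetPlayer(out Player player) signature.

```csharp
using System;
using System.Collections.Generic;

public class Player
{
    public int Id;
    public int? Score; // null models "no record of score" for this sketch
}

// Hypothetical error enum; the comment doesn't name its values.
public enum ScoreError
{
    None,
    PlayerNotFound,
    ScoreMissing
}

public static class TryPatternDemo
{
    // Example data, not from the article.
    static readonly List<Player> m_Players = new List<Player>
    {
        new Player { Id = 1, Score = 10 },
        new Player { Id = 2, Score = null }
    };

    public static bool TryGetPlayer(int id, out Player player)
    {
        foreach (Player candidate in m_Players)
        {
            if (candidate.Id == id)
            {
                player = candidate;
                return true;
            }
        }
        player = null;
        return false;
    }

    public static ScoreError TryGetPlayerScore(int id, out int score)
    {
        score = 0;
        if (!TryGetPlayer(id, out Player player))
        {
            return ScoreError.PlayerNotFound;
        }
        if (!player.Score.HasValue)
        {
            return ScoreError.ScoreMissing;
        }
        score = player.Score.Value;
        return ScoreError.None;
    }
}
```

Every caller then has to check the returned value before using the out parameter, which is the trade-off the article's "workaround" section discusses.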