JacksonDunstan.com

What should you do when your code finds a bug? We write code to check for null references and out-of-bounds indexes all the time, but what’s the proper way to respond when one of those checks finds a problem? Today we’ll look at three options and see how they pan out.

The Problem

Let’s start by looking at an example of a function that takes a player’s ID and returns the player that has that ID:

Player FindPlayer(int id)
{
    foreach (Player player in m_Players)
    {
        if (player.Id == id)
        {
            return player;
        }
    }
    // {the last line of the function}
}

The important part of this function is the last line. It only executes if the loop didn’t find and return a player. For the sake of example, let’s say that this case is a bug. We want to treat it like indexing into a Dictionary with a key that isn’t contained: a programmer error. We don’t want to treat it like Dictionary.TryGetValue where the key being contained is optional. It’s also worth noting that we’re talking about errors that originate in the code itself (i.e. bugs), not handling of external factors such as reading corrupt data.
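To make the distinction concrete, here’s a small sketch in plain C# (outside Unity) contrasting the two Dictionary lookups. The indexer treats a missing key as a programmer error and throws KeyNotFoundException. TryGetValue treats a missing key as a normal outcome and reports it through its return value. The PlayerLookupDemo class and its data are hypothetical.

```csharp
using System.Collections.Generic;

// Hypothetical demo, not from the article: player ID -> score
public static class PlayerLookupDemo
{
    static readonly Dictionary<int, int> s_Scores = new Dictionary<int, int>
    {
        { 1, 10 },
        { 2, 20 },
    };

    // Indexer: a missing key is treated as a programmer error
    // and throws KeyNotFoundException
    public static int GetScoreOrThrow(int id)
    {
        return s_Scores[id];
    }

    // TryGetValue: a missing key is an expected outcome, reported by
    // returning false (score is set to 0 in that case)
    public static bool TryGetScore(int id, out int score)
    {
        return s_Scores.TryGetValue(id, out score);
    }
}
```

FindPlayer in this article is meant to behave like the indexer, not like TryGetValue.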

The crucial question here is this: what should the last line of this function look like? We have three main options:

Work around the bug
Throw an exception
Exit the application

Work Around the Bug

This answer to the question takes the form of returning an error code:

Player FindPlayer(int id)
{
    foreach (Player player in m_Players)
    {
        if (player.Id == id)
        {
            return player;
        }
    }
    Debug.LogError($"Player {id} not found");
    return null;
}

The error log is optional, but doesn’t fundamentally change the approach.

Regardless of whether an error is logged, the program continues to execute. The caller receives a return value just as they would if the player was found. So now we need to look at an example caller to see what happens after FindPlayer returns:

int GetPlayerScore(int id)
{
    Player player = FindPlayer(id);
    return player.Score;
}

Since null is returned, player is null and player.Score results in a NullReferenceException being thrown. In this case, the effects are felt very soon after the bug occurred on the previous line where an invalid player ID was passed to FindPlayer. In other cases, the null might be stored to an array and only discovered much later on in the program in a completely different part of the codebase.
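Here’s a hypothetical sketch of that delayed failure in plain C#. The null from a failed lookup is stored in an array by one method and only dereferenced later by another, so the NullReferenceException surfaces far from the line that passed the bad ID. Player, Roster, and the method names are assumptions for illustration.

```csharp
using System.Collections.Generic;

public class Player
{
    public int Id;
    public int Score;
}

// Hypothetical container, not from the article
public class Roster
{
    readonly List<Player> m_Players = new List<Player>();
    readonly Player[] m_Lineup = new Player[2];

    public void Add(Player player) { m_Players.Add(player); }

    // "Workaround" style: returns null instead of stopping
    public Player FindPlayer(int id)
    {
        foreach (Player player in m_Players)
        {
            if (player.Id == id)
            {
                return player;
            }
        }
        return null;
    }

    // The bug happens here: a bad ID silently stores null in the lineup
    public void SetLineupSlot(int slot, int id)
    {
        m_Lineup[slot] = FindPlayer(id);
    }

    // ...but only surfaces here, possibly much later, as a NullReferenceException
    public int GetLineupScore()
    {
        int total = 0;
        foreach (Player player in m_Lineup)
        {
            total += player.Score;
        }
        return total;
    }
}
```

Calling SetLineupSlot(1, 42) with a nonexistent ID reports nothing; the exception only appears when GetLineupScore runs.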

At this point we have another decision to make: what do we do about the NullReferenceException? The approach so far is to keep the program running, so let’s stick with that approach and write this:

int GetPlayerScore(int id)
{
    Player player = FindPlayer(id);
    if (player != null)
    {
        return player.Score;
    }
    Debug.LogError($"Player {id} not found");
    return -1;
}

Again, the Debug.LogError line is optional. The important part is the last line: return -1;. The bug is handled like any other program flow, kicking the can to the next caller.

So let’s take a look at a caller of GetPlayerScore. We’ll write a function to compute the total score of a two-player team. Here’s how it looks:

int GetTeamScore(int idA, int idB)
{
    int scoreA = GetPlayerScore(idA);
    int scoreB = GetPlayerScore(idB);
    return scoreA + scoreB;
}

Simple addition is all that’s required as long as both players are found. However, since GetPlayerScore returns -1 when FindPlayer doesn’t find the player, this logic is flawed: players that aren’t found are treated as though they have -1 points. If one player ID is wrong and the other is right, the return value is one less than the right player’s score, so -1 is returned if that player has 0 points. If both player IDs are wrong, -2 is returned.
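To check the arithmetic, here’s a self-contained sketch of the workaround chain in plain C#. Console.Error stands in for Unity’s Debug.LogError and a dictionary stands in for m_Players; both substitutions, and the WorkaroundScores name, are assumptions made so the example runs outside Unity.

```csharp
using System;
using System.Collections.Generic;

public static class WorkaroundScores
{
    // Stand-in for m_Players: player ID -> score (hypothetical data)
    public static readonly Dictionary<int, int> Players = new Dictionary<int, int>();

    // Returns -1 as an error code when the player isn't found
    public static int GetPlayerScore(int id)
    {
        if (Players.TryGetValue(id, out int score))
        {
            return score;
        }
        Console.Error.WriteLine($"Player {id} not found");
        return -1;
    }

    // Naive caller: adds the error code as if it were a real score
    public static int GetTeamScore(int idA, int idB)
    {
        return GetPlayerScore(idA) + GetPlayerScore(idB);
    }
}
```

With Players[1] = 0 and Players[2] = 7, GetTeamScore(2, 99) yields 6 instead of 7, GetTeamScore(1, 99) yields -1, and GetTeamScore(98, 99) yields -2.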

So let’s continue the approach of detecting and working around errors and change GetTeamScore to this:

int GetTeamScore(int idA, int idB)
{
    int scoreA = GetPlayerScore(idA);
    if (scoreA < 0)
    {
        Debug.LogError($"Player A {idA} not found");
        scoreA = 0;
    }
    int scoreB = GetPlayerScore(idB);
    if (scoreB < 0)
    {
        Debug.LogError($"Player B {idB} not found");
        scoreB = 0;
    }
    return scoreA + scoreB;
}

Now we’ll avoid any negative scores, but may well end up with a 0 even for the winning team.

It’s time to explore another approach.

Throw an Exception

This approach throws an exception when the bug is detected:

Player FindPlayer(int id)
{
    foreach (Player player in m_Players)
    {
        if (player.Id == id)
        {
            return player;
        }
    }
    throw new ArgumentException(
        $"Player {id} not found",
        nameof(id));
}

How does that impact the caller? Here’s what it looks like:

int GetPlayerScore(int id)
{
    Player player = FindPlayer(id);
    return player.Score;
}

If FindPlayer throws an exception, the last line of the function won’t be executed because the exception will propagate out of GetPlayerScore into its caller:

int GetTeamScore(int idA, int idB)
{
    int scoreA = GetPlayerScore(idA);
    int scoreB = GetPlayerScore(idB);
    return scoreA + scoreB;
}

Either of these GetPlayerScore calls could throw an exception, but the effect on GetTeamScore is the same: it exits without returning a value as the exception propagates to the next caller.

At some point the exception will either be caught by the game or Unity will catch it. When Unity catches it, it logs an error and continues on with the next frame of the game. This is like a delayed version of the first approach where the code attempts to work around the bug. Unity hopes the bug won’t happen again on the next frame. It has no idea how much damage was caused by the exception falling through an unknown number of functions that only partially executed. Can we be sure that the game is in a valid state after such an event? Can the game code catch the exception and do any better at recovery than Unity did?
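To illustrate what a “partially executed” function can leave behind, here’s a hypothetical sketch in plain C# (the ScoreBoard class and TransferPoints method are invented for illustration). An exception thrown midway through a two-step update leaves the state inconsistent: points are removed from one player but never added to the other. The try/catch in the usage below stands in for Unity catching the exception at the root.

```csharp
using System;
using System.Collections.Generic;

public class Player
{
    public int Id;
    public int Score;
}

public class ScoreBoard
{
    readonly List<Player> m_Players = new List<Player>();

    public void Add(Player player) { m_Players.Add(player); }

    // Throwing version of FindPlayer, as in this section
    public Player FindPlayer(int id)
    {
        foreach (Player player in m_Players)
        {
            if (player.Id == id)
            {
                return player;
            }
        }
        throw new ArgumentException($"Player {id} not found", nameof(id));
    }

    // Moves points between players in two steps. If the second lookup
    // throws, the first step has already happened and is never undone.
    public void TransferPoints(int fromId, int toId, int points)
    {
        FindPlayer(fromId).Score -= points;
        FindPlayer(toId).Score += points;
    }

    public int TotalScore()
    {
        int total = 0;
        foreach (Player player in m_Players)
        {
            total += player.Score;
        }
        return total;
    }
}
```

Starting with two players at 10 points each, calling TransferPoints(1, 99, 5) inside a try/catch leaves a total of 15 instead of 20, and nothing at the catch site knows that 5 points vanished.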

Exit the Application

This approach looks like this:

Player FindPlayer(int id)
{
    foreach (Player player in m_Players)
    {
        if (player.Id == id)
        {
            return player;
        }
    }
    Game.Panic($"Player {id} not found");
    return null;
}

The return null; needs to be present to keep the C# compiler happy, since the compiler can’t tell that Game.Panic is meant to stop the game. It isn’t intended to ever execute. Here’s what Game.Panic looks like:

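using UnityEngine;
#if UNITY_EDITOR
using UnityEditor;
#endif

public static class Game
{
    public static void Panic(string message)
    {
        // Fully qualified to avoid clashing with UnityEngine.Debug
        System.Diagnostics.Debugger.Break();
        Debug.LogError($"Panic: {message}");
        #if UNITY_EDITOR
            // Leave play mode in the editor
            EditorApplication.isPlaying = false;
        #else
            // Quit the player with exit code 1
            Application.Quit(1);
        #endif
    }
}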

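This breaks into the debugger, logs the error, and then exits the application with an error code. The intent is that FindPlayer and its callers won’t continue to execute, an exception won’t fall through the call stack, and the game will absolutely stop at this point. One caveat: Unity doesn’t halt script execution the instant Application.Quit is called or EditorApplication.isPlaying is set to false; both take effect after the current frame’s scripts finish, so a more immediate, platform-specific exit may be needed to guarantee that nothing else runs.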

Alternatively, panicking can be limited to only debug builds in a similar way to asserts:

Player FindPlayer(int id)
{
    foreach (Player player in m_Players)
    {
        if (player.Id == id)
        {
            return player;
        }
    }
    Game.DebugPanic($"Player {id} not found");
    return null;
}

DebugPanic looks like this:

public static class Game
{
    // Calls to this method are compiled only when DEVELOPMENT_BUILD is defined
    [System.Diagnostics.Conditional("DEVELOPMENT_BUILD")]
    public static void DebugPanic(string message)
    {
        #if DEVELOPMENT_BUILD
            Panic(message);
        #endif
    }
}

In a development build, this panics. In a release build, it does nothing: the [Conditional] attribute makes the compiler remove all calls to it, and the #if leaves its body empty anyway.
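To see the release-build behavior without Unity, here’s a minimal plain-C# sketch. It assumes the code is compiled outside a Unity development build, so nothing defines DEVELOPMENT_BUILD; in that case the compiler drops every call to the [Conditional] method, including evaluation of its arguments. The ConditionalDemo class, its counters, and BuildMessage are hypothetical.

```csharp
using System.Diagnostics;

public static class ConditionalDemo
{
    public static int PanicCalls;
    public static int MessagesBuilt;

    // Calls to this method are compiled only when DEVELOPMENT_BUILD is defined
    [Conditional("DEVELOPMENT_BUILD")]
    public static void DebugPanic(string message)
    {
        PanicCalls++;
    }

    public static string BuildMessage(int id)
    {
        MessagesBuilt++;
        return $"Player {id} not found";
    }

    public static void Run()
    {
        // With DEVELOPMENT_BUILD undefined, this whole statement is removed,
        // so BuildMessage is never called either
        DebugPanic(BuildMessage(42));
    }
}
```

After calling ConditionalDemo.Run(), both counters stay at 0, which is why a DebugPanic call costs nothing in a release build.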

Let’s see what effect this approach has on the callers:

int GetPlayerScore(int id)
{
    Player player = FindPlayer(id);
    return player.Score;
}

Since FindPlayer won’t return if a bug happens, player.Score will never be evaluated.

Now let’s look at the caller-of-the-caller:

int GetTeamScore(int idA, int idB)
{
    int scoreA = GetPlayerScore(idA);
    int scoreB = GetPlayerScore(idB);
    return scoreA + scoreB;
}

Similarly, if the bug occurs for either player A or player B, GetPlayerScore will never return, and no exception will fall through this function.

Conclusion

When code detects a bug, we have three main options. We can try to work around it with techniques like returning error codes, but this causes major problems. The bug has a ripple effect through the rest of the codebase as more and more functions need to deal with a scenario that should have never happened in the first place. It turns a bug in one function into a bug that almost all functions need to handle. The codebase becomes littered with if statements that keep checking for bugs that originated in far-flung places. All of this slows down the code and makes it dramatically harder to read and understand. Even when the bug is fixed, all of this litter remains and is unlikely to ever be cleaned up.

Throwing an exception looks like a better solution because there aren’t so many error codes to handle or error logs to write. The exception will eventually be caught. If it’s caught by the immediate caller, it’s just as good as returning an error code. If it’s caught at some root level, such as by Unity, all of the half-executed functions on the call stack have probably caused serious, unrecoverable damage to the game state. The result will likely be a domino effect where errors on one frame cause errors on the next and the next.
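Here’s a small sketch of the first case, in plain C# with hypothetical names (CatchingCaller, FindScore, and the Scores dictionary standing in for the player list). When the immediate caller catches the exception, it converts it straight back into an error code, so its own callers are back to checking for -1.

```csharp
using System;
using System.Collections.Generic;

public static class CatchingCaller
{
    // Stand-in for the player list: player ID -> score (hypothetical data)
    public static readonly Dictionary<int, int> Scores = new Dictionary<int, int>();

    // Throwing lookup, like the exception version of FindPlayer
    static int FindScore(int id)
    {
        if (Scores.TryGetValue(id, out int score))
        {
            return score;
        }
        throw new ArgumentException($"Player {id} not found", nameof(id));
    }

    // Catching at the immediate caller turns the exception back into -1,
    // which is exactly the error code from the workaround approach
    public static int GetPlayerScore(int id)
    {
        try
        {
            return FindScore(id);
        }
        catch (ArgumentException)
        {
            return -1;
        }
    }
}
```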

That brings us to the final approach: immediately quit the game. This approach doesn’t have the ripple effect of the workaround approach, nor does it have the domino effect of exceptions. Instead, it focuses on immediately surfacing the bug to the programmer so they can debug it either in a debugger or via the error log.

As much as we want to avoid crashing, intentionally quitting the game is actually a very appealing bug-handling strategy. When we realize that none of these code-based solutions actually solve the bug but do create a huge mess, just stopping the program makes a lot of sense.

#1 by Arnaud Jamin on August 19th, 2019

I would love to use the “Exit” approach, but this does not work when working in a team. In video games, we have designers, level designers, artists, and other programmers using our executable. They need the executable to be stable to perform their job. They usually work on something that is not related to your current work. They will not be able to test their work, or sometimes even work at all, if you exit the game every time there is a potential bug. Sadly, the “Exit” approach reduces the productivity of the team.

#2 by jackson on August 19th, 2019

I definitely understand that it’s important to keep other members of the team productive. I’m definitely not arguing that it’s OK to write bugs or to stop other members of the team from working.

That said, in the article I assume that you’ve written an actual (not potential) bug and some other code in the game has detected it. I argue that none of the bug-handling approaches can fix the bug. I argue that, at best, the game might hobble along enough for a team member to work. I argue that the “workaround” approach causes great damage to the codebase and should be avoided.

The “exception” approach is, however, a decent middle-ground for users that don’t want the game to exit and would prefer to take their chances with the likely broken game state that’s occurred due to the exception falling through the call stack. In the article, I talk about using the preprocessor to control the approach used. I show how it can be used with DEVELOPMENT_BUILD to only exit when bugs are found in development builds, but you can easily swap this out to another symbol to use the “exception” approach when the “exit” approach isn’t desired. Here’s an example:

public static class Game
{
    public static void DebugPanic(string message)
    {
        #if BUG_APPROACH_EXIT
            Panic(message);
        #elif BUG_APPROACH_EXCEPTION
            throw new System.Exception($"Panic: {message}");
        #else
            Debug.LogError($"Unhandled bug: {message}");
        #endif
    }
}

This allows you to use the preprocessor to choose the bug-handling approach you’d like to take on a user-by-user basis. For example, a programmer might use the “exit” approach while an artist might use the “exception” approach. The “exit” approach can then be a boon to the programmer without being a hindrance to the artist, if you feel that it would have been one.

#3 by ChessMax on August 19th, 2019

There is a problem with Application.Quit. It doesn’t work on iOS.

#4 by jackson on August 19th, 2019

Feel free to use platform-specific code to exit more thoroughly than the example in the article. For example:

public static class Game
{
    public static void Panic(string message)
    {
        Debugger.Break();
        Debug.LogError($"Panic: {message}");
        #if UNITY_EDITOR
            EditorApplication.isPlaying = false;
        #else
            #if UNITY_IOS
                // TODO: exit more thoroughly
            #else
                Application.Quit(1);
            #endif
        #endif
    }
}

#5 by Chris Ender on January 16th, 2020

On iOS you can divide by zero, which immediately ends the app and returns to desktop.
On Android it does not.

#6 by Sergey on August 19th, 2019

If you’re talking about a production run, you can just stop sending error events to your Logstash after the exception.
In the Unity editor, you can turn on “pause on error,” which will pause the application.

#7 by jackson on August 19th, 2019

Good point about “pause on error.” That can be a good way to stop and inspect the game state. Unfortunately, the exception will have already fallen through the call stack so execution would have already passed the point where the bug was detected and is optimally debuggable. It may be difficult to figure out from that point why the bug occurred. It’s usually better than continuing on to the next frame though, as the bug will be even harder to detect later on in the gameplay.

As for production, stopping logs may prevent the “domino effect” of the exception from filling up the logs with the same error message over and over. However, if that’s the case then the error is happening repeatedly so the game is unlikely to be playable and the game state is likely to be corrupt. That may lead to some really obvious problems such as graphical errors and some really non-obvious ones such as bogus data being reported to a stats database. Still, there’s always room for a middle ground. One approach is to continue the game if an exception occurs on one frame but exit if an exception occurs on two consecutive frames. There are many valid strategies that combine one or more of the “workaround,” “exception,” and “exit” approaches.
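Here’s a minimal sketch of that middle-ground policy in plain C#: tolerate an exception on one frame, but signal an exit when exceptions occur on two consecutive frames. The ConsecutiveFramePolicy class is hypothetical. In Unity you could presumably call ReportException from a handler on Application.logMessageReceived for LogType.Exception and call Game.Panic when EndFrame returns true, but that wiring is an assumption and isn’t shown.

```csharp
// Hypothetical policy class, not from the article
public class ConsecutiveFramePolicy
{
    bool m_ExceptionThisFrame;
    bool m_ExceptionLastFrame;

    // Call whenever an exception is observed during the current frame
    public void ReportException()
    {
        m_ExceptionThisFrame = true;
    }

    // Call once at the end of each frame. Returns true when exceptions
    // happened on both this frame and the previous one.
    public bool EndFrame()
    {
        bool shouldExit = m_ExceptionThisFrame && m_ExceptionLastFrame;
        m_ExceptionLastFrame = m_ExceptionThisFrame;
        m_ExceptionThisFrame = false;
        return shouldExit;
    }
}
```

A single clean frame resets the streak, so isolated exceptions never trigger an exit with this policy.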

#8 by Mike on August 20th, 2019

I totally agree with the sentiment of this article. My personal opinion is, if something goes wrong: Crash, crash immediately and crash hard, so that the error does not go unnoticed. Your approach is a bit more elegant :)

Some people mention cost of experiencing bugs during development, which is a valid point. But that cost has to be weighed against the cost of a bug potentially going unnoticed for a long time. This cost has in my experience been a lot higher.

If artists need to work on a bug-free version, they can work on the code in the master branch, and we can merge into it once changes have been thoroughly tested (depending on your branching strategy and VCS, of course).

#9 by jackson on August 20th, 2019

That’s a good point. There is usually an option to return to a previous build or version of the game in order to restore functionality when a bug inadvertently breaks artists’ workflow.

#10 by Arthur on December 23rd, 2019

With the inline out variables of C# 7.0, I never return a Player object from the FindPlayer method. I either use bool GetPlayer(out Player player) or, in the case of GetPlayerScore, return an error enum instead of a bool, because there might be no player, no score record for that player, or any other imaginable problem. If anything unexpected happens, I log the error to a telemetry server. Throwing an exception is too dangerous for the reasons you mentioned. Quitting the game is just plain stupid. Players should be able to play the game even if some insignificant part of it has failed.
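For comparison, here’s a minimal sketch of the pattern this comment describes: a bool-returning lookup with a C# 7.0 out variable, plus an error enum for the score query. The ScoreError values, the PlayerDirectory class, and the nullable Score field (null meaning “no score record”) are all hypothetical, and telemetry logging is left out.

```csharp
using System.Collections.Generic;

public class Player
{
    public int Id;
    public int? Score; // null when there's no score record (hypothetical)
}

public enum ScoreError
{
    None,
    PlayerNotFound,
    NoScoreRecord,
}

public class PlayerDirectory
{
    readonly List<Player> m_Players = new List<Player>();

    public void Add(Player player) { m_Players.Add(player); }

    // Reports a missing player through the return value instead of null
    public bool TryGetPlayer(int id, out Player player)
    {
        foreach (Player candidate in m_Players)
        {
            if (candidate.Id == id)
            {
                player = candidate;
                return true;
            }
        }
        player = null;
        return false;
    }

    // Every failure mode gets its own enum value; score is 0 on failure
    public ScoreError GetPlayerScore(int id, out int score)
    {
        score = 0;
        if (!TryGetPlayer(id, out Player player)) // C# 7.0 out variable
        {
            return ScoreError.PlayerNotFound;
        }
        if (player.Score == null)
        {
            return ScoreError.NoScoreRecord;
        }
        score = player.Score.Value;
        return ScoreError.None;
    }
}
```

Note that every caller of GetPlayerScore still has to check the returned ScoreError, which is the trade-off the article discusses under the workaround approach.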

After many years of dealing with problems with games that are live in operation, I understood that writing slightly more code is never a problem. Dealing with low ratings or KPIs is.

The Virtue of Stopping
