C++ For C# Developers: Part 5 – Build Model
Today’s article continues the series by introducing C++’s build model, which is very different from C#. We’ll go into preprocessing, compiling, linking, header files, the one definition rule, and many other aspects of how our source code gets built into an executable.
Table of Contents
- Part 1: Introduction
- Part 2: Primitive Types and Literals
- Part 3: Variables and Initialization
- Part 4: Functions
- Part 5: Build Model
- Part 6: Control Flow
- Part 7: Pointers, Arrays, and Strings
- Part 8: References
- Part 9: Enumerations
- Part 10: Struct Basics
- Part 11: Struct Functions
- Part 12: Constructors and Destructors
- Part 13: Initialization
- Part 14: Inheritance
- Part 15: Struct and Class Permissions
- Part 16: Struct and Class Wrapup
- Part 17: Namespaces
- Part 18: Exceptions
- Part 19: Dynamic Allocation
- Part 20: Implicit Type Conversion
- Part 21: Casting and RTTI
- Part 22: Lambdas
- Part 23: Compile-Time Programming
- Part 24: Preprocessor
- Part 25: Intro to Templates
- Part 26: Template Parameters
- Part 27: Template Deduction and Specialization
- Part 28: Variadic Templates
- Part 29: Template Constraints
- Part 30: Type Aliases
- Part 31: Deconstructing and Attributes
- Part 32: Thread-Local Storage and Volatile
- Part 33: Alignment, Assembly, and Language Linkage
- Part 34: Fold Expressions and Elaborated Type Specifiers
- Part 35: Modules, The New Build Model
- Part 36: Coroutines
- Part 37: Missing Language Features
- Part 38: C Standard Library
- Part 39: Language Support Library
- Part 40: Utilities Library
- Part 41: System Integration Library
- Part 42: Numbers Library
- Part 43: Threading Library
- Part 44: Strings Library
- Part 45: Array Containers Library
- Part 46: Other Containers Library
- Part 47: Containers Library Wrapup
- Part 48: Algorithms Library
- Part 49: Ranges and Parallel Algorithms
- Part 50: I/O Library
- Part 51: Missing Library Features
- Part 52: Idioms and Best Practices
- Part 53: Conclusion
Compiling and Linking
With C#, we compile all our source code files (.cs) into an assembly such as an executable (.exe) or a library (.dll).
With C++, we compile all our translation units (source code files with .cpp, .cxx, .cc, .C, or .c++) into object files (.obj or .o) and then link them together into an executable (app.exe or app), static library (.lib or .a), or dynamic library (.dll or .so).
If any of the source code files change, we recompile just those files to generate new object files and then run the linker on the new object files together with all the unchanged ones.
This model brings up a couple of questions. First, what is an object file? This is known as an “intermediate” file since it’s neither the source code nor an output file like an executable. The C++ language standard doesn’t say anything about what the format of this file is. In practice, it’s a binary file that is specific to a particular version of a particular compiler configured with particular settings. If the compiler, version, or settings change, all the code needs to be rebuilt.
Second, what is the difference between a static library and a dynamic library? A dynamic library is very similar to a .dll class library in C#. It’s a library of machine code, just like an executable. However, it can be loaded and unloaded by an executable or another dynamic library at runtime. A static library, on the other hand, is only consumed at build time, when the linker copies its code into the output, so it can never be unloaded. In this way, it functions more like just another object file.
Because static libraries are available at build time, the linker builds them directly into the resulting executable. This means there’s no need to distribute a separate dynamic library file to end users, no need to open it from the file system separately, and no possibility of overriding its location such as by setting the LD_LIBRARY_PATH environment variable.
Critically for performance, all calls into functions in the static library are just normal function calls. This means there’s no indirection through a pointer that is set at runtime when a dynamic library is loaded. It also means that the linker can perform “link time optimizations” such as inlining these functions.
The main downsides stem from needing the static libraries to be present at build time. This makes them unsuitable for tasks such as loading user-created plugins. Perhaps most importantly for large projects, they must be linked in every build even if just one small source file was changed. Link times grow with the amount of code being linked and can hinder rapid iteration. As a result, sometimes dynamic libraries will be used in development builds and static libraries will be used in release builds.
We won’t discuss the specifics of how to run the compiler and linker in this series. This is heavily dependent on the specific compiler, OS, and game engine being used. Usually game engines or console vendors will provide documentation for this. Also typical is to use an IDE like Microsoft Visual Studio or Xcode that provides a “project” abstraction for managing source code files, compiler settings, and so forth.
Header Files and the Preprocessor
In C#, we add using directives to reference code in other files. C++20 added a comparable “module” system, which we’ll cover in a future article in this series. For now, we’ll pretend that it doesn’t exist and only discuss the way that C++ has traditionally been built.
Header files (.h, .hpp, .hxx, .hh, .H, .h++, or no extension) are by far the most common way for code in one file to reference code in another file. These are simply C++ source code files that are intended to be copy-and-pasted into another C++ source code file. The copy-and-paste operation is performed by the preprocessor.
Just like in C#, preprocessor directives like #if are evaluated before the main phase of compilation. There is no separate preprocessor executable that must be called to produce an intermediate file that the compiler receives. Preprocessing is simply an earlier step for the compiler.
C++ uses a preprocessor directive called #include to copy and paste a header file’s contents into another header file (.h) or a translation unit (.cpp). Here’s how it looks:
// math.h
int Add(int a, int b);

// math.cpp
#include "math.h"
int Add(int a, int b)
{
    return a + b;
}

The #include "math.h" tells the preprocessor to search the directory that math.cpp is in for a file named math.h. If it finds such a file, it reads its contents and replaces the #include directive with them. Otherwise, it searches the “include paths” it’s been configured with. The C++ Standard Library is implicitly searched. If math.h isn’t found in any of these locations, the compiler produces an error.
Afterward, math.cpp looks like this:

int Add(int a, int b);
int Add(int a, int b)
{
    return a + b;
}

Recall from last week’s article that the first Add is a function declaration and the second is a function definition. Since the signatures match, the compiler knows we’re defining the earlier declaration.
So far we’ve split the declaration and definition across two files, but without much benefit. Now let’s make this pay off by adding another translation unit:
// user.cpp
#include "math.h"
int AddThree(int a, int b, int c)
{
    return Add(a, Add(b, c));
}

This shows how user.cpp can add the same #include "math.h" to access the declaration of Add, resulting in this:

int Add(int a, int b);
int AddThree(int a, int b, int c)
{
    return Add(a, Add(b, c));
}

Now the compiler will encounter the declaration of Add and be OK with AddThree calling it even though there’s no definition of Add yet. It simply makes a note in the object file it outputs (user.obj) that Add is an unsatisfied dependency.
When the linker executes, it reads in user.obj and math.obj. math.obj contains the definition of Add and user.obj contains the definition of AddThree. At that point, the linker really needs the definition of Add, so it uses the one it found in math.obj.
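The declare-now, define-later flow can be tried out in a single file. This is only a sketch: the comments mark which lines would normally come from math.h, user.cpp, and math.cpp, and collapsing them into one file is an assumption made to keep the example self-contained.

```cpp
// --- would arrive via #include "math.h" ---
// Declaration only: enough for the compiler to accept calls to Add.
int Add(int a, int b);

// --- would live in user.cpp ---
// Add has been declared but not defined at this point, and that is fine.
int AddThree(int a, int b, int c)
{
    return Add(a, Add(b, c));
}

// --- would live in math.cpp ---
// The definition that satisfies the calls above.
int Add(int a, int b)
{
    return a + b;
}
```

With separate files, the only difference is who matches the call to the definition: here the compiler sees both, while in a real build the linker connects user.obj to math.obj.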
There is an alternative version of #include that’s commonly seen:

#include <math.h>

This version typically skips the including file’s own directory and searches only the configured include paths, which is where the C++ Standard Library and other header files that the compiler provides are found. For example, Microsoft Visual Studio allows #include <windows.h> to make Windows OS calls. This is useful to disambiguate file names that are both in the application’s codebase and provided by the compiler. Imagine this program:
#include "math.h"
bool IsNearlyZero(float val)
{
    return fabsf(val) < 0.000001f;
}

fabsf is a function in the C Standard Library to take the absolute value of a float. When the preprocessor runs with the quotes version of #include it finds our math.h, so we get this:

int Add(int a, int b);
bool IsNearlyZero(float val)
{
    return fabsf(val) < 0.000001f;
}

Then the compiler can’t find fabsf so it errors. Instead, we should use the angle brackets version of #include since we’re looking for the compiler-provided math.h:

#include <math.h>
bool IsNearlyZero(float val)
{
    return fabsf(val) < 0.000001f;
}

This produces what we wanted:

float fabsf(float arg);
// ...and many, many more math function declarations...

bool IsNearlyZero(float val)
{
    return fabsf(val) < 0.000001f;
}
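As a hedged aside, C++ code can also reach these functions through <cmath>, the C++ Standard Library’s counterpart of math.h, which is likewise included with angle brackets. A minimal sketch of the same check, assuming we’re free to switch to std::fabs:

```cpp
#include <cmath> // compiler-provided header, so angle brackets are used

// Same check as above, written with the std::fabs overload for float.
bool IsNearlyZero(float val)
{
    return std::fabs(val) < 0.000001f;
}
```

Because <cmath> has no counterpart in our own codebase, there’s also no name for the quotes version of #include to confuse it with.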
Also note that we can specify paths in the #include that correspond to a directory structure:

#include "utils/math.h"
#include <nlohmann/json.hpp>

Finally, while it’s esoteric and usually best avoided, there is nothing stopping us from using #include to pull in non-header files. We can #include any file as long as the result is legal C++. Sometimes #include is even placed in the middle of a function to fill in part of its body!
ODR and Include Guards
C++ has what it calls the “one definition rule,” commonly abbreviated to ODR. This says that there may be only one definition of something in a translation unit. This includes variables and functions, which presents us with some problems as our codebase grows. Imagine we’ve expanded our math library and added a vector math library on top of it:

// math.h
int Add(int a, int b);
float PI = 3.14f;

// vector.h
#include "math.h"
float Dot(float aX, float aY, float bX, float bY);

// user.cpp
#include "math.h"
#include "vector.h"
int AddThree(int a, int b, int c)
{
    return Add(a, Add(b, c));
}
bool IsOrthogonal(float aX, float aY, float bX, float bY)
{
    return Dot(aX, aY, bX, bY) == 0.0f;
}

Here we have vector.h using #include to pull in math.h. We also have user.cpp using #include to pull in both vector.h and math.h. This is a good practice since it avoids an implicit dependency on math.h that would break if vector.h were ever changed to remove the #include "math.h". Still, we’re about to see that this presents a problem. Let’s look at user.cpp after the preprocessor has replaced the #include "math.h" directive:
int Add(int a, int b);
float PI = 3.14f;

#include "vector.h"

int AddThree(int a, int b, int c)
{
    return Add(a, Add(b, c));
}
bool IsOrthogonal(float aX, float aY, float bX, float bY)
{
    return Dot(aX, aY, bX, bY) == 0.0f;
}

Now the preprocessor replaces the #include "vector.h":

int Add(int a, int b);
float PI = 3.14f;

#include "math.h"
float Dot(float aX, float aY, float bX, float bY);

int AddThree(int a, int b, int c)
{
    return Add(a, Add(b, c));
}
bool IsOrthogonal(float aX, float aY, float bX, float bY)
{
    return Dot(aX, aY, bX, bY) == 0.0f;
}

Finally, it replaces the #include "math.h" from the contents of vector.h that it copied in:

int Add(int a, int b);
float PI = 3.14f;

int Add(int a, int b);
float PI = 3.14f;
float Dot(float aX, float aY, float bX, float bY);

int AddThree(int a, int b, int c)
{
    return Add(a, Add(b, c));
}
bool IsOrthogonal(float aX, float aY, float bX, float bY)
{
    return Dot(aX, aY, bX, bY) == 0.0f;
}

Multiple declarations of the Add function are OK because they’re not definitions so they don’t violate the ODR. The compiler simply ignores the duplicate declarations.
The definition of PI, on the other hand, is most certainly a definition. Having two definitions of the same variable name violates the ODR and we get a compiler error.
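The difference between repeating a declaration and repeating a definition can be checked in one small file. This is a minimal sketch that reuses the article’s names. The commented-out line is the one that would trigger the redefinition error if enabled.

```cpp
// Declarations can be repeated freely within one translation unit.
int Add(int a, int b);
int Add(int a, int b);

// A definition may appear only once per translation unit.
float PI = 3.14f;
// float PI = 3.14f; // uncommenting this line causes a redefinition error

// The single definition of Add, matching the declarations above.
int Add(int a, int b)
{
    return a + b;
}
```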
To work around this, we add what’s called an “include guard” to our header files. There are two basic forms this can take, but both make use of the preprocessor. Here’s the first form in math.h:

#if (!defined MATH_H)
#define MATH_H

int Add(int a, int b);
float PI = 3.14f;

#endif

This makes use of the #if, #define, and #endif directives, which are similar to their C# counterparts. The only real difference in this case is the use of !defined MATH_H in C++ instead of just !MATH_H in C#.
One variant of this is to make use of a C++-only #ifndef MATH_H as a sort of shorthand for #if (!defined MATH_H):

#ifndef MATH_H
#define MATH_H

int Add(int a, int b);
float PI = 3.14f;

#endif

In either case, we choose a naming convention and apply our file name to it to generate a unique identifier for the file. There are many popular forms for this including these:

math_h
MATH_H
MATH_H_
MYGAME_MATH_H

To avoid needing to come up with unique names, all common compilers offer the non-standard #pragma once directive:

#pragma once

int Add(int a, int b);
float PI = 3.14f;

Regardless of the form chosen, let’s look at how this helps avoid the ODR violation. Here’s how user.cpp looks after all the #include directives are resolved (indentation added for clarity):
#ifndef MATH_H
#define MATH_H
    int Add(int a, int b);
    float PI = 3.14f;
#endif

#ifndef VECTOR_H
#define VECTOR_H
    #ifndef MATH_H
    #define MATH_H
        int Add(int a, int b);
        float PI = 3.14f;
    #endif
    float Dot(float aX, float aY, float bX, float bY);
#endif

int AddThree(int a, int b, int c)
{
    return Add(a, Add(b, c));
}
bool IsOrthogonal(float aX, float aY, float bX, float bY)
{
    return Dot(aX, aY, bX, bY) == 0.0f;
}

On the first line (#ifndef MATH_H), the preprocessor finds that MATH_H isn’t defined so it keeps all the code until the #endif. That includes a #define MATH_H, so now it’s defined.
Likewise, the #ifndef VECTOR_H succeeds and allows VECTOR_H to be defined. The nested #ifndef MATH_H, however, fails because MATH_H is now defined. Everything until the matching #endif is stripped out.
In the end, we have this result:

int Add(int a, int b);
float PI = 3.14f;

float Dot(float aX, float aY, float bX, float bY);

int AddThree(int a, int b, int c)
{
    return Add(a, Add(b, c));
}
bool IsOrthogonal(float aX, float aY, float bX, float bY)
{
    return Dot(aX, aY, bX, bY) == 0.0f;
}

The duplicate definition of PI has been effectively removed from the translation unit by the include guard, so we no longer get a compiler error for the ODR violation.
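To watch an include guard work without creating several files, we can paste the guarded header text into one file twice by hand. This is a sketch rather than how a real project is laid out. The second paste uses the long-hand #if !defined form to show that both spellings test the same thing, and CircleArea is a hypothetical helper added only so that PI gets used.

```cpp
// First paste of math.h: MATH_H is not defined yet,
// so everything up to #endif is kept.
#ifndef MATH_H
#define MATH_H
int Add(int a, int b);
float PI = 3.14f;
#endif

// Second paste, using the long-hand test: MATH_H is defined now,
// so the preprocessor strips everything up to #endif.
#if !defined(MATH_H)
#define MATH_H
int Add(int a, int b);
float PI = 3.14f;
#endif

int Add(int a, int b)
{
    return a + b;
}

// Hypothetical helper that uses the single surviving definition of PI.
float CircleArea(float radius)
{
    return PI * radius * radius;
}
```

Deleting the guard lines from both pastes brings back the redefinition error for PI.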
Inline
Even with the ODR compiler error fixed, we still have a problem: a linker error. The reason for this is that the vector.cpp translation unit also contains a copy of PI. Here’s how it looks originally:

#include "vector.h"
float Dot(float aX, float aY, float bX, float bY)
{
    return aX*bX + aY*bY;
}

Here it is after the preprocessor resolves the #include directives:

#ifndef VECTOR_H
#define VECTOR_H
    #ifndef MATH_H
    #define MATH_H
        int Add(int a, int b);
        float PI = 3.14f;
    #endif
    float Dot(float aX, float aY, float bX, float bY);
#endif

float Dot(float aX, float aY, float bX, float bY)
{
    return aX*bX + aY*bY;
}

Remember that each translation unit is compiled separately. In this translation unit, MATH_H and VECTOR_H have not been set with #define as they were in the user.cpp translation unit. So both of the include guards succeed and we get this:

int Add(int a, int b);
float PI = 3.14f;
float Dot(float aX, float aY, float bX, float bY);

float Dot(float aX, float aY, float bX, float bY)
{
    return aX*bX + aY*bY;
}

That’s great for the purposes of compiling this translation unit since there are no duplicate definitions to violate the ODR. Compilation will succeed, but linking will fail.
The reason for the linker error is that, by default, we can’t have duplicate definitions of PI at link time either. If we want to do that, we need to add the inline keyword to PI to tell the compiler that multiple definitions should be allowed. That’ll result in these translation units:

// user.cpp
int Add(int a, int b);
inline float PI = 3.14f;
float Dot(float aX, float aY, float bX, float bY);

int AddThree(int a, int b, int c)
{
    return Add(a, Add(b, c));
}
bool IsOrthogonal(float aX, float aY, float bX, float bY)
{
    return Dot(aX, aY, bX, bY) == 0.0f;
}

// vector.cpp
int Add(int a, int b);
inline float PI = 3.14f;
float Dot(float aX, float aY, float bX, float bY);

float Dot(float aX, float aY, float bX, float bY)
{
    return aX*bX + aY*bY;
}
It may seem strange that inline is a keyword applied to variables. The historical reason for this is that it was originally a hint to the compiler that it should inline functions but, like the register keyword, this was non-binding and virtually always ignored. It’s come to mean “multiple definitions are allowed” instead, so since C++17 it can be applied to variables as well as functions.
For example, we could add a function definition to math.h as long as it’s inline:

inline int Sub(int a, int b)
{
    return a - b;
}

This is often avoided though because any change to the function will require recompiling all of the translation units that include it, directly or indirectly, which may take quite a while in a big codebase.
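Putting the two uses of inline side by side, here is a sketch of what a header could contain, assuming a compiler set to C++17 or later for the inline variable. Circumference is a hypothetical caller that stands in for code in any translation unit that includes the header.

```cpp
// --- lines a header such as math.h could contain ---

// Inline variable (C++17): every translation unit that includes the header
// gets this definition, and the linker merges them into one variable.
inline float PI = 3.14f;

// Inline function: the same "multiple definitions are allowed" rule applies.
inline int Sub(int a, int b)
{
    return a - b;
}

// --- hypothetical code in a translation unit that includes the header ---
float Circumference(float radius)
{
    return 2.0f * PI * radius;
}
```

A single file can’t show the linker merging the copies, but building two .cpp files that both include such a header should now link without a duplicate definition error.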
Linkage
Finally for today, C++ has the concept of “linkage.” By default, variables like PI have external linkage. This means they can be referenced by other translation units. For example, say we added a variable to math.cpp:

float SQRT2 = 1.4f;

Now say we want to reference it from user.cpp. The #include "math.h" won’t work because SQRT2 is in math.cpp, not math.h. We can still reference it using the extern keyword:

extern float SQRT2;
float GetDiagonalOfSquare(float widthOrHeight)
{
    return SQRT2 * widthOrHeight;
}
This is similar to a function declaration in that we’re telling the compiler to trust us and pretend a float exists with the name SQRT2. So when it compiles user.cpp it makes a note in the user.obj object file that we haven’t yet satisfied the dependency for SQRT2. When the compiler compiles math.cpp, it makes a note that there is a float named SQRT2 available for linking.
Later on, the linker runs and reads in user.obj as well as all the other object files including math.obj. While processing user.obj, it reads that note from the compiler saying that the definition of SQRT2 is missing and it goes looking through the other object files to find it. Lo and behold, it finds a note in math.obj saying that there’s a float named SQRT2 so the linker makes GetDiagonalOfSquare refer to that variable.
Quick note: the extern keyword can also be applied in math.cpp, but this has no effect since external linkage is the default. Still, here’s how it’d look:

extern float SQRT2 = 1.4f;

One way to prevent this behavior is to add the static keyword to SQRT2. This changes the linkage to “internal” and prevents the compiler from adding that note to math.obj to say that a float variable named SQRT2 is available for linking.

static float SQRT2 = 1.4f;

Now if we try to link user.obj and math.obj, the linker can’t find any available definition of SQRT2 in any of the object files so it produces an error.
Both extern and static can be used with functions, too. For example:

// math.cpp
int Sub(int a, int b)
{
    return a - b;
}
static int Mul(int a, int b)
{
    return a * b;
}

// user.cpp
extern int Sub(int a, int b);
int SubThree(int a, int b, int c)
{
    return Sub(Sub(a, b), c);
}
extern int Mul(int a, int b); // compiles, but calling Mul causes a linker error: Mul is `static` in math.cpp
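The linkage rules above can be sketched in a single file, with comments marking which lines would sit in math.cpp and which in user.cpp. Square is a hypothetical function added here to show that an internal-linkage helper like Mul is still usable from inside its own translation unit. Merging the two files is an assumption made to keep the sketch self-contained.

```cpp
// --- would live in math.cpp ---
float SQRT2 = 1.4f;            // external linkage: visible to other translation units

static int Mul(int a, int b)   // internal linkage: private to this translation unit
{
    return a * b;
}

int Square(int x)              // external linkage, and free to call the private helper
{
    return Mul(x, x);
}

// --- would live in user.cpp ---
extern float SQRT2;            // declaration only: the linker supplies the definition
extern int Square(int x);      // extern is optional for function declarations

float GetDiagonalOfSquare(float widthOrHeight)
{
    return SQRT2 * widthOrHeight;
}
```

If these really were two files, user.cpp could call Square and read SQRT2, but any attempt to call Mul from there would fail at link time.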
Conclusion
Today we’ve seen C++’s very different approach to building source code. The “compile then link” approach combined with header files has domino effects into the ODR, linkage, and include guards. We’ll go into C++20’s module system that solves a lot of these problems and results in a much more C#-like build model later on in the series, but header files will still be very relevant even with modules. There’s also a lot more detail to go into with respect to the ODR and linkage, but we’ll cover that incrementally as we introduce more language concepts like templates and thread-local variables.