C++ For C# Developers: Part 24 – Preprocessor
C# and C++ have similar lists of preprocessor directives like #if, but their features and usage are very different. This is especially the case in C++ with its support for “macros” that can replace code. Today we’ll look into everything we can use the preprocessor for in C++ and compare it with C#’s preprocessor.
Conditionals
Just like in C#, the C++ preprocessor runs at an early stage of compilation. This is after the bytes of the file are interpreted as characters and comments are removed, but before the main compilation of language concepts like variables and functions. The preprocessor therefore has a very limited understanding of the source code.
It takes this limited understanding of the source code and makes textual substitutions to it. When it’s done, the resulting source code is compiled.
One common use of this in C# is the set of conditional “directives”: #if, #else, #elif, and #endif. These allow branching logic to take place during the preprocessing step of compilation. C# allows for logic on boolean preprocessor symbols:
// C#
static void Assert(bool condition)
{
#if DEBUG && (ASSERTIONS_ENABLED == true)
    if (!condition)
    {
        throw new Exception("Assertion failed");
    }
#endif
}
If the #if expression evaluates to false then the code between the #if and #endif is removed:
// C#
static void Assert(bool condition)
{
}
This helps us reduce the size of the generated executable and improve run-time performance by removing instructions and memory accesses.
One common mistake is to assume the preprocessor understands more about the structure of the source code than it really does. For example, we might assume that it understands what identifiers are:
// C#
void Foo()
{
#if Foo
    DebugLog("Foo exists");
#else
    DebugLog("Foo does not exist"); // Gets printed
#endif
}
C++ has similar support for preprocessor conditionals. They’re even named #if, #else, #elif, and #endif. The above C# examples are actually valid C++!
The two languages differ in a few minor ways. First, #if ABC in C++ checks the value of ABC, not just whether it’s defined. An identifier that isn’t defined at all is treated as 0.
// Assume the value of ZERO is 0
#if ZERO
    DebugLog("zero");
#else
    DebugLog("non-zero"); // Gets printed
#endif
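As a minimal sketch of this rule, the following shows which branch the preprocessor keeps in three cases. The symbols ZERO, ONE, and NOT_DEFINED and the Describe functions are hypothetical names made up for illustration:

```cpp
#include <string>

// Hypothetical symbols for illustration
#define ZERO 0
#define ONE 1
// NOT_DEFINED is intentionally never defined

// Each function returns the branch that survived preprocessing
std::string DescribeZero()
{
#if ZERO
    return "true branch";
#else
    return "false branch"; // ZERO is defined, but its value is 0
#endif
}

std::string DescribeOne()
{
#if ONE
    return "true branch"; // ONE has a non-zero value
#else
    return "false branch";
#endif
}

std::string DescribeUndefined()
{
#if NOT_DEFINED
    return "true branch";
#else
    return "false branch"; // Undefined identifiers are treated as 0
#endif
}
```

Some compilers can warn about the last case (for example via an "undefined identifier in #if" warning), which is a hint that the check probably wanted to test for definition instead.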
There are a couple of ways to check whether a symbol is defined rather than what its value is. First, we can use the preprocessor’s defined operator:
#if defined(ZERO) // evaluates to 1, which is true
    DebugLog("zero"); // Gets printed
#else
    DebugLog("non-zero");
#endif

// Alternate version without parentheses
#if defined ZERO // evaluates to 1, which is true
    DebugLog("zero"); // Gets printed
#else
    DebugLog("non-zero");
#endif
The other way is to use #ifdef and #ifndef instead of #if:
#ifdef ZERO // ZERO is defined. Its value is irrelevant.
    DebugLog("zero"); // Gets printed
#else
    DebugLog("non-zero");
#endif

#ifndef ZERO // Check if NOT defined
    DebugLog("zero");
#else
    DebugLog("non-zero"); // Gets printed
#endif
These checks are commonly used to implement header guards. Since C++17, #if can also be used with __has_include, which evaluates to 1 if a header exists and 0 if it doesn’t. This is often used to check whether optional libraries are available or to choose from one of several equivalent libraries:
// If the system provides DebugLog via the debug_log.h header
#if __has_include(<debug_log.h>)
    // Use the system-provided DebugLog
    #include <debug_log.h>
// The system does not provide DebugLog
#else
    // Define our own version using puts from the C Standard Library
    #include <cstdio>
    void DebugLog(const char* message)
    {
        puts(message);
    }
#endif
This __has_include(<header_name>) check uses the same header file search that #include <header_name> would. To check the header file search of #include "header_name", we can use __has_include("header_name").
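Here is a small sketch of both forms that can be compiled on its own. The DEMO_ symbols, the two functions, and the deliberately missing header name are assumptions made up for this example. Since __has_include itself is only guaranteed from C++17 on, the sketch first checks that it's available:

```cpp
// Only use __has_include if the compiler provides it
#ifdef __has_include
    // Angle-bracket form: same search as #include <cstdio>
    #if __has_include(<cstdio>)
        #include <cstdio>
        #define DEMO_HAS_CSTDIO 1
    #endif
    // Quoted form: same search as #include "..."
    // This hypothetical header is assumed not to exist anywhere
    #if __has_include("demo_header_that_should_not_exist.h")
        #define DEMO_HAS_MISSING_HEADER 1
    #endif
#endif

// Fall back to 0 for anything that wasn't found
#ifndef DEMO_HAS_CSTDIO
    #define DEMO_HAS_CSTDIO 0
#endif
#ifndef DEMO_HAS_MISSING_HEADER
    #define DEMO_HAS_MISSING_HEADER 0
#endif

constexpr bool HasCstdio() { return DEMO_HAS_CSTDIO != 0; }
constexpr bool HasMissingHeader() { return DEMO_HAS_MISSING_HEADER != 0; }
```

Turning the result into a 0-or-1 symbol like this is one way to keep the rest of the code free of nested preprocessor checks.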
Macros
Both languages allow defining preprocessor symbols with #define and un-defining them with #undef:
// Define a preprocessor symbol
#define ENABLE_LOGGING

void LogError(const char* message)
{
    // Check if the preprocessing symbol is defined
    // It is, so the DebugLog remains
#ifdef ENABLE_LOGGING
    DebugLog("ERROR", message);
#endif
}

// Un-define the preprocessor symbol
#undef ENABLE_LOGGING

void LogTrace(const char* message)
{
    // Check if the preprocessing symbol is defined
    // It isn't, so the DebugLog is removed
#ifdef ENABLE_LOGGING
    DebugLog("TRACE", message);
#endif
}

void Foo()
{
    LogError("whoops"); // Prints "ERROR whoops"
    LogTrace("got here"); // Nothing printed
}
C# requires #define and #undef to appear only at the top of the file, but C++ allows them anywhere.
C++ also goes way beyond these simple preprocessor symbol definitions. It has a full “macro” system that allows for textual substitution. While this is generally discouraged in “Modern C++,” its use is still ubiquitous for certain tasks. Sometimes it’s used when the language doesn’t provide a viable alternative or at least didn’t when the code was written. Regardless, macros are widely used and it’s important to know how they work.
First, we can define an “object-like” macro by providing a value to the preprocessor symbol. Unlike C#, the value doesn’t have to be a boolean:
// Define object-like macros
#define LOG_LEVEL 1
#define LOG_LEVEL_ERROR 3
#define LOG_LEVEL_WARNING 2
#define LOG_LEVEL_DEBUG 1

void LogWarning(const char* message)
{
    // The preprocessor symbol can be used in #if expressions
#if LOG_LEVEL <= LOG_LEVEL_WARNING
    // The preprocessor symbol will be replaced with its value
    DebugLog(LOG_LEVEL_WARNING, message);

    // After preprocessing, the previous line becomes:
    // DebugLog(2, message);
#endif
}
We can also define “function-like” macros that take parameters:
// Define a function-like macro
#define MADD(x, y, z) x*y + z

void Foo()
{
    int32_t x = 2;
    int32_t y = 3;
    int32_t z = 4;

    // Call the function-like macro
    int32_t result = MADD(x, y, z);

    // After preprocessing, the previous line becomes:
    // int32_t result = x*y + z;

    DebugLog(result); // 10
}
Unlike a runtime function call, calling a function-like macro simply performs textual substitution. It’s easy to forget this, especially when the macro is named like a normal function. This can lead to bugs and performance problems because argument expressions aren’t evaluated before the macro is called:
// Function-like macro named like a normal function, not ALL_CAPS
#define square(x) x*x

int32_t SumOfRandomNumbers(int32_t n)
{
    int32_t sum = 0;
    for (int32_t i = 0; i < n; ++i)
    {
        sum += rand();
    }
    return sum;
}

void Foo()
{
    // Call a very expensive function
    int32_t result = square(SumOfRandomNumbers(1000000));

    // After preprocessing, the previous line becomes:
    // int32_t result = SumOfRandomNumbers(1000000)*SumOfRandomNumbers(1000000);

    DebugLog(result); // {some random number}
}
With a normal function call, SumOfRandomNumbers(1000000) would be evaluated before the function is called. With macros, it’s just textually replaced so square ends up making two calls to it. The call is very expensive, so we have a performance problem. It’s also a bug because we’re no longer necessarily multiplying the same number by itself since the two calls may return different numbers.
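The double evaluation can be observed directly by counting calls. The following is only a sketch: SQUARE_MACRO, SquareFunction, NextValue, and the counter are hypothetical names invented for this example, and the macro adds parentheses around its parameter as a common precaution against operator-precedence surprises:

```cpp
#include <cstdint>

// Function-like macro: the argument text is pasted in twice
#define SQUARE_MACRO(x) ((x) * (x))

// Ordinary function: the argument is evaluated once, before the call
inline int32_t SquareFunction(int32_t x)
{
    return x * x;
}

// Hypothetical helper that counts how many times it gets called
int32_t g_callCount = 0;
int32_t NextValue()
{
    ++g_callCount;
    return 3;
}

// Returns the number of NextValue calls made via the macro
int32_t CallsMadeByMacro()
{
    g_callCount = 0;
    int32_t result = SQUARE_MACRO(NextValue()); // NextValue() * NextValue()
    (void)result;
    return g_callCount;
}

// Returns the number of NextValue calls made via the function
int32_t CallsMadeByFunction()
{
    g_callCount = 0;
    int32_t result = SquareFunction(NextValue());
    (void)result;
    return g_callCount;
}
```

CallsMadeByMacro returns 2 while CallsMadeByFunction returns 1, which is the whole difference between textual substitution and a real call.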
To see more clearly how bugs arise, consider this macro call:
void Foo()
{
    int32_t i = 1;
    int32_t result = square(++i);

    // After preprocessing, the previous line becomes:
    // int32_t result = ++i*++i;

    DebugLog(result, i); // Undefined behavior: might print 6, 3 or 9, 3 or something else
}
Again, the argument (++i) isn’t evaluated before the macro call but rather just repeated every time the macro refers to the parameter. This means i is modified twice within a single expression and the two increments aren’t sequenced relative to each other, which is undefined behavior in C++. One compiler might increment i from 1 to 2, then again to 3, before the multiplication (*) produces 2*3=6. Another might perform both increments first and produce 3*3=9. Neither result is guaranteed. If this were a function call, we’d reliably get 2*2=4 and the value of i would be 2 afterward. These potential bugs are one reason why macros are discouraged.
Function-like macros have access to a couple of special operators: # and ##. The # operator wraps an argument in quotes to create a string literal:
// Wrap msg in quotes to create "msg"
#define LOG_TIMESTAMPED(msg) DebugLog(GetTimestamp(), #msg);

void Foo()
{
    // No need for quotes. hello becomes "hello".
    LOG_TIMESTAMPED(hello) // {timestamp} hello

    // Extra quotes are added and existing quotes are escaped: "\"hello\""
    LOG_TIMESTAMPED("hello") // {timestamp} "hello"
}
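To see exactly what text the # operator produces, here is a minimal sketch that returns the resulting string literals instead of logging them. STRINGIZE and the three functions are hypothetical names for this example:

```cpp
#include <string>

// Turn the argument's text into a string literal
#define STRINGIZE(x) #x

// An identifier becomes a string containing its name
std::string StringizeIdentifier()
{
    return STRINGIZE(hello); // "hello"
}

// An expression is not evaluated. Its text is used as-is.
std::string StringizeExpression()
{
    return STRINGIZE(1 + 2); // "1 + 2"
}

// Quotes in the argument are escaped, so they become part of the string
std::string StringizeLiteral()
{
    return STRINGIZE("hello"); // "\"hello\""
}
```

The last function returns seven characters: the five letters plus the two quote characters that were escaped.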
The ## operator is used to concatenate two tokens, either of which may be an argument:
// Each line concatenates some literal text (e.g. m_) with the value of name
// Backslashes are used to make a multi-line macro
#define PROP(type, name) \
    private: type m_##name; \
    public: type Get##name() const { return m_##name; } \
    public: void Set##name(const type & val) { m_##name = val; }

struct Vector2
{
    PROP(float, X)
    PROP(float, Y)

    // These macro calls are replaced with:
    // private: float m_X;
    // public: float GetX() const { return m_X; }
    // public: void SetX(const float & val) { m_X = val; }
    // private: float m_Y;
    // public: float GetY() const { return m_Y; }
    // public: void SetY(const float & val) { m_Y = val; }
};

void Foo()
{
    Vector2 vec;
    vec.SetX(2);
    vec.SetY(4);
    DebugLog(vec.GetX(), vec.GetY()); // 2, 4
}
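A smaller sketch of the same idea may make the pasting easier to follow. DEFINE_COUNTER and the names it generates are hypothetical and exist only for this example:

```cpp
#include <cstdint>

// Paste the name argument into the middle of a variable name (g_<name>Count)
// and onto the end of a function name (Increment<name>)
#define DEFINE_COUNTER(name) \
    int32_t g_##name##Count = 0; \
    int32_t Increment##name() { return ++g_##name##Count; }

// Generates g_HitCount and IncrementHit()
DEFINE_COUNTER(Hit)

// Generates g_MissCount and IncrementMiss()
DEFINE_COUNTER(Miss)
```

Each invocation produces independent names, so calling IncrementHit() twice leaves g_HitCount at 2 while g_MissCount is still 0.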
Macros may also take a variable number of parameters using ..., similar to functions. __VA_ARGS__ is used to access the arguments:
#define LOG_TIMESTAMPED(level, ...) DebugLog(level, GetTimestamp(), __VA_ARGS__);

void Foo()
{
    LOG_TIMESTAMPED("DEBUG", "hello", "world") // DEBUG {timestamp} hello world

    // This macro call is replaced by:
    // DebugLog("DEBUG", GetTimestamp(), "hello", "world");
}
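Since DebugLog and GetTimestamp aren't defined here, the following sketch forwards the variable arguments to std::snprintf from the C Standard Library instead. FORMAT_INTO and FormatPoint are hypothetical names, and the sketch assumes the caller always passes at least one argument after the format string:

```cpp
#include <cstdio>
#include <string>

// Forward everything after format to snprintf
// buffer must be a char array so that sizeof gives its capacity
#define FORMAT_INTO(buffer, format, ...) \
    std::snprintf(buffer, sizeof(buffer), format, __VA_ARGS__)

std::string FormatPoint(int x, int y)
{
    char buffer[64];

    // Expands to: std::snprintf(buffer, sizeof(buffer), "(%d, %d)", x, y)
    FORMAT_INTO(buffer, "(%d, %d)", x, y);

    return buffer;
}
```

Calling FormatPoint(2, 4) yields the string (2, 4).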
In C++20, __VA_OPT__(x) is also available. If __VA_ARGS__ is empty, it’s replaced by nothing. If __VA_ARGS__ isn’t empty, it’s replaced by x. This can be used to make parameters in macros like LOG_TIMESTAMPED optional:
// __VA_OPT__(,) adds a comma only if __VA_ARGS__ isn't empty, meaning the
// caller passed some log messages
#define LOG_TIMESTAMPED(...) DebugLog(GetTimestamp() __VA_OPT__(,) __VA_ARGS__);

void Foo()
{
    LOG_TIMESTAMPED() // {timestamp}
    LOG_TIMESTAMPED("hello", "world") // {timestamp} hello world

    // These macro calls are replaced by:
    // DebugLog(GetTimestamp() );
    // DebugLog(GetTimestamp() , "hello", "world");
}
Without __VA_OPT__, we wouldn’t know if the macro should put a , or not because we wouldn’t know if there are any arguments to pass after it.
Built-in Macros and Feature-Testing
Just like how C# pre-defines the DEBUG and TRACE preprocessor symbols, C++ pre-defines some object-like macros:
Name | Value | Meaning |
---|---|---|
__cplusplus | 199711L (C++98 and C++03), 201103L (C++11), 201402L (C++14), 201703L (C++17), 202002L (C++20) | C++ language version |
__STDC_HOSTED__ | 1 or 0 | 1 if there is an OS, 0 if not |
__FILE__ | "mycode.cpp" | Name of the current file |
__LINE__ | 38 | Current line number |
__DATE__ | "Oct 26 2020" | Date the code was compiled, formatted as "Mmm dd yyyy" |
__TIME__ | "02:00:00" | Time the code was compiled, formatted as "hh:mm:ss" |
__STDCPP_DEFAULT_NEW_ALIGNMENT__ | 8 (varies by platform) | Default alignment of new. Only in C++17 and up. |
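As a quick sketch, these macros can be used anywhere their values would be valid. The function names below are hypothetical, and the actual values depend on the compiler, its settings, and when and where the code is built:

```cpp
#include <string>

// The language version is a long integer literal such as 201703L
long LanguageVersion()
{
    return __cplusplus;
}

// The line number is an integer literal
int CurrentLine()
{
    return __LINE__;
}

// The file name, date, and time are string literals
std::string CurrentFile()
{
    return __FILE__;
}

std::string BuildDate()
{
    return __DATE__; // "Mmm dd yyyy" is always 11 characters
}

std::string BuildTime()
{
    return __TIME__; // "hh:mm:ss" is always 8 characters
}
```

Note that some compilers report an old __cplusplus value unless asked to do otherwise, so treat it as a hint that should be verified on the toolchains in use.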
Since C++20, there are a ton of "feature test" macros available. Macros for language features, such as __cpp_char8_t, are pre-defined by the compiler. Macros for Standard Library features, such as __cpp_lib_byte, are defined in the <version> header file. These are all object-like and their values are the year and month that the language or Standard Library feature was added to C++ or last updated. A macro is only defined when its feature is supported, so the intention is to check whether it's defined and, when a particular revision of the feature is needed, to compare its value to that revision's date. There are way too many to list here, but the following shows a couple in action:
#include <version>

void Foo()
{
#ifdef __cpp_char8_t
    DebugLog("char8_t is supported in the language");
#else
    DebugLog("char8_t is NOT supported in the language");
#endif

#if defined(__cpp_lib_byte) && __cpp_lib_byte >= 201603L
    DebugLog("std::byte is supported in the Standard Library");
#else
    DebugLog("std::byte is NOT supported in the Standard Library");
#endif
}
A complete list is available in the C++ Standard's definition of the <version> header file.
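Feature-test macros are usually checked with #ifdef or compared against a specific date value. The sketch below does both and falls back to 0 when a macro isn't defined. The function names are hypothetical, and because <version> may be missing on older toolchains, it's only included when __has_include can find it:

```cpp
#include <cstddef> // declares std::byte in C++17 and later

// Include <version> only when it's present
#ifdef __has_include
    #if __has_include(<version>)
        #include <version>
    #endif
#endif

// Value of the language feature-test macro, or 0 when it isn't defined
constexpr long Char8TVersion()
{
#ifdef __cpp_char8_t
    return __cpp_char8_t;
#else
    return 0;
#endif
}

// Value of the library feature-test macro, or 0 when it isn't defined
constexpr long StdByteVersion()
{
#ifdef __cpp_lib_byte
    return __cpp_lib_byte;
#else
    return 0;
#endif
}

// A feature is supported when its macro is defined with a late-enough date
constexpr bool HasChar8T() { return Char8TVersion() >= 201811L; }
constexpr bool HasStdByte() { return StdByteVersion() >= 201603L; }
```

The results depend entirely on the compiler and the language version it's asked to target, so no particular output should be expected.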
Miscellaneous Directives
The pre-defined __FILE__ and __LINE__ values can be overridden by another preprocessor directive: #line. This works just like in C# except that default and hidden aren't allowed:
void Foo()
{
    DebugLog(__FILE__, __LINE__); // main.cpp, 38

#line 100
    DebugLog(__FILE__, __LINE__); // main.cpp, 100

#line 200 "custom.cpp"
    DebugLog(__FILE__, __LINE__); // custom.cpp, 200
}
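The numbering rule is that #line sets the number of the line that follows it. Here is a minimal sketch with hypothetical function names that returns the overridden values so they can be inspected:

```cpp
#include <string>

int LineAfterOverride()
{
#line 100
    return __LINE__; // This source line is now considered line 100
}

std::string FileAfterOverride()
{
#line 200 "custom.cpp"
    return __FILE__; // This source line is now considered custom.cpp line 200
}
```

The override stays in effect for the rest of the file, so compiler errors and warnings after this point will also be reported against custom.cpp. That's useful for code generators that want errors to point at their input files rather than the generated C++.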
#error can be used to make the compiler produce an error:
#ifndef _MSC_VER
    #error Only Visual Studio is supported
#endif
#pragma is used to allow compilers to provide their own preprocessor directives, just like in C#:
// mathutils.h

// Compiler-specific alternative to header guards
#pragma once

float SqrMagnitude(const Vector2& vec)
{
    return vec.X*vec.X + vec.Y*vec.Y;
}
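For comparison, the portable header guard is built from #ifndef, #define, and #endif. The sketch below pastes the contents of a hypothetical mathutils.h twice into one file to simulate it being included twice. The MATHUTILS_H symbol is an assumption, and any unique name would work:

```cpp
// First "inclusion": MATHUTILS_H isn't defined yet, so the contents are kept
#ifndef MATHUTILS_H
#define MATHUTILS_H

struct Vector2
{
    float X;
    float Y;
};

inline float SqrMagnitude(const Vector2& vec)
{
    return vec.X*vec.X + vec.Y*vec.Y;
}

#endif // MATHUTILS_H

// Second "inclusion": MATHUTILS_H is now defined, so everything is skipped
// Without the guard, this would be a redefinition error
#ifndef MATHUTILS_H
#define MATHUTILS_H

struct Vector2
{
    float X;
    float Y;
};

inline float SqrMagnitude(const Vector2& vec)
{
    return vec.X*vec.X + vec.Y*vec.Y;
}

#endif // MATHUTILS_H
```

The function is marked inline here so that the header could also be included from several source files without causing duplicate definitions at link time.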
_Pragma("expr") can be used instead of #pragma expr. It has exactly the same effect:
_Pragma("once")
C#'s #region and #endregion aren't supported in C++, but compilers like Visual Studio allow them via #pragma:
#pragma region Math
float SqrMagnitude(const Vector2& vec);
float Dot(const Vector2& a, const Vector2& b);
#pragma endregion Math
Usage and Alternatives
Each new version of C++ makes usage of the preprocessor less necessary. For example, C++11 introduced constexpr variables which removed a lot of the reasons to use object-like macros:
// Before C++11
#define PI 3.14f

// After C++11
constexpr float PI = 3.14f;
This made PI an actual object, so it has a type (float), its address can be taken (&PI), and it can generally be used like other objects rather than as a textually-replaced float literal. The benefits become much greater with struct types, lambda classes, and other non-primitives where it's not really possible to make a macro for general use:
// Before C++11
// This isn't usable in many contexts like Foo(EXPONENTIAL_BACKOFF_TIMES)
#define EXPONENTIAL_BACKOFF_TIMES { 1000, 2000, 4000, 8000, 16000 }

// After C++11
// This works like any array object:
constexpr int32_t ExponentialBackoffTimes[] = { 1000, 2000, 4000, 8000, 16000 };
Likewise, constexpr and consteval functions have removed a lot of the need for function-like macros:
constexpr int32_t Square(int32_t x)
{
    return x * x;
}

void Foo()
{
    int32_t i = 1;
    int32_t result = Square(++i);
    DebugLog(result); // 4
}
These behave like regular functions rather than textual substitution. We skip all the bugs and performance problems that macros might cause but keep the compile-time evaluation. We can even force compile-time evaluation in C++20 with consteval. We get strong typing, so Square("FOO") is an error. We can use the function at run-time, not just compile time. It behaves like any other function: we can take function pointers, we can create member functions, and so forth.
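A few of these properties can be checked in a short sketch. IncrementsOnce and CallThroughPointer are hypothetical helpers added for illustration:

```cpp
#include <cstdint>

constexpr int32_t Square(int32_t x)
{
    return x * x;
}

// Compile-time evaluation: this is checked while compiling
static_assert(Square(3) == 9, "Square is usable at compile time");

// Run-time evaluation: ++i is evaluated exactly once before the call
bool IncrementsOnce()
{
    int32_t i = 1;
    int32_t result = Square(++i);
    return result == 4 && i == 2;
}

// Unlike a macro, the function has an address that can be stored and called
int32_t CallThroughPointer(int32_t x)
{
    int32_t (*squarePtr)(int32_t) = Square;
    return squarePtr(x);
}
```

None of these would work with the square macro: it has no address, and passing ++i to it modifies i twice.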
Still, macros provide a sort of escape hatch for when we simply can't express something without raw textual substitution. The PROP macro example above generates members with access specifiers. There's no way to do that otherwise. That example might not be the best idea, but others really are. A classic example is an assertion macro:
// When assertions are enabled, define ASSERT as a macro that tests a boolean
// and logs and terminates the program when it's false.
#ifdef ENABLE_ASSERTS
    #define ASSERT(x) \
        if (!(x)) \
        { \
            DebugLog("assertion failed"); \
            std::terminate(); \
        }
// When assertions are disabled, assert does nothing
#else
    #define ASSERT(x)
#endif

bool IsSorted(const float* vals, int32_t length)
{
    for (int32_t i = 1; i < length; ++i)
    {
        if (vals[i] < vals[i-1])
        {
            return false;
        }
    }
    return true;
}

float GetMedian(const float* vals, int32_t length)
{
    ASSERT(vals != nullptr);
    ASSERT(length > 0);
    ASSERT(IsSorted(vals, length));
    if ((length & 1) == 1)
    {
        return vals[length / 2]; // odd
    }
    float a = vals[length / 2 - 1];
    float b = vals[length / 2];
    return (a + b) / 2;
}

void Foo()
{
    float oddVals[] = { 1, 3, 3, 6, 7, 8, 9 };
    DebugLog(GetMedian(oddVals, 7));

    float evenVals[] = { 1, 2, 3, 4, 5, 6, 8, 9 };
    DebugLog(GetMedian(evenVals, 8));

    DebugLog(GetMedian(nullptr, 1));

    // Standard C++ doesn't allow zero-length arrays, so pass a length of 0
    float emptyVals[1] = {};
    DebugLog(GetMedian(emptyVals, 0));

    float notSortedVals[] = { 3, 2, 1 };
    DebugLog(GetMedian(notSortedVals, 3));
}
Calling ASSERT with assertions enabled performs the following replacement:
ASSERT(IsSorted(vals, length));

// Becomes:
if (!(IsSorted(vals, length)))
{
    DebugLog("assertion failed");
    std::terminate();
}
When disabled, everything's removed including the expressions passed as arguments:
ASSERT(IsSorted(vals, length));

// Becomes just the trailing semicolon, which is an empty statement:
;
Now imagine we had used a constexpr function instead of a macro:
#ifdef ENABLE_ASSERTS
    constexpr void ASSERT(bool x)
    {
        if (!x)
        {
            DebugLog("assertion failed");
            std::terminate();
        }
    }
#else
    constexpr void ASSERT(bool x)
    {
    }
#endif
When assertions are disabled, we get the empty constexpr function:
constexpr void ASSERT(bool x) { }
But when we call ASSERT the arguments still need to be evaluated even though the function itself does nothing:
ASSERT(IsSorted(vals, length));

// Is equivalent to:
bool x = IsSorted(vals, length);
ASSERT(x); // does nothing
The compiler might be able to determine that the call to IsSorted has no side effects and can be safely removed. In many cases, it won't be able to make this determination and an expensive call to IsSorted will still take place. We don't want this to happen, so we use a macro.
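The difference can be made visible by giving the check a side effect and counting how often it runs. This is only a sketch with assertions disabled: ASSERT_MACRO, AssertFunction, ExpensiveCheck, and the counter are hypothetical names chosen to avoid clashing with the ASSERT examples above:

```cpp
#include <cstdint>

// Disabled assertion as a macro: the argument text is discarded entirely
#define ASSERT_MACRO(x)

// Disabled assertion as a function: the argument must still be evaluated
inline void AssertFunction(bool)
{
}

// Hypothetical stand-in for an expensive check like IsSorted
int32_t g_checkCount = 0;
bool ExpensiveCheck()
{
    ++g_checkCount;
    return true;
}

// Returns how many times the check ran when passed to the macro
int32_t ChecksRunByMacro()
{
    g_checkCount = 0;
    ASSERT_MACRO(ExpensiveCheck());
    return g_checkCount;
}

// Returns how many times the check ran when passed to the function
int32_t ChecksRunByFunction()
{
    g_checkCount = 0;
    AssertFunction(ExpensiveCheck());
    return g_checkCount;
}
```

ChecksRunByMacro returns 0 and ChecksRunByFunction returns 1. Because the counter makes the side effect observable, the compiler isn't allowed to optimize the call away in the function version.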
Macros can also be used to implement a primitive form of C# generics or C++ templates, which we'll cover soon in the series:
// "Generic"/"template" of a Vector2 class
#define DEFINE_VECTOR2(name, type) \
    struct name \
    { \
        type X; \
        type Y; \
    };

// Invoke the macro to generate Vector2 classes
DEFINE_VECTOR2(Vector2f, float);
DEFINE_VECTOR2(Vector2d, double);

// "Generic"/"template" of a function
#define DEFINE_MADD(type) \
    type Madd(type x, type y, type z) \
    { \
        return x*y + z; \
    }

// Invoke the macro to generate Madd functions
DEFINE_MADD(float);
DEFINE_MADD(int32_t);

void Foo()
{
    // Use the generated Vector2 classes
    // Use sizeof to show that they have different component sizes
    Vector2f v2f{2, 4};
    DebugLog(sizeof(v2f), v2f.X, v2f.Y); // 8, 2, 4
    Vector2d v2d{20, 40};
    DebugLog(sizeof(v2d), v2d.X, v2d.Y); // 16, 20, 40

    // Use the generated Madd functions
    // Use typeid on the return value to show that they're overloads
    float xf{2}, yf{3}, zf{4};
    auto maddf{Madd(xf, yf, zf)};
    DebugLog(typeid(maddf) == typeid(float)); // true
    DebugLog(typeid(maddf) == typeid(int32_t)); // false

    int32_t xi{2}, yi{3}, zi{4};
    auto maddi{Madd(xi, yi, zi)};
    DebugLog(typeid(maddi) == typeid(float)); // false
    DebugLog(typeid(maddi) == typeid(int32_t)); // true
}
This form of code generation is commonly used in C codebases that lack C++ templates. When templates are available, as they are in all versions of C++, they are the preferred option for many reasons. One reason is the ability to "overload" a class name so we just have Vector2 rather than coming up with awkward unique names like Vector2f and Vector2d.
Another is that there's no need for the usually large lists of DEFINE_X macro calls for every permutation of types needed in every class and function. This really gets out of control when there are several "type parameters." Instead, the compiler generates all the permutations of the class or function based on our usage of them so we don't need to explicitly maintain such lists.
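As a brief preview, and only a sketch since templates are covered properly later in the series, here is roughly how the same Vector2 and Madd could be written as templates. No list of DEFINE_X calls is needed because the compiler generates each version on first use:

```cpp
#include <cstdint>
#include <type_traits>

// One definition covers every component type
template <typename T>
struct Vector2
{
    T X;
    T Y;
};

// One definition covers every operand type
template <typename T>
T Madd(T x, T y, T z)
{
    return x*y + z;
}

// The compiler picks T based on the arguments and returns that same type
static_assert(
    std::is_same<decltype(Madd(2.0f, 3.0f, 4.0f)), float>::value,
    "Madd with float arguments returns float");
static_assert(
    std::is_same<decltype(Madd(int32_t{2}, int32_t{3}, int32_t{4})), int32_t>::value,
    "Madd with int32_t arguments returns int32_t");

// The same class name is used with different component types
static_assert(
    sizeof(Vector2<double>) > sizeof(Vector2<float>),
    "Vector2<double> is larger than Vector2<float>");
```

Here Vector2<float> and Vector2<double> take the place of Vector2f and Vector2d.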
There are many more reasons that we'll get into when we cover templates later in the series.
Conclusion
The two languages have a lot of overlap in their use of the preprocessor. It runs at the same stage of compilation and features many identically-named directives with the same functionality.
The major points of divergence are in #include, an essential part of the build model before C++20, and in macros created by #define. Function-like macros represent another form of compile-time programming that runs during preprocessing as opposed to constexpr, which runs during main compilation. They're also another form of generics or templates. While their necessity has diminished over time, they are still essential for some tasks and convenient for others.