C++ For C# Developers: Part 35 – Modules, The New Build Model
We’ve already seen C++’s traditional build model based on #include. Today we’ll look at the all-new build model introduced in C++20. This is built on “modules” and is much more analogous to the C# build model. Read on to learn how to use it by itself and in combination with #include!
Module Basics
The new build system is based on a new language concept called a “module.” This system promises to dramatically decrease compile times, both clean and incremental. It also promises to dramatically increase encapsulation by preventing leakage of preprocessor directives and implementation details. Finally, it fully removes the need to specify file system paths in source code like we do with #include and then use complex directory lookups to find the referenced files.
To convert a translation unit such as a .cpp file into a “module unit,” we use an export module statement:
///////////
// math.ixx
///////////

export module math;
We’ve done two things here. First, we’ve named the file with the .ixx extension. Module files can be named with any extension, or no extension at all, just like any other C++ source file. The .ixx extension is used here simply because it’s the preference of Microsoft Visual Studio 2019, one of the first compilers to support modules.
Second, the line export module math; begins a module named math. Like the rest of C++, the source file is read from top to bottom. Everything after this statement is part of the math module, but everything before it is not.
Currently the module is empty since there’s nothing else in the source file. Let’s add some functions:
///////////
// math.ixx
///////////

// Normal function before the "export module" statement
float Average(float x, float y)
{
    return (x + y) / 2;
}

// Exported function before the "export module" statement
export float MagnitudeSquared(float x, float y)
{
    return x*x + y*y;
}

// The module begins here
export module math;

// Normal function after the "export module" statement
float Min(float x, float y)
{
    return x < y ? x : y;
}

// Exported function after the "export module" statement
export float Max(float x, float y)
{
    return x > y ? x : y;
}
There are a couple of things to notice here, too. First, we can add export before anything we want to be usable from outside the module. This includes functions like these, variables, types, using aliases, templates, and namespaces. It does not include preprocessor directives such as macros.
Modules can seem analogous to namespaces, but the two are quite distinct. A module can export a namespace and a module doesn’t imply a namespace. Modules aren’t meant to replace namespaces, but they may be used for similar purposes in grouping together related functionality.
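To make the distinction concrete, here is a minimal sketch of the namespace half on its own. It is plain C++ with no module statements so that it compiles without module support. The namespace name math is only an illustration, and the functions are marked constexpr purely so they can be checked at compile time. In a module interface unit, the same block could presumably be written as export namespace math to export its contents.

```cpp
// A namespace groups names; it says nothing about which module they live in.
// In a module interface unit this could be written "export namespace math".
namespace math
{
    constexpr float Min(float x, float y)
    {
        return x < y ? x : y;
    }

    constexpr float Max(float x, float y)
    {
        return x > y ? x : y;
    }
}

// Callers qualify the names with the namespace, whether or not a module is involved
constexpr float Smaller = math::Min(2.0f, 4.0f);
constexpr float Larger = math::Max(2.0f, 4.0f);
```

Importing a module never adds this kind of qualification by itself, which is why the two features can be layered independently.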
We can export anything that doesn’t have internal linkage, such as by being declared static or inside an unnamed namespace. Our exports must be directly inside of a namespace block, outside of any blocks at the top level of the file, or in an export block:
// Everything in this block is exported
export
{
    float Min(float x, float y)
    {
        return x < y ? x : y;
    }

    // No "export" keyword here: the block already exports this function
    // and export declarations can't be nested inside each other
    float Max(float x, float y)
    {
        return x > y ? x : y;
    }
}
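As a reminder of what “internal linkage” looks like, the following sketch shows the two forms mentioned above next to a function with external linkage. It deliberately leaves out the module statements so it compiles as an ordinary source file. Square and Half are hypothetical helpers, and constexpr is used only so the results can be checked at compile time.

```cpp
// Internal linkage via "static": this could not be exported from a module
static constexpr float Square(float x)
{
    return x * x;
}

namespace
{
    // Internal linkage via an unnamed namespace: also not exportable
    constexpr float Half(float x)
    {
        return x / 2;
    }
}

// External linkage: the kind of function a module unit could mark "export"
constexpr float MagnitudeSquared(float x, float y)
{
    return x * x + y * y;
}
```

All three functions are usable within this one file. The difference only shows up at the boundary: the first two are private to the translation unit, so there is nothing for an export to expose.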
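Second, two of these functions are before the export module math; statement. These are part of the “global module” rather than the math module, just like everything outside of a namespace is part of the “global namespace.” Strictly speaking, the C++20 standard only accepts code ahead of the module declaration when it sits in a global module fragment that begins with module; and is filled in by preprocessor directives such as #include (covered in the Compatibility section below). It also doesn’t permit export there, so a conforming compiler is entitled to reject this file as written.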
There can be only one module in a module unit source file. This isn’t allowed:
// First module: OK
export module math;

float Min(float x, float y)
{
    return x < y ? x : y;
}

// Second module: compiler error
export module util;

export bool IsNearlyZero(float val)
{
    // Check both sides of zero so large negative values don't count
    return val > -0.0001f && val < 0.0001f;
}
Assuming we don’t do that, let’s now use this module from another file:
///////////
// main.cpp
///////////

// Import the module for usage
import math;

// OK: Max is found in the "math" module we imported
DebugLog(Max(2, 4)); // 4

// Compiler error: none of these are part of the "math" module
DebugLog(Average(2, 4));
DebugLog(MagnitudeSquared(2, 4));
DebugLog(Min(2, 4));
We use import to name the module that we want to use. We get access to everything marked export in that module. Unlike with header files, we don’t specify the file name of the module unit. This is similar to the C# build system where we simply name a namespace: using System;.
Partitions and Fragments
We could put all of the code for a module in a single file, but this doesn’t scale well as we add more and more code. Imagine all of System.Collections.Generic in a single file! C# addresses this by putting one class (List<T>, Dictionary<K, V>, etc.) in each file. C++ addresses this in multiple ways. The first is called “module partitions” and they allow us to split code across multiple files while still being part of a single module:
///////////////
// geometry.ixx
///////////////

// Specify that this is the "geometry" partition of the "math" module
export module math:geometry;

export float MagnitudeSquared(float x, float y)
{
    return x * x + y * y;
}

////////////
// stats.ixx
////////////

// Specify that this is the "stats" partition of the "math" module
export module math:stats;

export float Min(float x, float y)
{
    return x < y ? x : y;
}

export float Max(float x, float y)
{
    return x > y ? x : y;
}

export float Average(float x, float y)
{
    return (x + y) / 2;
}

///////////
// math.ixx
///////////

// This is the primary "math" module
export module math;

// Import the "stats" partition and export it
export import :stats;

// Import the "geometry" partition and export it
export import :geometry;

///////////
// main.cpp
///////////

// Import the "math" module as normal
import math;

// Use its exported entities as normal
DebugLog(Min(2, 4)); // 2
DebugLog(Max(2, 4)); // 4
DebugLog(Average(2, 4)); // 3
DebugLog(MagnitudeSquared(2, 4)); // 20
We see here that partitions are specified with a :. The module partition names the primary module (math) and the name of its partition (stats). The primary module just uses the name of the partition (:stats) because its name (math) has already been stated and doesn’t need to be repeated. It must export all of the partitions so the compiler knows everything that’s available in the module when it’s used.
Unlike other identifiers, module names may include a . in them. This means we could instead use math.stats and math.geometry as our module names:
///////////////
// geometry.ixx
///////////////

// This is a primary "math.geometry" module
export module math.geometry;

export float MagnitudeSquared(float x, float y)
{
    return x * x + y * y;
}

////////////
// stats.ixx
////////////

// This is a primary "math.stats" module
export module math.stats;

export float Min(float x, float y)
{
    return x < y ? x : y;
}

export float Max(float x, float y)
{
    return x > y ? x : y;
}

export float Average(float x, float y)
{
    return (x + y) / 2;
}

///////////
// math.ixx
///////////

// This is the primary "math" module
export module math;

// Import the "math.stats" module and export it
export import math.stats;

// Import the "math.geometry" module and export it
export import math.geometry;

///////////
// main.cpp
///////////

// Import the "math" module as normal
import math;

// Use its exported entities as normal
DebugLog(Min(2, 4)); // 2
DebugLog(Max(2, 4)); // 4
DebugLog(Average(2, 4)); // 3
DebugLog(MagnitudeSquared(2, 4)); // 20
The difference here is that math.stats and math.geometry aren’t partitions, they’re primary modules. Any of them can be used directly:
// Import the "math.stats" primary module
import math.stats;

// Use its exported entities as normal
DebugLog(Min(2, 4)); // 2
DebugLog(Max(2, 4)); // 4
DebugLog(Average(2, 4)); // 3
It’s important to note that math.stats and math.geometry aren’t “submodules” as far as the compiler is concerned. They just happen to be named in a way that makes them appear that way. This is largely the same as C# namespaces since there’s no special relationship between System, System.Collections, and System.Collections.Generic other than the naming.
Lastly, there is a private “fragment,” begun with module :private;, that can hold only code that can’t possibly affect the module’s interface. This restriction allows compilers to avoid recompiling code that uses the module when only the private fragment changes:
// Primary module
export module math;

// Export some function declarations
export float Min(float x, float y);
export float Max(float x, float y);

// This begins the "private fragment"
module :private;

// Define the exported functions
// The definitions themselves aren't part of the module's interface
float Min(float x, float y)
{
    return x < y ? x : y;
}

float Max(float x, float y)
{
    return x > y ? x : y;
}
Module Implementation Units
So far all of our module files have been “module interface units” since they included the export keyword. They’re interfaces to be used by code outside the module such as our main.cpp.
There’s another kind of module unit though: “module implementation units.” These are meant to contain implementation details of the module. They don’t use the export keyword, but contain internal code that’s accessible from within the module:
///////////////
// geometry.ixx
///////////////

// A non-exported module partition
module math:geometry;

// A non-exported function
float MagnitudeSquared(float x, float y)
{
    return x * x + y * y;
}

///////////
// math.ixx
///////////

// Primary module
export module math;

// Import the module implementation partition
import :geometry;

// MagnitudeSquared was first declared without "export", so it can't be
// re-declared as exported here. It stays internal to the "math" module.

// Export a function that uses it
export float Magnitude(float x, float y)
{
    // Call functions in the imported module implementation partition
    float magSq = MagnitudeSquared(x, y);
    return Sqrt(magSq); // TODO: write Sqrt()
}
This is similar to how we’d split code across header files (.hpp) and translation units (.cpp). In that traditional build system, we’d add declarations of functions in the header files and definitions of those functions in the translation units.
If we don’t need the partitions but still want to separate the interface from the implementation, we can drop the import and remove the partition name:
///////////////
// geometry.cpp
///////////////

// A module implementation unit of "math": no "export" and no partition name
module math;

// Definition of the function that math.ixx declares and exports
float MagnitudeSquared(float x, float y)
{
    return x * x + y * y;
}

///////////
// math.ixx
///////////

export module math;

// Note: no need to "import math;" since this is already the "math" module
export float MagnitudeSquared(float x, float y);

export float Magnitude(float x, float y)
{
    float magSq = MagnitudeSquared(x, y);
    return Sqrt(magSq); // TODO: write Sqrt()
}
Notice that we now have geometry.cpp, not geometry.ixx. This is because it can’t be imported anymore and must be used implicitly like we did in the math.ixx module unit.
Module Linkage
In the traditional build model, there is “internal linkage” and “external linkage.” This means that something is either the same internally in a translation unit or externally across translation units. With modules, there is now “module linkage.” A name that’s declared in a module without export has module linkage: it refers to the same thing across all of that module’s units, but code outside the module can’t name it. Exported names have external linkage, so users of the module can name them too:
///////////////////
// statsglobals.ixx
///////////////////

export module stats:globals;

// Not exported, so this variable has "module linkage"
int NumEnemiesKilled = 0;

////////////
// stats.ixx
////////////

export module stats;

// Interface partitions must be exported by the primary module
export import :globals;

export void CountEnemyKilled()
{
    // Refers to the same variable as in statsglobals.ixx
    NumEnemiesKilled++;
}

export int GetNumEnemiesKilled()
{
    // Refers to the same variable as in statsglobals.ixx
    return NumEnemiesKilled;
}

///////////
// main.cpp
///////////

import stats;

DebugLog(GetNumEnemiesKilled()); // 0
CountEnemyKilled();
DebugLog(GetNumEnemiesKilled()); // 1

// Compiler error: NumEnemiesKilled has module linkage
// It can't be named outside of the "stats" module
DebugLog(NumEnemiesKilled);
Compatibility
Given the 40+ year history of C++, the new build system must be compatible with the old build system. There are a ton of existing header files that we’ll want to use with modules. Thankfully, C++ provides a new preprocessor directive to do just that:
import "mylibrary.h";

// ...or...

import <mylibrary.h>;
Despite not starting with a # and requiring a ; at the end, this is really a preprocessor directive. It’s distinct from a regular module import because it either has double quotes ("mylibrary.h") or angle brackets (<mylibrary.h>) depending on the header search rules desired.
The effect of this directive is to make everything that’s exportable in the header file available, as if we had added export to its source code. The compiler treats a header imported this way as a “header unit.” We can re-export it from a named module to wrap the header file:
////////////////
// mylibrary.ixx
////////////////

// Module that wraps mylibrary.h
export module mylibrary;

// Export everything in the header file that can be exported
// The "export" is what passes the header's contents on to users of this module
export import "mylibrary.h";
There are a couple of key differences between this import directive and #include and import with a module. First, contrary to #include, preprocessor symbols defined before the import directive are not visible to the imported header file. With #include, they are visible and can change what the header compiles to:
//////////////
// mylibrary.h
//////////////

int ReadVersion()
{
    int version = ReadTextFileAsInteger("version.txt");
#if ENABLE_LOGGING
    DebugLog("Version: ", version);
#endif
    return version;
}

///////////
// main.cpp
///////////

#include "mylibrary.h"
int version = ReadVersion(); // Does not log

// ...equivalent to...

int ReadVersion()
{
    int version = ReadTextFileAsInteger("version.txt");
#if ENABLE_LOGGING // Note: not defined
    DebugLog("Version: ", version);
#endif
    return version;
}
int version = ReadVersion();

/////////////////
// mainlogged.cpp
/////////////////

// Define a preprocessor symbol before #include
#define ENABLE_LOGGING 1
#include "mylibrary.h"
int version = ReadVersion(); // Does log

// ...equivalent to...

#define ENABLE_LOGGING 1
int ReadVersion()
{
    int version = ReadTextFileAsInteger("version.txt");
#if ENABLE_LOGGING // Note: is defined
    DebugLog("Version: ", version);
#endif
    return version;
}
int version = ReadVersion();
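The order dependence behind this can be seen in a single file. The sketch below is only an illustration: the two function names are hypothetical, and it assumes ENABLE_LOGGING isn’t already defined on the compiler’s command line. The same #if block is written twice, once before the #define and once after, which is effectively what #include does when it pastes a header into two different files.

```cpp
// ENABLE_LOGGING isn't defined yet, so "#if ENABLE_LOGGING" treats it as 0
constexpr bool LoggingSeenBeforeDefine()
{
#if ENABLE_LOGGING
    return true;
#else
    return false;
#endif
}

// From this line down, the preprocessor symbol exists
#define ENABLE_LOGGING 1

// Identical source text, different result, purely because of where it appears
constexpr bool LoggingSeenAfterDefine()
{
#if ENABLE_LOGGING
    return true;
#else
    return false;
#endif
}
```

A header brought in with the import directive is compiled without regard to the importing file’s earlier #define lines, so it behaves like the first function no matter where the import appears.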
C++ provides a facility to work around this limitation. We can use module; before our named module and put preprocessor directives between these two statements. Everything here will be part of the “global module” and accessible from inside the module:
///////////////
// metadata.ixx
///////////////

// No module name means "global module"
module;

// Define a preprocessor symbol before #include
// Only preprocessor directives are allowed in this section
#define ENABLE_LOGGING 1

// Use #include instead of the import directive
#include "mylibrary.h"

// Our named module
export module metadata;

// Export a function from the header file
// A using-declaration is needed because ReadVersion belongs to the global
// module and can't be re-declared as part of the "metadata" module
export using ::ReadVersion;

///////////
// main.cpp
///////////

// Use the module as normal
import metadata;

DebugLog(ReadVersion()); // 6
The second difference between the import directive and import with a module is that preprocessor macros in the header file are made available to the file that imports the header:
///////////////
// legacymath.h
///////////////

// Macro defined in the header file
#define PI 3.14

///////////
// math.ixx
///////////

export module math;

// Import directive exposes the PI macro
import "legacymath.h";

export double GetCircumference(double radius)
{
    // Macros from the import directive are usable
    return 2.0 * PI * radius;
}

///////////
// main.cpp
///////////

import math;

// OK
DebugLog(GetCircumference(10.0));

// Compiler error: macros from import directives are not exported
DebugLog(PI);
Notice how the PI macro is available for use in the module unit that used the import directive but not in users of that module. This prevents macros from transitively “leaking” throughout an entire program.
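If users of the module do need the value, one option is to offer it as a typed constant rather than a macro, since variables can be exported and macros can’t. The sketch below is an assumption about how that might look, not something taken from legacymath.h. The name Pi is hypothetical, and the module statements are left out so that it compiles as an ordinary source file. In a module interface unit, both declarations could carry the export keyword.

```cpp
// Hypothetical typed replacement for the PI macro
// Unlike a macro, a constant like this is something a module can export
constexpr double Pi = 3.14;

constexpr double GetCircumference(double radius)
{
    // Same formula as before, now using the constant instead of the macro
    return 2.0 * Pi * radius;
}
```

The constant also obeys scoping rules, so it could be placed in a namespace, which a macro never respects.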
Conclusion
C++20’s new module build system is much more analogous to C# than its own legacy header files and #include. In C++ terms, C# mixes namespaces and modules together somewhat. We write the name of a namespace (using Math;) in order to gain access to its contents. C++ separates these two features. We can write import math; without math being a namespace. We can layer namespaces on top of modules and even export them.
C# provides support for splitting code across multiple files by adding one member of a namespace in each file. The same is possible in C++, but we can also go further by adding multiple members in a single file and splitting the interface from the implementation. Partitions and fragments are flexible tools that allow us to sub-divide large modules across many source files.
As a C++20 feature that was only standardized recently, modules are not commonly used as of this writing. However, they’re destined to eventually become the dominant build system and bring their many improvements over header files to the vast majority of codebases. In the meantime, we have tools such as the new import "header.h" directive and access to the global module to ease the transition. New code using modules can use these tools to package legacy code into modules, just as if it were written that way from the start. Old code can simply continue to use the header files.
#1 by Jake on January 12th, 2021 ·
C# places no restrictions on the number of types that can be defined at the top level (outermost scope).
#2 by jackson on January 12th, 2021 ·
Thanks for pointing this out. I’ve removed that statement from the article.