C++ For C# Developers: Part 32 – Thread-Local Storage and Volatile
There is language-level support in C# for per-thread storage of variables. The same goes for the volatile keyword. C++ also supports per-thread variables, but with per-thread initialization and de-initialization. It has a volatile keyword too, but its meaning is quite different from C#. Read on to learn how to properly use these features in each language.
Table of Contents
- Part 1: Introduction
- Part 2: Primitive Types and Literals
- Part 3: Variables and Initialization
- Part 4: Functions
- Part 5: Build Model
- Part 6: Control Flow
- Part 7: Pointers, Arrays, and Strings
- Part 8: References
- Part 9: Enumerations
- Part 10: Struct Basics
- Part 11: Struct Functions
- Part 12: Constructors and Destructors
- Part 13: Initialization
- Part 14: Inheritance
- Part 15: Struct and Class Permissions
- Part 16: Struct and Class Wrap-up
- Part 17: Namespaces
- Part 18: Exceptions
- Part 19: Dynamic Allocation
- Part 20: Implicit Type Conversion
- Part 21: Casting and RTTI
- Part 22: Lambdas
- Part 23: Compile-Time Programming
- Part 24: Preprocessor
- Part 25: Intro to Templates
- Part 26: Template Parameters
- Part 27: Template Deduction and Specialization
- Part 28: Variadic Templates
- Part 29: Template Constraints
- Part 30: Type Aliases
- Part 31: Deconstructing and Attributes
- Part 32: Thread-Local Storage and Volatile
- Part 33: Alignment, Assembly, and Language Linkage
- Part 34: Fold Expressions and Elaborated Type Specifiers
- Part 35: Modules, The New Build Model
- Part 36: Coroutines
- Part 37: Missing Language Features
- Part 38: C Standard Library
- Part 39: Language Support Library
- Part 40: Utilities Library
- Part 41: System Integration Library
- Part 42: Numbers Library
- Part 43: Threading Library
- Part 44: Strings Library
- Part 45: Array Containers Library
- Part 46: Other Containers Library
- Part 47: Containers Library Wrapup
- Part 48: Algorithms Library
- Part 49: Ranges and Parallel Algorithms
- Part 50: I/O Library
- Part 51: Missing Library Features
- Part 52: Idioms and Best Practices
- Part 53: Conclusion
Thread-Local Storage
Thread-local storage is a way of storing one copy of a variable per thread. Both C# and C++ have support for this. In C#, we add the [ThreadStatic] attribute to a static field. A common bug results from the field’s initializer being run only once, like other static fields, not once per thread.
```csharp
// C#
public class Counter
{
    // One int stored per thread
    // Initialized once, not once per thread
    [ThreadStatic] public static int Value = 1;
}

Action a = () => DebugLog(Counter.Value);
Thread t1 = new Thread(new ThreadStart(a));
Thread t2 = new Thread(new ThreadStart(a));
t1.Start();
t2.Start();
t1.Join();
t2.Join();

// First thread runs and the first use of Counter initializes Value to 1
// Second thread runs and doesn't initialize Value. Uses the default of 0.
// Output: 1 then 0
```
C++ uses the keyword thread_local instead of an attribute. This keyword can be applied to static data members like in C#. It can also be applied to variables at global scope, namespace scope, or any level of block scope:
```cpp
// Global variable
thread_local int global = 1;

namespace Counters
{
    // Namespace variable
    thread_local int ns = 1;
}

struct Counter
{
    // Static data member
    // Inline initialization isn't allowed for non-const static data members
    // unless they're declared inline (C++17)
    static thread_local int member;
};

// Initialization outside the class is OK
thread_local int Counter::member = 1;

void Foo()
{
    // Local variable
    thread_local int local = 1;
    {
        // Variable in any nested block
        thread_local int block = 1;
    }
}
```
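As a minimal sketch of the block-scope case, assuming a standard library with std::thread (NextId and SecondIdOnNewThread are hypothetical names), a thread_local local variable keeps its value between calls like a static local, but each thread counts separately:

```cpp
#include <cassert>
#include <thread>

// Returns 1, 2, 3, ... separately for each thread that calls it.
// The variable persists between calls, but every thread has its own copy,
// initialized the first time that thread reaches the declaration.
int NextId()
{
    thread_local int next = 0;
    return ++next;
}

// Calls NextId() twice on a brand-new thread and returns the second result
int SecondIdOnNewThread()
{
    int result = 0;
    std::thread t{[&result]{
        NextId();
        result = NextId();
    }};
    t.join();
    return result;
}
```

Calling NextId() twice on the main thread yields 1 and 2, SecondIdOnNewThread() still returns 2 because the new thread starts its own count, and the main thread's next call yields 3.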
Additionally, thread_local variables can be marked static or extern to control linkage:
```cpp
// Globals can be static or extern
static thread_local int global1 = 1;
extern thread_local int global2 = 2;

namespace Counters
{
    // Namespace variables can be static or extern
    static thread_local int ns1 = 1;
    extern thread_local int ns2 = 2;
}

void Foo()
{
    // Local variables can be static, but not extern
    static thread_local int local = 1;
    {
        // Nested block variables can be static, but not extern
        static thread_local int block = 1;
    }
}
```
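Note that static doesn’t affect their storage duration: every thread_local variable has thread storage duration, and at block scope thread_local implies static anyway. Storage for each thread’s copy lasts as long as that thread. Block-scope thread_local variables are initialized the first time the thread passes through their declaration. Other thread_local variables are either initialized when the thread begins or, depending on the implementation, deferred until just before their first use in that thread. The exact order of initialization isn’t specified, so it shouldn’t be relied on. Either way, initialization happens once per thread. This is a change from C# where the initialization doesn’t occur per-thread at all.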
```cpp
// Initialized for each thread, not just once as in C#
static thread_local int counter = 1;

auto a = []{ DebugLog(counter); };
std::thread t1{a};
std::thread t2{a};
t1.join();
t2.join();

// First thread runs and initializes counter to 1
// Second thread runs and initializes counter to 1
// Output: 1 then 1
```
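The following sketch, using hypothetical function names, shows that each thread not only gets its own initialized copy, but that changes made by one thread are invisible to the others, including the main thread:

```cpp
#include <cassert>
#include <thread>

// Namespace-scope thread-local: every thread gets its own copy, starting at 1
thread_local int counter = 1;

// Adds amount to the calling thread's copy of counter and returns the new value
int AddToMyCounter(int amount)
{
    counter += amount;
    return counter;
}

// Runs AddToMyCounter on two threads and reports what each thread saw
void RunOnTwoThreads(int& seenByFirst, int& seenBySecond)
{
    std::thread t1{[&seenByFirst]{ seenByFirst = AddToMyCounter(10); }};
    std::thread t2{[&seenBySecond]{ seenBySecond = AddToMyCounter(5); }};
    t1.join();
    t2.join();
}
```

The first thread ends up with 11 and the second with 6, regardless of scheduling, while the main thread's copy of counter is still 1.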
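If the initialization of a thread_local variable outside of block scope (a global, namespace member, or static data member) throws an exception, std::terminate is called to shut down the program.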
```cpp
struct Throws
{
    Throws()
    {
        throw 123;
    }
};

// Initializing throws an exception which calls std::terminate
static thread_local Throws t{};
```
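Block-scope thread_local variables behave differently. As with block-scope static variables, an exception thrown during initialization propagates to the caller, and initialization is attempted again the next time control reaches the declaration. Here’s a small sketch of that behavior; FailsOnce, attempts, and TryGet are hypothetical names made up for the example:

```cpp
#include <cassert>
#include <stdexcept>

// Counts constructor attempts made on the current thread
thread_local int attempts = 0;

// Hypothetical type whose constructor throws the first time it runs on a thread
struct FailsOnce
{
    FailsOnce()
    {
        ++attempts;
        if (attempts == 1)
        {
            throw std::runtime_error{"first attempt fails"};
        }
    }
};

// Returns true if the block-scope thread_local below ends up initialized
bool TryGet()
{
    try
    {
        // If the constructor throws, initialization isn't complete and is
        // attempted again the next time control reaches this declaration
        thread_local FailsOnce f{};
        (void)f;
        return true;
    }
    catch (const std::runtime_error&)
    {
        return false;
    }
}
```

The first call returns false, the second succeeds, and later calls don't run the constructor again, so attempts stays at 2.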
Thread-local variables are deallocated and de-initialized when the thread ends:
```cpp
struct LogLifecycle
{
    int Value = 1;
    LogLifecycle() { DebugLog("ctor"); }
    ~LogLifecycle() { DebugLog("dtor"); }
};

thread_local LogLifecycle x{};

auto a = []{ DebugLog(x.Value); };
std::thread t1{a};
std::thread t2{a};
t1.join();
t2.join();

// Possible annotated output, depending on thread execution order:
// ctor  (first thread initializes x)
// ctor  (second thread initializes x)
// 1     (first thread prints x.Value)
// dtor  (first thread de-initializes x)
// 1     (second thread prints x.Value)
// dtor  (second thread de-initializes x)
```
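Because DebugLog output depends on thread scheduling, here’s a hedged sketch that counts constructions and destructions with std::atomic<int> instead (Tracked, UseTracked, and RunThreads are illustrative names). Each thread’s copy is constructed on first use and destroyed as that thread exits, which completes before join returns:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> constructed{0};
std::atomic<int> destroyed{0};

struct Tracked
{
    Tracked() { ++constructed; }
    ~Tracked() { ++destroyed; }
};

void UseTracked()
{
    // Constructed the first time each thread reaches this line
    thread_local Tracked t{};
    (void)t;
}

// Runs UseTracked on n threads, one after another.
// Each thread's Tracked is destroyed when that thread exits.
void RunThreads(int n)
{
    for (int i = 0; i < n; ++i)
    {
        std::thread th{UseTracked};
        th.join();
    }
}
```

After RunThreads(3), both counters read 3: one construction and one destruction per thread, and none for the main thread, which never called UseTracked.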
Any such per-thread initialization and de-initialization needs to be implemented manually in C#.
Volatile
C# and C++ both have a volatile keyword, but the meaning is different between the languages. In C#, volatile is intended to be used for thread synchronization. It guarantees atomic reads and writes to volatile variables, meaning they can’t be torn (partially observed) when other threads access them concurrently. In order to guarantee atomicity, only types whose reads and writes are naturally atomic can be volatile in C#:
- Reference types such as class instances
- Generic type parameters that are reference types such as class instances
- Pointers
- sbyte, byte, short, ushort, int, uint, char, float, and bool
- Enums based on byte, sbyte, short, ushort, int, and uint
- IntPtr and UIntPtr
All other types, including double, long, and all structs, can’t be volatile:
```csharp
// C#
public class Name
{
    public string First;
    public string Last;
}

public struct IntWrapper
{
    public int Value;
}

public enum IntEnum : int { }
public enum LongEnum : long { }

unsafe public class Volatiles<T>
    where T : class
{
    // OK: reference type
    volatile Name RefType;

    // OK: type parameter known to be a reference type due to where constraint
    volatile T TypeParam;

    // OK: pointer
    volatile int* Pointer;

    // OK: permitted primitive type
    volatile int GoodPrimitive;

    // Compiler error: denied primitive type
    volatile long BadPrimitive;

    // OK: enum based on permitted primitive type
    volatile IntEnum GoodEnum;

    // Compiler error: enum based on denied primitive type
    volatile LongEnum BadEnum;

    // Compiler error: structs can't be volatile
    // No exception for structs that only have one field that can be volatile
    volatile IntWrapper Struct;

    // OK: Special-case for IntPtr and UIntPtr structs
    volatile IntPtr SpecialPtr1;
    volatile UIntPtr SpecialPtr2;
}
```
The only variables that can be volatile in C# are fields of classes and structs. Local variables and parameters can’t be volatile.
C# also implicitly adds memory fences to disable instruction reordering and data caching that might be performed by CPUs that execute “out of order.” An “acquire fence” is inserted for every read of the volatile variable and a “release fence” is inserted for every write:
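```csharp
// C#
public struct Counter
{
    public volatile int Value;

    public void Increment()
    {
        // Reads get an implicit acquire fence
        int cur = this.Value; // acquire-fenced
        int next = cur + 1;

        // Writes get an implicit release fence
        this.Value = next; // release-fenced

        // The read-add-write sequence as a whole is still not atomic:
        // another thread may write Value between the read and the write
    }
}
```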
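For readers who want this same acquire/release behavior in C++, the rough equivalent is std::atomic with explicit memory orders rather than the volatile keyword. This is a sketch of one possible mapping, not an exact one: std::atomic additionally makes each individual load and store atomic, and the struct below simply mirrors the C# Counter:

```cpp
#include <atomic>
#include <cassert>

// Mirrors the C# Counter: acquire on every read, release on every write
struct Counter
{
    std::atomic<int> Value{0};

    void Increment()
    {
        int cur = Value.load(std::memory_order_acquire); // like a C# volatile read
        int next = cur + 1;
        Value.store(next, std::memory_order_release);    // like a C# volatile write
        // As with the C# code, the load-add-store sequence as a whole isn't atomic
    }
};
```

If several threads need to increment concurrently, Value.fetch_add(1) performs the whole read-modify-write atomically instead.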
C++, on the other hand, implements volatile differently. The keyword is the same, but it’s not meant to be used for thread synchronization. Instead, it’s meant to implement memory-mapped hardware access:
```cpp
// The hardware device reports its status with a 32-bit integer
enum class DeviceStatus : int32_t
{
    OK = 0,
    Stuck = 1,
    Fault = 2,
};

// "Pointer to a volatile DeviceStatus"
// It's stored at a fixed location: the memory-mapped address
volatile DeviceStatus* pDeviceStatus = (volatile DeviceStatus*)100;

while (true)
{
    // Read and print the device status
    DebugLog("Device status:", *pDeviceStatus);

    // Wait for one second
    std::this_thread::sleep_for(std::chrono::seconds{1});
}
```
The volatile keyword is applied here to the DeviceStatus that pDeviceStatus points to. This tells the compiler that it cannot assume that it has full visibility into the readers and writers of that 32-bit integer. It has to assume that it may be accessed externally, such as when a device driver writes the device’s status to memory address 100.
As a consequence, the compiler isn’t allowed to “optimize” our loop like this:
```cpp
// Only read the pointer once
// Store it as a local variable, likely backed by a register
DeviceStatus status = *pDeviceStatus;
while (true)
{
    // Print the device status from the local variable
    // No chance of a cache miss!
    DebugLog("Device status:", status);

    // Wait for one second
    std::this_thread::sleep_for(std::chrono::seconds{1});
}
```
The above “optimization” makes the code faster because there’s no chance of a cache miss when reading through the pDeviceStatus pointer. The status is just read once and stored in a CPU register, which is essentially free to read from. The compiler can’t see the kernel driver’s writes to memory address 100, so it can assume this is a safe optimization.
The only problem is that the device status that we log can no longer change. By marking the value that pDeviceStatus points to as volatile, the compiler is prohibited from making this optimization. It has to assume that there’s an external writer that might change the device status.
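Real memory-mapped addresses like 100 are platform-specific, so the sketch below uses a stand-in under stated assumptions: an ordinary global variable plays the role of the device register, and ReadsUntilNonZero is a hypothetical helper. Because it reads through a pointer to volatile, the compiler must perform every load in the loop rather than hoisting a single read out of it:

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a memory-mapped register (assumption: real code would use a
// fixed hardware address instead of an ordinary variable)
std::int32_t fakeRegister = 0;

// Returns how many reads it took to see a nonzero value, or -1 after maxReads.
// Each *reg is a distinct volatile read that the compiler may not merge or skip.
int ReadsUntilNonZero(const volatile std::int32_t* reg, int maxReads)
{
    for (int i = 1; i <= maxReads; ++i)
    {
        if (*reg != 0)
        {
            return i;
        }
    }
    return -1;
}
```

Passing &fakeRegister works because a plain std::int32_t* converts implicitly to a pointer to const volatile std::int32_t.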
Another effect of volatile is that the compiler isn’t allowed to reorder reads and writes to volatile variables with respect to other volatile variables:
```cpp
// Status from the device
enum class DeviceStatus : int32_t
{
    OK = 0,
    Stuck = 1,
    Fault = 2,
    CommandAccepted = 3,
    CommandRejected = 4,
};

// Commands to the device
enum class DeviceCommand : int32_t
{
    Retry = 1,
};

// Memory-mapped device I/O
volatile DeviceStatus* pDeviceStatus = (volatile DeviceStatus*)100;
volatile DeviceCommand* pDeviceCommand = (volatile DeviceCommand*)200;

while (true)
{
    if (*pDeviceStatus == DeviceStatus::Stuck) // read
    {
        *pDeviceCommand = DeviceCommand::Retry; // write
        while (*pDeviceStatus != DeviceStatus::CommandAccepted) // read
        {
        }
        if (*pDeviceStatus == DeviceStatus::CommandRejected || // read
            *pDeviceStatus == DeviceStatus::Stuck) // read
        {
            throw std::runtime_error{"Failed to get device un-stuck"};
        }
    }

    // Wait for one second
    std::this_thread::sleep_for(std::chrono::seconds{1});
}
```
Without volatile, the compiler would be free to reorder these reads and writes so long as it obeys the “as-if” rule where the code works “as if” the compiler hadn’t done the reordering. Here’s how that might look:
```cpp
// Memory-mapped device I/O without volatile
DeviceStatus* pDeviceStatus = (DeviceStatus*)100;
DeviceCommand* pDeviceCommand = (DeviceCommand*)200;

while (true)
{
    if (*pDeviceStatus == DeviceStatus::Stuck)
    {
        // Read status first
        DeviceStatus status = *pDeviceStatus;

        // Write command second
        *pDeviceCommand = DeviceCommand::Retry;

        // Check status
        while (status != DeviceStatus::CommandAccepted)
        {
            status = *pDeviceStatus;
        }
        if (*pDeviceStatus == DeviceStatus::CommandRejected || // read
            *pDeviceStatus == DeviceStatus::Stuck) // read
        {
            throw std::runtime_error{"Failed to get device un-stuck"};
        }
    }

    // Wait for one second
    std::this_thread::sleep_for(std::chrono::seconds{1});
}
```
In this non-volatile version, the compiler has decided that we should read the status before we write the command. This might cause us to read an old CommandRejected status for a prior command and then throw an exception even when our Retry command was accepted. By applying the volatile keyword, we disable such reordering and guarantee that our volatile reads and writes occur in the order in which they’re written.
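So far we haven’t seen any guarantees from C++ that volatile reads and writes are atomic or fenced, as they are in C#. That’s because C++ simply doesn’t provide them. In particular, if one thread writes a volatile variable while another thread reads or writes it without other synchronization, that’s still a data race and therefore undefined behavior. This is a critical difference that has implications for how volatile variables are used in situations such as multi-threading and for their performance.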
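A practical consequence is that sharing a counter between threads in C++ calls for std::atomic, not volatile. Here’s a minimal sketch (CountWithAtomics is a hypothetical name) where the atomic fetch_add makes the final count exact no matter how the threads interleave:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// volatile int racyCounter; // Incrementing this from several threads would be
//                           // a data race: volatile doesn't make ++ atomic
std::atomic<int> safeCounter{0};

// Increments safeCounter perThread times on each of numThreads threads
int CountWithAtomics(int numThreads, int perThread)
{
    safeCounter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < numThreads; ++t)
    {
        threads.emplace_back([perThread]{
            for (int i = 0; i < perThread; ++i)
            {
                safeCounter.fetch_add(1); // atomic read-modify-write
            }
        });
    }
    for (std::thread& th : threads)
    {
        th.join();
    }
    return safeCounter.load();
}
```

With four threads doing 1,000 increments each, the result is always 4,000.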
Due to this lack of an atomicity guarantee, any type may be volatile in C++. There’s no need to prohibit structs, double, and long just because accessing them might not be atomic. As we’ve already seen, pointers (and references) to volatile variables can also be taken:
```cpp
struct Vector3d
{
    double X;
    double Y;
    double Z;
};

volatile Vector3d V{2, 4, 6}; // Struct
volatile uint64_t L;          // 64-bit integer
volatile double D;            // Double
volatile int A[1000];         // Array
```
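Working with a volatile struct has a couple of wrinkles, sketched below under the assumption that we extend Vector3d with a hypothetical Sum member function. Member functions must be volatile-qualified to be called on a volatile object, and the implicitly-declared copy constructor takes const Vector3d&, which can’t bind to a volatile object, so copying has to be done member by member:

```cpp
#include <cassert>

struct Vector3d
{
    double X;
    double Y;
    double Z;

    // The volatile qualifier makes this callable on volatile objects
    double Sum() const volatile
    {
        return X + Y + Z;
    }
};

volatile Vector3d V{2, 4, 6};

// Vector3d copy = V; // Compiler error: the implicit copy constructor
//                    // takes const Vector3d&, which would remove volatile
Vector3d CopyOut(const volatile Vector3d& v)
{
    // Each member read is a separate volatile read of one double
    return Vector3d{v.X, v.Y, v.Z};
}
```

Note that CopyOut performs three independent reads, so nothing stops another writer from changing V between them.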
Additionally, any variable can be volatile in C++. We’re not limited to just data members. We can make local variables, nested block variables, parameters, globals, and namespace members volatile:
```cpp
volatile int global;

namespace Volatiles
{
    volatile int ns;
}

// Note: volatile-qualified parameters are deprecated as of C++20
void Foo(volatile int param)
{
    volatile int local;
    {
        volatile int block;
    }
}
```
The volatile keyword is a “type qualifier” like const. The shorthands “cv” and “cv-qualified” are commonly used to talk about these two qualifiers. Like const, a non-volatile type may be implicitly treated as a volatile type but not the other way around. The same goes for non-volatile const types being treated as const and volatile types:
```cpp
int nc_nv = 100;
const int c_nv = 200;
volatile int nc_v = 300;
const volatile int c_v = 400;

{
    int& i1 = nc_nv; // OK
    int& i2 = c_nv;  // Compiler error: removes const
    int& i3 = nc_v;  // Compiler error: removes volatile
    int& i4 = c_v;   // Compiler error: removes const and volatile
}
{
    const int& i1 = nc_nv; // OK
    const int& i2 = c_nv;  // OK
    const int& i3 = nc_v;  // Compiler error: removes volatile
    const int& i4 = c_v;   // Compiler error: removes volatile
}
{
    volatile int& i1 = nc_nv; // OK
    volatile int& i2 = c_nv;  // Compiler error: removes const
    volatile int& i3 = nc_v;  // OK
    volatile int& i4 = c_v;   // Compiler error: removes const
}
{
    const volatile int& i1 = nc_nv; // OK
    const volatile int& i2 = c_nv;  // OK
    const volatile int& i3 = nc_v;  // OK
    const volatile int& i4 = c_v;   // OK
}
```
The general rule here is that we can treat variables as “more const” or “more volatile” but not “less const” or “less volatile” since this would remove important restrictions.
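The same rule is visible through the standard type traits. This compile-time sketch checks the pointer equivalents of the reference bindings above and shows std::remove_cv_t stripping both qualifiers from a type. ReadAny is a hypothetical helper that accepts any int thanks to the implicit “more cv” conversion:

```cpp
#include <cassert>
#include <type_traits>

// Adding qualifiers is an implicit conversion...
static_assert(std::is_convertible_v<int*, volatile int*>);
static_assert(std::is_convertible_v<const int*, const volatile int*>);

// ...but removing them is not
static_assert(!std::is_convertible_v<volatile int*, int*>);
static_assert(!std::is_convertible_v<const volatile int*, volatile int*>);

// remove_cv_t strips both qualifiers at the type level
static_assert(std::is_same_v<std::remove_cv_t<const volatile int>, int>);

// Any int, with or without cv-qualifiers, can be passed here
int ReadAny(const volatile int* p)
{
    return *p;
}
```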
Note that the mutable keyword we apply to data members is not a type qualifier like const. It is instead a “storage-class-specifier,” like static, extern, and thread_local (and register, until C++17 removed it), except that mutable only applies to data members. That’s why we can’t declare a local or global variable with type mutable int like we can with const int.
Conclusion
Both languages have thread-local storage and a volatile keyword, but they have significant differences. Thread-local storage in C++ can be applied to more kinds of variables, such as locals and globals. It also guarantees per-thread initialization where C# only initializes once ever, and it de-initializes each thread’s variables when that thread terminates. C# code needs to manually implement both per-thread initialization and per-thread de-initialization.
As for the volatile keyword, its intended usage and implementation vary significantly between C# and C++. In C#, we get guaranteed atomic accesses and memory fences, which is great for synchronizing multiple threads. In C++, we just disable some compiler optimizations that would get in the way of memory-mapped I/O. Thread synchronization is usually solved with other tools, such as mutexes and the Standard Library’s std::atomic class template. Due to the identical naming of the keyword in both languages, many programmers assume identical functionality. It’s important to know that this is not the case and to use the keyword appropriately in each language.