JacksonDunstan.com

There is language-level support in C# for per-thread storage of variables. The same goes for the volatile keyword. C++ also supports per-thread variables, but with per-thread initialization and de-initialization. It has a volatile keyword too, but its meaning is quite different from C#’s. Read on to learn how to properly use these features in each language.

Thread-Local Storage

Thread-Local Storage is a way of storing one variable per thread. Both C# and C++ have support for this. In C#, we add the [ThreadStatic] attribute to a static field. A common bug results from the field’s initializer being run only once, like other static fields, not once per thread.

// C#
public class Counter
{
    // One int stored per thread
    // Initialized once, not once per thread
    [ThreadStatic] public static int Value = 1;
}
 
Action a = () => DebugLog(Counter.Value);
Thread t1 = new Thread(new ThreadStart(a));
Thread t2 = new Thread(new ThreadStart(a));
t1.Start();
t2.Start();
t1.Join();
t2.Join();
 
// First thread runs and the first use of Counter initializes Value to 1
// Second thread runs and doesn't initialize Value. Uses the default of 0.
// Output: 1 then 0

C++ uses the keyword thread_local instead of an attribute. This keyword can be applied to static data members like in C#. It can also be applied to variables at global scope, namespace scope, or any level of block scope:

// Global variable
thread_local int global = 1;
 
namespace Counters
{
    // Namespace variable
    thread_local int ns = 1;
}
 
struct Counter
{
    // Static data member
    // Inline initialization isn't allowed for non-const static data members
    static thread_local int member;
};
// Initialization outside the class is OK
thread_local int Counter::member = 1;
 
void Foo()
{
    // Local variable
    thread_local int local = 1;
 
    {
        // Variable in any nested block
        thread_local int block = 1;
    }
}

Additionally, thread_local variables can be marked static or extern to control linkage:

// Globals can be static or extern
static thread_local int global1 = 1;
extern thread_local int global2 = 2;
 
namespace Counters
{
    // Namespace variables can be static or extern
    static thread_local int ns1 = 1;
    extern thread_local int ns2 = 2;
}
 
void Foo()
{
    // Local variables can be static, but not extern
    static thread_local int local = 1;
 
    {
        // Nested block variables can be static, but not extern
        static thread_local int block = 1;
    }
}

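Note that static doesn’t affect their storage duration. Storage for every thread_local variable is allocated when the thread begins and lasts until the thread ends. Initialization happens once per thread: block-scope variables are initialized the first time that thread’s control passes through their declaration, and all others are initialized no later than that thread’s first use of them. The exact order of initialization isn’t specified, so it shouldn’t be relied on. This is a change from C# where the initialization doesn’t occur per-thread at all.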

// Initialized for each thread, not just once as in C#
static thread_local int counter = 1;
 
auto a = []{ DebugLog(counter); };
std::thread t1{a};
std::thread t2{a};
t1.join();
t2.join();
 
// First thread runs and initializes its counter to 1
// Second thread runs and initializes its counter to 1
// Output: 1 then 1
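To make the per-thread behavior easier to verify, here is a minimal sketch built around a block-scope thread_local counter. The names NextCallNumber and CountPerThread are hypothetical helpers invented for this illustration. They aren’t part of the Standard Library.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: every thread gets its own copy of `calls`
// It's initialized to 0 the first time that thread reaches the declaration
int NextCallNumber()
{
    thread_local int calls = 0;
    return ++calls;
}

// Hypothetical helper: starts `numThreads` threads that each call
// NextCallNumber() `callsPerThread` times
// Returns the last call number seen by each thread
std::vector<int> CountPerThread(int numThreads, int callsPerThread)
{
    std::vector<int> results(static_cast<std::size_t>(numThreads));
    std::vector<std::thread> threads;
    for (int i = 0; i < numThreads; ++i)
    {
        threads.emplace_back([&results, i, callsPerThread]
        {
            int last = 0;
            for (int c = 0; c < callsPerThread; ++c)
            {
                last = NextCallNumber();
            }

            // Each thread writes only to its own slot, so no locking is needed
            results[static_cast<std::size_t>(i)] = last;
        });
    }
    for (std::thread& t : threads)
    {
        t.join();
    }
    return results;
}
```

Calling CountPerThread(4, 3) yields 3 for every thread because no thread ever sees another thread’s copy of calls. Had calls been an ordinary static local variable, all the threads would share one int and race on it.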

If the initialization of a thread_local variable declared outside of a function throws an exception, std::terminate is called to shut down the program.

struct Throws
{
    Throws()
    {
        throw 123;
    }
};
 
// Initializing throws an exception which calls std::terminate
static thread_local Throws t{};

Thread-local variables are deallocated and de-initialized when the thread ends:

struct LogLifecycle
{
    int Value = 1;
 
    LogLifecycle()
    {
        DebugLog("ctor");
    }
 
    ~LogLifecycle()
    {
        DebugLog("dtor");
    }
};
 
thread_local LogLifecycle x{};
 
auto a = []{ DebugLog(x.Value); };
std::thread t1{a};
std::thread t2{a};
t1.join();
t2.join();
 
// Possible annotated output, depending on thread execution order:
//   ctor     // first thread initializes x
//   ctor     // second thread initializes x
//   1        // first thread prints x.Value
//   dtor     // first thread de-initializes x
//   1        // second thread prints x.Value
//   dtor     // second thread de-initializes x

Any such per-thread initialization and de-initialization needs to be implemented manually in C#.

Volatile

C# and C++ both have a volatile keyword, but the meaning is different between the languages. In C#, volatile is intended to be used for thread synchronization. It guarantees atomic reads and writes to volatile variables, meaning they can’t be interrupted by other threads. In order to guarantee atomicity, only certain types can be volatile in C#:

Reference types such as class instances
Generic type parameters that are reference types such as class instances
Pointers
sbyte, byte, short, ushort, int, uint, char, float, and bool
Enums based on byte, sbyte, short, ushort, int, and uint
IntPtr and UIntPtr

All other types, including double, long, and all structs, can’t be volatile:

// C#
public class Name
{
    public string First;
    public string Last;
}
 
public struct IntWrapper
{
    public int Value;
}
 
public enum IntEnum : int
{
}
 
public enum LongEnum : long
{
}
 
unsafe public class Volatiles<T>
    where T : class
{
    // OK: reference type
    volatile Name RefType;
 
    // OK: type parameter known to be a reference type due to where constraint
    volatile T TypeParam;
 
    // OK: pointer
    volatile int* Pointer;
 
    // OK: permitted primitive type
    volatile int GoodPrimitive;
 
    // Compiler error: denied primitive type
    volatile long BadPrimitive;
 
    // OK: enum based on permitted primitive type
    volatile IntEnum GoodEnum;
 
    // Compiler error: enum based on denied primitive type
    volatile LongEnum BadEnum;
 
    // Compiler error: structs can't be volatile
    // No exception for structs that only have one field that can be volatile
    volatile IntWrapper Struct;
 
    // OK: Special-case for IntPtr and UIntPtr structs
    volatile IntPtr SpecialPtr1;
    volatile UIntPtr SpecialPtr2;
}

The only variables that can be volatile in C# are fields of classes and structs. Local variables and parameters can’t be volatile.

C# also implicitly adds memory fences to disable instruction reordering and data caching that might be performed by CPUs that execute “out of order.” An “acquire fence” is inserted for every read of the volatile variable and a “release fence” is inserted for every write:

// C#
public struct Counter
{
    public volatile int Value;
 
    public void Increment()
    {
        // Reads get an implicit acquire fence
        int cur = this.Value; // acquire-fenced
 
        int next = cur + 1;
 
        // Writes get an implicit release fence
        this.Value = next; // release-fenced
    }
}

C++, on the other hand, implements volatile differently. The keyword is the same, but it’s not meant to be used for thread synchronization. Instead, it’s meant to implement memory-mapped hardware access:

// The hardware device reports its status with a 32-bit integer
enum class DeviceStatus : int32_t
{
    OK = 0,
    Stuck = 1,
    Fault = 2,
};
 
// "Pointer to a volatile DeviceStatus"
// It's stored at a fixed location: the memory-mapped address
volatile DeviceStatus* pDeviceStatus = (volatile DeviceStatus*)100;
 
while (true)
{
    // Read and print the device status
    DebugLog("Device status:", *pDeviceStatus);
 
    // Wait for one second
    std::this_thread::sleep_for(std::chrono::seconds{1});
}

The volatile keyword is applied here to the DeviceStatus that pDeviceStatus points to. This tells the compiler that it cannot assume that it has full visibility into the readers and writers of that 32-bit integer. It has to assume that it may be accessed externally, such as when a device driver writes the device’s status to memory address 100.

As a consequence, the compiler isn’t allowed to “optimize” our loop like this:

// Only read the pointer once
// Store it as a local variable, likely backed by a register
DeviceStatus status = *pDeviceStatus;
 
while (true)
{
    // Print the device status from the local variable
    // No chance of a cache miss!
    DebugLog("Device status:", status);
 
    // Wait for one second
    std::this_thread::sleep_for(std::chrono::seconds{1});
}

The above “optimization” makes the code faster because there’s no chance of a cache miss when reading through the pDeviceStatus pointer. The status is just read once and stored in a CPU register, which is essentially free to read from. The compiler can’t see the kernel driver’s writes to memory address 100, so it can assume this is a safe optimization.

The only problem is that the device status that we log can no longer change. By marking the value that pDeviceStatus points to as volatile, the compiler is prohibited from making this optimization. It has to assume that there’s an external writer that might change the device status.
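Address 100 isn’t something we can read on a normal desktop OS, so here is a small runnable sketch that swaps in an ordinary global variable for the device register. That stand-in, g_fakeStatusRegister, and the CountNotOk function are assumptions made purely for illustration. Real memory-mapped I/O would use the fixed address provided by the hardware.

```cpp
#include <cstdint>

enum class DeviceStatus : std::int32_t
{
    OK = 0,
    Stuck = 1,
    Fault = 2,
};

// Assumption: an ordinary variable stands in for the memory-mapped register
volatile DeviceStatus g_fakeStatusRegister = DeviceStatus::OK;

// Hypothetical helper: reads the status `count` times through a
// pointer to volatile and returns how many reads were not OK
int CountNotOk(volatile DeviceStatus* pStatus, int count)
{
    int notOk = 0;
    for (int i = 0; i < count; ++i)
    {
        // Because the pointee is volatile, the compiler has to emit
        // a real read here on every iteration instead of reusing one value
        if (*pStatus != DeviceStatus::OK)
        {
            ++notOk;
        }
    }
    return notOk;
}
```

With a non-volatile pointee, the compiler would be free to read *pStatus once before the loop. The results in this single-threaded sketch would be the same either way. The difference only becomes observable when something outside the compiler’s view, such as hardware, changes the value between reads.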

Another effect of volatile is that the compiler isn’t allowed to reorder reads and writes to volatile variables with respect to other volatile variables:

// Status from the device
enum class DeviceStatus : int32_t
{
    OK = 0,
    Stuck = 1,
    Fault = 2,
    CommandAccepted = 3,
    CommandRejected = 4,
};
 
// Commands to the device
enum class DeviceCommand : int32_t
{
    Retry = 1,
};
 
// Memory-mapped device I/O
volatile DeviceStatus* pDeviceStatus = (volatile DeviceStatus*)100;
volatile DeviceCommand* pDeviceCommand = (volatile DeviceCommand*)200;
 
while (true)
{
    if (*pDeviceStatus == DeviceStatus::Stuck) // read
    {
        *pDeviceCommand = DeviceCommand::Retry; // write
        while (*pDeviceStatus != DeviceStatus::CommandAccepted) // read
        {
        }
        if (*pDeviceStatus == DeviceStatus::CommandRejected || // read
            *pDeviceStatus == DeviceStatus::Stuck) // read
        {
            throw std::runtime_error{"Failed to get device un-stuck"};
        }
    }
 
    // Wait for one second
    std::this_thread::sleep_for(std::chrono::seconds{1});
}

Without volatile, the compiler would be free to reorder these reads and writes so long as it obeys the “as-if” rule where the code works “as if” the compiler hadn’t done the reordering. Here’s how that might look:

// Memory-mapped device I/O without volatile
DeviceStatus* pDeviceStatus = (DeviceStatus*)100;
DeviceCommand* pDeviceCommand = (DeviceCommand*)200;
 
while (true)
{
    if (*pDeviceStatus == DeviceStatus::Stuck)
    {
        // Read status first
        DeviceStatus status = *pDeviceStatus;
 
        // Write command second
        *pDeviceCommand = DeviceCommand::Retry;
 
        // Check status
        while (status != DeviceStatus::CommandAccepted)
        {
            status = *pDeviceStatus;
        }
        if (*pDeviceStatus == DeviceStatus::CommandRejected || // read
            *pDeviceStatus == DeviceStatus::Stuck) // read
        {
            throw std::runtime_error{"Failed to get device un-stuck"};
        }
    }
 
    // Wait for one second
    std::this_thread::sleep_for(std::chrono::seconds{1});
}

In this non-volatile version, the compiler has decided that we should read the status before we write the command. This might cause us to read an old CommandRejected status for a prior command and then throw an exception even when our Retry command was accepted. By applying the volatile keyword, we disable such reordering and guarantee that our volatile reads and writes occur in the order they’re written in.

So far we haven’t seen any guarantees from C++ that volatile reads and writes are atomic or fenced, as they are in C#. That’s because this is simply not the case in C++. This is a critical difference that has implications for how they’re used in situations such as multi-threading and for their performance.
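As a rough sketch of what C++ code typically reaches for instead, the following uses std::atomic from the Standard Library. CountWithAtomic shows atomic increments shared by several threads. PublishAndConsume shows a release store paired with an acquire load, which is roughly the role the C# fences described above play. Both function names are hypothetical and exist only for this illustration.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: several threads increment one shared counter
// fetch_add is atomic, so no increments are lost
int CountWithAtomic(int numThreads, int incrementsPerThread)
{
    std::atomic<int> counter{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < numThreads; ++i)
    {
        threads.emplace_back([&counter, incrementsPerThread]
        {
            for (int n = 0; n < incrementsPerThread; ++n)
            {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (std::thread& t : threads)
    {
        t.join();
    }
    return counter.load();
}

// Hypothetical helper: one thread publishes a plain int and the
// calling thread consumes it after seeing the flag
int PublishAndConsume(int payloadValue)
{
    int payload = 0;
    std::atomic<bool> ready{false};

    std::thread producer{[&payload, &ready, payloadValue]
    {
        payload = payloadValue;
        // Release: the write to payload can't move after this store
        ready.store(true, std::memory_order_release);
    }};

    // Acquire: the read of payload can't move before this load
    while (!ready.load(std::memory_order_acquire))
    {
        std::this_thread::yield();
    }
    int seen = payload;

    producer.join();
    return seen;
}
```

Replacing std::atomic<int> with volatile int in CountWithAtomic would introduce a data race, which is undefined behavior in C++, and would typically lose increments.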

Due to this lack of an atomicity guarantee, any type may be volatile in C++. There’s no need to prohibit structs, double, and long just because accessing them might not be atomic. As we’ve already seen, pointers (and references) to volatile variables can also be taken:

struct Vector3d
{
    double X;
    double Y;
    double Z;
};
 
volatile Vector3d V{2, 4, 6}; // Struct
volatile uint64_t L; // Long
volatile double D; // Double
volatile int A[1000]; // Array

Additionally, any variable can be volatile in C++. We’re not limited to just data members. We can make local variables, nested block variables, parameters, globals, and namespace members volatile:

volatile int global;
 
namespace Volatiles
{
    volatile int ns;
}
 
void Foo(volatile int param)
{
    volatile int local;
 
    {
        volatile int block;
    }
}

The volatile keyword is a “type qualifier” like const. The shorthands “cv” and “cv-qualified” are commonly used to talk about these two qualifiers. Like const, volatile can be added implicitly but never removed implicitly: a non-volatile variable can be referred to through a reference or pointer to a volatile type, but not the other way around. The two qualifiers also combine, so a variable that’s only const or only volatile can be treated as const volatile:

int nc_nv = 100;
const int c_nv = 200;
volatile int nc_v = 300;
const volatile int c_v = 400;
 
{
    int& i1 = nc_nv; // OK
    int& i2 = c_nv; // Compiler error: removes const
    int& i3 = nc_v; // Compiler error: removes volatile
    int& i4 = c_v; // Compiler error: removes const and volatile
}
 
{
    const int& i1 = nc_nv; // OK
    const int& i2 = c_nv; // OK
    const int& i3 = nc_v; // Compiler error: removes volatile
    const int& i4 = c_v; // Compiler error: removes volatile
}
 
{
    volatile int& i1 = nc_nv; // OK
    volatile int& i2 = c_nv; // Compiler error: removes const
    volatile int& i3 = nc_v; // OK
    volatile int& i4 = c_v; // Compiler error: removes const
}
 
{
    const volatile int& i1 = nc_nv; // OK
    const volatile int& i2 = c_nv; // OK
    const volatile int& i3 = nc_v; // OK
    const volatile int& i4 = c_v; // OK
}

The general rule here is that we can treat variables as “more const” or “more volatile” but not “less const” or “less volatile” since this would remove important restrictions.
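The same rule can be checked at compile time. The sketch below uses the Standard Library’s type traits from <type_traits> to assert which pointer conversions are implicit. The Which overloads are hypothetical functions added only to show that overload resolution can tell volatile and non-volatile variables apart.

```cpp
#include <type_traits>

// Adding qualifiers is an implicit conversion
static_assert(std::is_convertible<int*, volatile int*>::value, "add volatile");
static_assert(std::is_convertible<int*, const volatile int*>::value, "add both");
static_assert(std::is_convertible<const int*, const volatile int*>::value, "add volatile");
static_assert(std::is_convertible<volatile int*, const volatile int*>::value, "add const");

// Removing qualifiers is not
static_assert(!std::is_convertible<volatile int*, int*>::value, "removes volatile");
static_assert(!std::is_convertible<const volatile int*, const int*>::value, "removes volatile");
static_assert(!std::is_convertible<const volatile int*, volatile int*>::value, "removes const");

// The qualifiers are part of the type and can be inspected or stripped
static_assert(std::is_volatile<volatile int>::value, "volatile int is volatile");
static_assert(!std::is_volatile<const int>::value, "const int is not volatile");
static_assert(std::is_same<std::remove_cv<const volatile int>::type, int>::value, "strip cv");

// Hypothetical overloads: the non-volatile one is preferred when it's viable
constexpr int Which(int&) { return 0; }
constexpr int Which(volatile int&) { return 1; }
```

Passing a plain int variable to Which selects the first overload, while passing a volatile int variable can only bind to the second.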

Note that the mutable keyword we apply to data members is not a type qualifier like const. It is instead a “storage-class-specifier,” like static, extern, register, and thread_local, and it only applies to data members. That’s why we can’t declare a local or global variable with type mutable int like we can with const int.
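As a brief reminder of what mutable does allow, here is a small sketch of a class that caches a computed value inside a const member function. The CachedSquare class is a hypothetical example made up for this illustration.

```cpp
// Hypothetical class: lazily computes and caches the square of a value
class CachedSquare
{
    int value;

    // mutable data members may be modified even via a const object
    mutable bool cached = false;
    mutable int cache = 0;
    mutable int computeCount = 0;

public:
    explicit CachedSquare(int v)
        : value(v)
    {
    }

    // const member function that still updates the mutable cache
    int Get() const
    {
        if (!cached)
        {
            cache = value * value;
            cached = true;
            ++computeCount;
        }
        return cache;
    }

    int ComputeCount() const
    {
        return computeCount;
    }
};
```

Even when the object is declared as const CachedSquare s{7}, calling s.Get() twice returns 49 both times and only computes the square once.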

Conclusion

Both languages have thread-local storage and a volatile keyword, but they have significant differences. Thread-local storage in C++ can be applied to more kinds of variables, such as locals and globals. It also guarantees per-thread initialization where C# only initializes once ever. It also features de-initialization when the thread terminates. C# code needs to manually implement both per-thread initialization and per-thread de-initialization.

As for the volatile keyword, its intended usage and implementation vary significantly between C# and C++. In C#, we get guaranteed atomic accesses and memory fences which is great for synchronizing multiple threads. In C++, we just disable some compiler optimizations that would get in the way of memory-mapped I/O. Thread synchronization is usually solved with other tools, such as mutexes and the Standard Library’s std::atomic class template. Due to the identical naming of the keyword in both languages, many programmers assume identical functionality. It’s important to know that this is not the case and to use the keyword appropriately in each language.

#1 by radwan on September 6th, 2022 · Reply

Hey!

I think there are some misconceptions about C#’s volatile keyword.

First of all, it doesn’t guarantee atomic read or writes. All types that are allowed to be volatile already have atomic read and writes since they fit in 32 bits. The very reason why volatile cannot be used on longs and doubles is that they have no atomic read/writes guarantee.

Secondly, fences you’re discussing are nothing but acquire/release semantics that prevent read and writes reordering. The only difference between C# and C++ I see here is the lack of consensus between C++ compilers on how acquire/release should be treated in regard to non-volatile variables.

Then you also mention that in C++ volatile prevents compiler from optimizing memory access to variables that could be modified externally. C#’s volatile does exactly the same. This code will run endlessly without volatile on the boolean (due to compiler’s optimization; might need release mode):

static void Run()
{
    bool stop = false;
    Task.Factory.StartNew( () => { Thread.Sleep( 1000 ); stop = true; } );
    while ( !stop ) ;
}

Just like in C++, C# also uses atomicity (Interlocked class) and mutexes (locks) for thread synchronization. Volatile is definitely not a thread synchronization tool!

To me, it looks like C# and C++ volatile are not that different.

#2 by Lee on February 10th, 2025 · Reply

Hey, great series. I’m learning a lot.

I’ll just back up what radwan highlighted above. Reads/writes to the mentioned types are guaranteed to be atomic in c#; volatile is not for this purpose nor for thread synchronisation in the way suggested in the article.

C++ For C# Developers: Part 32 – Thread-Local Storage and Volatile
