C++ For C# Developers: Part 7 – Pointers, Arrays, and Strings
Today we’ll continue the series with a look at pointers and two related concepts that work very differently than they do in C#: arrays and strings. We’ll also cover some interesting features along the way, such as function pointers.
Pointers
C# pointers are allowed as long as we configure the compiler to enable “unsafe” code. We may then use pointers only within an unsafe context, such as an unsafe method, unsafe class, or unsafe block within a function.
C++ has no concept of “safe” or “unsafe” code. There’s no such thing as an “unsafe” context, a “safe” context, or a compiler option to enable “unsafe” code. Pointers are allowed everywhere and are commonly used in many codebases. It turns out that their syntax works very similarly to the C# pointer syntax:
int x = 123;

// Declare a pointer type: int* is a "pointer to an int"
// Get the address of x with &x
int* p = &x;

// Dereference the pointer to get its value
DebugLog(*p); // 123

// Dereference and assign the pointer to set its value
*p = 456;
DebugLog(x); // 456

// x->y is a convenient shorthand for (*x).y
// (named pPlayer so it doesn't redeclare p above)
Player* pPlayer = &localPlayer;
pPlayer->Health = 100;
Multiple levels of indirection are also supported by adding more * characters to the type:
int x = 123;
int* p = &x;
int** pp = &p;
DebugLog(**pp); // 123

**pp = 456;
DebugLog(x); // 456

int y = 1000;
*pp = &y;
**pp = 2000;
DebugLog(x); // 456
DebugLog(y); // 2000
We also have void*, which is a pointer to any type. A cast is required to dereference a void* since the compiler has no idea what type it should do the read or write on. As in C#, such a cast is not checked at runtime to ensure that the pointer really points to the type being cast to.
int x = 123;

// &x is an int*, but void* is compatible with all pointer types
void* pVoid = &x;

// Cast back to int* so we can dereference
int* pInt = (int*)pVoid;
DebugLog(*pInt); // 123

// Cast to float* so we can treat the memory as though it held another type
float* pFloat = (float*)pVoid;
*pFloat = 3.14f;
DebugLog(x); // 1078523331
The last line could be considered data corruption of an int since 3.14f is not a valid int, but it’s a valid way to get the bits of a float. This is part of the reason that these casts are unchecked.
Note that this is called “type punning” and, when done through a pointer cast like this, it is technically undefined behavior, meaning the compiler may generate arbitrary machine code for this C++. In a simple case like this one, mainstream compilers typically generate the machine code we’d expect, so we’re simply treating the same memory as though it were a different type, but this isn’t guaranteed.
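For comparison, here is a minimal sketch of one well-defined way to read the bits of a float: copying its bytes with std::memcpy from the C Standard Library. The FloatBits name is made up for this illustration, and the sketch assumes a float is 32 bits wide, which the static_assert checks.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the article): returns the bits of a float
// Assumes float is 32 bits wide, which the static_assert verifies
std::uint32_t FloatBits(float value)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "float must be 32 bits");
    std::uint32_t bits = 0;
    // Copying the bytes is well-defined, unlike dereferencing a cast pointer
    std::memcpy(&bits, &value, sizeof(bits));
    return bits;
}
```

On a platform with IEEE 754 floats, FloatBits(3.14f) returns 1078523331, the same value the pointer cast produced above.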
As in C#, pointers may be null. There are three main ways this is written in C++:
// nullptr is compatible with all pointer types, but not integer arithmetic
// This is generally the preferred way since C++11
int* p1 = nullptr;

// NULL is commonly defined to be zero, but works with integer arithmetic
int* p2 = NULL;

// The zero integer
int* p3 = 0;
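One place the difference shows up is overload resolution. The sketch below uses two hypothetical overloads named Which, invented here only to show which one each null constant selects.

```cpp
// Hypothetical overloads used only to show which one a null constant selects
int Which(int)  { return 1; } // chosen for integer arguments
int Which(int*) { return 2; } // chosen for pointer arguments

// nullptr converts to pointer types but never to int
int WhichForNullptr() { return Which(nullptr); } // 2

// The literal 0 is an int first and a null pointer only by conversion
int WhichForZero() { return Which(0); } // 1
```

Passing NULL is deliberately left out: depending on how the implementation defines it, that call may select the int overload or fail to compile as ambiguous.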
Arrays
It may seem strange to see arrays lumped into the same article as pointers, but they’re very similar in C++. Unlike in C#, an array is not a “managed” object subject to garbage collection. It is instead simply a fixed-size, contiguous sequence of elements of the same type:
// Declare an array of 3 int elements
// The elements of the array are uninitialized
int a[3];

// Initialize the first element of the array by writing to it
a[0] = 123;

// Read the first element of the array
DebugLog(a[0]); // 123
When we create an array variable, it’s just like we individually created its elements via variables:
int a0;
int a1;
int a2;
This means that there is no overhead for an array. It is literally just its elements. It doesn’t even have an integer keeping track of its length like the Length property in C#. This means that the C# stackalloc keyword is unnecessary as C++ arrays are already allocated on the stack when declared as local variables. Likewise, the fixed keyword to create a fixed-size buffer as a struct or class field is unnecessary as a C++ array’s elements are already stored inside the struct or class.
There is also no bounds-checking on indexes into the array, just like indexing into a pointer in C# or C++. It’s very important to be careful not to read or write beyond the beginning or end of the array as there’s usually no way to know what data will be read or overwritten.
The lines blur even more because we can implicitly convert arrays into pointers:
int a[3];
a[0] = 123;

// Implicitly convert the int[3] array to an int*
// We get a pointer to the first element
int* p = a;
DebugLog(*p); // 123

// Indexing into pointers works just like in C#
DebugLog(p[0]); // 123
The opposite does not work though: we can’t write int b[3] = p.
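Because an array carries no length and becomes a plain pointer when passed to a function, the element count has to travel alongside it. Here is a minimal sketch of that pattern; Sum and SumOfThree are names invented for this example.

```cpp
#include <cstddef>

// Hypothetical helper: the array arrives as a pointer to its first element,
// so the element count is passed separately
// const: this function only reads the elements
int Sum(const int* values, std::size_t count)
{
    int total = 0;
    for (std::size_t i = 0; i < count; ++i)
    {
        total += values[i];
    }
    return total;
}

int SumOfThree()
{
    int a[3];
    a[0] = 1;
    a[1] = 2;
    a[2] = 3;

    // sizeof(a) is the size of the whole array in bytes
    // Dividing by the size of one element gives the element count
    std::size_t count = sizeof(a) / sizeof(a[0]);

    // a is implicitly converted to a pointer to its first element
    return Sum(a, count);
}
```

The sizeof division only works where the array type is still visible. Inside Sum, values is just a pointer, so sizeof(values) would be the size of the pointer rather than the array.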
Short arrays are commonly initialized with curly braces:
int a[3] = { 123, 456, 789 };
DebugLog(a[0], a[1], a[2]); // 123, 456, 789
If we specify more elements than will fit in the array’s size, we get a compiler error:
int a[3] = { 123, 456, 789, 1000 }; // compiler error
If we specify fewer elements, the ones we specify are initialized as written and the remaining elements are initialized to zero. Note that a trailing comma is allowed:
int a[3] = { 123, 456, };
DebugLog(a[0], a[1]); // 123, 456
DebugLog(a[2]); // 0
It’s common to omit the array size when using curly braces to initialize the array. This tells the compiler to count the number of elements in the curly braces and make the array that long.
int a[] = { 123, 456, 789 };

// The a array has 3 elements
DebugLog(a[0], a[1], a[2]); // 123, 456, 789
Finally, we have multi-dimensional arrays. These are arrays of arrays, both with fixed lengths. This means they are never “jagged” but always “rectangular.” Just as with one-dimensional arrays, we end up with a contiguous sequence of contiguous sequences of the same type of data. There’s still no overhead:
int a[2][3] = {{1, 2, 3}, {4, 5, 6}};
DebugLog(a[0][0], a[0][1], a[0][2]); // 1, 2, 3
DebugLog(a[1][0], a[1][1], a[1][2]); // 4, 5, 6
These are implicitly converted into a pointer to the first dimension of the array:
int a[2][3] = {{1, 2, 3}, {4, 5, 6}};

// Implicitly convert to a pointer to an array of 3 int
// Read the type name as "p is a pointer to an array of 3 int elements"
int (*p)[3] = a;

// Dereference that pointer to get a pointer to the first element
int* pp = *p;
for (int i = 0; i < 6; ++i)
{
    DebugLog(pp[i]); // 1, 2, 3, 4, 5, 6
}
Indexing into a multi-dimensional array with fewer subscripts than its dimensions just yields the remaining dimensions of the array. We can capture this in a pointer using the same implicit conversion:
int a[2][3] = {{1, 2, 3}, {4, 5, 6}};

// Index only 1 of the 2 dimensions to get the first row, an int[3]
// It's implicitly converted to a pointer to that row's first element
int* firstRow = a[0];
DebugLog(firstRow[0], firstRow[1], firstRow[2]); // 1, 2, 3
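The same conversion happens when a multi-dimensional array is passed to a function. Here is a minimal sketch with hypothetical SumGrid and SumExampleGrid functions. The length of each row is part of the parameter's type, while the number of rows has to be passed separately.

```cpp
// Hypothetical helper: rows is a pointer to arrays of 3 int, the type
// that an int[N][3] array is implicitly converted to
int SumGrid(int (*rows)[3], int rowCount)
{
    int total = 0;
    for (int r = 0; r < rowCount; ++r)
    {
        for (int c = 0; c < 3; ++c)
        {
            total += rows[r][c]; // rows[r] is an int[3]
        }
    }
    return total;
}

int SumExampleGrid()
{
    int a[2][3] = {{1, 2, 3}, {4, 5, 6}};
    return SumGrid(a, 2); // 1 + 2 + 3 + 4 + 5 + 6
}
```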
Pointers to Arrays and Arrays of Pointers
Sometimes we want to have a pointer to an array. This is essentially what a C# array is since we only have a reference to it, not its actual contents. Here’s how we’d do that in C++:
int a[] = { 1, 2, 3 };

// Add a * to make this a pointer to an array instead of just an array
// This is similar to how int* is a pointer to an int
int (*p)[3] = &a;

// Dereference the pointer to get the array, which we can index into
DebugLog((*p)[0], (*p)[1], (*p)[2]); // 1, 2, 3
Pointers to arrays aren’t supported by C# since pointers can’t point to managed types like arrays.
If we want an array of pointers, just add a * to the type of the array element:
int x = 1;
int y = 2;
int z = 3;

// Add a * to int to get int*: a pointer to an int
int* a[] = { &x, &y, &z };

// Index into the array to get the pointer then dereference it to get the int
DebugLog(*a[0], *a[1], *a[2]); // 1, 2, 3
Arrays of pointers are supported by C#, but the array is a managed object that we only have a reference to.
Strings
The difference with strings is similar to that of arrays. In C# we have managed System.String objects that are garbage-collected. In C++, we essentially have null-terminated arrays of characters:
// The string literal "hello" has type const char[6]
// Its contents are the characters 'h', 'e', 'l', 'l', 'o', 0
const char hello[] = "hello";

// Like any other array, it's implicitly converted to a pointer
const char* p = hello;
for (int i = 0; i < 6; ++i)
{
    DebugLog(p[i]); // h, e, l, l, o, <NUL>
}
We’ll go into const more later, but for now it’s just important to know that the characters of the array can’t be changed. For instance, this would produce a compiler error:
p[0] = 'H';
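Since the terminating zero is the only thing marking the end of such a string, finding its length means walking the characters until we reach it. The sketch below shows the idea with a hypothetical Length function, which does the same job as std::strlen from the C Standard Library.

```cpp
#include <cstddef>

// Hypothetical helper that mirrors what std::strlen does
std::size_t Length(const char* str)
{
    std::size_t count = 0;
    while (str[count] != '\0') // Stop at the terminating zero
    {
        ++count;
    }
    return count;
}

// The array holds the terminator too, so it's one element longer
static_assert(sizeof("hello") == 6, "5 characters plus the terminator");
```

So Length("hello") is 5 while sizeof("hello") is 6.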
In part 3, we saw that there are various kinds of character literals. The same is true for strings as each corresponds to the type of character elements in its array:
String Type | Syntax | Meaning |
---|---|---|
char[] | "hello" | ASCII string |
wchar_t[] | L"hello" | “Wide character” string |
char8_t[] | u8"hello" | UTF-8 string |
char16_t[] | u"hello" | UTF-16 string |
char32_t[] | U"hello" | UTF-32 string |
Regardless of the character type, we can concatenate string literals just by placing them next to each other. Unlike in C#, no + operator is needed.
char msg[] = "Hello, " "world!";
DebugLog(msg); // Hello, world!
As long as just one of the string literals has an encoding prefix, the others will get it too:
const char16_t msg[] = "Hello, " u"world!";
DebugLog(msg); // Hello, world!
Support for mixing encoding prefixes varies by compiler.
Raw strings like this are commonly used when literals suffice, such as log message text. When more advanced functionality is desired, and it very commonly is, wrapper classes such as the C++ Standard Library’s string or Unreal’s FString are used instead. We’ll go into string later in the series.
Pointer Arithmetic
Like in C#, arithmetic may be performed on pointers:
int a[3] = { 0, 0, 0 };

int* p = a; // Make p point to the first element of a
*p = 1;

p += 2; // Make p point to the third element of a
*p = 3;

--p; // Make p point to the second element of a
*p = 2;

DebugLog(a[0], a[1], a[2]); // 1, 2, 3
Pointers may also be compared:
int a[3] = { 0, 0, 0 };
int* theStart = a;
int* theEnd = theStart + 3;
while (theStart < theEnd) // Compare pointers
{
    *theStart = 1;
    theStart++;
}
DebugLog(a[0], a[1], a[2]); // 1, 1, 1
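Pointers into the same array can also be subtracted, which gives the number of elements between them. Here is a minimal sketch using a hypothetical IndexOf function that takes a start pointer and a one-past-the-end pointer like the loop above.

```cpp
#include <cstddef>

// Hypothetical helper: returns the index of value in [begin, end), or -1
std::ptrdiff_t IndexOf(const int* begin, const int* end, int value)
{
    for (const int* cur = begin; cur != end; ++cur)
    {
        if (*cur == value)
        {
            // Subtracting pointers yields an element count, not a byte count
            return cur - begin;
        }
    }
    return -1;
}
```

Given int a[3] = { 10, 20, 30 }, the call IndexOf(a, a + 3, 30) returns 2.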
Recall from part six that this satisfies the criteria for a range-based for loop:
int a[3] = { 1, 2, 3 };
for (int val : a)
{
    DebugLog(val); // 1, 2, 3
}
The compiler transforms this into a normal for loop:
{
    int*&& range = a;
    int* cur = range;
    int* theEnd = range + 3;
    for ( ; cur != theEnd; ++cur)
    {
        int val = *cur;
        DebugLog(val);
    }
}
Note that the begin and end functions aren’t required in the special case of arrays because the compiler knows the beginning and ending pointers since the size of the array is fixed at compile time.
Function Pointers
Unlike C# before version 9, C++ allows us to make pointers to functions:
int GetHealth(Player p)
{
    return p.Health;
}

// Get a pointer to GetHealth. Syntax in three parts:
// 1) Return type: int
// 2) Pointer name: (*p)
// 3) Parameter types: (Player)
int (*p)(Player) = GetHealth;

// Calling the function pointer calls the function
int health = p(localPlayer);
DebugLog(health);
There are two variants of this syntax that make no difference to the functionality:
// Assign the address of the function instead of just its name
int (*p)(Player) = &GetHealth;

// Dereference the function pointer before calling it
int health = (*p)(localPlayer);
Function pointers are commonly used like delegates in C#. They are an object that can be passed around that, when called, invokes a function. They are much more lightweight though as they are just a pointer. Delegates have much more functionality, such as the ability to add, remove, and invoke multiple functions and bind to functions of various types such as instance methods and lambdas. We’ll cover how to do that in C++ later on in the series.
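As a minimal sketch of that delegate-like usage, here’s a function that takes a function pointer as a parameter and calls it. Add, Multiply, and Apply are names made up for this example.

```cpp
// Hypothetical functions with matching signatures
int Add(int a, int b) { return a + b; }
int Multiply(int a, int b) { return a * b; }

// Takes a function pointer as a parameter, much like a C# method
// that takes a delegate
int Apply(int (*operation)(int, int), int a, int b)
{
    return operation(a, b);
}
```

Calling Apply(Add, 2, 3) returns 5 and Apply(Multiply, 2, 3) returns 6. The caller decides which function runs while Apply itself stays the same.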
To make an array of function pointers, add the square brackets ([]) after its name like before:
int GetHealth(Player p)
{
    return p.Health;
}

int GetLives(Player p)
{
    return p.Lives;
}

// Array of pointers to functions that take a Player and return an int
int (*statFunctions[])(Player) = { GetHealth, GetLives };

// Index into the array like any other array
int health = statFunctions[0](localPlayer);
DebugLog(health);

int lives = statFunctions[1](localPlayer);
DebugLog(lives);
Arrays of function pointers are commonly used for jump tables, which replace a long chain of conditional logic with a single indexed read from an array.
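Here’s a minimal sketch of such a jump table. The opcodes 0, 1, and 2 and all of the function names are hypothetical, and returning 0 for an unknown opcode is just one possible choice. Since arrays aren’t bounds-checked, the index is validated before it’s used.

```cpp
// Hypothetical operations that share one signature
int OpAdd(int a, int b) { return a + b; }
int OpSubtract(int a, int b) { return a - b; }
int OpMultiply(int a, int b) { return a * b; }

// Hypothetical dispatcher: opcodes 0, 1, and 2 select the operation
int Execute(int opcode, int a, int b)
{
    // The jump table: one function pointer per opcode
    int (*operations[])(int, int) = { OpAdd, OpSubtract, OpMultiply };
    const int count = sizeof(operations) / sizeof(operations[0]);

    // Arrays aren't bounds-checked, so validate the index ourselves
    if (opcode < 0 || opcode >= count)
    {
        return 0;
    }

    // A single indexed read replaces an if-else chain or switch
    return operations[opcode](a, b);
}
```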
Conclusion
C++ pointer functionality includes everything C# pointers can do and adds the ability to create pointers to functions and pointers to any type. Arrays and strings are closely related to pointers, unlike their managed C# counterparts. Combined, these give us considerably more functionality, such as arrays of function pointers to make jump tables, a lightweight replacement for delegates, and an alternative to stackalloc and fixed-size buffers that supports any type of elements.
Next week we’ll continue the series with a related topic: references. Like in C#, these are often more commonly used than pointers and take some of the sharp edges off.
#1 by kgame on August 24th, 2020 ·
where is DebugLog function ?
#2 by jackson on August 24th, 2020 ·
DebugLog isn’t actually written into any of the articles but you can think of it like Debug.Log in Unity: a function you can pass anything to and it prints all the arguments. Eventually we’ll get to the point in the series where we can actually implement DebugLog in a generic way.
#3 by Jan Reitz on March 25th, 2021 ·
I think it would help to have that function available, that one can actually run these snippets locally or in compiler explorer, to be able to play around etc.
#4 by jackson on March 26th, 2021 ·
I agree. Unfortunately, it’s quite an advanced function to write with support for arbitrary numbers of arguments and arbitrary types. I plan to finally show it a few articles from now when covering the I/O library. In the meantime, feel free to approximate it with printf from the [C Standard Library](/articles/6359) article.
#5 by radwan on September 18th, 2021 ·
Note that the C#’s stackalloc allows for dynamically allocated arrays, whereas in C++ the size of the array must be known at compile time.
#6 by Kamikaze on October 1st, 2021 ·
It’s good to note that since version 9 C# supports function pointers.