Native collections are funny things. On one hand they’re structs, which are supposed to be value types that get copied on assignment. On the other hand, they act like reference types because they contain a hidden pointer internally. This can make them difficult to understand, both to use and to implement, especially in the context of a ParallelFor job. Today we’ll examine more closely how to properly support ParallelFor jobs, especially with ranged containers like NativeList<T>.

Copying NativeArray

Since native collection types like NativeArray<T> are structs, a copy is made whenever a variable is assigned, parameter is passed, or value returned. Changing one copy normally doesn’t change the other, such as with Vector3:

Vector3 a = new Vector3(10, 20, 30);
// Make a copy
Vector3 b = a;
// Change the original. Doesn't change the copy.
a.x = 100;
Debug.Log(b); // [ 10, 20, 30 ]

This is still technically the case with native collection types. Consider NativeArray<T>, which has these fields according to the reference source:

internal void* m_Buffer;
internal int m_Length;
internal int m_MinIndex;
internal int m_MaxIndex;
internal AtomicSafetyHandle m_Safety;
internal DisposeSentinel m_DisposeSentinel;
internal Allocator m_AllocatorLabel;

Note: the reference source omits the #if ENABLE_UNITY_COLLECTIONS_CHECKS wrapper around the middle section of fields.

Now let’s look at what happens when a copy of a NativeArray<T> is made:

NativeArray<int> a = new NativeArray<int>(3, Allocator.Temp);
a[0] = 10;
a[1] = 20;
a[2] = 30;
// Make a copy
NativeArray<int> b = a;
// Change the original. Sort of changes the copy...
a[0] = 100;
Debug.Log(b[0]); // 100

The reason for this behavior is that making a copy of a into b makes a copy of the m_Buffer field along with all the others. The NativeArray<T> set indexer changes the memory this field points to, not the pointer itself. Then the NativeArray<T> get indexer reads the memory this field points to, not the pointer itself. Since both a and b have the same pointer, they end up reading and writing the same memory.

This behavior makes NativeArray<T> partly a value type because assignment is done by copying and partly a reference type because changes to one are reflected in changes to the other.
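
The same effect can be reproduced in plain C# without Unity. The following is only a model, not Unity's implementation: HandleArray is a hypothetical struct, and a managed int[] reference stands in for the unmanaged m_Buffer pointer.

```csharp
using System;

// Hypothetical stand-in for NativeArray<int>. The int[] reference plays the
// role of the m_Buffer pointer; this is not how Unity stores the data.
public struct HandleArray
{
    private readonly int[] buffer; // copied by value, but refers to shared memory
    public readonly int Length;    // copied by value, never changes

    public HandleArray(int length)
    {
        buffer = new int[length];
        Length = length;
    }

    public int this[int index]
    {
        get { return buffer[index]; }  // reads what the "pointer" refers to
        set { buffer[index] = value; } // writes what the "pointer" refers to
    }
}

public static class Program
{
    public static void Main()
    {
        HandleArray a = new HandleArray(3);
        a[0] = 10;

        // Copies the struct's fields, including the buffer reference
        HandleArray b = a;

        // Writes through the shared reference, so b observes the change
        a[0] = 100;
        Console.WriteLine(b[0]); // 100
    }
}
```

Neither indexer touches the struct's own fields, which is why the two copies can never drift apart as far as element data is concerned.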

Copying NativeList

NativeArray<T> stores its length in its m_Length field. Like all fields, m_Length is copied during assignment, parameter passing, and value returning. This is fine for NativeArray<T> because the length never changes. This isn’t the case for other types like NativeList<T> because its length will change as elements are added to it. So how does it handle synchronizing this across all copies? The answer is to put the length inside the data pointed to by the pointer. Here’s a trimmed-down version of how it looks as of preview.9 of the com.unity.collections package:

public struct NativeList<T> : IDisposable
    where T : struct
{
    internal NativeListImpl<T, DefaultMemoryManager> m_Impl;
}

unsafe struct NativeListData
{
    public void* buffer;
    public int length;
    public int capacity;
}

public unsafe struct NativeListImpl<T, TMemManager>
    where T : struct
    where TMemManager : struct, INativeBufferMemoryManager
{
    NativeListData* m_ListData;
}

The NativeList<T> contains a NativeListImpl<T, TMemManager> which in turn contains a pointer to a NativeListData which contains a pointer to the backing array as well as the length and capacity. Let’s look at how this allows NativeList<T> to handle copying in such a way that it acts like a reference type:

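NativeList<int> a = new NativeList<int>(Allocator.Temp);
a.Add(10); a.Add(20); a.Add(30);
// Make a copy
NativeList<int> b = a;
// Add to the copy. The original sees the new elements and the new length.
b.Add(100); b.Add(200); b.Add(300);
Debug.Log(a[0]); // 10
Debug.Log(a[3]); // 100
Debug.Log(a.Length); // 6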

If NativeList<T> simply contained its length as a field like how NativeArray<T> contained m_Length, copying a to b would have copied this field. The calls to b.Add would have changed b.m_Length without updating a.m_Length, so the Debug.Log(a.Length) at the end would have printed 3.

Instead, the length is stored in a NativeListData that the NativeList<T> has a pointer to, indirectly through NativeListImpl<T, TMemManager>. This allows the length to be shared between copies just like how NativeArray<T> shared its array elements.
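
To make the difference concrete, here is a plain C# model of both designs. It's a sketch under assumptions: FieldLengthList and SharedLengthList are hypothetical types, a class reference stands in for the NativeListData pointer, and capacity is fixed to keep the code short.

```csharp
using System;

// Hypothetical list that keeps its length in a field of the struct itself
public struct FieldLengthList
{
    private readonly int[] buffer;
    public int Length;

    public FieldLengthList(int capacity)
    {
        buffer = new int[capacity];
        Length = 0;
    }

    public void Add(int value)
    {
        buffer[Length] = value;
        Length++; // only changes this copy's field
    }
}

// Hypothetical list that keeps its length in shared state
public struct SharedLengthList
{
    // A class reference stands in for the NativeListData pointer
    private sealed class State
    {
        public int[] Buffer;
        public int Length;
    }

    private readonly State state;

    public SharedLengthList(int capacity)
    {
        state = new State { Buffer = new int[capacity] };
    }

    public int Length { get { return state.Length; } }

    public void Add(int value)
    {
        state.Buffer[state.Length] = value;
        state.Length++; // visible through every copy
    }
}

public static class Program
{
    public static void Main()
    {
        FieldLengthList a = new FieldLengthList(8);
        a.Add(10); a.Add(20); a.Add(30);
        FieldLengthList b = a;
        b.Add(100); b.Add(200); b.Add(300);
        Console.WriteLine(a.Length); // 3

        SharedLengthList c = new SharedLengthList(8);
        c.Add(10); c.Add(20); c.Add(30);
        SharedLengthList d = c;
        d.Add(100); d.Add(200); d.Add(300);
        Console.WriteLine(c.Length); // 6
    }
}
```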

Ranged Collections

NativeArray<T> has the [NativeContainerSupportsMinMaxWriteRestriction] attribute and is therefore a ranged collection. To support this, it has the m_Length, m_MinIndex, and m_MaxIndex fields. Since this is currently undocumented, let’s break down what each field means.

m_Length is an int with that specific name. When a ParallelFor job that has the collection as a field is executed, Unity reads this field and uses it to write m_MinIndex and m_MaxIndex. Note that Unity never writes to m_Length.

m_MinIndex and m_MaxIndex are also int fields with those specific names, in that specific order, and must directly follow m_Length. When a ParallelFor job that has the collection as a field is executed, Unity writes these fields based on the portion of the size that was passed to Schedule that is currently being executed. Note that Unity never reads from m_MinIndex or m_MaxIndex.

It’s the responsibility of the native collection to use the m_MinIndex and m_MaxIndex fields to bounds-check all accesses. For example, the indexer for NativeArray<T> contains this code:

if (index < m_MinIndex || index > m_MaxIndex)
    FailOutOfRangeError(index);

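
Since the job system side is closed source, the following plain C# model is partly guesswork. It assumes each batch gets its own copy of the collection with m_MinIndex set to the first index of the batch and m_MaxIndex set to the last, clamped by m_Length. RangedArray and CopyForBatch are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical ranged collection with the three fields described above
public struct RangedArray
{
    private readonly int[] buffer;
    public int m_Length;
    public int m_MinIndex;
    public int m_MaxIndex;

    public RangedArray(int length)
    {
        buffer = new int[length];
        m_Length = length;
        m_MinIndex = 0;
        m_MaxIndex = length - 1;
    }

    public int this[int index]
    {
        get { CheckRange(index); return buffer[index]; }
        set { CheckRange(index); buffer[index] = value; }
    }

    private void CheckRange(int index)
    {
        if (index < m_MinIndex || index > m_MaxIndex)
            throw new IndexOutOfRangeException(
                "Index " + index + " out of range ["
                + m_MinIndex + ", " + m_MaxIndex + "]");
    }
}

public static class Program
{
    // Models the assumed patching step: the copy shares the buffer but may
    // only touch the indices of its own batch
    public static RangedArray CopyForBatch(RangedArray source, int start, int count)
    {
        RangedArray copy = source;
        copy.m_MinIndex = start;
        copy.m_MaxIndex = Math.Min(start + count, source.m_Length) - 1;
        return copy;
    }

    public static void Main()
    {
        RangedArray array = new RangedArray(4);
        RangedArray batch = CopyForBatch(array, 2, 2); // may touch 2 and 3

        batch[2] = 20; // inside the range
        Console.WriteLine(array[2]); // 20

        try
        {
            batch[1] = 10; // outside the range
        }
        catch (IndexOutOfRangeException e)
        {
            Console.WriteLine(e.Message); // Index 1 out of range [2, 3]
        }
    }
}
```
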
Dynamic Ranged Collection Problems

Now let’s say we want to create a collection like NativeList<T> that’s dynamically resizable but also supports ParallelFor jobs as a ranged collection. This sounds easy because we’ve already seen how to solve the problem of sharing changes to the length and the bounds-checking should be trivial with a collection that’s just a one-dimensional array.

Unfortunately, supporting this is quite awkward with the native collections system. The issue is that Unity reads m_Length from ranged collections but this isn’t where collections like NativeList<T> store the length. Unity doesn’t know to read m_Impl.m_ListData->length and instead insists on reading m_Length. This puts us back in the same quandary as before: how do we synchronize m_Length across all copies of the NativeList<T>?

Unity has solved this in two ways. The first is an implicit conversion operator that creates a NativeArray<T> sharing its pointer with the NativeList<T>. This approach has two limitations. One is that NativeArray<T> has almost no methods and therefore can’t be used with any extra functionality that might be present in NativeList<T>. The other, and the more important, is that if the NativeList<T> changes its length after the NativeArray<T> is created then the m_Length field of the NativeArray<T> won’t be updated and the NativeArray<T> will continue to have the old length. Even worse, if adding elements to the NativeList<T> makes it run out of capacity then it’ll deallocate the buffer that the NativeArray<T> is still using. Unity will report the appropriate error, but it’s still very easy to write brittle code: NativeList<T>.Add only sometimes deallocates the array, and because resizing is an implementation detail it’s not obvious that it will ever happen.
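
The stale-length problem can be modeled in plain C#. This is a sketch with hypothetical types: GrowableList stands in for NativeList<T>, ArrayView for the converted NativeArray<T>, and ToView for the conversion. In this managed model the old buffer simply lives on, whereas a native list would have freed it.

```csharp
using System;

// Hypothetical fixed-length view, standing in for the converted NativeArray<T>
public struct ArrayView
{
    private readonly int[] buffer; // the buffer at the time of conversion
    public readonly int Length;    // the length at the time of conversion

    public ArrayView(int[] buffer, int length)
    {
        this.buffer = buffer;
        Length = length;
    }

    public int this[int index] { get { return buffer[index]; } }
}

// Hypothetical growable list, standing in for NativeList<T>
public struct GrowableList
{
    private sealed class State
    {
        public int[] Buffer;
        public int Length;
    }

    private readonly State state;

    public GrowableList(int capacity)
    {
        state = new State { Buffer = new int[capacity] };
    }

    public int Length { get { return state.Length; } }

    public int this[int index]
    {
        get { return state.Buffer[index]; }
        set { state.Buffer[index] = value; }
    }

    public void Add(int value)
    {
        if (state.Length == state.Buffer.Length)
        {
            // Out of capacity: move to a bigger buffer. A native list would
            // also free the old one here.
            int[] bigger = new int[state.Buffer.Length * 2];
            Array.Copy(state.Buffer, bigger, state.Length);
            state.Buffer = bigger;
        }
        state.Buffer[state.Length] = value;
        state.Length++;
    }

    public ArrayView ToView()
    {
        return new ArrayView(state.Buffer, state.Length);
    }
}

public static class Program
{
    public static void Main()
    {
        GrowableList list = new GrowableList(2);
        list.Add(10);
        list.Add(20);
        ArrayView view = list.ToView();

        list.Add(30);   // exceeds capacity, so the list switches buffers
        list[0] = 100;  // written to the new buffer only

        Console.WriteLine(list.Length); // 3
        Console.WriteLine(view.Length); // 2
        Console.WriteLine(view[0]);     // 10
    }
}
```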

The second solution is to provide NativeList<T>.ToDeferredJobArray. This also creates a NativeArray<T>, but it sets its length to zero, allocator to Invalid, and pointer to one byte after the NativeListData pointer. This odd pointer memory address signals to Unity’s job system that the length of the NativeArray<T> needs to be written before jobs using it are executed. Exactly how this works seems to exist only inside the closed source portion of Unity and there is apparently no documentation available. If you know of any, please comment to point the way.

Dynamic Ranged Collection Solutions

Unity’s first solution, creating a NativeArray<T> with fixed size, is suitable for collections like NativeList<T> that have exactly one backing array. It isn’t suitable for collections like NativeChunkedList<T> in the NativeCollections GitHub project which contain many backing arrays. Even NativeLinkedList<T> isn’t a very good candidate since its nodes are in an arbitrary order until SortNodeMemoryAddresses is called.

The second solution seems closed to all but NativeList<T>. It’s possible that, with a better understanding of Unity’s internals, this system could be leveraged by collections other than NativeList<T>, but for now that is unknown and best avoided. Even if the current implementation could be reverse-engineered, there are no guarantees that the implementation will remain consistent and well-supported from version to version.

This means that we need a third solution for types like NativeChunkedList<T> and NativeLinkedList<T>. One way that’s simple, straightforward, and effective is the following strategy. First, initialize m_Length, m_MinIndex, and m_MaxIndex to -1. This flags the collection as being used outside of a ParallelFor job.

public MyCollection()
{
    m_Length = -1;
    m_MinIndex = -1;
    m_MaxIndex = -1;
}

Next, never write m_MinIndex or m_MaxIndex in the collection’s code after initialization. This means only Unity will write them and only when a copy is used in a ParallelFor job.

Finally, add a method to synchronize m_Length to the value stored in the shared memory each copy has a pointer to. This allows users who are about to execute a ParallelFor job to manually synchronize the length so that Unity will read the correct value and therefore write the correct values of m_MinIndex and m_MaxIndex.

public void PrepareForParallelForJob()
{
    m_Length = m_State->m_Length;
}

Only Unity writes non-negative values to m_MinIndex and m_MaxIndex, it does so only when the collection is being used in a ParallelFor job, and the values it writes are exactly the ones that need to be bounds-checked. This leads to the following updated error-checking code:

    // Used within a ParallelFor job
    if (m_MinIndex != -1 || m_MaxIndex != -1)
    {
        // Make sure the length is correct
        if (m_Length != m_State->m_Length)
        {
            throw new Exception(
                "Call PrepareForParallelForJob before a ParallelFor job");
        }

        // Perform a bounds check
        if (index < m_MinIndex || index > m_MaxIndex)
        {
            throw new Exception(
                "Index " + index + " out of bounds: ["
                + m_MinIndex + ", " + m_MaxIndex + "]");
        }
    }

Usage is then simple and explicit:

// Initially length is zero
MyCollection a = new MyCollection();
// Make a copy
MyCollection b = a;
// Add some elements so A's length is now 3 (the values are arbitrary)
a.Add(10);
a.Add(20);
a.Add(30);
// Synchronize B's length so it can be used in a ParallelFor job
b.PrepareForParallelForJob();
// Use B in a ParallelFor job
const int innerloopBatchCount = 1;
MyJob job = new MyJob { Collection = b };
job.Schedule(b.Length, innerloopBatchCount).Complete();
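
The whole strategy can be tried out without Unity. The sketch below is a plain C# model under assumptions, not the code of any real collection. ModelCollection is hypothetical, a class reference stands in for the m_State pointer, and WithJobRange plays the part of Unity writing m_MinIndex and m_MaxIndex on the copy given to one batch. A factory method replaces the constructor because structs cannot declare a parameterless constructor before C# 10.

```csharp
using System;

// Hypothetical model of the strategy, with fixed capacity to keep it short
public struct ModelCollection
{
    private sealed class State
    {
        public int[] Buffer;
        public int Length;
    }

    private readonly State m_State;
    private int m_Length;
    private int m_MinIndex;
    private int m_MaxIndex;

    private ModelCollection(State state)
    {
        m_State = state;
        // -1 flags the collection as being used outside of a ParallelFor job
        m_Length = -1;
        m_MinIndex = -1;
        m_MaxIndex = -1;
    }

    public static ModelCollection Create(int capacity)
    {
        return new ModelCollection(new State { Buffer = new int[capacity] });
    }

    public int Length { get { return m_State.Length; } }

    public void Add(int value)
    {
        m_State.Buffer[m_State.Length] = value;
        m_State.Length++;
    }

    // Copies the shared length into this copy's own m_Length field
    public void PrepareForParallelForJob()
    {
        m_Length = m_State.Length;
    }

    // Stand-in for the job system patching the copy used by one batch
    public ModelCollection WithJobRange(int min, int max)
    {
        ModelCollection copy = this;
        copy.m_MinIndex = min;
        copy.m_MaxIndex = max;
        return copy;
    }

    public int this[int index]
    {
        get
        {
            // Used within a (modeled) ParallelFor job
            if (m_MinIndex != -1 || m_MaxIndex != -1)
            {
                if (m_Length != m_State.Length)
                    throw new InvalidOperationException(
                        "Call PrepareForParallelForJob before a ParallelFor job");
                if (index < m_MinIndex || index > m_MaxIndex)
                    throw new IndexOutOfRangeException(
                        "Index " + index + " out of bounds: ["
                        + m_MinIndex + ", " + m_MaxIndex + "]");
            }
            return m_State.Buffer[index];
        }
    }
}

public static class Program
{
    public static void Main()
    {
        ModelCollection a = ModelCollection.Create(8);
        ModelCollection b = a;
        a.Add(10); a.Add(20); a.Add(30);

        try
        {
            // b's m_Length is still -1, so the access is rejected
            int unused = b.WithJobRange(0, 0)[0];
        }
        catch (InvalidOperationException e)
        {
            Console.WriteLine(e.Message);
        }

        b.PrepareForParallelForJob();
        Console.WriteLine(b.WithJobRange(1, 1)[1]); // 20
    }
}
```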

Unity’s native collection support is still new and currently not very friendly to ranged collection types that change their length within ParallelFor jobs. The above strategy is a workaround that requires some manual code: a call to PrepareForParallelForJob. It also limits how much code can be run in jobs before synchronizing back to the main thread so that PrepareForParallelForJob can be called to update the ranged collections for future jobs. Hopefully this will be addressed in future versions of Unity. Due to the lack of documentation and closed engine source code, there is some degree of guesswork in this article. If any of it is incorrect, please comment to let me know.

Both NativeLinkedList<T> and NativeChunkedList<T> in the NativeCollections GitHub project have been updated with a PrepareForParallelForJob method to improve their correctness within a ParallelFor job context. Feel free to check out the code to see how it works in more complete types or to simply integrate it into your own projects.