A reader recently asked what the fastest way to take an absolute value was. It occurred to me that there are a lot of ways to do this in Unity! So today we’ll try them all out and see which is best.

Update: A Russian translation of this article is available.

Options

How many ways can we take the absolute value of a float? Let us count:

1. `a = System.Math.Math.Abs(f);`
2. ``` a = UnityEngine.Mathf.Abs(f); a = Unity.Mathematics.math.abs(f); if (f >= 0) a = f; else a = -f; a = f >= 0 ? f : -f; ```
``` That's a lot of ways! The custom options (#3 and #4) are obvious, but let's dive into the others to see what they're doing. Here's a decompiled version of Math.Abs: [MethodImpl(4096)] public static extern float Abs(float value); This is implemented in native code, so the C# trail ends here. Let's move on to #2 and look at Mathf.Abs: public static float Abs(float f) { return Math.Abs(f); } This is just a wrapper around Math.Abs! Moving on, let's see math.abs: [MethodImpl(MethodImplOptions.AggressiveInlining)] public static float abs(float x) { return asfloat(asuint(x) & 0x7FFFFFFF); } Apparently, this is implemented by converting the float to a uint, making the sign bit (0x7FFFFFFF is 0 followed by 31 1s) a zero, then converting back to a float. To see the conversions, let's look at asuint: [MethodImpl(MethodImplOptions.AggressiveInlining)] public static uint asuint(float x) { return (uint)asint(x); } This just wraps asint: [MethodImpl(MethodImplOptions.AggressiveInlining)] public static int asint(float x) { IntFloatUnion u; u.intValue = 0; u.floatValue = x; return u.intValue; } asint uses a union to alias the memory without changing the bits. We've seen that trick before and know that it even works with Burst. Here's how Unity implemented it: [StructLayout(LayoutKind.Explicit)] internal struct IntFloatUnion { [FieldOffset(0)] public int intValue; [FieldOffset(0)] public float floatValue; } This is the same approach as we've seen before: the same memory within the struct is used to hold the int and the float so its type can be changed without affecting the bits. Now let's look at the return trip and see how the uint is changed back into a float: [MethodImpl(MethodImplOptions.AggressiveInlining)] public static float asfloat(uint x) { return asfloat((int)x); } Again, this just uses the int overload: [MethodImpl(MethodImplOptions.AggressiveInlining)] public static float asfloat(int x) { IntFloatUnion u; u.floatValue = 0; u.intValue = x;   return u.floatValue; } This does the opposite of before using the same union. Jobs Next, let's write some Burst-compiled jobs to test out each of these approaches by taking the absolute value of one element of a NativeArray<float>: [BurstCompile] struct SingleSystemMathAbsJob : IJob { public NativeArray<float> Values;   public void Execute() { Values[0] = Math.Abs(Values[0]); } }   [BurstCompile] struct SingleUnityEngineMathfAbsJob : IJob { public NativeArray<float> Values;   public void Execute() { Values[0] = Mathf.Abs(Values[0]); } }   [BurstCompile] struct SingleUnityMathematicsMathAbsJob : IJob { public NativeArray<float> Values;   public void Execute() { Values[0] = math.abs(Values[0]); } }   [BurstCompile] struct SingleIfJob : IJob { public NativeArray<float> Values;   public void Execute() { float val = Values[0]; if (val >= 0) { Values[0] = val; } else { Values[0] = -val; } } }   [BurstCompile] struct SingleTernaryJob : IJob { public NativeArray<float> Values;   public void Execute() { float val = Values[0]; Values[0] = val >= 0 ? val : -val; } } We'll look at the results of the performance test later on, but for now let's take a deeper look and see the assembly code that Burst compiles each of these jobs to. This will show us what the CPU will actually run and reveal some details hidden by the native code implementation of Math.Abs. First, here is the assembly for both Math.Abs and Mathf.Abs: mov rax, qword ptr [rdi] movss xmm0, dword ptr [rax] movabs rcx, offset .LCPI0_0 andps xmm0, xmmword ptr [rcx] movss dword ptr [rax], xmm0 ret The key line here is andps which is performing the & 0x7FFFFFFF we saw in math.abs to clear the most-significant bit. Now let's look at the Burst output for math.abs: mov rax, qword ptr [rdi] movss xmm0, dword ptr [rax] movabs rcx, offset .LCPI0_0 andps xmm0, xmmword ptr [rcx] movss dword ptr [rax], xmm0 ret As expected, this also uses andps. In fact, it's identical to the output for Math.Abs and Mathf.Abs! Here's the custom version based on if: mov rax, qword ptr [rdi] movss xmm0, dword ptr [rax] xorps xmm1, xmm1 ucomiss xmm1, xmm0 jbe .LBB0_2 movabs rcx, offset .LCPI0_0 xorps xmm0, xmmword ptr [rcx] movss dword ptr [rax], xmm0 .LBB0_2: ret This version introduces a branch: jbe. This skips writing anything to the array when the value is positive. That's an improvement because there's less memory to write but the branch may result in misprediction that inefficiently uses the CPU. Finally, here's the custom version based on the ternary operator: mov rax, qword ptr [rdi] movss xmm0, dword ptr [rax] movabs rcx, offset .LCPI0_0 andps xmm0, xmmword ptr [rcx] movss dword ptr [rax], xmm0 ret Strangely, this version does not include a branch like in the if-based version. Instead, it's identical to the output for Math.Abs, Mathf.Abs, and math.abs! Somehow Burst was able to detect and radically transform the code in this version, but not with if. Performance Now let's put all of these approaches to a performance test to see which is fastest. We'll do this by taking the absolute value of every element of a NativeArray<float> containing random values: using System; using System.Diagnostics; using Unity.Burst; using Unity.Collections; using Unity.Jobs; using Unity.Mathematics; using UnityEngine;   [BurstCompile] struct SystemMathAbsJob : IJob { public NativeArray<float> Values;   public void Execute() { for (int i = 0; i < Values.Length; ++i) { Values[i] = Math.Abs(Values[i]); } } }   [BurstCompile] struct UnityEngineMathfAbsJob : IJob { public NativeArray<float> Values;   public void Execute() { for (int i = 0; i < Values.Length; ++i) { Values[i] = Mathf.Abs(Values[i]); } } }   [BurstCompile] struct UnityMathematicsMathAbsJob : IJob { public NativeArray<float> Values;   public void Execute() { for (int i = 0; i < Values.Length; ++i) { Values[i] = math.abs(Values[i]); } } }   [BurstCompile] struct IfJob : IJob { public NativeArray<float> Values;   public void Execute() { for (int i = 0; i < Values.Length; ++i) { float val = Values[i]; if (val >= 0) { Values[i] = val; } else { Values[i] = -val; } } } }   [BurstCompile] struct TernaryJob : IJob { public NativeArray<float> Values;   public void Execute() { for (int i = 0; i < Values.Length; ++i) { float val = Values[i]; Values[i] = val >= 0 ? val : -val; } } }   class TestScript : MonoBehaviour { void Start() { NativeArray<float> values = new NativeArray<float>( 100000, Allocator.TempJob); for (int i = 0; i < values.Length; ++i) { values[i] = UnityEngine.Random.Range(float.MinValue, float.MaxValue); }   SystemMathAbsJob systemMathAbsJob = new SystemMathAbsJob { Values = values }; UnityEngineMathfAbsJob unityEngineMathfAbsJob = new UnityEngineMathfAbsJob { Values = values }; UnityMathematicsMathAbsJob unityMathematicsMathAbsJob = new UnityMathematicsMathAbsJob { Values = values }; IfJob ifJob = new IfJob { Values = values }; TernaryJob ternaryJob = new TernaryJob { Values = values };   // Warm up the job system systemMathAbsJob.Run(); unityEngineMathfAbsJob.Run(); unityMathematicsMathAbsJob.Run(); ifJob.Run(); ternaryJob.Run();   Stopwatch sw = Stopwatch.StartNew(); systemMathAbsJob.Run(); long systemMathAbsTicks = sw.ElapsedTicks;   sw.Restart(); unityEngineMathfAbsJob.Run(); long unityEngineMathfAbsTicks = sw.ElapsedTicks;   sw.Restart(); unityMathematicsMathAbsJob.Run(); long unityMathematicsMathAbsTicks = sw.ElapsedTicks;   sw.Restart(); ifJob.Run(); long ifTicks = sw.ElapsedTicks;   sw.Restart(); ternaryJob.Run(); long ternaryTicks = sw.ElapsedTicks;   values.Dispose();   print( "Function,Ticksn" + "System.Math.Abs," + systemMathAbsTicks + "n" + "UnityEngine.Mathf.Abs," + unityEngineMathfAbsTicks + "n" + "Unity.Mathematics.Math.Abs," + unityMathematicsMathAbsTicks + "n" + "If," + ifTicks + "n" + "Ternary," + ternaryTicks ); } } Here's the test machine I ran this on: 2.7 Ghz Intel Core i7-6820HQ macOS 10.14.6 Unity 2018.4.3f1 macOS Standalone .NET 4.x scripting runtime version and API compatibility level IL2CPP Non-development 640×480, Fastest, Windowed Function Ticks System.Math.Abs 253 UnityEngine.Mathf.Abs 198 Unity.Mathematics.Math.Abs 195 If 517 Ternary 1447 The performance is unexpectedly varied! We previously saw that four out of five versions had the exact same assembly output. Yet in the results we only see Math.Abs, Mathf.Abs, and math.Abs with about the same performance. The ternary-based custom version should have performed the same, but didn't. In fact, the ternary version performed over 10x worse! To find out what's going on, let's look at the assembly code in Burst Inspector for these jobs to see if the added loop impacted the output. Math.Abs, Mathf.Abs, and math.abs: movsxd rax, dword ptr [rdi + 8] test rax, rax jle .LBB0_8 mov rcx, qword ptr [rdi] cmp eax, 8 jae .LBB0_3 xor edx, edx jmp .LBB0_6 .LBB0_3: mov rdx, rax and rdx, -8 lea rsi, [rcx + 16] movabs rdi, offset .LCPI0_0 movaps xmm0, xmmword ptr [rdi] mov rdi, rdx .p2align 4, 0x90 .LBB0_4: movups xmm1, xmmword ptr [rsi - 16] movups xmm2, xmmword ptr [rsi] andps xmm1, xmm0 andps xmm2, xmm0 movups xmmword ptr [rsi - 16], xmm1 movups xmmword ptr [rsi], xmm2 add rsi, 32 add rdi, -8 jne .LBB0_4 cmp rdx, rax je .LBB0_8 .LBB0_6: sub rax, rdx lea rcx, [rcx + 4*rdx] movabs rdx, offset .LCPI0_0 movaps xmm0, xmmword ptr [rdx] .p2align 4, 0x90 .LBB0_7: movss xmm1, dword ptr [rcx] andps xmm1, xmm0 movss dword ptr [rcx], xmm1 add rcx, 4 dec rax jne .LBB0_7 .LBB0_8: ret if: movsxd rax, dword ptr [rdi + 8] test rax, rax jle .LBB0_26 mov r8, qword ptr [rdi] cmp eax, 8 jae .LBB0_3 xor edx, edx jmp .LBB0_22 .LBB0_3: mov rdx, rax and rdx, -8 xorps xmm0, xmm0 movabs rsi, offset .LCPI0_0 movaps xmm1, xmmword ptr [rsi] mov rsi, r8 mov rdi, rdx .p2align 4, 0x90 .LBB0_4: movups xmm3, xmmword ptr [rsi] movups xmm2, xmmword ptr [rsi + 16] movaps xmm4, xmm3 cmpltps xmm4, xmm0 pextrb ecx, xmm4, 0 test cl, 1 jne .LBB0_5 pextrb ecx, xmm4, 4 test cl, 1 jne .LBB0_7 .LBB0_8: pextrb ecx, xmm4, 8 test cl, 1 jne .LBB0_9 .LBB0_10: pextrb ecx, xmm4, 12 test cl, 1 je .LBB0_12 .LBB0_11: shufps xmm3, xmm3, 231 xorps xmm3, xmm1 movss dword ptr [rsi + 12], xmm3 .LBB0_12: movaps xmm3, xmm2 cmpltps xmm3, xmm0 pextrb ecx, xmm3, 0 test cl, 1 jne .LBB0_13 pextrb ecx, xmm3, 4 test cl, 1 jne .LBB0_15 .LBB0_16: pextrb ecx, xmm3, 8 test cl, 1 jne .LBB0_17 .LBB0_18: pextrb ecx, xmm3, 12 test cl, 1 jne .LBB0_19 .LBB0_20: add rsi, 32 add rdi, -8 jne .LBB0_4 jmp .LBB0_21 .p2align 4, 0x90 .LBB0_5: movaps xmm5, xmm3 xorps xmm5, xmm1 movss dword ptr [rsi], xmm5 pextrb ecx, xmm4, 4 test cl, 1 je .LBB0_8 .LBB0_7: movshdup xmm5, xmm3 xorps xmm5, xmm1 movss dword ptr [rsi + 4], xmm5 pextrb ecx, xmm4, 8 test cl, 1 je .LBB0_10 .LBB0_9: movaps xmm5, xmm3 movhlps xmm5, xmm5 xorps xmm5, xmm1 movss dword ptr [rsi + 8], xmm5 pextrb ecx, xmm4, 12 test cl, 1 jne .LBB0_11 jmp .LBB0_12 .p2align 4, 0x90 .LBB0_13: movaps xmm4, xmm2 xorps xmm4, xmm1 movss dword ptr [rsi + 16], xmm4 pextrb ecx, xmm3, 4 test cl, 1 je .LBB0_16 .LBB0_15: movshdup xmm4, xmm2 xorps xmm4, xmm1 movss dword ptr [rsi + 20], xmm4 pextrb ecx, xmm3, 8 test cl, 1 je .LBB0_18 .LBB0_17: movaps xmm4, xmm2 movhlps xmm4, xmm4 xorps xmm4, xmm1 movss dword ptr [rsi + 24], xmm4 pextrb ecx, xmm3, 12 test cl, 1 je .LBB0_20 .LBB0_19: shufps xmm2, xmm2, 231 xorps xmm2, xmm1 movss dword ptr [rsi + 28], xmm2 add rsi, 32 add rdi, -8 jne .LBB0_4 .LBB0_21: cmp rdx, rax je .LBB0_26 .LBB0_22: lea rcx, [r8 + 4*rdx] sub rax, rdx xorps xmm0, xmm0 movabs rdx, offset .LCPI0_0 movaps xmm1, xmmword ptr [rdx] .p2align 4, 0x90 .LBB0_25: movss xmm2, dword ptr [rcx] ucomiss xmm0, xmm2 jbe .LBB0_24 xorps xmm2, xmm1 movss dword ptr [rcx], xmm2 .LBB0_24: add rcx, 4 dec rax jne .LBB0_25 .LBB0_26: ret Ternary: xor eax, eax movabs rcx, offset .LCPI0_0 movaps xmm0, xmmword ptr [rcx] .p2align 4, 0x90 .LBB0_2: mov rcx, qword ptr [rdi] movss xmm1, dword ptr [rcx + 4*rax] andps xmm1, xmm0 movss dword ptr [rcx + 4*rax], xmm1 inc rax movsxd rcx, dword ptr [rdi + 8] cmp rax, rcx jl .LBB0_2 .LBB0_3: ret Just from a high level, we see that the if version is now substantially longer and more complex than the version for Math.Abs, Mathf.Abs, and math.abs. So it's not surprising that it's about half as fast. The custom version based on the ternary operator, on the other hand, is by far the smallest of the three outputs. It's a simple loop that performs the mask operation. So why is it so slow? Well, it happens to be the last job to run but moving it to earlier in the test has no impact on the results. Instead, the problem is that this version performs fewer absolute value operations per loop than the others. As a result, it spends more time on the loop and potentially branch mispredictions and less time taking absolute values. Conclusion When performing just one operation, Burst outputs identical code regardless of whether Math.Abs, Mathf.Abs, math.abs, or a ternary operator is used. When if is used, Burst outputs a branch instruction. So in this form, all approaches are equal except the if-based approach which should be avoided. When run in a loop, Burst produces identical machine code for the three fastest approaches: Math.Abs, Mathf.Abs, and math.abs. Using an if is about 2x slower and a ternary is about 10x slower. Both should be avoided. Put simply, Math.Abs, Mathf.Abs, and math.abs are all the best options. ```