Swapping values is probably one of the simplest algorithms which can be imagined - we learn about it when starting our programming story. There are two popular ways to accomplish this: using a temporary variable and XORing (with some restrictions). In the newest C# versions there is also a third way, about which you can read in this article.
Theory
A few weeks ago, one of my friends sent me this tweet from which we will start:
What do we see here is the usage of the value tuples syntax (introduced in the C# 7) to swap two values. As the description says, first we create a tuple with our two variables (on the right side of the assignment) and then it’s instantly deconstructed into the second tuple (on the left side of the assignment) but in reversed order. It was quite surprising as I never thought about using tuples this way, but at the same time, I was curious if this solution has some performance overhead.
On the one side, we use here ValueTuple struct, so it’s not an overhead for garbage collector - values are stored on the stack and will be just wiped after exiting from the method. On the other hand, we don’t know if the compiler will generate some extra IL instructions compared to the traditional swap using a temporary variable. Let’s check it on the simple example.
Benchmark
As in the previous articles, I used BenchmarkDotNet library which is extremally handful when measuring performance and checking output generated by the JIT. Our benchmark is very simple and swaps two variables utilizing three different methods: through temporary variable, through XOR operations, and tuples (like on the tweet above). Additional assignments on the beginning and the end of each test are made to ensure, that compiler/JIT will make these operations using registers to maximize performance.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
| using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;
namespace SwapBenchmark
{
[DisassemblyDiagnoser]
[SimpleJob(RuntimeMoniker.NetCoreApp31)]
public class SwapTest
{
private static volatile int A = 10;
private static volatile int B = 20;
[Benchmark]
public void SwapTraditional()
{
var a = A;
var b = B;
var temp = a;
a = b;
b = temp;
A = a;
B = b;
}
[Benchmark]
public void SwapXor()
{
var a = A;
var b = B;
a ^= b;
b ^= a;
a ^= b;
A = a;
B = b;
}
[Benchmark]
public void SwapTuples()
{
var a = A;
var b = B;
(a, b) = (b, a);
A = a;
B = b;
}
}
class Program
{
public static void Main(string[] args)
{
BenchmarkRunner.Run<SwapTest>();
}
}
}
|
Let’s check the IL output first. Our first method SwapTraditional
is straightforward and in fact uses only four instructions:
- ldsfld - loads static field onto the stack.
- stsfld - pops variable from the stack and stores it into the static field.
- ldloc.x - loads local variable and stores it onto the stack at the specified index (x).
- stloc.x - pops variable from the stack and stores it into the local variables list at the specified index (x).
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
| .method public hidebysig instance void
SwapTraditional() cil managed
{
.custom instance void [BenchmarkDotNet.Annotations]BenchmarkDotNet.Attributes.BenchmarkAttribute::.ctor()
= (01 00 00 00 )
.maxstack 2
.locals init (
[0] int32 a,
[1] int32 temp
)
// [17 13 - 17 23]
IL_0000: volatile.
IL_0002: ldsfld int32 modreq ([System.Runtime]System.Runtime.CompilerServices.IsVolatile) SwapBenchmark.SwapTest::A
IL_0007: stloc.0 // a
// [18 13 - 18 23]
IL_0008: volatile.
IL_000a: ldsfld int32 modreq ([System.Runtime]System.Runtime.CompilerServices.IsVolatile) SwapBenchmark.SwapTest::B
// [20 13 - 20 26]
IL_000f: ldloc.0 // a
IL_0010: stloc.1 // temp
// [21 13 - 21 19]
IL_0011: stloc.0 // a
// [22 13 - 22 22]
IL_0012: ldloc.1 // temp
// [24 13 - 24 19]
IL_0013: ldloc.0 // a
IL_0014: volatile.
IL_0016: stsfld int32 modreq ([System.Runtime]System.Runtime.CompilerServices.IsVolatile) SwapBenchmark.SwapTest::A
// [25 13 - 25 19]
IL_001b: volatile.
IL_001d: stsfld int32 modreq ([System.Runtime]System.Runtime.CompilerServices.IsVolatile) SwapBenchmark.SwapTest::B
// [26 9 - 26 10]
IL_0022: ret
} // end of method SwapTest::SwapTraditional
|
We can see the sequence of the following operations: first, the compiler loads static field A
onto the stack and then pops it into the local variable a
. Static field B
is also loaded in a similar way, but it stays in the stack. Then, the local variable a
is loaded onto the stack and immediately popped into the temp
. Just after it, the value of the static field B
(which has been loaded earlier) is assigned to the a
. At this point, the local variable a
contains value of the static field B
, and the local variable temp
contains the value of the a
. The last operations are loading both of them onto the stack and popping into A
and B
.
Now time for the second test, which uses XOR operations. The compiler uses here the new instruction xor
which does exactly what the name indicates - pops two variables from the stack, XORs them and stores the result again on the stack.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
| .method public hidebysig instance void
SwapXor() cil managed
{
.custom instance void [BenchmarkDotNet.Annotations]BenchmarkDotNet.Attributes.BenchmarkAttribute::.ctor()
= (01 00 00 00 )
.maxstack 2
.locals init (
[0] int32 a,
[1] int32 b
)
// [31 13 - 31 23]
IL_0000: volatile.
IL_0002: ldsfld int32 modreq ([System.Runtime]System.Runtime.CompilerServices.IsVolatile) SwapBenchmark.SwapTest::A
IL_0007: stloc.0 // a
// [32 13 - 32 23]
IL_0008: volatile.
IL_000a: ldsfld int32 modreq ([System.Runtime]System.Runtime.CompilerServices.IsVolatile) SwapBenchmark.SwapTest::B
IL_000f: stloc.1 // b
// [34 13 - 34 20]
IL_0010: ldloc.0 // a
IL_0011: ldloc.1 // b
IL_0012: xor
IL_0013: stloc.0 // a
// [35 13 - 35 20]
IL_0014: ldloc.1 // b
IL_0015: ldloc.0 // a
IL_0016: xor
IL_0017: stloc.1 // b
// [36 13 - 36 20]
IL_0018: ldloc.0 // a
IL_0019: ldloc.1 // b
IL_001a: xor
IL_001b: stloc.0 // a
// [38 13 - 38 19]
IL_001c: ldloc.0 // a
IL_001d: volatile.
IL_001f: stsfld int32 modreq ([System.Runtime]System.Runtime.CompilerServices.IsVolatile) SwapBenchmark.SwapTest::A
// [39 13 - 39 19]
IL_0024: ldloc.1 // b
IL_0025: volatile.
IL_0027: stsfld int32 modreq ([System.Runtime]System.Runtime.CompilerServices.IsVolatile) SwapBenchmark.SwapTest::B
// [40 9 - 40 10]
IL_002c: ret
} // end of method SwapTest::SwapXor
|
The thrid test is finally our tuple-based swap. Surprisingly, it’s quite similar to the first one (where the temporary variable was used). First, both static fields A
and B
are loaded into the local variables a
and b
, then we have a series of ldloc
and stloc
instructions which also use a temporary variable called V_2
to swap values.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
| .method public hidebysig instance void
SwapTuples() cil managed
{
.custom instance void [BenchmarkDotNet.Annotations]BenchmarkDotNet.Attributes.BenchmarkAttribute::.ctor()
= (01 00 00 00 )
.maxstack 2
.locals init (
[0] int32 a,
[1] int32 b,
[2] int32 V_2
)
// [45 13 - 45 23]
IL_0000: volatile.
IL_0002: ldsfld int32 modreq ([System.Runtime]System.Runtime.CompilerServices.IsVolatile) SwapBenchmark.SwapTest::A
IL_0007: stloc.0 // a
// [46 13 - 46 23]
IL_0008: volatile.
IL_000a: ldsfld int32 modreq ([System.Runtime]System.Runtime.CompilerServices.IsVolatile) SwapBenchmark.SwapTest::B
IL_000f: stloc.1 // b
// [48 13 - 48 29]
IL_0010: ldloc.1 // b
IL_0011: ldloc.0 // a
IL_0012: stloc.2 // V_2
IL_0013: stloc.0 // a
IL_0014: ldloc.2 // V_2
IL_0015: stloc.1 // b
// [50 13 - 50 19]
IL_0016: ldloc.0 // a
IL_0017: volatile.
IL_0019: stsfld int32 modreq ([System.Runtime]System.Runtime.CompilerServices.IsVolatile) SwapBenchmark.SwapTest::A
// [51 13 - 51 19]
IL_001e: ldloc.1 // b
IL_001f: volatile.
IL_0021: stsfld int32 modreq ([System.Runtime]System.Runtime.CompilerServices.IsVolatile) SwapBenchmark.SwapTest::B
// [52 9 - 52 10]
IL_0026: ret
} // end of method SwapTest::SwapTuples
|
In theory, this method is a bit slower because utilizes more IL instructions than in the first example (5 ldloc
/5 stloc
vs 3 ldloc
/3 stloc
). Will this affect the final output generated by the JIT? The answer is: no!
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
| ; SwapBenchmark.SwapTest.SwapTraditional()
mov eax,[7FFC4CC5B7AC]
mov edx,[7FFC4CC5B7B0]
mov [7FFC4CC5B7AC],edx
mov [7FFC4CC5B7B0],eax
ret
; SwapBenchmark.SwapTest.SwapXor()
mov eax,[7FFC4CC2B7AC]
mov edx,[7FFC4CC2B7B0]
xor eax,edx
xor edx,eax
xor eax,edx
mov [7FFC4CC2B7AC],eax
mov [7FFC4CC2B7B0],edx
ret
; SwapBenchmark.SwapTest.SwapTuples()
mov eax,[7FFC4CC4B7AC]
mov edx,[7FFC4CC4B7B0]
mov [7FFC4CC4B7AC],edx
mov [7FFC4CC4B7B0],eax
ret
|
Both first and third tests have exactly the same outputs - we can see that they have been optimized to use fast registers instead of using a slower stack. So the most important conclusion is: using tuples to swap values is exactly as fast as using a temporary variable.
The method which uses XOR operations is a bit more complex - three additional xor
instructions won’t speed up the whole process for sure. In the next chapter, we will do a more life example with a real algorithm to check the times.
Bubble sort
One of the best algorithms to test the execution time of swaps is a bubble sort, so we will use it to check the performance of our three methods. In this benchmark, there are three tests where each performs a sort for the array with 10000 elements (initialized with values from 10000 to 1 to get the most pessimistic case).
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
| using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;
namespace BubbleSortSwapBenchmark
{
[DisassemblyDiagnoser]
[SimpleJob(RuntimeMoniker.NetCoreApp31)]
public class BubbleSortSwapTest
{
private const int N = 10000;
[Benchmark]
public void BubbleSortSwapTraditional()
{
var array = Enumerable.Range(0, N).Reverse().ToArray();
var sorted = false;
while (!sorted)
{
sorted = true;
for (var i = 0; i < array.Length - 1; i++)
{
if (array[i] > array[i + 1])
{
var temp = array[i];
array[i] = array[i + 1];
array[i + 1] = temp;
sorted = false;
}
}
}
}
[Benchmark]
public void BubbleSortSwapXor()
{
var array = Enumerable.Range(0, N).Reverse().ToArray();
var sorted = false;
while (!sorted)
{
sorted = true;
for (var i = 0; i < array.Length - 1; i++)
{
if (array[i] > array[i + 1])
{
array[i] ^= array[i + 1];
array[i + 1] ^= array[i];
array[i] ^= array[i + 1];
sorted = false;
}
}
}
}
[Benchmark]
public void BubbleSortSwapTuples()
{
var array = Enumerable.Range(0, N).Reverse().ToArray();
var sorted = false;
while (!sorted)
{
sorted = true;
for (var i = 0; i < array.Length - 1; i++)
{
if (array[i] > array[i + 1])
{
(array[i], array[i + 1]) = (array[i + 1], array[i]);
sorted = false;
}
}
}
}
}
class Program
{
public static void Main(string[] args)
{
BenchmarkRunner.Run<BubbleSortSwapTest>();
}
}
}
|
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.959 (1909/November2018Update/19H2)
Intel Core i5-8300H CPU 2.30GHz (Coffee Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.100-preview.5.20279.10
[Host] : .NET Core 3.1.6 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.31603), X64 RyuJIT
.NET Core 3.1 : .NET Core 3.1.6 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.31603), X64 RyuJIT
Job=.NET Core 3.1 Runtime=.NET Core 3.1
Method | Mean | Error | StdDev | Code Size |
---|
BubbleSortSwapTraditional | 91.88 ms | 0.355 ms | 0.315 ms | 522 B |
BubbleSortSwapXor | 133.82 ms | 0.668 ms | 0.624 ms | 540 B |
BubbleSortSwapTuples | 91.33 ms | 0.359 ms | 0.336 ms | 528 B |
The results are clear and valid with our earlier observations: both traditional and tuple-based swaps have exactly the same performance (differences in the mean time are within the error margin).
Summary
Sometimes, it’s not easy to tell if some solution has performance overhead, especially when we think about JIT with its complex optimization processes. As we saw in this article, a tuple-based algorithm is an excellent way to swap two elements, providing better readability and the same performance as the traditional algorithm using a temporary variable. Definitely worth to try and use in our projects.