Which Hash Set is Fastest?
Say you need to keep track of things you’ve already done, perhaps to avoid doing them again. What’s the fastest way to do that? HashSet<T>
seems like a natural fit, so you might choose that without a second thought. But is it faster than similar collections like Hashtable
and Dictionary<TKey, TValue>
? Today’s article puts all three to the test to see which one can insert elements, check for containment, and remove elements the quickest. Read on for the surprising results!
HashSet<T>
is, like other sets, a collection of unique values of type T
. Internally it uses a hash to quickly add values to the set, check if values are contained, and remove elements from the set. It should be the go-to data structure when you need to do containment checks only. That’s different than the next two classes.
Hashtable
is a one-to-many map of any type of key to any type of value. There’s no generic T
type here, just plain object
keys and values. Internally it also uses a hash for add, contains, and remove operations. If you have varying types of keys and values, it seems like the obvious choice for a key-value map.
Dictionary<TKey, TValue>
is a one-to-many map of TKey
keys to TValue
values. It’s the strongly-typed generic version of Hashtable
. Since all the keys and all the values in a map are almost always the same type as each other, this class is much more common than Hashtable
in C# code. You get strong typing and support for value types (e.g. int
) without the need for boxing.
Today’s test doesn’t need a map, just containment checks. Maps can do that too, so the test adds on a dummy value to the Hashtable
and Dictionary
. That means the three classes are used like this:
HashSet<T>
—T
is aMyObject
classHashtable
— map ofMyObject
keys to a single dummyMyObject
valueDictionary<TKey, TValue>
—TKey
is aMyObject
,TValue
is a dummybool
Now let’s add, check containment, and remove to each collection a million times:
using System; using System.Collections; using System.Collections.Generic; using UnityEngine; class TestScript : MonoBehaviour { class MyObject { } string report = ""; void Start() { // Setup System.Diagnostics.Stopwatch stopwatch = new System.Diagnostics.Stopwatch(); Hashtable hashtable = new Hashtable(); HashSet<MyObject> hashSet = new HashSet<MyObject>(); Dictionary<MyObject, bool> dictionary = new Dictionary<MyObject, bool>(); MyObject dummyValue = new MyObject(); const int size = 1000000; MyObject[] myObjects = new MyObject[size]; for (int i = 0; i < size; ++i) { myObjects[i] = new MyObject(); } // Add stopwatch.Reset(); stopwatch.Start(); for (int i = 0; i < size; ++i) { hashtable.Add(myObjects[i], dummyValue); } long hashtableAddTime = stopwatch.ElapsedMilliseconds; stopwatch.Reset(); stopwatch.Start(); for (int i = 0; i < size; ++i) { hashSet.Add(myObjects[i]); } long hashSetAddTime = stopwatch.ElapsedMilliseconds; stopwatch.Reset(); stopwatch.Start(); for (int i = 0; i < size; ++i) { dictionary.Add(myObjects[i], true); } long dictionaryAddTime = stopwatch.ElapsedMilliseconds; // Contains stopwatch.Reset(); stopwatch.Start(); for (int i = 0; i < size; ++i) { hashtable.Contains(myObjects[i]); } long hashtableContainsTime = stopwatch.ElapsedMilliseconds; stopwatch.Reset(); stopwatch.Start(); for (int i = 0; i < size; ++i) { hashSet.Contains(myObjects[i]); } long hashSetContainsTime = stopwatch.ElapsedMilliseconds; stopwatch.Reset(); stopwatch.Start(); for (int i = 0; i < size; ++i) { dictionary.ContainsKey(myObjects[i]); } long dictionaryContainsTime = stopwatch.ElapsedMilliseconds; // Remove stopwatch.Reset(); stopwatch.Start(); for (int i = 0; i < size; ++i) { hashtable.Remove(myObjects[i]); } long hashtableRemoveTime = stopwatch.ElapsedMilliseconds; stopwatch.Reset(); stopwatch.Start(); for (int i = 0; i < size; ++i) { hashSet.Remove(myObjects[i]); } long hashSetRemoveTime = stopwatch.ElapsedMilliseconds; stopwatch.Reset(); stopwatch.Start(); for (int i = 0; i < size; ++i) { dictionary.Remove(myObjects[i]); } long dictionaryRemoveTime = stopwatch.ElapsedMilliseconds; // Report report = string.Format( "Container,Add Time,Contains Time,Remove Time\n" + "Hashtable,{0},{1},{2}\n" + "HashSet,{3},{4},{5}\n" + "Dictionary,{6},{7},{8}", hashtableAddTime, hashtableContainsTime, hashtableRemoveTime, hashSetAddTime, hashSetContainsTime, hashSetRemoveTime, dictionaryAddTime, dictionaryContainsTime, dictionaryRemoveTime ); Debug.Log(report); } void OnGUI() { GUI.TextField(new Rect(0, 0, Screen.width, Screen.height), report); } }
If you want to try out the test yourself, simply paste the above code into a TestScript.cs
file in your Unity project’s Assets
directory and attach it to the main camera game object in a new, empty project. Then build in non-development mode, ideally with IL2CPP. I ran it that way on this machine:
- LG Nexus 5X
- Android 7.1.1
- Unity 5.5.1f1, IL2CPP
And here are the results I got:
Container | Add Time | Contains Time | Remove Time |
---|---|---|---|
Hashtable | 2184 | 714 | 794 |
HashSet | 2635 | 697 | 659 |
Dictionary | 2247 | 622 | 720 |
The surprising result here is that each of these container classes is best at one of the three operations! Hashtable
is best at adding, HashSet
is best at removing, and Dictionary
is best at containment checks. That means it’s important to know which type of operation your code is likely to do the most of. For example, if you usually clear instead of removing individual elements then you probably shouldn’t pick HashSet
. If you check a lot more than you add then you should probably choose Dictionary
.
Unfortunately this means that the container class that makes the most sense as a set is not always the best pick. You may need to go with a Dictionary
or Hashtable
to get the most performance in some situations, even though you never use their mapping capability.
#1 by MisaJ on March 27th, 2017 ·
It would be interesting to check how does List.Contains compares to these, at various collection sizes.
There is a strong penalty for iterating over hash sets/dictionaries, so often times it is worth to do
List.Contains() before adding, instead of storing objects intro HashSets.
For very small collections (N < 10), List.Contains should be even faster than HashSet.Contains
#2 by jackson on March 27th, 2017 ·
Obviously
List
will be much slower at large collection sizes, but it may be able to compete with very small collections like the <10 you mention. Let me know what results you get if you try this out.