Today’s article is both a reminder to optimize your algorithms and data structures before your code and a demonstration of the payoff you’ll get by doing so. By choosing the most effective algorithm and data structure to solve your problem you’ll reap huge rewards in performance. A 10x, 100x, or even bigger boost is easily attainable.

To demonstrate the importance of algorithm choice, let’s consider a problem where you need to find a key-value pair. Having read many articles on this site that inevitably result in Vector beating out competitors like Array, you might opt to store your key-value pairs in a Vector.<Pair> where Pair looks like this:

class Pair
{
    public var key:*;
    public var value:*;
}

You can then use the fastest loopfor—to search the Vector from start to finish:

var len:uint = vec.length;
for (var i:uint = 0; i < len; ++i)
{
    var pair:Pair = vec[i];
    if (pair.key == targetKey)
    {
        // ... do something with the key-value pair
        break;
    }
}

And that’ll work just fine with a small number of key-value pairs. I tried this out in the following environment:

  • Release version of Flash Player 12.0.0.77
  • 2.3 Ghz Intel Core i7-3615QM
  • Mac OS X 10.9.1
  • Google Chrome 33.0.1750.152
  • ASC 2.0.0 build 354071 (-debug=false -verbose-stacktraces=false -inline -optimize=true)

And here are the results I got for finding every key-value pair for various numbers of key-value pairs:

10 100 1000 10000 100000 1000000 10000000
0 0 8 568 n/a n/a n/a

Up to 100 key-value pairs it looks like the algorithm and data structure choice is blazing fast: zero milliseconds. But then it takes 8 milliseconds to do one thousand, 568 to do 10,000, and over the 15 second Flash Player timeout to do 100,000.

If you look at the loop, you’ll see that it’s about as optimal as you can get. The length field is cached and a for loop is used. The only optimization choice you really have is to improve the algorithm and/or data structure. Let’s start with the algorithm part and try a binary search instead of a linear search. This assumes that that keys are in sorted order, but allows us to change from O(N) to a O(log2N).

Here’s what a binary search looks like in AS3 for a Vector.<Pair>:

low = 0;
high = size-1;
while (low <= high)
{
	mid = low + (high-low)/2;
	pair = arr[mid];
	if (pair.key > targetKey)
	{
		high = mid-1;
	}
	else if (pair.key < targetKey)
	{
		low = mid+1;
	}
	else
	{
		// ... do something with the key-value pair
		break;
	}
}

Here are the results side-by-side with the linear search results:

10 100 1000 10000 100000 1000000 10000000
Array (linear) 0 0 8 568 n/a n/a n/a
Array (binary) 0 0 0 3 30 353 4065

Where the linear search was taking 568 milliseconds at 10,000 elements, the binary search is only taking 3. It can even handle the larger sizes without timing out the Flash Player. While it’s hard to say exactly how much faster that is, it’s certainly a huge speedup.

But what if you wanted to go faster? Again the binary search code itself is about as fast as it can get. Redundant reads into the Vector have been eliminated by the pair local variable, a while loop is used, and so forth. It’s now time for a new algorithm and a new data structure.

Next we switch to using a Dictionary as a good, built-in implementation of a hash table. Finding a key is now O(1) instead of O(log2N). The algorithm for finding the key is now a single line of code:

dict[targetKey];

Or if you just want to check if the key is contained:

targetKey in dict;

Let’s see how this compares with the previous two approaches:

10 100 1000 10000 100000 1000000 10000000
Array (linear) 0 0 8 568 n/a n/a n/a
Array (binary) 0 0 0 3 30 353 4065
Dictionary 0 0 0 0 1 19 186

At small numbers of key-value pairs, all the algorithms are about the same. At the 1,000 level the binary search really starts to beat the linear search. At the 10,000 level the Dictionary really starts to beat the binary search. It’s 3x faster there and only widens its lead as more key-value pairs are added. It’s about 20x faster after that point, too.

This point of this article is not to tell you to use Dictionary for your key-value maps because you probably already knew that. The point is to remind you that algorithm and data structure choice is usually far more important than optimizations at the code level. You should generally start by thinking about your choice of algorithm and data structure and then, if needed, optimize your code to make the best use of them. The above shows at least an 80x boost in performance and that is probably far too conservative. In short, it’s well worth your time.

Here’s the full performance test:

package
{
	import flash.display.*;
	import flash.utils.*;
	import flash.text.*;
 
	public class AlgorithmChoice extends Sprite
	{
		private var logger:TextField = new TextField();
 
		public function AlgorithmChoice()
		{
			stage.align = StageAlign.TOP_LEFT;
			stage.scaleMode = StageScaleMode.NO_SCALE;
 
			logger.autoSize = TextFieldAutoSize.LEFT;
			addChild(logger);
 
			init();
		}
 
		private function init(): void
		{
			var i:int;
			var j:int;
			var beforeTime:int;
			var afterTime:int;
			var dict:Dictionary;
			var vec:Vector.<Pair>;
			var pair:Pair;
			var found:Boolean;
			var low:int;
			var high:int;
			var mid:int;
			var sizes:Array = [10,100,1000,10000,100000,1000000,10000000];
			var results:Object = {
				"Array (linear)": [],
				"Array (binary)": [],
				"Dictionary": []
			};
 
			for each (var size:uint in sizes)
			{
				dict = new Dictionary();
				vec = new Vector.<Pair>();
				for (i = 0; i < size; ++i)
				{
					pair = new Pair();
					pair.key = i;
					pair.value = i*2;
					vec.push(pair);
 
					dict[i] = pair.value;
				}
 
				if (i <= 10000)
				{
					beforeTime = getTimer();
					for (i = 0; i < size; ++i)
					{
						found = false;
						for (j = 0; j < size; ++j)
						{
							pair = vec[j];
							if (pair.key == i)
							{
								found = true;
								break;
							}
						}
					}
					afterTime = getTimer();
					results["Array (linear)"].push(afterTime-beforeTime);
				}
				else
				{
					results["Array (linear)"].push("n/a");
				}
 
				beforeTime = getTimer();
				for (i = 0; i < size; ++i)
				{
					found = false;
					low = 0;
					high = size-1;
					while (low <= high)
					{
						mid = low + (high-low)/2;
						pair = vec[mid];
						if (pair.key > i)
						{
							high = mid-1;
						}
						else if (pair.key < i)
						{
							low = mid+1;
						}
						else
						{
							found = true;
							break;
						}
					}
				}
				afterTime = getTimer();
				results["Array (binary)"].push(afterTime-beforeTime);
 
				beforeTime = getTimer();
				for (i = 0; i < size; ++i)
				{
					found = false;
					found = i in dict;
				}
				afterTime = getTimer();
				results["Dictionary"].push(afterTime-beforeTime);
			}
 
			logger.appendText("," + sizes.join(",") + "\n");
			for (var name:String in results)
			{
				logger.appendText(name + "," + results[name].join(",") + "\n");
			}
		}
	}
}
internal class Pair
{
	var key:*;
	var value:*;
}

Run the test app

Spot a bug? Have a question or suggestion? Post a comment!