Sorted Array
AS3 gives you arrays and a way to sort them, but no easy way to keep them sorted as you add elements to them. Today’s article discusses an easy way to do just that: build a sorted array and keep it sorted. As a bonus, the array provides very fast searching for the elements it contains.
When you have a sorted array you can use a binary search to quickly find elements in the array. A binary search will run in, at the worst case, log(N)
operations, where N
is the size of the array. With an unsorted array, the search will take N
operations. Here’s a comparison:
Size | Unsorted Search Operations | Sorted Search Operations |
---|---|---|
10 | 10 | 3 |
100 | 100 | 7 |
1000 | 1000 | 10 |
10000 | 10000 | 13 |
100000 | 100000 | 17 |
1000000 | 1000000 | 20 |
The difference between them is huge even at small sizes like 100 and only grows from there. With that in mind, you’ll understand the indexOf
function of the SortedArray
class here:
package { public class SortedArray { public var array:Array = []; public function add(element:*): int { var leftIndex:int; var rightIndex:int = array.length-1; var middleIndex:int; var middleElement:*; while (leftIndex <= rightIndex) { middleIndex = (rightIndex+leftIndex) / 2; middleElement = array[middleIndex]; if (middleElement < element) { leftIndex = middleIndex + 1; } else if (middleElement > element) { rightIndex = middleIndex - 1; } else { array.splice(middleIndex, 0, element); return middleIndex; } } array.splice(leftIndex, 0, element); return leftIndex; } public function indexOf(element:*): int { var leftIndex:int; var rightIndex:int = array.length-1; var middleIndex:int; var middleElement:*; while (leftIndex <= rightIndex) { middleIndex = (rightIndex+leftIndex) / 2; middleElement = array[middleIndex]; if (middleElement < element) { leftIndex = middleIndex + 1; } else if (middleElement > element) { rightIndex = middleIndex - 1; } else { return middleIndex; } } return -1; } } }
But what about the add
function? Well, you may have noticed that it looks awfully similar to the indexOf
function. That’s because it is the same exact binary search algorithm as in indexOf
, but tweaked to insert instead of getting an index. You see, one handy side-effect of the binary search algorithm is that it narrows in on the index of the element being searched for. When we find the element, it’s simple to add a duplicate element right where it is. When we don’t find it, we know where it should have been and simply add it there.
So, how do you use the SortedArray
class instead of a regular Array
? For the most part you simply use its array
field like you always have used an Array
: loop over it, remove elements from it, etc. Just make sure you don’t add elements to it directly via a function like push
or unshift
. Also, don’t change its elements via the index []
operator. If you do either of these, the array will no longer be sorted and both the indexOf
and add
functions will break.
I could have designed the class to keep the array private and be a lot safer to use, but I intentionally chose not to for the sake of performance. If the array was private, it’d be much more expensive to iterate over it or call any of its functions. It should be pretty easy to make the array private and wrap its API If you’d like a safer version of the class.
Now, let’s put theory to the test and see how SortedArray
stacks up against a regular Array
when it comes to the two features of the class: adding and searching for elements.
package { import flash.display.Sprite; import flash.display.StageAlign; import flash.display.StageScaleMode; import flash.text.TextField; import flash.text.TextFieldAutoSize; import flash.utils.getTimer; public class SortedArrayTest extends Sprite { private var logger:TextField = new TextField(); private function row(...cols): void { logger.appendText(cols.join(",")+"\n"); } public function SortedArrayTest() { stage.align = StageAlign.TOP_LEFT; stage.scaleMode = StageScaleMode.NO_SCALE; logger.autoSize = TextFieldAutoSize.LEFT; addChild(logger); var beforeTime:int; var afterTime:int; var unsortedAddTime:int; var sortedAddTime:int; var unsortedIndexOfTime:int; var sortedIndexOfTime:int; var i:int; var REPS:int = 1000; var SIZES:Array = [10,100,1000,10000]; row( "Size", "Unsorted Add Time", "Sorted Add Time", "Unsorted IndexOf Time", "Sorted IndexOf Time" ); for each (var size:int in SIZES) { var values:Array = []; for (i = 0; i < size; ++i) { values.push(i); } values.sort(function(a:int, b:int):int { return Math.random()<0.5?-1:1; }); var unsortedArray:Array = []; beforeTime = getTimer(); for (i = 0; i < size; ++i) { unsortedArray.push(values[i]); } afterTime = getTimer(); unsortedAddTime = (afterTime-beforeTime); var sortedArray:SortedArray = new SortedArray(); beforeTime = getTimer(); for (i = 0; i < size; ++i) { sortedArray.add(values[i]); } afterTime = getTimer(); sortedAddTime = (afterTime-beforeTime); beforeTime = getTimer(); for (i = 0; i < size; ++i) { unsortedArray.indexOf(values[i]); } afterTime = getTimer(); unsortedIndexOfTime = (afterTime-beforeTime); beforeTime = getTimer(); for (i = 0; i < size; ++i) { sortedArray.indexOf(values[i]); } afterTime = getTimer(); sortedIndexOfTime = (afterTime-beforeTime); row( size, unsortedAddTime, sortedAddTime, unsortedIndexOfTime, sortedIndexOfTime ); } } } }
I ran this performance test in the following environment:
- Flex SDK (MXMLC) 4.6.0.23201, compiling in release mode (no debugging or verbose stack traces)
- Release version of Flash Player 11.2.202.235
- 2.4 Ghz Intel Core i5
- Mac OS X 10.7.3
And here are the results I got:
Size | Unsorted Add Time | Sorted Add Time | Unsorted IndexOf Time | Sorted IndexOf Time |
---|---|---|---|---|
10 | 0 | 0 | 0 | 0 |
100 | 0 | 0 | 1 | 0 |
1000 | 0 | 3 | 4 | 1 |
10000 | 1 | 64 | 482 | 3 |
These results make it clear that the theory is working out in reality. You can see the unsorted search time grow linearly but the sorted search time grow much slower- logarithmically. Even at only 1000 elements in the array, the sorted version is four times faster than the unsorted version. At 10,000 elements, the gap widens to 160x.
As for the time to add elements to the arrays, the unsorted version is clearly quicker since it does not attempt to find the correct index to insert at in order to maintain a sorted array. So when choosing an approach with your own code, you’ll need to balance this extra time to insert against a purely unsorted approach or an approach where you add a lot of elements unsorted and then sort the array. Since the array is a public field of SortedArray
, it’s easy to do this:
var sa:SortedArray = new SortedArray(); for (var i:int; i < 10000; ++i) { sa.array.push(int(Math.random())); } sa.array.sort(Array.NUMERIC);
I hope you find the above class useful the next time your programming task calls for a sorted array. If you’ve spotted a bug or have a suggestion, please feel free to comment!
#1 by Cor on May 7th, 2012 ·
Notice the sort is not correct:
0,1,10,100,101,102,103,104,105,106,107,108,109,11,110,111,112,113,…
#2 by Trope on May 7th, 2012 ·
Default sort function for array is string comparison. Therefore:
or even better
which is even faster.
#3 by jackson on May 7th, 2012 ·
Thanks for pointing this out. I’ve updated the suggestion code in the article to use
Array.NUMERIC
.#4 by Henke37 on May 7th, 2012 ·
I can’t help but fell that the shuffling shouldn’t be done by abusing the comparator like that. Sorting algorithms tend to depend on it being deterministic.
As for maintaining a sorted list, using an array is a questionable solution. You should invest in a smarter data structure that makes keeping it sorted much cheaper than the splice call. Splice is moderately expensive.
#5 by jackson on May 7th, 2012 ·
There are probably better ways to do a random shuffle, but I just wanted something to quickly jumble the elements around so they weren’t in order so I’d get a decent average-case for adding elements rather than an extreme (best- or worst-case).
Splicing into an array can definitely be expensive: up to
N
operations if splicing into the beginning. However,Array
offers no other alternative. I could have used a linked list instead, but then I’d no longer have a sorted array and would lose some really important features like lightning fast random access via the[]
operator. I’d also have to write a lot more code to flesh out an API comparable to theArray
class and it would probably be slower most of the time (see my series on linked lists). Still, I’d love to hear any suggestions you have on how to improve on what’s in the article.#6 by Erik on May 7th, 2012 ·
Would you recommend an equivalent wrapperclass for Vectors?
#7 by jackson on May 7th, 2012 ·
Since
Vector
is typed and due to the limitation that you can’t create typed/generic/templated classes in AS3, you would need to create one for every type ofVector
you wanted a sorted version of in order to get any speed out of it. For example, you’d need aSortedVectorInt
forVector.<int>
andSortedVectorString
forVector.<String>
. This would be OK if you don’t have many types of vectors you’d like sorted versions of, but not as suitable for a general-purpose code library.#8 by Erik on May 8th, 2012 ·
Thanks for the reply. I have this post flagged in some old code of mine which is relevant:
http://stackoverflow.com/questions/3607593/is-it-faster-to-add-to-a-collection-then-sort-it-or-add-to-a-sorted-collection
#9 by skyboy on May 8th, 2012 ·
You can use an Object-typed variable and get nearly equal speeds to a Vector-typed variable. The constructor property can also be used to ensure it’s a Vector that isn’t a numeric-typed Vector (
constructor
is always Vector.<*> or one of .<int|uint|Number>).#10 by skyboy on May 8th, 2012 ·
Gah. One of int|uint|Number.
#11 by ben w on May 8th, 2012 ·
if (middleElement < element) …
if you know the elements are going to be numeric you might get even more gains by replacing the * with a Number, would also probably save some memory too.
Be interesting to see what effect that would have on performance, so if you knew you were working with numbers you could just modify the code to suit.
#12 by jackson on May 8th, 2012 ·
You can actually use
SortedArray
with anything that the<
and>
operators work on. Here’s an example withString
:Plus, if the list held
Number
and you wanted to use it withint
oruint
, you’d end up doing a lot of conversion between the types. It would also use twice as much memory as necessary sinceNumber
is 8 bytes andint
/uint
is 4.#13 by skyboy on May 8th, 2012 ·
Actually, the VM allocates Atoms (a word-aligned pointer with additional data stored in the low 3 bits), which are always 8 bytes on 64-bit machines (I had to do a bit of research to find out exactly how they were declaring it and what the end result was), to store ints/uints under 30 bits (even on 64 bit machines); once a value requires the 30th bit or higher, a Number is allocated to hold the value.
Also, due to the bug with */Number, you may be allocating up to 4x more memory for numeric types that won’t be collected immediately.
However, it may be possible to store the type as Object instead of * (but the value assigned to the Object-variable must be * if it is to be numeric) and avoid potential allocation killers while doing the comparisons. Doing so will lose the ability to distinguish between undefined and null while sorting, but you can increase performance by special-casing this (currently, undefined and null are converted to Strings before comparing).
It would be nice if the subtle performance killers didn’t exist and no one needed to know the exact operations of the VM to avoid the pitfalls and potential pitfalls.
#14 by Joshua on December 6th, 2012 ·
This seems pretty amazing. Would there be a means of extending this to multi-dimensional arrays?
Setting an array as primary, another as secondary and to have things sort?
Is it also possible to do an IndexOf a multi-dimensional array where you need to ensure a 2-3 entries match?
Thanks,
#15 by jackson on December 7th, 2012 ·
Yes, I think it could be adapted to that purpose. Perhaps a simpler adaptation would be to pass in a compare
Function
like you do withVector.sort
and then use that rather than the<
and>
operators to determine the sort order. This would allow you to make aSortedArray
of anything, just like withVector
.#16 by Joshua on December 7th, 2012 ·
I’ll give that a try. I have a few 2D and 3D arrays that are getting out of hand in size and this could help quite a bit.
Thanks
#17 by benjamin guihaire on August 13th, 2013 ·
It would be nice to redo all those tests for Vector. instead of Array.