Loop Speed Redux
AS3 has three kinds of loops—for, for-in, and for-each—but which is fastest? I attempted to answer that question about three years ago, but the article is in dire need of a followup as many version of Flash Player have been released since then and the question is core to our everyday lives as AS3 programmers. So which type of loop is fastest in 2012?
I’ve updated the original test to output better results, but the core of it (even the number of iterations per loop type) hasn’t changed at all.
package { import flash.display.*; import flash.utils.*; import flash.text.*; public class LoopSpeed extends Sprite { private var __logger:TextField = new TextField(); private function row(cols:Array): void { __logger.appendText(cols.join(",")+"\n"); } public function LoopSpeed() { stage.align = StageAlign.TOP_LEFT; stage.scaleMode = StageScaleMode.NO_SCALE; __logger.autoSize = TextFieldAutoSize.LEFT; addChild(__logger); const SIZE:uint = 6000; const REPS:uint = 2000; var i:int; var s:String; var a:int; var rep:int; var before:uint; var after:uint; var cols:Array; // Containers to loop over var arr:Array = new Array(SIZE); var vecFixed:Vector.<int> = new Vector.<int>(SIZE, true); var vecVariable:Vector.<int> = new Vector.<int>(SIZE, false); var bmdAlpha:BitmapData = new BitmapData(SIZE, 1, true); var bmdNoAlpha:BitmapData = new BitmapData(SIZE, 1, false); var obj:Object = {}; var dictStrong:Dictionary = new Dictionary(false); var dictWeak:Dictionary = new Dictionary(true); var ba:ByteArray = new ByteArray(); // Initialize all containers for (i = 0; i < SIZE; ++i) { arr[i] = int(Math.random()*1000); } for (i = 0; i < SIZE; ++i) { vecFixed[i] = arr[i]; } for (i = 0; i < SIZE; ++i) { vecVariable[i] = arr[i]; } for (i = 0; i < SIZE; ++i) { bmdAlpha.setPixel32(i, 0, arr[i]); } for (i = 0; i < SIZE; ++i) { bmdNoAlpha.setPixel32(i, 0, arr[i]); } for (i = 0; i < SIZE; ++i) { obj[i] = i; } for (i = 0; i < SIZE; ++i) { dictStrong[i] = i; } for (i = 0; i < SIZE; ++i) { dictWeak[i] = i; } ba.length = SIZE; for (i = 0; i < SIZE; ++i) { ba[i] = i; } row([ "Loop Type", "Array", "Fixed Vector", "Variable Vector", "BMD w/ alpha getPixel32", "BMD w/o alpha getPixel32", "BMD w/ alpha getPixel", "BMD w/o alpha getPixel", "Object", "Dictionary (strong keys)", "Dictionary (weak keys)", "ByteArray" ]); cols = ["For-Each"]; before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for each (i in arr) { a = i; } } after = getTimer(); cols.push(after-before); before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for each (i in vecFixed) { a = i; } } after = getTimer(); cols.push(after-before); before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for each (i in vecVariable) { a = i; } } after = getTimer(); cols.push(after-before); cols.push("n/a", "n/a", "n/a", "n/a"); // no BitmapData tests possible before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for each (i in obj) { a = i; } } after = getTimer(); cols.push(after-before); before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for each (i in dictStrong) { a = i; } } after = getTimer(); cols.push(after-before); before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for each (i in dictWeak) { a = i; } } after = getTimer(); cols.push(after-before); cols.push("n/a"); // no ByteArray test possible row(cols); cols = ["For-In"]; before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for (s in arr) { a = arr[s]; } } after = getTimer(); cols.push(after-before); before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for (s in vecFixed) { a = vecFixed[s]; } } after = getTimer(); cols.push(after-before); before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for (s in vecVariable) { a = vecVariable[s]; } } after = getTimer(); cols.push(after-before); cols.push("n/a", "n/a", "n/a", "n/a"); // no BitmapData tests possible before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for (s in obj) { a = obj[s]; } } after = getTimer(); cols.push(after-before); before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for (s in dictStrong) { a = dictStrong[s]; } } after = getTimer(); cols.push(after-before); before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for (s in dictWeak) { a = dictWeak[s]; } } after = getTimer(); cols.push(after-before); cols.push("n/a"); // no ByteArray test possible row(cols); cols = ["For"]; before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for (i = 0; i < SIZE; ++i) { a = arr[i]; } } after = getTimer(); cols.push(after-before); before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for (i = 0; i < SIZE; ++i) { a = vecFixed[i]; } } after = getTimer(); cols.push(after-before); before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for (i = 0; i < SIZE; ++i) { a = vecVariable[i]; } } after = getTimer(); cols.push(after-before); before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for (i = 0; i < SIZE; ++i) { a = bmdAlpha.getPixel32(i, 0); } } after = getTimer(); cols.push(after-before); before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for (i = 0; i < SIZE; ++i) { a = bmdNoAlpha.getPixel32(i, 0); } } after = getTimer(); cols.push(after-before); before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for (i = 0; i < SIZE; ++i) { a = bmdAlpha.getPixel(i, 0); } } after = getTimer(); cols.push(after-before); before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for (i = 0; i < SIZE; ++i) { a = bmdNoAlpha.getPixel(i, 0); } } after = getTimer(); cols.push(after-before); before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for (i = 0; i < SIZE; ++i) { a = obj[i]; } } after = getTimer(); cols.push(after-before); before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for (i = 0; i < SIZE; ++i) { a = dictStrong[i]; } } after = getTimer(); cols.push(after-before); before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for (i = 0; i < SIZE; ++i) { a = dictWeak[i]; } } after = getTimer(); cols.push(after-before); before = getTimer(); for (rep = 0; rep < REPS; ++rep) { for (i = 0; i < SIZE; ++i) { a = ba[i]; } } after = getTimer(); cols.push(after-before); row(cols); } } }
I ran this test app in the following environment:
- Flex SDK (MXMLC) 4.6.0.23201, compiling in release mode (no debugging or verbose stack traces)
- Release version of Flash Player 11.4.402.265
- 2.3 Ghz Intel Core i7
- Mac OS X 10.8.2
And here are the results I got:
| Loop Type | For-Each | For-In | For | 
|---|---|---|---|
| Array | 244 | 4552 | 77 | 
| Fixed Vector | 287 | 4629 | 31 | 
| Variable Vector | 288 | 4616 | 31 | 
| BMD w/ alpha getPixel32 | n/a | n/a | 145 | 
| BMD w/o alpha getPixel32 | n/a | n/a | 122 | 
| BMD w/ alpha getPixel | n/a | n/a | 164 | 
| BMD w/o alpha getPixel | n/a | n/a | 125 | 
| Object | 688 | 4912 | 366 | 
| Dictionary (strong keys) | 733 | 5124 | 367 | 
| Dictionary (weak keys) | 816 | 5197 | 375 | 
| ByteArray | n/a | n/a | 137 | 

The formatting has certainly gotten better since 2009. We can now see clearly that there is a defined order of speed. The for-in loop is the slowest, for-each faster than that, and in all cases for is the fastest of all. It’s also the only one to provide access to BitmapData and ByteArray collections, which makes it the most flexible. It’s some extra typing and that may interfere with readability, but when speed counts you should opt for good old for.
Spot a bug? Have a suggestion? Post a comment!
#1 by Maxime on September 24th, 2012 ·
Hi,
Why are you doing an assignation (“a = i”) in your for each loops ? We usually use a for each to avoid the explicit assignation in the loop, i think that in order to be accurate, it should no have any code inside your for each block, no ?
#2 by jackson on September 24th, 2012 ·
Good point, as that does unfairly burden the
for-eachloop. The code was there in an attempt to ensure that the loop was not simply optimized out since it does nothing. However, I’ve removed thea = ifrom all of thefor-eachloops and come up with the following results (same test environment):Each loop is a little bit faster, but by no means does it alter the overall ranking of loop types. The
for-eachloop is still slower than theforloop in all cases.Thanks again for pointing this out. It does help paint a more accurate picture.
#3 by Maxime on September 24th, 2012 ·
Thank you for the answer, but i try to implement my own test and my result are very differents from yours :
I’ll totally understand if you don’t have time for that, but if you do, could you look at my code and help me figure out what is the cause of thoses differences ?
http://www.dranem.net/upload/TestPerformanceLite.as
#4 by jackson on September 24th, 2012 ·
I ran your code on Windows 7 running a quad core 2.8 GHz Xeon W3530 and got these results:
This seems pretty much in line with the results from the article regarding “Variable Vector”. In the article I showed a 9.29x slowdown for using
for-eachcompared tofor. These results show a 15.62x slowdown, but it’s a different environment (CPU, OS, etc.). The numbers may not match up perfectly, but they shouldn’t be expected to. The principle holds though:for-eachis much slower thanfor.#5 by Tyler on September 24th, 2012 ·
I’m seeing results close to jackson’s. If I switch over to running in debug mode, I see results close to Maxime’s.
#6 by jackson on September 24th, 2012 ·
That could definitely be it. Debug players have some very strange performance characteristics. Since end users generally don’t use them, I never test on debug players. Always make sure to test on release players.
#7 by Maxime on September 25th, 2012 ·
Thank you, that was it indeed !
#8 by Tyler on September 24th, 2012 ·
I had the same thought about the for each loop. It would be nice to see a separate test case for the for each that doesn’t do the additional assignment, since that initial assignment is why we typically like to use it.
#9 by pixelbender on September 24th, 2012 ·
for each (i in some) {
a = i; // this step is unnecessary, we can use “i” instead
}
#10 by ben w on September 25th, 2012 ·
changed it to 1000 REPS so as not to time-out on my machine:
old compiler (4.6):
[first run]
Loop Type,Array,Fixed Vector,Variable Vector,BMD w/ alpha getPixel32,BMD w/o alpha getPixel32,BMD w/ alpha getPixel,BMD w/o alpha getPixel,Object,Dictionary (strong keys),Dictionary (weak keys),ByteArray
For-Each,173,194,206,n/a,n/a,n/a,n/a,409,416,512,n/a
For-In,3555,3539,3525,n/a,n/a,n/a,n/a,3675,3795,3936,n/a
For,52,33,31,131,130,133,134,235,244,244,137
[second run]
Loop Type,Array,Fixed Vector,Variable Vector,BMD w/ alpha getPixel32,BMD w/o alpha getPixel32,BMD w/ alpha getPixel,BMD w/o alpha getPixel,Object,Dictionary (strong keys),Dictionary (weak keys),ByteArray
For-Each,167,196,199,n/a,n/a,n/a,n/a,402,420,496,n/a
For-In,3566,3527,3541,n/a,n/a,n/a,n/a,3680,3826,3913,n/a
For,52,33,32,132,131,130,123,236,246,252,139
asc2.0:
[first run]
Loop Type,Array,Fixed Vector,Variable Vector,BMD w/ alpha getPixel32,BMD w/o alpha getPixel32,BMD w/ alpha getPixel,BMD w/o alpha getPixel,Object,Dictionary (strong keys),Dictionary (weak keys),ByteArray
For-Each,195,217,227,n/a,n/a,n/a,n/a,453,447,515,n/a
For-In,3639,3632,3661,n/a,n/a,n/a,n/a,3790,3981,4106,n/a
For,40,21,21,115,105,114,115,228,224,234,120
[second run]
Loop Type,Array,Fixed Vector,Variable Vector,BMD w/ alpha getPixel32,BMD w/o alpha getPixel32,BMD w/ alpha getPixel,BMD w/o alpha getPixel,Object,Dictionary (strong keys),Dictionary (weak keys),ByteArray
For-Each,200,225,219,n/a,n/a,n/a,n/a,405,447,521,n/a
For-In,3580,3570,3542,n/a,n/a,n/a,n/a,3664,3859,3953,n/a
For,40,21,21,123,101,124,104,215,225,228,120
Both run in flash player 11.4.402.265
From a quick glance it would appear that for loops are indeed faster (by a fairly decent margin too for Array and Vector access) but for each has slowed down?!?!
Interesting, hopefully one day we will get an update that is ONLY improvements.
so others can check and post results back here are the two swfs to test:
http://bwhiting.co.uk/flash/loopSpeed/asc1/
http://bwhiting.co.uk/flash/loopSpeed/asc2/
#11 by Tyler on September 26th, 2012 ·
I see the same trends in my results.
Even though the new compiler is still in development, I think it’s good to run these comparisons. The Adobe team has in my experience been receptive to receiving feedback about performance differences between development builds (and why not – if they’re working on those things, now is the best time for the issues to come up). Also, Jackson’s performance tests often get into less commonly used aspects of the language, meaning they can help turn up bugs.
#12 by jackson on September 26th, 2012 ·
I definitely agree that many of my tests would be useful for finding bugs and comparing performance in ASC 2.0. It may be a good idea for the ASC 2.0 team to go through this site running many of the tests since all of the code is available in each article. However, I’m not sure how interesting it would be for readers of this site. That said, I will definitely be doing a big re-test when ASC 2.0 comes out of beta and AS3 developers start using it for real projects.
#13 by Kyle Murray / Krilnon on September 28th, 2012 ·
I heard people working on performance talking about this site while I was an intern there, so I wouldn’t be too surprised to learn that they were making sure that their tests cover what you cover here. (If not just taking and running your exact code.)
#14 by Edmundo on September 26th, 2012 ·
What affects the efficiency of the loops compared to your results in 2009 the most? the fact that the game is now compiled with sdk 4.6 or that it runs in flash player 11? Would a game built with sdk 4.0 perform equally well under flash player 11?
#15 by jackson on September 26th, 2012 ·
The results are roughly the same as in the old article, but the numbers are much lower mainly due to the much faster computer used for testing. Flash Player 11.4 and Flex SDK 4.6 don’t seem to have had much effect, which is the main conclusion of the article.