For Vs. While Revisited
I wrote an article last November titled For Vs. While that needs a bit of updating an expanding today. While I updated the performance figures in my series on Flash 10.1 performance, I continue to get questions in the comments section of the original article that explore new areas. So today we’ll look at the ubiquitous for
and while
loops a little more.
Who doesn’t use the for
and while
loops? Further, who hasn’t heard some wild rumor or speculation that one is faster than the other? Ever see code that loops backward for apparently no reason other than performance? Firstly, here are the performance results from the Flash 10.1 performance series—which still apply today—that answer these questions:
Function | Time |
---|---|
for (forward) | 2249 |
for (backward) | 2018 |
while (forward) | 2245 |
while (backward) | 2020 |
The above clearly show two important facts:
- Forward loops are slower than backward loops by about 10%
for
loops exactly as fast aswhile
loops
Now let’s look at some sample functions and their bytecode (produced by MXMLC 4.1) to get an idea as to why the performance is as it is. Firstly, the AS3 functions:
private function forForward(): void { var i:int; for (i = 0; i < 100; ++i) {} } private function forBackward(): void { var i:int; for (i = 100; i--; ) {} } private function whileForward(): void { var i:int; while (i < 100) {++i;} } private function whileBackward(): void { var i:int = 100; while (i--) {} }
And the bytecode, annotated by me:
function private::forForward():void /* disp_id 0*/ { // local_count=2 max_scope=1 max_stack=2 code_len=23 0 getlocal0 // get "this" 1 pushscope // add this function's scope 2 pushbyte 0 // push literal 0 4 setlocal1 // set local "i" to 0 (default initialization) 5 pushbyte 0 // push literal 0 7 setlocal1 // set local "i" to 0 (loop initialization) 8 jump L1 // go to block L1 below L2: 12 label // no-op (loop body) 13 inclocal_i 1 // i++ L1: 15 getlocal1 // get local "i" 16 pushbyte 100 // push literal 100 18 iflt L2 // if (i < 100) go to L2 block above 22 returnvoid // return; } function private::forBackward():void /* disp_id 0*/ { // local_count=2 max_scope=1 max_stack=2 code_len=22 0 getlocal0 // get "this" 1 pushscope // add this function's scope 2 pushbyte 0 // push literal 0 4 setlocal1 // set local "i" to 0 (default initialization) 5 pushbyte 100 // push literal 100 7 setlocal1 // set local "i" to 100 (loop initialization) 8 jump L1 // go to L1 below L2: 12 label // no-op (loop body) L1: 13 getlocal1 // get local "i" 14 dup // duplicate local "i" 15 decrement_i // i-1 16 setlocal1 // i = i-1 17 iftrue L2 // if (i) go to L2 block above 21 returnvoid // return; } function private::whileForward():void /* disp_id 0*/ { // local_count=2 max_scope=1 max_stack=2 code_len=20 0 getlocal0 // get "this" 1 pushscope // add this function's scope 2 pushbyte 0 // push literal 0 4 setlocal1 // get local "i" 5 jump L1 // go to L1 block below L2: 9 label // no-op (loop body) 10 inclocal_i 1 // i++ L1: 12 getlocal1 // get local "i" 13 pushbyte 100 // push literal 100 15 iflt L2 // if (i < 100) go to L2 block above 19 returnvoid // return; } function private::whileBackward():void /* disp_id 0*/ { // local_count=2 max_scope=1 max_stack=2 code_len=19 0 getlocal0 // get "this" 1 pushscope // add this function's scope 2 pushbyte 100 // push literal 100 4 setlocal1 // set "i" to 100 5 jump L1 // go to L1 block below L2: 9 label // no-op (loop body) L1: 10 getlocal1 // get local "i" 11 dup // duplicate local "i" 12 decrement_i // i-1 13 setlocal1 // i = i-1 14 iftrue L2 // if (i) go to L2 block above 18 returnvoid // return }
If you were expecting the bytecode to be the same for for
and while
because for
is supposed to be syntax sugar for while
, the above bytecode should have come as a surprise. Instead of two pairs of identical bytecode, we see bytecode with 18, 19, 21, and 22 lines. The forward loops are also 1 line longer than the backward loops. This ~5% increase in lines correlates with a ~10% slowdown. Looking at the bytecode differences between all of these, it’s not obvious why there would be such a speed benefit to backward loops. Take, for example, the for
loops: the forward loop looks simpler and more straightforward than the backward version. All of this points to the bytecode not being the reason for the slowdown. Remember that the next step after the bytecode is for it to be just-in-time compiled to native code. The JIT is, somewhere in this process, generating more efficient native code for the backward version than for the forward version.
While it’s good to know some of these details so we can debunk statements like “for
is just syntax sugar for a while
loop”, the end result is the same, as shown above. However, next time you see a backward loop you should keep in mind that not only is the bytecode substantially different from its forward counterpart, but the end result is actually a 10% performance increase.
#1 by skyboy on December 27th, 2010 ·
For is just syntax sugar for while loops, the difference between the sizes of each type of loop is because you’ve got more code in the for loops.
To make them the same, you would need to use one of these two:
The backwards loops are faster because you have one less comparison per loop:
The
iflt
bytecode is just so the interpreter runs slightly faster, and the output SWF is smaller; it’s basically syntax sugar for:Backwards loops also have the advantage of the iterator variable not needing to be modified when you iterate over an Array to splice out certain elements.
#2 by jackson on December 27th, 2010 ·
Good point about the extra initialization of the loop iterator by what I called the “default initialization” in my annotations above. If the
iflt
really does boil down to aniftrue
, it would seem to be a more expensive branching operation. I sure hope the JIT can generate better native code than it would from your example above. Perhaps though, given the 10% performance difference, it is not.#3 by skyboy on December 27th, 2010 ·
I don’t believe any hardware has native “iflt,” “ifgt” or other similar circuits; Only the if circuit.
#4 by jackson on December 27th, 2010 ·
I can’t speak to specific chips’ hardware implementation, but x86 assembly has plenty of instructions to jump on conditionals like that. They’re even named similarly, just using “jump” instead of “if”. So
iflt
would bejl
(jump less). Here’s some documentation on them.#5 by skyboy on December 27th, 2010 ·
That’d be why I didn’t see them when I glanced over a reference. The only way to see if nJIT is actually making use of these instructions is to dump its output to a file instead of running the instructions. A task for another time, far too much work to get that up and running; For me.
#6 by as3isolib on December 28th, 2010 ·
With the new player update (10.1) has the pre-increment/decrement and post-increment/decrement times changed? I noted in your previously mentioned article that Robert Penner mentioned there was no difference at the time of that posting.
I have tried testing this before as well as the for vs while on my MBP and didn’t seem to see any performance difference in using post, pre, for, while and the various combinations of all these parameters.
Also in my testing I had reset all tests to use the same syntax:
#7 by jackson on December 29th, 2010 ·
Check out my original article and the Flash Player 10.1 followup article. In short, there’s no difference between pre- and post-, increment and decrement, and just adding/subtracting one.
#8 by as3isolib on December 29th, 2010 ·
Ah sorry I musta missed or forgot those two posts. Good stuff. I think post increment/decrement is much easier to read so I am glad to be able to stick with it.