JacksonDunstan.com

I wrote an article last November titled For Vs. While that needs a bit of updating an expanding today. While I updated the performance figures in my series on Flash 10.1 performance, I continue to get questions in the comments section of the original article that explore new areas. So today we’ll look at the ubiquitous for and while loops a little more.

Who doesn’t use the for and while loops? Further, who hasn’t heard some wild rumor or speculation that one is faster than the other? Ever see code that loops backward for apparently no reason other than performance? Firstly, here are the performance results from the Flash 10.1 performance series—which still apply today—that answer these questions:

Function	Time
for (forward)	2249
for (backward)	2018
while (forward)	2245
while (backward)	2020

The above clearly show two important facts:

Forward loops are slower than backward loops by about 10%
for loops exactly as fast as while loops

Now let’s look at some sample functions and their bytecode (produced by MXMLC 4.1) to get an idea as to why the performance is as it is. Firstly, the AS3 functions:

private function forForward(): void
{
	var i:int;
	for (i = 0; i < 100; ++i) {}
}
 
private function forBackward(): void
{
	var i:int;
	for (i = 100; i--; ) {}
}
 
private function whileForward(): void
{
	var i:int;
	while (i < 100) {++i;}
}
 
private function whileBackward(): void
{
	var i:int = 100;
	while (i--) {}
}

And the bytecode, annotated by me:

  function private::forForward():void    /* disp_id 0*/
  {
    // local_count=2 max_scope=1 max_stack=2 code_len=23
    0       getlocal0             // get "this"
    1       pushscope             // add this function's scope
    2       pushbyte          0   // push literal 0
    4       setlocal1             // set local "i" to 0 (default initialization)
    5       pushbyte          0   // push literal 0
    7       setlocal1             // set local "i" to 0 (loop initialization)
    8       jump              L1  // go to block L1 below
 
 
    L2: 
    12      label                 // no-op (loop body)
    13      inclocal_i        1   // i++
 
    L1: 
    15      getlocal1             // get local "i"
    16      pushbyte          100 // push literal 100
    18      iflt              L2  // if (i < 100) go to L2 block above
 
    22      returnvoid            // return;
  }
 
  function private::forBackward():void    /* disp_id 0*/
  {
    // local_count=2 max_scope=1 max_stack=2 code_len=22
    0       getlocal0             // get "this"
    1       pushscope             // add this function's scope
    2       pushbyte          0   // push literal 0
    4       setlocal1             // set local "i" to 0 (default initialization)
    5       pushbyte          100 // push literal 100
    7       setlocal1             // set local "i" to 100 (loop initialization)
    8       jump              L1  // go to L1 below
 
 
    L2: 
    12      label                 // no-op (loop body)
 
    L1: 
    13      getlocal1             // get local "i"
    14      dup                   // duplicate local "i"
    15      decrement_i           // i-1
    16      setlocal1             // i = i-1
    17      iftrue            L2  // if (i) go to L2 block above
 
    21      returnvoid            // return;
  }
 
  function private::whileForward():void    /* disp_id 0*/
  {
    // local_count=2 max_scope=1 max_stack=2 code_len=20
    0       getlocal0             // get "this"
    1       pushscope             // add this function's scope
    2       pushbyte          0   // push literal 0
    4       setlocal1             // get local "i"
    5       jump              L1  // go to L1 block below
 
 
    L2: 
    9       label                 // no-op (loop body)
    10      inclocal_i        1   // i++
 
    L1: 
    12      getlocal1             // get local "i"
    13      pushbyte          100 // push literal 100
    15      iflt              L2  // if (i < 100) go to L2 block above
 
    19      returnvoid            // return;
  }
 
function private::whileBackward():void    /* disp_id 0*/
  {
    // local_count=2 max_scope=1 max_stack=2 code_len=19
    0       getlocal0             // get "this"
    1       pushscope             // add this function's scope
    2       pushbyte          100 // push literal 100
    4       setlocal1             // set "i" to 100
    5       jump              L1  // go to L1 block below
 
 
    L2: 
    9       label                 // no-op (loop body)
 
    L1: 
    10      getlocal1             // get local "i"
    11      dup                   // duplicate local "i"
    12      decrement_i           // i-1
    13      setlocal1             // i = i-1
    14      iftrue            L2  // if (i) go to L2 block above
 
    18      returnvoid            // return
  }

If you were expecting the bytecode to be the same for for and while because for is supposed to be syntax sugar for while, the above bytecode should have come as a surprise. Instead of two pairs of identical bytecode, we see bytecode with 18, 19, 21, and 22 lines. The forward loops are also 1 line longer than the backward loops. This ~5% increase in lines correlates with a ~10% slowdown. Looking at the bytecode differences between all of these, it’s not obvious why there would be such a speed benefit to backward loops. Take, for example, the for loops: the forward loop looks simpler and more straightforward than the backward version. All of this points to the bytecode not being the reason for the slowdown. Remember that the next step after the bytecode is for it to be just-in-time compiled to native code. The JIT is, somewhere in this process, generating more efficient native code for the backward version than for the forward version.

While it’s good to know some of these details so we can debunk statements like “for is just syntax sugar for a while loop”, the end result is the same, as shown above. However, next time you see a backward loop you should keep in mind that not only is the bytecode substantially different from its forward counterpart, but the end result is actually a 10% performance increase.

#1 by skyboy on December 27th, 2010 · Reply

For is just syntax sugar for while loops, the difference between the sizes of each type of loop is because you’ve got more code in the for loops.
To make them the same, you would need to use one of these two:

private function forForward():void {
    var i:int;
    for (i = 0; i &lt; 100; ++i) {}
}
 
private function forBackward():void {
    var i:int;
    for (i = 100; i--; ) {}
}
 
private function whileForward():void {
    var i:int;
    i = 0;
    while (i &lt; 100) {++i;}
}
 
private function whileBackward():void {
    var i:int;
    i = 100;
    while (i--) {}
}

private function forForward():void {
    for (var i:int; i &lt; 100; ++i) {}
}
 
private function forBackward():void {
    for (var i:int = 100; i--; ) {}
}
 
private function whileForward():void {
    var i:int;
    while (i &lt; 100) {++i;}
}
 
private function whileBackward():void {
    var i:int = 100;
    while (i--) {}
}

The backwards loops are faster because you have one less comparison per loop:
The iflt bytecode is just so the interpreter runs slightly faster, and the output SWF is smaller; it’s basically syntax sugar for:

getlocal1
pushbyte 100
lessthan
iftrue #jump

Backwards loops also have the advantage of the iterator variable not needing to be modified when you iterate over an Array to splice out certain elements.

#2 by jackson on December 27th, 2010 · Reply

Good point about the extra initialization of the loop iterator by what I called the “default initialization” in my annotations above. If the iflt really does boil down to an iftrue, it would seem to be a more expensive branching operation. I sure hope the JIT can generate better native code than it would from your example above. Perhaps though, given the 10% performance difference, it is not.
- #3 by skyboy on December 27th, 2010 · Reply
  
  I don’t believe any hardware has native “iflt,” “ifgt” or other similar circuits; Only the if circuit.
  - #4 by jackson on December 27th, 2010 · Reply
    
    I can’t speak to specific chips’ hardware implementation, but x86 assembly has plenty of instructions to jump on conditionals like that. They’re even named similarly, just using “jump” instead of “if”. So iflt would be jl (jump less). Here’s some documentation on them.
    - #5 by skyboy on December 27th, 2010 · Reply
      
      That’d be why I didn’t see them when I glanced over a reference. The only way to see if nJIT is actually making use of these instructions is to dump its output to a file instead of running the instructions. A task for another time, far too much work to get that up and running; For me.

#6 by as3isolib on December 28th, 2010 · Reply

With the new player update (10.1) has the pre-increment/decrement and post-increment/decrement times changed? I noted in your previously mentioned article that Robert Penner mentioned there was no difference at the time of that posting.

I have tried testing this before as well as the for vs while on my MBP and didn’t seem to see any performance difference in using post, pre, for, while and the various combinations of all these parameters.

Also in my testing I had reset all tests to use the same syntax:

i = ?; //initial iterator value
m = ?; //conditional length value
for (i; i  m; i++--) //you get the idea

#7 by jackson on December 29th, 2010 · Reply

Check out my original article and the Flash Player 10.1 followup article. In short, there’s no difference between pre- and post-, increment and decrement, and just adding/subtracting one.
- #8 by as3isolib on December 29th, 2010 · Reply
  
  Ah sorry I musta missed or forgot those two posts. Good stuff. I think post increment/decrement is much easier to read so I am glad to be able to stick with it.

For Vs. While Revisited

Comments