I recently perused the AVM2 bytecode overview that Adobe provides and found something strange: they have special instructions for the first few local variables. Getting the first local variables is done by getlocal0, getlocal1, getlocal2, getlocal3, but then a generalized getlocal is used for local variables thereafter. This brought me to think about the performance implications of these specialized instructions. After all, why have them when you already have a generalized one? Read on to see the performance differences and if you can get any speed boost from rearranging your local variables.

Deriving a test to judge the difference between the specific getlocalN and the general getlocal (as well as setlocalN and setlocal) could be difficult, but thankfully (in this case) MXMLC in Flex 4 will order local variables exactly as you declare them. It’ll even keep (and initialize!) any local variables you don’t use, which seems to be a glaring optimization opportunity. So it’s easy to lay out local variables such that we can pick which will be in the first four corresponding to getlocal0, getlocal1, getlocal2, and getlocal3. After this is done, it’s a simple matter of using that local variable a lot to emphasize the local variable reads and writes. A very long loop is perfectly suited to this task. So, here is the test app featuring the two test functions:

package
{
	import flash.text.*;
	import flash.utils.*;
	import flash.display.*;
 
	public class VariableOrdering extends Sprite
	{
		private var __logger:TextField = new TextField();
 
		public function VariableOrdering()
		{
			__logger.autoSize = TextFieldAutoSize.LEFT;
			addChild(__logger);
 
			testIFirst();
			testILast();
		}
 
		private function log(msg:*): void
		{
			__logger.appendText(msg + "\n");
		}
 
		private function testIFirst(): void
		{
			const REPS:uint = 4000000000;
			var i:uint;
			var beforeTime:int;
			var afterTime:int;
			var val0:int;
			var val1:int;
			var val2:int;
			var val3:int;
			var val4:int;
			var val5:int;
			var val6:int;
			var val7:int;
			var val8:int;
			var val9:int;
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
			}
			afterTime = getTimer();
			log("i first: " + (afterTime-beforeTime));
		}
 
		private function testILast(): void
		{
			var val1:int;
			var val2:int;
			var val3:int;
			var val4:int;
			var val5:int;
			var val6:int;
			var val7:int;
			var val8:int;
			var val9:int;
			const REPS:uint = 4000000000;
			var i:uint;
			var beforeTime:int;
			var afterTime:int;
			var val0:int;
 
			beforeTime = getTimer();
			for (i = 0; i < REPS; ++i)
			{
			}
			afterTime = getTimer();
			log("i last: " + (afterTime-beforeTime));
		}
	}
}

As you can see, the loop variables i and REPS are placed at the beginning of the function in the case of testIFirst and at the end in the case of testILast. By moving both of them to the end we can double our local variable access per-loop. The rest of the variables&emdash; valN&emdash;are simply there to provide the padding necessary to ensure i and REPS are past the first four local variables and well into the generalized area covered by getlocal and setlocal. So how does these test functions perform? Let’s take a look.

Environment testIFirst testILast
3.0 Ghz Intel Core 2 Duo, Windows XP 6876 6939
2.0 Ghz Intel Core 2 Duo, Mac OS X 9433 9639

Putting i and REPS first produces a whopping 0.92% performance increase on Windows XP and 0.06% performance increase on Mac OS X. Both differences are very small, but if you run the test over and over (as I’ve done), you’ll see that they do occur at each test and with roughly the same performance advantage in favor of getlocalN/setlocalN. This is, still, a tiny number, but what do you have to give up in order to cash in on an extra ~1%? All you need to do is rearrange the order that your local variables are declared to get the extra speedup. If you have any areas of code that are iterating over a lot of elements, you may find this handy for eeking out just a little bit more performance. For most AS3 programmers though, this will simply come as a relief that you needn’t worry about the getlocalN/setlocalN opcodes providing a performance boost that you’re missing out on.