In January, Adobe dropped the “premium features” requirement for Flash apps that use the “domain memory opcodes” (a.k.a. “Alchemy opcodes”). These are low-level, performance-oriented operations that let you work more or less directly with a block of memory. Then in February we got Flash Player 11.6 along with built-in ASC 2.0 support for them. Today’s article shows you how to use these opcodes and takes a first stab at improving performance with them. Are they really all they’re cracked up to be?

To use the “domain memory opcodes”, you first need to make sure your build environment is configured properly. If it’s not, none of the code below will work. First, you need to use ASC 2.0, since ASC 1.0 does not directly support these opcodes. You can get ASC 2.0 as part of the AIR SDK. I’ll leave configuring your IDE (e.g. Flash Builder) to you, since there are far too many IDEs to cover here. Second, you need to target Flash Player 11.6 by adding these command-line parameters (or the equivalent in your IDE):

--target-player=11.6.0 -swf-version=19

Now you’re ready to start using the “domain memory opcodes”. They exist as package-level functions in avm2.intrinsics.memory, which is automatically available to you without linking against any additional SWCs. ASC 2.0 replaces each call to one of these functions with the equivalent domain memory opcode, so there is no function call overhead, yet you still get to write ordinary AS3 function calls rather than hand-typing bytecode assembly. This is all very similar to how Apparat and other tools originally supported the “Alchemy opcodes”.

Here’s what the avm2.intrinsics.memory package looks like:

package avm2.intrinsics.memory
{
	public function li8(addr:int): int;                 // Load Int 8-bit
	public function li16(addr:int): int;                // Load Int 16-bit
	public function li32(addr:int): int;                // Load Int 32-bit
	public function lf32(addr:int): Number;             // Load Float 32-bit (a.k.a. "float")
	public function lf64(addr:int): Number;             // Load Float 64-bit (a.k.a. "double")
	public function si8(value:int, addr:int): void;     // Store Int 8-bit
	public function si16(value:int, addr:int): void;    // Store Int 16-bit
	public function si32(value:int, addr:int): void;    // Store Int 32-bit
	public function sf32(value:Number, addr:int): void; // Store Float 32-bit (a.k.a. "float")
	public function sf64(value:Number, addr:int): void; // Store Float 64-bit (a.k.a. "double")
	public function sxi1(value:int): int;               // Sign eXtend 1-bit integer to 32 bits
	public function sxi8(value:int): int;               // Sign eXtend 8-bit integer to 32 bits
	public function sxi16(value:int): int;              // Sign eXtend 16-bit integer to 32 bits
}
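To make the semantics of this listing concrete, here is a hypothetical TypeScript model of the same functions, built on a DataView over an ArrayBuffer. Like domain memory, that is a flat block of bytes addressed by integer offsets. DomainMemoryModel is an illustrative name, not a real API. The model assumes that li8 and li16 zero-extend their result (which is why the separate sxi8 and sxi16 helpers exist) and that every multi-byte access is little-endian.

```typescript
// Hypothetical TypeScript model of avm2.intrinsics.memory, backed by a DataView
// instead of ApplicationDomain.domainMemory.
// Assumption: li8/li16 zero-extend, so signed loads need sxi8/sxi16 afterwards.
class DomainMemoryModel {
  private readonly view: DataView;

  constructor(byteLength: number) {
    this.view = new DataView(new ArrayBuffer(byteLength));
  }

  // Loads: every multi-byte access is little-endian (the `true` argument)
  li8(addr: number): number { return this.view.getUint8(addr); }
  li16(addr: number): number { return this.view.getUint16(addr, true); }
  li32(addr: number): number { return this.view.getInt32(addr, true); }
  lf32(addr: number): number { return this.view.getFloat32(addr, true); }
  lf64(addr: number): number { return this.view.getFloat64(addr, true); }

  // Stores: value first, address second, matching the AS3 signatures.
  // Integer stores keep only the low 8/16/32 bits of the value.
  si8(value: number, addr: number): void { this.view.setUint8(addr, value); }
  si16(value: number, addr: number): void { this.view.setUint16(addr, value, true); }
  si32(value: number, addr: number): void { this.view.setInt32(addr, value, true); }
  sf32(value: number, addr: number): void { this.view.setFloat32(addr, value, true); }
  sf64(value: number, addr: number): void { this.view.setFloat64(addr, value, true); }
}

// Sign extension: move the low N bits to the top of a 32-bit int, then
// arithmetic-shift back down so the sign bit is replicated.
function sxi1(value: number): number { return (value << 31) >> 31; }
function sxi8(value: number): number { return (value << 24) >> 24; }
function sxi16(value: number): number { return (value << 16) >> 16; }

// Usage: store -1 as a byte, then read it back unsigned and signed
const mem = new DomainMemoryModel(1024);
mem.si8(-1, 0);
console.log(mem.li8(0));       // 255 (zero-extended)
console.log(sxi8(mem.li8(0))); // -1
```

The sign-extension helpers use the usual shift-left-then-arithmetic-shift-right trick on 32-bit integers, which is what a signed byte or short load looks like when it is composed from an unsigned load plus sxi8 or sxi16.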

You can use these like any other package-level function (e.g. flash.utils.getTimer):

import avm2.intrinsics.memory.li32; // import the function itself
class MyClass
{
	function foo(myAddr:int): void
	{
		// Get the 32-bit integer value at myAddr
		var val:int = li32(myAddr);
	}
}

So what memory are these functions dealing with? It’s the so-called “domain memory” attached to the current ApplicationDomain. Essentially, each SWF gets its own ApplicationDomain that defines its environment, such as which classes are available. When you load another SWF (e.g. an animation to show), it will get its own ApplicationDomain (depending on the applicationDomain you set in the LoaderContext passed to Loader.load). For most purposes, though, you only need to deal with the current domain and can ignore all others. Here’s how you set up the “domain memory”:

import flash.system.ApplicationDomain;
import flash.utils.ByteArray;
import flash.utils.Endian;
var myDomainMemory:ByteArray = new ByteArray();
myDomainMemory.length = 4*1024; // must be at least ApplicationDomain.MIN_DOMAIN_MEMORY_LENGTH bytes
myDomainMemory.endian = Endian.LITTLE_ENDIAN; // the opcodes use little-endian byte order, so match it
ApplicationDomain.currentDomain.domainMemory = myDomainMemory;
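Why the endian setting matters can be sketched with a TypeScript stand-in, assuming a DataView over a 4 KB ArrayBuffer plays the role of the domain memory ByteArray. A 32-bit store in little-endian order puts the least-significant byte first; reading it back little-endian (like a ByteArray set to Endian.LITTLE_ENDIAN) recovers the value, while reading it big-endian (the ByteArray default) sees the bytes reversed.

```typescript
// Hypothetical sketch: a 4 KB buffer standing in for the AS3 domain memory
// ByteArray. In AS3 the real minimum size is
// ApplicationDomain.MIN_DOMAIN_MEMORY_LENGTH; 4 KB mirrors the snippet above.
const memory = new ArrayBuffer(4 * 1024);
const view = new DataView(memory);
const bytes = new Uint8Array(memory);

// Equivalent of si32(0x11223344, 0): least-significant byte stored first
view.setInt32(0, 0x11223344, true);

// Dump the first four bytes in address order
const dump: string[] = [];
for (let i = 0; i < 4; ++i) {
  dump.push(bytes[i].toString(16));
}
console.log(dump.join(" ")); // 44 33 22 11

// Reading little-endian (Endian.LITTLE_ENDIAN) recovers the stored int...
const littleRead = view.getInt32(0, true);
// ...reading big-endian (the ByteArray default) sees the bytes reversed
const bigRead = view.getInt32(0, false);
console.log(littleRead.toString(16)); // 11223344
console.log(bigRead.toString(16));    // 44332211
```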

Now let’s see if we can use them to make some code run faster. Here I’ve tested the first idea that came to mind: storing 32-bit floats in domain memory for faster uploading to Stage3D resources like VertexBuffer3D. Since AS3 has no 32-bit floating-point type (only the 64-bit Number), the values in a Vector.<Number> need to be converted from 64-bit to 32-bit at upload time. Flash Player does this for us in native code, but the conversion still isn’t free. If we store the floats in a ByteArray as 32-bit values, we gain more control over when that conversion happens. For example, if we keep the ByteArray after uploading it (with VertexBuffer3D.uploadFromByteArray rather than uploadFromVector), we can avoid converting from 64-bit to 32-bit again when we handle context loss.
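The "convert once, upload many times" idea can be sketched in TypeScript, assuming a DataView writing little-endian float32 values into an ArrayBuffer stands in for a ByteArray filled with sf32. Both packFloat32LE and uploadBytes are hypothetical names; uploadBytes is only a placeholder for an upload such as VertexBuffer3D.uploadFromByteArray and does no GPU work.

```typescript
// Hypothetical sketch: convert 64-bit numbers to little-endian float32 bytes
// once, then hand the same bytes to every upload. `uploadBytes` is a made-up
// stand-in for an upload such as VertexBuffer3D.uploadFromByteArray.

function packFloat32LE(values: readonly number[]): ArrayBuffer {
  const buffer = new ArrayBuffer(values.length * 4);
  const view = new DataView(buffer);
  for (let i = 0; i < values.length; ++i) {
    // The 64-to-32-bit rounding happens here, once per value
    view.setFloat32(i * 4, values[i], true);
  }
  return buffer;
}

let uploadCount = 0;
function uploadBytes(bytes: ArrayBuffer): number {
  // Pretend upload: counts calls and reports how many bytes were sent
  uploadCount++;
  return bytes.byteLength;
}

const positions = [0.1, 0.2, 0.3, 1.5];
const packed = packFloat32LE(positions);
uploadBytes(packed); // initial upload
uploadBytes(packed); // after a lost context: same bytes, no reconversion

// 0.1 has no exact float32 form, so the stored value is the rounded one
const check = new DataView(packed);
console.log(check.getFloat32(0, true) === Math.fround(0.1)); // true
console.log(check.getFloat32(0, true) === 0.1);              // false
```

The design choice is simply to keep the packed buffer alive: the rounding cost is paid in packFloat32LE, and every later upload reuses its output.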

Here’s a little app that times storing ten million floating-point values in a Vector.<Number> and in “domain memory” as contiguous blocks of both 32-bit and 64-bit floating-point values. The domain memory opcodes to look for are sf32 and sf64.

package
{
	import flash.utils.Endian;
	import avm2.intrinsics.memory.sf32;
	import avm2.intrinsics.memory.sf64;
	import flash.system.ApplicationDomain;
	import flash.utils.getTimer;
	import flash.utils.ByteArray;
	import flash.text.TextFieldAutoSize;
	import flash.display.StageScaleMode;
	import flash.display.StageAlign;
	import flash.text.TextField;
	import flash.display.Sprite;

	public class FillFloats extends Sprite
	{
		private var __logger:TextField = new TextField();

		// Append one row of comma-separated columns to the on-screen log
		private function row(...cols): void
		{
			__logger.appendText(cols.join(",") + "\n");
		}

		public function FillFloats()
		{
			stage.align = StageAlign.TOP_LEFT;
			stage.scaleMode = StageScaleMode.NO_SCALE;

			__logger.autoSize = TextFieldAutoSize.LEFT;
			addChild(__logger);

			init();
		}

		private function init(): void
		{
			const SIZE:uint = 10000000;
			var i:uint;
			var vec:Vector.<Number> = new Vector.<Number>(SIZE);

			// Floats occupy bytes [0, SIZE*4), doubles occupy [SIZE*4, SIZE*12)
			var domainMemory:ByteArray = new ByteArray();
			domainMemory.length = SIZE*4 + SIZE*8;
			domainMemory.endian = Endian.LITTLE_ENDIAN;
			var floats:uint = 0;
			var doubles:uint = SIZE*4;
			var curAddr:uint;
			var beforeTime:int;
			var afterTime:int;
			ApplicationDomain.currentDomain.domainMemory = domainMemory;

			row("Storage", "Time");

			// Baseline: 64-bit stores into a Vector.<Number>
			beforeTime = getTimer();
			for (i = 0; i < SIZE; ++i)
			{
				vec[i] = i;
			}
			afterTime = getTimer();
			row("Vector", (afterTime-beforeTime));

			// sf32: store each value as a 4-byte float
			beforeTime = getTimer();
			curAddr = floats;
			for (i = 0; i < SIZE; ++i)
			{
				sf32(i, curAddr);
				curAddr += 4;
			}
			afterTime = getTimer();
			row("Domain Memory (floats)", (afterTime-beforeTime));

			// sf64: store each value as an 8-byte double
			beforeTime = getTimer();
			curAddr = doubles;
			for (i = 0; i < SIZE; ++i)
			{
				sf64(i, curAddr);
				curAddr += 8;
			}
			afterTime = getTimer();
			row("Domain Memory (doubles)", (afterTime-beforeTime));
		}
	}
}
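The storage layout this app uses can be sketched in TypeScript, assuming a DataView over one ArrayBuffer stands in for domain memory and a plain number array stands in for Vector.<Number>. Floats occupy the first size*4 bytes, doubles the next size*8, and each loop steps its address by the element size, just as the sf32/sf64 loops do. fillFloats and FillResult are hypothetical names, and timings from a JavaScript engine say nothing about Flash Player; the point of the sketch is the addressing, not the numbers.

```typescript
// Hypothetical TypeScript analog of the FillFloats layout (not a Flash benchmark).
interface FillResult {
  view: DataView;
  vec: number[];
  doubles: number;
  rows: Array<[string, number]>;
}

function fillFloats(size: number): FillResult {
  const floats = 0;         // float32 region: bytes [0, size*4)
  const doubles = size * 4; // float64 region: bytes [size*4, size*12)
  const view = new DataView(new ArrayBuffer(size * 4 + size * 8));
  const vec: number[] = new Array<number>(size); // stands in for Vector.<Number>
  const rows: Array<[string, number]> = [];

  let before = Date.now();
  for (let i = 0; i < size; ++i) {
    vec[i] = i;
  }
  rows.push(["Vector", Date.now() - before]);

  before = Date.now();
  let curAddr = floats;
  for (let i = 0; i < size; ++i) {
    view.setFloat32(curAddr, i, true); // like sf32(i, curAddr)
    curAddr += 4;
  }
  rows.push(["Domain Memory (floats)", Date.now() - before]);

  before = Date.now();
  curAddr = doubles;
  for (let i = 0; i < size; ++i) {
    view.setFloat64(curAddr, i, true); // like sf64(i, curAddr)
    curAddr += 8;
  }
  rows.push(["Domain Memory (doubles)", Date.now() - before]);

  return { view, vec, doubles, rows };
}

const result = fillFloats(1000);
for (const [storage, ms] of result.rows) {
  console.log(storage + "\t" + ms + " ms");
}
```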

I ran this test in the following environment:

  • Release version of Flash Player 11.8.800.97
  • 2.3 GHz Intel Core i7
  • Mac OS X 10.8.4
  • ASC 2.0 build 352231 (-debug=false -verbose-stacktraces=false -inline)

And here are the results I got:

Storage                    Time (ms)
Vector                     24
Domain Memory (floats)     27
Domain Memory (doubles)    26
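Relative to the Vector baseline, this single run puts the extra cost of the domain memory stores at roughly:

$$\frac{27\ \text{ms}}{24\ \text{ms}} - 1 = 0.125\ (12.5\%\ \text{slower, floats}), \qquad \frac{26\ \text{ms}}{24\ \text{ms}} - 1 \approx 0.083\ (8.3\%\ \text{slower, doubles})$$

$$\frac{3\ \text{ms}}{10^{7}\ \text{stores}} = 0.3\ \text{ns per store}, \qquad \frac{2\ \text{ms}}{10^{7}\ \text{stores}} = 0.2\ \text{ns per store}$$

Since getTimer() reports whole milliseconds, a gap of 2 to 3 ms in one run is only a few ticks of the timer.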

[Figure: Performance Graph]

Performance of the domain memory opcodes is consistently close to the Vector performance, but never quite matches it. I’ve tried lots of variations (not shown) including assignment to random memory locations, pulling Number values out of a Vector instead of converting from an int (the iterator), using a Number-typed iterator, copying the floating-point values to store from another Vector or domain memory, doing more copies per loop iteration, using different browsers, and so forth. Regardless, Vector seems to beat out the domain memory opcodes in this case.

So while domain memory doesn’t shine here, that doesn’t mean it has no use anywhere. There are several reports of it providing massive speedups in other cases, and I’ll certainly keep looking for speedups using the domain memory opcodes. In the meantime, results may also vary considerably between different types of hardware. Care to try it out and post your results in the comments?