At long last, Flash Player 11 has been released, and it carries with it a raft of exciting new features. Perhaps most exciting is the new Stage3D class (and related libraries), which enables GPU-accelerated graphics rendering. Today’s article is the first to cover this new API and discusses one of its features: reading the rendered scene back into a BitmapData that you can put on the regular Stage. Surely this will be a popular operation for merging 3D and 2D, so let’s see how fast it is!

If hardware acceleration is being used, the pixels need to be sent back from video card memory (VRAM) into main system memory (RAM), which can be a very expensive operation. If the software renderer is being used instead, the pixels are already in RAM, so the transfer should, in theory, be a much quicker memory copy.

To test this theory, I wrote a little performance app. It draws absolutely nothing with the Stage3D API and only displays a little UI for controlling the app. This way, we can isolate the performance of Context3D.drawToBitmapData, which is responsible for reading the Stage3D’s pixels into a BitmapData.
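
Before the full test app, here is a minimal sketch of the readback call on its own, showing how the BitmapData ends up on the regular display list. The class name `ReadbackSketch` and the `snapshot` field are hypothetical names chosen for illustration, and the sketch assumes that drawToBitmapData has to be called after clear() and before present(), while the back buffer still holds the frame.

```actionscript
package
{
	import flash.display.Bitmap;
	import flash.display.BitmapData;
	import flash.display.Sprite;
	import flash.display.Stage3D;
	import flash.display3D.Context3D;
	import flash.events.Event;

	// Hypothetical class name, for illustration only
	public class ReadbackSketch extends Sprite
	{
		private var context:Context3D;
		private var snapshot:BitmapData;

		public function ReadbackSketch()
		{
			// Context creation is asynchronous: request it and wait for the event
			var stage3D:Stage3D = stage.stage3Ds[0];
			stage3D.addEventListener(Event.CONTEXT3D_CREATE, onContextCreated);
			stage3D.requestContext3D();
		}

		private function onContextCreated(ev:Event):void
		{
			context = Stage3D(ev.target).context3D;
			context.configureBackBuffer(stage.stageWidth, stage.stageHeight, 0, true);

			// The event can fire again if the context is recreated,
			// so the 2D side is only set up the first time
			if (!snapshot)
			{
				snapshot = new BitmapData(stage.stageWidth, stage.stageHeight, false);
				addChild(new Bitmap(snapshot));
				addEventListener(Event.ENTER_FRAME, onEnterFrame);
			}
		}

		private function onEnterFrame(ev:Event):void
		{
			context.clear(0, 0, 0, 1);
			// ... 3D draw calls would go here ...

			// Assumption: read back before present(), while the frame is still in the back buffer
			context.drawToBitmapData(snapshot);
			context.present();
		}
	}
}
```

Because the Bitmap wraps the same BitmapData that is overwritten each frame, the 2D copy updates without any extra work, and it can be filtered, masked, or layered like any other display object.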

package
{
	import flash.display3D.*;
	import flash.external.*;
	import flash.display.*;
	import flash.events.*;
	import flash.sampler.*;
	import flash.system.*;
	import flash.utils.*;
	import flash.text.*;
	import flash.geom.*;

	/**
	* Measures the cost of reading the Stage3D back buffer into a BitmapData
	*/
	public class Stage3DReadback extends Sprite
	{
		private static const PAD:Number = 3;
		private static const TEXT_FORMAT:TextFormat = new TextFormat("_sans", 11);

		private var __stage3D:Stage3D;
		private var __tf:TextField = new TextField();
		private var __context:Context3D;
		private var __bmdAlpha:BitmapData;
		private var __bmdNoAlpha:BitmapData;
		private var __mode:String;
		private var __enterFrameHandler:Function;
		private var __driverInfo:String;

		public function Stage3DReadback()
		{
			stage.align = StageAlign.TOP_LEFT;
			stage.scaleMode = StageScaleMode.NO_SCALE;

			__stage3D = stage.stage3Ds[0];

			// Setup UI
			makeButton("Toggle Hardware", onToggleHardware);
			makeButton("No Readback", onNoReadback);
			makeButton("Readback (no alpha)", onReadbackNoAlpha);
			makeButton("Readback (alpha)", onReadbackAlpha);

			var about:TextField = new TextField();
			about.autoSize = TextFieldAutoSize.LEFT;
			about.defaultTextFormat = TEXT_FORMAT;
			about.htmlText = '<font color="#0071BB">'
				+ '<a href="">'
				+ ''
				+ '</a></font>\n'
				+ 'October 2011';
			about.x = stage.stageWidth - PAD - about.width;
			about.y = PAD;
			addChild(about);

			// The status line sits below the buttons
			var logger:TextField = __tf;
			logger.autoSize = TextFieldAutoSize.LEFT;
			logger.y = this.height;
			addChild(logger);

			// Start in hardware mode (if available) with no readback
			__mode = "No Readback";
			__enterFrameHandler = onEnterFrameNoReadback;
			setupContext(Context3DRenderMode.AUTO);
		}

		private function setupContext(renderMode:String): void
		{
			__tf.text = "Setting up context with render mode: " + renderMode;

			// Stop per-frame work until the new context arrives
			removeAllEnterFrameHandlers();

			__stage3D.addEventListener(Event.CONTEXT3D_CREATE, onContextCreated);
			__stage3D.requestContext3D(renderMode);
		}

		private function onContextCreated(ev:Event): void
		{
			__stage3D.removeEventListener(Event.CONTEXT3D_CREATE, onContextCreated);

			const width:int = stage.stageWidth;
			const height:int = stage.stageHeight;

			__context = __stage3D.context3D;
			__context.configureBackBuffer(width, height, 0, true);
			__driverInfo = __context.driverInfo;

			// First time only
			if (!__bmdNoAlpha)
			{
				__bmdNoAlpha = new BitmapData(width, height, false);
				__bmdAlpha = new BitmapData(width, height, true);
			}

			// Resume whichever test was selected
			setMode(__mode, __enterFrameHandler);
		}

		private function removeAllEnterFrameHandlers(): void
		{
			removeEventListener(Event.ENTER_FRAME, onEnterFrameNoReadback);
			removeEventListener(Event.ENTER_FRAME, onEnterFrameReadbackNoAlpha);
			removeEventListener(Event.ENTER_FRAME, onEnterFrameReadbackAlpha);
		}

		private function setMode(name:String, enterFrameHandler:Function): void
		{
			// Only one test runs at a time
			removeAllEnterFrameHandlers();
			__mode = name;
			__enterFrameHandler = enterFrameHandler;
			addEventListener(Event.ENTER_FRAME, enterFrameHandler);
		}

		private function onToggleHardware(ev:MouseEvent): void
		{
			__tf.text = "Toggling hardware...";

			// Switch to whichever renderer is not currently in use
			setupContext(
				__driverInfo.toLowerCase().indexOf("software") >= 0
					? Context3DRenderMode.AUTO
					: Context3DRenderMode.SOFTWARE
			);
		}

		private function onNoReadback(ev:MouseEvent): void
		{
			setMode("No Readback", onEnterFrameNoReadback);
		}

		private function onReadbackNoAlpha(ev:MouseEvent): void
		{
			setMode("Readback (no alpha)", onEnterFrameReadbackNoAlpha);
		}

		private function onReadbackAlpha(ev:MouseEvent): void
		{
			setMode("Readback (alpha)", onEnterFrameReadbackAlpha);
		}

		private function reportTime(name:String, time:int): void
		{
			__tf.text = __driverInfo + " - " + name + ": " + time + " ms";
		}

		private function onEnterFrameNoReadback(ev:Event): void
		{
			// Baseline: clear and present an empty scene
			var beginTime:int = getTimer();
			__context.clear(0xEE/255, 0xEA/255, 0xD9/255, 1.0);
			__context.present();
			var endTime:int = getTimer();
			var drawTime:int = endTime - beginTime;
			reportTime("No readback", drawTime);
		}

		private function onEnterFrameReadbackNoAlpha(ev:Event): void
		{
			// Same as the baseline plus a readback into an opaque BitmapData
			var beginTime:int = getTimer();
			__context.clear(0xEE/255, 0xEA/255, 0xD9/255, 1.0);
			__context.drawToBitmapData(__bmdNoAlpha);
			__context.present();
			var endTime:int = getTimer();
			var drawTime:int = endTime - beginTime;
			reportTime("Readback (no alpha)", drawTime);
		}

		private function onEnterFrameReadbackAlpha(ev:Event): void
		{
			// Same as the baseline plus a readback into a transparent BitmapData
			var beginTime:int = getTimer();
			__context.clear(0xEE/255, 0xEA/255, 0xD9/255, 1.0);
			__context.drawToBitmapData(__bmdAlpha);
			__context.present();
			var endTime:int = getTimer();
			var drawTime:int = endTime - beginTime;
			reportTime("Readback (alpha)", drawTime);
		}

		private function makeButton(label:String, callback:Function): void
		{
			var tf:TextField = new TextField();
			tf.defaultTextFormat = TEXT_FORMAT;
			tf.name = "label";
			tf.text = label;
			tf.autoSize = TextFieldAutoSize.LEFT;
			tf.selectable = false;
			tf.x = tf.y = PAD;

			var button:Sprite = new Sprite();
			button.name = label;
			// Fill color is an assumption; the original value is not recoverable
			button.graphics.beginFill(0xF5F5F5);
			button.graphics.drawRect(0, 0, tf.width+PAD*2, tf.height+PAD*2);
			button.graphics.endFill();
			button.graphics.lineStyle(1, 0x000000);
			button.graphics.drawRect(0, 0, tf.width+PAD*2, tf.height+PAD*2);
			button.addChild(tf);
			button.addEventListener(MouseEvent.CLICK, callback);

			// Place each new button to the right of the existing ones
			button.x = PAD + this.width;
			button.y = PAD;
			addChild(button);
		}
	}
}

I ran this performance test with the following environment:

  • Flex SDK (MXMLC), compiling in release mode (no debugging or verbose stack traces)
  • Release version of Flash Player
  • 2.4 GHz Intel Core i5
  • Mac OS X 10.7.1
  • NVIDIA GeForce GT 330M 256 MB

And got these results:

Hardware rendering (time per frame, in milliseconds)

Resolution    No Readback    Readback (no alpha)    Readback (alpha)
640×480       0              3                      3
800×600       0              4                      4
1024×768      0              6                      6
1280×720      0              8                      8
1920×1080     0              15                     15

Software rendering (time per frame, in milliseconds)

Resolution    No Readback    Readback (no alpha)    Readback (alpha)
640×480       1              2                      2
800×600       1              4                      4
1024×768      3              6                      6
1280×720      3              7                      7
1920×1080     7              15                     15

[Chart: Stage3D Readback Performance (hardware)]

[Chart: Stage3D Readback Performance (software)]

Software rendering is clearly slower overall, even with a blank scene: clearing and presenting an empty frame takes 1 to 7 ms in software but rounds to 0 ms in hardware. Unfortunately, software seems no faster at reading the scene back into the BitmapData, since the readback timings are nearly identical in both modes. This would have been one of software rendering’s only performance advantages over hardware-accelerated rendering, but it seems as though this optimization is not (yet) in place.

Nonetheless, this test points out an important fact: reading the scene’s pixels back into a BitmapData is very expensive and possibly not feasible in real time at high resolutions. For example, a game attempting to run at a smooth 30 frames per second has only about 33 milliseconds per frame to do its work. If reading the 3D scene back into RAM takes 15 milliseconds, as it did at 1920×1080, nearly half of that budget is gone and the rest of the game (e.g. physics, sound, 2D rendering, networking) must be quite fast to accommodate it. Also, keep in mind that many systems are older and slower than my test machine, which is a relatively new MacBook Pro. Still, if adding 3D content to a 2D scene is very important, it seems like it can be accomplished so long as you limit the resolution of the 3D scene.
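
One way to limit the resolution might look like the sketch below: the back buffer is configured at a fraction of the stage size, read back at that size, and the resulting Bitmap is scaled up on the 2D display list. It also reports each frame's cost against the 30 FPS budget discussed above. The class name `LowResReadbackSketch`, the `SCALE` factor of 0.5, and the field names are assumptions made for illustration and were not part of the test; the stage is assumed to be large enough that the scaled-down back buffer is still a valid size.

```actionscript
package
{
	import flash.display.Bitmap;
	import flash.display.BitmapData;
	import flash.display.Sprite;
	import flash.display.Stage3D;
	import flash.display3D.Context3D;
	import flash.events.Event;
	import flash.utils.getTimer;

	// Hypothetical class: reads back a reduced-resolution scene and scales it up in 2D
	public class LowResReadbackSketch extends Sprite
	{
		// Assumed values for illustration, not measured in the article
		private static const SCALE:Number = 0.5;
		private static const TARGET_FPS:Number = 30;

		private var context:Context3D;
		private var snapshot:BitmapData;

		public function LowResReadbackSketch()
		{
			var stage3D:Stage3D = stage.stage3Ds[0];
			stage3D.addEventListener(Event.CONTEXT3D_CREATE, onContextCreated);
			stage3D.requestContext3D();
		}

		private function onContextCreated(ev:Event):void
		{
			// Render at a fraction of the stage size to cut the number of pixels read back
			var width:int = int(stage.stageWidth * SCALE);
			var height:int = int(stage.stageHeight * SCALE);

			context = Stage3D(ev.target).context3D;
			context.configureBackBuffer(width, height, 0, true);

			// First time only: the BitmapData matches the back buffer size
			if (!snapshot)
			{
				snapshot = new BitmapData(width, height, false);

				// Scale the 2D copy back up to cover the stage
				var view:Bitmap = new Bitmap(snapshot);
				view.smoothing = true;
				view.scaleX = view.scaleY = 1 / SCALE;
				addChild(view);

				addEventListener(Event.ENTER_FRAME, onEnterFrame);
			}
		}

		private function onEnterFrame(ev:Event):void
		{
			var budgetMs:Number = 1000 / TARGET_FPS; // about 33.3 ms at 30 FPS

			var beginTime:int = getTimer();
			context.clear(0, 0, 0, 1);
			// ... 3D draw calls would go here ...
			context.drawToBitmapData(snapshot);
			context.present();
			var elapsed:int = getTimer() - beginTime;

			trace("Render + readback: " + elapsed + " ms of a "
				+ budgetMs.toFixed(1) + " ms frame budget");
		}
	}
}
```

At a scale of 0.5 the readback covers a quarter of the pixels, so if the cost tracks pixel count as the timings above suggest, it should drop accordingly, at the price of a softer image once the Bitmap is scaled up.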

Spot a bug? Have a suggestion? Different results on a different OS or video card? Post a comment!