Stage3D Draw Calls: Part 1
There’s no doubt that Flash 11’s new Stage3D API can produce some amazing results by giving us access to the power of the user’s video card/GPU. However, it’d be a mistake to blindly assume that it is always faster than the traditional Flash display list (i.e. Stage). Today’s article begins a series that discusses the topic of “draw calls” and how they heavily impact the performance of your application.
Simply put, a “draw call” (or just a “draw”) is a call to Context3D.drawTriangles. To get anything other than a background color to show up, you’ll need to make at least one call to this function. As it turns out, a video card/GPU is happiest when you make exactly one call to this function. The reason is simple: GPUs prefer to be given a large data set and to chug away on it uninterrupted. Giving the GPU a few triangles at a time is the perfect way to interrupt this batch processing of your rendering data. In fact, you may have noticed that a shader program in Flash 11 cannot contain any branching code such as if-else or switch. While these branching instructions are actually available on many GPUs, they result in an interruption to the GPU’s batch processing because the sequence of shader opcodes to execute will change depending on the condition of your branching code.
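To make the definition concrete, here is a minimal sketch of a way to count draw calls per frame. The DrawCallCounter class and its reset/drawCalls members are hypothetical names invented for this illustration and are not part of the test code in this article; the only Flash Player API it relies on is Context3D.drawTriangles.

```actionscript
package
{
    import flash.display3D.Context3D;
    import flash.display3D.IndexBuffer3D;

    /**
     * Hypothetical helper (not from the article's code): forwards draws to a
     * Context3D and counts how many were issued since the last reset
     */
    public class DrawCallCounter
    {
        /** 3D context that the draws are forwarded to */
        private var ctx:Context3D;

        /** Number of draws issued since the last reset */
        private var _drawCalls:uint;

        public function DrawCallCounter(ctx:Context3D)
        {
            this.ctx = ctx;
        }

        /** Number of draw calls issued since the last call to reset() */
        public function get drawCalls(): uint
        {
            return _drawCalls;
        }

        /** Call once at the start of each frame */
        public function reset(): void
        {
            _drawCalls = 0;
        }

        /** Same parameters as Context3D.drawTriangles, plus the counting */
        public function drawTriangles(
            indexBuffer:IndexBuffer3D,
            firstIndex:int = 0,
            numTriangles:int = -1
        ): void
        {
            _drawCalls++;
            ctx.drawTriangles(indexBuffer, firstIndex, numTriangles);
        }
    }
}
```

If every draw in an app were routed through a wrapper like this, reading drawCalls just before Context3D.present would give the number of draws issued that frame.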
So, for today’s test I will pit the Stage3DSprite class from my Simple 2D With Stage3D article against the venerable Bitmap class from the Flash Player API. The test app allows you to change the mode from Stage3DSprite to Bitmap, enable/disable moving, rotating, and scaling the sprites, and increase/decrease the number of sprites being displayed. First off, here is the Stage3DSprite class with one method (dispose) added to it:
package
{
    import flash.geom.*;
    import flash.utils.*;
    import flash.display.*;
    import flash.display3D.*;
    import flash.display3D.textures.*;

    import com.adobe.utils.*;

    /**
     * A Stage3D-based 2D sprite
     * @author Jackson Dunstan
     */
    public class Stage3DSprite
    {
        /** Cached static lookup of Context3DVertexBufferFormat.FLOAT_2 */
        private static const FLOAT2_FORMAT:String = Context3DVertexBufferFormat.FLOAT_2;

        /** Cached static lookup of Context3DVertexBufferFormat.FLOAT_3 */
        private static const FLOAT3_FORMAT:String = Context3DVertexBufferFormat.FLOAT_3;

        /** Cached static lookup of Context3DProgramType.VERTEX */
        private static const VERTEX_PROGRAM:String = Context3DProgramType.VERTEX;

        /** Cached static lookup of Vector3D.Z_AXIS */
        private static const Z_AXIS:Vector3D = Vector3D.Z_AXIS;

        /** Temporary AGAL assembler to avoid allocation */
        private static const tempAssembler:AGALMiniAssembler = new AGALMiniAssembler();

        /** Temporary rectangle to avoid allocation */
        private static const tempRect:Rectangle = new Rectangle();

        /** Temporary point to avoid allocation */
        private static const tempPoint:Point = new Point();

        /** Temporary matrix to avoid allocation */
        private static const tempMatrix:Matrix = new Matrix();

        /** Temporary 3D matrix to avoid allocation */
        private static const tempMatrix3D:Matrix3D = new Matrix3D();

        /** Cache of Program3D per Context3D */
        private static const programsCache:Dictionary = new Dictionary(true);

        /** Cache of positions and texture coordinates VertexBuffer3D per Context3D */
        private static const posUVCache:Dictionary = new Dictionary(true);

        /** Cache of triangles IndexBuffer3D per Context3D */
        private static const trisCache:Dictionary = new Dictionary(true);

        /** Vertex shader program AGAL bytecode */
        private static var vertexProgram:ByteArray;

        /** Fragment shader program AGAL bytecode */
        private static var fragmentProgram:ByteArray;

        /** 3D context to use for drawing */
        public var ctx:Context3D;

        /** 3D texture to use for drawing */
        public var texture:Texture;

        /** Width of the created texture */
        public var textureWidth:uint;

        /** Height of the created texture */
        public var textureHeight:uint;

        /** X position of the sprite */
        public var x:Number = 0;

        /** Y position of the sprite */
        public var y:Number = 0;

        /** Rotation of the sprite in degrees */
        public var rotation:Number = 0;

        /** Scale in the X direction */
        public var scaleX:Number = 1;

        /** Scale in the Y direction */
        public var scaleY:Number = 1;

        /**
         * UV scale constants: U scale, V scale, {unused}, {unused}
         * (despite the name, render() uploads these as vertex constants at vc4)
         */
        private var fragConsts:Vector.<Number> = new <Number>[1, 1, 1, 1];

        // Static initializer to create vertex and fragment programs
        {
            tempAssembler.assemble(
                Context3DProgramType.VERTEX,
                // Apply draw matrix (object -> clip space)
                "m44 op, va0, vc0\n" +

                // Scale texture coordinate and copy to varying
                "mov vt0, va1\n" +
                "div vt0.xy, vt0.xy, vc4.xy\n" +
                "mov v0, vt0\n"
            );
            vertexProgram = tempAssembler.agalcode;

            tempAssembler.assemble(
                Context3DProgramType.FRAGMENT,
                "tex oc, v0, fs0 <2d,linear,mipnone,clamp>"
            );
            fragmentProgram = tempAssembler.agalcode;
        }

        /**
         * Make the sprite
         * @param ctx 3D context to use for drawing
         */
        public function Stage3DSprite(ctx:Context3D): void
        {
            this.ctx = ctx;
            if (!(ctx in trisCache))
            {
                // Create the shader program
                var program:Program3D = ctx.createProgram();
                program.upload(vertexProgram, fragmentProgram);
                programsCache[ctx] = program;

                // Create the positions and texture coordinates vertex buffer
                var posUV:VertexBuffer3D = ctx.createVertexBuffer(4, 5);
                posUV.uploadFromVector(
                    new <Number>[
                        // X, Y, Z, U, V
                        -1, -1, 0, 0, 1,
                        -1,  1, 0, 0, 0,
                         1,  1, 0, 1, 0,
                         1, -1, 0, 1, 1
                    ], 0, 4
                );
                posUVCache[ctx] = posUV;

                // Create the triangles index buffer
                var tris:IndexBuffer3D = ctx.createIndexBuffer(6);
                tris.uploadFromVector(
                    new <uint>[
                        0, 1, 2,
                        2, 3, 0
                    ], 0, 6
                );
                trisCache[ctx] = tris;
            }
        }

        /**
         * Set a BitmapData to use as a texture
         * @param bmd BitmapData to use as a texture
         */
        public function set bitmapData(bmd:BitmapData): void
        {
            var width:uint = bmd.width;
            var height:uint = bmd.height;

            // Create a new texture if we need to
            if (createTexture(width, height))
            {
                // If the new texture doesn't match the BitmapData's dimensions
                if (width != textureWidth || height != textureHeight)
                {
                    // Create a BitmapData with the required dimensions
                    var powOfTwoBMD:BitmapData = new BitmapData(
                        textureWidth,
                        textureHeight,
                        bmd.transparent
                    );

                    // Copy the given BitmapData to the newly-created BitmapData
                    tempRect.width = width;
                    tempRect.height = height;
                    powOfTwoBMD.copyPixels(bmd, tempRect, tempPoint);

                    // Upload the newly-created BitmapData instead
                    bmd = powOfTwoBMD;

                    // Scale the UV to the sub-texture
                    fragConsts[0] = textureWidth / width;
                    fragConsts[1] = textureHeight / height;
                }
                else
                {
                    // Reset UV scaling
                    fragConsts[0] = 1;
                    fragConsts[1] = 1;
                }
            }

            // Upload new BitmapData to the texture
            texture.uploadFromBitmapData(bmd);
        }

        /**
         * Create the texture to fit the given dimensions
         * @param width Width to fit
         * @param height Height to fit
         * @return If a new texture had to be created
         */
        protected function createTexture(width:uint, height:uint): Boolean
        {
            width = nextPowerOfTwo(width);
            height = nextPowerOfTwo(height);

            if (!texture || textureWidth != width || textureHeight != height)
            {
                texture = ctx.createTexture(
                    width,
                    height,
                    Context3DTextureFormat.BGRA,
                    false
                );
                textureWidth = width;
                textureHeight = height;
                return true;
            }
            return false;
        }

        /**
         * Render the sprite to the 3D context
         */
        public function render(): void
        {
            tempMatrix3D.identity();
            tempMatrix3D.appendRotation(-rotation, Z_AXIS);
            tempMatrix3D.appendScale(scaleX, scaleY, 1);
            tempMatrix3D.appendTranslation(x, y, 0);

            ctx.setProgram(programsCache[ctx]);
            ctx.setTextureAt(0, texture);
            ctx.setProgramConstantsFromMatrix(VERTEX_PROGRAM, 0, tempMatrix3D, true);
            ctx.setProgramConstantsFromVector(VERTEX_PROGRAM, 4, fragConsts);
            ctx.setVertexBufferAt(0, posUVCache[ctx], 0, FLOAT3_FORMAT);
            ctx.setVertexBufferAt(1, posUVCache[ctx], 3, FLOAT2_FORMAT);
            ctx.drawTriangles(trisCache[ctx]);
        }

        /**
         * Dispose of this sprite's resources
         */
        public function dispose(): void
        {
            if (texture)
            {
                texture.dispose();
                texture = null;
            }
        }

        /**
         * Get the next-highest power of two
         * @param v Value to get the next-highest power of two from
         * @return The next-highest power of two from the given value
         */
        public static function nextPowerOfTwo(v:uint): uint
        {
            v--;
            v |= v >> 1;
            v |= v >> 2;
            v |= v >> 4;
            v |= v >> 8;
            v |= v >> 16;
            v++;
            return v;
        }
    }
}
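The texture sizing and UV scaling in the class above are easier to follow with concrete numbers. The sketch below is only an illustration: the PowerOfTwoExample class and the 100x60 image size are made up for this example, and nextPowerOfTwo is copied from Stage3DSprite so the snippet stands alone.

```actionscript
package
{
    import flash.display.Sprite;

    /** Hypothetical worked example; not part of the article's test code */
    public class PowerOfTwoExample extends Sprite
    {
        public function PowerOfTwoExample()
        {
            // Assumed image size, chosen only for illustration
            var width:uint = 100;
            var height:uint = 60;

            // The texture is rounded up to power-of-two dimensions
            var texWidth:uint = nextPowerOfTwo(width);
            var texHeight:uint = nextPowerOfTwo(height);
            trace(texWidth, texHeight); // 128 64

            // The vertex shader divides each UV by (texture size / image size),
            // which is equivalent to multiplying by (image size / texture size).
            // These are the largest U and V that will actually be sampled.
            var maxU:Number = width / texWidth;
            var maxV:Number = height / texHeight;
            trace(maxU, maxV); // 0.78125 0.9375
        }

        /** Copied from Stage3DSprite.nextPowerOfTwo above */
        public static function nextPowerOfTwo(v:uint): uint
        {
            v--;
            v |= v >> 1;
            v |= v >> 2;
            v |= v >> 4;
            v |= v >> 8;
            v |= v >> 16;
            v++;
            return v;
        }
    }
}
```

In other words, a 100x60 image ends up in the top-left corner of a 128x64 texture, and the scaled texture coordinates keep the quad from sampling the unused padding to the right of and below it.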
Here is the test app:
package
{
    import flash.display3D.*;
    import flash.display.*;
    import flash.filters.*;
    import flash.events.*;
    import flash.text.*;
    import flash.geom.*;
    import flash.utils.*;

    public class Stage3DSpriteSpeed extends Sprite
    {
        private static const MODE_3D:int = 1;
        private static const MODE_BITMAP:int = 2;

        [Embed(source="flash_logo_icon.jpg")]
        private static const TEXTURE:Class;

        private var context3D:Context3D;

        private var stats:TextField = new TextField();
        private var lastStatsUpdateTime:uint;
        private var lastFrameTime:uint;
        private var frameCount:uint;
        private var driver:TextField = new TextField();

        private var texture:BitmapData = (new TEXTURE() as Bitmap).bitmapData;

        private var sprites3D:Vector.<Stage3DSprite> = new <Stage3DSprite>[];
        private var spritesBitmap:Vector.<Bitmap> = new <Bitmap>[];

        private var mode:int = MODE_3D;
        private var moving:Boolean;
        private var rotating:Boolean;
        private var scaling:Boolean;
        private var numSprites:int = 2000;

        private var container:Sprite = new Sprite();

        public function Stage3DSpriteSpeed()
        {
            stage.align = StageAlign.TOP_LEFT;
            stage.scaleMode = StageScaleMode.NO_SCALE;
            stage.frameRate = 60;

            addChild(container);

            var stage3D:Stage3D = stage.stage3Ds[0];
            stage3D.addEventListener(Event.CONTEXT3D_CREATE, onContextCreated);
            stage3D.requestContext3D(Context3DRenderMode.AUTO);
        }

        protected function onContextCreated(ev:Event): void
        {
            // Setup context
            var stage3D:Stage3D = stage.stage3Ds[0];
            stage3D.removeEventListener(Event.CONTEXT3D_CREATE, onContextCreated);
            context3D = stage3D.context3D;
            context3D.configureBackBuffer(
                stage.stageWidth,
                stage.stageHeight,
                0,
                true
            );
            context3D.enableErrorChecking = true;

            // Setup UI
            stats.background = true;
            stats.backgroundColor = 0xffffffff;
            stats.autoSize = TextFieldAutoSize.LEFT;
            stats.text = "Getting FPS...";
            addChild(stats);

            driver.background = true;
            driver.backgroundColor = 0xffffffff;
            driver.text = "Driver: " + context3D.driverInfo;
            driver.autoSize = TextFieldAutoSize.LEFT;
            driver.y = stats.height;
            addChild(driver);

            makeButtons(
                "Mode: Stage3DSprite", "Mode: Bitmap",
                null,
                "Add 100 Sprites", "Remove 100 Sprites",
                null,
                "Enable Moving", "Enable Rotating", "Enable Scaling"
            );

            // Start the simulation
            makeSprites();
            addEventListener(Event.ENTER_FRAME, onEnterFrame);
        }

        private function makeButtons(...labels): void
        {
            const PAD:Number = 5;

            var curX:Number = PAD;
            var curY:Number = stage.stageHeight - PAD;
            for each (var label:String in labels)
            {
                if (!label)
                {
                    curX = PAD;
                    curY -= button.height + PAD;
                    continue;
                }

                var tf:TextField = new TextField();
                tf.mouseEnabled = false;
                tf.selectable = false;
                tf.defaultTextFormat = new TextFormat("_sans", 16, 0x0071BB);
                tf.autoSize = TextFieldAutoSize.LEFT;
                tf.text = label;
                tf.name = "lbl";

                var button:Sprite = new Sprite();
                button.buttonMode = true;
                // NOTE: the five drawing calls below were garbled in this copy
                // of the listing and are a reconstruction. The fill color and
                // line thickness are assumptions.
                button.graphics.beginFill(0xF5F5F5);
                button.graphics.drawRect(0, 0, tf.width+PAD, tf.height+PAD);
                button.graphics.endFill();
                button.graphics.lineStyle(1);
                button.graphics.drawRect(0, 0, tf.width+PAD, tf.height+PAD);
                button.addChild(tf);
                button.addEventListener(MouseEvent.CLICK, onButton);
                if (curX + button.width > stage.stageWidth - PAD)
                {
                    curX = PAD;
                    curY -= button.height + PAD;
                }
                button.x = curX;
                button.y = curY - button.height;
                addChild(button);

                curX += button.width + PAD;
            }
        }

        private function makeSprites(): void
        {
            // Clear old sprites
            context3D.clear(0.5, 0.5, 0.5);
            context3D.present();
            for each (var spr3D:Stage3DSprite in sprites3D)
            {
                spr3D.dispose();
            }
            sprites3D.length = 0;
            spritesBitmap.length = 0;
            container.removeChildren();

            // Make new sprites
            var i:int;
            switch (mode)
            {
                case MODE_3D:
                    var scale:Number = texture.width / stage.stageWidth;
                    for (; i < numSprites; ++i)
                    {
                        spr3D = new Stage3DSprite(context3D);
                        spr3D.bitmapData = texture;
                        spr3D.x = Math.random()*2-1;
                        spr3D.y = Math.random()*2-1;
                        spr3D.scaleX = spr3D.scaleY = scale;
                        sprites3D[i] = spr3D;
                    }
                    break;
                case MODE_BITMAP:
                    for (; i < numSprites; ++i)
                    {
                        var bm:Bitmap = new Bitmap(texture);
                        bm.x = Math.random()*stage.stageWidth;
                        bm.y = Math.random()*stage.stageHeight;
                        spritesBitmap[i] = bm;
                        container.addChild(bm);
                    }
                    break;
            }

            // Reset FPS
            frameCount = 0;
            lastFrameTime = 0;
            lastStatsUpdateTime = getTimer();
        }

        private function onButton(ev:MouseEvent): void
        {
            // NOTE: the right-hand side of this lookup was garbled in this copy
            // of the listing; the expression is a reconstruction based on the
            // "lbl" name assigned in makeButtons
            var tf:TextField = ev.target.getChildByName("lbl");
            var lbl:String = tf.text;
            switch (lbl)
            {
                case "Mode: Stage3DSprite":
                    mode = MODE_3D;
                    makeSprites();
                    break;
                case "Mode: Bitmap":
                    mode = MODE_BITMAP;
                    makeSprites();
                    break;
                case "Add 100 Sprites":
                    numSprites += 100;
                    makeSprites();
                    break;
                case "Remove 100 Sprites":
                    if (numSprites)
                    {
                        numSprites -= 100;
                        makeSprites();
                    }
                    break;
                case "Enable Moving":
                    moving = true;
                    tf.text = "Disable Moving";
                    break;
                case "Disable Moving":
                    moving = false;
                    tf.text = "Enable Moving";
                    break;
                case "Enable Rotating":
                    rotating = true;
                    tf.text = "Disable Rotating";
                    break;
                case "Disable Rotating":
                    rotating = false;
                    tf.text = "Enable Rotating";
                    break;
                case "Enable Scaling":
                    scaling = true;
                    tf.text = "Disable Scaling";
                    break;
                case "Disable Scaling":
                    scaling = false;
                    tf.text = "Enable Scaling";
                    break;
            }
        }

        private function onEnterFrame(ev:Event): void
        {
            // Render the scene
            switch (mode)
            {
                case MODE_3D:
                    var spr3D:Stage3DSprite;
                    context3D.clear(0.5, 0.5, 0.5);
                    for each (spr3D in sprites3D)
                    {
                        spr3D.render();
                    }
                    if (moving)
                    {
                        for each (spr3D in sprites3D)
                        {
                            spr3D.x = Math.random()*2-1;
                            spr3D.y = Math.random()*2-1;
                        }
                    }
                    if (rotating)
                    {
                        for each (spr3D in sprites3D)
                        {
                            spr3D.rotation = 360*Math.random();
                        }
                    }
                    if (scaling)
                    {
                        var baseScale:Number = texture.width / stage.stageWidth;
                        for each (spr3D in sprites3D)
                        {
                            spr3D.scaleX = baseScale*Math.random();
                            spr3D.scaleY = baseScale*Math.random();
                        }
                    }
                    context3D.present();
                    break;
                case MODE_BITMAP:
                    var dispObj:DisplayObject;
                    if (moving)
                    {
                        var stageWidth:Number = stage.stageWidth;
                        var stageHeight:Number = stage.stageHeight;
                        for each (dispObj in spritesBitmap)
                        {
                            dispObj.x = Math.random()*stageWidth;
                            dispObj.y = Math.random()*stageHeight;
                        }
                    }
                    if (rotating)
                    {
                        for each (dispObj in spritesBitmap)
                        {
                            dispObj.rotation = 360*Math.random();
                        }
                    }
                    if (scaling)
                    {
                        for each (dispObj in spritesBitmap)
                        {
                            dispObj.scaleX = Math.random();
                            dispObj.scaleY = Math.random();
                        }
                    }
                    break;
            }

            // Update stats display
            frameCount++;
            var now:int = getTimer();
            var dTime:int = now - lastFrameTime;
            var elapsed:int = now - lastStatsUpdateTime;
            if (elapsed > 1000)
            {
                var framerateValue:Number = 1000 / (elapsed / frameCount);
                stats.text = "FPS: " + framerateValue.toFixed(4)
                    + ", Sprites: " + numSprites;
                lastStatsUpdateTime = now;
                frameCount = 0;
            }
            lastFrameTime = now;
        }
    }
}
And here is the texture image used.
I ran the test app in the following environment:
- Flex SDK (MXMLC), compiling in release mode (no debugging or verbose stack traces)
- Release version of Flash Player
- 2.4 GHz Intel Core i5
- NVIDIA GeForce GT 330M 256 MB
- Mac OS X 10.7.3
With these settings:
- Sprites: 2000
Here are the results I got:
Mode | FPS |
Stage3DSprite | 26 |
Bitmap | 48 |
How could Stage3D have lost!? Even on this modern GPU that has no trouble playing games like World of Warcraft, it’s still brought to its knees by a measly 2000 quads. At two triangles per quad, that’s only 4000 triangles! Even worse, conventional wisdom says, correctly, that you shouldn’t scale or rotate your Bitmap objects because it’s much slower to draw them. But even drawing 2000 of them per frame on a pretty fast machine is way faster than the Stage3D approach, which is supposed to be great at scaling and rotating 3D objects.
As you might have guessed, the answer lies in the number of draw calls being performed. Consider that each sprite that is drawn to the screen is its own draw call and you will see, given the introductory paragraph, that we are constantly interrupting the GPU. It really wants to chug along drawing all 4000 triangles in one go, but we’re feeding it only two at a time, telling it to stop and wait for our next two triangles, feeding it two more, and so on.
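To put a rough number on that cost, the measured result can be turned into a per-draw time budget. This is only back-of-the-envelope arithmetic in a hypothetical DrawCallBudget class: it charges the whole frame to the draw calls, so the figure is an upper bound on the cost of each draw on the test machine rather than a measurement of it.

```actionscript
package
{
    import flash.display.Sprite;

    /** Hypothetical back-of-the-envelope helper; not part of the test app */
    public class DrawCallBudget extends Sprite
    {
        public function DrawCallBudget()
        {
            var fps:Number = 26;       // Stage3DSprite result from the table above
            var drawCalls:uint = 2000; // one drawTriangles call per sprite

            var frameMs:Number = frameTimeMs(fps);
            var perDrawMicros:Number = frameMs / drawCalls * 1000;

            trace("Frame time: " + frameMs.toFixed(1) + " ms"); // 38.5 ms
            trace("Upper bound per draw: " + perDrawMicros.toFixed(1) + " microseconds"); // 19.2
        }

        /** Milliseconds available per frame at the given frame rate */
        public static function frameTimeMs(fps:Number): Number
        {
            return 1000 / fps;
        }
    }
}
```

By the same arithmetic, holding 60 FPS with 2000 draws would leave only about 8.3 microseconds for each one, which is why cutting the number of draws matters so much more than cutting the number of triangles.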
So, the above test shows you a clear-cut case of the classic 2D Stage beating the pants off of the supposedly high-performance Stage3D. The next article in this series will show you how to optimize to reduce draw calls and turn the tables on our old friend Stage. Stay tuned!
Spot a bug? Have a suggestion or a question? Post a comment!
#1 by Bob on February 20th, 2012 ·
For 2000 stationary Sprites, I got 10 fps for Stage3D and 60 fps for the bitmaps. (Intel i3 M 380 with no extra discrete graphics anything, so running on software mode.) It would be interesting to see something like Starling added into the tests.
#2 by jackson on February 20th, 2012 ·
Thanks for the software rendering results. It’s a nice addition to the data.
While I’ll probably cover Starling some day, this series is focused on what’s happening behind the scenes—draw calls—for the purpose of general learning about 3D graphics rendering. If I do end up covering Starling, I’ll be referencing these articles when discussing how fast/slow it is. This will also give a simple little framework to benchmark against. Stay tuned! :)
#3 by Bob on February 21st, 2012 ·
Huh. Today when I ran it I now got: “Driver: DirectX9Ex (Direct blitting)” And 30 fps for 2000 Stage3D sprites vs. 60 fps for Bitmaps. I’ll admit I’m thoroughly uninformed and I have no idea what would have caused the difference from yesterday to today.
#4 by Tyler on February 20th, 2012 ·
These results are in line with what I’ve seen elsewhere. I have however always found that when rotation or scaling are thrown into the mix the Stage3D version gains the advantage over the Bitmap approach, and that’s the same for me with this example (Windows 7, Chrome). At 4000 quads for example, I’m seeing 58fps with Stage3D vs. 60 for Bitmap. But when I enable rotation, the Stage3D version doesn’t drop at all while the Bitmap version drops by more than half. I believe with larger objects the advantage also shifts to Stage3D, even without rotation, etc.
Among the 2D Stage3D frameworks, Genome2D and ND2D seem to do the best job of consolidating the draw calls, and I’m not sure there are any occasions where old-fashioned bitmap blitting beats them now.
#5 by jackson on February 20th, 2012 ·
The reason the Stage3D version doesn’t drop when rotation and scaling are enabled is that it is always doing rotation and scaling. The quad is drawn out of two general 3D triangles, not a 2D blit, and therefore there is no prohibition of scaling and rotation in order to maximize speed.
As mentioned above, I’ll probably be writing an article about the various 2D Stage3D frameworks. I’ll try to include as many as possible, including Starling, Genome2D, and ND2D and discuss how they compare with what this series of articles comes up with (not to say that the goal of this series is to come up with a 2D framework).
#6 by skyboy on February 20th, 2012 ·
Software blitting | Bitmap rendering
stationary: 0.8 | 52
moving: 0.8 | 10
rotating: 0.8 | 1.7
The Stage3D is rather stable at around 0.8 FPS regardless of what else is going on (apparently there is no fast path for anything; it all gets applied); but the Bitmap method clearly wins without a GPU present.
#7 by skyboy on February 20th, 2012 ·
Right. That was for 3,500 sprites.
#8 by jackson on February 20th, 2012 ·
I’m interested to see what your results look like after the next article when the number of draw calls is reduced dramatically. I’ll be sure to include my own software rendering results in the article, too.
Thanks for the numbers.
#9 by focus on February 21st, 2012 ·
My results for 6000 sprites:
Mode: Stage3DSprite (DirectX9 Direct blitting) – 60 FPS (GPU usage ~25%)
Mode: Bitmap – 40 FPS (1 CPU core usage – 100%)
GPU – AMD Radeon HD 6790
CPU – Intel i7 2600K (4.9GHz)
#10 by jackson on February 21st, 2012 ·
I hit a resource limit around 4100-4200 sprites. You can see the error with a debug player. How are you getting all the way up to 6000 sprites?
#11 by skyboy on February 21st, 2012 ·
More RAM. Some high-end (not gaming rigs) consumer grade computers are coming with 6-16 GB now.
#12 by Bob on February 21st, 2012 ·
On the release player there really isn’t any way to tell that more sprites aren’t being added, so he most likely just didn’t notice when he hit the maximum.
#13 by focus on February 22nd, 2012 ·
Yeah, if I hit the maximum, I didn’t notice it. I have 16 GB of RAM, but not much of it is being used at all…
#14 by hexagonstar on February 21st, 2012 ·
Stage3D wins here. At 4000 objects, all moving, rotating, scaling, with Stage3DSprite it’s still at 60fps while Bitmap drops down to 45fps.
#15 by hexagonstar on February 21st, 2012 ·
Just tried a bit more to confirm that everything runs well. Upped the ante to 10000 objects, all moving, rotating, scaling:
Stage3DSprite: 59-60 fps
Bitmap: 20-23 fps
With 20000 objects:
Stage3DSprite: 59-60 fps
Bitmap: 10-11 fps
This is on Release Player, i7 quadcore@3.6Ghz, 12GB RAM and a ATI Radeon HD 5800.
#16 by Matan Uberstein on February 22nd, 2012 ·
4000 Objects (limit error at 4100), moving, scaling and rotating.
Stage3D: 44fps (CPU @ 19-22%)
Bitmap: 31fps (CPU @ 30-40%)
Rig: AMD X6 @ 2.8Ghz, 8 RAM @ 1600Mhz, XFX 8800GT with debug FP
Thanks for sharing, very interesting. :)
#17 by Daniel on February 22nd, 2012 ·
CPU Intel(R) Core(TM) i5 CPU M 430 @ 2.27GHz
Video Adapter NVIDIA GeForce 310M 536870912
browser: chrome
flash: 11,1,102,62
2000 sprites
stage3d: cca 60 FPS + 25% CPU
bitmap: cca 30 FPS + 33% CPU
looks like our setup is not that far apart, but what I see is stage3d beating bitmap hands down
#18 by ben w on February 23rd, 2012 ·
Interesting test!
amazed that hexagonstar can run 20000 objects on his machine without it exploding!!! Happen to be stress testing some stage3d stuff on my engine at the moment, be interesting to see how well it performs for others.. might upload it and post a link.
Anything above 1000 draw calls seems to start slowing things down, 2000+ and 60 fps becomes hard to maintain
You should add another option in there to change the size of the bitmap being used.. one 5 times bigger will have a much bigger effect on the native version than the gpu version (I would think)
#19 by jackson on February 23rd, 2012 ·
Good idea. I’ll add that to the next test.
#20 by ben w on February 23rd, 2012 ·
<- video of current stress testing, playable demo will follow at some point soon
#21 by ben w on February 24th, 2012 ·
demo up and running
#22 by jackie young on March 14th, 2012 ·
I tested the app, but it showed Stage3D beating Bitmap with 3000 sprites: Stage3D got 60fps while Bitmap only got 30fps. What’s going on?
#23 by jackson on March 14th, 2012 ·
Performance may vary from machine to machine, but the point is that at least in some cases Stage3D may perform worse than Bitmap due to the number of draw calls. In your case, this seems to not be true due to some combination of OS, driver, video card, etc.