Amazing Lookups Optimization
Today’s article is about an unintuitive-yet-simple optimization you can use to hugely increase the speed of reading from Array, Vector, Dictionary, Object, and dynamic classes. Need I say more? Read on for this amazing speedup!
I was recently reading an article on Mark Knol’s site about some of the loop optimizations I’ve discussed before (e.g. removing the length getter) when I saw something I’d never seen before, despite my nine articles on AS3 loops and years of reading other people’s AS3 code. Mark was casting the result of an Array read operation to the type of variable he was assigning it to:

    function test(items:Array, index:uint): void {
        var item:MyItem;
        item = items[index];           // normal version
        item = items[index] as MyItem; // Mark's version
    }

To be sure, you do not need to type `as MyItem` because the result of indexing an Array is an untyped (*) value that can be assigned to anything. You don’t even get a compiler warning. If the value can’t be converted at runtime, the `as` version simply yields null, while the plain assignment to a class-typed variable throws a TypeError. But, since this was an article on loop optimization and I was about to write a comment pointing out that casts can be expensive, I figured I should test my assumption. As it turns out, this cast wasn’t slowing down his version at all. In fact, it was yielding far superior performance to the version without a cast. Shocked, I developed a full performance test with Array, Vector, Dictionary, Object, and dynamic classes to see if this optimization applied elsewhere:

    package {
        import flash.display.*;
        import flash.utils.*;
        import flash.text.*;

        public class CastingLookups extends Sprite {
            private var __logger:TextField = new TextField();

            private function row(...cols): void {
                __logger.appendText(cols.join(",") + "\n");
            }

            public function CastingLookups() {
                __logger.autoSize = TextFieldAutoSize.LEFT;
                addChild(__logger);

                var beforeTime:int;
                var afterTime:int;
                var noCastTime:int;
                var castTime:int;
                var item:MyItem;
                var i:uint;
                var len:uint = 10000000;
                var itemsArray:Array = new Array(len);
                var itemsVector:Vector.<MyItem> = new Vector.<MyItem>(len);
                var itemsDictionary:Dictionary = new Dictionary();
                var itemsObject:Object = new Object();
                var itemsDynClass:Dynamic = new Dynamic();
                for (i = 0; i < len; ++i) {
                    itemsArray[i] = itemsVector[i] = itemsDictionary[i]
                        = itemsObject[i] = itemsDynClass[i] = new MyItem();
                }

                row("Type", "No Cast Time", "Cast Time");

                beforeTime = getTimer();
                for (i = 0; i < len; ++i) {
                    item = itemsArray[i];
                }
                afterTime = getTimer();
                noCastTime = afterTime-beforeTime;
                beforeTime = getTimer();
                for (i = 0; i < len; ++i) {
                    item = itemsArray[i] as MyItem;
                }
                afterTime = getTimer();
                castTime = afterTime-beforeTime;
                row("Array", noCastTime, castTime);

                beforeTime = getTimer();
                for (i = 0; i < len; ++i) {
                    item = itemsVector[i];
                }
                afterTime = getTimer();
                noCastTime = afterTime-beforeTime;
                beforeTime = getTimer();
                for (i = 0; i < len; ++i) {
                    item = itemsVector[i] as MyItem;
                }
                afterTime = getTimer();
                castTime = afterTime-beforeTime;
                row("Vector", noCastTime, castTime);

                beforeTime = getTimer();
                for (i = 0; i < len; ++i) {
                    item = itemsDictionary[i];
                }
                afterTime = getTimer();
                noCastTime = afterTime-beforeTime;
                beforeTime = getTimer();
                for (i = 0; i < len; ++i) {
                    item = itemsDictionary[i] as MyItem;
                }
                afterTime = getTimer();
                castTime = afterTime-beforeTime;
                row("Dictionary", noCastTime, castTime);

                beforeTime = getTimer();
                for (i = 0; i < len; ++i) {
                    item = itemsObject[i];
                }
                afterTime = getTimer();
                noCastTime = afterTime-beforeTime;
                beforeTime = getTimer();
                for (i = 0; i < len; ++i) {
                    item = itemsObject[i] as MyItem;
                }
                afterTime = getTimer();
                castTime = afterTime-beforeTime;
                row("Object", noCastTime, castTime);

                beforeTime = getTimer();
                for (i = 0; i < len; ++i) {
                    item = itemsDynClass[i];
                }
                afterTime = getTimer();
                noCastTime = afterTime-beforeTime;
                beforeTime = getTimer();
                for (i = 0; i < len; ++i) {
                    item = itemsDynClass[i] as MyItem;
                }
                afterTime = getTimer();
                castTime = afterTime-beforeTime;
                row("Dynamic Class", noCastTime, castTime);
            }
        }
    }
    class MyItem{}
    dynamic class Dynamic{}

I ran the performance test with the following environment:
- Flex SDK (MXMLC) 4.5.1.21328, compiling in release mode (no debugging or verbose stack traces)
- Release version of Flash Player 10.3.181.34
- 2.4 GHz Intel Core i5
- Mac OS X 10.6.8
And got these results:
| Type | No Cast Time | Cast Time | 
|---|---|---|
| Array | 134 | 68 | 
| Vector | 126 | 63 | 
| Dictionary | 340 | 270 | 
| Object | 332 | 267 | 
| Dynamic Class | 331 | 270 | 

The point is not to compare the various container types (as in Map Performance or Accessing Objects), but the huge speedup when the cast is added. For Array and Vector, the cast nearly doubles the speed! For Object, Dictionary, and dynamic classes, the optimization is less drastic, but still about a 25% speedup.
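The ratios behind those claims can be recomputed from the table. The following is only a small sketch: the class name `SpeedupSketch` and its layout are made up for illustration, and the numbers are copied from the results table above (times in milliseconds).

```actionscript
package {
    import flash.display.Sprite;
    import flash.text.TextField;
    import flash.text.TextFieldAutoSize;

    // Hypothetical helper class, not part of the original test app.
    public class SpeedupSketch extends Sprite {
        public function SpeedupSketch() {
            var out:TextField = new TextField();
            out.autoSize = TextFieldAutoSize.LEFT;
            addChild(out);

            // [type, no-cast time, cast time], copied from the results table
            var rows:Array = [
                ["Array", 134, 68],
                ["Vector", 126, 63],
                ["Dictionary", 340, 270],
                ["Object", 332, 267],
                ["Dynamic Class", 331, 270]
            ];
            for each (var r:Array in rows) {
                // Speedup = time without the cast / time with the cast
                var speedup:Number = Number(r[1]) / Number(r[2]);
                out.appendText(r[0] + "," + speedup.toFixed(2) + "x\n");
            }
        }
    }
}
```

With the table's numbers this works out to roughly 1.97x for Array, 2.00x for Vector, and between 1.23x and 1.26x for Dictionary, Object, and dynamic classes.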
How is this possible? To see, let’s look at the bytecode generated for the “no cast” version of the Vector test: (with annotations by me)
    221     pushbyte      	0               // push 0 literal value
    223     convert_u     	                // convert 0 to an unsigned int
    224     setlocal      	4               // set 0 to i
    226     jump          	L7              // go to block L7
 
 
    L8: 
    230     label         	
    231     getlocal      	7               // get itemsVector
    233     getlocal      	4               // get i
    235     getproperty   	null            // index the vector
    237     coerce        	private::MyItem // implicit cast the result to a MyItem
    239     setlocal3     	                // set the result to item
    240     getlocal      	4               // get i
    242     increment     	                // i++
    243     convert_u     	                // convert i to an unsigned int
    244     setlocal      	4               // set result to i
 
    L7: 
    246     getlocal      	4               // get i
    248     getlocal      	5               // get len
    250     iflt          	L8              // if i < len, go to block L8

Now let’s look at the version with the cast:
    278     pushbyte      	0               // push 0 literal value
    280     convert_u     	                // convert 0 to an unsigned int
    281     setlocal      	4               // set 0 to i
    283     jump          	L9              // go to block L9
 
 
    L10: 
    287     label         	
    288     getlocal      	7               // get itemsVector
    290     getlocal      	4               // get i
    292     getproperty   	null            // index the vector
    294     getglobalscope	                // get the object at the top of the scope chain
    295     getslot       	2               // get the item at slot 2 in the global scope (i.e. MyItem)
    297     astypelate    	                // "as" cast to MyItem
    298     coerce        	private::MyItem // implicit cast to MyItem (again)
    300     setlocal3     	                // set the result to item
    301     getlocal      	4               // get i
    303     increment     	                // i++
    304     convert_u     	                // convert i to an unsigned int
    305     setlocal      	4               // set result to i
 
    L9: 
    307     getlocal      	4               // get i
    309     getlocal      	5               // get len
    311     iflt          	L10             // if i < len, go to block L10

Notice that the only difference is that the cast version adds the `as` cast via these three lines:
    294     getglobalscope	                // get the object at the top of the scope chain
    295     getslot       	2               // get the item at slot 2 in the global scope (i.e. MyItem)
    297     astypelate    	                // "as" cast to MyItem

These three lines are the only difference between the “cast” and “no cast” versions of every tested type.
How can adding instructions yield a 2x performance increase? I do not know. I’ve looked over the source code and bytecode at least a dozen times now and am positive that I haven’t switched the order or anything silly like that. If you spot an error, please comment below about it. Barring any mistake though, it looks like we have a way to hugely increase the speed at which we can access Array, Vector, Dictionary, Object, and dynamic classes!
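For reference, here is a minimal sketch of what applying the cast looks like outside of a benchmark. The class name `CastLoopSketch` and the helper `lastItem` are hypothetical and exist only for illustration. Whether the cast pays off in a given loop is something to measure in a release player, not something this sketch demonstrates.

```actionscript
package {
    import flash.display.Sprite;
    import flash.text.TextField;
    import flash.text.TextFieldAutoSize;

    // Hypothetical example class; only the "as MyItem" cast on the
    // lookup comes from the article.
    public class CastLoopSketch extends Sprite {
        public function CastLoopSketch() {
            var out:TextField = new TextField();
            out.autoSize = TextFieldAutoSize.LEFT;
            addChild(out);

            var items:Array = [new MyItem(), new MyItem(), new MyItem()];
            var last:MyItem = lastItem(items);
            out.appendText("found an item: " + (last != null) + "\n");
        }

        // Walks the Array with the length cached and the lookup cast via "as".
        // Note that "as" yields null for any element that is not a MyItem.
        private function lastItem(items:Array): MyItem {
            var item:MyItem = null;
            var len:uint = items.length;
            for (var i:uint = 0; i < len; ++i) {
                item = items[i] as MyItem;
            }
            return item;
        }
    }
}
class MyItem{}
```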
#1 by Matan Uberstein on July 18th, 2011 ·
Wow! This is an eye opener for me! My notion was the same as yours e.g. Casting = Slower. This is clearly not the case and come to think of it, it makes sense, I guess…
Thanks for sharing, really love all your posts. :)
#2 by makc on July 18th, 2011 ·
what about item = MyItem (items[index]); ? I always thought “as” brings additional overhead of safety, converting non-MyItem-s to null-s, where MyItem (…) will throw an RTE.
#3 by skyboy on July 18th, 2011 ·
`as` is actually substantially faster than the function-style cast, if I recall earlier tests correctly.
#4 by jackson on July 18th, 2011 ·
You do recall correctly. I converted the test to use function call-style casting (as suggested by makc) and the overhead made the cast version a little slower than the no-cast version.
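To make the three forms being compared concrete, here is a small sketch of how they behave when the element is and is not a `MyItem`. The class name `CastFormsSketch` is made up for illustration, and the sketch shows semantics only, not timings.

```actionscript
package {
    import flash.display.Sprite;
    import flash.text.TextField;
    import flash.text.TextFieldAutoSize;

    // Hypothetical example class illustrating cast semantics only.
    public class CastFormsSketch extends Sprite {
        public function CastFormsSketch() {
            var out:TextField = new TextField();
            out.autoSize = TextFieldAutoSize.LEFT;
            addChild(out);

            var items:Array = [new MyItem(), "not a MyItem"];
            var item:MyItem;

            // All three forms succeed when the element really is a MyItem
            item = items[0];           // implicit coercion (the "no cast" version)
            item = items[0] as MyItem; // "as" cast
            item = MyItem(items[0]);   // function-style cast

            // On a type mismatch, "as" yields null instead of throwing
            item = items[1] as MyItem;
            out.appendText("as gave null: " + (item == null) + "\n");

            // The function-style cast throws a TypeError on a mismatch
            try {
                item = MyItem(items[1]);
            } catch (castError:TypeError) {
                out.appendText("function-style cast threw\n");
            }

            // So does the implicit coercion to a class-typed variable
            try {
                item = items[1];
            } catch (coerceError:TypeError) {
                out.appendText("implicit coercion threw\n");
            }
        }
    }
}
class MyItem{}
```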
#5 by makc on July 18th, 2011 ·
people are checking this on wonderfl, and overall seem to have an agreement with your results, but not me :) http://wonderfl.net/c/q0Fr <- here I have both "as" and f-style cast slower than untyped access (but "as" is still faster than f-style cast as you said).
#6 by makc on July 18th, 2011 ·
wait a minute, there was copy-paste error in that code, now when I fixed it, “as” turns out to be slowest one
#7 by makc on July 18th, 2011 ·
but for MovieClip-s “as” is clear winner http://wonderfl.net/c/vQnz seems to be in line with what focus reported earlier
#8 by intoxopox on August 12th, 2011 ·
With regard to the speed of as versus direct function-style casting, I have found that it really depends on the class. See my old post here: http://www.professorfripples.com/blog/?p=123
Those tests still stand up for me.
#9 by skyboy on July 18th, 2011 ·
The speed up is definitely coming from the JIT here, possibly knowing the size of what’s being extracted allows for lookups to be directly inlined when generating machine code instead of passing through the native index function.
What I’m curious about now, is whether or not saving `MyItem` to a variable will result in more of a speed gain, or remove it; as well as if an index into the multiname pool is faster (the class would need to be public and in its own package).

The latter may offer no change, or it may speed it up. The former could do anything.
One also has to wonder if a normal dot lookup to a non-dynamic property is sped up from this, or slowed down.
While probably slowed down, apparently we can never know for certain without testing.
#10 by jackson on July 18th, 2011 ·
I’m not sure what you mean by saving `MyItem` to a variable. I tried doing that and the result of the `as` cast becomes an `Object`, so I got this error:

Changing the type of `item` to `Object` fixed the warning, but the speed was much slower. It was more on-par with the no-cast version.

As for regular, non-dynamic dot access, I did try it but there was no speedup so I didn’t post the results.
#11 by skyboy on July 18th, 2011 ·
Something like:

    var mIRef:Class = MyItem;
    beforeTime = getTimer();
    for (i = 0; i < len; ++i) {
        item = itemsVector[i];
    }
    afterTime = getTimer();
    noCastTime = afterTime-beforeTime;
    beforeTime = getTimer();
    for (i = 0; i < len; ++i) {
        item = itemsVector[i] as mIRef;
    }
    afterTime = getTimer();
    castTime = afterTime-beforeTime;
    row("Vector", noCastTime, castTime);

If it still shows the same speedup, then this would also be a good test, since Flash can’t determine the type statically:

    var mIRef:Class = itemsVector[0].constructor;
    beforeTime = getTimer();
    for (i = 0; i < len; ++i) {
        item = itemsVector[i];
    }
    afterTime = getTimer();
    noCastTime = afterTime-beforeTime;
    beforeTime = getTimer();
    for (i = 0; i < len; ++i) {
        item = itemsVector[i] as mIRef;
    }
    afterTime = getTimer();
    castTime = afterTime-beforeTime;
    row("Vector", noCastTime, castTime);

While both are of limited use, generic code, such as what’s in a library, could benefit or suffer from any differences between this and an equivalent constant lookup.
#12 by skyboy on July 18th, 2011 ·
I just ran a test similar to the second one by implementing it in my fastSort class for vector sorts, pre-sorted:
So this advantage is definitely centered on load-time known types, and not on run-time determined types.
A bit of a shame, this could have pushed my sort method to match or beat sortOn on all systems; not just the older systems where the main advantage is the in-place sorting.
#13 by ben w on July 18th, 2011 ·
wow interesting find!!
I used to have casting in my old 3d engine but took time stripping it all out as a number of tests showed it was definitely slower!
This must be a newly added optimisation… do you have a way to test it across different flash versions? Also have you tried it with inbuilt classes i.e. Vector3D or something like that?
ben
#14 by jackson on July 18th, 2011 ·
I’ve tested back a few versions and found that it works in 10.2 and 10.1, but 10.0 wasn’t showing the speedup. Perhaps it was introduced in 10.1…
And yes, it does work for `Vector3D`. However, the speedup seems to be much reduced on top-level classes like `String`.
#15 by Mark on July 18th, 2011 ·
Wow, that’s a massive speed increase. I can kind of understand why this would be beneficial for an Array, as all of the elements are untyped, but I don’t understand the speed improvements for a Vector. Although, thinking about it, you don’t necessarily have to reference an element in a Vector to a variable of the same type (interfaces, subclasses, etc). So I guess there could be speed increases there too.
Regardless, it’s an awesome spot!
Mark
#16 by pleclech on July 18th, 2011 ·
I think you should test something like that :
http://pastebin.com/3uhrxCRs
You will be surprised but not on the right way.
Best,
Patrick.
#17 by jackson on July 18th, 2011 ·
Hey Patrick,
The code snippet in the link looks just like a regular loop over a `Vector`. Am I missing something?
#18 by pleclech on July 18th, 2011 ·
Well in your test you didn’t use the Class after the as , what i added is a simple usage of the class for example read of a property or calling a method, and the result are terrible.
#19 by jackson on July 18th, 2011 ·
Ah, I see now. I too see very poor results with the `as` cast when accessing a property or calling a method of `item`. However, there are other cases where I can insert some extra code into the loop and still see better performance with the `as` cast: `Math.sqrt(64)`, `item = item`, `item2 = item`, or `item = null`. As skyboy correctly points out, these results are due to something going on in the JIT. I’ve looked at the bytecode for the property access and method call versions and nothing stands out.

Anyone have any more ways that the item can be used once it’s been quickly accessed using the `as` cast?
#20 by focus on July 18th, 2011 ·
Tested with top-level class (like String, Number, Object) typed variables in the array.
Got the opposite results.
#21 by jackson on July 18th, 2011 ·
Interesting find. I’ve just tried `String` and `int` and there was either a minor (~5%) speedup or no speedup at all. Built-in classes that are not on the top-level (e.g. `Vector3D`) do get the speedup though.
#22 by Mark Knol on July 18th, 2011 ·
Cool to see there is room more performance boosts, even with some relative simple casting. Thanks for testing my blogpost + the link in your article :)
#23 by Jonas on July 18th, 2011 ·
Different results with FP 10.1 standalone & 11plugin (flex_sdk_4.5.1.21328)
WIN 10,1,52,14 StandAlone
Type,No Cast Time,Cast Time
Array,133,219
Vector,134,220
Dictionary,321,402
Object,333,433
Dynamic Class,338,427
WIN 11,0,0,58 PlugIn
Type,No Cast Time,Cast Time
Array,121,216
Vector,119,211
Dictionary,316,389
Object,292,396
Dynamic Class,299,397
#24 by jackson on July 18th, 2011 ·
Interesting. I am seeing the speedup on Windows with Flash 11 plugin, Flash 10.2 standalone, and Flash 10.1 standalone but not in Flash 10.0 standalone. All of this is with a 2.8 Ghz quad core Xeon W3530 on Windows 7. What system and OS did you run the tests on?
#25 by Jonas on July 18th, 2011 ·
Hello, I run the test with a 2,67 GHz Core i5 on Win 7 64 bit.
#26 by Daniel on July 18th, 2011 ·
I tried the test in 10.3 and 11.0 inside browser as well as 10 standalone and the results seem pretty consistent.
http://wonderfl.net/c/6Cin
I’ve also tried using int instead of MyItem, and there is little difference.
http://wonderfl.net/c/6kns
I did however notice in the standalone DEBUG version my results were more in line with Jonas’ results.
the project was compiled for release, not debug, but was ran inside the debug player.
#27 by jackson on July 18th, 2011 ·
I’m not sure what you’re reporting here. Which versions are you seeing the cast speedup with and which are you not?
Also, I try to never test under debug as end users do not run it.
#28 by Daniel on July 18th, 2011 ·
I’ll summarize :)
I see the speedup everywhere I tested except for standalone-debug.
The reason I’m posting this is to give some perspective to others that are seeing these reverse results so that they test results in non-debug version of flash player.
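One way to make this visible in posted results is to print the player details next to the timings. This is only a sketch: the class name `PlayerCheckSketch` is hypothetical, while `Capabilities.version`, `Capabilities.playerType`, and `Capabilities.isDebugger` are the standard `flash.system.Capabilities` properties.

```actionscript
package {
    import flash.display.Sprite;
    import flash.system.Capabilities;
    import flash.text.TextField;
    import flash.text.TextFieldAutoSize;

    // Hypothetical example class: reports which player ran the benchmark.
    public class PlayerCheckSketch extends Sprite {
        public function PlayerCheckSketch() {
            var out:TextField = new TextField();
            out.autoSize = TextFieldAutoSize.LEFT;
            addChild(out);

            // Platform and version string followed by the player type,
            // in the same style as the result headers posted in this thread
            out.appendText(Capabilities.version + " " + Capabilities.playerType + "\n");

            // Flag debug players so readers know the timings are not
            // comparable with release-player results
            if (Capabilities.isDebugger) {
                out.appendText("WARNING: debug player, timings are not representative\n");
            }
        }
    }
}
```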
#29 by jackson on July 18th, 2011 ·
Thanks for clarifying. :)
#30 by Lincoln on July 18th, 2011 ·
Great article. What are you using to view the compiled byte code? Are you doing it by hand or is there a tool you use?
#31 by jackson on July 18th, 2011 ·
I used the nemo440 AIR app. It hasn’t been updated in a while, but it still works. :)
#32 by Lincoln on July 19th, 2011 ·
Awesome, thanks a lot!
#33 by Lincoln on July 19th, 2011 ·
Hmm, it works on basic asset swfs published from Flash, but our main project swf has the following error.
Error: Error: Unknown format 0xb535743
It also seems to happen whenever I compile any swfs using Flash Builder 4.5. What are you using to compile your swfs and what sdk version are you using (if that matters)?
#34 by skyboy on July 19th, 2011 ·
Nemo440 only works for certain target versions. Up to 10.1 I think.
#35 by jackson on July 19th, 2011 ·
This is true. However, it’s easy to compile for an earlier version. I compiled the test app from the command line like so:
#36 by Lincoln on July 20th, 2011 ·
Gotcha. Thanks again guys!
#37 by Jonas on July 18th, 2011 ·
Oh… Same test on a non debug player (10.3) :
WIN 10,3,162,28 ActiveX
Type,No Cast Time,Cast Time
Array,103,49
Vector,96,45
Dictionary,263,209
Object,255,199
Dynamic Class,255,200
Vector + var access,104,159
Vector + method call,139,205
I added the pleclech “Vector + var access” and “Vector + method call” tests … http://pastebin.com/Lh2R1UNY
In this case it seems that cast is less efficient.
#38 by skyboy on July 18th, 2011 ·
While thinking about this, I realized the JIT might actually be removing operations, treating them as dead code. Jonas’s tests seem to verify this.
A few simple and very quick operations seem to negate the DCE because it’s not complex enough.

    var t:Boolean = true;
    for (i = 0; i < len; ++i) {
        t = Boolean(int(t) & int(itemsVector[i] as MyItem != null));
    }
    log(t);
    t = true;
    for (i = 0; i < len; ++i) {
        t = Boolean(int(t) & int(itemsVector[i] != null));
    }
    log(t);

That code should be able to avoid any DCE, but the operations are all very fast and O(1) so do not impact the results in any significant way.
Replacing short circuited boolean operators (&&, ||) with bitwise operations and int/boolean conversions where applicable (e.g., where the operations are not more expensive than the jumps themselves) could be an article itself, since it provides significant performance improvements.
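As a rough illustration of the replacement being described, here is a sketch that puts the two forms side by side. The class name `BitwiseAndSketch` is made up, and the sketch makes no performance claim of its own. It only shows that the two forms agree on the result, and that the bitwise form always evaluates both operands.

```actionscript
package {
    import flash.display.Sprite;
    import flash.text.TextField;
    import flash.text.TextFieldAutoSize;

    // Hypothetical example class comparing the two forms for equivalence.
    public class BitwiseAndSketch extends Sprite {
        public function BitwiseAndSketch() {
            var out:TextField = new TextField();
            out.autoSize = TextFieldAutoSize.LEFT;
            addChild(out);

            var a:Boolean = true;
            var b:Boolean = false;

            // Short-circuit form: b is only evaluated when a is true
            var viaLogical:Boolean = a && b;

            // Bitwise form in the style of the snippet above: both operands
            // are always evaluated, converted to 0 or 1, then ANDed together
            var viaBitwise:Boolean = Boolean(int(a) & int(b));

            out.appendText("same result: " + (viaLogical == viaBitwise) + "\n");
        }
    }
}
```

Because the bitwise form never skips its right-hand operand, it is only a drop-in replacement when that operand has no side effects and is cheap to evaluate.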
#39 by jackson on July 19th, 2011 ·
I don’t think it’s DCE because that would result in an empty loop. Commenting out the loop body and running on the same machine as in the article (2.4 GHz Intel Core i5, Mac OS X 10.6), I still get about 25ms for the empty loop and the same 68ms for the `as` cast version.

As for your test code, it is indeed a way to take advantage of the `as` cast approach. I’m seeing ~244ms without the cast and ~225ms with the cast on the same machine. I’ll have to write that article on using bitwise operators instead of logical operators. :)
#40 by Tommy Reilly on July 19th, 2011 ·
https://bugzilla.mozilla.org/show_bug.cgi?id=672490
#41 by Nicolas on July 20th, 2011 ·
FYI, haXe (http://haxe.org) already does this optimization for you automatically.
It also does fast array-index lookup, for instance in AS3 you have to convert arr[x+1] into arr[int(x+1)] in order to get decent speed.
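For readers who have not seen the index conversion mentioned here, the following sketch shows the two spellings. The class name `IntIndexSketch` is hypothetical, and the sketch only confirms that both forms read the same element. Any speed difference is the commenter’s claim and is not measured here.

```actionscript
package {
    import flash.display.Sprite;
    import flash.text.TextField;
    import flash.text.TextFieldAutoSize;

    // Hypothetical example class showing the two index spellings.
    public class IntIndexSketch extends Sprite {
        public function IntIndexSketch() {
            var out:TextField = new TextField();
            out.autoSize = TextFieldAutoSize.LEFT;
            addChild(out);

            var arr:Array = [10, 20, 30];
            var x:int = 1;

            // Plain arithmetic inside the index expression
            var plain:int = arr[x + 1];

            // Same lookup with the index expression wrapped in int(...)
            var wrapped:int = arr[int(x + 1)];

            out.appendText("same element: " + (plain == wrapped) + "\n");
        }
    }
}
```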
#42 by jackson on July 20th, 2011 ·
For this haXe code:

    beforeTime = Lib.getTimer();
    i = 0;
    while (i < len) {
        item = itemsArray[i];
        ++i;
    }
    afterTime = Lib.getTimer();

I get this bytecode:

    00025) + 0:0 findpropstrict <q>[public]flash.utils::getTimer
    00026) + 1:0 callproperty <q>[public]flash.utils::getTimer, 0 params
    00027) + 1:0 convert_i
    00028) + 1:0 setlocal_1
    00029) + 0:0 pushbyte 0
    00030) + 1:0 convert_u
    00031) + 1:0 setlocal r6
    00032) + 0:0 jump ->44
    00033) + 0:0 label
    00034) + 0:0 getlocal r5
    00035) + 1:0 getlocal r6
    00036) + 2:0 getproperty <l,multi>{[public]""}
    00037) + 1:0 coerce <q>[public]::MyItem
    00038) + 1:0 coerce <q>[public]::MyItem
    00039) + 1:0 setlocal_3
    00040) + 0:0 getlocal r6
    00041) + 1:0 increment
    00042) + 1:0 convert_u
    00043) + 1:0 setlocal r6
    00044) + 0:0 getlocal r6
    00045) + 1:0 getlocal r4
    00046) + 2:0 iflt ->33
    00047) + 0:0 findpropstrict <q>[public]flash.utils::getTimer
    00048) + 1:0 callproperty <q>[public]flash.utils::getTimer, 0 params

I don’t see any `astypelate` instruction in there, which seems to indicate that haXe isn’t doing this exact optimization. I do see a double `coerce` though, so perhaps that’s meant to do the same thing.

As for performance, on my 2.8 GHz Xeon I’m getting about 94ms for the “no-cast” version, 55ms for the “as cast” version, and 94ms for the haXe version. This would also seem to indicate that the optimization hasn’t been done, since the haXe version is performing on-par with the unoptimized “no-cast” version.
#43 by Ossi Rönnberg on July 20th, 2011 ·
Quite strange
OSX 10.6 Core 2 Duo Chrome running 10.3.181.36
Type,No Cast Time,Cast Time
Array,133,69
Vector,136,61
Dictionary,416,303
Object,415,292
Dynamic Class,413,297
Totally opposite results
OSX 10.6 Core 2 Duo Safari running 11.0.1.60
Type,No Cast Time,Cast Time
Array,229,388
Vector,223,427
Dictionary,543,711
Object,657,795
Dynamic Class,598,704
#44 by jackson on July 20th, 2011 ·
It seems like you’re using the debug version of Flash 11 and the release version of Flash 10.
#45 by Deril on July 29th, 2011 ·
This is interesting find, it’s curious how it is slower in debug version… but not release.
My best guess is that then op code is interpreted, virtual machine does not need to do some work on finding/setting objects type.., that is heavy if you don’t do this operator ‘as’ casting…
… why it’s not automatically happen with Vectors is a mystery…
#46 by Bram on August 3rd, 2011 ·
Mmm – 10 x slower on my machine ( MBP 5.2, 2.66 Intel Core Duo, 4GB memory ) after copy pasting your code.
Flash Player 10,3,181,14 release build, Safari 5.1, OS X 10.6.8
Type No Cast Time Cast Time
Array 1196 1356
Vector 1190 1322
Dictionary 1428 1573
Object 1412 1603
Dynamic Class 1399 1556
#47 by mani on September 1st, 2011 ·
Type,No Cast Time,Cast Time
Array,1312,1481
Vector,1283,1427
Dictionary,1585,1724
Object,1553,1722
Dynamic Class,1591,1730
Flash Player 10,3,181,14 debug, Google Chrome, OS X 10.6.8
-_-
#48 by jackson on September 1st, 2011 ·
I try to never test with the debug player. Not only are the results much slower than with the release player, but sometimes code that is faster in release will be slower in debug.
#49 by Javier on February 9th, 2012 ·
First, sorry about my English.
Jackson, this is very interesting and it´s going to help me a lot in some process consumer functions.
One thing I notice, is that if I do the test with FP 10.1 debugger it get like double time casting but with FP 10.1 not debugger it get same result like yours.
Another thing I don’t understand is way Vector casting is faster, if Vector is already casted.
Thanks for all this tips.
Have fun.
#50 by jackson on February 9th, 2012 ·
No worries about your English; my Spanish is even worse. :)
The debugger player is notoriously slow and inconsistent with the performance of the release player. For example, A may run faster than B in the debugger and then slower than B in the release player, even if both run much faster. Usually this is because the debugger punishes AS3 code heavily and native code (i.e. most of the Flash API) much less or not at all. For these reasons and that virtually no end users will ever run the debugger player, I’d recommend only performance testing with release players. This is the policy I’ve always had in this site’s articles.
As to why `Vector` casting is faster, that is why the article has the word “amazing” in its title. You’re correct: it shouldn’t be faster at all as the `Vector` is already typed. Actually, that’s the whole purpose of a `Vector`. The reason for the speedup remains a mystery to me.
#51 by Javier on February 9th, 2012 ·
Thanks for the response :)
#52 by caboosetp on March 19th, 2013 ·
Environment: FP 11.6.602.180 (Not Debug), Apache Flex 4.9 with Air 3.7, Intel I7-2600k @ 3.4ghz
I was happy when I found this article as I thought I could get some code running faster. Since I’m using a different environment, I first ran your test code to make sure it worked the same and got similar results, aside from vector which seemed the same. This made sense since Vectors are type cast anyways and the newer libraries are supposed to be optimized.
Type,No Cast Time,Cast Time
Array,39,17
Vector,17,16
Dictionary,194,167
Object,184,161
Dynamic Class,184,165
Then I tried using the typecast in a Shellsort algorithm for the Number type and it exploded. Namely changing `t = data[i];` to `t = data[i] as Number;` and `data[j] = data[ji];` to `data[j] = data[ji] as Number;` and came out with
AS3 Shell Sort: 3452
AS3 Shell Sort on ordered elements: 958
AS3 Shell Sort Typecast: 13013
AS3 Shell Sort Typecast on ordered elements: 5609
Which is almost 4 times as slow. This blew my mind, so I decided to run the same benchmark you had, but using Number instead of a user defined class.

    // .. rest of benchmark unchanged
    var num:Number;
    var itemsVectorN:Vector.<Number> = new Vector.<Number>(len);
    for (i = 0; i < len; ++i) {
        itemsVectorN[i] = 3.14;
    }
    beforeTime = getTimer();
    for (i = 0; i < len; ++i) {
        num = itemsVectorN[i];
    }
    afterTime = getTimer();
    noCastTime = afterTime-beforeTime;
    beforeTime = getTimer();
    for (i = 0; i < len; ++i) {
        num = itemsVectorN[i] as Number;
    }
    afterTime = getTimer();
    castTime = afterTime-beforeTime;
    row("VectorN", noCastTime, castTime);

and got the result
VectorN,16,237

as well as with nearly identical code for Array and got

ArrayN,27,87

I know someone made a post earlier about top level elements being slower, and you had found int to have a similar speedup rather than a slowdown, so I tried out int just to be sure and found it to be much slower with a result of

VectorI,17,135

I’m wondering if I made a mistake in the code, or if it’s the new libraries. It seems the casting is still faster for user defined classes but much slower for primitive data types. I’m not currently using the ASC 2.0 compiler though, but I know it does a very good job at optimizing. I’m curious if, with the newer libraries and possibly ASC 2.0, you still get primitive data types to be faster and if so why I don’t see the same results.
#53 by jackson on March 19th, 2013 ·
I happen to still have the original SWF (which I just uploaded for your testing) so I ran it in my new testing environment:
And got these results:
So with upgrades from Flash Player 10.3 to 11.6, OS X 10.6 to 10.8, and a Core i5 to a Core i7 it seems as though I’m still getting the same results on all but `Vector`, albeit faster due to the improvement in hardware. You’re seeing no improvement in `Vector` either, so our results match there. The rest of what you’re reporting in regards to `Number` and `int` are showing huge performance losses where I found at the time a negligible change in performance compared to not casting. So I changed the test app to use `Number` and got these results:

ASC 1.0:

ASC 2.0:

And with `int` I got these results:

ASC 1.0:

ASC 2.0:
Again I concur with your findings. This seems to be a huge change that has occurred since the article was originally written. I have a hard time believing that it’s due to my upgraded hardware, probably doesn’t have anything to do with upgrading my OS, and ASC 2.0 isn’t making much of a difference beyond the normal “noise” in testing. So it seems that somewhere between Flash Player 10.3 and 11.6 there was a change that dramatically reduced the performance of casting the result of a lookup to a top-level class. That’s a shame and possibly even a bug.
Lastly, here are my results of the original test (`MyItem`, not top-level classes) re-compiled with ASC 2.0:

`Vector` is now also worse off with the cast, but at least the optimization is still present with `Array`.
#54 by skyboy on March 20th, 2013 ·
The dismal performance of Vector in the first test is probably because of the target version of flash in the SWF (and may impact casting for Number/int: different target versions occasionally use different code paths for compatibility’s sake).
The effect of Number/int performing worse with the cast may be due to changes in the VM, causing a conversion to a boxed type then back? The same penalty may not exist for generic top-level classes; if it does, then the penalty is stemming from the way the core classes are included with the VM (rather than boxing/unboxing), effectively containing them in a separate SWF and limiting how much can be optimized due to the different context.
#55 by Jeff Spicer on September 11th, 2013 ·
i’m no expert, but just thinking about how type casting might be done behind the scenes might explain why it’s faster to type cast items in a loop.
if the type is not known, then perhaps Flash Player must iterate the unknown object, looking for properties and methods, and then attempting to match them to the defined type.
But perhaps using “as” instead iterates the targeted type, matching it to the item’s vars/methods.
a test might be to have MyItem be a complex class, but the item[i] is of a type extended from MyItem and has extra params in it. or perhaps MyItem is an interface or something. Maybe some tests like that will give a clue as to how type casting is done behind the scenes.
If Flash Player does use different forms of type casting, then it would follow that the more base types like String and Number and the optimized Vector class would not benefit from type casting because the type is already known, and the extra step would slow it down.