Faster Logic With Bitwise Operators
Logical operators are necessary in every app, so it’s unfortunate that they are so slow in AS3. That’s why I was happy to see a potential alternative in a recent comment by Skyboy. Today’s article shows you how to do logic faster by avoiding logical operators (e.g. &&
) and using their bitwise (e.g. &
) operator counterparts instead.
First thing’s first, this is going to lead to some funky-looking code:
// instead of this... result = operand1 && operand2; // do this... result = Boolean(int(operand1) & int(operand2)); // if operand1 and operand2 are not Booleans, do this... result = Boolean(int(operand1 != null) & int(operand2 != null));
So you’re going to sacrifice some readability and do some extra typing if you want to use this method to improve your app’s speed. But improving your app’s speed is what this site is all about, so let’s do some performance testing to see if this funky code is worth it.
The following performance test simply does a bunch of statements like the above with Boolean
variables equaling false
and true
and Sprite
variables equaling null
and not-null
. For each of these variables, the app does 1, 5, and 10 &&
/&
operations.
package { import flash.display.*; import flash.utils.*; import flash.text.*; public class BitwiseLogic extends Sprite { private var __logger:TextField = new TextField(); private function row(...cols): void { __logger.appendText(cols.join(",")+"\n"); } public function BitwiseLogic() { var logger:TextField = __logger; logger.autoSize = TextFieldAutoSize.LEFT; addChild(logger); var beforeTime:int; var afterTime:int; var logicalTime:int; const REPS:int = 100000000; var i:int; var FALSE:Boolean = false; var TRUE:Boolean = true; var NULL:Sprite; var NOTNULL:Sprite = this; var bool:Boolean; row("Test", "&& Time", "& Time"); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { bool = FALSE && FALSE; } afterTime = getTimer(); logicalTime = afterTime - beforeTime; beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { bool = Boolean(int(FALSE) & int(FALSE)); } afterTime = getTimer(); row("1-false", logicalTime, (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { bool = FALSE && FALSE && FALSE && FALSE && FALSE && FALSE; } afterTime = getTimer(); logicalTime = afterTime - beforeTime; beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { bool = Boolean( int(FALSE) & int(FALSE) & int(FALSE) & int(FALSE) & int(FALSE) & int(FALSE) ); } afterTime = getTimer(); row("5-false", logicalTime, (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { bool = FALSE && FALSE && FALSE && FALSE && FALSE && FALSE && FALSE && FALSE && FALSE && FALSE && FALSE; } afterTime = getTimer(); logicalTime = afterTime - beforeTime; beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { bool = Boolean( int(FALSE) & int(FALSE) & int(FALSE) & int(FALSE) & int(FALSE) & int(FALSE) & int(FALSE) & int(FALSE) & int(FALSE) & int(FALSE) & int(FALSE) ); } afterTime = getTimer(); row("10-false", logicalTime, (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { bool = TRUE && TRUE; } afterTime = getTimer(); logicalTime = afterTime - beforeTime; beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { bool = Boolean(int(TRUE) & int(TRUE)); } afterTime = getTimer(); row("1-true", logicalTime, (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { bool = TRUE && TRUE && TRUE && TRUE && TRUE && TRUE; } afterTime = getTimer(); logicalTime = afterTime - beforeTime; beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { bool = Boolean( int(TRUE) & int(TRUE) & int(TRUE) & int(TRUE) & int(TRUE) & int(TRUE) ); } afterTime = getTimer(); row("5-true", logicalTime, (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { bool = TRUE && TRUE && TRUE && TRUE && TRUE && TRUE && TRUE && TRUE && TRUE && TRUE && TRUE; } afterTime = getTimer(); logicalTime = afterTime - beforeTime; beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { bool = Boolean( int(TRUE) & int(TRUE) & int(TRUE) & int(TRUE) & int(TRUE) & int(TRUE) & int(TRUE) & int(TRUE) & int(TRUE) & int(TRUE) & int(TRUE) ); } afterTime = getTimer(); row("10-true", logicalTime, (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { bool = NULL && NULL; } afterTime = getTimer(); logicalTime = afterTime - beforeTime; beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { bool = Boolean(int(NULL!=null) & int(NULL!=null)); } afterTime = getTimer(); row("1-null", logicalTime, (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { bool = NULL && NULL && NULL && NULL && NULL && NULL; } afterTime = getTimer(); logicalTime = afterTime - beforeTime; beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { bool = Boolean( int(NULL!=null) & int(NULL!=null) & int(NULL!=null) & int(NULL!=null) & int(NULL!=null) & int(NULL!=null) ); } afterTime = getTimer(); row("5-null", logicalTime, (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { bool = NULL && NULL && NULL && NULL && NULL && NULL && NULL && NULL && NULL && NULL && NULL; } afterTime = getTimer(); logicalTime = afterTime - beforeTime; beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { bool = Boolean( int(NULL!=null) & int(NULL!=null) & int(NULL!=null) & int(NULL!=null) & int(NULL!=null) & int(NULL!=null) & int(NULL!=null) & int(NULL!=null) & int(NULL!=null) & int(NULL!=null) & int(NULL!=null) ); } afterTime = getTimer(); row("10-null", logicalTime, (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { bool = NOTNULL && NOTNULL; } afterTime = getTimer(); logicalTime = afterTime - beforeTime; beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { bool = Boolean(int(NOTNULL!=null) & int(NOTNULL!=null)); } afterTime = getTimer(); row("1-not null", logicalTime, (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { bool = NOTNULL && NOTNULL && NOTNULL && NOTNULL && NOTNULL && NOTNULL; } afterTime = getTimer(); logicalTime = afterTime - beforeTime; beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { bool = Boolean( int(NOTNULL!=null) & int(NOTNULL!=null) & int(NOTNULL!=null) & int(NOTNULL!=null) & int(NOTNULL!=null) & int(NOTNULL!=null) ); } afterTime = getTimer(); row("5-not null", logicalTime, (afterTime-beforeTime)); beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { bool = NOTNULL && NOTNULL && NOTNULL && NOTNULL && NOTNULL && NOTNULL && NOTNULL && NOTNULL && NOTNULL && NOTNULL && NOTNULL; } afterTime = getTimer(); logicalTime = afterTime - beforeTime; beforeTime = getTimer(); for (i = 0; i < REPS; ++i) { bool = Boolean( int(NOTNULL!=null) & int(NOTNULL!=null) & int(NOTNULL!=null) & int(NOTNULL!=null) & int(NOTNULL!=null) & int(NOTNULL!=null) & int(NOTNULL!=null) & int(NOTNULL!=null) & int(NOTNULL!=null) & int(NOTNULL!=null) & int(NOTNULL!=null) ); } afterTime = getTimer(); row("10-not null", logicalTime, (afterTime-beforeTime)); } } }
I ran the performance test with the following environment:
- Flex SDK (MXMLC) 4.5.1.21328, compiling in release mode (no debugging or verbose stack traces)
- Release version of Flash Player 10.3.181.35
- 2.4 Ghz Intel Core i5
- Mac OS X 10.7.0
And got these results:
Test | && Time | & Time |
---|---|---|
1-false | 241 | 237 |
5-false | 304 | 236 |
10-false | 1018 | 231 |
1-true | 238 | 243 |
5-true | 304 | 248 |
10-true | 506 | 234 |
1-null | 240 | 245 |
5-null | 596 | 227 |
10-null | 1034 | 241 |
1-not null | 242 | 232 |
5-not null | 862 | 227 |
10-not null | 1431 | 242 |
As the graphs make exceedingly clear, the bitwise versions start out just as fast as their logical counterparts but then remain totally flat as the number of operands rises. This is in stark contrast to the logical operator versions that scale more-or-less linearly as the number of operands increases. So at least there’s no need to use this bitwise trick with just one operator, but it really helps out as you pile on more and more operators.
Why do we see the above performance? To find out, I made some simple test functions to inspect their bytecode:
private function bitwiseBoolean(val:Boolean, bool:Boolean): void { bool = Boolean(int(val) & int(val)); } private function logicalBoolean(val:Boolean, bool:Boolean): void { bool = val && val; }
Here’s the bytecode you get for these: (with annotations by me)
// bitwiseLogical getlocal1 // get val (first operand) convert_b // convert val to a Boolean dup // duplicate val iffalse L1 // if Boolean(val) is false, go to block L1 pop // pop val off stack getlocal1 // get val (second operand) convert_b // convert val to a Boolean L1: convert_b // convert val to a Boolean (again) setlocal2 // set Boolean(val) to the bool parameter // bitwiseBoolean findpropstrict Boolean // get the top-level Boolean function findpropstrict int // get the top-level int function getlocal1 // get val (first operand) callproperty int (1) // call int(val) [first operand] findpropstrict int // get the top-level int function getlocal1 // get val (second operand) callproperty int (1) // call int(val) [second operand] bitand // int(val) & int(val) callproperty Boolean (1) // call Boolean(int(val) & int(val)) convert_b // convert the result to a Boolean setlocal2 // set the result to the bool parameter
The key difference is that the bitwise operator version does not jump to any blocks. Essentially, the logical operator version is doing an if
check for every operator but the bitwise operator version is simply doing bitwise arithmetic.
In conclusion, if you’re using more than one logical operator per statement in performance-critical code (e.g. looping over a large game map or 3D meshes), you may want to switch some of these statements to using bitwise operators. You could see a 7x performance increase!
Any other tips for speeding up logic in AS3? Let’s discuss it in the comments!
#1 by skyboy on August 1st, 2011 ·
Interestingly, it looks like there is some level of optimization going on for Booleans in the conditionals, for up to 5 variables, and looking at the interpreter’s source, there’s a clear reason why: up to 5 variables can be retrieved at once, but I’m shocked that this optimization isn’t applied as soon as you have a run of more than 5. I thought that trend would have been applied as a sort of modulo for more than 5.
However, the null/notnull tests should show better performance if you include Boolean comparisons (i.e,
!= null
). There are what looks like some native code optimizations for Booleans.It would probably also be good to note that the comparisons don’t necessarily have to be all in one go, reducing the number of &/| in total in a loop/oft. called function will show similar speed ups, potentially greater than a lookup table, particularly for something like this:
Boolean(int(i == 0x2D) | int(i == 0x2E) | (int(i > 0x2F) & int(i < 0x3A)) | int(i == 0x2B));
.#2 by jackson on August 1st, 2011 ·
I too am pretty surprised at the “bend” in the graph. Why not keep applying the optimization for the next five operands?
I modified the performance test from the article to do 10x the iterations for the 1-operator version and 2x the iterations for the 5-operator version, so all tests are doing the same total number of operators. The performance advantages are still there, but watered down (perhaps due to overhead from loops, etc.). Here’s what I get on a 2.8 Ghz Xeon W3530 running Windows 7: (with 10x fewer iterations)
#3 by skyboy on August 1st, 2011 ·
I upped it so each test has the same number of operations per loop, and found this:
I also moved it off into its own method that’s run on a click, resetting the field to “” on each click.
From this, it looks like the code in constructors (potentially limited to static class initializers ($cinit) and the document class initializer) is not optimized, and that the multi-varaible optimization is applied in other functions (5/10 tests being so similar) and that Booleans are handled faster than non-Booleans. int/uint may be handled similarly fast, based on the operator speed test from a while back.
Even an explicit Boolean conversion / ! operator would be faster for non-Booleans from what I’m seeing, with some special-case optimization for single tests. Bitops being a flat line across all tests due to the lack of branching. Going on this, it would be better to maintain a sorted Array / Vector with null objects / exceptions moved out of the way of the common case, and handled in a separate loop so that additional branch isn’t in the way, and if there are no branches, the loop can be unrolled (most often, probably no more than twice; maybe up to 8) to reap sometimes massive benefits. Especially on older hardware.
#4 by skyboy on August 1st, 2011 ·
I just added != null tests to the null/not null tests, and it had zero effect. Odd. Maybe it’s only for Boolean variables..?
And reducing the number of tests back to 1 / 5 / 10 and moving it into / out of the constructor showed no difference on my machine. A straight line from 225, 400, 560. Where did your arch come from? A blip in the test maybe?
Using plug-in FP 10.3 and compiling with mxmlc 4.5.1.
#5 by jackson on August 1st, 2011 ·
I just tried moving the 1/5/10 version out to a mouse click handler and get the same results- arch and all. I then made the mouse click handler version do the 10/10/10 test as I described above and got the same results as described above. All of this is on the same Core i5 Mac as in the article.
I’ve written a while ago about the mythical constructor slowdown and found nothing. Still, I keep it in the back of my head and would really like to see evidence that there is a real slowdown for some kinds of code, even though it may mean going back to fix dozens of my articles…
#6 by skyboy on August 1st, 2011 ·
Aaaand now I’m confused.
DCE off in mm.cfg (AS3DCE=0):
DCE on in mm.cfg but bool is included in each step (e.g., FALSE && bool / NOTNULL && bool)
I expect the Boolean results in the second test, but I expect the bitop results in the first. I don’t know what flash is doing, or if it’s even obeying mm.cfg anymore.
#7 by jackson on August 1st, 2011 ·
The “DCE off” tests look like mine except the 1-operator versions are a lot quicker with logical than bitwise operators.
The “DCE on” tests seem to swap bitwise operator performance between
Boolean
andSprite
tests and increase the time taken by 2-4x. Logical operator performance looks unchanged.It’s strange that neither “DCE on” nor “DCE off” give you the same results that you got above. Unless, of course, this test has been modified from above.
#8 by skyboy on August 1st, 2011 ·
What it looks like is happening to me is that DCE is applied weirdly (it would nice to have the final output) and the first result, along with yours are subject to it, making them not that useful, and additionally, the explicit Boolean cast isn’t optimized away, but looked it in the test you did a while back due to DCE removing it, and the related code. Reducing it to get/pops maybe? I don’t really understand what’s going on with it.
However, when I remove the Boolean cast and make a special int variable just for the int tests, I get this:
The bitop test still seems off to me (it doesn’t increase over time? is getlocal really that fast?) but the tests now align more closely to my expectations.
The modified test is here: http://pastebin.com/nuscwMTT
#9 by jackson on August 2nd, 2011 ·
getlocal
should be hitting CPU cache and therefore be really fast. Still, I too think the results are weird.#10 by as3isolib on August 1st, 2011 ·
Dangit Jackson, I gotta tell you, you are the reason I have refactored the as3isolib.v2 core render process about half a dozen times haha!!! Quit writing these articles or I will never get anything done.
J/k of course but wow, that is an impressive find. Looks like i have another test to run this evening, performance tweaking the heck out of the v2 core. Thanks for this excellent post.
#11 by jackson on August 1st, 2011 ·
Glad to hear you’re finding all the articles so useful. I can’t wait to see how far you can take v2! :)
#12 by as3isolib on August 7th, 2011 ·
So i gave this a go and found that my original code was more performant. Not sure that I am doing it right, but I basically have opt out conditionals where I check to see if something is needing rendering, if not based a series of 2-3 “compound” conditionals, then continue thru the render loop.
#13 by as3isolib on August 7th, 2011 ·
oh geez…. sorry about the formatting.
#14 by jackson on August 7th, 2011 ·
The first one looks right, but the latter two are using
&
when they should be using|
. For an example, see Skyboy’s comment above.#15 by as3isolib on August 8th, 2011 ·
ah….. yeah that might help. Thanks for pointing out. I will plug that fix in and give it a go, then report back to see if the results actually increase.
#16 by skyboy on August 8th, 2011 ·
The Boolean cast is unnecessary in an if statement and is costing more performance than it needs to (whether minor or significant).
#17 by as3isolib on August 10th, 2011 ·
I made the necessary adjustments, and I do see that now the bitwise operations are as performant as the compound conditionals. I do not see any improvement over the compounding conditionals however. This is probably because I am processing 1000+ (but not much more) objects per render pass and this particular logic is nominal in terms of performance degradation.
If and when I ever get around to implementing molehill APIs, I may be able to see a difference. This is SO useful nonetheless. Thanks Jackson and Skyboy.
#18 by jackson on August 10th, 2011 ·
Glad to hear you’re finding it useful. :-D
#19 by lyua on August 3rd, 2011 ·
cool investigation! and I would like to note, that ‘&&’ is more safe fore case when you construct conditions paying to account that second operand may not being verified, like
(var != null && var.property == value )
; ‘&’ operator calculates all conditions independently of result of part of condition, isn’t it? so, this one(Boolean(int(var != null) & int(var.property == value))
bring run-time error when ‘var’ comes as a ‘null’.#20 by jackson on August 3rd, 2011 ·
You are correct-
&
does not do “short circuiting” and could lead to problems if you think it will, such as thenull
case you point out. This technique is only applicable when you don’t need short circuiting.#21 by aconma on May 19th, 2013 ·
First, sorry for my poor English.
Please note that the result abow is only true if you run this code in Flash Player only (For example: browser)
If you complie and run in AIR, the result is:
Test,&& Time,& Time
1-false,313,1104
5-false,757,2207
10-false,1216,3716
1-true,312,1155
5-true,315,2263
10-true,665,3713
1-null,312,1143
5-null,355,2172
10-null,1249,3730
1-not null,309,1067
5-not null,937,2255
10-not null,2107,3724
(I editted comment for spelling reason)
#22 by aconma on May 19th, 2013 ·
I mean: above
#23 by jackson on May 19th, 2013 ·
Interesting results. Can you share more information about the compile and runtime environment you tested with? For example, was this AIR for Windows, Mac, iOS, or Android? Which device and with what specs? What compiler and with what settings?
#24 by aconma on May 19th, 2013 ·
Of course. The environment of the test above:
– AIR SDK 3.4 for desktop (extended desktop only), compiling in release mode (no debugging or verbose stack traces)
– Release version of Flash Player 11.4
– 2.3 Ghz Intel Core i3
– Windows 7 Ultimate 32Bit (6.1 Build 7600)
And other test, run in Flash Player 11.4 only (no debugging and tracing)
Test,&& Time,& Time
1-false,300,277
5-false,613,277
10-false,1051,292
1-true,313,266
5-true,395,276
10-true,690,280
1-null,281,277
5-null,614,290
10-null,1100,276
1-not null,309,276
5-not null,910,275
10-not null,2080,277
#25 by vishal on January 11th, 2015 ·
The most simple way to check for n’s divisibility by 9 is to do n%9.
Another method is to sum the digits of n. If sum of digits is multiple of 9, then n is multiple of 9.
The above methods are not bitwise operators based methods and require use of ‘%’ and ‘/’. The bitwise operators are generally faster than modulo and division operators. Following is a bitwise operator based method to check divisibility by 9.
you can check this link also for example :-
http://www.firmcodes.com/check-number-multiple-9-using-bitwise-operators/