Is pre-increment really faster than post increment? Part 1

If you are a C++ programmer, I’ll bet at some point in time, maybe during a code review, someone told you “hey you should use a pre-increment there because post-increments are slower”.

I had heard this some years ago in the context of for loops with the caveat of “they might have fixed it by now [the compiler], but why take the chance”. Fair enough I thought, and I started using pre-increment in for loops, but kept using post increment in other places because i felt it felt more natural. Also i don’t write code that makes it actually matter. I try to be explicit instead of clever to prevent bugs and such. For instance, lots of C++ programmers with years of experience probably wouldn’t be 100% sure about what order things happen in for this small code sample: *++p = 3;

To be more explicit, you could increment p on one line and then set *p to 3 on the next. That’s easier to get your head around because it’s more explicit. The optimizer can handle the details of combining the code, and as we will see a little bit later, code generation without an optimizer seems to do just fine too!

Ayways, I heard this again today during a code review. Someone saw i had a piece of code where i was incrementing an unsigned int like this (not as part of a for loop, just inside of an if statement): numEnables++;

They said “you should change that to a pre-increment because it’s faster”. Well, I decided I should have a look and see if that was true and that’s where today’s journey begins (:

Disclaimer: All my findings are based on what I’ve seen in MSVC 2010 and 2012 using default settings for debug and release building a win32 console application. When in doubt, check your own disassembly and make sure your code is doing what you think it is. I wish I had years ago so I would have known the truth back then.

Will It Blend?

Lets check out the assembly generated in release of a simple program to see what assembly it makes

int main(void)
{
int someNumber = 3;
someNumber++;
return 0;
}

Here is the resulting code from the disassembly window. The assembly code of our program is in the blue box:

PS you can find the diasassembly window by setting a breakpoint on someNumber++, running the program and when it hits the breapoint, going under the “Debug” menu, selecting “Windows” and then selecting “Disassembly”.

prepost1

Ok so what the heck happened to our program? All it’s doing is xoring the eax register against itself (to set it to zero) and then it’s calling “ret”. That is our “return 0” statement. Everything else got optimized away! The optimizer realized that nothing meaningful changes if it doesn’t calculate someNumber, so it decides not to.

Let’s try a printf to print out our number. That way the optimizer CAN’T optimize our code away.

#include
#include

int main(void)
{
int someNumber = 3;
someNumber++;
printf("someNumber = %i\r\n", someNumber);
return 0;
}

Now here’s the disassembly:

prepost2

Ok we are doing better i guess. we see a push, a call, an add and then our return 0 code (xor eax, eax and ret).

I included the watch window so that you could see the values that the code is working with. The push pushes our printf format string onto the stack, and then it calls printf. Then it has the line “add sp, 8”. What that does is move the stack pointer up 8 bytes. Parameters are passed on the stack, so that line is there to undo the 4 byte push of the printf format string, and also the 4 byte push of someNumber. It’s just cleaning up the stack.

But where the heck was someNumber pushed onto the stack? I actually have no idea… do you know? If you know, please post a comment, I’m really curious how that works LOL.

EDIT: Thanks to Christophe for unraveling this, he points out that the assembly window was just not showing all the instructions:

For some reason, your MSVC debug window screencap shows the actual main() asm debug starting at 0x10b1002, not 0x10b1000. If you look at the line above at 0x10b0fff, it shows some “add byte ptr [edx+4], ch”. Which is because MSVC incorrectly deduced that there were some code starting at 0x10b0fff and so it messes up the debug print out of the actual code at 0x10b1000 (which would be some “push 4” on the stack, which is the incremented “someNumber” var).

We are not quite there. The optimizer is doing some funky stuff, and I think it’s because the optimizer knows that someNumber is the number “4”. It doesnt have to do any run time calculations so it isn’t

To make it so the optimizer can’t optimizer our number away like that, lets have the user input someNumber so the compiler has no idea what the right value is until run time.

#include
#include

int main(void)
{
int someNumber = 0;
printf("Please enter a number.\r\n");
scanf("%i",&someNumber);
someNumber++;
printf("someNumber = %i\r\n", someNumber);
return 0;
}

And here’s the code generated. The important part (our increment) is in the blue box:

prepost3

Ok there’s our post increment code. Let’s change the post increment to a preincrement:

#include
#include

int main(void)
{
int someNumber = 0;
printf("Please enter a number.\r\n");
scanf("%i",&someNumber);
++someNumber;
printf("someNumber = %i\r\n", someNumber);
return 0;
}

And here’s the code it generates:

prepost4

If you compare this generated code to the previously generated code, you’ll see it’s the exact same (some memory addresses have changed, but the instructions are the same). This is pretty good evidence that for this type of usage case, it doesn’t matter if you use post or pre increment – in RELEASE anyways.

Well, you might think to yourself “the optimizer does a good job, but if using a pre-increment, maybe you can make your DEBUG code faster so it isn’t so painful to debug the program”. Good point! It turns out that in debug though, preincrement and post increment give the same generated code. Here it is:

prepost5a
prepost5b

So, it looks like in this usage case that the compiler does not care whether you use preincrement or post increment.

Let’s check out for loops real quick before calling it a day.

For Loops

Here’s our post increment for loop testing code. Note we are doing the same tricks as before to prevent the key parts from getting optimized away.

#include
#include

int main(void)
{
int someNumber = 0;
printf("Please enter a number.\r\n");
scanf("%i",&someNumber);
for(int index = 0; index < someNumber; index++)
printf("index = %i\r\n", index);
return 0;
}

Post and pre-increment generate the same code in release! Here it is:

prepost6

It turns out they generate the same code in debug as well:

prepost7a
prepost7b

Summary

Well, it looks like for these usage cases (admittedly, they are not very complex code), it really just does not matter. You still ought to check your own usage cases just to make sure you see the results I’m seeing. See something different? Post a comment!

In part 2 I’ll show you a case where using pre or post increment does matter. Here it is!

Is pre-increment really faster than post increment? Part 2

Comments

comments

About Demofox

I'm a game and engine programmer at Blizzard Entertainment and have been making games since 1990 (starting out with QBasic and TI-85 games) My shipped titles include: * Heroes of the Storm * StarCraft II: Heart of the Swarm & Legacy of the void * Insanely Twisted Shadow Planet (PC) * Gotham City Impostors (PC, 360, PS3) * Line Rider (PC, Wii, DS) I also like hiking, making music, learning cool new stuff and attempting the impossible.