Is pre-increment really faster than post increment? Part 2

In the first part of this blog post (Is pre-increment really faster than post increment? Part 1) I showed that it really doesn’t seem to matter if you use post or pre increment with simple integer types.

I then promised an example of where the choice of pre or post increment DOES matter, and here it is.

The long and the short of it is this…

  • When you pre increment a variable, you are changing the value of a variable, and the new value will be used for whatever code the pre increment is part of.
  • When you post increment a variable, you are changing the value of a variable, but the OLD value is used for whatever code the post increment is part of.

To make post increment work that way, it essentially needs to make a copy of the variable before the increment and return the copy for use by the code of which the post increment is a part of.

A pre increment has no such need, it modifies the value and everything uses the same variable. There is no copy needed.

The compiler / optimizer apparently does a good job of figuring out when it does or does not need to make a copy of integral types, but it doesn’t do as well with complex objects. Here’s some sample code to demonstrate this and the output that it generates in both debug and release.

Test Code

#include 
#include 
#include 

//=============================================================
class CScopeMessage
{
public:
	CScopeMessage(const char *label)
	{
		PrintIndent();
		printf("%srn", label);
		s_indentLevel++;
	}

	CScopeMessage(const char *label, int objectId)
	{
		PrintIndent();
		printf("%s obj %irn", label, objectId);
		s_indentLevel++;
	}

	CScopeMessage(const char *label, int objectId, int copyObjectId)
	{
		PrintIndent();
		printf("%s (obj %i) to obj %irn", label, objectId, copyObjectId);
		s_indentLevel++;
	}

	~CScopeMessage()
	{
		s_indentLevel--;
	}

	static void StartNewTest()
	{
		s_indentLevel = 0;
		s_lineNumber = 0;
		printf("rn");
	}

private:
	void PrintIndent()
	{
		s_lineNumber++;
		printf("%2i:  ", s_lineNumber);
		for(int index = 0; index < s_indentLevel; ++index)
			printf("  ");
	}

private:
	static int s_indentLevel;
	static int s_lineNumber;
};

int CScopeMessage::s_indentLevel = 0;
int CScopeMessage::s_lineNumber = 0;

//=============================================================
class CTestClass
{
public:
	CTestClass()
	{
		m_objectID = s_objectID;
		s_objectID++;

		// this is just noise in the test, but feel free to
		// comment out if you want to see for yourself
		//CScopeMessage msg("Constructing", m_objectID);
		m_value = new char[4];
		strcpy(m_value, "one");
	}

	CTestClass(const CTestClass &other)
	{
		m_objectID = s_objectID;
		s_objectID++;

		CScopeMessage msg("Copy Constructing", other.m_objectID, m_objectID);
		m_value = new char[strlen(other.m_value) + 1];
		strcpy(m_value, other.m_value);
	}

	~CTestClass()
	{
		CScopeMessage msg("Destroying", m_objectID);
		delete[] m_value;
	}

    // preincrement
	CTestClass &operator++()
	{
		CScopeMessage msg("Pre Increment", m_objectID);
		DoIncrement();
		return *this;
	}
 
	// postincrement
	CTestClass operator++(int)
	{
		CScopeMessage msg("Post Increment", m_objectID);
		CTestClass result(*this);
		DoIncrement();
		return result;
	}

	void DoIncrement()
	{
		CScopeMessage msg("Doing Increment", m_objectID);
	}

private:
	char *m_value;
	int m_objectID;

	static int s_objectID;
};

int CTestClass::s_objectID = 0;

//=============================================================
int main (int argc, char **argv)
{
	CTestClass test;
	{
		CScopeMessage msg("--Post Increment--");
		test++;
	}

	CScopeMessage::StartNewTest();
	{
		CScopeMessage msg("--Post Increment Assign--");
		CTestClass testB = test++;
	}

	CScopeMessage::StartNewTest();
	{
		CScopeMessage msg("--Pre Increment--");
		++test;
	}

	CScopeMessage::StartNewTest();
	{
		CScopeMessage msg("--Pre Increment Assign--");
		CTestClass testB = ++test;
	}

	system("pause");
	return 0;
}

Debug

Here’s the debug output:

prepostdebug

You can see that in the post increment operator, it calls the copy constructor not once but twice! The first copy constructor is called to create the “result” object, and the second copy constructor is called to return it by value to the caller.

CTestClass operator++(int)
{
	CScopeMessage msg("Post Increment", m_objectID);
	CTestClass result(*this);
	DoIncrement();
	return result;
}

Note that it can’t return the copy by reference because it’s a local variable. C++11’s “std::move” and xvalue type functionality is there to help with this stuff, but if you can’t use that tech yet, it isn’t much help hehe.

Interestingly, we can see that 2 copy constructors get called whether or not we assign the value returned or not.

On the pre-increment side, you can see that it only does a copy construction call if you assign the result. This is nice and is what we want. We don’t want extra object copies or memory allocations and deallocations.

Release

prepostrelease

Things are a little bit better in release, but not by much. The optimizer seems to have figured out that it doesn’t really need to do 2 object copies, since it only ever wants at most one REAL copy of the object, so it gets away with doing one object copy in both situations instead of two.

That’s an improvement, but still not as good as the pre-increment case which hasn’t visibly changed in release (not sure about the assembly of these guys, but if you look and find something interesting, post a comment!)

Summary

As always, you should check your own assembled code, or test your compiler with printf output like this post does to ensure you really know what your code is doing.

But… it seems like you might want to use pre-increment if you ever use increment operators for heavy weight objects (such as iterators), but if you want to use post increment for integral types, it ought to be fine.

That said, a lot of people say “just use pre-increment” because at worst it’ll be the same as a post increment, but at best it will be a lot more efficient.

You do whatever you want, so long as you are aware of the implications of going one way or the other 😛

A Super Tiny Random Number Generator

When I posted the last blog post about shuffling on the GameProgrammer.com mailing list, someone responded back with a super tiny random number generator that is actually pretty damn good. It is this:

x+=(x*x) | 5;

The high bit of X is the source of your random numbers, so if you want to generate an 8 bit random number, you have to call it 8 times. Apparently it passes a lot of tests for randomness really well and is a pretty high quality PRNG. Check this out for more info: http://www.woodmann.com/forum/showthread.php?3100-super-tiny-PRNG

You can start x at whatever you want, but it might take a few iterations to “warm up” especially if you start with a small seed (ie you may want to throw away the first 5-10 random bits it generates as they may not be that random). I adapted it into an example program below, along with some example output. I use time() to set the initial value of x.

#include 
#include 
#include 
#include 

// A super tiny prng
// http://www.woodmann.com/forum/showthread.php?3100-super-tiny-PRNG
//
unsigned int seed = 0;
unsigned int GenerateRandomBit()
{
	seed += (seed * seed) | 5;
	return seed & 0x80000000;
}

template 
void GenerateRandom(T& value)
{
	memset(&value, 0, sizeof(T));
	const unsigned int numBits = sizeof(T) * 8;
	unsigned int* dataPointer = (unsigned int *)&value;
	for (unsigned int index = 0; index < numBits; ++index)
	{
		if(GenerateRandomBit()) {
			unsigned int pointerIndex = index / 32;
			unsigned int mask = 1 << index % 32;
			dataPointer[pointerIndex] |= mask;
		}
	}
}

int main(int argc, char **argv)
{
	seed = (unsigned int)time(NULL);
	printf("seed = %urn", seed);

	printf("9 random uints...rn");

	for (unsigned int index = 0; index < 9; ++index)
	{
		unsigned int random;
		GenerateRandom(random);
		printf("%2u: %10u (%x)rn", index, random, random);
	}

	printf("3 random floats...rn");
	for (unsigned int index = 0; index < 3; ++index)
	{
		float f;
		GenerateRandom(f);
		printf("%2u: %f (%x)rn", index, f, *((unsigned int*)&f));
	}

	printf("8 random characters...rn");
	char text[8];
	GenerateRandom(text);
	for (unsigned int index = 0; index < 8; ++index)
	{
		printf("%2u: %crn", index, text[index]);
	}
	system("pause");
	return 0;
}

tinyprng1

tinyprng2

tinyprng3

tinyprng4

Fast & Lightweight Random “Shuffle” Functionality – FIXED!

In this post I’m going to show a way to make an iterator that will visit items in a list in a random order, only visit each item once, and tell you when it’s visited all items and is finished. It does this without storing a shuffled list, and it also doesn’t have to keep track of which items it has already visited.

This means you could have a list that is a million items long and no matter how many you have visited, it will only take two uint32s to store the current state of the iterator, and it will be very fast to get the next item in the list (somewhere around 1 to 4 times the cost of calling the rand() function!).

This is a follow up post to an older post called Fast & Lightweight Random “Shuffle” Functionality.

In that older post, things didn’t work quite like I expected them to so it was back to the drawing board for a little while to do some research and experimentation to find a better solution. I’m not going to go back over the basics I talked about in that article so go back and have a read there first if anything is confusing.

High Level

In the last post on this topic we talked about how the high level goal was to map the index to a random number, and because we were randomizing the bits in a deterministic (and non destructive) way, we needed to iterate over the whole “next power of 2” items and reject any that were too large. Only doing this could we be sure that we visited every index. The problem I hit last time though was that I could not get the numbers to look random enough.

To solve this, i decided what i needed to do was ENCRYPT the index with a block cipher. When you encrypt data, it should come out looking like random data, even though the data you put in may be sequential or have other easily seen patterns. What else is great, is that when using a block cipher, each unique input should equate to a unique output which means that if we encrypt the full power of 2 range as input, we will get the full power of 2 range out as output, but just in a different order.

Once I realized this was a good solution, my next problem to tackle was that I knew of no block algorithms that would work for a variable number of bits. There are block cipher algorithms that will work for LARGE number of bits, but there is no algorithm I knew of where you can tell it “hey, i only want to use 4 bits” or “i only want to use 5 bits”.

In the end the answer I found was to roll my own, but use existing, well established technology to do so. In my specific case, I’m also aiming for high speed functions since I want this functionality used in real time gaming applications.

What I came up with in the end is not cryptographically secure, but using the same techniques I have laid out, you should be able to drop in a different block cipher algorithm if that is a requirement.

Feistel Network

As it turns out, there is a basic primitive of cryptography called a Feistel Network. It’s used by quite a number of modern ciphers and hash functions, and it is surprisingly simple.

For a balanced Feistel Network, you split the data into a left and a right side, and do the following, which consists of a single round (you can do as many rounds as you like):

Left[i+1]  = Right[i];
Right[i+1] = Left[i] ^ RoundFunction(Right[i], key);

After performing however many rounds you wish, you combine the left and the right sides again to get your encrypted data.

To unencrypt, the feistel network works much the same but only in reverse, looking like the below:

Right[i] = Left[i+1];
Left[i] = Right[i+1] ^ RoundFunction(Left[i+1], key);

Check out the wikipedia page if you are interested in more info.

The neat thing about Feistel Networks is that the round function can be any deterministic function that performs whatever operations it wants – even destructive and irreversible operations such as division or bit shifts. Even so, the feistel network as a whole is reversible, no matter what you do in the round function, and you can unencrypt to get your origional data back.

This threw me for quite a loop and I couldn’t get my head around why this worked for a long while until I found a webpage that explained it pretty well. unfortunately I lost the link and cannot find it again but the basic idea is this… For each round of encryption, the right side is encrypted using the key and the left side. This means that at any point, no matter how many rounds you’ve done on your data, the right side should be able to be decrypted using the key and the left side. If you have the key, and you know how many rounds were used in encryption, you have all the data you need to decrypt it again. Hopefully that makes sense… I had to work it out on paper a little bit to see it fully.

The other great thing about Feistel Networks is that you can make them be however many bits you want. So, if i want each side of the Feistel Network to be 1 bit, I can do that. Or, if i want each side to be 128 bits, I can do that too!

You can also tune the quality / performance a bit by doing less or more rounds.

BTW the Tiny Encryption Algorithm uses a Feistel Network if you want to see a simple example in action.

With the “variable bit size support” problem solved, next I needed to come up with a round function that did a good job of taking sequential numbers as input and spitting out seemingly random numbers as output. Thanks to what I was saying before, the round function doesn’t need to be reversible, so there are a lot of options available.

I ended up deciding to go with a hash function, specifically Murmur Hash 2 (which I actually also used in my last post if you’d like to see some more information on it! The Incredible Time Traveling Random Number Generator).

Since the hash spits out numbers that might be anything in the range of an unsigned int, but I only want N bits, I just AND the hash against a mask to get the number of bits I want. There’s probably a higher quality method of smashing down the bits using XOR or something, but my quality needs aren’t very high so I just opted to AND it.

A downside of going with the balanced Feistel Network approach is that before this, I only had to round up to the next power of 2, but now, since each half of the data needs to be a power of 2, I actually have to make sure I have an even number of bits and have to round up to the next power of 4. This means that when it’s looking for valid indices to return in the shuffle, it may have to calculate up to 4 different indices on average before it finds a valid one. Not the greatest thing in the world, but also not the worst and definitely not a deal breaker in my eyes.

The Code

The code can be found on github at source.cpp

Use it in good health (:

Here are some example runs of the program as well.

Note: since writing this blog post, I’ve found out that my setup isn’t all that random and you can do better by changing the round function and/or number of rounds. If you don’t get the results you were hoping with using my code, the technique is solid, even if my implementation isn’t the best.

shuf1

shuf2

shuf3

shuf4

The Incredible Time Traveling Random Number Generator

It isn’t very often that you need a pseudo random number generator (PRNG) that can go forwards or backwards in time, or skip to specific points in the future or the past. However, if you are ever writing a game like Braid and do end up needing one, here’s one way to do it.

At the core of how this is going to work, we are going to keep track of an index, and have some way to convert that index into a random number. if we want to move forward in time, we will just increment the index. If we want to move backwards in time, we will just decrement the index. It’s super simple from a high level, but how are we going to convert an index into a random number?

There are lots of pseudo random number generators out there that we could leverage for this purpose, the most famous being C++’s built in “rand()” function, and another one famous in the game dev world is the Mersenne Twister.

I’m going to do something a little differently though as it leads well into the next post I want to write, and may be a little bit different than some people are used to seeing; I want to use a hash function.

Murmur Hash 2

Good hash functions have the property that small changes in input give large changes in output. This means that if we hash the number 1 and then hash the number 2, that they ought not to be similar output, they ought to be wildly different numbers in the usual case. Sometimes, just like real random numbers, we might get 2 of the same numbers in a row, but that is the desired behavior to have the output act like real random sequences.

There are varying levels of quality of hash functions, ranging from a simple string “hash” function of using the first character of a string (super low quality hash function, but super fast) all the way up to cryptographic quality hash functions like MD5 and SHA-1 which are a lot higher quality but also take a lot longer to generate.

In our usage case, I’m going to assume this random number generator is going to be used for game use, where if the player can discover the pattern in the random numbers, they won’t be able to gain anything meaningful or useful from that, other than at most be able to cheat at their own single player game. However, I really do want the numbers to be fairly random to the casual eye. I don’t want visible patterns to be noticeable since that would decrease the quality of the gameplay. I would also like my hash to run as quickly as possible to keep game performance up.

Because of that level of quality I’m aiming for, I opted to go with a fast, non cryptographic hash function called Murmur Hash 2. It runs pretty quick and it gives pretty decent quality results too – in fact the official Murmur Hash Website claims that it passes the Chi Squared Test for “practically all keysets & bucket sizes”.

If you need a higher quality set of random numbers, you can easily drop in a higher quality hash in place of Murmur Hash. Or, if you need to go the other way and have faster code at the expensive of random number quality, you can do that too.

Speed Comparison

How fast is it? Here’s some sample code to compare it vs C++’s built in rand() function, as well as an implementation of the Mersenne Twister I found online that seems to preform pretty well.

#include 
#include 
#include 
#include 
#include "tinymt32.h" // from http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/TINYMT/index.html

// how many numbers to generate
#define NUMBERCOUNT 10000000  // Generate 10 million random numbers

// profiling macros
#define PROFILE_BEGIN 
{ 
	LARGE_INTEGER freq; 
	LARGE_INTEGER start; 
    QueryPerformanceFrequency(&freq); 
    QueryPerformanceCounter(&start); 
		
#define PROFILE_END(label) 
	LARGE_INTEGER end; 
	QueryPerformanceCounter(&end); 
	printf(label " - %f msrn", ((double)(end.QuadPart - start.QuadPart)) * 1000.0 / freq.QuadPart); 
}

// MurmurHash code was taken from https://sites.google.com/site/murmurhash/
//-----------------------------------------------------------------------------
// MurmurHash2, by Austin Appleby

// Note - This code makes a few assumptions about how your machine behaves -

// 1. We can read a 4-byte value from any address without crashing
// 2. sizeof(int) == 4

// And it has a few limitations -

// 1. It will not work incrementally.
// 2. It will not produce the same results on little-endian and big-endian
//    machines.

unsigned int MurmurHash2 ( const void * key, int len, unsigned int seed )
{
	// 'm' and 'r' are mixing constants generated offline.
	// They're not really 'magic', they just happen to work well.

	const unsigned int m = 0x5bd1e995;
	const int r = 24;

	// Initialize the hash to a 'random' value

	unsigned int h = seed ^ len;

	// Mix 4 bytes at a time into the hash

	const unsigned char * data = (const unsigned char *)key;

	while(len >= 4)
	{
		unsigned int k = *(unsigned int *)data;

		k *= m; 
		k ^= k >> r; 
		k *= m; 
		
		h *= m; 
		h ^= k;

		data += 4;
		len -= 4;
	}
	
	// Handle the last few bytes of the input array

	switch(len)
	{
	case 3: h ^= data[2] << 16;
	case 2: h ^= data[1] <> 13;
	h *= m;
	h ^= h >> 15;

	return h;
}

void RandTest()
{
	for(int index = 0; index < NUMBERCOUNT; ++index)
		int i = rand();
}

unsigned int MurmurTest()
{
	unsigned int key = 0;
	for(int index = 0; index < NUMBERCOUNT; ++index)
		key = MurmurHash2(&key,sizeof(key),0);
	return key;
}

// g_twister is global and inited in main so it doesnt count towards timing
tinymt32_t g_twister; 
unsigned int TwisterTest()
{
	unsigned int ret = 0;
	for(int index = 0; index < NUMBERCOUNT; ++index)
		ret = tinymt32_generate_uint32(&g_twister);
	return ret;
}

int main(int argc, char**argv)
{
	// rand() test
	PROFILE_BEGIN;
	RandTest();
	PROFILE_END("rand()");

	// hash test
	unsigned int murmurhash;
	PROFILE_BEGIN;
	murmurhash = MurmurTest();
	PROFILE_END("Murmur Hash 2");

	// twister test
	g_twister.mat1 = 0;
	g_twister.mat2 = 0;
	tinymt32_init(&g_twister, 0);
	unsigned int twister;
	PROFILE_BEGIN;
	twister = TwisterTest();
	PROFILE_END("Mersenne Twister");

	// show the results
	system("pause");

	// this is here so that the murmur and twister code doesn't get optimized away
	printf("%u %urn", murmurhash, twister);

	return 0;
}

Here's the output of that code run in release on my machine, generating 10 million random numbers of each type. You can see that murmurhash takes about 1/3 as long as rand() but is not quite as fast as the Mersenne Twister. I ran this several times and got similar results, so all in all, Murmur Hash 2 is pretty fast!

mrmrrandperf

Final Code & Sample Output

Performance looks good but how about the time traveling part, and how about seeing some example output?

Here’s the finalized code:

#include 
#include 
#include 

// MurmurHash code was taken from https://sites.google.com/site/murmurhash/
//-----------------------------------------------------------------------------
// MurmurHash2, by Austin Appleby

// Note - This code makes a few assumptions about how your machine behaves -

// 1. We can read a 4-byte value from any address without crashing
// 2. sizeof(int) == 4

// And it has a few limitations -

// 1. It will not work incrementally.
// 2. It will not produce the same results on little-endian and big-endian
//    machines.

unsigned int MurmurHash2 ( const void * key, int len, unsigned int seed )
{
	// 'm' and 'r' are mixing constants generated offline.
	// They're not really 'magic', they just happen to work well.

	const unsigned int m = 0x5bd1e995;
	const int r = 24;

	// Initialize the hash to a 'random' value

	unsigned int h = seed ^ len;

	// Mix 4 bytes at a time into the hash

	const unsigned char * data = (const unsigned char *)key;

	while(len >= 4)
	{
		unsigned int k = *(unsigned int *)data;

		k *= m; 
		k ^= k >> r; 
		k *= m; 
		
		h *= m; 
		h ^= k;

		data += 4;
		len -= 4;
	}
	
	// Handle the last few bytes of the input array

	switch(len)
	{
	case 3: h ^= data[2] << 16;
	case 2: h ^= data[1] <> 13;
	h *= m;
	h ^= h >> 15;

	return h;
}

class CReversablePRNG
{
public:
	CReversablePRNG()
	{
		m_index = 0;
		m_seed = 0;
	}

	unsigned int NextNumber()
	{
		unsigned int ret = MurmurHash2(&m_index, sizeof(m_index), m_seed);
		m_index++;
		return ret;
	}

	unsigned int LastNumber()
	{
		unsigned int lastIndex = m_index - 2;
		unsigned int ret = MurmurHash2(&lastIndex, sizeof(lastIndex), m_seed);
		m_index--;
		return ret;
	}

	// to be able to save / restore state for a save game or whatever else
	void GetState(unsigned int &index, unsigned int &seed)
	{
		index = m_index;
		seed = m_seed;
	}

	void SetState(unsigned int index, unsigned int seed)
	{
		m_index = index;
		m_seed = seed;
	}

private:
	unsigned int m_index;
	unsigned int m_seed;
};

int main(int argc, char**argv)
{
	// create and seed our random number generator.  If two similar numbers are hashed
	// they should give very different results usually, so for a seed, we can hash the
	// time in seconds, even though the number from run to run should be really similar
	CReversablePRNG prng;
	unsigned int currentTime = time(NULL);
	unsigned int seed = MurmurHash2(&currentTime, sizeof(currentTime), 0x1337beef);
	prng.SetState(0, seed);

	// display our seed and our table header
	printf("seed = %urn", seed);
	printf("index | raw number | mod 10rn");
	printf("---------------------------rn");

	// generate 10 numbers forward
	for (int index = 0; index < 10; ++index)
	{
		unsigned int nextNumber = prng.NextNumber();
		printf("%2i    | %10u | %urn", index, nextNumber, nextNumber % 10);
	}

	// generate 3 numbers back
	printf("rn");
	for (int index = 0; index < 3; ++index)
	{
		unsigned int lastNumber = prng.LastNumber();
		printf("%2i    | %10u | %urn", 8 - index, lastNumber, lastNumber % 10);
	}

	// generate 5 numbers forward
	printf("rn");
	for (int index = 0; index < 5; ++index)
	{
		unsigned int nextNumber = prng.NextNumber();
		printf("%2i    | %10u | %urn", 7 + index, nextNumber, nextNumber % 10);
	}

	system("pause");

	return 0;
}

mrmrout4

mrmrout3

mrmrout2

mrmrout1

Next Up

Hopefully you enjoyed this post!

Next up I’m going to be applying this code to the problem of shuffling to continue on from the post where I tried to do that before: Fast & Lightweight Random “Shuffle” Functionality.