The Black Art of sizeof() aka Compile Time Type Deduction

Over the past couple decades of C++ programming, I’ve found sizeof useful, if relatively boring, and have found it handy for making things like memcpy and fread usable in a safe way. Little did I know that there were dark secrets lurking just below the surface, and that to truly understand sizeof is to dangle precariously above the black pit of infinity and see the secrets of creation laid bare.

I might be dramatizing things a little bit, but not by much (;

Sizeof Basics

Let’s start with the basics before working to the really interesting stuff:

#include <stdio.h>
void main(void)
{
	printf("size = %i",sizeof(int));
}

Running the above gives the output “size = 4”.

Luckily, instead of always having to give a type as a parameter to sizeof, we can also give a variable name. This saves us typing (and reduces potential bugs) when we change the type of a variable. For instance the below gives the same output.

#include <stdio.h>
void main(void)
{
	int index = 3;
	printf("size = %i",sizeof(index));
}

What’s great about the above is you can change what type index is defined as, and you don’t have to update the sizeof call. It knows what type index is and will always print out the right size for you. The less things you have to do manually or remember, the easier it is to maintain code and the less bugs you are likely to have.

Ok next up… sizeof() happens at compile time. It isn’t a function call, it does it’s work at compile time and has no runtime cost. This code below proves that and gives the same output:

#include <stdio.h>

int index = 3;

enum
{
	c_sizeOfIndex = sizeof(index)
};

void main(void)
{
	printf("size = %i",c_sizeOfIndex);
}

A Little More Interesting

Did you know that you can “call functions” inside of sizeof? Check out the below which gives the same output again:

#include <stdio.h>

int myFunc()
{
	return 3;
}

enum
{
	c_sizeOf = sizeof(myFunc())
};

void main(void)
{
	printf("size = %i",c_sizeOf);
}

But wait a second… if sizeof happens at compile time, how is it able to call a function? Does it evaluate the function at compile time somehow? Does it just do a sizeof on the function pointer?

What happens is it doesn’t actually evaluate anything that you pass to sizeof. it just looks at what’s inside, figures out what type it evaluates to, and gives you the size of that resulting type.

In the example above, it’s just looking at the return type of myFunc and giving you the sizeof that. That’s all ūüėõ

To prove that it doesn’t actually execute the function, check this program out, which gives the same output and does not crash!

#include <stdio.h>

int myFunc()
{
	// this is safe, right?
	((int *)0)[0] = 0;

	// this is too right?
	int a = 0;
	int b = 3;
	int c = b / a;

	// this text never gets printed out
	printf("sup dawg, i heard you liked...");

	// lastly, infinite recursion..  i feel like i'm trying to kill rasputin...
	return 3 + myFunc();
}

enum
{
	c_sizeOf = sizeof(myFunc())
};

void main(void)
{
	printf("size = %i",c_sizeOf);
}

here’s one more interesting example, to show that sizeof has all the power that the compiler has to figure out data types at compile time, without actually evaluating any of the code. Running the below, you can see by observation that the result of what’s inside sizeof is a bool, and when you run the program, it reports sizeof(bool) which is 1 for me on my machine.

#include <stdio.h>

int myFunc(int someNumber)
{
	// this is safe, right?
	((int *)0)[0] = 0;

	// this is too right?
	int a = 0;
	int b = 3;
	int c = b / a;

	// this text never gets printed out
	printf("sup dawg, i heard you liked...");

	// lastly, infinite recursion..  i feel like i'm trying to kill rasputin...
	return someNumber + myFunc(someNumber * 2);
}

int someGlobalVariable = 3;

enum
{
	c_sizeOf = sizeof((myFunc(someGlobalVariable) * someGlobalVariable << 1) == 0)
};

void main(void)
{
	printf("size = %i",c_sizeOf);
}
&#91;/cpp&#93;

So basically, we now know that we can pass any arbitrarily complex expression to sizeof - even ones that call functions and use variables - and the compiler will figure out the resulting type and give us the result of that.

<h2>True Power Revealed</h2>

The real power with sizeof comes in when we start using function overloads.  If we have 2 versions of the same function, we can pass some parameters to that function, and the compiler will figure out which function is the best match.  If each function has differently sized return types, we can use sizeof to know which one the compiler chose for any given parameters.  Even better... we can know this at compile time with absolutely no run time cost.

Check out what I mean:


#include <stdio.h>

int someFunc(int someNumber);
char someFunc(const char *someString);

int myNumber = 3;
const char *myString = "hello!";

enum
{
	c_size1 = sizeof(someFunc(myNumber)),
	c_size2 = sizeof(someFunc(myString))
};

void main(void)
{
	printf("sizes = %i,  %i", c_size1, c_size2);
}

If you run that program, it prints out “sizes = 4, 1”. We now have a program that can tell at compile time whether a value (and even a variable) is a string or an int. Note that we don’t actually need to define bodies for our functions, because they aren’t ever actually called.

What if we wanted to just make it be able to tell us if it was a string or not, and we didn’t want to have to make a separate function (and return type) for each possible type to be able to tell it whether it was a string or not? Luckily we can very easily.

When the compiler tries to figure out the best matching function for a given function call, if there is a variable argument function that is a possibility (a function that has … as it’s parameter list), that function will always be least wanted. It’s what you get if nothing else matches. In our situation we’re going to use that kind of a function effectively as an “else” statement. Check it out:

#include <stdio.h>

char stringTester(const char *someString);
int  stringTester(...);

int myNumber = 3;
const char *myString = "hello!";

void main(void)
{
	if (sizeof(stringTester(myNumber)) == sizeof(char))
		printf("myNumber is a string\r\n");
	else
		printf("myNumber is not a string\r\n");

	if (sizeof(stringTester(myString)) == sizeof(char))
		printf("myString is a string\r\n");
	else
		printf("myString is not a string\r\n");
}

Pretty neat right? Running that program will tell you that myNumber is not a string, but myString is a string, just like you already know.

Even though we are using if statements above, it’s still a compile time check and will have no runtime costs, and the if statements should be completely eaten away by the optimizer (check the assembly to see for yourself!)

Lets clean up that last bit of code a bit to be a bit more generalized:

#include <stdio.h>

char stringTester(const char *someString);
int  stringTester(...);

#define REPORT_IF_STRING(x) \
	if (sizeof(stringTester(x)) == sizeof(char)) \
		printf(#x " is a string\r\n"); \
	else \
		printf(#x " is not a string\r\n"); \

int myNumber = 3;
const char *myString = "hello!";

void main(void)
{
	REPORT_IF_STRING(myNumber);
	REPORT_IF_STRING(myString);
}

That gives the exact same output, but as you can see, is a bit more generalized… something you could re-use in code easily.

If you look at the final compiled code (in release), you’ll basically just see 2 printf statements, because the rest of it all happened at compile time.

Working with only testing if a variable is a string or not a string, and printing that result is not exactly something you are likely to need on a daily basis, but there are a lot more useful examples. For instance, what if you wanted to be able to tell whether one object’s type could safely be cast to another object type?

You might be asking “Isn’t that what RTTI and dynamic_cast are for?”. Yep. But what if you could determine this at compile time, so that there was no cost to the dynamic cast, and the code was optimized to the point of not even having an if statement to check the type… it just did the right thing at runtime because it already knew. You’d end up with some pretty fast code!

Here’s how you might do that for a specific class hierarchy:

#include <stdio.h>

class CFruit {};
class CApple : public CFruit {};
class CBanana: public CFruit {};
class CPeach : public CFruit {};

class CVegetable {};
class CCarrot: public CVegetable {};
class CCelery: public CVegetable {};

char FruitTester(const CFruit &);
int  FruitTester(...);

#define REPORT_IS_FRUIT(x) \
	if (sizeof(FruitTester(x)) == sizeof(char)) \
		printf(#x " is a fruit\r\n"); \
	else \
		printf(#x " is not a fruit\r\n"); \

CFruit  fruit;
CApple	apple;
CBanana banana;
CPeach  peach;

CVegetable vegetable;
CCarrot    carrot;
CCelery    celery;

void main(void)
{
	REPORT_IS_FRUIT(fruit);
	REPORT_IS_FRUIT(apple);
	REPORT_IS_FRUIT(banana);
	REPORT_IS_FRUIT(peach);
	REPORT_IS_FRUIT(vegetable);
	REPORT_IS_FRUIT(carrot);
	REPORT_IS_FRUIT(celery);
}

Running that program will give you output like the below:

fruittest

Again, if you look at the compiled code in release, you’ll see what is effectively just a series of printf statements, because all the checking of types and inheritance happened at compile time and there was no runtime cost. Check out the disassembly below. It’s just pushing the address of the strings to show onto the stack, and then calling printf to show the string. No if statements, no jumps, nothing at all at runtime other than printing the right string:

disassembly

There is a way to generalize that so that you pass the type in that you want to test against, or other random variations for how it might function in a general case, but I leave that to you to figure out!

Not Quite a Dynamic Cast

Ok so the above is not quite a dynamic cast. This is an upcast and gives you the ability to tell if an object derives from a specific type, but it can’t work in the opposite direction which is what a dynamic cast can do.

What the above is good for is mainly for situations where you don’t know the object type because it came in as a macro parameter, or as a template parameter, but you still want to do specific logic on it if it’s a certain type, or derives from a certain type. That’s where this stuff comes in useful.

There may be a clever way to do a downcast (dynamic cast) using this functionality, but at this point in time, I’m not sure of how you might do that ūüėõ

Other Uses

Looking online I’ve found some really interesting uses of this stuff beyond what I’ve shown you. Some people have mixed in a little other forms of black magic including templates, macros and template argument deduction and have done things as wild as being able to tell if an object has a specifically named function or member variable.

I personally feel like this trick of using sizeof to do compile time type deduction must have a million uses, but that my brain isn’t used to thinking in this way so it’ll probably be a little time before using it becomes more natural (whenever it’s appropriate that is hehe).

Hopefully you guys enjoyed learning about this as much as I did… soon I’ll be writing about another neat trick I came across called SFINAE. Give it a google if you want… it’s another pretty odd thing and I look forward to digging into it to be able to write up another post with some everyday useful examples.

Until next time!

Comments

comments

Lists of Macro Lists

Example Code: Code_051113

In the two previous posts I showed some ways to use macro lists that I’ve found useful in the past. A downside to those methods though was that you had to manually expand / use each macro list individually. Whenever you have to do extra steps that are always the same, you have a possible point of failure in your code / development process that doesn’t need to be there (when you forget to do the step that’s always the same).

Wouldn’t it be great if you could make a list of macro lists, such as a list of enums that each had their own enum values, and say “expand all of these into all the enums described, with to and from string conversion functions” and it would do it?

How about describing a bunch of events that can happen in your game, specifying what data fields the events should have, what their types and default values are, as well as whether they are optional and required – and it would build full on classes or structs based on your settings to facilitate strongly typed, compile time protected message and event passing that also supported inheritance and custom dynamic cast functionality?

The example code for this post shows you how to do both of those things.

Boost Preprocessor Library (BOOST_PP)

I had to pull out the “big guns” for making this example work, specifically the boost preprocessor library. Yes it’s true that anything the boost preprocessor library can do, you can also do yourself without the boost preprocessor library. However, the neat thing about using the boost preprocessor library is that there are variations in the behaviors of the preprocessors for different compilers, and boost abstracts that away from you so that you know if you use their functions, they ought to work and behave the same way on other compilers. That’s a bunch of compatibility headaches that you don’t have to worry about.

The boost preprocessor library has a very developer friendly license too. It’s very similar to the zlib / mit license which in effect says “don’t call us and we won’t call you”. You can use it commercially for free, without having to give them any credit for anything, and also you aren’t required under any circumstances to link dynamically to avoid having to make your code open sourced (a problem you commonly hit with some other software licenses). That being said, there are a couple small requirements about keeping the license file in tact when redistributing the source code of their library etc so have a look at the license just to make sure you dot all your i’s and cross all your t’s, but if you work somewhere that is hesitant to use open sourced software for legal reasons (I have worked at my share of those places), the license is literally the same as zlib and i’ll bet your project is already using zlib, or one of the libraries you are using is internally using zlib. So seriously… there ought not be any problems with using boost_pp in your code, unless your legal department is retarded (again, something i’ve experienced in times past LOL).

Best of all, the boost preprocessory library is SUPER EASY to download, integrate into your project and start using. I’ve included a version in the sample code. It’s just a folder with some headers in it. The sample code #include’s some of those header files and it works. No other setup needed. You didn’t even need to build boost or link to it or anything crazy like that. SUPER EASY, don’t be afraid of boost_pp! Boost.org

FWIW I had my doubts about boost_pp too, but a fellow game dev friend of mine turned me onto it and I’m really glad he did. As they say, T-Law is the law.

Main.cpp – What our program is able to do in the end

Here’s the main.cpp file to show the capabilities of our sample code. Note specifically the custom dynamic cast functions, enum to/from string functions, and the CGameEvent objects. The constructors to the CGameEvent objects have required parameters for all required fields, and defaulted, optional parameters for all of the optional fields of the defined message.

#include "GameEvents.h"

int main(int argc, char **argv)
{
	// make some test events
	CGameEventPlayerDamaged damaged(ePlayer1, 5.0f, eDamageTypeElectricity);
	CGameEventPlayerDeath death(ePlayer1, eDeathTypeElectrocution);

	// get base class pointers to them
	CGameEvent *gameEvent1 = &damaged;
	CGameEvent *gameEvent2 = &death;

	// do a dynamic cast on the 1st game event (the damaged event)
	// test1a will be non null, and test1b will be null
	CGameEventPlayerDamaged *test1a = gameEvent1->GetAs<CGameEventPlayerDamaged>();
	CGameEventPlayerDeath *test1b = gameEvent1->GetAs<CGameEventPlayerDeath>();

	// do a dynamic cast on the 2nd game event (the death event)
	// test2a will be null, test2b will be non null
	CGameEventPlayerDamaged *test2a = gameEvent2->GetAs<CGameEventPlayerDamaged>();
	CGameEventPlayerDeath *test2b = gameEvent2->GetAs<CGameEventPlayerDeath>();

	// set and get some values using the accessors
	damaged.SetIsFatal(!damaged.GetIsFatal());
	damaged.SetDamageTaken(damaged.DefaultDamageTaken());
	damaged.SetAttackerPlayerID(EPlayerFromString(EPlayerToString(ePlayer2)));
	death.SetDeathType(EDeathTypeFromString("drowned"));

	return 0;
}

GameEvents.h

This is the file where the game events are defined. For each event, you specify the data type, the field name, the default value, and whether the field is required or optional. Note that you must specify all required field before all optional fields. It’s possible that there’s a way to make it so you don’t have to do that, but I couldn’t figure out a way. Note that if you mess that up, you’ll get a compile error.

Optional / required settings for fields are enforced by the constructor in each game event class.

//=====================================================================================
//
//	GameEvents.h
//
//	Define game events here.  They are expanded into full classes for each event.
//
//=====================================================================================

// include our enums so we can use them in our event definitions.
#include "GameEnums.h"

// Note that we could also use complex structures, even other events in our event
// definitions.  Also, this code could be expanded to allow you to define inheritance
// for each event.  So for instance, the PlayerDeath event could inherit the
// PlayerDamaged event if you wanted it to, and it would contain all of the same fields
// that the PlayerDamaged event had.  It might get a bit tricky making the constructor
// though.

#define EventList \
/*===================================================================================*/ \
/*                                   Settings                                        */ \
/*===================================================================================*/ \
(GameEvent) /*Event Prefix*/ \
( \
/*===================================================================================*/ \
/*                                  PlayerDamaged                                    */ \
/*===================================================================================*/ \
( \
	(PlayerDamaged) \
	( \
		((EPlayer)     (VictimPlayerID)   (ePlayerUnknown)    (ELB_REQUIRED)) \
		((float)       (DamageTaken)      (0.0f)              (ELB_REQUIRED)) \
		((EDamageType) (DamageType)       (eDamageTypeNormal) (ELB_OPTIONAL)) \
		((EPlayer)     (AttackerPlayerID) (ePlayerUnknown)    (ELB_OPTIONAL)) \
		((bool)        (IsFatal)          (false)             (ELB_OPTIONAL)) \
	) \
) \
/*===================================================================================*/ \
/*                                  PlayerDeath                                      */ \
/*===================================================================================*/ \
( \
	(PlayerDeath) \
	( \
		((EPlayer)    (VictimPlayerID) (ePlayerUnknown)   (ELB_REQUIRED)) \
		((EDeathType) (DeathType)      (eDeathTypeNormal) (ELB_OPTIONAL)) \
	) \
) \
/*===================================================================================*/ \
) \
/*===================================================================================*/ \

// build our event classes
#include "EventListBuilder.h"

GameEnums.h

Here’s where game enums are defined. Note that it makes an enum per entry here, but it also makes ToString and FromString functions for each enum type.

//=====================================================================================
//
//	GameEnums.h
//
//	Game enums are defined here
//=====================================================================================

#define EnumList \
/*===================================================================================*/ \
/*                                  EDeathType                                       */ \
/*===================================================================================*/ \
( \
	(DeathType) \
	( \
		(Normal) \
		(Incineration) \
		(Electrocution) \
		(Drowned) \
		(Crushed) \
		(Telefrag) \
	) \
) \
/*===================================================================================*/ \
/*                                  EDamageType                                      */ \
/*===================================================================================*/ \
( \
	(DamageType) \
	( \
		(Normal) \
		(Fire) \
		(Electricity) \
		(Ice) \
		(Arcane) \
	) \
) \
/*===================================================================================*/ \
/*                                    EPlayer                                        */ \
/*===================================================================================*/ \
( \
	(Player) \
	( \
		(1) \
		(2) \
		(3) \
		(4) \
	) \
) \
/*===================================================================================*/ \

#include "EnumBuilder.h"

EnumBuilder.h

The simpler of the two files used to generate code from our data, here’s EnumBuilder.h.

Note that I use both BOOST_PP_SEQ_FOR_EACH as well as BOOST_PP_SEQ_FOR_EACH_I. The reason for that is because with BOOST_PP you are unable to do nested for loops. However, if you use two different looping methods, it works just fine. So… i use FOR_EACH for one level of looping, and FOR_EACH_I for another level of looping. I’m not sure if you could do 3 or more levels of looping by switching between which loop method you use or not.

//=====================================================================================
//
//	EnumBuilder.h
//
//  Define "EnumList" and then include this header file to expand it into an enum and
//  also provide ToString and FromString functions.
//
//  Here's an example of what EnumList might look like
//
//	#define EnumList \
//  ( \
//		(FruitType) \
//		( \
//			(Apple) \
//			(Pear) \
//			(Banana) \
//			(Orange) \
//		) \
//	) \
//  ( \
//		(VegieType) \
//		( \
//			(Potato) \
//			(Broccoli) \
//		) \
//  ) \
//
//  The first entry is the name of the enum, and then after that are the enum values.
//  You can define as many enums as you would like I believe.  Boost might have an
//  internal limit, but I'm not sure...
//
//  The above would make two enums like below:
//
//  enum EFruitType
//  {
//		eFruitTypeUnknown = -1,
//
//		eFruitTypeApple,
//		eFruitTypePear,
//		eFruitTypeBanana,
//		eFruitTypeOrange,
//
//		eFruitTypeCount,
//		eFruitTypeFirst = 0
//	};
//
//  enum EVegieType
//  {
//		eVegieTypeUnknown = -1,
//
//		eVegieTypePotato,
//		eVegieTypeBroccoli,
//
//		eVegieTypeCount,
//		eVegieTypeFirst = 0
//  };
//
//  It will also make these functions:
//
//  const char *EFruitTypeToString( EFruitType value );
//  EFruitType EFruitTypeFromString( const char *value );
//
//  const char *EVegieTypeToString( EVegieType value );
//  EVegieType EVegieTypeFromString( const char *value );
//
//=====================================================================================

#include <boost/preprocessor/cat.hpp>
#include <boost/preprocessor/seq/for_each.hpp>
#include <boost/preprocessor/seq/for_each_i.hpp>
#include <string.h>

//=====================================================================================

// define an enum value
#define ENB_ENUMVALUE(depth, enumName, enumItem) \
	BOOST_PP_CAT(e, BOOST_PP_CAT(enumName, enumItem)), \

// convert something to a string
#define ENB_TOSTRING(x) #x

// ToString() enum case
#define ENB_TOSTRINGCASE(depth, enumName, enumItem) \
	case BOOST_PP_CAT(e, BOOST_PP_CAT(enumName, enumItem)): return ENB_TOSTRING(enumItem);

// FromString() string test
#define ENB_FROMSTRINGTEST(depth, enumName, enumItem) \
	if (!stricmp(ENB_TOSTRING(enumItem), value)) \
		return BOOST_PP_CAT(e, BOOST_PP_CAT(enumName, enumItem));

// define the enum, including an "unknown" value at -1, and also "first" and "count"
// values which are useful for iterating through enum values
#define ENB_MAKEENUM(depth, data, index, enumList) \
	enum BOOST_PP_CAT(E,BOOST_PP_SEQ_ELEM(0, enumList)) \
	{ \
		BOOST_PP_CAT(BOOST_PP_CAT(e,BOOST_PP_SEQ_ELEM(0, enumList)),Unknown) = -1, \
		BOOST_PP_SEQ_FOR_EACH(ENB_ENUMVALUE, BOOST_PP_SEQ_ELEM(0, enumList), BOOST_PP_SEQ_ELEM(1, enumList)) \
		BOOST_PP_CAT(BOOST_PP_CAT(e,BOOST_PP_SEQ_ELEM(0, enumList)),Count), \
		BOOST_PP_CAT(BOOST_PP_CAT(e,BOOST_PP_SEQ_ELEM(0, enumList)),First) = 0, \
	}; \

// make the <EnumName>ToString function
#define ENB_MAKETOSTRING(depth, data, index, enumList) \
	const char *BOOST_PP_CAT(BOOST_PP_CAT(E,BOOST_PP_SEQ_ELEM(0, enumList)),ToString) (BOOST_PP_CAT(E,BOOST_PP_SEQ_ELEM(0, enumList)) value) \
	{ \
		switch(value) \
		{ \
			BOOST_PP_SEQ_FOR_EACH(ENB_TOSTRINGCASE, BOOST_PP_SEQ_ELEM(0, enumList), BOOST_PP_SEQ_ELEM(1, enumList)) \
		} \
		/* error case */ \
		return "Unknown"; \
	}

// make the <EnumName>FromString function
#define ENB_MAKEFROMSTRING(depth, data, index, enumList) \
	BOOST_PP_CAT(E,BOOST_PP_SEQ_ELEM(0, enumList)) BOOST_PP_CAT(BOOST_PP_CAT(E,BOOST_PP_SEQ_ELEM(0, enumList)),FromString) (const char *value) \
	{ \
		BOOST_PP_SEQ_FOR_EACH(ENB_FROMSTRINGTEST, BOOST_PP_SEQ_ELEM(0, enumList), BOOST_PP_SEQ_ELEM(1, enumList)) \
		/* error case */ \
		return BOOST_PP_CAT(BOOST_PP_CAT(e,BOOST_PP_SEQ_ELEM(0, enumList)),Unknown); \
	}

//=====================================================================================

// make all of the enums
BOOST_PP_SEQ_FOR_EACH_I(ENB_MAKEENUM, _, EnumList)

// make the ToString functions
BOOST_PP_SEQ_FOR_EACH_I(ENB_MAKETOSTRING, _, EnumList)

// make the FromString functions
BOOST_PP_SEQ_FOR_EACH_I(ENB_MAKEFROMSTRING, _, EnumList)

//=====================================================================================

// don't pollute other code with our macros
#undef ENB_ENUMVALUE
#undef ENB_TOSTRING
#undef ENB_TOSTRINGCASE
#undef ENB_FROMSTRINGTEST
#undef ENB_MAKEENUM
#undef ENB_MAKETOSTRING
#undef ENB_MAKEFROMSTRING

// this was defined by the caller but clean it up here for convinience
#undef EnumList

EventListBuilder.h

Here’s the code that expands an event list into classes with constructors, accessors and dynamic casting functionality:

//=====================================================================================
//
//	EventListBuilder.h
//
//  Define "EventList" and then include this header file to expand it into classes
//
//  The EventList schema is:
//
//  (ExpansionPrefix)
//  (Events)
//
//  ExpansionPrefix - a prefix added to every event name in the group
//  Events          - the list of events, see the schema below
//
//	The Events schema is:
//
//	(EventName)
//	(
//		((type)(Name)(Default)(Required))
//	)
//
//	EventName - The name of the event. A "C" and the expansion prefix are added
//				to the front when making it into a class.
//
//	Type	  - The data type of the field.  Make sure the data type is "viewable"
//				before including EventListBuilder.h
//
//	Name	  - The name of the data field, used to name member vars and accessors
//
//	Default   - The default value of the field.
//
//	Required  - Specifies whether a value must be provided when creating a new event
//				of this type or not.  If optional fiels are not provided, the default
//				value will be used.  ELB_OPTIONAL / ELB_REQUIRED
//
//  Note: All required parameters must come before all optional parameters
//
//  Example EventList:
//
//  #define EventList \
//		(InputEvent) \
//      ( \
//        ( \
//          (ButtonPress) \
//          ( \
//				((EButtonId) (ButtonId) (eButtonInvalid) (ELB_REQUIRED)) \
//				((bool)      (Pressed)  (true)           (ELB_REQUIRED)) \
//          ) \
//        ) \
//        ( \
//          (AnalogStick) \
//          ( \
//				((float)          (PosX)          (0.0f)              (ELB_REQUIRED)) \
//				((float)          (PosY)          (0.0f)              (ELB_REQUIRED)) \
//				((EAnalogStickId) (AnalogStickId) (eAnalogStickMouse) (ELB_OPTIONAL)) \
//          ) \
//        ) \
//      )
//
//	The above example will make two event classes with getters/setters for members
//  and a constructor on each class which has optional and required parameters as
//  specified in the EventList data.
//
//    CInputEventButtonPress
//      EButtonId ButtonId - the button that generated the event
//      bool Pressed       - whether the button was pressed or released
//
//    CInputEventAnalogStick
//      float PosX                   - X position of the stick
//      float PosY                   - Y position of the stick
//      EAnalogStickId AnalogStickId - optional parameter to specify which stick.
//									   defaults to "mouse"
//
//=====================================================================================

#include <boost/preprocessor/cat.hpp>
#include <boost/preprocessor/control/if.hpp>
#include <boost/preprocessor/seq/for_each.hpp>
#include <boost/preprocessor/seq/for_each_i.hpp>

//=====================================================================================
// Define the base class
// C<ExpansionPrefix>
// 
// It has a templated GetAs function that works like a dynamic cast and lets you ask
// the event type too.
//=====================================================================================
#define ELB_DEFINEBASECLASS(expansionSettings) \
	class BOOST_PP_CAT(C, expansionSettings) \
	{ \
	public: \
		/* Constructor */ \
		BOOST_PP_CAT(C, expansionSettings) (BOOST_PP_CAT(E,expansionSettings) eventType) \
		{ \
			m_eventType = eventType; \
		} \
		\
		/* EventType() */ \
		BOOST_PP_CAT(E,expansionSettings) EventType() const {return m_eventType;} \
		\
		/* GetAs() */ \
		template <class EventClass> \
		EventClass *GetAs() \
		{ \
			if (m_eventType == EventClass::StaticEventType()) \
				return (EventClass *)this; \
			else \
				return 0; \
		} \
	private: \
		BOOST_PP_CAT(E,expansionSettings) m_eventType; \
	};

//=====================================================================================
// Define the class
// C<ExpansionPrefix><Eventname>
//
// It has getters and setters for each data member defined and the construct handles
// initialization of members to defaults and enforces required data fields.
//=====================================================================================
#define ELB_DEFINECLASS(depth, expansionSettings, classData) \
	class BOOST_PP_CAT(C, BOOST_PP_CAT(expansionSettings, BOOST_PP_SEQ_ELEM(0, classData))) \
	: public BOOST_PP_CAT(C, expansionSettings) \
	{ \
	public: \
		/* Constructor */ \
		BOOST_PP_CAT(C, BOOST_PP_CAT(expansionSettings, BOOST_PP_SEQ_ELEM(0, classData))) \
		( \
			BOOST_PP_SEQ_FOR_EACH_I(ELB_CONSTRUCTPARAMS, _, BOOST_PP_SEQ_ELEM(1, classData)) \
			void *dummyParamIgnore = 0 \
		) \
		: BOOST_PP_CAT(C, expansionSettings) (BOOST_PP_CAT(e,BOOST_PP_CAT(expansionSettings, BOOST_PP_SEQ_ELEM(0, classData)))) \
		{ \
			BOOST_PP_SEQ_FOR_EACH_I(ELB_INITIALIZEMEMBERVAR, _, BOOST_PP_SEQ_ELEM(1, classData)) \
		} \
		\
		/* Accessors - Get, Set, static Default*/ \
		BOOST_PP_SEQ_FOR_EACH_I(ELB_DEFINEMEMBERVARACCESSORS, _, BOOST_PP_SEQ_ELEM(1, classData)) \
		\
		/* StaticEventType() */ \
		static BOOST_PP_CAT(E,expansionSettings) StaticEventType() \
		{ \
			return BOOST_PP_CAT(e,BOOST_PP_CAT(expansionSettings, BOOST_PP_SEQ_ELEM(0, classData))); \
		} \
	private: \
		/* Member Variables */ \
		BOOST_PP_SEQ_FOR_EACH_I(ELB_DEFINEMEMBERVAR, _, BOOST_PP_SEQ_ELEM(1, classData)) \
	};

//=====================================================================================

// to make required / optional more apparent in the EventList.
// NOTE: BOOST_PP_IF needs to take a 1 or 0 unfortunately!
#define ELB_REQUIRED 1
#define ELB_OPTIONAL 0

// add member variables to the constructor, optional or required as specified in data
#define ELB_CONSTRUCTPARAMS(depth, data, index, classData) \
	BOOST_PP_SEQ_ELEM(0, classData) BOOST_PP_SEQ_ELEM(1, classData) \
	BOOST_PP_IF(BOOST_PP_SEQ_ELEM(3, classData), , = BOOST_PP_SEQ_ELEM(2, classData)),

// initialize a member variable to the parameter passed into the constructor
#define ELB_INITIALIZEMEMBERVAR(depth, data, index, classData) \
	BOOST_PP_CAT(m_,BOOST_PP_SEQ_ELEM(1, classData)) = BOOST_PP_SEQ_ELEM(1, classData);

// define a single member variable
#define ELB_DEFINEMEMBERVAR(depth, data, idex, classData) \
	BOOST_PP_SEQ_ELEM(0, classData) BOOST_PP_CAT(m_,BOOST_PP_SEQ_ELEM(1, classData));

// define a single member variable accessors
#define ELB_DEFINEMEMBERVARACCESSORS(depth, data, index, classData) \
	BOOST_PP_SEQ_ELEM(0, classData) BOOST_PP_CAT(Get,BOOST_PP_SEQ_ELEM(1, classData))() const { return BOOST_PP_CAT(m_,BOOST_PP_SEQ_ELEM(1, classData)); } \
	void BOOST_PP_CAT(Set,BOOST_PP_SEQ_ELEM(1, classData))(const BOOST_PP_SEQ_ELEM(0, classData) &value) { BOOST_PP_CAT(m_,BOOST_PP_SEQ_ELEM(1, classData)) = value; } \
	static BOOST_PP_SEQ_ELEM(0, classData) BOOST_PP_CAT(Default,BOOST_PP_SEQ_ELEM(1, classData))() { return BOOST_PP_SEQ_ELEM(2, classData); } 

// define an enum value
#define ELB_ENUMVALUE(depth, expansionSettings, classData) \
	BOOST_PP_CAT(e,BOOST_PP_CAT(expansionSettings, BOOST_PP_SEQ_ELEM(0, classData))), \

//=====================================================================================

// make an enum of the event types
enum BOOST_PP_CAT(E,BOOST_PP_SEQ_ELEM(0, EventList))
{
	BOOST_PP_SEQ_FOR_EACH(ELB_ENUMVALUE, BOOST_PP_SEQ_ELEM(0, EventList), BOOST_PP_SEQ_ELEM(1, EventList))
};

// define the base class
ELB_DEFINEBASECLASS(BOOST_PP_SEQ_ELEM(0, EventList))

// build EventList into classes
BOOST_PP_SEQ_FOR_EACH(ELB_DEFINECLASS, BOOST_PP_SEQ_ELEM(0, EventList), BOOST_PP_SEQ_ELEM(1, EventList))

//=====================================================================================

// don't pollute other code with our macros
#undef ELB_DEFINEBASECLASS
#undef ELB_DEFINECLASS
#undef ELB_REQUIRED
#undef ELB_OPTIONAL
#undef ELB_CONSTRUCTPARAMS
#undef ELB_INITIALIZEMEMBERVAR
#undef ELB_DEFINEMEMBERVAR
#undef ELB_DEFINEMEMBERVARACCESSORS
#undef ELB_ENUMVALUE

// this was defined by the caller but clean it up for convinience
#undef EventList

Possible Improvements

  • Imagine if the event classes had binary and/or text serialization for being able to save and load events from disk, or over a network connection. Something like this sample code could be the back bone of your editor, your save game systems, or your network communication protocol. It would be type safe and easily modified. When you changed structures, you’d likely get compile errors or warnings in most of the places you needed to update code to support the changes.
  • The definitions of enums and events are kind of weird in that they are lists of text in parentheses because that’s what a “boost sequence” is (BOOST_PP_SEQ family of functions). Boost also has something called tuples which are comma seperated lists. I think tuples would be more natural for usage in some of the definition cases but i was unable to get that working for whatever reason. If you end up getting that working, post back and let us all know how you did it!
  • There are a lot of “magic numbers” in the macros, such as saying “use sequence field 0 for this” or “use sequence field 2 for this”. It probably would be better (more readable at least) if I defined symbolic constants for various indices. That way you’d see ELB_ENUMNAME instead of “0”, which would make a lot more sense.

That’s All

This post was code heavy and explanation light but hopefully you can understand what’s going on here. If you have any questions about the code, or why I did something one way instead of doing it a different way, post a comment and I’ll get back to you!

Example Code: Code_051113

Comments

comments

Macro Lists For The Win – Side B

Example Code: Code_050713.zip

In the previous post I talked about how you can use macro lists to solve the problem of wanting to generate a bunch of code based on the same set of data. This was useful for doing things like defining a list of resources a player could accumulate, and then being able to generate code to store and manipulate each resource type. You only had to update the resource list to add a new resource and the rest of the code would almost magically generate itself.

What if you wanted the reverse though? What if you had a fixed set of code that you want to apply to a bunch of different sets of data? This post is going to show you a way to do that.

In the example code, we are going to make a way to define several lists of items, and expand each list into an enum that also has a ToString and FromString function associated with it.

Another usage case for this technique might be to define lists of data fields, and expand each list into a data structure that contains serialization and deserialization functions. This would allow you to make data structures that could be saved and loaded to disk, or to sent and received over a network connection, just by defining what data fields they contained.

I haven’t yet seen this technique in the wild, and it kind of makes me wonder why since they are just two sides of the same coin.

GameEnums.h

In the last post, our data was always the same and we just applied it to different code. To do this, we had the code in one .h and the data in another .h that would get included multiple times. This allowed us to define different pieces of code in one .h, then include the other .h file to apply the fixed data to each piece of code.

In this post, it’s going to be the exact opposite. Our code will always stay the same and we will apply it to different data so our data will be in one .h and the code will be in another .h that gets included multiple times.

Here’s GameEnums.h:

//////////////////////
//     EDamageType
//////////////////////
#define ENUMNAME DamageType
#define ENUMLIST \
	ENUMENTRY(Normal) \
	ENUMENTRY(Electricity) \
	ENUMENTRY(Fire) \
	ENUMENTRY(BluntForce)

#include "EnumBuilder.h"
//////////////////////
//     EDeathType
//////////////////////
#define ENUMNAME DeathType
#define ENUMLIST \
	ENUMENTRY(Normal) \
	ENUMENTRY(Electrocuted) \
	ENUMENTRY(Incinerated) \
	ENUMENTRY(Smashed)

#include "EnumBuilder.h"
//////////////////////
//     EFruit
//////////////////////
#define ENUMNAME Fruit
#define ENUMLIST \
	ENUMENTRY(Apple) \
	ENUMENTRY(Banana) \
	ENUMENTRY(Orange) \
	ENUMENTRY(Kiwi)

#include "EnumBuilder.h"
//////////////////////
//     EPlayers
//////////////////////
#define ENUMNAME Player
#define ENUMLIST \
	ENUMENTRY(1) \
	ENUMENTRY(2) \
	ENUMENTRY(3) \
	ENUMENTRY(4)

#include "EnumBuilder.h"

EnumBuilder.h

This header file is where the real magic is; it’s responsible for taking the previously defined ENUMNAME and ENUMLIST macros as input, and turning them into an enum and the string functions. Here it is:

#include <string.h> // for _stricmp, for the enum Fromstring function

// this EB_COMBINETEXT macro works in visual studio 2010.  No promises anywhere else.
// Check out the boost preprocessor library if this doesn't work for you.
// BOOST_PP_CAT provides the same functionality, but ought to work on all compilers!
#define EB_COMBINETEXT(a, b) EB_COMBINETEXT_INTERNAL(a, b)
#define EB_COMBINETEXT_INTERNAL(a, b) a ## b

// make the enum E<ENUMNAME>
#define ENUMENTRY(EnumValue) EB_COMBINETEXT(e, EB_COMBINETEXT(ENUMNAME, EnumValue)),
enum EB_COMBINETEXT(E,ENUMNAME) {
	EB_COMBINETEXT(EB_COMBINETEXT(e, ENUMNAME), Unknown) = -1,
	ENUMLIST
	EB_COMBINETEXT(EB_COMBINETEXT(e, ENUMNAME), Count),
	EB_COMBINETEXT(EB_COMBINETEXT(e, ENUMNAME), First) = 0,
	EB_COMBINETEXT(EB_COMBINETEXT(e, ENUMNAME), Last) = EB_COMBINETEXT(EB_COMBINETEXT(e, ENUMNAME), Count) - 1
};
#undef ENUMENTRY

// make the E<ENUMNAME>ToString function
const char *EB_COMBINETEXT(EB_COMBINETEXT(E,ENUMNAME), ToString)(EB_COMBINETEXT(E,ENUMNAME) value)
{
	switch(value)
	{
		#define ENUMENTRY(EnumValue) \
			case EB_COMBINETEXT(e, EB_COMBINETEXT(ENUMNAME, EnumValue)): \
			return #EnumValue;
		ENUMLIST
		#undef ENUMENTRY
	}
	return "Unknown";
}

// make the E<ENUMNAME>FromString function
EB_COMBINETEXT(E,ENUMNAME) EB_COMBINETEXT(EB_COMBINETEXT(E,ENUMNAME), FromString)(const char *value)
{
	#define ENUMENTRY(EnumValue) \
	if(!_stricmp(value,#EnumValue)) \
		return EB_COMBINETEXT(e, EB_COMBINETEXT(ENUMNAME, EnumValue));
	ENUMLIST
	#undef ENUMENTRY
	return EB_COMBINETEXT(EB_COMBINETEXT(e, ENUMNAME), Unknown);
}

// clean up
#undef EB_COMBINETEXT
#undef EB_COMBINETEXT_INTERNAL

// these were defined by the caller but clean them up for convinience
#undef ENUMNAME
#undef ENUMLIST

Main.cpp

Now, here’s how you can actually use this stuff!

#include "GameEnums.h"

int main(int argc, char* argv[])
{
	EDamageType damageType = eDamageTypeBluntForce;
	EDeathType deathType = EDeathTypeFromString("smashed");
	EFruit fruit = eFruitLast;
	EPlayer player = EPlayerFromString(EPlayerToString(ePlayer1));
	return 0;
}

Combining the files

As a quick aside, in both this and the last post, I separated the code and data files. This is probably how you would normally want to do things because it’ll usually be cleaner, but it isn’t required. Here’s a cool technique I came across today…

Here’s Macro.cpp:

#ifdef MACROHEADER
	// Put your header stuff here
#else
	// Put cpp type stuff here

	// Include the "header"
	#define MACROHEADER
	#include "Macro.cpp"
	#undef MACROHEADER
#endif

If you ever really just want to combine all your code and data into a single file and not muddy up a directory or project with more files, this technique can help you do that. IMO you really ought to just use separate files, but I wanted to share this for when there are exceptions to that rule (as there always seem to be for every rule!)

Being Data Driven

After my last post, a fellow game developer friend of mine pointed out..

I hope that one day we hurl C++ into a raging sea of fire.

I do like this technique, in theory at least, but whenever I feel like it’s the right solution, a voice in the back of my head yells that I’m digging too greedily and too deeply and should step back for a second and consider what design choices have lead me to this point.

And I think that usually that introspection ends up at the intersection of “we’re not data-driven enough but want to be” and “we decided to use C++ for our engine.”

— jsola

He does have a point. For instance, in the case of our resource list from last post, it would be better if you had some “source data”, such as an xml file, listing all the resources a player could have. The game should load that data in on startup and make a dynamic array, etc to handle those resources. When you had game actions that added or subtracted specific resources from a player, the details of which resources got modified, and by how much, should also be specified in data.

When you save or pack your data, or at runtime (as your situation calls for), it can verify that your data is well formed and makes sure that if a data field is meant to specify a resource type, that it actually corresponds to an actual resource type listed in the list of resources.

That is closer to the ideal situation when making a game – especially when making larger games with a lot of people.

But there are still some good usage cases for this kind of macro magic (and template metaprogramming as well). For instance, maybe you use macros to define your data schemas so that your application can be data driven in the first place – I’ve done that on several projects myself and have seen other well respected people do it as well. So, add these things to your toolbox I say, because you never know when you might need them!

Next Post…

The next post will be what i promised at the end of the last one. I’m going to talk about a way to define a list of lists and then be able to expand that list of lists in a single go, instead of having to do a file include for each list individually.

Example Code: Code_050713.zip

Comments

comments

Macro Lists For The Win

Around 6 years ago I was introduced to a programming technique that really blew my mind. John, My boss at the time and the tech director at inXile, had written it as part of the code base for the commercial version of a game called Line Rider and I believe he said he first heard about the technique from a friend at Sony.

Since seeing it at inXile, I’ve seen the same technique or variations on it at several places including Monolith and Blizzard, and have had the opportunity to use it on quite a few occasions myself.

What is this technique? I’m not sure if it has an actual name but I refer to it as “Macro Lists” and it’s a good tool towards achieving the DRY principle (don’t repeat yourself – http://en.wikipedia.org/wiki/Don’t_repeat_yourself)

Macro lists are often useful when you find yourself copy / pasting code just to change a couple things, and have multiple places you need to update whenever you need to add a new entry into the mix.

For example, let’s say that you have a class to store how much of each resource that a player has and lets say you start out with two resources – gold and wood.

To implement this, you might write some code like this:

enum EResource
{
	eResourceGold,
	eResourceWood,
};

class CResourceHolder
{
public:
	CResourceHolder()
	{
		m_resources[eResourceGold] = 100.0f;
		m_resources[eResourceWood] = 0.0f;
	}

	float GetGold() const
		{ return m_resources[eResourceGold ]; }
	void SetGold(float amount)
		{ m_resources[eResourceGold ] = amount; }

	float GetWood() const
		{ return m_resources[eResourceWood]; }
	void SetWood(float amount)
		{ m_resources[eResourceWood] = amount; }
private:
	float m_resources[2];
};

That seems pretty reasonable right?

Now let’s say that you (or someone else who doesn’t know the code) wants to add another resource type. What do you need to do to add a new resource?

  1. Add a new enum value to the enum
  2. Initialize the new value in the array to zero
  3. Make a Get and Set function for the resource
  4. Increase the array size of m_resources to hold the new value

If #1 or #3 are forgotten, it will probably be really obvious and it’ll be fixed right away. If #2 or #4 are missed though, you are going to have some bugs that might potentially be very hard to track down because they won’t happen all the time, and they may only happen in release, or only when doing some very specific steps that don’t seem to have anything to do with the resource code.

Kind of a pain right? As the code gets more mature and more features are added, there will likely be other places that need to be updated too that will easily be forgotten. Also, when this sort of stuff comes up, people tend to copy/paste existing patterns and then change what needs to be changed – which can be really dangerous if people forget to change some of the values which need to be changed.

Luckily macro lists can help out here to ensure that it’s IMPOSSIBLE for you, or anyone else, to forget the steps of what to change. Macro lists make it impossible to forget because they do the work for you!

Check out this code to see what I mean. It took me a little bit to wrap my head around how this technique worked when I first saw it, so don’t get discouraged if you have trouble wrapping your head around it as well.

#define RESOURCE_LIST \
	RESOURCE_ENTRY(Gold, 100.0) \
	RESOURCE_ENTRY(Wood, 0)

// make the enum
#define RESOURCE_ENTRY(resource, startingValue) \
	eResource#resource,
enum EResource
{
	eResourceUnknown = -1,
	RESOURCE_LIST
	eResourceCount,
	eResourcefirst = 0
};
#undef RESOURCE_ENTRY

class CResourceHolder
{
public:
	CResourceHolder()
	{
		// initialize to starting values
		#define RESOURCE_ENTRY(resource, startingValue) \
			m_resources[eResource#resource] = startingValue;
		RESOURCE_LIST
		#undef RESOURCE_ENTRY
	}

// make a Get and Set for each resource
#define RESOURCE_ENTRY(resource, startingValue) \
	float Get#resource() const \
	{return m_resources[eResource#resource];} \
	void Set#resource(float amount) \
	{m_resources[eResource#resource] = amount;} \
RESOURCE_LIST
#undef RESOURCE_ENTRY

private:
	// ensure that our array is always the right size
	float m_resources[eResourceCount];
};

In the above code, the steps mentioned before happen automatically. When you want to add a resource, all you have to do is add an entry to the RESOURCE_LIST and it does the rest for you. You can’t forget any of the steps, and as people add new features, they can work with the macro list to make sure people in the future can add resources without having to worry about the details.

Include File Variation

If you used the above technique a lot in your code base, you could imagine that someone might name their macros the same things you named yours which could lead to a naming conflict.

Keeping the “global macro namespace” as clean as possible is a good practice to follow and there’s a variation of the macro list technique that doesn’t pollute the global macro namespace like the above.

Basically, you put your macro list in a header file, and then include that header file every place you would normally put a RESOURCE_LIST.

Here’s the same example broken up that way. First is ResourceList.h:

///////////////////////////////////
//	RESOURCE_ENTRY(ResourceName, StartingValue)
//
//	ResourceName - the name of the resource
//	StartingValue - what to start the resource at
//
RESOURCE_ENTRY(Gold, 100.0)
RESOURCE_ENTRY(Wood, 0)
///////////////////////////////////

And now here is CResourceHolder.h:

///////////////////////////////////
// make the enum
#define RESOURCE_ENTRY(resource, startingValue) \
	eResource#resource,
enum EResource
{
	eResourceUnknown = -1,
	#include "ResourceList.h"
	eResourceCount,
	eResourcefirst = 0
};
#undef RESOURCE_ENTRY

class CResourceHolder
{
public:
	CResourceHolder()
	{
		// initialize to starting values
		#define RESOURCE_ENTRY(resource, startingValue) \
			m_resources[eResource#resource] = startingValue;
		#include "ResourceList.h"
		#undef RESOURCE_ENTRY
	}

// make a Get and Set for each resource
#define RESOURCE_ENTRY(resource, startingValue) \
	float Get#resource() const \
	{return m_resources[eResource#resource];} \
	void Set#resource(float amount) \
	{m_resources[eResource#resource] = amount;}
#include "ResourceList.h"
#undef RESOURCE_ENTRY

private:
	// ensure that our array is always the right size
	float m_resources[eResourceCount];
};

The Downside of Macro Lists

So, while doing the above makes code a lot easier to maintain and less error prone, it comes at a cost.

Most notably is it can be really difficult to figure out what code the macros will expand to, and it can be difficult to alter the functionality of the macros. A way to lessen this problem is that you can tell most compilers to make a file that shows what your code looks like after the preprocessor is done with it. It can still be difficult even with this feature, but it does help a lot.

When you have compiler errors due to macros, because perhaps you forgot a parameter, or it’s the wrong type, the compiler errors can be pretty difficult to understand sometimes.

Another problem with macros is that I don’t know of any debuggers that will let you step through macro code, so in a lot of ways it’s a black box while you are debugging, which sucks if that code malfunctions. If you keep your functionality simple, straightfoward and format it cleanly, you ought not to hit many of these problems though.

Instead of using macro lists, some people prefer to put their data into something like an xml or json data file, and then as a custom build step, use XSLT or the like to convert that data into some code, just like the C++ preprocessor would. The benefit here is that you can see the resulting code and step through it while debugging, but of course the downside is it can be more difficult for someone else to get set up to be able to compile your code.

To Be Continued…

Macro lists are great, but what if you want your lists to have sublists? For instance, what if you wanted to define network messages for your game in a format like this, and have it automatically expand into full fledged classes to be able to ensure that message parsing and data serialization was always done in a consistent way to minimize bugs and maximize efficiency (less code to write and less testing to do)?

As you might have noticed, macro lists can take parameters to help them be flexible (like, the starting value of the resources… you could add more parameters if you wanted to), but, a macro list can’t contain another macro list. At least not how the above implementations work.

I’m going to show you how to tackle this problem in the next post, so stay tuned! (:

Comments

comments

Permutation Programming Without Maintenance Nightmares

Example Code: TemplateSlots.cpp

When writing real time software, we sometimes hit situations where we have code that needs to be lightning fast, but also needs to be configurable to change how it behaves.

For instance, if you are writing something that renders 3d graphics (either traditional rasterized 3d graphics, or raytraced 3d graphics) you might be forced to put an if statement in a deep inner loop saying “is this object textured? Does this object have emissive lighting? If so, call the appropriate functions”. ¬†Or, in the case of something like DSP / Audio processing, you might have an if statement saying “do we need to increase the amplitude of our samples? ¬†if so do that”.

Depending on your use case, this can result in tens of thousands of checks per second or more, which can impact performance quite a bit.

This puts us in a pickle between two extremes:

  1. We can decide to live with the the checks to see which functionality is enabled.  The end result is code that is easy to maintain, but it comes at a cost in execution speed.
  2. We can hand craft each permutation of functionality needed, and call the correct function based on parameters in an outer loop. For example, in an outer loop, we call a function like “RenderPolygonNoTextureYesEmissive”. ¬†This means that we copy / paste a lot of code and change small pieces of each function to act like it should for the specific permutation. ¬†This results in form fitted code that can be a lot faster, but will be harder to maintain.

Someone might look at the above and say “#1 makes slower code that is easier to maintain? ¬†Programmers need to stop being lazy and just go with #2 so that it’s faster at runtime”. ¬†When code is easier to maintain though, it means that programmers can spend less time working on bugs and features in this code, freeing them up to work on other things, and also it means that there will be less bugs in the code to begin with. ¬†Making code that’s easier to maintain is a decision that affects team productivity and the business as a whole, so it isn’t something we can very easily dismiss, even at the cost of runtime performance.

Quick aside: there are features in modern CPUs like branch prediction, as well as features in modern compilers / linkers / optimizers that can mitigate some of this stuff, but they can’t always, and if you rely on the behavior of a specific compiler or CPU, you’ll have code maintenance problems when you have to start supporting a different compiler or CPU. ¬†Console and mobile processors are very different beasts than desktop processors for example, which can make relying on assumptions like this a big problem.

One common solution to the code permutation problem is to generate code for all the permutations needed to get the speed of solution #2, but have that code generated based on a few core pieces of code to get the maintainability of solution #1.   People can do this with clever macro usage, or even actually generate code using a custom build step while compiling.

In the graphics world, it’s fairly common that AAA game engines actually have a shader permutation compiler built into the engine to handle this exact problem. ¬†The situation there is a little bit different because shaders aren’t written in C++, but it’s the same basic problem.

I’m going to show a variation to the code generation solution using templates, template parameters, and inline functions.

Before we get into it I want to note two important things:

  • All observations I make about function inlining and optimizations preformed by the compiler were made using Microsoft Visual Studio 2010 using the default settings for a release built console application. ¬†Your mileage may vary!
  • The example code attached to this post works as a makeshift compressor and limiter. ¬†That DSP code is just there as an example of using this technique and this should in no way be used as a lesson on how to write a compressor or limiter. ¬†Many features are missing and major shortcuts have been taken to keep the code simple!

OOPS!

After posting this, a friend pointed out two things to me that I want to share…

  1. Apparently this technique is already well known and used. It’s called “traits” and it’s talked about in the book Modern C++ Design by Andrei Alexandrescu.
  2. The sample code doesn’t seem to compile in llvm or clang. I don’t have easy access to those guys so… sorry about that.

Inline Functions – The Silent Killer

Ok, I’m joking, inline functions aren’t the #1 cause of death for males between the age of 23 and 25.

When people talk about using inline functions, they always seem to only talk about how inline functions remove the overhead of the function call, setting up the stack for the local parameters, the cost of the return, and any object copies that might have happened for the parameters or return value.

A far less talked about feature of inline functions is that it promotes the function code to be a full sibling to the calling code in the eyes of the optimizer.

If your inlined function checks a boolean parameter to take different action whether it’s true or false, and you call that inlined function using a compile time constant for that parameter (ie just pass “true” , not a variable), that means that the optimizer has the opportunity to completely get rid of the branch of code that will never get executed and get rid of the if statement all together. ¬†The optimizer just ditched the inner loop “if check” we talked about earlier that was happening tens of thousands of times a second.

If you call the same inlined function in another place, passing a boolean variable for the parameter, that other location will keep the if statement intact and will act as you would expect.

This is the first step in our battle against code permutations – inline functions can be used to write code once that shrink wraps itself to whatever your specific use cases may be!

This also gives the optimizer the ability to combine various pieces of  code together in clever ways, especially if you call multiple inline functions and it has more code to work with.

Not All Unicorns and Lollipops – Code Bloat and Cache Misses

Inline functions do come at a cost though and are not a magical solution to every problem. ¬†Most notably, inline functions bloat the size of your compiled code. ¬†For most people, code size on disk isn’t really a huge problem because hard drives are huge and loading an exe sized file into ram is not going to be a problem for normal cases.

The problem with increasing code size mainly comes up when dealing with the instruction cache.

Processors have built in cache memory that they can read and write to super quickly, but the cache memory can only hold so much data. ¬†Whenever the processor needs to read something from RAM that isn’t in the cache, it has to load it from RAM into the cache, and then it can read it. ¬†This is called a cache miss and can a major source of performance problems. ¬†Desktop CPUs can get around this problem by doing other things while waiting for the data to come in (http://en.wikipedia.org/wiki/Out-of-order_execution), but other processors like consoles and mobile devices aren’t able to do that, so they just sit there stalled while waiting for the memory to come back. ¬†Very slow!

To minimize cache misses, you should set up your code and data structures so that they don’t have to jump around in memory a lot, which can cause the CPU to have to load and unload the same peices of RAM into the cache needlessly.

Another thing you can do to minimize cache misses is to try to make your data as small as possible by storing less data or using smaller data types. ¬†This way, the CPU is able to hold more meaningful data in it’s cache at a time, which means less RAM access needed.

These techniques also apply to your program itself inside the instruction cache. ¬†If you have code that jumps around a lot (e.g. gotos that jump really far away for those that use gotos, or excessive function calls), that can cause the CPU to have to excessively load different parts of your program from RAM into it’s cache which can be a costly operation. ¬† Also just like with data, if you keep your code smaller, the CPU will be able to hold more meaningful data in the cache at a time and have to hit RAM less.

Lastly, if you ever make a for or while loop that is too large to fit in the instruction cache, that means that you can get a cache miss each time you iterate through the loop!  Making functions inlined, especially if they are called multiple times inside of the loop, can cause the contents of a loop to become larger and cause this problem.

In the end though, if you’d like you can just mark your functions as inline and let the compiler decide if it would like to actually inline them or not since it’s just a hint. ¬†A lot of the times the compiler ought to make decent choices for you, but of course you are free to go down into this rabbit hole as you want. ¬†Since the compiler doesn’t actually understand your algorithm, you can certainly do better than it can in some situations and make better trade offs. In MSVC there is an option to disallow the compiler from going against your wishes for inlining a function if you want to go this route. Other compilers probably have similar options.

The Main Course – Templated Classes With Worker Slots

Now that you see how inline functions can be used to write custom fitting code permutations for you, you can use this to your advantage by having the class or function that does your work take a template parameter for each piece of configurable work that you would like it to do.

You might define a class like this:

template &lt;class EmissiveLighter, class Texturer&gt;
class CRenderer
{
public:
	static void Render(const CSomeDummyParams &amp;params)
	{
		// render
		...
		// now apply emissive lighting and texture
		EmissiveLighter::ApplyEmissive(params);
		Texturer::ApplyTexture(params);
	}
};

Then, you might define “Slot Classes” like the below

//do nothing for emissive lighting and pay no runtime cost for using this
class EmissiveLighter_None
{
public:
	static inline void ApplyEmissive(const CSomeDummyParams &amp;) {}
};

//apply the emissive color defined in the params struct
//Note, we are using templates but not interfaces so the parameter declarations can change to suit our needs
class EmissiveLighter_Params
{
public:
	static inline void ApplyEmissive(CSomeDummyParams &amp;params)
	{
		params.m_outColor += params.m_emissive;
	}
};

Now, you can call the render function and supply what operations you want to use for each slot and it will generate the form fitted permutations for you.

// no emissive lighting, no texture
Render&lt;
	EmissiveLighter_None,
	Texturer_None&gt;
	::Render(params);

// emissive lighting from params, no texture
Render&lt;
	EmissiveLighter_Params,
	Texturer_None&gt;
	::Render(params);

// emissive lighting from params, texture from params
Render&lt;
	EmissiveLighter_Params,
	Texturer_Params&gt;
	::Render(params);

// emissive lighting from params, use procedural noise texture generator
Render&lt;
	EmissiveLighter_Params,
	Texturer_ProceduralNoise&gt;
	::Render(params);

Also, a quick note… templates also cause code bloat like inline functions do, because behind the scenes, it will create a different class for each permutation of template parameters that you actually use. ¬†Just something to be wary of!

Why No Interface Classes?

You might be saying “you know, you could make an interface class with pure virtuals to make sure that your slot classes implemented the required functions in the required ways. ¬†That would increase type safety and such”.

That is definitely true, but in this case I think there is actually good value in NOT making interface classes that the various slot classes derive from.

To see what I mean, check out the difference between the two emissive lighter classes i defined.  One takes the params parameter as a const reference, and the other takes it as a non const reference.  This lets you decide what you want to do with the parameters on an implementation by implementation basis.

This lets you give further optimization hints to the compiler, allowing you to define unused parameters as const refs.

It lets you further “shrink wrap” the code to suit your specific needs in each permutation, and you still get a lot of compile time protection when you use the type in a template.

Last Words

If you go this route, it definitely could make some issues more difficult to debug, especially if you stray from the straight and narrow usage. For instance, you might make the EmissiveLighter_Params set some value on the params that the Texturer_Params uses because Texturer_Params assumes that emissive will always use params too when texture params is used. If you switch the emissive to something else but leave the texturer alone, the texturer could stop working and it could be hard to realize why. My advice if using this technique is to make each slot class as isolated as possible and make as few assumptions as possible about other slots. Also make sure and name slot classes so that it’s very obvious what they do from the name and make sure they don’t do anything other than the name implies. If they do anything else, change the name to reflect that! Don’t worry about not being able to share calculations between slot classes to improve efficiency either… since they ought to all be inlined, the optimizer ought to be able to combine some of that and increase re-use for you when it’s appropriate.

When using this in the real world, you might have to come up with some sort of “multiplexer” function where you pass it your variable parameters and it will return a function pointer to the specific function you ought to use. ¬†A function pointer points at the function in the specific template instantiation asked for, so getting a function pointer back would be enough to translate your parameters into the right template instantiation to use. I’m not sure of a way to do this other than a bunch of embeded switch statements but I’ll be thinking about it and report back or write a new article if I come up with an interesting way to do that. The main problem there comes in mapping run time dynamic values to compile time static values.

Since this technique is using template classes and/or template functions you might be able to do some interesting things with template specialization, partial template specialization and template parameter deduction.

For instance, you could introduce SSE calls if you have to process more than a specific amount of data, but just do simple loops if doing less than that.

I made some example code that goes into this stuff in some more depth and has a few interesting techniques I didn’t go over in the article.

The example source code makes frequent use of casting like “(DataType)0.5” but don’t worry, those ought to be compile time operations for all basic types and have no runtime cost (verified in my compiler anyways)

Example Code: TemplateSlots.cpp

Comments

comments

Mechanical Computer Quest Part I

For years I’ve been wanting to make mechanical logic gates that i could put together into mechanical logic circuits. I want to do this mostly for novelty purposes, but I also think it could be leveraged into either an entertaining learning device, or some sort of game or puzzle.

Since I’m a software engineer, one of the main problems I’ve had is my inability to manifest my ideas in a physical way. Thanks to some modern technology, I think I might be able to partner up with my good buddy Eric and finally make this idea into a reality!

I’m sure some of the things I’m trying to figure out are solved problems, but I want to try and figure it out as much as I can before researching the right way to do it hehe.

Here is some basic background knowledge relating to what I’m trying to achieve, as well as my plans for moving forward.

Logic Gate Basics

Logic gates are the basic building blocks of logic circuits and computers. You can put them together in various ways to make lots of different things happen such as adding or multiplying numbers together, or comparing two numbers to see which is bigger.

The basic logic gates themselves take in either 1 or 2 inputs, and give one output. The inputs and outputs are binary where each one is either a one or a zero.

The 3 basic textbook logic gates are AND, OR and NOT.

AND: this takes in two inputs and gives one output. If both inputs are 1, it gives a 1 as output, else it gives a 0 as output.

0 AND 0 = 0
0 AND 1 = 0
1 AND 0 = 0
1 AND 1 = 1

OR: this takes in two inputs and gives one output. If either input is 1, it gives a 1 as output. If the inputs are both 0, it gives a 0 as output.

0 OR 0 = 0
0 OR 1 = 1
1 OR 0 = 1
1 OR 1 = 1

NOT: this takes in one input and gives one output. If you give it a 1, it gives out a 0. If you give it a 0, it gives out a 1.

NOT 0 = 1
NOT 1 = 0

If you are able to do these 3 things, and combine them together arbitrarily, that is considered “Turing Complete”. That basically means it’s capable of doing anything a computer can do, but check out this wikipedia page for more information http://en.wikipedia.org/wiki/Turing_completeness.

Mechanical Logic Gates

To make mechanical versions of these things I was thinking of having each logic gate a self contained object that had sticks for input and output. If a stick was pushed “up” it would be a 1, if a stick wasn’t pushed up, it would be a 0.

I then set out to make a wooden prototype of each gate to make sure I had the basic designs worked out.

NOT

To make a NOT gate, you would have a box with an input stick going into it and an output stick coming out of it. If you pushed the input stick up, the output stick would go down.

If you pulled the input stick back down, the output stick would go up again.

I made one of these out of wood by making a box with two sticks going into it, that inside were attached with a cross peice that could rotate but not move.

This way, when you push up the input stick, the crosspeice rotates and the output stick goes down. When you pull down the input stick, the reverse happens.

It worked fairly decently.

NotGate

OR

To make an OR gate, I made a box with two input sticks going in and one output stick coming out. I had the output stick attached to a cross peice that rested above the input sticks. This way, when you pushed either input stick up, the output stick would be pushed up.

One thing I didn’t like about this design was that it relied on gravity to reset the output after changing the input pins. I’d prefer the mechanical logic gates made as few assumptions about their environment as possible so that there are less points of failure. That way, the mechanical computer would work in space, or on it’s side, etc. A spring could be used, but springs wear out and that’s just another moving part that can fail.

It wasn’t 100% ideal but it worked “ok”.

OrGate

AND

To make an AND gate, if you push up one input pin, nothing should happen to the output pin and it should remain at zero. If you push up both input pins, the output pin should then push up.

The best way i could think of how to do this was to have the output pin rest on a slack string. The string should be slack just enough such that if you push up one input pin, it takes the slack out of the string and rests against the bottom of the output pin. If you push up the second pin, it should make the output pin raise up to the full output pin level.

Unfortunately I never got this working correctly – I’m really bad at making things, I told you! haha.

I wasn’t a fan of the string since it was a finicky, error prone part of the design that could easily fail.

Also, you would need to attach the string ends to the side of the box or something to make sure that as you raised both pins, you didn’t just re-introduce the slack in the string between the sticks. It sounds in my head like it would work, but again, unfortunately I’m a SOFTWARE engineer and this is not really my forte so I’m not sure hehe.

AndGate

Connecting Logic Gates, Metrics and Physical Requirements

So, assuming the mechanical logic gates actually work like they should, there are still some problems to solve, as well as some in general design requirements about how these gates need to work.

  1. Gate pins have to be able to connect to eachother somehow. An input pin needs to be able to connect to an output pin in a rigid way such that if one moves, the other one moves as well.  It would be nice if there was a way for the peices to easily snap together and easily be taken apart again. During normal use they should stay snapped though obviously!
  2. The difference between zero and one needs to be standardized. For instance, the difference between a pin being pushed out and pushed in could be one inch, and all logic gates and all logic gate configurations need to obey this measurement in both their input and output pins.  Some metric needs to be decided on when prototypes start getting designed.
  3. The different logic gates should be the same dimensions (or compatible dimensions) so that you dont end up with a situation where you are trying to connect two logic gates together but can’t because there is a gap between them. ¬†We’ll have to figure out the metrics when prototypes start getting designed.
  4. If i have a gate which takes two inputs, changing one input value should not change the other input value. For example, if in the OR gate, the input pins were rigidly attached to the cross bar, pushing up a pin on either side would cause the other input pin to also change from 0 to 1.  Gates will have to be designed so that this is true.
  5. It would be nice if the gates were set up such that you could see their internals working. This is just for fun to be able to see the computations at work. ¬†We’ll see if we can make this happen when making prototypes.
  6. The gates should be as simple as possible, and manufacturing should be simple. each gate should be made of similar basic building blocks with as simple design and as few moving parts as possible. ¬†We’ll have to keep the design simple when making prototypes. I want to stay away from strings, springs, wheels etc. Nothing that wears out quickly or has any significant fail rate.
  7. When you push a pin up that results in a cascading change across several other gates, friction / resistance shouldn’t be so hard that it gets unreasonably hard to push a pin up, or cause damage to the gates themselves when they are used reasonably. ¬†Hopefully this will be true if we use a light but strong material.
  8. It would be nice if gates liked to rest at whatever setting they were at (1 or 0), as opposed to easily sliding between values (0.8 for instance), or some gates having a tendancy to want to rest at zero instead of one. ¬†I’m not sure if we are good enough engineers to make this happen haha… we may just have to say that these mechanical gates have to operate on their sides or something unfortunately…We’ll have to see!
  9. Changing the input pins should be action enough to change the output pins of each gate. What i mean is that if you had one of your OR inputs as a 1, making it so the output is 1, when you pull down the input pin back to 0, the output pin should also move back to 0, and not require any extra action, such as pushing down the output pin manually.  Just something to be thinking about in the design.

NOR and NAND Logic Gates

There are actually a couple other ways to make logic circuits besides using AND, OR and NOT.

There is something called a “Universal Gate”, which is just a logic gate that can be combined with itself in various ways to make the functionality of AND, OR and NOT (and thus others such as XOR, XNOR, and everything else).

The two universal gates are NOR and NAND. Here are the truth tables:

0 NOR 0 = 1
0 NOR 1 = 0
1 NOR 0 = 0
1 NOR 1 = 0

0 NAND 0 = 1
0 NAND 1 = 1
1 NAND 0 = 1
1 NAND 1 = 0

For more information on how these gates are universal, check out these Wikipedia pages!

http://en.wikipedia.org/wiki/NOR_logic

http://en.wikipedia.org/wiki/NAND_logic

While figuring out how to make a NOR or NAND mechanically would solve my problem of not having a working design for the AND gate, it goes against one of my design goals, which is to be able to give someone AND, OR and NOT building blocks to put together and so learn logic circuits physically the same way they are shown in text books and online.

One possibility would be to take NANDs or NORs and put them together to make AND, OR and NOT gates that came pre-assembled, but that goes against my design goal of keeping things as simple as possible with as few moving parts as possible. It would also add needless visual complexity to the computations, which might LOOK NEAT, but would just add confusion to what was going on.

I haven’t yet decided on a solution to these issues.

Manufacture

For manufacturing, I wanted to work with my friend Eric to get some 3d models made up and get them printed up into 3d objects by http://www.shapeways.com/.

I’m sure it’ll take multiple iterations to get it right, but this seems like a pretty nice solution.

After we have proven our idea on a small scale, maybe in the future we’ll be able to do something like get a kick starter project together to make some kind of mechanical logic gate set you can buy, or some sort of board or puzzle game.

Next Steps

For the next step, I want to think about our logic gate options a bit more, talk to Eric about them, and try and get some prototypes modeled and printed out.

I’ll post an update when we make some progress (:

Comments

comments

B.A.M. Neural Networks

Neural networks (officially “Artificial Neural Networks”) are computer simulations of neurons. ¬†Simulating neurons in software allows programs to do things that you would normally need a human brain to do, such as recognizing patterns, learning over time, or making non-obvious decisions based on complex data.

Simulating neurons is not enough to create human levels of intelligence however.  Last I heard, someone could make toddler level intelligence via neural networks, but even that is somewhat misleading since toddlers can walk, use vocal chords, swim and understand complex emotions, while the neural network could do none of those things.

Despite the limitations of neural networks, there are quite a number of practical uses of artificial neural networks in the real world. ¬†These uses include…

  • Helping missiles identify enemy tanks or combatants on the battlefield
  • Helping to predict stock market trends (It’s rumored that several top traders have proprietary neural networks which help them preform better)
  • OCR (turning scanned images into text documents based on the text in the image)
  • General machine vision (like, for robots or security systems)
  • Controlling complex machinery at speeds a human wouldn’t be able to keep up with
  • Facial recognition in computers
  • Diagnosing medical conditions

I was reading an article in Scientific American recently about how a girl in high school trained a neural network to recognize certain types of cancer with 99% accuracy.   She trained a neural network to analyze the results of a non-invasive cancer test which up til then had been too unreliable to use in any realistic situation.  Her neural network learned some hidden pattern in the data that we have not yet discovered or understood.

Just like in that example, you can feed a network complex data for it to look for patterns in, but unfortunately it won’t be able to explain to you what it learned, or what it looks for when trying to recognize patterns. ¬†It can learn, but it can’t tell you what it learned.

My friend Doug often tells a funny story where this didn’t work out so well. ¬†I’m not sure if it’s true exactly as told or not, but it definitely is plausible. ¬†Apparently the US army for whatever reason was training a neural network to recognize tanks in the battlefield (surely for a missile or perhaps some kind of recon drone).

They fed the network hundreds or thousands of photos which either contained a tank or did not. ¬†While they fed each picture to the neural network, they also told it “yes this photo contains a tank” or “no, this photo does not contain a tank” so that it was able to infer what it was that people were trying to teach it.

The network learned well, and with their sample photos, it was getting a very high rate of correct answers about whether a tank was there or not.

However, when they deployed this to the battle field (or perhaps, just a realistic live test run), it failed miserably and was getting near 0% accuracy.

Since the network wasn’t able to explain it’s thought process, the people involved were forced to try and deduce what the problem was and eventually they noticed a pattern….

The photoshoots apparently happened on 2 seperate days… on the first day they took pictures of tanks, and the second day they took pictures without tanks. ¬†The unfortunate truth is that it was overcast on the first day, but clear skies and sunny on the second, which means…. the neural network learned to distinguish between a sunny day and an overcast day, not whether nor not there were tanks present!

A pretty funny story, but it shows the importance of giving good, realistic data to your network to learn from, or else you can run into silly problems like that!

Types of Neural Networks

There are a lot of different types of neural networks that have different properties and so thus are useful in different situations.

Some of the main types of flavors are…

  • Supervised learning – Just like in the tank example above, you give information to the neural network to learn, and you also tell it the information it should learn from each peice of data (ie… here’s a picture, it DOES contain a tank).
  • Unsupervised learning ¬†– These work by finding natural groupings of input data. ¬†You can then look at the groupings (or ask it to group more data) and can gain information about the nature of the data itself. ¬†This is often used for data mining, having the neural network pick out interesting correlations in the data that a human might not figure out.
  • Some networks are static once they are created and are unable to learn further.
  • Other networks are able to continuously learn as they get more and more data.

Local Minima vs Global Minimum

Just like a human brain, a neural network is not infallible. ¬†It can often think that is has found an answer, or something of interest, when in fact it hasn’t.

Similarly, we as humans can sometimes think we’ve found a solution to a problem, and then someone comes along and says “You forgot to consider this part of it!” and suddenly you realize your solution is not the right answer and it’s back to the drawing board.

A neural network can have the same issue believe it or not. ¬†This can come from the fact that it wasn’t provided with good enough data to learn from, or, just because! ¬†Just like humans, it can either just learn something wrong, or be incorrect about an answer.

If you think of a problem space as a graph, you can think of the lowest point on the graph being the optimal solution.  How neural networks work is by starting at some point on the graph (often chosen randomly) and then traveling downhill til it finds the bottom of a dip.

This works great if you happen to find the lowest dip in the graph (also called the global minimum), but if you find a dip that isn’t the lowest, you have effectively become trapped in a local minima, and end up with the wrong answer, or an imperfect understanding of the problem space.

minima

There are ways of dealing with this problem luckily.  One way is that if you find a minima, remember where it was at, but then choose another random point on the graph and try again to see if you find a deeper minima.  Rinse and repeat until you are reasonably satisfied with the results.

Have you ever been stuck trying to figure something out and forgotten about it (or went to sleep), only to come back to it later and find the answer. ¬†I personally attribute that phenomenon to this “randomization” effect. ¬†I don’t know the results of studies to this effect, but if you stop thinking about something for a while, then come back to it, you often see it from a different angle (essentially starting at a different spot on the problem space graph) and can sometimes figure out a better solution, or a deeper understanding of the problem (pun intended!)

Anyways, for normal problem spaces, they probably aren’t going to be just 2d like the above, but are perhaps 3d, 4d, or even higher dimensions. ¬†In the end though, the neural network still is just trying to find the deepest valley it can find by essentially traveling downhill.

Now that you know the basics of neural networks, let’s get onto the implementation of one type of neural network!

Bidirectional Associative Memory (AKA BAM)

Bidrectional associative memory is perhaps the easiest useful neural network to create.  All you need is the ability to multiply vectors by other vectors, multiply vectors by matrices, and add matrices together.  If you know how to do those 3 things, you will be able to program your own neural network very quickly and easily.

In fact, I learned about this guy when I was 14 or so, and was able to implement a simple OCR system using microsoft excel (seriously!) ūüėõ

The main point of BAM is to act as memory, where you can teach it to associate several patterns together. ¬†This way, if you teach it to associate a pattern A with a pattern B, when you give it A again, it will spit out B. ¬†Since it’s bidirectional, you can also give it pattern B and it will spit out pattern A in response. ¬†You can teach it several pattern pairs to associate, and can even corrupt some of the data, and it will give you what it thinks is the best match for the data you gave it. ¬†Just like a human brain, it sort of uses it’s best judgement and can say “umm… i think you meant pattern D but I’m not quite sure”.

Besides associating different patterns, you can also associate patterns with themselves (such as associating pattern A with pattern A and pattern B with pattern B). ¬†If you do this, you are able to put in possibly corrupted data and it will give you what it thinks the data really is. ¬†This way, if you have data that is noisy because it came over the radio, or because you scanned a document with a low quality scanner, it will be able to see through the noise and pick out the correct data (hopefully) in the same way a human could. ¬†There are limits to this of course though, just like sometimes we can’t make out messy handwriting sometimes.

While BAM is useful, even in some realistic uses of neural networks, it also is a little bit limited compared to more sophisticated neural network implementations.

  • BAM is fairly limited in how many patterns you can teach it
  • You have to teach it in advance via supervised learning. ¬†No further learning happens after it’s created.
  • It’s fairly strict in it’s mapping from input to output. ¬†This means if you use it to recognize written or typed letters, it will be thrown off by variations in handwriting or different fonts.

That being said, it’s still pretty cool, and lots of fun to play with.

Creating a BAM Network

Creating a BAM network is pretty straight forward.  It has M input bits (you decide how many that is) and N output bits (again, you decide how many that is).

Once you have all of your input / output data pairs, the first step is to convert all the zeros in your pattern pairs to -1’s. ¬† Where 0’s and 1’s is called binary,¬†this form of -1’s and 1’s is called “unipolar”.

Then, for each pattern you multiply the input pattern vector, by the transpose of the output pattern vector (turn it on it’s side) so that when you multiply them together, you get a matrix that is MxN in size.

Continue this for each data pair so that you end up with one matrix per data pair.

Then, add all the matrices together so that you end up with a final matrix.  This is your trained neural network!

In the BAM neural network, the neural topology is that there are M input neurons and N output neurons, with no neurons¬†in between. ¬† ¬†The more neurons you have in your network, the more data the neural network is able to store, and the more distinctions between different types of data it’s able to make. ¬†The fact that there are only 2 layers of neurons in BAM is part of the reason for it’s limitations.

For more information about why 2 layers of neurons are limited in their learning, I¬†recommend¬†searching for information on the perceptron xor problem and linear¬†separability. ¬†A 2 layer network is¬†inherently¬†incapable of preforming (and perhaps “understanding”) xor!

Using a BAM Network

To use a neural network, you take an input vector (in binary) of size M and multiply it by the matrix. This will result in a vector of size N that it made up numbers which may be positive, negative, or zero. Convert all positive numbers to 1 and all negative numbers to 0, and you’ll end up with the N sized output pattern that the neural network associated with.

Dealing with zeros is sort of up to your own discretion unfortunately. I’ve seen some people say that zeros should be treated as 1’s, other people say that zeros should be treated as 0’s, and other people have other rules such as “if it’s zero, set it to whatever that bit output the last time you had it output something” which IMO is a pretty odd way to deal with it. ¬†I think this is an unfortunate flaw in how the BAM network works, but you can also chalk this up to the network being uncertain of the result, which it basically is.

Since BAM networks are bidirectional, you can also take the transpose of your matrix (turn it on it’s side) and then multiply it by a vector sized N to get a vector of size M as output, which is the vector associated with your N sized input vector. ¬†So, it works both ways; you can put in an input pattern and get an output pattern out, or you can put in an output pattern and get an input pattern back.

Don’t forget… you can also associate patterns with themselves if you want¬†it to do “pattern recognition” instead of “pattern association”.

Example

Let’s say we want to have an input size of 6 bits and an output size of 4 bits and that we have these 2 data pairs that we want to associate in the neural network:

  1. 101011 <-> 0010
  2. 110010 <-> 0101

The first step is to convert all zeros to -1’s. ¬†Doing so our data pairs become this:

  1. 1 -1 1 -1 1 1 <-> -1 -1 1 -1
  2. 1 1 -1 -1 1 -1 <-> -1 1 -1 1

The next step is to multiply the input patterns with the output patterns to make a 6×3 matrix for each pattern pair.

First Pair:
1 * [-1 -1 1 -1]
-1
1
-1
1
1

=

-1 -1 1 -1
1 1 -1 1
-1 -1 1 -1
1 1 -1 1
-1 -1 1 -1
-1 -1 1 -1

Second Pair:
1 * [-1 1 -1 1]
1
-1
-1
1
-1

=

-1 1 -1 1
-1 1 -1 1
1 -1 1 -1
1 -1 1 -1
-1 1 -1 1
1 -1 1 -1

Next up, you add all the matrices together to get the final trained neural network:

-1 -1 1 -1
1 1 -1 1
-1 -1 1 -1
1 1 -1 1
-1 -1 1 -1
-1 -1 1 -1

+

-1 1 -1 1
-1 1 -1 1
1 -1 1 -1
1 -1 1 -1
-1 1 -1 1
1 -1 1 -1

=

-2 0 0 0
0 2 -2 2
0 -2 2 -2
2 0 0 0
-2 0 0 0
0 -2 2 -2

Now that we have our trained neural network, let’s plug in our first input pattern to make sure we get our first output pattern

-2 0 0 0
0 2 -2 2
0 -2 2 -2
2 0 0 0
-2 0 0 0
0 -2 2 -2

*

1
0
1
0
1
1

=

-4 -4 4 -4

When we convert negatives to zeros, and positives to ones we get:

0 0 1 0

Which is the first output pattern.  It recalled our pattern correctly!

Next, let’s put the second output pattern into the transposed matrix to see if we can go the opposite direction and recall the second input pattern.

-2 0 0 2 -2 0
0 2 -2 0 0 -2
0 -2 2 0 0 2
0 2 -2 0 0 2

*

0
1
0
1

=

0 4 -4 0 0 0

Converting that to binary by turning negatives into zeros, and non negatives into ones, and zeros into question marks we get:

? 1 0 ? ? ?

The pattern it was supposed to recall is:

1 1 0 0 1 0

The two bits that it did recall are correct, but as you can see it only recalled 2 of the 6 bits. Not very good!

With just two patterns, the network was unable to recall some of the info it was trained with.

Normally BAM isn’t this bad, it looks like I just chose some unfortunate input / output pairs. If you encounter problems with a network recalling data, sometimes adding more neurons (larger input or output patterns) can help, but sometimes that will be ineffective too. Like i mentioned earlier, a neural network that has only 2 neuron layers – like BAM does – is incapable of learning XOR, no matter how many input or output neurons you have, so these types of networks are somewhat limited.

Using for OCR

If you wanted to use a BAM network for being able to recognize drawn or written letters, one way to do so would be to say “we are going to store our letters in an 8×8 black and white grid”.

That means that you have a grid of binary (black / white) that is 8×8. ¬†Another way to represent the grid of binary would be just to have 64 bits in a row.

So, for the letters you want to train the network to recognize, you would just draw out your letters in an 8×8 grid, take each letter as it’s 64 bits, and associate each letter with itself.

Your network matrix will be 64×64 but will be able to do simple OCR on 8×8 black and white images.

Often times, images will come to you in color, or not in 8×8 resolution, but what neural network engineers often do in this situation is they will process the images in advance to make them black and white, and 8×8, before feeding them into the neural network.

Now, you are able to feed characters into your neural network and it will attempt to correct any corruption in the image, and return to you an 8×8 image of what it think you entered.

Instead of associating a letter’s image with itself, you can also associate it with a number (say, the ascii code?) so that when you put in the image of a character, it will spit out the number¬†corresponding¬†to the closest match it can find instead of the raw character image itself.

That’s All!

BAM is a nice introductory neural network that nearly anyone can implement.  It may be limited in some ways, but it actually is used in the real world for some applications.

In the future I’ll write about some more advanced neural networks, but until then, I hope you found this informative, or at least interesting! (:

Comments

comments

How to Render the Mandelbrot Set

mandelbrot set

The Mandelbrot set is a beautiful creation of mathematics discovered by a French-American mathematician named Benoit Mandelbrot. it is also a fractal, meaning that it’s infinitely detailed and that it’s self-similar and made up of smaller versions of itself.

Wikipedia does a better job of explaining the history and the background more than I do so please check out this link for more info!

Mandelbrot Set

I also wrote an HTML5 powered mandelbrot set viewer that you can use to explore the fractal and manipulate colors to create your own mathematical works of art.

HTML5 Mandelbrot Explorer

In this article I’m going to explain how to render a Mandelbrot set yourself. ¬†It’s going to be from a programming slant more than a mathematical slant so if you want the raw unadulterated math, I¬†recommend¬†checking out¬†Wikipedia¬†or other sources.

Rendering the Mandelbrot Set

The first step is to have some way to draw individual pixels on a canvas of some sort. It doesn’t matter what method you go with, it just matters that you are able to draw pixels somehow.

Various ways include:

  • direct pixel accesss in HTML5 (what my Mandelbrot Explorer uses)
  • drawing pixels into an image file
  • using DirectX or OpenGL to render it to a 2d screen buffer.
  • using graph paper, a calculator and some colored pencils to create it by hand (Possible, but ouch! Send me a picture if you actually do this! hehe)

Viewport

Now that you have a rectangle that you are able to render pixels to, we need to define a viewport.

In the Mandelbrot Set image at the top of this article, my rectangle is about 500×500 pixels big, and the x axis ranges from -2.5 to 2.5 and the y axis also ranges from -2.5 to 2.5.

I like to define my viewport in terms of the center point, and the width and height, so our viewports parameters are a center point of (0,0) and width and height of 5.

We have now established the parameters of our viewport!

We now need to iterate through each pixel in our rectangle and do the following steps for each…

Pixel Space to Viewport Space

The first thing we need to do is convert from pixel coordinates to viewport coordinates.

How we do that is like this:
ViewportX = ViewportMinX + (PixelX / PixelWidth) * ViewportWidth
ViewportY = ViewportMinY + (PixelY / PixelHeight) * ViewportHeight

After we have converted our pixel’s location from pixel space to viewport space, we are ready to do some math.

The Magical Mandelbrot Function

Like I mentioned earlier, the Mandelbrot set is a work of mathematical art. ¬†The function itself isn’t very complex but it involves imaginary numbers. ¬†To calculate the Mandelbrot set itself, you plug the viewport location of the pixel into the function. ¬†After that, you take the output of the function and plug it back into the input of the function. ¬†You continue this until the output of the function goes above some value (the common value to use is 2. ¬†You’ll see me compare against 4 because I’m comparing squared numbers). ¬†If the output goes above the threshold, it has essentially “escaped”.

There are pixels which may take a very very very large amount of iterations to escape, and as far as I know, they haven’t proven that all values will escape (I could be wrong though), so besides waiting for the function to “escape”, you should also set a maximum iteration count to keep it from iterating forever (or for a very long time).

The number of iterations it took to escape is what you use to set the pixel color for that pixel.

Here is some code (pseudo javascript) to show you the details of that process:

var g_maxIterations = 255; //TODO: set this to however many iterations you want to allow maximum
var currentX;  //TODO: need to set this to the viewport X location of the current pixel
var currentY;  //TODO: need to set this to the viewport Y location of the current pixel

var z = 0;
var zi = 0;
var inset = true;
var numInterations = 0;

var newz;
var newzi;

for(indexIter=0; indexIter<g_maxIterations; ++indexIter)
{
	newz = (z*z)-(zi*zi) + currentX;
	newzi = 2*z*zi + currentY;
	z = newz;
	zi = newzi;

	if(((z*z)+(zi*zi)) > 4)
	{
		inset = false;
		numInterations = indexIter;
		indexIter = g_maxIterations;
	}
}

if (inset)
{
	//we never escaped, this pixel is the default color
	//TODO: render a default color pixel
}
else
{ 
	//we escaped!  numInterations is how many iterations it took
	//TODO: convert iterations to a color and render the pixel
}

Colorizing The Pixel

There are several ways you could turn an iteration count into a color for a pixel. There are some ways listed below, but this is definitely not an exhaustive list! Play around with your own techniques and see what sort of interesting things you can create!

  • Make a maximum iteration time of 255. ¬†Make your output image be an 8bit greyscale (or color palleted) image, taking the iteration count and writing that out raw as the output color.
  • Make several ranges of iteration values (for instance… 0-255, 256-511, 512-767, etc) where you define a full RGB color at each edge of the value ranges. ¬†From there, figure out where your iteration count falls within the value ranges, and do a lerp between the color to the left and the color to the right based on your distance in the specific value range. ¬†This way, you have smoothly blending color gradients and can go well beyond 255 maximum iterations. ¬†I use a variation of this in my HTML5 Mandelbrot Explorer.
  • Use arbitrary math functions to figure out the RGB of each pixel. ¬†Such as R = Iterations % 256, G = (Iterations * 3) % 256, B = (Iterations * 7 + 39) % 256.

After you have the color for your pixel, you are done. Render that pixel, then move to the next until you have rendered them all.

Zooming and Panning (Scrolling)

By virtue of setting up a viewport as a centerpoint and a width and height, and making the code use that information to convert from pixel space to viewport space, we have made it really simple to implement zooming and panning.

To zoom in, just set your viewport width and height to be smaller (like divide them by 2 for instance). ¬† To zoom out, set your viewport width and height to be larger. ¬†You want to make sure and keep the same aspect ratio (width / height) in your viewport as your rendering rectangle to avoid distortion though, so be careful. ¬† OR, you may want the distortion… it’s up to you (:

To pan the screen, or scroll it left, right, up or down, you just change the centerpoint to be more to the left, the right, higher, or lower.

Very simple, but it is really fun to scroll around and zoom in and out on the fractal to discover new and interesting features to share with your friends.

If you zoom in far enough, you might notice that at some point you have vertical or horizontal lines instead of the fractal shape, and that if you zoom in a little bit more, you’ll get a solid color.

You might ask yourself “Hey, I thought he said fractals were infinitely detailed?”

Well they are infinitely detailed, and in theory you could zoom in FOREVER and always see more and more things, but computers themselves don’t have infinite precision (it would take a computer infinitely large to let you zoom in infinitely), and you are just seeing the edge of the precision of your computer.

If you are using a language like C++, you can change your code to use doubles instead of floats to get a little more breathing room, or another option is to use a scientific mathematics library that is capable of a lot more precision than floats or doubles.

That’s All!

That’s really all there is to it, not that complex is it?

As I keep mentioning, I made something that allows you to explore the Mandelbrot Set in your browser.  You can find that here: Mandelbrot Explorer

Anyways, if you have any questions or comments or want to share some screenshots of creations you made, drop me a line or leave a comment in the comments section with a link to your creation for other people to check out too!

Comments

comments

Bias And Gain Are Your Friend

Often times in game development, you have situations where you want an object to move from one place to another, you want something to grow or shrink from one size to another, you want a color to change from A to B, or any other one of the myriad tasks where you want to do something from A to B over time (or over distance).

That’s pretty abstract but let’s take some examples:

  1. You want to move a camera along a straight line from A to B
  2. You want to raise the lighting from dark to bright in a room
  3. When the player clicks an icon, you want to grow a window from small to big
  4. You want to cross fade one skeletal animation to another via the blend weights of the animations (an example from my Anatomy of a Skeletal Animation System articles)
  5. You want to use a gradient in a shader for some effect.

When you are doing these things, it’s real easy to take a percent based on time or distance and just use that percent raw to make a linear effect. ¬† Often times a linear effect just isn’t good enough though because it looks or feels mechanical instead of organic, and unpolished.

Often times the way these things are softened and made more organic is by giving a content creator a curve editor so that they can soften the edges, speed up or slow down the processes over time or distance.

Many game engines don’t come with curve editors that can be easily used for these purposes, and other times you just want to deal with it in code for one reason or another, so don’t have the luxury of giving a content creator carte blanche with a curve editor.

There are a couple techniques for handling these situations but I want to talk to you about 2 of my favorite techniques, which are Ken Perlin’s bias and gain functions. ¬†I actually use Christophe Schlick’s faster¬†approximation¬†functions (as seen in game programming gems 2), but the end result is the same thing.

If you want to skip ahead and see these things in action, I made an interactive demonstration about these functions, check em out! HTML5 Bias and gain

Bias – Not as in bigotry

The bias function takes in a number between 0 and 1 as input (I like to think of this as the percent) and also takes a number between 0 and 1 as the “tuning parameter” which defines how the function will bend your curve.

With a value of 0.5, the percent you put in is the percent you get out (so is linear), but if you put in a number > 0.5 or < 0.5, that’s when the interesting things happen.

Shown here are graphs of the bias function with parameters of 0.5, 0.25, 0.75 and 0.97:

Bias 0.5 Bias 0.25Bias 0.75Bias 0.97

In javascript, the code for bias looks like this:


function GetBias(time,bias)
{
  return (time / ((((1.0/bias) - 2.0)*(1.0 - time))+1.0));
}

Gain – Not as in my weight during the holidays

The gain function is like bias in that it takes in both a 0 to 1 input (I think of this as the percent as well) and also takes a number between 0 and 1 as the “tuning parameter”.

Again, with a value of 0.5, the percent you put in is the percent you get out (again, this makes it linear) but if you put in other numbers, you get interesting curves.

Here are graphs of the gain function with the same parameters of 0.5, 0.25, 0.75 and 0.97:

gain 0.5Gain 0.25Gain 0.75Gain 0.97

In javascript, the code for gain looks like the below. You might notice it makes use of the GetBias function. Gain is just bias and reflected bias.


function GetGain(time,gain)
{
  if(time < 0.5)     return GetBias(time * 2.0,gain)/2.0;   else     return GetBias(time * 2.0 - 1.0,1.0 - gain)/2.0 + 0.5; }

That's It!

Well that's about it, pretty straightforward stuff. Wherever you find yourself using a percent in your code, you can try passing it through a bias / gain function (and optionally exposing the tuning parameter to content creators) and see if you can make things feel a little more organic and polished.

Sometimes its the little things that make the biggest difference!

One again, the link to the interactive example of these things is at:
HTML5 Bias and Gain

Comments

comments

Anatomy of a Skeletal Animation System Part 3

This is part three of “Anatomy of a Skeletal Animation System”

Animation System Optimizations and Features

Here are some various animation system optimizations and techniques that you might find useful…

Multithreaded Animation Blending

If you are even mildly comfortable writing multithreaded code, this one is fairly easy to implement.

Basically every animated model that needs an update goes into a queue every frame. ¬†(Things that haven’t been on screen for a little while could be exempt from the list so you don’t waste time on things that aren’t being rendered)

At some point in your main loop, you do the animation sampling / anim blend tree blending / etc work to come up with the final bone group. You do this by grabbing the first model in the queue, processing it, then moving to the next model.

Your main loop doesn’t continue until all of the models have been processed.

Now, imagine that you had other worker threads also grabbing models from the queue and processing them, and that the main thread will wait to continue the main loop until the queue was empty and all models had been processed.

TA-DA! You are done and have multithreaded animation blending. It can help A LOT, depending on how many hardware threads you have available for helping work.

Bias / Gain Curves in Anim Blends

With normal animation blending, it’s a linear crossfade from one animation to another.

Sometimes, an animator can make things look nicer if they have the option of doing non linear crossfading.   One nice option for doing this is exposing a bias and gain parameter to the blend in / out parameters.

Bias and gain are great ways of letting content creators create non linear curves for a variety of uses. ¬†Ken Perlin did a lot of work in this area, but in “Game Programming Gems 2”, a guy named Cristophe Schlick presented some simplified, quick equations to calculate¬†approximations¬†of bias and gain.

I highly recommend checking that out and using them for this, and everything else in your game. Using bias and gain you can do things like have your camera move from point A to point B, but start out fast and slow down as it gets closer to B, giving it a nice organic feel to it, instead of a rigid lerp.  With bias and gain you pass in a % and get out a different %.  Real simple to use and extremely useful in every part of your game just about.

Here’s an interactive demonstration of the bias/gain functions I made. The source code for the functions are there too:
HTML5 Bias and Gain

Round Robin Anim Evaluation

There are some situations when you don’t need every model to have perfectly up to date animation data every single frame. One example of this is if you are simulating the game world on a server, where skeletal animation data doesn’t need to be perfectly up to date since network latency already makes it somewhat innacurate.

In these cases, one thing you could do is split the list of models you need to update into perhaps 4 different lists. Then, each frame, you only process one of the 4 lists, thus reducing your animation CPU load down to 25% of what it was. Quick and easy way to save some real CPU time quickly if you don’t need the most up to date animation data all the time.

Pose Sharing

Sometimes you have a lot of different models where many of the models are preforming the same animations – such as if you have a crowd of people in a crowded area.

One way to deal with this is to let some of the people doing the same animations SHARE their computed animation data.

If you are in a crowd, and there’s lots of different looking people walking all sorts of different directions, you aren’t going to easily notice that there are people who are using the exact same bone data, but facing different directions.

Going this route, if you have a group of 4 let’s say that all share the same bone data, you only need to calculate it for one person, and the rest of the group uses the data already calculated.

Less animations to sample and blend so you gain some CPU back.

Skeleton LODing

As things get farther away, or smaller, the smaller details are less noticeable. Because of this, you can “remove” bones from a skeleton as a model is farther away. I mentioned this briefly with facial animations, but the same is true of arm bones, leg bones, hand bones, etc.

You just have to make sure your anim system is able to handle LODing out bones gracefully (no popping) and efficiently (no excessive processing to get a lower LOD skeleton, it should just be a flag on the bones or something).

Runtime Debugging Essentials

Here are some debugging tools that I’ve found essential in debugging day to day animation bugs (popping, twitching, incorrect animations, etc).

Real Time Info On Screen

You really need the ability to show some kind of status on screen for a specified model. The info should show what animations are playing on which animation controllers, the current time of the animation controller, the playback rate of the controller, the state of the state machine, etc.

Using this, when you see a pop, you might see that for a fraction of a second, that an animation switches from one animation to another, then back to the first. From there you can go on debugging it further.

Timeline Log

Sometimes it’s useful to be able to turn on animation logging for a specified model. This way, you can generally log more info than you can on the screen in real time, and can also take your sweet time looking at very small intervals of time to see what went wrong and why.

Very useful.

Show the Bones

Sometimes you really just need to be able to look at the skeleton to see an issue more clearly, or be able to determine if the problem is with a model or the animation data.

Having a way to turn on bone rendering such that it draws 2d (unprojected) lines on the screen showing the bones of a specified model is very useful. Also sometimes it’s nice to be able to see the bones of all the animation data that went into the final blended pose, instead of just seeing the final blended pose.

Control Time Itself!

Lastly, sometimes it’s really useful to be able to slow down time to see a problem in greater detail. Rarely, it’s also useful to be able to speed up time. Having the ability to do both while the game running can be a really big help.

That’s All She Wrote

That, and MURDER I mean.

I hope you enjoyed these articles on the anatomy of a skeletal animation system. Drop me a line or post a comment if you have any questions or comments (:

Comments

comments