The Black Art of sizeof() aka Compile Time Type Deduction

Over the past couple of decades of C++ programming, I’ve found sizeof useful, if relatively boring: handy for making things like memcpy and fread usable in a safe way. Little did I know that there were dark secrets lurking just below the surface, and that to truly understand sizeof is to dangle precariously above the black pit of infinity and see the secrets of creation laid bare.

I might be dramatizing things a little bit, but not by much (;

Sizeof Basics

Let’s start with the basics before working up to the really interesting stuff:

#include <stdio.h>

int main(void)
{
	// sizeof yields a size_t, which printf formats with %zu
	printf("size = %zu",sizeof(int));
}

On a typical platform where int is 4 bytes, running the above gives the output “size = 4”.

Luckily, instead of always having to give a type as a parameter to sizeof, we can also give a variable name. This saves us typing (and reduces potential bugs) when we change the type of a variable. For instance the below gives the same output.

#include <stdio.h>

int main(void)
{
	int index = 3;
	// sizeof works on a variable too and reports the size of its type
	printf("size = %zu",sizeof(index));
}

What’s great about the above is you can change what type index is defined as, and you don’t have to update the sizeof call. It knows what type index is and will always print out the right size for you. The fewer things you have to do manually or remember, the easier it is to maintain code and the fewer bugs you are likely to have.
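To make that concrete, here’s a minimal sketch of the memcpy/memset style of use mentioned at the top. The Record type, clearRecord, g_record and g_table are names I made up for illustration, and the static_assert lines assume a C++11 compiler.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical record type, used only for illustration.
struct Record
{
	int id;
	float weights[4];
};

// Clears a record in place. sizeof(record) follows the variable's type,
// so this line stays correct if Record gains or loses members.
inline void clearRecord(Record &record)
{
	std::memset(&record, 0, sizeof(record));
}

Record g_record;
int g_table[8];

// sizeof on a variable matches sizeof on its type.
static_assert(sizeof(g_record) == sizeof(Record), "variable and type agree");

// The classic element-count idiom is built from two sizeof-on-variable uses.
static_assert(sizeof(g_table) / sizeof(g_table[0]) == 8, "g_table has 8 elements");
```

If g_table were later changed to hold a different element type, neither of those lines would need to be touched.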

Ok next up… sizeof happens at compile time. It isn’t a function call; it does its work at compile time and has no runtime cost. The code below proves that, because enum values must be compile-time constants, and it gives the same output:

#include <stdio.h>

int index = 3;

enum
{
	c_sizeOfIndex = sizeof(index)
};

int main(void)
{
	printf("size = %i",c_sizeOfIndex);
}
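
If you have a C++11 compiler, here’s another small sketch of the same idea. g_value, g_scratch, SizeTag and ValueSizeTag are hypothetical names of mine. An array bound, a template argument and a static_assert condition all have to be compile-time constants, so the fact that this compiles at all shows the size is known before the program ever runs.

```cpp
#include <cstddef>

int g_value = 3;

// An array bound must be a compile-time constant.
char g_scratch[sizeof(g_value)];

// So must a non-type template argument.
template <std::size_t N>
struct SizeTag
{
	static const std::size_t value = N;
};
typedef SizeTag<sizeof(g_value)> ValueSizeTag;

// And so must the condition of a static_assert.
static_assert(sizeof(g_value) == sizeof(int), "sizeof follows the declared type");
static_assert(ValueSizeTag::value == sizeof(g_scratch), "both were sized from g_value");
```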

A Little More Interesting

Did you know that you can “call functions” inside of sizeof? Check out the below which gives the same output again:

#include <stdio.h>

int myFunc()
{
	return 3;
}

enum
{
	c_sizeOf = sizeof(myFunc())
};

int main(void)
{
	printf("size = %i",c_sizeOf);
}

But wait a second… if sizeof happens at compile time, how is it able to call a function? Does it evaluate the function at compile time somehow? Does it just do a sizeof on the function pointer?

What happens is that the compiler never actually evaluates anything you pass to sizeof. It just looks at what’s inside, figures out what type the expression has, and gives you the size of that type.

In the example above, it’s just looking at the return type of myFunc and giving you the sizeof that. That’s all 😛

To prove that it doesn’t actually execute the function, check this program out, which gives the same output and does not crash!

#include <stdio.h>

int myFunc()
{
	// this is safe, right?
	((int *)0)[0] = 0;

	// this is too right?
	int a = 0;
	int b = 3;
	int c = b / a;

	// this text never gets printed out
	printf("sup dawg, i heard you liked...");

	// lastly, infinite recursion..  i feel like i'm trying to kill rasputin...
	return 3 + myFunc();
}

enum
{
	c_sizeOf = sizeof(myFunc())
};

int main(void)
{
	printf("size = %i",c_sizeOf);
}
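
Here’s a smaller way to see the same thing, as a sketch that assumes C++14 or later (so a constexpr function can have a local variable). counterAfterSizeof is a name I made up. The increment sits inside sizeof, so it never runs, and the compiler itself can confirm the counter is still zero.

```cpp
// Assumes C++14 or later, where a constexpr function may declare local variables.
constexpr int counterAfterSizeof()
{
	int counter = 0;

	// The operand is only inspected for its type (int), so the increment never runs.
	// Some compilers warn about a side effect in an unevaluated operand here.
	(void)sizeof(counter++);

	return counter;
}

static_assert(counterAfterSizeof() == 0, "the increment inside sizeof never happened");
```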

Here’s one more interesting example, to show that sizeof has all the power the compiler has to figure out data types at compile time, without actually evaluating any of the code. Reading the code below, you can see that the expression inside sizeof is a comparison, so its type is bool. When you run the program, it reports sizeof(bool), which is 1 on my machine.

#include <stdio.h>

int myFunc(int someNumber)
{
	// this is safe, right?
	((int *)0)[0] = 0;

	// this is too right?
	int a = 0;
	int b = 3;
	int c = b / a;

	// this text never gets printed out
	printf("sup dawg, i heard you liked...");

	// lastly, infinite recursion..  i feel like i'm trying to kill rasputin...
	return someNumber + myFunc(someNumber * 2);
}

int someGlobalVariable = 3;

enum
{
	c_sizeOf = sizeof((myFunc(someGlobalVariable) * someGlobalVariable << 1) == 0)
};

int main(void)
{
	printf("size = %i",c_sizeOf);
}

So basically, we now know that we can pass any arbitrarily complex expression to sizeof (even one that calls functions and uses variables) and the compiler will figure out the resulting type and give us the size of that type.
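
If you’d rather see the deduced type itself than infer it from a size, C++11 offers decltype and std::is_same. This is just a sketch of mine using those standard tools, not something sizeof needs; it reuses the myFunc and someGlobalVariable names from above, with myFunc declared but never defined.

```cpp
#include <type_traits>

// Declaration only: nothing below ever calls it, so no body is needed.
int myFunc(int someNumber);

int someGlobalVariable = 3;

// The function call expression has the function's return type.
static_assert(std::is_same<decltype(myFunc(someGlobalVariable)), int>::value,
	"calling myFunc yields an int");

// The comparison at the end makes the whole expression a bool.
static_assert(std::is_same<decltype((myFunc(someGlobalVariable) * someGlobalVariable << 1) == 0), bool>::value,
	"the full expression is a bool");

// Which is why sizeof reports sizeof(bool) for it.
static_assert(sizeof((myFunc(someGlobalVariable) * someGlobalVariable << 1) == 0) == sizeof(bool),
	"sizeof sees the same type");
```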

True Power Revealed

The real power with sizeof comes in when we start using function overloads. If we have 2 versions of the same function, we can pass some parameters to that function, and the compiler will figure out which function is the best match. If each function has differently sized return types, we can use sizeof to know which one the compiler chose for any given parameters. Even better… we can know this at compile time with absolutely no run time cost.

Check out what I mean:

#include <stdio.h>

int someFunc(int someNumber);
char someFunc(const char *someString);

int myNumber = 3;
const char *myString = "hello!";

enum
{
	c_size1 = sizeof(someFunc(myNumber)),
	c_size2 = sizeof(someFunc(myString))
};

int main(void)
{
	printf("sizes = %i,  %i", c_size1, c_size2);
}

If you run that program on a platform where int is 4 bytes, it prints out “sizes = 4,  1”. We now have a program that can tell at compile time whether a value (or even a variable) is a string or an int. Note that we don’t need to define bodies for our functions, because they are never actually called.
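
One caveat: this relies on the two return types having different sizes. int versus char works on the usual desktop platforms, but the language only guarantees that sizeof(char) is 1. A common way around that is to return references to char arrays of different lengths, whose sizes are fixed by definition. Here’s a sketch of that variation; PickedInt, PickedString, c_pick1 and c_pick2 are names I made up.

```cpp
// sizeof applied to a reference gives the size of the referenced type,
// so these are exactly 1 and 2 on every platform.
typedef char (&PickedInt)[1];
typedef char (&PickedString)[2];

// Declarations only; the functions are never called.
PickedInt    someFunc(int someNumber);
PickedString someFunc(const char *someString);

int myNumber = 3;
const char *myString = "hello!";

enum
{
	c_pick1 = sizeof(someFunc(myNumber)),
	c_pick2 = sizeof(someFunc(myString))
};

static_assert(c_pick1 == 1, "the int overload was chosen");
static_assert(c_pick2 == 2, "the const char * overload was chosen");
```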

What if we only wanted to know whether something was a string or not, without having to write a separate overload (and return type) for every other possible type? Luckily, we can do that very easily.

When the compiler tries to figure out the best matching function for a given function call, if one of the candidates is a variable argument function (a function that has ... as its parameter list), that function will always be the least preferred match. It’s what you get if nothing else matches. In our situation we’re going to use that kind of function effectively as an “else” statement. Check it out:

#include <stdio.h>

char stringTester(const char *someString);
int  stringTester(...);

int myNumber = 3;
const char *myString = "hello!";

int main(void)
{
	// each sizeof comparison is a compile-time constant, so the outcome of each if is already known
	if (sizeof(stringTester(myNumber)) == sizeof(char))
		printf("myNumber is a string\n");
	else
		printf("myNumber is not a string\n");

	if (sizeof(stringTester(myString)) == sizeof(char))
		printf("myString is a string\n");
	else
		printf("myString is not a string\n");
}

Pretty neat right? Running that program will tell you that myNumber is not a string, but myString is a string, just like you already know.

Even though we are using if statements above, it’s still a compile time check and will have no runtime costs, and the if statements should be completely eaten away by the optimizer (check the assembly to see for yourself!)

Let’s clean up that last bit of code to make it more general:

#include <stdio.h>

char stringTester(const char *someString);
int  stringTester(...);

// every line except the last ends in a backslash so the macro spans all of them
#define REPORT_IF_STRING(x) \
	if (sizeof(stringTester(x)) == sizeof(char)) \
		printf(#x " is a string\n"); \
	else \
		printf(#x " is not a string\n");

int myNumber = 3;
const char *myString = "hello!";

int main(void)
{
	REPORT_IF_STRING(myNumber);
	REPORT_IF_STRING(myString);
}

That gives the exact same output, but as you can see, is a bit more generalized… something you could re-use in code easily.

If you look at the final compiled code (in release), you’ll basically just see 2 printf statements, because the rest of it all happened at compile time.
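
If you’d rather have the answer as a constant you can use elsewhere, instead of a macro that prints, one possible sketch wraps the same two overloads in a template. IsString and its make() helper are hypothetical names of mine, and like the macro it assumes sizeof(int) differs from sizeof(char).

```cpp
char stringTester(const char *someString);
int  stringTester(...);

template <typename T>
struct IsString
{
	// Never defined. It only exists to produce an expression of type T
	// inside sizeof, where nothing is evaluated.
	static T &make();

	enum { value = sizeof(stringTester(make())) == sizeof(char) };
};

static_assert(IsString<const char *>::value, "const char * is a string");
static_assert(IsString<char *>::value, "char * converts to const char *");
static_assert(!IsString<int>::value, "int is not a string");
static_assert(!IsString<double>::value, "double is not a string");
```

Because value is an enum constant, it can feed a static_assert, an array size or another template, which the printing macro can’t do.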

Testing whether a variable is a string and printing the result is not exactly something you are likely to need on a daily basis, but there are a lot more useful examples. For instance, what if you wanted to be able to tell whether one object’s type could safely be cast to another object type?

You might be asking “Isn’t that what RTTI and dynamic_cast are for?”. Yep. But what if you could determine this at compile time, so that there was no cost to the dynamic cast, and the code was optimized to the point of not even having an if statement to check the type… it just did the right thing at runtime because it already knew. You’d end up with some pretty fast code!

Here’s how you might do that for a specific class hierarchy:

#include <stdio.h>

class CFruit {};
class CApple : public CFruit {};
class CBanana: public CFruit {};
class CPeach : public CFruit {};

class CVegetable {};
class CCarrot: public CVegetable {};
class CCelery: public CVegetable {};

char FruitTester(const CFruit &);
int  FruitTester(...);

// every line except the last ends in a backslash so the macro spans all of them
#define REPORT_IS_FRUIT(x) \
	if (sizeof(FruitTester(x)) == sizeof(char)) \
		printf(#x " is a fruit\n"); \
	else \
		printf(#x " is not a fruit\n");

CFruit  fruit;
CApple	apple;
CBanana banana;
CPeach  peach;

CVegetable vegetable;
CCarrot    carrot;
CCelery    celery;

int main(void)
{
	REPORT_IS_FRUIT(fruit);
	REPORT_IS_FRUIT(apple);
	REPORT_IS_FRUIT(banana);
	REPORT_IS_FRUIT(peach);
	REPORT_IS_FRUIT(vegetable);
	REPORT_IS_FRUIT(carrot);
	REPORT_IS_FRUIT(celery);
}

Running that program will give you output like the below:

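fruit is a fruit
apple is a fruit
banana is a fruit
peach is a fruit
vegetable is not a fruit
carrot is not a fruit
celery is not a fruit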

Again, if you look at the compiled code in release, you’ll see what is effectively just a series of printf statements, because all the checking of types and inheritance happened at compile time and there was no runtime cost. Check out the disassembly below. It’s just pushing the address of the strings to show onto the stack, and then calling printf to show the string. No if statements, no jumps, nothing at all at runtime other than printing the right string:

[Image: disassembly of the release build]

There is a way to generalize that so that you pass the type in that you want to test against, or other random variations for how it might function in a general case, but I leave that to you to figure out!
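
In case you want something to compare your answer against, here is one way that generalization might look. This is only a sketch: IsConvertible, Yes, No and make() are names I picked, and the static_assert lines assume C++11. Note that it really tests “can a From be implicitly converted to a To”, so it also says yes for things like int to double, not only for derived to base.

```cpp
class CFruit {};
class CApple : public CFruit {};

class CVegetable {};
class CCarrot : public CVegetable {};

template <typename From, typename To>
struct IsConvertible
{
	typedef char Yes;        // size 1
	typedef char (&No)[2];   // size 2

	static Yes test(const To &);  // chosen when From converts to To
	static No  test(...);         // the "else" case
	static From &make();          // never defined, only used inside sizeof

	enum { value = sizeof(test(make())) == sizeof(Yes) };
};

static_assert(IsConvertible<CApple, CFruit>::value, "an apple is a fruit");
static_assert(!IsConvertible<CCarrot, CFruit>::value, "a carrot is not a fruit");
static_assert(!IsConvertible<CFruit, CApple>::value, "a fruit is not necessarily an apple");
```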

Not Quite a Dynamic Cast

Ok so the above is not quite a dynamic cast. It checks whether an upcast is possible, which tells you if an object derives from a specific type, but it can’t work in the opposite direction (base to derived), which is what a dynamic cast can do.

What the above is good for is mainly for situations where you don’t know the object type because it came in as a macro parameter, or as a template parameter, but you still want to do specific logic on it if it’s a certain type, or derives from a certain type. That’s where this stuff comes in useful.
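
As a rough sketch of the template case, the sizeof result can pick between two overloads of a helper at compile time, so the “specific logic” is selected without any runtime branch. BoolTag, Bin, chooseBin and binFor are hypothetical names of mine, and the constexpr and static_assert parts assume C++11.

```cpp
class CFruit {};
class CApple : public CFruit {};
class CCarrot {};

char FruitTester(const CFruit &);
int  FruitTester(...);

// Turns a compile-time bool into a distinct type we can overload on.
template <bool IsFruit>
struct BoolTag {};

enum Bin { kFruitBin, kOtherBin };

constexpr Bin chooseBin(BoolTag<true>)  { return kFruitBin; }
constexpr Bin chooseBin(BoolTag<false>) { return kOtherBin; }

// T is unknown here, but the sizeof test still tells us whether it is a fruit,
// and overload resolution on the tag picks the matching chooseBin.
template <typename T>
constexpr Bin binFor(const T &object)
{
	return chooseBin(BoolTag<sizeof(FruitTester(object)) == sizeof(char)>());
}

static_assert(binFor(CApple()) == kFruitBin, "apples go in the fruit bin");
static_assert(binFor(CCarrot()) == kOtherBin, "carrots do not");
```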

There may be a clever way to do a downcast (dynamic cast) using this functionality, but at this point in time, I’m not sure of how you might do that 😛

Other Uses

Looking online I’ve found some really interesting uses of this stuff beyond what I’ve shown you. Some people have mixed in a few other forms of black magic, including templates, macros and template argument deduction, and have done things as wild as being able to tell whether an object has a specifically named function or member variable.
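
To give a flavor of that, here is a sketch of one commonly seen form of the member-function check. It leans on SFINAE in addition to the sizeof trick, and it only detects a member with exactly the signature std::size_t size() const. HasSizeMethod, Check, WithSize and WithoutSize are names I made up for the example.

```cpp
#include <cstddef>

template <typename T>
struct HasSizeMethod
{
	typedef char Yes;        // size 1
	typedef char (&No)[2];   // size 2

	// Only names a valid type when U has a member "std::size_t size() const".
	template <typename U, std::size_t (U::*)() const>
	struct Check;

	// Viable only if &U::size fits Check; otherwise it is silently discarded.
	template <typename U>
	static Yes test(Check<U, &U::size> *);

	// The fallback "else" overload.
	template <typename U>
	static No test(...);

	enum { value = sizeof(test<T>(0)) == sizeof(Yes) };
};

// Hypothetical types to test against.
struct WithSize
{
	std::size_t size() const;
};

struct WithoutSize
{
};

static_assert(HasSizeMethod<WithSize>::value, "WithSize has size()");
static_assert(!HasSizeMethod<WithoutSize>::value, "WithoutSize does not");
```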

I personally feel like this trick of using sizeof to do compile time type deduction must have a million uses, but that my brain isn’t used to thinking in this way so it’ll probably be a little time before using it becomes more natural (whenever it’s appropriate that is hehe).

Hopefully you guys enjoyed learning about this as much as I did… soon I’ll be writing about another neat trick I came across called SFINAE. Give it a google if you want… it’s another pretty odd thing and I look forward to digging into it to be able to write up another post with some everyday useful examples.

Until next time!


5 comments

  1. This seems like how type inference was done before C++. I’d imagine template specializations would provide the same functionality in a safer/more robust manner.

    • They’re not exclusive but to me it seems like mixing OOP with goto.

      Check the code sample I have in your Permutation Programming Without Maintenance Nightmares article. I’m guessing that the understanding sfinae author wasn’t aware of the stl’s true types, decltype, etc (or his compiler didn’t support it in 2009).

      Don’t get me wrong, sizeof is a nifty trick but with a large part of C++11 being devoted towards type inference it seems like the old standard (meaning reliable, semi-commonly known and arcane). But if you’re going to start using C++11 features like sfinae then you might as well use the other C++11 features as they’re designed to work together.

      • You might be right poday… I gotta watch more of those videos you posted about this stuff. When i learn more, if i realize you are right ill come back and edit these posts or note it in them (:

  2. Poday, I’m starting to see what you are talking about.

    The stuff I’ve seen and talked about so far uses the sizeof trick to determine what function was chosen or which template was chosen due to SFINAE.

    The stuff you are talking about makes it so you just use template specialization instead of if(sizeof(foo)==sizeof(bar)) to branch.

    I’m coming around, that does seem a lot cleaner (:

