Soft Maximum vs Hard Maximum

The other day i stumbled on an interesting concept called a “Soft Maximum”.

If you think of the normal maximum, you might have something like this:

float maxValue = max(valueA, valueB);

if valueA and valueB come from functions, there’s usually going to be a sharp bend in the graph of the above where the maximum value changes from valueA to valueB or vice versa.

Sometimes, instead of a sharp bend, you would like a smooth transition between the two values – like when using this for graphics or advanced mathematics.

Here’s the formula for soft max:

double SoftMaximum(double x, double y)
	double maximum = max(x, y);
	double minimum = min(x, y);
	return maximum + log( 1.0 + exp(minimum - maximum) );

Here are 2 really interesting links on computing and using soft max:

Soft Maximum

How to Compute the Soft Maximum

Check out the images below for an example of when you might use this. This is from a shadertoy shader The Popular Shader. The first image is with using normal max, and the second image uses soft max.



Converting RGB to Grayscale

If you were converting an RGB pixel to grayscale, you might be like me and be tempted to just add the red, green and blue components together and divide by 3 to get the grayscale equivalent of the color.

That’s close, but not quite correct!

Red, green and blue are not equal brightness, so doing a straight average gives you biased results.

There’s a wikipedia page on this topic here, but the equation to use is below:
grayScale = red * 0.3f + green * 0.59f + blue * 0.11f;

Here are some sample images to show you the difference.



Weighted Average Equation:


You might be wondering “why the heck would i want to convert RGB to grayscale?”

Well… if you render a scene once, convert it to grayscale and shove it into the red channel, then render the scene again slightly offset to the side, convert that to grayscale and shove it into the blue channel, you can get some neat images like the below. Red/Blue 3d glasses required, click the images to view the full size versions (;



Converting To and From Polar / Spherical Coordinates Made Easy

As a game developer there is just too much darn stuff to learn. You could spend your entire life learning things and never know it all.

It’s a blessing in that you are seldom bored, but also sometimes a curse in that there almost always is a better way to do something, and that you would know about it if you had spent your time learning X instead of Y ūüėõ

I find that you sort of have to triage what you learn and what you choose to keep fresh in your mind, which can be a challenge sometimes. If you can find the commonalities between things that can help some – like understanding how encryption, hashing, pseudo random number generators and chaos theory all overlap – or how skeletal animation blending and audio synthesis are both trying to be continuous waves above all else. Also, if you put in the time investment to learn something to where it becomes intuitive, that frees up neurons to make room for other stuff. But, of course, we have a finite amount of time, so can’t always spend the time needed to get to that level on every single topic.

How about you… do you have to do a juggling act like this to keep sharp and stay effective as a game (or non-game) programmer? I’d be interested to hear how others deal with this sort of thing with such a large knowledge space that we work in.

In any case, I usually work with spherical or polar coordinates only on rare occasions, so whenever i do, the process usually is to google the equations, drop them in, and move on with my life. I was recently implementing an orbit camera for a raytracer on (Raytraced Refraction) and when my copy/pasting wasn’t working, I was forced to take a deeper look at why it wasn’t working. Amazingly, this time around, it finally clicked and is now an intuitive thing so I figured I’d share the explanation that makes most sense to me in case it helps anyone else.

Converting Polar Coordinates to Cartesian (2D)

Polar coordinates have two components – a distance and an angle – and represent a point in 2d space.

The distance is called the radial coordinate, or the radius and the angle is called the angular coordinate or polar angle.

Just like you probably expect, the angle defines what direction the point is in, and the radius defines how far away it is. Super simple. Below is a picture of a polar coordinate point at (3, 45) where 3 is the distance and 45 is the angle.


So how do we convert that to rectangular coordinates? well, first thing to do is to convert the angle to rectangular coordinates on a unit circle to get a direction vector. Then, you multiply that direction vector by the radius to get the final coordinate.

To convert the angle to a point on a unit circle and get the direction vector it’s super simple…

X = cos(angle)
Y = sin(angle)

For every point on the unit circle, it’s X coordinate is the cosine of the angle, and it’s Y coordinate is the sine of the angle.

Looking at the diagram below, see if you can figure out why arccosine only returns an angle between 0 and 180, and why arcsine only returns an angle between -90 and 90 (hint, what if i asked you to tell me what angle gives 0.7 in the x component). Also see if you can understand why sin(x)^2 + cos(x)^2 = 1 (hint: distance formula).


Ok so now that we can get our direction vector, we just need to multiply it by the radius. So… to convert from polar to rectangular (cartesian) coordinates, you do this:

X = cos(angle) * radius
Y = sin(angle) * radius

Converting Cartesian to Polar Coordinates (2D)

So how do we convert from rectangular coordinates to polar?

Well, since we have the X and the Y coordinates, and we know that tangent(angle) = Y / X, we can use arctangent to get our angle. Unfortunately atan has a similar problem to asin and acos in that it doesn’t know which quadrant you are talking about. For instance, look at the diagram above again and tell me “which angle gives me a value of 1 when i divide the Y component by the X component?”. The answer is 45 degrees and 225 degrees both. This is because with a positive value of 1, we don’t know if X and Y were both negative, or if they were both positive. Similarly, if i asked which angle gave an answer of -1, we wouldn’t know if it was the X or Y that was negative.

Instead of using atan, we want to use atan2, which takes 2 parameters – Y and X – so that it can figure out the correct angle for you.

Next is the easy part of finding the radius. treating your point as a vector (or continuing to treat it like a vector if it IS a vector), the radius is just the magnitude of the vector (or distance from the origin if you want to think of it in “point” terms instead of vectors).

So, converting rectangular to polar coordinates is done like this:

radius = sqrt(X * X + Y * Y)
angle = atan2(Y, X)

Converting Spherical Coordinates to Cartesian (3D)

Spherical coordinates have the same components as polar coordinates, but then an added component: an angle which determines pitch / vertical rotation (think: looking up and looking down, instead of the polar angle which is in charge of looking left and right).

In math, they usually call the radius rho, the polar angle theta, and the azimuth angle phi, so a formal polar coordinate looks like this:

(rho, theta, phi)

For our examples let’s assume that X and Y make up the horizontal plane and that Z is the vertical (3d) axis.

If you are scared to make the jump from 2D polar coordinates to 3D spherical coordinates don’t be! The way to deal with these guys is to break the 3d problem into two 2d problems, using the exact same stuff as described above.

So, the first thing we want to do is completely ignore this new 3rd component phi and think back to our 2d case. We are also going to ignore the radius for now.

XTheta = cos(theta)
YTheta = sin(theta)

This is our direction vector on the horizontal plane (same as the 2d case, not accounting for radius yet).

Next we want to pretend like we are looking at our 3d world from the side and use our phi angle to convert from polar to rectangular coordinates:

XPhi = cos(phi)
YPhi = sin(phi)

One way to think of what this other angle phi means, is that it is controlling where in the unit sphere the theta circle sits. As the theta circle gets higher or lower on the sphere, it shrinks or grows. It’s only at zero angle that it has a radius of 1.0. So, calculating these values, YPhi represents how high on the sphere the theta circle should sit, and XPhi is how large the circle should be.

So, to combine the X,Y theta values and the X,Y Phi values, we use YPhi as the vertical component, and XPhi as a radius for the theta circle, which we can do like this:

X = XTheta * XPhi
Y = YTheta * XPhi
Z = YPhi

The above equation will give us a point on a unit sphere, so from here, we need to multiply in the radius and our equation becomes:

X = XTheta * XPhi * radius
Y = YTheta * XPhi * radius
Z = YPhi * radius

If we put sin and cos back in, instead of xtheta (etc), the equation becomes that familiar, and previously complex equation:

X = cos(theta) * cos(phi) * radius
Y = sin(theta) * cos(phi) * radius
Z = sin(phi) * radius

Hopefully the equation makes more sense now, and hopefully you can look at that and intuitively understand why those values are what they are.

Converting Cartesian to Spherical Coordinates (3D)

To convert from spherical coordinates to rectangular, the first thing to do is to get the radius, which is done in the exact same way as in 2d. We just take the magnitude of the vector (aka the distance of the point from the origion) and we are done.

To get theta and phi, we do the same thing of separating this 3d problem into two 2d problems.

In fact, to get theta, we do the exact same thing as we do for polar coordinates! We use atan2(Y,X) to get our angle… THAT’S ALL!

So, we have this so far:

radius = sqrt(X * X + Y * Y + Z * Z)
theta = atan2(Y, X)

How do we figure out phi? Well, if you said that we should do atan2(Z,Y) or atan2(Z,X) you were pretty close but it’s actually arccos(Z / radius).

The reason for this is because neither X, nor Y is the “X” component of the “phi 2d polar coordinate”. you’d have to take the length of the (X,Y) vector and use that if you wanted to use atan2 to calculate phi. Instead of calculating that vector length, we can instead use a value we already have. cosine is Y / hypotenuse length, and hypotenuse length is the radius (length of our vector), so we might as well use that radius we already have to be able to use arccos.

The final equations for converting rectangular to spherical are:

radius = sqrt(X * X + Y * Y + Z * Z)
theta = atan2(Y, X)
phi = acos(Z / radius)

More info / alternate forms available on wikipedia here:
From Cartesian to Spherical Coordinates
Spherical Coordinate System

How to Render the Mandelbrot Set

mandelbrot set

The Mandelbrot set is a beautiful creation of mathematics discovered by a French-American mathematician named Benoit Mandelbrot. it is also a fractal, meaning that it’s infinitely detailed and that it’s self-similar and made up of smaller versions of itself.

Wikipedia does a better job of explaining the history and the background more than I do so please check out this link for more info!

Mandelbrot Set

I also wrote an HTML5 powered mandelbrot set viewer that you can use to explore the fractal and manipulate colors to create your own mathematical works of art.

HTML5 Mandelbrot Explorer

In this article I’m going to explain how to render a Mandelbrot set yourself. ¬†It’s going to be from a programming slant more than a mathematical slant so if you want the raw unadulterated math, I¬†recommend¬†checking out¬†Wikipedia¬†or other sources.

Rendering the Mandelbrot Set

The first step is to have some way to draw individual pixels on a canvas of some sort. It doesn’t matter what method you go with, it just matters that you are able to draw pixels somehow.

Various ways include:

  • direct pixel accesss in HTML5 (what my Mandelbrot Explorer uses)
  • drawing pixels into an image file
  • using DirectX or OpenGL to render it to a 2d screen buffer.
  • using graph paper, a calculator and some colored pencils to create it by hand (Possible, but ouch! Send me a picture if you actually do this! hehe)


Now that you have a rectangle that you are able to render pixels to, we need to define a viewport.

In the Mandelbrot Set image at the top of this article, my rectangle is about 500×500 pixels big, and the x axis ranges from -2.5 to 2.5 and the y axis also ranges from -2.5 to 2.5.

I like to define my viewport in terms of the center point, and the width and height, so our viewports parameters are a center point of (0,0) and width and height of 5.

We have now established the parameters of our viewport!

We now need to iterate through each pixel in our rectangle and do the following steps for each…

Pixel Space to Viewport Space

The first thing we need to do is convert from pixel coordinates to viewport coordinates.

How we do that is like this:
ViewportX = ViewportMinX + (PixelX / PixelWidth) * ViewportWidth
ViewportY = ViewportMinY + (PixelY / PixelHeight) * ViewportHeight

After we have converted our pixel’s location from pixel space to viewport space, we are ready to do some math.

The Magical Mandelbrot Function

Like I mentioned earlier, the Mandelbrot set is a work of mathematical art. ¬†The function itself isn’t very complex but it involves imaginary numbers. ¬†To calculate the Mandelbrot set itself, you plug the viewport location of the pixel into the function. ¬†After that, you take the output of the function and plug it back into the input of the function. ¬†You continue this until the output of the function goes above some value (the common value to use is 2. ¬†You’ll see me compare against 4 because I’m comparing squared numbers). ¬†If the output goes above the threshold, it has essentially “escaped”.

There are pixels which may take a very very very large amount of iterations to escape, and as far as I know, they haven’t proven that all values will escape (I could be wrong though), so besides waiting for the function to “escape”, you should also set a maximum iteration count to keep it from iterating forever (or for a very long time).

The number of iterations it took to escape is what you use to set the pixel color for that pixel.

Here is some code (pseudo javascript) to show you the details of that process:

var g_maxIterations = 255; //TODO: set this to however many iterations you want to allow maximum
var currentX;  //TODO: need to set this to the viewport X location of the current pixel
var currentY;  //TODO: need to set this to the viewport Y location of the current pixel

var z = 0;
var zi = 0;
var inset = true;
var numInterations = 0;

var newz;
var newzi;

for(indexIter=0; indexIter 4)
		inset = false;
		numInterations = indexIter;
		indexIter = g_maxIterations;

if (inset)
	//we never escaped, this pixel is the default color
	//TODO: render a default color pixel
	//we escaped!  numInterations is how many iterations it took
	//TODO: convert iterations to a color and render the pixel

Colorizing The Pixel

There are several ways you could turn an iteration count into a color for a pixel. There are some ways listed below, but this is definitely not an exhaustive list! Play around with your own techniques and see what sort of interesting things you can create!

  • Make a maximum iteration time of 255. ¬†Make your output image be an 8bit greyscale (or color palleted) image, taking the iteration count and writing that out raw as the output color.
  • Make several ranges of iteration values (for instance… 0-255, 256-511, 512-767, etc) where you define a full RGB color at each edge of the value ranges. ¬†From there, figure out where your iteration count falls within the value ranges, and do a lerp between the color to the left and the color to the right based on your distance in the specific value range. ¬†This way, you have smoothly blending color gradients and can go well beyond 255 maximum iterations. ¬†I use a variation of this in my HTML5 Mandelbrot Explorer.
  • Use arbitrary math functions to figure out the RGB of each pixel. ¬†Such as R = Iterations % 256, G = (Iterations * 3) % 256, B = (Iterations * 7 + 39) % 256.

After you have the color for your pixel, you are done. Render that pixel, then move to the next until you have rendered them all.

Zooming and Panning (Scrolling)

By virtue of setting up a viewport as a centerpoint and a width and height, and making the code use that information to convert from pixel space to viewport space, we have made it really simple to implement zooming and panning.

To zoom in, just set your viewport width and height to be smaller (like divide them by 2 for instance). ¬† To zoom out, set your viewport width and height to be larger. ¬†You want to make sure and keep the same aspect ratio (width / height) in your viewport as your rendering rectangle to avoid distortion though, so be careful. ¬† OR, you may want the distortion… it’s up to you (:

To pan the screen, or scroll it left, right, up or down, you just change the centerpoint to be more to the left, the right, higher, or lower.

Very simple, but it is really fun to scroll around and zoom in and out on the fractal to discover new and interesting features to share with your friends.

If you zoom in far enough, you might notice that at some point you have vertical or horizontal lines instead of the fractal shape, and that if you zoom in a little bit more, you’ll get a solid color.

You might ask yourself “Hey, I thought he said fractals were infinitely detailed?”

Well they are infinitely detailed, and in theory you could zoom in FOREVER and always see more and more things, but computers themselves don’t have infinite precision (it would take a computer infinitely large to let you zoom in infinitely), and you are just seeing the edge of the precision of your computer.

If you are using a language like C++, you can change your code to use doubles instead of floats to get a little more breathing room, or another option is to use a scientific mathematics library that is capable of a lot more precision than floats or doubles.

That’s All!

That’s really all there is to it, not that complex is it?

As I keep mentioning, I made something that allows you to explore the Mandelbrot Set in your browser.  You can find that here: Mandelbrot Explorer

Anyways, if you have any questions or comments or want to share some screenshots of creations you made, drop me a line or leave a comment in the comments section with a link to your creation for other people to check out too!

Anatomy of a Skeletal Animation System Part 3

This is part three of “Anatomy of a Skeletal Animation System”

Animation System Optimizations and Features

Here are some various animation system optimizations and techniques that you might find useful…

Multithreaded Animation Blending

If you are even mildly comfortable writing multithreaded code, this one is fairly easy to implement.

Basically every animated model that needs an update goes into a queue every frame. ¬†(Things that haven’t been on screen for a little while could be exempt from the list so you don’t waste time on things that aren’t being rendered)

At some point in your main loop, you do the animation sampling / anim blend tree blending / etc work to come up with the final bone group. You do this by grabbing the first model in the queue, processing it, then moving to the next model.

Your main loop doesn’t continue until all of the models have been processed.

Now, imagine that you had other worker threads also grabbing models from the queue and processing them, and that the main thread will wait to continue the main loop until the queue was empty and all models had been processed.

TA-DA! You are done and have multithreaded animation blending. It can help A LOT, depending on how many hardware threads you have available for helping work.

Bias / Gain Curves in Anim Blends

With normal animation blending, it’s a linear crossfade from one animation to another.

Sometimes, an animator can make things look nicer if they have the option of doing non linear crossfading.   One nice option for doing this is exposing a bias and gain parameter to the blend in / out parameters.

Bias and gain are great ways of letting content creators create non linear curves for a variety of uses. ¬†Ken Perlin did a lot of work in this area, but in “Game Programming Gems 2”, a guy named Cristophe Schlick presented some simplified, quick equations to calculate¬†approximations¬†of bias and gain.

I highly recommend checking that out and using them for this, and everything else in your game. Using bias and gain you can do things like have your camera move from point A to point B, but start out fast and slow down as it gets closer to B, giving it a nice organic feel to it, instead of a rigid lerp.  With bias and gain you pass in a % and get out a different %.  Real simple to use and extremely useful in every part of your game just about.

Here’s an interactive demonstration of the bias/gain functions I made. The source code for the functions are there too:
HTML5 Bias and Gain

Round Robin Anim Evaluation

There are some situations when you don’t need every model to have perfectly up to date animation data every single frame. One example of this is if you are simulating the game world on a server, where skeletal animation data doesn’t need to be perfectly up to date since network latency already makes it somewhat innacurate.

In these cases, one thing you could do is split the list of models you need to update into perhaps 4 different lists. Then, each frame, you only process one of the 4 lists, thus reducing your animation CPU load down to 25% of what it was. Quick and easy way to save some real CPU time quickly if you don’t need the most up to date animation data all the time.

Pose Sharing

Sometimes you have a lot of different models where many of the models are preforming the same animations – such as if you have a crowd of people in a crowded area.

One way to deal with this is to let some of the people doing the same animations SHARE their computed animation data.

If you are in a crowd, and there’s lots of different looking people walking all sorts of different directions, you aren’t going to easily notice that there are people who are using the exact same bone data, but facing different directions.

Going this route, if you have a group of 4 let’s say that all share the same bone data, you only need to calculate it for one person, and the rest of the group uses the data already calculated.

Less animations to sample and blend so you gain some CPU back.

Skeleton LODing

As things get farther away, or smaller, the smaller details are less noticeable. Because of this, you can “remove” bones from a skeleton as a model is farther away. I mentioned this briefly with facial animations, but the same is true of arm bones, leg bones, hand bones, etc.

You just have to make sure your anim system is able to handle LODing out bones gracefully (no popping) and efficiently (no excessive processing to get a lower LOD skeleton, it should just be a flag on the bones or something).

Runtime Debugging Essentials

Here are some debugging tools that I’ve found essential in debugging day to day animation bugs (popping, twitching, incorrect animations, etc).

Real Time Info On Screen

You really need the ability to show some kind of status on screen for a specified model. The info should show what animations are playing on which animation controllers, the current time of the animation controller, the playback rate of the controller, the state of the state machine, etc.

Using this, when you see a pop, you might see that for a fraction of a second, that an animation switches from one animation to another, then back to the first. From there you can go on debugging it further.

Timeline Log

Sometimes it’s useful to be able to turn on animation logging for a specified model. This way, you can generally log more info than you can on the screen in real time, and can also take your sweet time looking at very small intervals of time to see what went wrong and why.

Very useful.

Show the Bones

Sometimes you really just need to be able to look at the skeleton to see an issue more clearly, or be able to determine if the problem is with a model or the animation data.

Having a way to turn on bone rendering such that it draws 2d (unprojected) lines on the screen showing the bones of a specified model is very useful. Also sometimes it’s nice to be able to see the bones of all the animation data that went into the final blended pose, instead of just seeing the final blended pose.

Control Time Itself!

Lastly, sometimes it’s really useful to be able to slow down time to see a problem in greater detail. Rarely, it’s also useful to be able to speed up time. Having the ability to do both while the game running can be a really big help.

That’s All She Wrote

That, and MURDER I mean.

I hope you enjoyed these articles on the anatomy of a skeletal animation system. Drop me a line or post a comment if you have any questions or comments (:

Anatomy of a Skeletal Animation System Part 2

This is part two of “Anatomy of a Skeletal Animation System”

Animation Controller v3 – Bone Groups

In part 1, we talked about how to make a skeletal animation system that was able to play smooth, non popping animations on a model, it could communicate back to the engine to play sound effects, spawn objects in specific spots, and many other things as well.  What it could not do however, was play a different animation on the upper body and lower body.

To solve this, instead of having a single animation controller for our model, we need to have multiple animation controllers, where each controller controls a specific set of bones. ¬†Note that multiple controllers should be able to affect the same set of bones, and in the end result, a bone’s position is made up by blending the data from all animation controllers that affect it.

Each animation controller should have a blend weight so that it can be blended in and out to keep animation motion smooth and continuous, and also the blend weighting allows you to turn on and off specific animation controllers as needed.

Some great example uses for this are…

  • Having a seperate animation controller for the upper and lower body so that they can work¬†independently (the lower body can look like it’s jumping, without having to care if the upper body is firing a gun or not).
  • Having a seperate full body animation controller that affects all bones. ¬†In most situations, this animation controller would be off, but in the rare cases that you want to play a full body animation, you turn this one on and play an animation on it.
  • Having a facial animation anim controller that only turns on if the camera is close enough to a characters’s face. ¬†This way, if you look closely at another player, you can see their face moving, but if you are far away from them, the game engine doesn’t bother animating the facial bones since you can’t see them very well anyways.

The order that these animation controllers are evaluated should be explicit (instead of left up to load order or things like that).  You want to be very clear about which animation controllers over-ride which other animation controllers for the case of having multiple on at the same time, affecting the same bones.

For the sake of efficiency, when trying to blend the animation data together from each animation controller that affects that bone, you should start at the last fully weight (100% weight) anim controller in the anim controller list. ¬†This way, you don’t bother evaluating animations for anim controllers that are just going to be completely masked out by other animation controllers.

If there is no full weight anim controller in the list that affects the specific bone, initialize the bone data to the “T-Pose” animation position before blending the other anim controller bone data on top of it.

We now have a very robust animation system, but it isn’t quite there yet. ¬†Interacting with this animation system from game code means you having to tell specific game controllers when to play specific animations. ¬† This is quite cumbersome and not very maintainable. ¬†Ideally, the animation logic would be¬†separated¬†from the game play logic. Besides making the code more maintainable, this means that non animation programmers will be able to write game play code that interacts with the animation system which is a big win for everyone. Fewer development bottlenecks.

Animation Selection

There are two good techniques i’ve seen for separating the logic and preforming animation selection for you.

The first way is via “animation properties” and the second way is by using an animation state machine. There are pros and cons to each.

Animation Properties

For the animation properties method, you essentially have a list of enums that describe the player’s state. ¬†These enums include things such as being able to say whether the player is crouched or standing, whether the player is unarmed, holding a pistol, or holding a rifle, or even how injured the player is (not injured, somewhat injured, or near death).

The game play code would be in charge of making sure these enums were set to the right values, and the animation controller(s) would use these values to determine the appropriate animations to play.

For instance, the game code may set the enum values to this:

  • WeaponType = Rifle (vs Unarmed, Pistol, etc)
  • WeaponAction = Idle (vs Firing, Reloading, etc)
  • PlayerHealth = NearDeath (vs healthy, injured, etc)
  • MovementType = WalkForward¬†(vs Idle, Running, LungeRight, etc)

From here, the animation system takes over.

The lower body animation controller perhaps only cares about “MovementType” and “PlayerHealth”. ¬†It notices that the player is walking forward (WalkForward) and that they have very low health (NearDeath). ¬†From this, it uses a table that animators created in advance that says for this combination of animation properties, the lower body animation controller should play the “WalkNearDeathFwd” animation. ¬†So, the lower body animation controller obliges and plays that animation for the lower body bones.

The upper body animation controller perhaps just cares about WeaponAction, WeaponType and PlayerHealth. ¬†It notices that the player has a rifle, they aren’t shooting it, and they have very low health. ¬†From this, the upper body animation controller looks into it’s animation properties table and sees that it should play the “RifleIdleInjured” animation, so it plays that animation on the upper body bones.

The logic of game play and animation are completely seperate, and the animators have a lot of control over what animations to play in which situations.

Once again, you’d want an editor of some sort for animators to set up these animation properties tables so that it’s easier for them to work with, it verifies the data to reduce the bug count, and everyone wins.

Your tool also ought to pack each animation properties table (upper body, lower body, facial animation, full body animation, etc) into some run-time friendly structure, such as perhaps a balanced decision tree to facilitate quick lookups based on animation properties.

Animation State Machine

Another way to handle animation selection is to have the animation controllers run animation state machines, having the game code send animation events to the state machines. Each state of the state machine corresponds to a specific animation.

When the player presses the crouch button for instance, it could send an event to all of the animation controllers saying so, maybe ACTION_BEGINCROUCH.

Depending on the logic of the state that each anim controller state machine is in, it may respond to that event, or ignore it.

The upper body anim controller may be in the “Idle” state. The logic for the idle state says that it doesn’t do anything if it recieves the ACTION_BEGINCROUCH event, so it does nothing and keeps doing the animation it was doing before.

The lower body anim controller may also be in a state named “Idle”. The logic for the lower body idle state says that if it recieves the ACTION_BEGINCROUCH event, that it should transition to the “StartCrouch” state. So, it transitions to that state which says to play the “CrouchBegin” animation (also says to ignore all incoming events perhaps), and when that animation is done, it should automatically transition to the “CrouchIdle” state, which it does, and that state says to play the “Crouching” animation, so it does that, waiting for various events to happen, including an ACTION_ENDCROUCH event to be sent from game code when the player lets go of the crouch button.

The interesting thing about the anim state machine is that it gives content creators a lot more control over the actual control of the player himself (they can say when the player is allowed to crouch for instance!) which can be either a good or bad thing, depending on your needs, use cases and skill sets of your content creators.

Going this route, you are going to want a full on state machine editor for content people to be able to set up states, the rules for state switching, and they should be able to see a model and simulate state switches to see how things look. If you DO make such an editor, it’s also a great place to allow them to define and edit bone groups. You might even be able to combine it with the key string editor and make a one stop shop editor for animation (and beyond).

Animation Controller v4 – Animation Blend Trees

At this point, our animation system is in pretty good shape, but we can do a bit better before calling it shippable.

The thing we can do to really spruce it up is instead of dealing with individual animations (for blending, animation selection, etc), is to replace them with animation blend trees like the below:


In the animation blend tree above, you can see that it’s playing two animations (FireGun and GunSight) and blending them together to create the final bone data.

As you can imagine, you might have different nodes that preformed different functionality which would result in lots of different kinds of animations using the same animation blend tree.

You will be in good shape if you make a nice animation blend tree editor where a content creator can create an animation blend tree, set parameters on animation blend tree nodes, and preview their work within that editor to be able to quickly iterate on their changes. ¬†Again, without this tool, everyone’s lives will be quite a bit harder, and a little less happy so it’s in your interest to invest the effort!

Some really useful animation nodes for use in the blend trees might include:

  • PlayAnimation – definitely needed!
  • AnimationSequence – This node has N number of “children” and will play each child in order from 1 to N in a sequence. ¬†You may optionally specify (in the editor) that you want the children chosen at random and you specify a weighting to each child for the random choosing. ¬†This is useful for “idle animations” so that periodically an idle character will do silly things.
  • AimGrid – this animation node uses the player data to see yaw and pitch of the player’s aim. ¬†It uses this information to figure out how to blend between a grid of 9 animations of the player pointing in the main directions to give a proper resulting aim. ¬†This node has 9 children, which specify the animations that specify the following aiming animations: Up Left, Up, Up Right, Left, Forward, Right, Down Left, Down, Down Right. ¬†Note that since this is a generalized anim blend tree, these child nodes can be ANY type of animation node, they aren’t required to be a “PlayAnimation” node. ¬†This in essence is the basis of parametric animation (which i mentioned at the beginning of part 1), so this is a way to get some parametric animation into your system without having to go full bore on it.
  • IK / FK Nodes – get full or partial ragdoll on your model. ¬†Also get it to do IK solving to position hands correctly for specified targets and such.
  • BlendBySpeed – You give N number of children, and movement speeds for each child. ¬†This animation node will choose the correct animation, or blend between the correct animations, based on the current traveling speed of the player. ¬†This way you get a smooth blend between walk, run and sprint animations and the player can move at whatever speed they ought to (perhaps the speed is defined by the pathing system, or the player’s input). ¬†To solve the problem of feet “dancing” as they blend, you need to make sure the footfalls happen on the same time (in %) on each animation that will blend together. ¬†This way, the animations don’t fight eachother, and the feet will appear to move properly.
  • BlendByHealth – if you want the player to walk differently when they are injured, this node could be used to specify various walk animations with matching health levels so that it will blend between them (for upper or lower body or whatever else) as is appropriate for the player’s current health level.
  • Additive Blending – to get gun recoils and such

As you can see, animation blend trees have quite a bit of power.  They are also very technical which means engineers may need to help out content folk in making good trees to resolve some edge case bugs.  In my experience, animators are often very technical folk themselves, so can do quite a bit on their own generally.

Combine anim blend trees with the animation selection systems (FSM or anim properties) and the ability to smoothly blend an animation controller between it’s internal animations (or anim trees) it’s playing and you have a really robust, high quality animation system.

Often time with this work flow, an animator will just say “hey i need an anim node which can do X”, so an animation engineer creates the node and the animators start using it to do interesting things. ¬†No need for an engineer to be deeply involved in the process of making the animation work like the animator wants, or having to worry about triggering it in the right situations etc.

Sure there will be bugs, and some things will be more complex than this, but by and large, it’s a very low hassle system that very much empowers content creators, and removes engineers from needing to be involved in most changes – which is a beautiful thing.

End of Part 2

This is the end of part 2. In the next and final part, we’ll talk about a few other miscellaneous features and optimizations.

Anatomy of a Skeletal Animation System Part 1

This is part one of “Anatomy of a Skeletal Animation System”

There is quite a bit of information out there on the basics of skeletal animation, including how to export and read animation and model data, how to animate bones and thus transform a mesh, how to blend bone data together and other related animation topics.

However, there is a lot less information out there about how to set up a system to use these techniques in a realistic way, such as you might find in your average modern 3d video game.

I myself have been an animation programmer on a few games including an open world unreal engine game called “This is Vegas” (unfortunately cancelled due to Midway going bankrupt) and also a multiplayer only first person shooter called “Gotham City Impostors” which was released earlier this year for PC, 360 and PS3. ¬†The info I’m presenting is based on experience developing those games, as well as info i gathered from other developers or read about in books or online.

In this article I’m going to assume you already know how to get animation bone data into memory, how to use that animation data to animate models (meshes), and also how to blend animation bone data together. ¬†I’m going to start off with the most simple animation system possible and slowly introduce features until we end up at something that would be fully featured for a typical modern game.

The “next generation” of skeletal animation seems like it’s going to be heavily based on parametric animation, and while we will TOUCH on the basics of parametric animation, we won’t dig into it very much beyond that. ¬† If you are making a next gen AAA title, parametric animation may possibly be for you (and maybe not), but with the rise of 3d in flash, the rise of mobile games, and also indie game development, I think traditional pose driven skeletal animation is here to stay at least for a while.

Depending on the needs of your project, and how high a quality bar you want vs how much CPU time you want to spend on animation, some of these features may not be appropriate. ¬†Feel free to take what is useful to you, and leave what isn’t. ¬†Every game is different.

Animation Controller v1 – Super Simple

The simplest point we will start out is that if you have a mesh with an animation controller on it (to control what animations should play on it and such), it has these features:

  • If you tell it to play a looping animation, it will continue playing that looping animation forever.
  • If you tell it to play a non looping animation, it will play the animation and have some way of notifying you when the animation is done. ¬†This is either by having it call a callback when it’s done, or by setting some flag on itself saying that the animation is done (won’t ever get set on a looping animation)
  • You should be able to tell it a playback multiplier to play the animation at, such as if you tell it to play at 3.0, it will play 3 times as fast, or if you tell it to play at 0.5, it will play half as fast and look like slow motion.
  • If you tell it to play an animation while another animation is playing, it will instantly stop the animation it’s playing and start playing the new animation.

With this simple animation system, we could conceivably make a game that has animated characters.

That being said, the animation system is lacking in a few ways:

  1. ¬†You can only play full body animations, meaning if you want the lower body to look like it’s jumping, and the upper body to look like it’s firing a rifle, you have to make an animation that looks like that. ¬†If you want the same thing, but you want the lower body to look like it’s standing around while the upper body is firing a rifle, you have to make an entirely different animation that looks like that! ¬†The permutations of actions can get quite large and you have to decide in advance which animation you want to use. ¬†That is, when the player is jumping, they cant change their mind that they suddenly want to start shooting.
  2. When you switch animations, there is visible “popping”. ¬†Popping is when a bone goes from doing one thing to doing something else instantly. ¬†It looks like the bone teleported and is very visible to players. ¬†It looks buggy and unpolished.
  3. If you are doing something like having the player throw a grenade, you have no way of knowing when to actually spawn the grenade model, and where to spawn it. ¬†You could “hard code” it to spawn at the same place relative to the player each time, when the animation stops playing, but that is pretty hackish and not very maintainable.

Lets start off by working on solving problem #3 of not being able to specify where to spawn a grenade or when to spawn it.

Keyframe Strings

To solve the problem of WHEN to spawn it, a feature common to nearly all animation systems is the ability to put game engine events on animation key frames.

This way, when the arm is at the correct position in the throw animation, someone would be able to put an event like “throw grenade” on that animation key. ¬†When the animation reaches that animation frame, it sends the message to the game engine, which can then create a grenade (with any specified parameters to the event).

Often times I’ve seen this implemented as an actual string that is associated with an animation key frame. ¬†The strings might be things like:

Playsound Laugh.wav   (to play a sound to go along with the animation)

SpawnPhysicsProjectile  Grenade.mdl 0 0 5   (to spawn a projectile with the specified mesh and velocity vector)

FootFallSound (This would tell the engine to play a footstep sound, based on the material the player was standing on, such as a metalic sound if on metal, or a duller thud if walking on dirt)

You could also use it to hide and show attachments or a myriad of other things.  Basically you can use it for anything that you want to be tied to an animation.

Usually you’ll want some kind of editor for animators and other content creators to be able to associate these key strings with specific key frames. ¬† If they have to work with a text file where they have to hand enter times and key strings associated with those times, it’s going to be really tedious and they are going to be sad. ¬†Also, it will be very error prone which makes everyone sad when it generates more bugs than it needs to, slowing down dev time.

On the topic of creating¬†unnecessary¬†bugs, while i’ve often seen keystrings implemented as actual strings, it’s actually a lot less error prone if you have some kind of structured input system in your key string editor.

For instance, instead of them typing a command name and supplying any required parameters, it would be a lot better for them to have to choose a key string command from a drop down list.  When they choose one, it should display any parameters that might be needed, and have some way of validating that their input is valid.

This editor should be tightly coupled with your game engine.  Example ways for doing this including having a shared header file that defines all key string commands and what parameters they require, or having the key string editor load a game dll to get at the data that way.

If you have to manually maintain the tool to match game code, it will often get out of sync and cause you pain you don’t need. ¬†Avoiding that pain means you can work on developing more features instead of fighting¬†reoccurring¬†bugs, and means QA can focus on finding harder to find bugs. ¬†In the end it means a better product which is great for the company, your continued paycheck, and the player’s experience.

Some other potential bugs can come up with key frames that I don’t have a good answer for, it’s just something you have to mindful of.

One of these bugs is that when an animation is¬†interrupted, a key frame might not get hit when you expect a key frame to get hit. ¬†For instance if an animation attaches something to a players hand, and at the end of the animation hides that attached object, if you¬†interrupt¬†the animation midway through, it won’t get hidden and the attachment will be stuck to the hand as the player does other things – which looks very weird. ¬†Your best bet is to design things in such a way that if key strings are missed, it isn’t a problem. ¬†Not always possible with all features unfortunately though…

Another problem that comes up when you have more advanced anim systems is that you may be blending out an animation which is no longer relevant, but while it is blending out, it hits a key frame. ¬†For instance if you a player is holstering a weapon, but blending out a fire animation that got interupted, you may get a “firegun” key string command, when you really don’t want it because it’s not relevant anymore. ¬†Sometimes you would want a key string to fire in that case though, so there is no real global solution to the problem that I’m aware of.


Now that we have a way of knowing WHEN to spawn a grenade in a grenade throw animation, we don’t know WHERE to spawn it. ¬†This is where sockets come in – no I’m not talking about TCP/IP or UDP sockets!

A seemingly obvious solution is probably to say which bone to spawn the grenade on in the “throw grenade” animation key string. ¬† ¬†An issue here though is that maybe if you spawn it right on the “rhand” bone, it might clip through the hand (inter-penetrate the hand) and look sloppy. ¬†Also, for other use cases, you might want to attach something where there isn’t a bone nearby.

Another seemingly obvious solution might be to add extra bones to the animation data that aren’t tied to any real geometry. ¬†This way, you can use the bones to attach things to, or spawn things at, but they aren’t tied to any real model geometry so you can make them move however you want.

The problem with this solution is that you are paying the cost of animating those bones even if you aren’t using them for anything. ¬†Enter sockets!

Sockets are a transformation (translation and rotation) away from a specified bone. ¬†They are usually only calculated on demand so that when you aren’t using them, you don’t pay a price for having them.

This way, sockets act as very cheap attachment / reference points on a model during animations to attach other models to (such as capes, helmets, guns, grenades).

When a key string command takes a socket or bone as a parameter, you should have it accept either a bone or a socket. ¬†They should be¬†usable¬†interchangeably, because sometimes you really do want to attach something to a bone, and you shouldn’t make an animator make an extra socket just to make it match a bone.

We now have a way of specifying WHEN to spawn a grenade (via a key string), and also WHERE to spawn it (specifying a socket to spawn it at as a parameter to the key string command).

Animation Controller v2 – Blending

I mentioned popping earlier and said it was caused by a bone changing where it is or how it’s moving by a drastic amount in a single frame. ¬†If you’ve read my DIY Synth articles, you probably remember how important in audio programming it is to make sure that your sound data stays continuous. ¬† The same is true of animation data, you have to make sure that bone motion / position stays continuous always, or else you’ll get popping.

Just like in audio programming, you use envelopes to help keep things continuous when you add a new animation into the mix, or remove an old animation.

For instance, If a model is playing one animation and you tell it to play another, the new animation should start at a blend weight of 0.0 and slowly increase while the old animation decreases from a blend weight of 1.0 down to 0.0.  This gives you a nice smooth blend between two animations and works for MOST animations (more on that in a second).

Typically, when crossfading from one animation to another, the magic number is to blend over 0.2 seconds, but certain uses may warrant a longer or shorter blend time. ¬†You might also blend out the old animation at a different rate than you blend in the new animation. ¬†Give your animators the option to choose so they can do whatever they need. ¬†They will be happy that they have the control, and you will be happy that you don’t have to one off program things all the time for them. ¬†Everyone wins!

What happens if you want to play an animation while an animation blend is in progress already?  0.2 seconds of blend time sounds like a short amount of time, but this actually comes up ALL THE TIME.

There are two ways to deal with this issue that I’m going to talk about.

The first way to deal with this problem is to keep a list of all the animations that are currently playing, so that if you tell the animation controller to play a bunch of different animations really quickly, it will end up sampling a bunch of different animations as various ¬†ones blend out, and the final one is blending in. ¬†This can result in A LOT of animation sampling which can take a serious toll on your game’s¬†performance. ¬†I encountered a bug on a game I worked on once that caused around 100 animations to be getting sampled on a single model for several frames due to this problem and it made the game tank HARD.

The second way to deal with this, and how I like to implement it usually, is to make it so only two animations can play at once (a main animation and a blend animation) and you have another field on the animation controller which says what the next animation  to blend in is.

Going this route, when you say to play a new animation while a blend is in progress, it goes into the “next animation” field. ¬†When the current blend is done, that next animation will blend in and the last one will blend out.

If there is already another animation in the “next animation field”, it’s replaced and it’s never seen.

This way, only two animations will be sampled / blended at a time maximum, yet you will get a perfectly smooth blending between animations, and the controls will still feel fairly snappy, although there may be a¬†noticeable¬†delay in control response if animations change a lot really often. ¬†You’ll have to make a judgement call about the needs of your game.

Lastly, I said blending works nicely for most animations but not all. ¬†One exception to this rule is when you try to blend different lower body animations together, such as trying to blend a walk animation and a run animation together. ¬†Often times, the feet will be in different places and when you blend them, it makes the feet look like they are doing a little stuttering dance and it looks ugly. ¬†I’ll talk about getting around this specific problem in the next part, but as a preview, the short version of the solution is to make sure the feet are in the same positions at the same time for the two animations.

End of Part 1

At this point we have a fairly nice animation system but it isn’t quite ready yet. The most glaring problem we have is that we can only play full body animations still, which is not acceptable. ¬†A real animation system NEEDS to be able to play different animations on different sets of bones¬†independently.

We’ll tackle that problem, and others, in part 2.

Recording lagless demo videos of a laggy game

Often times when developing a game, you’ll want to record a demo video to show to a publisher, show at E3, post on kickstarter, youtube, or other places to help generate interest or gain funding to keep your project going.

Unfortunately, the point in time that you need a video is often in the beginning of the project, when your game probably doesn’t run very fast, or might have performance spikes, making it difficult to get a high quality video capture.

Many times, developers will have to have a performance push¬†to get the game up to speed for a demo video, spending time on “demo hacks”, which are often just throw away code for after the video is made. ¬†I’ve been through a couple of these myself and they are not fun, but they are an unfortunate¬†necessity.

This article will explain a fairly simple technique for getting a full speed recording of your game engine with perfectly synchronized sound, no matter what speed your game actually runs at, saving you time and effort, not having to waste time on demo hacks just to get a presentable video.

Playable demos are a whole other beast,¬†¬†and you are on your own there, but if a video will suit your needs, you’ve come to the right place!

I’ve used this technique myself in a couple different games during development, and in fact included it as a feature of one PC game I shipped in the past, called “Line Rider 2:Unbound”, so this is also a technique for adding video recording to any game you might want to add it to.

Out of the box solutions

There are various “out of the box” ways to record a video of your game, but they have some downsides which make them not so attractive.

For instance, you can get fraps which will record any application’s audio and video and you could use that to record a video of your game. The downside here is that if your game lags, so does the video, so we still have that problem. Also, the act of recording competes with your game for resources, causing your game to run at an even lower FPS and making an even worse video. ¬†Fraps is also limited to specific platforms, and you may be working on an unsupported platform.

lastly, if you want to include this feature of video recording in a shipped product, you will have to license fraps for that use, which may be prohibitive to your project’s budget.

Other video recording software has the same or similar issues.

Rolling your own – Video

Making your own video recorder built into your game has some real easy to hit pitfalls that I want to talk about.

When considering only the video portion (not audio yet), our aim is to write all the frames to disk as individual image files (such as png, or raw uncompressed image files), and then after recording is done, use something like ffmpeg to combine the frames into a video. Writing a compressed image file (such as png or jpg) for each frame may save disk space, but may take longer for your computer to be able to process and write to disk. You will likely find that a raw file format is more performant, at the cost of increase disk space usage, but hard drives are cheap these days and everyone has huge ones. Also, at this point you probably want to use lossless image compression (such as png, or a raw image file) for your screen captures so that you don’t have compression artifacts in your screen captures. When you make a final video, you may choose a more highly compressed video format, and it may introduce it’s own artifacts, but you want to keep your source files as clean as possible so that you don’t introduce UNNECESSARY artifacts too early in the process.

If you dump each rendered frame to disk, the disk i/o can drag your game’s frame rate down to a crawl. ¬†You might think about having the disk write happen on another thread so the game isn’t limited by the disk i/o, but then you’ll have to keep a buffer of previous frames which will grow and grow and grow (since you are making frames faster than it can write to disk) until you run out of memory. ¬†It’s a losing battle for longer videos.

You could get a faster drive, configure a striped raid array, use a ram disk, or things like that, but why fix with hardware what you can fix in software?  Save yourself and your company some cash.

Similarly to the fraps problem, when you record video, that will likely affect the frame rate of your game as well, making it run slower, making a lower quality video because frames will be skipped – assuming you are using variable frame rate logic – making it so that you either have to have a “laggy” looking video as output, or your video will actually appear to speed up in the places that you encountered lag while recording, which is very odd looking and definitely not demoable.

The solution (which might be really obvious to the astute reader) is to make your game run your game’s logic at a fixed rate, instead of making it be based on frame time. ¬†For instance, instead of measuring the time between frames and using that delta to control logic (making things move farther when more time has passed etc), you just make your game act as if the same amount of time has always passed between your frames, such as ~16ms for a 60fps recording, or ~33ms for a 30fps recording. ¬†IMPORTANT: make it only behave this way when in “recording mode”. ¬†You don’t need to sacrifice variable frame rate logic just to get the ability to record nice videos. ¬†Also, 30fps is fine for a video. ¬†The more FPS your video has, the larger the video file will be. ¬†Movie and TVs are something like 24 fps, so you don’t need a 60 fps video for your game demo, 30 or less is just fine.

This way, it doesn’t matter how long it took to render each frame, the game will generate a sequence of frames at whatever frame rate you would like your video to be in. ¬†While recording the demo video, the game may run slowly, and be difficult to control if it’s REALLY laggy, but at least the output video will be smooth and perfectly lagless. ¬† Later in this article I present a possible solution to the problem of difficulty playing the game while recording.

Are we done at this point? ¬†NO! ¬†We haven’t talked at all about audio, and as it turns out our audio is in a very odd state going this route.

Rolling your own – Audio

From the section above, we have a nice lagless video stream, but if we just recorded audio as it went, the audio would be out of sync with the frames. This is because we recorded audio in real time, but we recorded the frames in variable time.

You could try to sync the audio in the right places with each frame, but then you’d have to speed up and slow down portions of your audio to hit the right frame numbers, which would make your audio sound really weird as it sped up and slowed down and changed pitch.

Definitely not demoable! So what’s the solution?

The solution is that while you are recording your video frames, you also make an audio timeline of what audio was triggered at which frame numbers.

For instance, if on frame 20, the player swung his sword and on frame 25 hit an exploding barrel, causing it to explode, your timeline would say “at frame 20, play the sword swing sound effect. at frame 25, play the exploding barrel sound effect”.

I’ve found it really easy to capture an audio timeline by hooking into your game or engine’s audio system itself, capturing all sound events. ¬†It usually is not very difficult to implement this part.

After you have recorded all of your video frames, and have an audio timeline, the next step is to re-create the audio from the timeline, which means you need a way of doing “offline” audio mixing.

If you are using an audio library, check the documentation to see if it has an offline mode, many of them do, including the ever popular fmod. ¬†If your audio library can’t do it for you, there are various command line tools and audio libraries out there that can do this for you. ¬†I believe portaudio (port mixer?) can do this for you, and also another open sourced program called sox.

What you need to do is render each item in the audio timeline onto a cumulative audio stream. ¬†If your video were a 30fps video, and a 500ms sound effect happened at frame 93, that means that you know this sound effect started at 3.1 seconds in (frame 93 * 33.33 miliseconds per frame) and lasts until 3.6 seconds (since it’s 500ms long). ¬† So, you’d mix that into the output audio stream at the appropriate point in time, and then rinse and repeat with the rest of the audio timeline items until you had the full audio stream for the video.

When you are done with this stage, you have your video frames and your audio stream. ¬†With your video creation software (such as ffmpeg) you can combine these into a single video file which shows your game running perfectly at whatever frame rate you specified, and with perfectly synchronized audio. ¬†It’s a beautiful thing and definitely ready to demo to get some funding.


To recap, the steps for creating a perfect video recording of your game are:

  1. When in recording mode, make your game run at a fixed frame rate – no matter how long it really was between frames, lie to your game and tell it that 33.33ms have passed each frame for 30fps video or 16ms for 60fps video (or whatever other frame rate you want to run at)
  2. Write each rendered frame to disk as an uncompressed or lossless compression graphics file.
  3. While rendering each frame, build up a timeline of audio events that you can use to re-create the audio later.
  4. After all the frames are captured, render your audio timeline into an audio stream.
  5. After you have your audio stream and each video frame, use software such as ffmpeg to combine them into a perfect, lagless video.

Bonus Points – Or making this feature a shippable feature of your game for players to use

At this point, the final product (the video) is as nice as it can possibly be.  However, the process of actually recording the video can be cumbersome because even though you are making a nice and smooth 30fps video, during recording it may be running at 2fps (depending on your machine) making it very difficult to control the game.  Also, in the final video it will appear that the user is traversing menus, inputting commands, and reacting at superhuman speeds.

A good way to handle this is instead of recording during play, what you do is record all the input that happens during the recording process.  This way you have an input timeline that is tied to frame numbers, the same way the audio timeline is tied to frame numbers.

When the recording process is done, you then put up a nice dialog for the end user saying something like “Rendering video please wait….” with a progress bar, and then re-simulate the user input that¬†occurred¬†during the recording phase, and render all those frames to disk (well, screen capture them as image files just like usual, just dont display them to the end user).

Since building an input timeline is relatively cheap computationally, you should have no slow down during the “recording” phase of the video while you (or the end user) is actually playing the game.

The “Gotcha” here is that your game needs to be deterministic for fixed rate time steps (or at least everything that really matters needs to be deterministic, maybe not particles or something) which can potentially be a bit tricky, but the upside is if you actually make this happen, you can record light weight playbacks as “videos” and have users share these feaux-videos with¬†each other¬†to watch playbacks of gameplay that other players had. ¬†When you want to export these playbacks as real videos, you can put it through the regular video recording steps and spit out a full mpeg, suitable for sharing, uploading to youtube (from within the app perhaps even?) just like normal. ¬†But, until you need to use the video outside of your application, you have very small files users can share with¬†each other¬†to view “videos” of in game gameplay.

Final tip: if doing this in windows, I’ve found that in recent versions of windows, doing ¬†the screen capture using GDI functions instead of DirectX is actually WAY faster so use that if you can. ¬†I’m thinking this must be because windows already has a screen cap in memory to show those little icons when you mouse over the minimized application or something.

That’s all folks!

That’s all there is to it. With luck this will save some fellow engineers from having to crunch up some “demo hacks” to get performance up for an E3 demo video or the like. If you have any questions or comments, drop me a line (:

MoriRT: Pixel and Geometry Caching to Aid Real Time Raytracing

About half a year ago, some really intriguing ideas came to me out of the blue dealing with ways to speed up raytracing.  My plan was to create a couple games, showing off these techniques, and after some curiosity was piqued, write an article up talking about how it worked to share it with others in case they found it useful.

For those of you who don’t know what raytracing is, check out this wikipedia article:

Due to some distractions and technical setbacks unrelated to the raytracing itself, I’ve only gotten one small game working with these techniques. ¬†One of the distractions is that I implemented the game using google’s native client (NaCl) and for some reason so far unknown to me, it doesn’t work on some people’s machines which is very annoying! ¬†I plan to get a mac mini in the next couple months to try and repro / solve the problem.

Check out the game if you want to.  Open this link in google chrome to give it a play:

The sections of this article are:

  1. Limitations
  2. Geometry Caching
  3. Pixel Caching
  4. Future Work
  5. Compatible Game Ideas


Admittedly, these techniques have some pretty big limitations. ¬†I’ve explained my techniques to quite a few fellow game devs and when I mention the limitations, the first reaction people give me is the same one you are probably going to have, which is “oh… well THAT’S dumb!”. ¬†Usually after explaining it a bit more, people perk up again, realizing that you can still work within the limitations to make some neat things.¬† ¬†So please hold off judgement until checking out the rest of the article! (:

The big fat, unashamed limitations are:

  • The camera can’t move (*)
  • Objects in the scene shouldn’t move too much, at least not all at the same time

(* there is a possible exception to the camera not moving limitation in the “Future Work” section)

So…. what the heck kind of games can you make with that? ¬†We’ll get to that later on, but here’s some things these techniques are good at:

  • Changing light color and intensity is relatively inexpensive
  • Changing object color and animating textures is inexpensive
  • These techniques don’t break the¬†parallel-izable¬†nature of raytracing. ¬†Use all those CPU and GPU cores to your heart’s content!

Seems a bit dodgy I’m sure, but read on.

Geometry Caching

The first technique is geometry caching.

The idea behind geometry caching is: ¬†If no objects have moved since the last frame, why should we test which objects each ray hits? ¬†It’s a costly part of the ray tracing, and we already KNOW that we are going to get the same results as last frame, so why even bother? ¬†Let’s just use the info we calculated last frame instead.

Also, if some objects HAVE moved, but we know that the moving objects don’t affect all rays, we can just¬†recalculate¬†the rays that have been affected, without needing to¬†recalculate¬†all rays.

Just because we know the collision points for rays doesn’t mean that we can just skip rendering¬†all together though. ¬†Several things that can make us still need to re-render a ray include: ¬†Animating textures, objects changing colors, lights dimming, lights changing color. ¬†When these things happen, we can re-render a ray much less expensively than normal (just recalculate lighting and shading and such), so they are¬†comparatively¬†inexpensive operations compared to objects actually moving around.

How I handle geometry caching is give each ray (primary and otherwise) a unique ID, and I have a dynamic array that holds the collision info for each ID.

In the part of the code that actually casts a single ray, i pass the ID and a flag saying whether it’s allowed to use the geometry cache. ¬†If it isn’t allowed to use the geometry cache, or there is no entry in the cache for the ID, the code calculates intersection info and puts it into the geometry cache.

It then uses the geometry cache information (whether it was re-calculated, or was usable as is) and applies phong shading, does texture lookups, recurses for ray refraction and reflection, and does the other things to figure out the color of the pixel.

In my first implementation of the geometry cache, it was very fast to render with once it was filled in, but it was really expensive to invalidate individual cache items.  If an object moved and a couple hundred geometry cache items needed to be marked as dirty, it was a really computationally expensive operation!

A better option, the one i use now, involves both a 2d grid (for the screen pixels) and a 3d grid (to hold the geometry of the world).

Breaking the screen into a grid, when each ray is cast into the world, I’m able to tell the ray what screen cell it belongs to. ¬† This way, as a ray traverses the 3d grid holding the world geometry, it’s able to add itself each world grid maintains to keep track of which rays pass through that 3d cell (keeping just the unique values of course!). ¬†Child rays know what screen cell they are in by getting that value from their parent.

If an object moves in the world, you can make a union of which world cells it occupied before it moved, and which world cells it occupies after the move. ¬†From there, you can make a union of which screen cells sent rays into that world cell. ¬†The last step is to mark all those screen cells as “geometry dirty” so that next frame, the rays in those cells are disallowed from using the geometry cache data, and instead will re-calculate new intersection info.

This method makes it so potentially a lot of rays re-calcuate their intersection data that don’t really need to, but by tuning the size of the screen and world grids, you can find a good happy medium for your use cases.

If you have an idea to better maintain the geometry cache, feel free to post a comment about it!

Pixel Caching

The second technique is pixel caching which is a fancy way of saying “don’t redraw pixels that we don’t have to”. ¬†The less rays you have to cast, the faster your scene will render.

The first challenge to tackle in this problem is how do you know which pixels will be affected when an object changes color?  That is solved by the same mechanism that tells us when geometry cache data is invalidated.

When an object changes color (or has some other non-geometry change), you just get the list of world cells the object resides in, and then get the union of screen cells that sent rays through those world cells.

When¬†you have that list, instead of marking the screen cell “geometry dirty”, you mark it as “pixel dirty”.

When rendering the screen cells, any screen cell that isn’t marked as dirty in either way can be completely skipped. ¬†Rendering it is a no-op because it would be the same pixels as last time! (:

This is the reason why you want to minimize geometry changes (objects moving, rotating, resizing, etc) and if you have to,  rely instead on animating textures, object colors, and lighting colors / intensities.

Future Work

Here’s a smattering of ideas for future work that I think ought to bear fruit:

  • Replace the screen and/or world grid with better performing data structures
  • Pre-compute (a pack time process) the primary rays and subsequent rays of static geometry and don’t store static geometry in the world grid, but store it in something else instead like perhaps a BSP tree. ¬†This way, at run time, if a ray misses all objects in the dynamic geometry world grid, it can just use the info from the pre-computed static geometry, no matter how complex the static geometry is. ¬†If something DOES hit a dynamic object however, you’ll have to test subsequent rays against both the dynamic object world grid, and the data structure holding the info about the static geometry but hopefully it’ll be a net win in general.
  • Investigate to see how to integrate this with photon mapping techniques and data structures. ¬†Photon mapping is essentially ray tracing from the opposite direction (from light to camera, instead of from camera to light). ¬†Going the opposite direction, there are some things it’s really good at – like caustics – which ray tracing alone just isn’t suited for:¬†
  • In a real game, some things in the world will be obscured by the UI overlays. ¬†There might be an oportunity in some places to “early out” when rendering a single ray if it was obscured by UI. ¬†It would complicate caching those since an individual ray could remain dirty while the screen cell itself was marked as clean.
  • Orthographic camera: ¬†If the camera is orthographic, that means you could pan the camera without invalidating the pixel and geometry cache. ¬†This would allow the techniques to be used for a side scrolling game, overhead view game, and things of that nature – so long as orthographic projection looks good enough for the needs of the game. ¬†I think if you got creative, it could end up looking pretty nice.
  • Screen space effects: enhance the raytracing visuals with screen space particles and such. ¬†Could also keep a “Z-Buffer” by having a buffer that holds the time each ray took to hit the first object. ¬†This would allow more advanced effects.
  • Interlaced rendering: to halve the rendering time, every frame could render every other horizontal line. ¬†Un-dirtying a screen cell would take 2 frames but this ought to be a pretty straight forward and decent win if losing a little bit of quality is ok.
  • red/blue 3d glasses mode: ¬†This is actually a feature of my snake game but figured i’d call it out. ¬†It works by rendering the scene twice which is costly (each “camera” has it’s own geometry and pixel cache at least). ¬†If keeping the “Z-Buffer” as mentioned above, there might be a way to fake it more cheaply but not sure.

Compatible Game Ideas

Despite the limitations, I’ve been keeping a list of games that would be compatible with these ideas. ¬†Here’s the highlights of that list!

  • Pinball: ¬†Only flippers, and the area around the ball would actually have geometry changes, limiting geometry cache invalidating. ¬†Could do periodic, cycling color / lighting animations on other parts of the board to spice the board up in the “non active” areas.
  • Marble Madness Clone: Using an orthographic camera, to allow camera paning, a player could control a glass or mirrored ball through a maze with dangerous traps and time limits. ¬†Marble Madness had very few moving objects and was more about the static geometry so there’d probably be a lot of mileage here. ¬†You could also have animated textures for pools of acid so that they didn’t impact the geometry cache.
  • Zelda 1 and other overhead view type games: Using ortho camera to allow panning, or in the case of Zelda 1, have each “room” be a static camera. ¬†You’d have to keep re-rendering down somehow by minimizing enemy count, while still making it challenging. ¬†Could be difficult.
  • Metroidvania: side scroller with ortho camera to allow panning. ¬†Could walk behind glass pillars and waterfalls for cool refractive effects.
  • Monkey Island type game: LOTS of static geometry in a game like that which would be nice.
  • Arkanoid type game: static camera, make use of screen space effects for break bricking particles etc
  • Mystery game: Static scenes where you can use a magnifying glass to LITERALLY view things better (magnification due to refraction, just like in real life) to find clues and solve the mystery. ¬†Move from screen to screen to find new clues and find people to talk to, progress the storyline etc.
  • Puzzle Game: could possibly do a traditional “block based” puzzle game like puzzle fighters, tetris, etc.
  • Physics based puzzle game: You set up¬†pieces¬†on a board (only one object moves at once! your active¬†piece!) then press “play”. ¬†Hopefully it’d be something like a ball goes through your contraption which remains mostly motionless and you beat the level if you get the ball in the hole or something.
  • Somehow work optics into gameplay… maybe a puzzle game based on lasers and lights or something
  • Pool and board games: as always, gotta have a chess game with insane, state of the art graphics hehe
  • mini golf: A fixed camera when you are taking your shot, with a minimum of moving objects (windmills, the player, etc). ¬†When you hit the ball, it rolls, and when it stops, the camera teleports to the new location.
  • Security gaurd game: ¬†Have several raytraced viewports which are played up to be security camera feeds. ¬†Could have scenes unfold in one feed at a time to keep screen pixel redraw low.
  • Turn based overhead view game: ¬†Ortho camera for panning, and since it’s turn based, you can probably keep object movement down to one at a time.

Lastly, here’s a video describing this stuff in action. ¬†When you view the video, the orange squares are screen tiles that are completely clean (no rendering required, they are a no-op). ¬†Purple squares are screen tiles that were able to use the geometry cache. ¬† Where you don’t see any squares at all, it had to re-render the screen tile from scratch and wasn’t able to make use of either caching feature.

Feedback and comments welcomed! ¬†I’d be really interested too in hearing if anyone actually uses this for anything or tries to implement in on their own.