Temporal supersampling, flipquads and real time raytracing

Follow me on this train of thought 😛

1) There’s this thing called super sampling where you render an image at a larger resolution, so that you can properly downsample it to the right size (the size of your screen for instance) to avoid aliasing problems. The problem here is that you are rendering more pixels, so it is more expensive to render, which is usually a deal breaker for real time applications that are trying to push the envelope of performance – like modern games.

2) There’s a way to get around this with something called Temporal Supersampling where you use the last frame rendered to provide extra information for the current frame, so that in a way, you get supersampled data by spreading it out over 2 frames. (More info on supersampling: Temporal supersampling). You get better results by jittering (offsetting) the pixels you render from frame to frame, by a sub-pixel amount. This is the usual Monte Carlo sampling kind of situation… find some cheap but well behaved pseudorandom number generator you can run in your pixel shader to offset each pixel by, or use a regular pattern of some sort that gives good enough results.
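To make that a little more concrete, here’s a rough C++ sketch of the idea. The RenderScene stand-in, the specific jitter offsets and the 50/50 blend are just illustrative assumptions on my part, not anything taken from the presentations:

#include <cstdio>
#include <vector>

struct Vec2 { float x, y; };

// Stand-in for the real renderer: returns some color for this pixel,
// rendered with the given sub-pixel jitter applied.
float RenderScene (int pixelX, int pixelY, Vec2 jitter)
{
    // just a made up pattern so this compiles and runs on its own
    float fx = pixelX + 0.5f + jitter.x;
    float fy = pixelY + 0.5f + jitter.y;
    return (((int)fx + (int)fy) & 1) ? 1.0f : 0.0f;
}

void TemporalSupersample (std::vector<float>& history, int width, int height, int frameNumber)
{
    // alternate between two sub-pixel offsets from frame to frame.
    // a longer low discrepancy sequence (e.g. Halton) works too.
    const Vec2 jitters[2] = { { 0.25f, 0.25f }, { -0.25f, -0.25f } };
    Vec2 jitter = jitters[frameNumber & 1];

    for (int y = 0; y < height; ++y)
    {
        for (int x = 0; x < width; ++x)
        {
            float current = RenderScene(x, y, jitter);
            float previous = history[y * width + x];

            // composite the new frame against the last frame. a 50/50 blend
            // averages the two jittered sample positions over 2 frames.
            history[y * width + x] = 0.5f * current + 0.5f * previous;
        }
    }
}

int main ()
{
    const int width = 4, height = 4;
    std::vector<float> history(width * height, 0.0f);
    for (int frame = 0; frame < 2; ++frame)
        TemporalSupersample(history, width, height, frame);
    printf("pixel (0,0) after 2 frames: %f\n", history[0]);
    return 0;
}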

3) That gives you 2 samples if you only composite the last and current frame, but more samples are better of course. You could keep more frames from the past around, but that takes up the precious resource of memory. Apparently, when the hardware does MSAA (multisample antialiasing), it has different configurations for different numbers of samples, and it’s configurable somehow. If you have 2 samples, they may be 2 vertical dots, or 2 horizontal dots. If you have 3 samples, it might look like a “3” on a domino. If you have 5 samples it might look like a “5” on a domino.

MSAAConfigs

4) Sometimes a corner will be sampled so that a sample can be shared across multiple pixels to increase efficiency. There is this really interesting thing called “flipquads” that samples on pixel edges for that same reason. You can see some info here: An Extremely Inexpensive Multisampling Scheme. Basically, you only do two samples per pixel, sampling at 2 of the 4 sample locations on the edges of a pixel, so that the pixels that share those edges can use the results. Effectively, you are doing 2 samples per pixel, but getting 4 samples per pixel due to sample sharing.

5) If you combine flipquads with temporal supersampling, it means that you get 4 samples for the cost of 2, amortized over 2 frames. So, you essentially just render the normal amount of pixels (1 sample per pixel), composite frame N against frame N-1, and get the benefit of 4 tap MSAA. So, it’s really cheap, and yes… it does actually help significantly, despite the fact that so many samples are redundant.
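Here’s a small C++ sketch of what choosing the sample location could look like when combining the two ideas: one sample per pixel per frame, picked from a tiny table indexed by the pixel’s checkerboard parity and the frame’s parity. The edge offsets in the table are made up for illustration and are NOT the actual offsets from the flipquad paper:

#include <cstdio>

struct Vec2 { float x, y; };

Vec2 ChooseSampleOffset (int pixelX, int pixelY, int frameNumber)
{
    // samples sit on pixel edges so that neighboring pixels can share them.
    // mirror the pattern on alternating pixels (checkerboard) and alternate
    // which edge sample gets rendered from frame to frame.
    // NOTE: these offsets are illustrative placeholders only!
    static const Vec2 offsets[2][2] =
    {
        { { -0.5f,  0.25f }, {  0.25f,  0.5f } },  // "even" pixels
        { {  0.5f, -0.25f }, { -0.25f, -0.5f } },  // "odd" pixels (mirrored)
    };
    int pixelParity = (pixelX + pixelY) & 1;
    int frameParity = frameNumber & 1;
    return offsets[pixelParity][frameParity];
}

int main ()
{
    // print the offsets for a 2x2 block of pixels across two frames
    for (int frame = 0; frame < 2; ++frame)
        for (int y = 0; y < 2; ++y)
            for (int x = 0; x < 2; ++x)
            {
                Vec2 o = ChooseSampleOffset(x, y, frame);
                printf("frame %d pixel (%d,%d): offset (%.2f, %.2f)\n", frame, x, y, o.x, o.y);
            }
    return 0;
}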

None of the above is anything new… I watched it all in various SIGGRAPH 2014 presentations earlier today from big name modern games – and man am I amazed at what people are doing these days!

Now for the new part…

One way for raytracers to get better visual quality is to do multiple rays per pixel, doing Monte Carlo sampling, where each of the rays in the group is perturbed by tiny amounts. Some details here: Advanced Topics in Computer Graphics: Sampling Techniques

In my own personal OpenCL real time raytracer, I don’t have the luxury of doing multiple rays per pixel – and in fact, I have a graphics option that allows you to render only half the screen (top / bottom) each frame alternating, to cut the number of rays down so that it runs faster!

What if a person was able to do temporal supersampling with a realtime raytracer, using flipquads to make it so it could get the information of 4 rays per pixel, while only taking a single ray cast per pixel each frame? Wouldn’t that be something?

There are some technical details to work out but I think there is some real magic here waiting to happen.

The biggest technical problem I foresee is reprojecting the pixels from the last frame to the current frame. This probably would work ok if your rays had a strict projection matrix governing them, but there may be difficulties with reflection and refraction, and honestly, I personally want to distort camera rays for game effects (like being underwater) so wouldn’t want to be stuck with a strict projection matrix. Maybe there’s some clever solution to make it all ok though…
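That said, for the case where rays ARE governed by a strict projection matrix, the reprojection step might look something like this C++ sketch: take this frame’s world space hit point, transform it by last frame’s view-projection matrix, and use the result to look up the history buffer. The matrix handling here is simplified and it ignores disocclusion checks entirely, so treat it as an assumption-laden sketch rather than a real solution:

#include <cstdio>

struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };   // row-major

Vec4 Transform (const Mat4& m, const Vec4& v)
{
    Vec4 r;
    r.x = m.m[0][0]*v.x + m.m[0][1]*v.y + m.m[0][2]*v.z + m.m[0][3]*v.w;
    r.y = m.m[1][0]*v.x + m.m[1][1]*v.y + m.m[1][2]*v.z + m.m[1][3]*v.w;
    r.z = m.m[2][0]*v.x + m.m[2][1]*v.y + m.m[2][2]*v.z + m.m[2][3]*v.w;
    r.w = m.m[3][0]*v.x + m.m[3][1]*v.y + m.m[3][2]*v.z + m.m[3][3]*v.w;
    return r;
}

// returns false if the reprojected point falls off screen (history unusable)
bool ReprojectToLastFrame (const Vec3& worldHit, const Mat4& lastViewProj,
                           int width, int height, int& outX, int& outY)
{
    Vec4 clip = Transform(lastViewProj, Vec4{ worldHit.x, worldHit.y, worldHit.z, 1.0f });
    if (clip.w <= 0.0f)
        return false;
    float ndcX = clip.x / clip.w;   // -1..1
    float ndcY = clip.y / clip.w;   // -1..1
    if (ndcX < -1.0f || ndcX > 1.0f || ndcY < -1.0f || ndcY > 1.0f)
        return false;
    outX = (int)((ndcX * 0.5f + 0.5f) * (width  - 1));
    outY = (int)((ndcY * 0.5f + 0.5f) * (height - 1));
    return true;
}

int main ()
{
    // identity matrix as a placeholder "last frame view projection"
    Mat4 identity = {{ {1,0,0,0}, {0,1,0,0}, {0,0,1,0}, {0,0,0,1} }};
    int x = 0, y = 0;
    bool ok = ReprojectToLastFrame(Vec3{ 0.5f, 0.0f, 2.0f }, identity, 1920, 1080, x, y);
    printf("reprojected ok=%d -> (%d, %d)\n", (int)ok, x, y);
    return 0;
}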

Also – the link to flipquads is actually an explanation of “fliptris”, a technique using 1.25 samples per pixel. If that were amortized across 2 frames, that means you would only need to cast 62.5% of your rays each frame (0.625 rays per pixel), theoretically. That might be a nice performance win, while gaining the benefits of temporal supersampling and ultimately having 3 samples for each pixel!

Analytic Fog Density

AnalyticFog

There are a number of ways to implement the effect of fog with modern real time rendered graphics. This blog post will explain how to render fog that has varying density, based on a function of X,Y,Z location in space, like in the picture above.

Faked Fog

One way is to “fake it” and do something like set the color of a pixel on an object based on its height. For instance you might say that pixels with a y axis value above 15 are unfogged, pixels with y axis values between 15 and 10 progressively get more fogged as they get closer to 10, and pixels with y axis values less than 10 are completely fogged. That can make some fog that looks like this:

heightfog

A strange side effect of doing that though, is that if you go down “into” the fog and look out of it, things that should be fogged won’t be. For instance, looking up at a mountain from inside the fog, the mountain won’t be fogged at all, even though it should be, because you are inside of the fog and looking through it.

A better way to do it, if you intend for the camera to be able to go into the fog, is to calculate a fogging amount for a pixel based on how far away it is from the view point, and how dense the fog is between the view point and the destination point.

If you are doing ray based rendering, like ray tracing or ray marching, you might find yourself trying to find how much fog is between points that don’t involve the view point – like if you are calculating the reflection portion of a ray. In this case, you are just finding out how much fog there is between the point where the reflection happened and the closest intersection. You can consider the point of reflection as the “view point” for the purpose of fogging.

Sometimes, the entire scene might not be in fog. In this case, you have to find where the fog begins and ends, instead of the total distance between the view point and the destination point.

In any case, the first thing you need to do when fogging is figure out the point where the fog begins, and the point where the fog ends. Then, you can figure out how much fog there is based on how the fog density works.

Constant Density Fog

GraphConstant

The simplest sort of fog is fog that has the same density all throughout it.

What you do in this case is just multiply the fog density by the distance spent in the fog to come up with a final fog value.

As an example, your fog density might be “0.04” and if you are fogging a pixel 10 units away, you multiply density by distance. FogAmount = 0.04 * 10.0 = 0.4.

Doing this, you know the pixel should be 40% fogged, so you interpolate the pixel’s color 40% towards the fog color. You should make sure to clamp the fog amount to be between 0 and 1 to avoid strange visual anomalies.
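In code, constant density fog might look something like this little C++ sketch (the colors, density and distance are just example values):

#include <algorithm>
#include <cstdio>

struct Color { float r, g, b; };

Color Lerp (const Color& a, const Color& b, float t)
{
    return { a.r + (b.r - a.r) * t,
             a.g + (b.g - a.g) * t,
             a.b + (b.b - a.b) * t };
}

Color ApplyConstantFog (const Color& pixelColor, const Color& fogColor,
                        float fogDensity, float distanceInFog)
{
    // fog amount is density times distance, clamped to [0,1]
    float fogAmount = std::min(std::max(fogDensity * distanceInFog, 0.0f), 1.0f);
    return Lerp(pixelColor, fogColor, fogAmount);
}

int main ()
{
    Color pixel = { 1.0f, 0.0f, 0.0f };   // red object
    Color fog   = { 0.5f, 0.5f, 0.5f };   // grey fog
    // density 0.04 over 10 units -> 40% fogged
    Color result = ApplyConstantFog(pixel, fog, 0.04f, 10.0f);
    printf("fogged color: %f %f %f\n", result.r, result.g, result.b);
    return 0;
}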

The image below shows a constant fog density of 0.04.

ConstantFog1

Here’s an image of the same constant density fog as viewed from inside the fog:

ConstantFog3

A problem with constant fog density though, is that if you view it from edge on, you’ll get a very noticeable hard edge where the fog begins, like you can see in the image below:

ConstantFog2

Linear Density Fog

GraphLinear

With linear fog density, the fog gets denser linearly, the farther you go into the fog.

With a fog plane, you can get the density of the fog for a specified point by doing a standard “distance from plane to point” calculation and multiplying that by how much the fog density grows per unit of distance. If your plane is defined by A*x + B*y + C*z + D = 0, and your point is defined as X,Y,Z, you just do a dot product between the plane and the point, giving the point a W component of one.

In other words…

FogDensity(Point, Plane) = (Plane.NormX * Point.X + Plane.NormY * Point.Y + Plane.NormZ * Point.Z + Plane.D * 1.0) * FogGrowthFactor
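Here’s a small C++ sketch of that, which also gets the fog amount between two points by averaging the densities at the two end points and multiplying by the distance between them (the trapezoid area idea that comes up again in the integrals section below). The plane, growth factor and points are just example values, and clamping away negative densities above the fog plane is left out for brevity:

#include <cmath>
#include <cstdio>

struct Vec3  { float x, y, z; };
struct Plane { float nx, ny, nz, d; };   // normalized normal plus D

float FogDensity (const Vec3& p, const Plane& plane, float fogGrowthFactor)
{
    // dot(plane, (x,y,z,1)) * growth factor
    return (plane.nx * p.x + plane.ny * p.y + plane.nz * p.z + plane.d) * fogGrowthFactor;
}

float LinearFogAmount (const Vec3& a, const Vec3& b, const Plane& plane, float fogGrowthFactor)
{
    float densityA = FogDensity(a, plane, fogGrowthFactor);
    float densityB = FogDensity(b, plane, fogGrowthFactor);
    float dx = b.x - a.x, dy = b.y - a.y, dz = b.z - a.z;
    float dist = std::sqrt(dx * dx + dy * dy + dz * dz);
    // trapezoid area: distance * (densityA + densityB) / 2
    return dist * (densityA + densityB) * 0.5f;
}

int main ()
{
    Plane fogPlane = { 0.0f, -1.0f, 0.0f, 10.0f };   // density grows below y = 10
    Vec3 eye = { 0.0f, 9.0f, 0.0f };
    Vec3 hit = { 0.0f, 5.0f, 20.0f };
    printf("fog amount: %f\n", LinearFogAmount(eye, hit, fogPlane, 0.01f));
    return 0;
}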

Here’s a picture of linear fog with a fog growth factor of 0.01:

LinearFog1

The same fog viewed from the inside:

LinearFog2

And lastly, the fog viewed edge on to show that the “hard line” problem of linear fog is gone (dramatic difference isn’t it?!):

LinearFog3

Analytic Fog Density – Integrals

GraphAnalytic

Taking a couple steps further, you might want to use equations to define fog density with some function FogDensity = f(x,y,z).

How could you possibly figure out how much fog there is between two given points when the density between them varies according to some arbitrary function?

One way would be to take multiple samples along the line segment between the view point and the destination point, and either calculate the fog amount in each section, or average the densities you calculate and multiply the result by the total distance. You might have to take a lot of samples to make this look correct, which hurts frame rate, or else accept lower visual quality as a compromise.
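Just to make that brute force approach concrete, here’s a little C++ sketch of it. The density function is just a placeholder; it happens to be one of the sine based densities discussed further below:

#include <cmath>
#include <cstdio>

struct Vec3 { float x, y, z; };

// some arbitrary density function of position (placeholder)
float FogDensity (const Vec3& p)
{
    return (1.0f + std::sin(p.x * 0.6f)) * 0.01f;
}

float FogAmountSampled (const Vec3& a, const Vec3& b, int numSamples)
{
    float dx = b.x - a.x, dy = b.y - a.y, dz = b.z - a.z;
    float dist = std::sqrt(dx * dx + dy * dy + dz * dz);

    float densitySum = 0.0f;
    for (int i = 0; i < numSamples; ++i)
    {
        float t = (i + 0.5f) / numSamples;   // sample at segment midpoints
        Vec3 p = { a.x + dx * t, a.y + dy * t, a.z + dz * t };
        densitySum += FogDensity(p);
    }
    // average density times total distance
    return (densitySum / numSamples) * dist;
}

int main ()
{
    Vec3 eye = { 0.0f, 0.0f, 0.0f };
    Vec3 hit = { 50.0f, 0.0f, 0.0f };
    // more samples -> closer to the true (analytic) answer, but slower
    printf("16 samples:  %f\n", FogAmountSampled(eye, hit, 16));
    printf("256 samples: %f\n", FogAmountSampled(eye, hit, 256));
    return 0;
}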

If you look at the graphs for the previous fog types, you might notice that we are trying to find the area under the graphs between points A and B. For constant density fog, the shape is a rectangle, so we just multiply width (time in fog) by height (the constant fog density) to get the fog amount. For linear density fog, the shape is a trapezoid, so we use the trapezoid area formula which is height (in this case, the distance in the fog) times the sum of the base lengths (the fog densities at points A and B) divided by 2.

How can we get the area under the graph between A and B for an arbitrary formula though? Well, a way exists luckily, using integrals (thanks to my buddy “Danny The Physicist” for educating me on the basics of integrals!).

There’s a way to transform a formula to get an “indefinite integral”, which itself is also a formula. I won’t go into the details of how to do that, but you can easily get the indefinite integral of a function by typing it into Wolfram Alpha.

Once you have the indefinite integral (let’s call it G(x)) of the fog density formula (let’s call it F(x)), if you calculate G(B) - G(A), that will give you the area under the graph of F(x) between A and B. Yes, seriously, that gives us the area under the graph between our points, thus giving us the amount of fog that exists between the two points for an arbitrary fog density function!

Note that the difference G(B) - G(A) is what’s called the definite integral of F from A to B – you get it by plugging the two endpoints into the indefinite integral and subtracting.

Analytic Fog Density – Implementation Details

Now that the theory is worked out let’s talk about implementation details.

First off, coming from an additive audio synthesis type of angle, I figured I might have some good luck adding together sine waves of various frequencies and amplitudes, so I started with this:

sin(x*F) * A

F is a frequency multiplier that controls the wavelength of the sine wave. A is an amplitude multiplier that controls the maximum density the fog reaches.

Next, I knew that I needed a fog density function that never goes below zero, because that would mean if you looked through a patch of negative fog density, it would make the other fog you were looking through be less dense. That is just weird, and doesn’t exist in reality (but maybe there is some interesting visual effect hiding in there somewhere??), so the formula evolved to this, making sure the function never went below zero:

(1 + sin(x*F)) * A

Plugging that equation into wolfram alpha, it says the indefinite integral is:

(x - cos(x*F) / F) * A

You can check that out here:
Wolfram Alpha: (1 + sin(x*F)) * A.

It’s also kind of fun to ask google to graph these functions so you can see what they do to help understand how they work. Here are the graphs for A = 0.01 and F = 0.6:
Fog Density: graph (1 + sin(x*0.6)) * 0.01
Indefinite Integral: graph (x - (cos(x*0.6)) / 0.6) * 0.01

So, if you have point A and B where the fogging begins and ends, you might think you can do this to get the right answer:
FogAmount = G(B.x) - G(A.x)

Nope! There’s a catch. That would work if A and B had no difference on the y or z axis, but since they probably do, you need to jump through some hoops. In essence, you need to stretch your answer across the entire length of the line segment between A and B.

To do that, firstly you need to get that fog amount down to unit length. You do that by modifying the formula like so:
FogAmount = (G(B.x) - G(A.x)) / (B.x - A.x)

This also has a secondary benefit of making it so that your fog amount is always positive (so long as your fog density formula F(X) can’t ever go negative!), which saves an abs() call. Making it always positive ensures that this works when viewing fog both from the left and the right.

Now that we have the fog amount down to unit length, we need to scale it to be the length of the line segment, which makes the formula into this:
FogAmount = (G(B.x) - G(A.x)) * Length(B - A) / (B.x - A.x)

That formula will now give you the correct fog amount.

But, one axis of fog wasn’t enough to look very good, so I wanted to make sure and do one sine wave on each axis. I used 0.01 amplitude for each axis, but for the X axis I used a frequency of 0.6, for the Y axis I used a frequency of 1.2, and for the Z axis I used a frequency of 0.9.

Also, I wanted to give a little bit of baseline fog, so I added some constant density fog in as well, with a constant density of 0.1.

As a bonus, I also gave each axis a “movement factor” that made the sine waves move over time. X axis had a factor of 2.0, Y axis had a factor of 1.4 and Z axis had a factor of 2.2.

Putting all of this together, here is the final fog equation (GLSL pixel shader code) for finding the fog amount between any two points at a specific point in time:

//=======================================================================================
float DefiniteIntegral (in float x, in float amplitude, in float frequency, in float motionFactor)
{
	// Fog density on an axis:
	// (1 + sin(x*F)) * A
	//
	// indefinite integral:
	// (x - cos(F * x)/F) * A
	//
	// ... plus a constant (but when subtracting, the constant disappears)
	//
	x += iGlobalTime * motionFactor;
	return (x - cos(frequency * x)/ frequency) * amplitude;
}

//=======================================================================================
float AreaUnderCurveUnitLength (in float a, in float b, in float amplitude, in float frequency, in float motionFactor)
{
	// we calculate the definite integral at a and b and get the area under the curve
	// but we are only doing it on one axis, so the "width" of our area bounding shape is
	// not correct.  So, we divide it by the length from a to b so that the area is as
	// if the length is 1 (normalized... also this has the effect of making sure it's positive
	// so it works from left OR right viewing).  The caller can then multiply the shape
	// by the actual length of the ray in the fog to "stretch" it across the ray like it
	// really is.
	return (DefiniteIntegral(a, amplitude, frequency, motionFactor) - DefiniteIntegral(b, amplitude, frequency, motionFactor)) / (a - b);
}

//=======================================================================================
float FogAmount (in vec3 src, in vec3 dest)
{
	float len = length(dest - src);
	
	// calculate base fog amount (constant density over distance)	
	float amount = len * 0.1;
	
	// calculate definite integrals across axes to get moving fog adjustments
	float adjust = 0.0;
	adjust += AreaUnderCurveUnitLength(dest.x, src.x, 0.01, 0.6, 2.0);
	adjust += AreaUnderCurveUnitLength(dest.y, src.y, 0.01, 1.2, 1.4);
	adjust += AreaUnderCurveUnitLength(dest.z, src.z, 0.01, 0.9, 2.2);
	adjust *= len;
	
	// make sure and not go over 1 for fog amount!
	return min(amount+adjust, 1.0);
}

More Info

I ended up only using one sine wave per axis, but I think with more sine waves, or perhaps different functions entirely, you could get some more convincing looking fog.

At some point in the future, I’d like to play around with exponential fog density (instead of linear) where the exponential power is a parameter.

I also think that maybe squaring the sine waves could make them have sharper density changes perhaps…

One thing that bugs me in the above screenshots is the obvious “hard line” in both constant and linear fog where it seems fog crosses a threshold and gets a lot denser. I’m not really sure how to fix that yet. In traditional rasterized graphics you could put the fog amount on a curve, to give it a smoother transition, but in ray based rendering, that could make things a bit odd – like you could end up with an exponential curve butting up against the start of a different exponential curve (due to reflection or refraction or similar). The fog density would end up looking like log graph paper which would probably not look so great – although honestly I haven’t tried it to see yet!

If you have any questions, or feedback about improvements you know about or have discovered in any of the above, post a comment and let me know!

Here’s a good read on fog defined by a plane, that also gets into how to make branchless calculations for the fog amounts.
Unified Distance Formulas for Halfspace Fog

Interactive ShaderToy.com demo with GLSL pixel shader source code that you can also edit in real time with WebGL:
Fog

Bezier Curves Part 2 (and Bezier Surfaces)

This is a follow up post to Bezier Curves. My plan was to write a post about b-splines and nurbs next, but after looking into them deeper, I found out they aren’t going to work for my needs so I’m scratching that.

Here’s some basic info on b-splines and nurbs though before diving deeper into Bezier curves and surfaces.

B-Splines (Basis Splines)

Bezier curves are nice, but the more control points you add, the more complex the math gets because the degree of the curve function increases with each control point added. You can put multiple Bezier curves end to end to be able to have more intricate curves, but another option is to use B-Splines.

B-Splines are basically Bezier curves which let you specify more control points without raising the degree of the Bezier curve. They do this by having control points only affect part of the total curve.

This way, you could make a quadratic b-spline which had 10 control points. Only a few control points control any given point on the curve, so the curve stays quadratic (and so does the math), but you get a lot more control points. A “Knot Vector” is what controls which parts of the curve the control points control.

A Bezier curve is actually a special case of B-Spline where all control points affect the entire curve.

Nurbs (Non Uniform Rational B-Spline)

Sometimes when working with curves, you want some control points to be stronger than others. You can accomplish this in Bezier curves and B-splines by doubling up or tripling up control points in the same location to make that control point twice, or three times as strong respectively.

What if you want a control point to be 1.3 times stronger though? That gets a lot more complicated.

Nurbs solve that problem by letting you specify a weight per control point.

Just like Bezier curves are a special case of B-Splines, B-Splines are a special case of nurbs. A B-Spline could be thought of as a nurbs curve that has the same weight for all control points.

Back to Bezier!

My end goal is to find a curve / surface type that is flexible enough for artists to make a variety of shapes with, but is efficient to do line segment tests against on the GPU. To this end, B-Splines and Nurbs add algorithmic and mathematical complexity over Bezier curves, and seem to be out of the running unless I can’t find anything more promising.

My best bet right now looks like a Bezier Triangle. Specifically, a quadratic Bezier triangle, where each side of the triangle is a quadratic Bezier curve that has 3 control points. When I get those details fully worked out, I’ll report back, but for now, here’s some interesting info I found about generalizing bezier curves both in order (linear, quadratic, cubic, quartic, etc) as well as in the number of dimensions (line, curve, triangle, tetrahedron, etc).

Bezier Generalized

I found the generalized equation on the wikipedia page for Bezier triangles and am super glad I found it, it is very cool!

I want to show you some specifics to explain the generalization by example.

Quadratic Curve:
(A * S + B * T) ^ 2

Expanding that gives you:
A^2 * S^2 + A * B * 2 * S * T + B^2 * T^2

In the above, S and T are Barycentric Coordinates in a 1 dimensional Simplex. Since we know that barycentric coordinates always add up to 1, we can replace S with (1-T) to get the below:

A^2 * (1-T)^2 + A * B * 2 * (1-T) * T + B^2 * T^2

Now, ignoring T and the constants, and only looking at A and B, we have 3 forms: A^2, AB and B^2. Those are our 3 control points! Let’s replace them with A,B and C to get the below:

A * (1-T)^2 + B * 2 * (1-T) * T + C * T ^2

And there we go, there’s the quadratic Bezier curve formula seen in the previous post.
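Just to show how directly that expanded equation maps to code, here’s a tiny C++ sketch that evaluates it for some arbitrary example control points:

#include <cstdio>

struct Vec2 { float x, y; };

Vec2 QuadraticBezier (const Vec2& a, const Vec2& b, const Vec2& c, float t)
{
    float s = 1.0f - t;   // S = 1 - T since barycentric coordinates add up to 1
    float wa = s * s;
    float wb = 2.0f * s * t;
    float wc = t * t;
    return { a.x * wa + b.x * wb + c.x * wc,
             a.y * wa + b.y * wb + c.y * wc };
}

int main ()
{
    Vec2 a = { 0.0f, 0.0f }, b = { 1.0f, 2.0f }, c = { 2.0f, 0.0f };
    for (int i = 0; i <= 4; ++i)
    {
        float t = i / 4.0f;
        Vec2 p = QuadraticBezier(a, b, c, t);
        printf("t=%.2f -> (%.3f, %.3f)\n", t, p.x, p.y);
    }
    return 0;
}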

Cubic Curve:
(A * S + B * T) ^ 3

To make a cubic curve, you just change the power from 2 to 3, that’s all! If you expand that equation, you get:
A^3*S^3+3*A^2*B*S^2*T+3*A*B^2*S*T^2+B^3*T^3

We can swap S with (1-T) to get:

A^3*(1-T)^3+3*A^2*B*(1-T)^2*T+3*A*B^2*(1-T)*T^2+B^3*T^3

Looking at the A/B terms, we see that there are more this time: A^3, A^2B, AB^2 and B^3. Those are our 4 control points that we can replace with A,B,C,D to get:
A*(1-T)^3+3*B*(1-T)^2*T+3*C*(1-T)*T^2+D*T^3

There is the cubic Bezier curve equation from the previous post.

Linear Curve:
(A * S + B * T) ^ 1

To expand that, we just throw away the exponent. After we replace S with (1-T) we get:
A * (1-T) + B * T

That is the formula for linear interpolation between 2 points – which you could think of as the 2 control points of the curve.

One more example before we can generalize.

Quadratic Bezier Triangle:
(A * S + B * T + C * U) ^ 2

If you expand that you get this:
A^2*S^2+2*A*B*S*T+2*A*C*S*U+B^2*T^2+2*B*C*T*U+C^2*U^2

Looking at combinations of A, B & C you have: A^2, AB, AC, B^2, BC, C^2. Once again, these are your control points, and their names tell you where they lie on the triangle. A Bezier triangle is a triangle where the 3 sides of the triangle are Bezier curves. A quadratic Bezier triangle has quadratic Bezier curves for its edges, which means that each side has 3 control points. In total that’s 6 control points: the 3 corners of the triangle, plus 3 more control points, each one sitting between the end points of an edge. A^2, B^2 and C^2 represent the 3 corners of the triangle. AB is the third control point for the Bezier curve on the edge AB. BC and AC follow that pattern as well! Super easy to remember.

In a cubic Bezier triangle, you get a lot more control points, but a new class of control point too: ABC. This control point is in the middle of the triangle like the name would imply.

Anyways, in the expanded quadratic Bezier triangle equation above, when you replace the control points with A,B,C for the triangle corner control points (the squared terms) and D,E,F for the in-between control points, you get the Bezier triangle equation below:

A*S^2+2*D*S*T+2*E*S*U+B*T^2+2*F*T*U+C*U^2

Note that we are dealing with a triangle (a 2 dimensional simplex) now, which has three barycentric coordinates, so once again, instead of needing ALL of them (S,T,U) we can pick one and replace it. For instance, we could replace U with (1-S-T) to have one less variable floating around.
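Here’s a small C++ sketch that evaluates that quadratic Bezier triangle equation, using U = 1 - S - T and the control point naming from above (the example control point values are arbitrary):

#include <cstdio>

struct Vec3 { float x, y, z; };

// A, B, C are the triangle corners; D, E, F are the mid-edge control points.
Vec3 QuadraticBezierTriangle (const Vec3& A, const Vec3& B, const Vec3& C,
                              const Vec3& D, const Vec3& E, const Vec3& F,
                              float s, float t)
{
    float u = 1.0f - s - t;   // barycentric coordinates sum to 1
    float wA = s * s;
    float wB = t * t;
    float wC = u * u;
    float wD = 2.0f * s * t;  // edge AB
    float wE = 2.0f * s * u;  // edge AC
    float wF = 2.0f * t * u;  // edge BC
    return { A.x*wA + B.x*wB + C.x*wC + D.x*wD + E.x*wE + F.x*wF,
             A.y*wA + B.y*wB + C.y*wC + D.y*wD + E.y*wE + F.y*wF,
             A.z*wA + B.z*wB + C.z*wC + D.z*wD + E.z*wE + F.z*wF };
}

int main ()
{
    // flat triangle corners with raised mid-edge control points
    Vec3 A = { 0, 0, 0 }, B = { 1, 0, 0 }, C = { 0, 1, 0 };
    Vec3 D = { 0.5f, 0.0f, 0.5f }, E = { 0.0f, 0.5f, 0.5f }, F = { 0.5f, 0.5f, 0.5f };
    Vec3 center = QuadraticBezierTriangle(A, B, C, D, E, F, 1.0f/3.0f, 1.0f/3.0f);
    printf("surface point at barycenter: (%.3f, %.3f, %.3f)\n", center.x, center.y, center.z);
    return 0;
}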

All Done for Now

You can use this pattern to expand either in “surface dimension”, or in the dimension of adding more control points (and increasing the order of the equation). I love it because it’s super simple to remember that simple equation, and then just re-calculate the equation you need for whatever your specific usage case is.

If this stuff is confusing, check out the wiki page for Bezier Triangles, it has a great graphic that really shows you what I’m trying to explain:
Bezier Triangle

Next up I either want to make an HTML5 interactive app for messing around with Bezier triangles, or if I can figure out how to intersect a line segment with a quadratic Bezier triangle, I’ll probably just have some real cool looking screenshots to post along with the equation I ended up using (;

Special thanks to wolfram alpha for crunching some of these equations. Check it out, it’s really cool!
Wolfram Alpha – Cubic Bezier Curve Expansion

For more bezier fun check out my next Bezier post: One Dimensional Bezier Curves.

MoriRT: Pixel and Geometry Caching to Aid Real Time Raytracing

About half a year ago, some really intriguing ideas came to me out of the blue dealing with ways to speed up raytracing.  My plan was to create a couple games, showing off these techniques, and after some curiosity was piqued, write an article up talking about how it worked to share it with others in case they found it useful.

For those of you who don’t know what raytracing is, check out this wikipedia article:

http://en.wikipedia.org/wiki/Ray_tracing_(graphics)

Due to some distractions and technical setbacks unrelated to the raytracing itself, I’ve only gotten one small game working with these techniques.  One of the distractions is that I implemented the game using google’s native client (NaCl) and for some reason so far unknown to me, it doesn’t work on some people’s machines which is very annoying!  I plan to get a mac mini in the next couple months to try and repro / solve the problem.

Check out the game if you want to.  Open this link in google chrome to give it a play:

https://chrome.google.com/webstore/detail/kknobginfngkgcagcfhfpnnlflhkdmol?utm_source=chrome-ntp-icon

The sections of this article are:

  1. Limitations
  2. Geometry Caching
  3. Pixel Caching
  4. Future Work
  5. Compatible Game Ideas

Limitations

Admittedly, these techniques have some pretty big limitations.  I’ve explained my techniques to quite a few fellow game devs and when I mention the limitations, the first reaction people give me is the same one you are probably going to have, which is “oh… well THAT’S dumb!”.  Usually after explaining it a bit more, people perk up again, realizing that you can still work within the limitations to make some neat things.   So please hold off judgement until checking out the rest of the article! (:

The big fat, unashamed limitations are:

  • The camera can’t move (*)
  • Objects in the scene shouldn’t move too much, at least not all at the same time

(* there is a possible exception to the camera not moving limitation in the “Future Work” section)

So…. what the heck kind of games can you make with that?  We’ll get to that later on, but here’s some things these techniques are good at:

  • Changing light color and intensity is relatively inexpensive
  • Changing object color and animating textures is inexpensive
  • These techniques don’t break the parallel-izable nature of raytracing.  Use all those CPU and GPU cores to your heart’s content!

Seems a bit dodgy I’m sure, but read on.

Geometry Caching

The first technique is geometry caching.

The idea behind geometry caching is:  If no objects have moved since the last frame, why should we test which objects each ray hits?  It’s a costly part of the ray tracing, and we already KNOW that we are going to get the same results as last frame, so why even bother?  Let’s just use the info we calculated last frame instead.

Also, if some objects HAVE moved, but we know that the moving objects don’t affect all rays, we can just recalculate the rays that have been affected, without needing to recalculate all rays.

Just because we know the collision points for rays doesn’t mean that we can just skip rendering all together though.  Several things that can make us still need to re-render a ray include:  Animating textures, objects changing colors, lights dimming, lights changing color.  When these things happen, we can re-render a ray much less expensively than normal (just recalculate lighting and shading and such), so they are comparatively inexpensive operations compared to objects actually moving around.

The way I handle geometry caching is to give each ray (primary and otherwise) a unique ID, and I have a dynamic array that holds the collision info for each ID.

In the part of the code that actually casts a single ray, I pass the ID and a flag saying whether it’s allowed to use the geometry cache.  If it isn’t allowed to use the geometry cache, or there is no entry in the cache for the ID, the code calculates intersection info and puts it into the geometry cache.

It then uses the geometry cache information (whether it was re-calculated, or was usable as is) and applies phong shading, does texture lookups, recurses for ray refraction and reflection, and does the other things to figure out the color of the pixel.
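Here’s a rough C++ sketch of what that per-ray cache might look like.  The types and the CalculateIntersection stand-in are simplified assumptions for illustration, not the actual MoriRT code:

#include <cstdio>
#include <vector>

struct Ray { float ox, oy, oz, dx, dy, dz; };

struct CollisionInfo
{
    bool  valid = false;       // has this ray id been cached yet?
    int   objectId = -1;       // which object the ray hit (-1 = missed everything)
    float hitDistance = 0.0f;
};

// placeholder for the expensive part: testing the ray against the scene geometry
CollisionInfo CalculateIntersection (const Ray& /*ray*/)
{
    // a real implementation traverses the world grid here
    return CollisionInfo{};
}

class GeometryCache
{
public:
    const CollisionInfo& GetCollision (int rayId, const Ray& ray, bool allowCache)
    {
        if (rayId >= (int)m_entries.size())
            m_entries.resize(rayId + 1);

        CollisionInfo& entry = m_entries[rayId];

        // recalculate if the cache is disallowed (screen cell marked "geometry
        // dirty") or if this ray has never been cast before
        if (!allowCache || !entry.valid)
        {
            entry = CalculateIntersection(ray);
            entry.valid = true;
        }

        // the caller then uses this (possibly cached) hit info to do phong
        // shading, texture lookups, reflection / refraction recursion, etc.
        return entry;
    }

private:
    std::vector<CollisionInfo> m_entries;   // indexed by ray id
};

int main ()
{
    GeometryCache cache;
    Ray primaryRay = { 0, 0, 0, 0, 0, 1 };
    cache.GetCollision(0, primaryRay, false);                            // first cast fills the cache
    const CollisionInfo& hit = cache.GetCollision(0, primaryRay, true);  // reuses cached info
    printf("cached hit object: %d\n", hit.objectId);
    return 0;
}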

In my first implementation of the geometry cache, it was very fast to render with once it was filled in, but it was really expensive to invalidate individual cache items.  If an object moved and a couple hundred geometry cache items needed to be marked as dirty, it was a really computationally expensive operation!

A better option, the one I use now, involves both a 2d grid (for the screen pixels) and a 3d grid (to hold the geometry of the world).

Breaking the screen into a grid, when each ray is cast into the world, I’m able to tell the ray what screen cell it belongs to.  This way, as a ray traverses the 3d grid holding the world geometry, it’s able to add its screen cell to a list that each world grid cell maintains, keeping track of which screen cells have rays passing through that 3d cell (storing just the unique values of course!).  Child rays know what screen cell they are in by getting that value from their parent.

If an object moves in the world, you can make a union of which world cells it occupied before it moved, and which world cells it occupies after the move.  From there, you can make a union of which screen cells sent rays into that world cell.  The last step is to mark all those screen cells as “geometry dirty” so that next frame, the rays in those cells are disallowed from using the geometry cache data, and instead will re-calculate new intersection info.
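In code, that dirty marking step might look something like this simplified C++ sketch (the grid representations here are assumptions for illustration):

#include <cstdio>
#include <set>
#include <vector>

struct WorldGridCell
{
    std::set<int> screenCellsTouching;   // unique screen cell ids whose rays passed through this 3d cell
};

struct ScreenGridCell
{
    bool geometryDirty = false;
    bool pixelDirty = false;
};

// mark every screen cell that saw the moved object (before OR after the move)
// as geometry dirty, so its rays re-calculate intersections next frame.
// pixel-only changes (color, texture) work the same way, just setting pixelDirty instead.
void MarkGeometryDirty (const std::set<int>& worldCellsBeforeMove,
                        const std::set<int>& worldCellsAfterMove,
                        const std::vector<WorldGridCell>& worldGrid,
                        std::vector<ScreenGridCell>& screenGrid)
{
    // union of world cells the object occupied before and after the move
    std::set<int> affectedWorldCells = worldCellsBeforeMove;
    affectedWorldCells.insert(worldCellsAfterMove.begin(), worldCellsAfterMove.end());

    // union of screen cells that sent rays into those world cells
    for (int worldCellId : affectedWorldCells)
        for (int screenCellId : worldGrid[worldCellId].screenCellsTouching)
            screenGrid[screenCellId].geometryDirty = true;
}

int main ()
{
    std::vector<WorldGridCell> worldGrid(8);
    std::vector<ScreenGridCell> screenGrid(4);
    worldGrid[3].screenCellsTouching = { 0, 2 };
    MarkGeometryDirty({ 3 }, { 5 }, worldGrid, screenGrid);
    printf("screen cell 0 geometry dirty: %d\n", (int)screenGrid[0].geometryDirty);
    return 0;
}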

This method makes it so potentially a lot of rays re-calculate their intersection data when they don’t really need to, but by tuning the size of the screen and world grids, you can find a good happy medium for your use cases.

If you have an idea to better maintain the geometry cache, feel free to post a comment about it!

Pixel Caching

The second technique is pixel caching which is a fancy way of saying “don’t redraw pixels that we don’t have to”.  The less rays you have to cast, the faster your scene will render.

The first challenge to tackle in this problem is how do you know which pixels will be affected when an object changes color?  That is solved by the same mechanism that tells us when geometry cache data is invalidated.

When an object changes color (or has some other non-geometry change), you just get the list of world cells the object resides in, and then get the union of screen cells that sent rays through those world cells.

When you have that list, instead of marking the screen cell “geometry dirty”, you mark it as “pixel dirty”.

When rendering the screen cells, any screen cell that isn’t marked as dirty in either way can be completely skipped.  Rendering it is a no-op because it would be the same pixels as last time! (:
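Putting the two dirty flags together, the per screen cell render loop might look something like this C++ sketch (RenderCell is a hypothetical stand-in):

#include <cstdio>
#include <vector>

struct ScreenGridCell
{
    bool geometryDirty = false;
    bool pixelDirty = false;
};

// stand-in for rendering all of the pixels in one screen cell
void RenderCell (int /*cellIndex*/, bool /*allowGeometryCache*/)
{
    // cast (or re-shade) the rays belonging to this screen cell here
}

void RenderFrame (std::vector<ScreenGridCell>& screenGrid)
{
    for (int i = 0; i < (int)screenGrid.size(); ++i)
    {
        ScreenGridCell& cell = screenGrid[i];

        // completely clean cell: it would produce the same pixels as last
        // frame, so rendering it is a no-op and we skip it entirely
        if (!cell.geometryDirty && !cell.pixelDirty)
            continue;

        // geometry dirty -> cached intersections can't be trusted, re-cast rays.
        // pixel dirty only -> reuse cached intersections and just re-shade.
        RenderCell(i, !cell.geometryDirty);

        cell.geometryDirty = false;
        cell.pixelDirty = false;
    }
}

int main ()
{
    std::vector<ScreenGridCell> screenGrid(16);
    screenGrid[3].pixelDirty = true;      // e.g. an object seen by that cell changed color
    screenGrid[7].geometryDirty = true;   // e.g. an object seen by that cell moved
    RenderFrame(screenGrid);
    printf("frame rendered: 2 of 16 cells touched\n");
    return 0;
}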

This is the reason why you want to minimize geometry changes (objects moving, rotating, resizing, etc) and, when things do need to change, rely instead on animating textures, object colors, and lighting colors / intensities.

Future Work

Here’s a smattering of ideas for future work that I think ought to bear fruit:

  • Replace the screen and/or world grid with better performing data structures
  • Pre-compute (a pack time process) the primary rays and subsequent rays of static geometry and don’t store static geometry in the world grid, but store it in something else instead like perhaps a BSP tree.  This way, at run time, if a ray misses all objects in the dynamic geometry world grid, it can just use the info from the pre-computed static geometry, no matter how complex the static geometry is.  If something DOES hit a dynamic object however, you’ll have to test subsequent rays against both the dynamic object world grid, and the data structure holding the info about the static geometry but hopefully it’ll be a net win in general.
  • Investigate to see how to integrate this with photon mapping techniques and data structures.  Photon mapping is essentially ray tracing from the opposite direction (from light to camera, instead of from camera to light).  Going the opposite direction, there are some things it’s really good at – like caustics – which ray tracing alone just isn’t suited for: http://en.wikipedia.org/wiki/Photon_mapping
  • In a real game, some things in the world will be obscured by the UI overlays.  There might be an opportunity in some places to “early out” when rendering a single ray if it was obscured by UI.  It would complicate caching though, since an individual ray could remain dirty while the screen cell itself was marked as clean.
  • Orthographic camera:  If the camera is orthographic, that means you could pan the camera without invalidating the pixel and geometry cache.  This would allow the techniques to be used for a side scrolling game, overhead view game, and things of that nature – so long as orthographic projection looks good enough for the needs of the game.  I think if you got creative, it could end up looking pretty nice.
  • Screen space effects: enhance the raytracing visuals with screen space particles and such.  Could also keep a “Z-Buffer” by having a buffer that holds the time each ray took to hit the first object.  This would allow more advanced effects.
  • Interlaced rendering: to halve the rendering time, every frame could render every other horizontal line.  Un-dirtying a screen cell would take 2 frames but this ought to be a pretty straight forward and decent win if losing a little bit of quality is ok.
  • red/blue 3d glasses mode:  This is actually a feature of my snake game but figured I’d call it out.  It works by rendering the scene twice, which is costly (each “camera” has its own geometry and pixel cache at least).  If keeping the “Z-Buffer” as mentioned above, there might be a way to fake it more cheaply, but I’m not sure.

Compatible Game Ideas

Despite the limitations, I’ve been keeping a list of games that would be compatible with these ideas.  Here’s the highlights of that list!

  • Pinball:  Only flippers, and the area around the ball would actually have geometry changes, limiting geometry cache invalidating.  Could do periodic, cycling color / lighting animations on other parts of the board to spice the board up in the “non active” areas.
  • Marble Madness Clone: Using an orthographic camera to allow camera panning, a player could control a glass or mirrored ball through a maze with dangerous traps and time limits.  Marble Madness had very few moving objects and was more about the static geometry, so there’d probably be a lot of mileage here.  You could also have animated textures for pools of acid so that they didn’t impact the geometry cache.
  • Zelda 1 and other overhead view type games: Using ortho camera to allow panning, or in the case of Zelda 1, have each “room” be a static camera.  You’d have to keep re-rendering down somehow by minimizing enemy count, while still making it challenging.  Could be difficult.
  • Metroidvania: side scroller with ortho camera to allow panning.  Could walk behind glass pillars and waterfalls for cool refractive effects.
  • Monkey Island type game: LOTS of static geometry in a game like that which would be nice.
  • Arkanoid type game: static camera, make use of screen space effects for brick breaking particles, etc.
  • Mystery game: Static scenes where you can use a magnifying glass to LITERALLY view things better (magnification due to refraction, just like in real life) to find clues and solve the mystery.  Move from screen to screen to find new clues and find people to talk to, progress the storyline etc.
  • Puzzle Game: could possibly do a traditional “block based” puzzle game like puzzle fighters, tetris, etc.
  • Physics based puzzle game: You set up pieces on a board (only one object moves at once! your active piece!) then press “play”.  Hopefully it’d be something like a ball goes through your contraption which remains mostly motionless and you beat the level if you get the ball in the hole or something.
  • Somehow work optics into gameplay… maybe a puzzle game based on lasers and lights or something
  • Pool and board games: as always, gotta have a chess game with insane, state of the art graphics hehe
  • mini golf: A fixed camera when you are taking your shot, with a minimum of moving objects (windmills, the player, etc).  When you hit the ball, it rolls, and when it stops, the camera teleports to the new location.
  • Security guard game:  Have several raytraced viewports which are played up to be security camera feeds.  Could have scenes unfold in one feed at a time to keep screen pixel redraw low.
  • Turn based overhead view game:  Ortho camera for panning, and since it’s turn based, you can probably keep object movement down to one at a time.

Lastly, here’s a video describing this stuff in action.  When you view the video, the orange squares are screen tiles that are completely clean (no rendering required, they are a no-op).  Purple squares are screen tiles that were able to use the geometry cache.   Where you don’t see any squares at all, it had to re-render the screen tile from scratch and wasn’t able to make use of either caching feature.

Feedback and comments welcomed!  I’d be really interested too in hearing if anyone actually uses this for anything or tries to implement it on their own.