Casual Shadertoy Path Tracing 1: Basic Camera, Diffuse, Emissive

Below is a screenshot of the shadertoy that goes with this post. Click to view full size. That shadertoy can be found at:

When you see a photorealistic image that someone says is “ray traced” what they likely mean is that it is “path traced”.

Path tracing can be pretty heavy on math and require a lot of attention to detail, but it can make for some really great images.

Funny thing though. You can throw out a lot of the formalism and still get superb results.

That is the premise of this series of blog posts. The goal is to get you path tracing and to let you play around, have fun, create art, without having to worry about integrals, PDFs and the existential question of graphics: “To pi or not to pi?”. Unfortunately that means that our images may take longer to render, and may not be exactly “correct”, but you can learn the more complicated techniques later when and if you are interested.

Rendering is an artistic and creative endeavor, so lets ignore some of the details for now. We are not making a ground truth renderer, we are just having fun making cool renderings.

Let’s get to it!

Shadertoy is a website that lets you write small programs in a c-like language called GLSL which are ran for every pixel in an image, on your GPU (not CPU). Shadertoy uses WebGL but manages everything except the pixel shader(s) for you.

These programs have very limited input including pixel coordinate, frame number, time, and mouse position, and give only a color as output.

Despite the simplicity, people have done amazing things, like making a playable level of doom 1.

[SH16C] Doom by Paul Malin

… Or make beautiful photo realistic images and videos which render in real time.

Snail by Inigo Quilez

Shadertoy was founded by demosceners which are people that have parties & competitions to make very small executables that generate impressive audio and visuals.

Besides demosceners, is frequented by professional and hobbyist game developers, graphics researchers, people who work in movies, and all sorts of other people as well.

It’s a great place to play and learn because it lets you dive in without a lot of fuss. I have learned so much there it boggles my mind sometimes… all sorts of math tricks, graphics techniques, dual numbers for automatic differentiation, ray marching solid geometry, and I even did real time ray tracing back in 2013. Real time ray tracing may SEEM new to the world, but demo sceners are doing real time ray tracing on things like the comodore 64 and have been doing so for years or decades. The rest of us are just catching up!

In these articles we are going to be doing path tracing in shadertoy because it’s very easy to do so. You should head over there, create an account and look around a bit at some of the shadertoys. Shadertoy works best in chrome or firefox.

We’ll get started when you come back ๐Ÿ™‚

New Shader

To get started, after you log in, click on the “new” button in the upper right to create a new shader. You should be greeted by something that looks like this:

Go ahead and give the shader a name – the names must be globally unique! – as well as at least one tag, and a description. All of those are required so that you can click “Submit” and do the initial save for your shader.

After you have submitted your shader, the submit button will turn into a save button, and you will be good to go onto the next step.

Generating Rays

Path tracing is a type of ray tracing, which means we are going to shoot a ray into the world out of every pixel. For each individual ray, we are going to see what it hits and use that information to give a color to our pixel. Doing that for every pixel independently gives us our final image.

So, the first step to doing path tracing is to calculate the rays for each pixel. A ray has a starting point and a direction, so we are going to need to calculate where the ray starts, and what direction it is going in.

The ray is going to start at the camera location (you could think of this as the eye location), and it’s going to shoot at pixels on an imaginary rectangle in front of the camera, go through those pixels, and go out into the scene to see what they hit.

We are going to make this rectangle of pixels be from -1 to +1 on the x and y axis, and for now we’ll make it be 1 unit away on the z axis. We’ll say the camera is at the origin (0,0,0). Our goal is to calculate the 3d point on this rectangle for each pixel, then calculate the direction of the ray from the origin to that 3d point.

We’ll start by calculating each pixel’s percentage on the x and y axis from 0 to 1. We’ll show that in the red and green channel to make sure we are doing it correctly.

Put this code below in the code box in the right and click the play button (circled red in the screenshot) to see the results.

void mainImage( out vec4 fragColor, in vec2 fragCoord )
    // calculate pixel coordinates as a percetange of the screen, from 0 to 1 on each axis
    vec2 uv = fragCoord/iResolution.xy;

    // show percentage as red and green
    fragColor = vec4(uv, 0.0f, 1.0f);

We can see from this picture that (0,0) is in the lower left, where the color is black. As the pixels go to the right, they get more red, which shows us that the x axis goes to the right. As the pixels go up, they get more green, which shows us that the y axis goes upward.

Visualizing values like this is a great way to debug shadertoy shaders. The usual graphics debuggers and debugging techniques are not available when doing webgl, or when using shadertoy, so this “printf” style debugging is the best way to see what’s going on if you are having problems.

Now that we have screen percentage from 0 to 1, we want to change it to being from -1 to 1 so that it’s actually just the x and y coordinate on the imaginary pixel rectangle.

void mainImage( out vec4 fragColor, in vec2 fragCoord )
    // calculate 2d coordinates of the ray target on the imaginary pixel plane.
    // -1 to +1 on each axis.
    vec2 pixelTarget2D = (fragCoord/iResolution.xy) * 2.0f - 1.0f;

    // show percentage as red and green
    fragColor = vec4(pixelTarget2D, 0.0f, 1.0f);

We can see now that it’s black in the middle of the screen, and going to the right makes the pixels more red while going up makes the pixels more green. However, going left and down doesn’t change the color at all and just leaves it black. This is because colors are clamped to be between 0 and 1 before being displayed, making all negative numbers just be zero.

if you wanted to make sure the negative values were down there, and it really wasn’t just zero, you could visualize the absolute value like this.

void mainImage( out vec4 fragColor, in vec2 fragCoord )
    // calculate 2d coordinates of the ray target on the imaginary pixel plane.
    // -1 to +1 on each axis.
    vec2 pixelTarget2D = (fragCoord/iResolution.xy) * 2.0f - 1.0f;
    // take absolute value of the pixel target so we can verify there are negative values in the lower left.
    pixelTarget2D = abs(pixelTarget2D);

    // show percentage as red and green
    fragColor = vec4(pixelTarget2D, 0.0f, 1.0f);

Ok so now we have the x,y location of the ray target on the imaginary pixel rectangle. We can make the z location be 1 since we want it to be 1 unit away, and that gives us a target for our ray. We now want to get a NORMALIZED vector from the camera position (the origin) to this ray target. That gives us our ray direction and we can visualize that ray to verify that it looks somewhat reasonable.

void mainImage( out vec4 fragColor, in vec2 fragCoord )
    // The ray starts at the camera position (the origin)
    vec3 rayPosition = vec3(0.0f, 0.0f, 0.0f);
    // calculate coordinates of the ray target on the imaginary pixel plane.
    // -1 to +1 on x,y axis. 1 unit away on the z axis
    vec3 rayTarget = vec3((fragCoord/iResolution.xy) * 2.0f - 1.0f, 1.0f);
    // calculate a normalized vector for the ray direction.
    // it's pointing from the ray position to the ray target.
    vec3 rayDir = normalize(rayTarget - rayPosition);

    // show the ray direction
    fragColor = vec4(rayDir, 1.0f);

it’s a little subtle, but as you go right from the center of the screen, the pixels get more red and become a purpleish color. As you go up from the center, the pixels get more green and become a little more teal. This shows us that our ray directions seem sensible. As we move right on the x axis, the normalized ray direction also starts deviating from straight ahead and points more to the right, more positively on the x axis. Similarly for the y axis and the green color.

Going left and down from center, the blue gets darker, but no red and green show up due to the x and y axis being negative values again.

If you play around with the z coordinate of the ray target, you can make the ray direction changes more or less obvious due to basically zooming in or out the camera by moving the imaginary pixel rectangle, but we’ll talk more about that in a little bit.

The nearly constant blue color on the screen is due to the positive z value of the ray direction. All of the rays are aiming forward, which is what we’d expect.

We have successfully calculated a ray starting position and direction, and are ready to start shooting rays into the world!

Rendering Geometry

Next we want to use these rays to actually render some geometry. To do this we need functions that will tell us whether a ray intersects a shape or not, and how far down the ray the intersection happens.

We are going to use the functions “TestSphereTrace” and “TestQuadTrace” for ray intersection vs sphere, and ray intersection vs rectangle (quad). Apologies for not having them in a copy/pastable format here in the post. WordPress malfunctions when I try to put them into a code block, but they are in the final shader toy.

There are other ray vs object intersection functions that you can find. I adapted these from Christer Ericson’s book “Real-Time Collision Detection” (, and there a bunch on the net if you google for them. Ray marching can also be used as a numerical method for ray vs object intersection, which lets you have much richer shapes, infinitely repeating objects, and more.

Anyhow, instead of using the ray direction for the color for each pixel, we are going to test the ray against a scene made up of multiple objects, and give a different color to each object. We do that in the GetColorForRay function.

// The minimunm distance a ray must travel before we consider an intersection.
// This is to prevent a ray from intersecting a surface it just bounced off of.
const float c_minimumRayHitTime = 0.1f;

// the farthest we look for ray hits
const float c_superFar = 10000.0f;

struct SRayHitInfo
    float dist;
    vec3 normal;

float ScalarTriple(vec3 u, vec3 v, vec3 w)
    return dot(cross(u, v), w);

bool TestQuadTrace(in vec3 rayPos, in vec3 rayDir, inout SRayHitInfo info, in vec3 a, in vec3 b, in vec3 c, in vec3 d)
// ...

bool TestSphereTrace(in vec3 rayPos, in vec3 rayDir, inout SRayHitInfo info, in vec4 sphere)
// ...

vec3 GetColorForRay(in vec3 rayPos, in vec3 rayDir)
    SRayHitInfo hitInfo;
    hitInfo.dist = c_superFar;
    vec3 ret = vec3(0.0f, 0.0f, 0.0f);
	if (TestSphereTrace(rayPos, rayDir, hitInfo, vec4(-10.0f, 0.0f, 20.0f, 1.0f)))
        ret = vec3(1.0f, 0.1f, 0.1f);
	if (TestSphereTrace(rayPos, rayDir, hitInfo, vec4(0.0f, 0.0f, 20.0f, 1.0f)))
        ret = vec3(0.1f, 1.0f, 0.1f);
        vec3 A = vec3(-15.0f, -15.0f, 22.0f);
        vec3 B = vec3( 15.0f, -15.0f, 22.0f);
        vec3 C = vec3( 15.0f,  15.0f, 22.0f);
        vec3 D = vec3(-15.0f,  15.0f, 22.0f);
        if (TestQuadTrace(rayPos, rayDir, hitInfo, A, B, C, D))
            ret = vec3(0.7f, 0.7f, 0.7f);
	if (TestSphereTrace(rayPos, rayDir, hitInfo, vec4(10.0f, 0.0f, 20.0f, 1.0f)))
        ret = vec3(0.1f, 0.1f, 1.0f);
    return ret;

void mainImage( out vec4 fragColor, in vec2 fragCoord )
    // The ray starts at the camera position (the origin)
    vec3 rayPosition = vec3(0.0f, 0.0f, 0.0f);
    // calculate coordinates of the ray target on the imaginary pixel plane.
    // -1 to +1 on x,y axis. 1 unit away on the z axis
    vec3 rayTarget = vec3((fragCoord/iResolution.xy) * 2.0f - 1.0f, 1.0f);
    // calculate a normalized vector for the ray direction.
    // it's pointing from the ray position to the ray target.
    vec3 rayDir = normalize(rayTarget - rayPosition);
    // raytrace for this pixel
    vec3 color = GetColorForRay(rayPosition, rayDir);

    // show the result
    fragColor = vec4(color, 1.0f);

We see some things on the screen which is nice, but we have a ways to go yet before we are path tracing.

If you look at the code in the final shadertoy, TestSphereTrace and TestQuadTrace only return true if the ray intersects with the shape AND the intersection distance is closer than hitInfo.dist. A common mistake to make when first doing any kind of raytracing (path tracing or other types) is to accept the first thing a ray hits, but you actually want to test all objects and keep whichever hit is closest.

If you only kept the first hit found, not the closest hit, the blue ball would disappear, because the ray test for the grey quad comes before the blue ball, even though it’s farther away.

Correcting Aspect Ratio

You probably noticed that the spheres were stretched in the last image. The reason for that is that our imaginary pixel rectangle is a square (it’s from -1 to +1 on the x and y axis), while the image we are rendering is not a square. That makes the image get stretched horizontally.

The way to fix this is to either grow the imaginary pixel rectangle on the x axis, or shrink it on the y axis, so that the ratio of the width to the height is the same on the imaginary pixel rectangle as it is in our rendered image. I’m going to shrink the y axis. Here is the new mainImage function to use. Lines 10 through 12 were added to fix the problem.

void mainImage( out vec4 fragColor, in vec2 fragCoord )
    // The ray starts at the camera position (the origin)
    vec3 rayPosition = vec3(0.0f, 0.0f, 0.0f);
    // calculate coordinates of the ray target on the imaginary pixel plane.
    // -1 to +1 on x,y axis. 1 unit away on the z axis
    vec3 rayTarget = vec3((fragCoord/iResolution.xy) * 2.0f - 1.0f, 1.0f);
    // correct for aspect ratio
	float aspectRatio = iResolution.x / iResolution.y;
    rayTarget.y /= aspectRatio;
    // calculate a normalized vector for the ray direction.
    // it's pointing from the ray position to the ray target.
    vec3 rayDir = normalize(rayTarget - rayPosition);
    // raytrace for this pixel
    vec3 color = GetColorForRay(rayPosition, rayDir);

    // show the result
    fragColor = vec4(color, 1.0f);

Camera Zoom (FOV)

I said before that the imaginary pixel rectangle would be 1 unit away, but if we make that distance smaller, it’s the same as zooming the camera out or increasing the field of view. If we make that distance larger, it’s the same as zooming the camera in or decreasing the field of view.

It turns out that having the pixel rectangle be 1 unit away gives you a field of view of 90 degrees.

The formula for calculating the distance for a specific field of view is on line 7 of the new mainImage function below and is used on line 11.

void mainImage( out vec4 fragColor, in vec2 fragCoord )
    // The ray starts at the camera position (the origin)
    vec3 rayPosition = vec3(0.0f, 0.0f, 0.0f);
    // calculate the camera distance
	float cameraDistance = 1.0f / tan(c_FOVDegrees * 0.5f * c_pi / 180.0f);        
    // calculate coordinates of the ray target on the imaginary pixel plane.
    // -1 to +1 on x,y axis. 1 unit away on the z axis
    vec3 rayTarget = vec3((fragCoord/iResolution.xy) * 2.0f - 1.0f, cameraDistance);
    // correct for aspect ratio
	float aspectRatio = iResolution.x / iResolution.y;
    rayTarget.y /= aspectRatio;
    // calculate a normalized vector for the ray direction.
    // it's pointing from the ray position to the ray target.
    vec3 rayDir = normalize(rayTarget - rayPosition);
    // raytrace for this pixel
    vec3 color = GetColorForRay(rayPosition, rayDir);

    // show the result
    fragColor = vec4(color, 1.0f);

We are going to leave c_FOVDegrees at 90 degrees as we move forward, but here it is at 120 degrees

and here it is at 45 degrees

If you make the FOV too wide, you’ll start getting distortion at the edges. That’s a deep topic that cameras experience too that we won’t go into but I wanted to make sure you knew about that. You can get some real interesting renders by setting the FOV to really large values like 360 or 720 degrees!

Let’s Path Trace!

Now that we can shoot rays into the world, we are finally ready to do some path tracing!

To do shading we’ll need light sources, and our light sources will just be objects that have emissive lighting, aka glowing objects.

For this post we are only going to do diffuse light shading, which means no shiny reflections. We’ll do shiny reflections probably in the very next post in this series, but for now we’ll just do diffuse.

So, objects will have two fields that define their material:

  • a vec3 for emissive, specifying how much they glow in RGB
  • a vec3 for albedo, specifying what color they are under white light

How path tracing works in this setup is pretty simple. We’ll start out a pixel’s color at black, a “throughput” color at white, and we’ll shoot a ray out into the world, obeying the following rules:

  • When a ray hits an object, emissive*throughput is added to the pixel’s color.
  • When a ray hits an object, the throughput is multiplied by the object’s albedo, which affects the color of future emissive lights.
  • When a ray hits an object, a ray will be reflected in a random direction and the ray will continue (more info in a sec)
  • We will terminate when a ray misses all objects, or when N ray bounces have been reached. (N=8 in this shadertoy, but you can tune that value for speed vs correctness)

That is all there is to it!

The concept of “throughput” might seem strange at first but think of these scenarios as they increase in complexity:

  1. A ray hits a white ball, bounces off and hits a white light. The pixel should be white, because the white ball is lit by white light.
  2. A ray hits a red ball, bounces off and hits the same white light. The pixel should be red because red ball is lit by white light.
  3. A ray hits a white ball, bounces off and hits a red ball, bounces off and hits a white light. The pixel should be red because the white ball is lit by the red ball being lit by the white light.

When a ray bounces off a surface, all future lighting for that ray is multiplied by the color of that surface.

These simple rules are all you need to get soft shadows, bounce lighting, ambient occlusion, and all the rest of the cool things you see show up “automagically” in path tracers.

I said that the ray bounces off randomly but there are two different ways to handle this:

  1. Bounce randomly in the positive hemisphere of the normal and then multiply throughput by dot(newRayDir, surfaceNormal). This is the cosine theta term for diffuse lighting.
  2. Or, bounce randomly in a cosine weighted hemisphere direction of the surface normal. This uses importance sampling for the cosine theta term and is the better way to go.

To do #2, all we need to do is get a “random point on sphere” (also known as a random unit vector), add it to the normal, and normalize the result. It’s nice how simple it is.

You are probably wondering how to generate random numbers in a shader. There are many ways to handle it, but here’s how we are going to do it.

First, we are going to initialize a random seed for the current pixel, based on pixel position and frame number, so that each pixel gets different random numbers, and also so that each pixel gets different random numbers each frame.

We can just put this at the top of our mainImage function:

    // initialize a random number state based on frag coord and frame
    uint rngState = uint(uint(fragCoord.x) * uint(1973) + uint(fragCoord.y) * uint(9277) + uint(iFrame) * uint(26699)) | uint(1);

Then, we can toss these functions above our mainImage function, which take the rngState in, modify it, and generate a random number.

uint wang_hash(inout uint seed)
    seed = uint(seed ^ uint(61)) ^ uint(seed >> uint(16));
    seed *= uint(9);
    seed = seed ^ (seed >> 4);
    seed *= uint(0x27d4eb2d);
    seed = seed ^ (seed >> 15);
    return seed;

float RandomFloat01(inout uint state)
    return float(wang_hash(state)) / 4294967296.0;

vec3 RandomUnitVector(inout uint state)
    float z = RandomFloat01(state) * 2.0f - 1.0f;
    float a = RandomFloat01(state) * c_twopi;
    float r = sqrt(1.0f - z * z);
    float x = r * cos(a);
    float y = r * sin(a);
    return vec3(x, y, z);

To implement the path tracing rules we previously described, we are going to change some things…

  1. Move the ray vs object tests from GetColorForRay into a new function called TestSceneTrace. This lets us separate the scene testing logic from the iterative bounce raytracing we are going to do.
  2. Add albedo and emissive vec3’s to hit info and set both fields whenever we intersect an object. This is the object material data.
  3. Implement iterative raytracing in GetColorForRay as we described it
  4. Add a light to the scene so it isn’t completely dark
void TestSceneTrace(in vec3 rayPos, in vec3 rayDir, inout SRayHitInfo hitInfo)
        vec3 A = vec3(-15.0f, -15.0f, 22.0f);
        vec3 B = vec3( 15.0f, -15.0f, 22.0f);
        vec3 C = vec3( 15.0f,  15.0f, 22.0f);
        vec3 D = vec3(-15.0f,  15.0f, 22.0f);
        if (TestQuadTrace(rayPos, rayDir, hitInfo, A, B, C, D))
            hitInfo.albedo = vec3(0.7f, 0.7f, 0.7f);
            hitInfo.emissive = vec3(0.0f, 0.0f, 0.0f);
	if (TestSphereTrace(rayPos, rayDir, hitInfo, vec4(-10.0f, 0.0f, 20.0f, 1.0f)))
        hitInfo.albedo = vec3(1.0f, 0.1f, 0.1f);
        hitInfo.emissive = vec3(0.0f, 0.0f, 0.0f);        
	if (TestSphereTrace(rayPos, rayDir, hitInfo, vec4(0.0f, 0.0f, 20.0f, 1.0f)))
        hitInfo.albedo = vec3(0.1f, 1.0f, 0.1f);
        hitInfo.emissive = vec3(0.0f, 0.0f, 0.0f);        
	if (TestSphereTrace(rayPos, rayDir, hitInfo, vec4(10.0f, 0.0f, 20.0f, 1.0f)))
        hitInfo.albedo = vec3(0.1f, 0.1f, 1.0f);
        hitInfo.emissive = vec3(0.0f, 0.0f, 0.0f);
	if (TestSphereTrace(rayPos, rayDir, hitInfo, vec4(10.0f, 10.0f, 20.0f, 5.0f)))
        hitInfo.albedo = vec3(0.0f, 0.0f, 0.0f);
        hitInfo.emissive = vec3(1.0f, 0.9f, 0.7f) * 100.0f;

vec3 GetColorForRay(in vec3 startRayPos, in vec3 startRayDir, inout uint rngState)
    // initialize
    vec3 ret = vec3(0.0f, 0.0f, 0.0f);
    vec3 throughput = vec3(1.0f, 1.0f, 1.0f);
    vec3 rayPos = startRayPos;
    vec3 rayDir = startRayDir;
    for (int bounceIndex = 0; bounceIndex <= c_numBounces; ++bounceIndex)
        // shoot a ray out into the world
        SRayHitInfo hitInfo;
        hitInfo.dist = c_superFar;
        TestSceneTrace(rayPos, rayDir, hitInfo);
        // if the ray missed, we are done
        if (hitInfo.dist == c_superFar)
		// update the ray position
        rayPos = (rayPos + rayDir * hitInfo.dist) + hitInfo.normal * c_rayPosNormalNudge;
        // calculate new ray direction, in a cosine weighted hemisphere oriented at normal
        rayDir = normalize(hitInfo.normal + RandomUnitVector(rngState));        
		// add in emissive lighting
        ret += hitInfo.emissive * throughput;
        // update the colorMultiplier
        throughput *= hitInfo.albedo;      
    // return pixel color
    return ret;

When we run the shadertoy we see something like the above, but the specific dots we see move around each frame due to being random over time.

This might not look great but we are really close now. We just need to make pixels show their average value instead of only a single value.

Averaging Pixel Values

To average the pixel values We are going to need to make our shader into a multipass shader.

The first step is to make another pass. Click the + sign next to the “image” tab and select “Buffer A”

Now you should have a Buf A tab next to the image tab.

Move everything from the image tab to the Buf A tab. In the image tab, select “Buffer A” for texture “iChannel0” and put this code in to read and show pixels from Buffer A.

void mainImage( out vec4 fragColor, in vec2 fragCoord )
    vec3 color = texture(iChannel0, fragCoord / iResolution.xy).rgb;
    fragColor = vec4(color, 1.0f);

The next step is to actually average the pixel values.

In the “Buf A” tab, also select “Buffer A” for texture “iChannel0” so that we can read Buffer A’s pixel from last frame, to average into the current frame’s output. Now we just add lines 27, 28, 29 below to average all the pixels together!

void mainImage( out vec4 fragColor, in vec2 fragCoord )
    // initialize a random number state based on frag coord and frame
    uint rngState = uint(uint(fragCoord.x) * uint(1973) + uint(fragCoord.y) * uint(9277) + uint(iFrame) * uint(26699)) | uint(1);
    // The ray starts at the camera position (the origin)
    vec3 rayPosition = vec3(0.0f, 0.0f, 0.0f);
    // calculate the camera distance
	float cameraDistance = 1.0f / tan(c_FOVDegrees * 0.5f * c_pi / 180.0f);        
    // calculate coordinates of the ray target on the imaginary pixel plane.
    // -1 to +1 on x,y axis. 1 unit away on the z axis
    vec3 rayTarget = vec3((fragCoord/iResolution.xy) * 2.0f - 1.0f, cameraDistance);
    // correct for aspect ratio
	float aspectRatio = iResolution.x / iResolution.y;
    rayTarget.y /= aspectRatio;
    // calculate a normalized vector for the ray direction.
    // it's pointing from the ray position to the ray target.
    vec3 rayDir = normalize(rayTarget - rayPosition);
    // raytrace for this pixel
    vec3 color = GetColorForRay(rayPosition, rayDir, rngState);
    // average the frames together
    vec3 lastFrameColor = texture(iChannel0, fragCoord / iResolution.xy).rgb;
    color = mix(lastFrameColor, color, 1.0f / float(iFrame+1));

    // show the result
    fragColor = vec4(color, 1.0f);

After 5 minutes of rendering, the result looks pretty noisy, pixels are too bright and clip, and it isn’t a great scene, but we are in fact path tracing now! You can even see some neat lighting effects, like how you see some red on the wall next to the red ball, due to light reflecting off of it. The spheres are casting some nice looking shadows too.

We can clean up the scene a little bit, make something like a cornell box and get this render after 5 minutes

When Rays Miss

One last feature before we close up this first post in the series.

Right now when a ray misses the scene and flies out into the void, we stop raytracing, which implicitly is saying that the void outside the scene is black. Instead of having our background be black, let’s have it be an environment.

In the “Buf A” tab choose the forest cube map for texture iChannel1 and then add line 19 to GetColorForRay() below.

vec3 GetColorForRay(in vec3 startRayPos, in vec3 startRayDir, inout uint rngState)
    // initialize
    vec3 ret = vec3(0.0f, 0.0f, 0.0f);
    vec3 throughput = vec3(1.0f, 1.0f, 1.0f);
    vec3 rayPos = startRayPos;
    vec3 rayDir = startRayDir;
    for (int bounceIndex = 0; bounceIndex <= c_numBounces; ++bounceIndex)
        // shoot a ray out into the world
        SRayHitInfo hitInfo;
        hitInfo.dist = c_superFar;
        TestSceneTrace(rayPos, rayDir, hitInfo);
        // if the ray missed, we are done
        if (hitInfo.dist == c_superFar)
            ret += texture(iChannel1, rayDir).rgb * throughput;
		// update the ray position
        rayPos = (rayPos + rayDir * hitInfo.dist) + hitInfo.normal * c_rayPosNormalNudge;
        // calculate new ray direction, in a cosine weighted hemisphere oriented at normal
        rayDir = normalize(hitInfo.normal + RandomUnitVector(rngState));        
		// add in emissive lighting
        ret += hitInfo.emissive * throughput;
        // update the colorMultiplier
        throughput *= hitInfo.albedo;      
    // return pixel color
    return ret;

With those changes, here is a 30 second render.


In 30 seconds it looks about as noisy as the previous 5 minute render did. The 5 minute render looks so much worse because the scene has both very dark places and very bright places. The more different the lighting conditions are in different places in a scene, the more noise you will have. Also, smaller brighter lights will cause more noise than larger dimmer lights.

Learning more sophisticated path tracing techniques (like direct light sampling, also known as next event estimation) makes this not be true, but we are aiming for understandability in this path tracer, not convergence speed.

But, you might notice that as your scene is rendering, shadertoy is reporting 60fps, meaning that the scene rendering is being limited by vsync. The rendering would go faster than 60fps if it could, and so would converge faster, but it can’t.

There’s a shadertoy plugin for chrome that you can install to help this, and you can tell it to do up to 64 draws per frame, which really helps it converge more quickly. I’m on a ~5 year old gaming laptop with an nvidia 980m gpu in it, and I got this full screen render in a minute, which has converged pretty nicely! Click on it to view it full screen.

You can get the shadertoy plugin here:

An alternative to getting the shadertoy plugin for chrome is to just shoot N rays out for the pixel each frame, and average them, before putting them into the averaged buffer. That is more efficient than doing the multiple shader passes N times a frame since things are in the memory cache etc when iterating inside the shader. That means your image will converge more in the same amount of time.


The shadertoy that made this image, that contains everything talked about in this post, is at:

Try playing around with the path tracer, making a different scene, and modifying the code for other visual effects.

If you make something, I’d love to see it! You can find me on twitter at @Atrix256

There are going to be 3 or 4 more posts in this series that will be coming pretty quickly, on a variety of topics including depth of field and bokeh, shiny reflections, anti aliasing and fog/smoke. We also are not even sRGB correct yet! These things will improve the image significantly so stay tuned.

Some other awesome resources regarding path tracing are:

  1. Peter Shirley’s “Ray Tracing in One Weekend” (now a free PDF)
  2. Morgan McGuire’s “Graphics Codex”
  3. My more formal blog post on diffuse & emissive path tracing

Path tracing doesn’t only have to be used for realistic rendering though. Below is a stylized path tracing shadertoy I made, in vaporwave style.

“Neon Desert” stylize path tracer shadertoy:

Thanks for reading, part 2 coming soon!

Using Blue Noise For Raytraced Soft Shadows

Make sure and click the images in this post and view them at full size. Noise (especially blue noise) tends to disappear in smaller images.

There are 3 shadertoys that go with this post where you can see the techniques in action and see the source code of the implementations.

To go along with the blue noise ray marched fog and light shafts of the last post (Ray Marching Fog With Blue Noise), another fun usage case of blue noise is in raytracing soft shadows.

By soft shadows i mean the kinds of shadows you get from area lights, not the hard edged shadows you get from point lights and directional lights.

Soft shadows have a penumbra – the soft edges where it transitions from being fully in shadow to fully lit, like in the image below.

If you want to know more about why shadows work that way, give this post of mine a read: Why Are Some Shadows Soft And Other Shadows Hard?

So how do we use blue noise for this? We’re going to start with spherical directional lights first, then show how to extend it to spherical positional lights, and then to spherical spot lights.

Spherical Directional Light : White Noise


A directional light is a light that is shining from a specific direction, regardless of where you are at in the world.

These lights simulate light sources that are so far away, that any movement (translation) you do doesn’t measurably change your relative position to the light source because the sizes and distances involved are gigantic.

We are basically talking about the sun here.

Ok, so raytracing a shadow for a directional light like this just involves shooting a ray along the negative light direction (towards the sun) and seeing if it hits any geometry (the world, blocking the sun light). If it hits anything, the origin of the ray is in shadow. If it doesn’t hit anything, the origin of the ray is lit. The answer is binary and so the shadow edge is hard. There is no penumbra.

The problem with this model of lighting is that it pretends that the sun is an infinitely tiny point in the sky, when the sun is actually a circle, at least from our perspective. It is actually a sphere (or close enough to a sphere), but the “solid angle” 2d projection of it is a circle.

So, to have more realistic lighting and shadows, instead of shooting a ray at a single tiny point in space, we need to see if the ray origin can see the circle that is the spherical lighting source, and even more so, we need to know what percentage of the circle it can see.

The mathy explanation is we need to integrate visibility over the circle, but since our scene isn’t likely to be a closed form equation, we are going to need to resort to numerical methods and sample over the circle domain to get the shadow term.

The less mathy explanation is that we want to shoot rays at a couple places on that circle at random, and see what percentage of those rays were able to see the circle. We multiply the lighting by that percentage, and we automatically get soft shadows.

Ok so now that we know what we need to do, how do we do it?

So first up, we need to generate a “uniform point on a circle”, by which i mean generate a point where every point in the circle is equally likely to be chosen. If we have 2 uniform random numbers from 0 to 1 we can do that with the GLSL code below, assuming rng is a vec2 with two uniform random numbers in it. Any decent shader hash function or shader rng function should work for making those numbers, but i like using the “Hash without sine” functions from this shadertoy

float pointRadius = c_lightRadius * sqrt(rng.x);
float pointAngle = rng.y * 2.0f * c_pi;
diskPoint = vec2(pointRadius*cos(pointAngle), pointRadius*sin(pointAngle));

Note c_lightRadius in the above. That value is the perceptual size of the circle in the sky. Like if you saw a photograph of the sky, this would be the radius of the circle of the sun if that photograph was 1 world unit away (the radius is in world space units under those strange viewing conditions). In the shadertoy demo I use a value of 0.1.

So now we have a uniform random point on a circle but what do we do with it? Well, to make it an actual point on the sun, we need a tangent basis so we have an X, Y and Z coordinate system to put this circle into.

Here is some GLSL to do that.

vec3 lightTangent = normalize(cross(c_lightDir, vec3(0.0f, 1.0f, 0.0f)));
vec3 lightBitangent = normalize(cross(lightTangent, c_lightDir));

This code assumes that the light direction is not ever going to point straight up. You could put in code to use a different vector if the light direction was ever too close to straight up if you want to fix that assumption. Also, c_lightDir is assumed to be normalized.

So now that we have a uniform random point on a circle, and we have a tangent basis aka coordinate system for that circle, it’s time to make a world space target for the ray to shoot at, so we can get a ray direction.

This part is pretty simple. We just pretend that the circle is 1 unit away and use the coordinate axes to convert the point on the circle into a world space point. From there, we subtract the ray’s position from the ray target position and normalize that vector, to get the ray direction.

vec3 rayTarget = rayPos + c_lightDir + diskPoint.x * lightTangent + diskPoint.y * lightBitangent;
vec3 shadowRayDir = normalize(rayTarget - rayPos);

Looking at that code, you might notice we are adding in the ray position, just to subtract it out again. We can skip a step and do this instead of the code above:

vec3 shadowRayDir = normalize(c_lightDir + diskPoint.x * lightTangent + diskPoint.y * lightBitangent);

If you do this, you do in fact get a penumbra. This is what made the image labeled “white noise raw” in the image at the top of this section, using 16 shadow rays per pixel.

Quick note about perf: 16 shadow rays per pixel is kind of a lot. One way to help perf would be to say “if the first 4 rays all agree as to whether we are in shadow or not, just return that answer without doing the 12 other rays”. That would make this code cheaper everywhere except when in the penumbra. Another way to help perf would be to do shadows at a lower resolution. Another way would be to use fewer rays, but filter spatially and/or temporally (over time) to hide that fact. I’ve also heard of people making low resolution shadow maps and skipping the raytracing for pixels very clearly no where near a penumbra.

Spherical Directional Light : Blue Noise Over Space, Low Discrepancy Over Time

Being the blue noise zealot I am, white noise has a strong stink to me, and i know that if white noise is being used, almost certainly better results can be had by either using blue noise, or low discrepancy sequences instead.

We are actually going to use blue noise in 2 different ways to improve the shadows. The first way is that instead of using white noise (uncorrelated) uniform random numbers on the circle, we are going to use blue noise (negatively correlated) uniform random numbers on the circle.

White noise clumps together making semi redundant samples, and also leaves big holes between samples, making for larger unknown areas. Blue noise, on the other hand, is randomized, but roughly evenly spaced samples. (More info: What the heck is blue noise?)

I have some blue noise in a circle points i generated by using mitchell’s best candidate algorithm, but when generating candidates, i made sure they were all in a circle before doing the rest of the logic (aka i used rejection sampling inside of MBC). I also made it not calculate distances toroidally, since wrap around doesn’t make sense in this context. The points are also generated in [-1,1] instead of the usual [0,1].

For details of Mitchell’s best candidate algorithm check out my blog post: Generating Blue Noise Sample Points With Mitchellโ€™s Best Candidate Algorithm

Below is that list of points I generated. These blue noise points are progressive, meaning that if you use the first N, regardless of what N is, you will have blue noise. So, this supports up to 64 samples, or any lower number of samples.

const vec2 BlueNoiseInDisk[64] = vec2[64](

Just doing that change, we get this strange result:

The problem is that every pixel is using the same blue noise sample sequence, and there are only 16 samples. That means there are 16 overlapping shadows for each sphere basically. Blue noise has done a good job in that those shadows are pretty different from each other, but the shadows aren’t good enough yet.

Now we need to bring in the second blue noise. We are going to tile a blue noise texture in screen space (you can get one here:, so that we can get a “blue noise random value” per pixel that is between 0 and 1. We are going to multiply that value by 2 * pi and use that as an angle for how to rotate our 2d blue noise samples.

Each pixel will have a different rotation amount, and each pixel will use a rotation that is very different from the rotation of neighboring pixels. This is due to blue noise textures having that property: pixel values are very different from neighbor pixel values.

A nice optimization here is that since all samples are being rotated by the same amount for a pixel, you can calculate the cosine and sine of the angle outside of the loop for shooting shadow rays, and just re-use them inside that loop for rotation.

If we do that, we end up with the image on the right labeled “blue noise raw”. Compare it vs “white noise raw” to see how the noise in the penumbra is way less noticeable. Same number of samples, same amount of computational complexity.

We aren’t done yet though… we have to now consider the axis of time, before we can declare victory.

Cutting right to the case, we are going to add frameNumber * goldenRatio to the blue noise value and fract it to bring it back to the 0 to 1 range. We want to do that before we multiply by 2 * pi to make it an angle.

If we do that, the blue noise value for each pixel becomes a low discrepancy sequence over time. It damages our blue noise over space property a little but it is a net win.

For a deeper discussion about this topic, check out these posts:
Animating Noise For Integration Over Time
Animating Noise For Integration Over Time 2: Uniform Over Time

The short answer to why this is better is that the distribution of a pixel’s value over time for shorter time durations is going to be closer to the actual distribution of the pixel. This is in contrast to white noise which does have the right average over larger sample counts, but for shorter sample counts may clump values together, and leave voids of unseen values.

In short, this makes the pixel more temporally stable, which is great for TAA and other temporal filtering/accumulation methods, but also is nicer even when viewing it without temporal filtering. You can turn on and off the ANIMATE_NOISE define in the shader and look at the “blue noise raw” panel to see how it looks when animated vs not animated without filtering.

Here are the final results, and the link to the shadertoy again.


By the way, noise is way more noticeable when it’s high contrast with the things around it. For instance, here’s the same image as the above again, but the ambient lighting is 10 times brighter (0.5, instead of 0.05). It’s way harder to see the noise, even the white noise!

Spherical Positional Light

With spherical directional lights explained, the hard part is done!

You only need to make 2 modifications to your code to get from spherical directional lights to spherical positional lights.

The first change is that instead of c_lightDir being a constant, wherever you use that vector, you should instead use the normalized vector pointing from the pixel being shaded to the position of the light (the center of the light).

The second change is that c_lightRadius isn’t constant either anymore. Unlike the sun, which doesn’t noticeably change size as you move around the world, spherical positional lights DO noticeably change size. As you get farther away from the light, the 2d projected circle for that light gets smaller. As you get closer, the circle gets larger.

The formula is actually real simple. If c_lightRadius is the radius of the sphere, and dist is the distance from the shaded pixel to the center of the light, the actual radius you should use for the circle when doing the rest of the “spherical directional light” logic is just: c_lightRadius / dist.

Making those changes, you can get this:


Spherical Spot Light

Now that you have a spherical positional light, it’s pretty easy to modify it into a spherical spotlight.

The first thing to do is to re-introduce a light direction. This light direction is the direction that the spot light is shining in. We are going to dot product the vector from the light to the pixel by this light direction to know what angle the pixel is from the light, to know if it’s in the light cone of the spot light or not.

The next thing to do is to define a “cosine theta inner” and a “cosine theta outer”. When the dot product in the last paragraph is less than “cosine theta outer”, then there is no light. If it’s greater than “cosine theta inner” then it’s full light. Between those values we want it to fade from lit to unlit, and for that i like to use smoothstep to do a non linear fade. Here’s some glsl code that does this:

vec3 lightDir = normalize(c_lightPos - hitPos);
float angleAtten = dot(lightDir, -c_lightDir);
angleAtten = smoothstep(c_cosThetaOuter, c_cosThetaInner, angleAtten);
lightColor *= angleAtten;

The rest of the code runs the same as a Spherical Positional Light, and you get a result like this:



This post showed how to use blue noise in two different ways for raytracing soft shadows.

  1. Blue noise sample points in a circle were used to sample a circle representing the light. Using blue noise here results in less error than white noise.
  2. A blue noise texture was used to rotate those 2d sample points per each pixel to make the resulting error be screen space blue noise, which looks better, is harder to detect, and is filtered more easily, compared to white noise, even for the same amount of error.

An important note here is that while blue noise converges at the same rate as white noise (but starts with lower error), low discrepancy sequences can converge quite a bit faster than white noise or blue noise.

If we could use a low discrepancy sequence for sampling the circle in step 1 above and converge so that there was no error, we could be done and wouldn’t need to do step 2.

Unfortunately, low discrepancy sequences have aliasing and so have worse looking error than blue noise before they converge. For the ray sample counts we have in budget right now for real time rendering, we definitely can’t afford to converge, even with an LDS, so that is why we reached for blue noise.

This is important to know, because for non real time rendering usage cases, you are likely better off with an LDS, than blue noise. Blue noise is just a neat trick you can do when you are really strapped for samples (aka can’t afford many rays).

One last thing – it turns out that sampling the projected 2d circle for 3d spherical lights is not quite the correct thing to do. It’s a decent approximation but for more information, check out this link (and note, you can apply blue noise the same way with better spherical light sampling):

Happy raytracing and feel free to hit me up on shadertoy or on twitter if you have any questions @Atrix256

Ray Marching Fog With Blue Noise

The shadertoy that goes with this post is at:

I talk about blue noise a lot more often than I show usage cases of blue noise. Ray marching fog is a great usage case of blue noise that I haven’t shared yet.

Make sure and click the images in this post and view them at their full size to best see the noise. Blue noise tends to melt in thumbnail images, due to the low pass filtering involved in making an image smaller.

Real quick before going into it, here are some other usage cases of blue noise with corresponding shadertoys.

So here’s an algorithm for ray marching fog, which can give some simple scattering effects, like crepuscular lighting aka “god rays”.

  1. Render normally.
  2. Take N steps down the ray for the pixel starting from the camera, going to the depth value for the pixel.
  3. At each step, look in a shadow map to see if that world position is in shadow or not. Calculate the percentage of steps that were in shadow.
  4. Use this percentage to lerp between an “unlit fog color” and a “lit fog color”, use that as the fog color for that pixel.
  5. Use the usual distance fog calculations for that pixel.

Here is an image of doing the algorithm above, doing 256 steps.

If you decrease the step count to 16, the algorithm gets a lot faster, but you get much worse results that look like this:

That looks real bad, but you can break up banding by using noise. When you do the ray marching, you can roll a random number (white noise) for each pixel, and use that 0 to 1 value as a percentage of “one step distance” to push the ray start down the ray. Doing that, you get this result.

That result is better than the banding, but is pretty ugly. Using a tiled screen space blue noise texture as a source of the random numbers instead gives you this result, which is lots better.

As a bonus, here’s a result using Jorge Jimenez’s “Interleaved Gradient Noise” which is designed to work best when used with TAA that uses the “3×3 neighborhood history rejection” method.

Other Notes

There is a shadertoy that shows all this at

The step count used in the ray march defines how many “shades” there are between the lit and unlit fog. For instance, 2 steps will only let the lit percentage be: 0%, 50%, 100%. That only gives 3 shades of fog. In the images above where 16 steps were used, it gives 17 different shades (it’s always numsteps+1). If you drop the count lower, the noise will have fewer shades and be more noticeable in all cases.

If you use PCF sampling of the shadow map (if doing this in a raster setup), the fog light shafts get softer edges, which can look real nice, if you want that look.

If this technique looks interesting to you, you should also give this a read, for making better fake fog:

These still images tell only part of the story. If you animate the noise well over both space AND time, where a single frame has good noise patterns, but each pixel is also blue or low discrepancy over time, you’ll have even better results. The shadertoy has a define for animating the noise ANIMATE_NOISE that you can turn on and off to see the difference. The result is better RAW like the shadertoy shows it, but it’s also better under temporal filtering, because it makes the value temporally converge to something closer to the correct value with less flickering.

The blue noise is animated by adding the golden ratio * the frame number to the blue noise texture value and using fract to bring it back between 0 and 1. Repeatedly adding the golden ratio to any starting value, and fract’ing it, will make a progressive low discrepancy sequence, which is exactly what we want. Doing this process makes each pixel be low discrepancy over time while the blue noise texture makes the pixels be blue noise (randomized low discrepancy) over space. Unfortunately, this damages the blue noise over space a bit, but it is a net win.

Interleaved gradient noise in this shadertoy is animated by scrolling the texture on each axis by 5.588238 pixels each frame. This is a value Jorge Jimenez (the maker of IGN) found through much manual effort, to try and find a scroll amount that made pixels be low discrepancy sequences. He hasn’t published this information but said I was free to share it so long as I gave him credit. Thanks Jorge!

For more info on animating noise, check out these two posts of mine:

The sampling in this setup has four dimensions, but isn’t vanilla four dimensional. It uses blue noise for the 2 dimensions of screen space, it uses low discrepancy sampling for the dimension of time, but it uses regular spaced samples on the dimension of distance down the ray.

I believe there are at least a couple different better ways to sample this setup but I’m not 100% sure what they are, or what the ideal one would be.

The “regular spaced samples over distance” seem like something that would be nice to look at fixing, but it’s also possible that a triangular distributed blue noise could be better for the blue noise over space.

For another rabbit hole to go down, check out this very related presentation slide deck.
“Low Complexity, High Fidelity – INSIDE Rendering”

Guess a Number Between 1 and 10

When you were a kid and had a gumball to share but 2 of your friends wanted the gumball, what would you do?

When I was a kid, and now that I have two kids of my own, a common way to resolve disputes like this is to say “Pick a number between 1 and 10 and whoever is closer gets the gumball”.

This works with any number of people and with any number of rewards because you can say the N closest people out of M people total get a reward. Ties can be broken by repeating the process.

It SEEMS like a purely random choice too, doesn’t it?

Well, if you want to win, the right move is to always guess 5 or 6, actually. It makes sense if you think about how 5.5 is right in the middle, so 5 and 6 are the numbers least far from all the other numbers.

If you are only playing against one other person, and both the real number and the guess are really random (uniform independent random numbers), guessing 5 will let you win 55% of the time, and tie 14% of the time. That means you will lose only 31% of the time, or less than one out of three times!

There is a way to make this game fair again though, and that is by letting the numbers wrap around between 1 and 10 as if they were on a circle instead of a number line.

So, if the right answer is 1, and you guess 10, without wrap around, the distance would be 9. With wrap around, the distance would be 1.

You calculate wrap around distance by saying “if the difference is greater than 5, then the difference is actually 10 minus that number”.

Playing that way, it puts all guesses on equal footing. Every number acts like a 5, because every number is 5 or less away from every other number.

In that scenario, guessing a 5 will cause you to win 41% of the time, tie 18% of the time, and lose 41% of the time.

The game is fair again, as is shown experimentally below.

The code that generated this output is at:

Another way kids have to solve these sorts of situations is through “inka binka” and similar, where there is a song and a deterministic process to pick a winner.

My 5 year old already realized the same position wins every time, which i was pretty proud of hehe.

This sort of stuff is pretty neat though and makes me think people out there must be studying childhood computational sociology ๐Ÿ˜›

Basic Methods For Finding Zeroes and Mins / Maxes of Functions

There is ~500 lines of simple standalone C++ code that goes with this post, that generated all data used and implements the things talked about. You can find it at:

Simple equations can be solved using algebra without too much fuss:

3x-6=0 \\ 3x=6 \\ x=2

You can even solve more complicated equations by cleverly using identities, the quadratic equation, and other mathematical tools you might have in your tool belt.

Some equations are beyond our ability to solve them using simple algebra though, like high degree polynomials:


We still can find where functions like that equal zero, but we can’t do it analytically, we have to do it numerically.

That is, we do things like roll a ball down hill and hope we find a zero.

(Apparently there are some analytical solutions to be had from that function. I don’t know how to get them though, so would have to resort to numerical methods!)

This post talks about a few methods for finding zeroes of equations we can’t solve analytically, and talks about how to use those methods to find minimum and maximum values of functions as well.

The Basic Idea

Let’s say I tell you that the value of a function at a specific point is “3” and that for every step to the right you take, it subtracts 1 from that value.

Then I ask: “Where does this function equal zero?”

If you answered “three steps to the right” you are correct, and guess what – you’ve just used Newton’s method for root finding.

There is an asterisk here though. The slope (how the function value changes as you take a step to the right. Also called the derivative.) was constant in the above example, but is not going to be constant in the functions we want to solve. That slope is going to change depending where you are on the graph.

To deal with this, Newton’s method takes the step it thinks it should take, and then asks again for the value and slope (derivative) and takes another step.

The hope is that with a good initial guess, Newton’s method will find a zero without too much effort.

We’re going to get a little more formal, while also showing some results, but all the rest of what we are going to talk about is based on Newton’s method (or is even simpler).

Newton’s Method

We’ve seen this informally, but formally, a single step of Newton’s method looks like this:

x = x - \frac{f(x)}{f'(x)}

That is, we divide the y value at our current point by the derivative at our current point, and subtract that from our current x to get our new x.

Bullet Points:

  • Requires f(x) and f'(x)
  • Requires a “good guess” for an initial x value
  • Converges Quadratically

More info:’s_method

Secant Method

What if you don’t have the analytical first derivative to your function and still want to use Newton’s Method?

Well, you can use finite differences to get the first derivative: move the x over a little bit, see how the y value changes over distance, and use that as your numerical derivative, instead of the analytical one. (A blog post of mine on finite differences:

If you do that, you are now using the secant Method.

m = \frac{f(x+e)-f(x-e)}{2e}

x = x - \frac{f(x)}{m}

The above uses “central differences” to calculate the first derivative numerically. e is a tuneable parameter, but should be small (without triggering numerical issues) – like perhaps maybe 0.01.

Bullet Points:

  • Requires f(x)
  • Requires a “good guess” for an initial x value
  • Converges at a rate of 1.618… (the golden ratio! ??!) so is slower than Newton’s method, but can be handy if you don’t have the first derivative.

More info:

Halley’s Method (& Householder’s method)

If the first derivative helped us find a zero, surely the second derivative can help us find it even faster, right?

Yep. If you generalize Newton’s method to using the second derivative, you get Halley’s method.

x = x - \frac{2f(x)f'(x)}{2f'(x)^2-f(x)f''(x)}

Bullet Points:

  • Requires f(x), f'(x) and f”(x). You can get the derivatives numerically but it will slow down convergence.
  • Requires a “good guess” for an initial x value
  • Converges cubically (fast!!)

Both Newton and Halley are part of a larger family of methods called “Householder’s Method” which generalizes Newton’s method to any order derivative.

More info:


Bisection is simpler than the other methods. You start with a left x and a right x, with the requirement that the signs of the y values for these x’s have to be different (one positive, one negative) which means that a zero must be between the two x’s.

The algorithm itself is just a binary search, where you look at the value in the middle of the left and right x, and update the left or right x to be the middle, based on the sign of the y value at the middle.

Bullet Points:

  • Requires f(x)
  • Requires a min x and a max x that contains a zero between them, and the y values at those x’s have to have opposite signs
  • Converges Linearly (slow!)

More info:

Experimental Results: y = x^2 – 1

Here’s the first equation: y=x^2-1

You can see visually that there’s a zero at x=-1 and at x=1.

Here’s how the various methods did at finding the roots. the x axis is how many steps were taken and the y axis is the absolute value of y at that step, which also happens to be the error too, since we are finding zeros.

Halley is the winner which is unsurprising with it’s cubic convergence rate, then newton with it’s quadratic convergence rate, then secant with it’s golden ratio convergence rate, and bisect comes in last after a strong start, with it’s linear convergence rate. Our experimental results match the theory, neat!

The values in the legend next to the result name specify what parameters were used. For newton and Halley, the value shown is the initial guess. For secant, x is the initial guess and px is the previous initial guess. For bisect it shows the range given to the bisection algorithm.

Choosing different parameters for those can drastically affect performance of the algorithms. Here are some less well chosen values shown along with the previous results. The previous results are the blue shades.

Bisection is now so bad that it dwarfs everything else. Let’s remove that.

Removing bisection, the first worst data point dropped to 100 from 2500 for all the “more poorly chosen parameters” which are orange, yellow and green. The previous values are the blues and the grey line, which basically look flat.

In these larger values with worse parameters, we are still seeing Halley winning, Newton coming in second, secant coming next, and then bisection, but we are seeing much worse performance. These parameters are important to getting good performance!

Experimental Results: y = sin(x)

Here’s the next equation: y=sin(x)

Here is the convergence for each technique:

You can’t see it, but the orange, grey and blue are all basically overlapping. Looking at the actual data below though, you can see that the winners are in the same order again.

Experimental Results: y = x^2-x-1

This is a fun equation, because a zero is at the golden ratio (and the other is at -1/goldenRatio): y=x^2-x-1

Here is the convergence graph:

An interesting thing happens here if we choose to start at the value “0.5” for Newton and Halley. The dark blue line for Newton is under the grey line for Halley which is flat and doesn’t go up and down. The reason this happens is that the first derivative is 0 at 0.5 so the algorithm doesn’t know which direction to go and gets stuck.

Using other initial guess values causes it not to be stuck, but you can see that newton and secant starting at 0.75 has a pretty big jump in error in the beginning.

Error Case: x^2+1

For the next equation we have: y=x^2+1

This graph has no zeroes. Our bisect bounds also do not have their needs met since one side is supposed to be negative and the other positive, but this graph is positive everywhere. That means we don’t have any results for bisect on this one.

Here’s the convergence graph:

Newton goes absolutely off the rails for a few samples. Lets remove it to see what the others are doing:

Secant goes real bad at the end, so let’s remove it to see that Halley is pretty well behaved despite there not actually being a root to find:

Ray vs Sphere

Let’s do something more interesting… let’s use these methods to find where a ray intersects a sphere. We can do this analytically (there is a formula to calculate this!) but we can use this as a simpler example of a more complex usage case.

We need a formula that is zero where a ray hits a sphere, and we need to be able to calculate it’s first and second derivative analytically ideally.

I fought with this for a bit but came up with a solution.

Here are the variables involved:

  • rayPos and rayDir, both are vec3’s
  • spherePos which is a vec3 and a sphereRadius which is a scalar
  • t – how far down the ray we are, and is a scalar

We are going to make the sphere center be the origin by subtracting it out of the ray position (then we don’t need spherePos anymore), so then all we have to find is where the magnitude of the vector from the origin to the ray position at time t is the sphere radius. To simplify calculating the derivatives, we are going to square both sides of the equation.

f(t) = MagnitudeSquared(rayPos + rayDir * t) - sphereRadius^2

If we can find where that equation is zero, that will tell us the time t that the ray hits the sphere, and we can get the actual intersected position (then normal, etc) from that.

For a 3d vector, Magnitude squared is just the below which is the distance equation without the square root.

MagnitudeSquared(P) = P.x^2+P.y^2+P.z^2

If you expand f(t) out for just the x axis to get the pattern per axis, you get:

rayPos.x^2+2*rayPos.x*rayDir.x*t+rayDir.x^2*t^2 - sphereRadius^2

You would add a similar thing for y and z – but not do the sphere radius part because it’s already handled above.

We can take the derivative of the above with respect to t, to get:


So the first derivative is that equation, but also adding in the y and z axes:

f'(t) = 2*rayPos.x*rayDir.x+2*rayDir.x^2*t + 2*rayPos.y*rayDir.y+2*rayDir.y^2*t + 2*rayPos.z*rayDir.z+2*rayDir.z^2*t

That looks a bit like spaghetti but is way cleaner in code with a loop iteration for each axis ๐Ÿ˜›

We can then take the derivative of that first derivative to get the second derivative (still, with respect to t). That gives us this:

f''(t) = 2*rayDir.x^2 + 2*rayDir.y^2 + 2*rayDir.z^2

Ok cool, now that we have our equations, I used (0,0,0) for the ray position, (0,0,1) for the ray direction, (0.2, 0.3, 5.0) for the sphere position, and a sphere radius of 2.0.

Let’s see how our methods did at finding the intersection time of the ray vs the sphere.

Once again, Halley wins, newton comes in second, then secant, then bisect. Neat!

Finding Function Mins & Maxes

For being one of the headlines of the blog post title, this is actually a really simple thing to explain now that you know the rest.

The minimum and maximum of a function – by definition! – has a derivative (slope) of zero, because it’s right where the function goes flat. The function has either climbed to the top of a mountain, gone flat, and is heading back down, or it has gone to the bottom of a valley, gone flat, and is heading back up. There is actually also a third option called a saddlepoint where it is pausing it’s ascent or descent momentarily and then continues.

In any case, to find the minimum or maximum of a function, we need to find where it’s derivative is zero. We do this by doing root finding on it’s first derivative.

From there, we can plug that x value found into the original f(x) function to get our extrema value.

The code that goes with this blog post uses this technique to find the maximum value for the function y=-x^2+x+1.

It finds that the derivative (y=-2x+1) is zero at x=0.5. Plugging 0.5 into the original function for x gives us a value of 1.25. That does look like the correct x and y value for the maximum of the function, doesn’t it?


The code that goes with this blog post can be found at:

A good 10 minute video on Newton’s Method

Here’s a way to get an analytic derivative of a function without having to do it symbolically. Dual numbers are a form of automatic differentiation, just like backpropagation is.

You can extend this stuff to higher dimensions using the gradient instead of the derivative, and maybe a Jacobian matrix or Hesian matrix for higher derivatives. That might make a good future blog post.

We actually do something similar to Newton’s method when doing sphere tracing (ray marching with a distance estimate). We divide the distance value at a point in space (the y value, or f(x)) by the length of the gradient (which is basically the slope aka generalized derivative) to know that a surface can’t be any closer than that value, so we can step that length in whatever direction our ray wants to move. This also has applications to drawing 2d graphics. You can read about that here:

Using White Noise to Choose Between Red Noise and Blue Noise

There is source code that goes with this post, which generated all the images and did all the tests. You can find it at

I recently saw a really cool video from @Numberphile which mixed some of my very favorite things: Fibonacci numbers (aka the golden ratio), red noise and blue noise.

The link to the video is below and is very much worth watching.
“Random Fibonacci Numbers”

It had me thinking though… they are using a coin flip (white noise) to determine if they should add the last two numbers (red noise / low pass filter) or subtract them (blue noise / high pass filter).

I was curious what the DFT of that would look like, which would show what sort of frequency content that number sequence had.

BTW I have a post talking about dice, probability distributions and noise color here that is a bit related:

White Noise

Just to prime things, here is 100 uniform white noise values on the number line and the DFT of those points. The points clump together and leave holes, and the frequency spectrum shows all frequencies.

Here is the average DFT of 100,000 such point sets. The average flattens out and the grey lines showing 1 standard deviation are flat as well.

Regular Fibonacci Numbers

The first 90 Fibonacci numbers look like the below on the number line:

And here’s their DFT, where 0hz (DC) is in the middle:

Nothing super interesting really.

Randomized Fibonacci Numbers

Here is 90 randomized Fibonacci numbers on the numberline, the dft, and the average DFT of 100,000 such number sets.

It’s interesting to see that the individual randomized Fibonacci has strong low frequency content, but other frequencies too, while the average DFT of the number sets shows only low frequency content.

I think what’s going on here is that since the numbers start out small and grow larger over time, that they will always start out clumped together (red noise), but then depending on the coin flip (white noise) which have different frequency content in each set of numbers, as the numbers grow larger. This means that the only thing common among them is low frequency content, but the rest is just white noise and averages out to be flat.

Maybe not that interesting of a result, but it’s an answer at least ๐Ÿ˜›

Prime Numbers

I got a tweet from Tariq wondering what DFTing the prime numbers would look like.

Here are the first 25 on the numberline, and the DFT:

Here are the first 100:

Here are the first 200:

Here are the first 1000:

I don’t really see much of a pattern, but I guess if someone did, they would have gotten some prize by now? ๐Ÿ™‚

“The next coin flip has to be tails!”

If you saw a fair coin flipped and 10 heads came up in a row, you’d probably either think that the coin was a 2 headed coin, or that the next flip HAD to be tails.

In the code that goes with this post ( check out DoCoinTossTest()), I have a random number generator generate bits until there are 10 ones in a row, and then count how many times the next random bit is a one again.

I do that test 10000 times, and ran that overall test 5 times. Here are the results!

At an infinite number of coin flips, you can expect an even numbers of heads and tails, but until you reach infinity, all bets are off. When a coin has come up heads 10 times in a row, it still has a 50/50 chance of heads in the next coin flip.

That’s the definition of white noise – independent, random events.

Red noise would tend to have similar values – so, heads would clump together, and tails would clump together. This makes more sense with dice, where similar values would be rolled.

Blue noise would tend to have dissimilar values – so heads and tails would specifically NOT clump. And with dice, you’d rarely get the same, or similar values, in 2 dice rolls.

White noise doesn’t care about what came before, and just gives you a new number based on the probability distribution.

Keep this in mind if you ever play roulette!

How Do I Calculate Variance in 1 Pass?

If you google “how do i calculate variance?” you’ll get some nice explanations that say:

  1. Calculate the mean (average) of your numbers
  2. Calculate the average of: each number minus the mean, squared

That’s fine for people trying to just understand how the math works, but if you are calculating variance in a computer program, you might not realize there is a way to do it in a single pass over the data.

That can be significant to the performance and even feasibility of your algorithm!

Here is how you calculate variance in one pass:

  1. Calculate the mean (average) of your numbers
  2. In the same loop, calculate the mean (average) of your numbers squared
  3. After the loop, variance is the absolute value of #2, minus #1 squared

That might look like word salad, so here’s a code snippet.

float Lerp(float a, float b, float t)
    return a * (1.0f - t) + b * t;

float Variance_1Pass(const std::vector & data)
    // get the average (value) and average (value*value)
    float average_value = {};
    float average_valueSquared = {};
    for (size_t index = 0; index < data.size(); ++index)
        float value = data[index];
        average_value = Lerp(average_value, value, 1.0f / float(index + 1));
        average_valueSquared = Lerp(average_valueSquared, value * value, 1.0f / float(index + 1));

    // variance is absolute value of average(value*value) - (average_value*average_value)
    return abs(average_valueSquared - (average_value * average_value));

There is code that goes with this post, that implements it both ways and shows you that they are equivalent. You can find it at:

If you are wondering why I'm using "lerp" to average numbers, check out this post:

It turns out this one pass method can have numerical problems though, so no free lunch. Here is a more numerically robust way to do it, which also allows you to incrementally calculate variance, as numbers come in (Thanks Bart!):

Why might you want to calculate variance?

One reason is if you are analyzing or reporting data, the average value is important to know, but it's also important to know if the numbers were usually pretty close to the average, or if they had lots of spikes above and below the average. You can square root variance to get the standard deviation, which is in the same units as the data you are reporting.

Assuming your data is a Gaussian distribution (due to something called the central limit theorem, a surprising number of things are actually gaussian distributed – like even rolling a bunch of dice and adding them up), 68% of the data points are + or – 1 standard deviation from the mean.

As an example, if the average temperature at your house over a month was 75 degrees Farenheit with a standard deviation of 5 degrees, that means that 68% of the days had a temperature between 70 and 80 degrees.

If the average temperature was still 75 but had a variance of 25 degrees, that means that 68% of the days had a temperature between 50 and 100 degrees. That is quite a difference! Just reporting the average temperature doesn't convey this information the same way as reporting average and standard deviation (or variance) does.

For more info on that, check this out:

Lastly, I mentioned that doing 2 passes to calculate variance could be a deal breaker for an algorithm.

An example of this is a graphics technique called "Variance Shadow Maps" (paper: which ultimately calculates a mean and a variance for how far a group if pixels is away from a light source. When rendering the variance shadow map from the point of the view of the light, each pixel stores the depth, as well as the depth squared. A fun property is that you can blur these values with neighboring pixels without harming the mathematical integrity of the values. The result is soft shadows. (more info on soft shadows:

When lighting a pixel later on and using the shadow map, it knows the pixel's distance from the light source, and can read the two values from the filtered (blurred) shadow map, which allow it to get the mean and variance of the objects in the shadow map (the things that are casting shadows). It then uses something called Chebyshev's inequality to get an estimate for how much in shadow the pixel is.

That is a lot of setup explanation, but if having to do 2 pass variance calculations, instead of the 1 pass that it does do, you'd have to run over the full screen of pixels and do logic for each one (subtract the average pixel value), to calculate the variance. In real time graphics, having to do an extra "full screen pass" can be pretty costly, and can easily put a game over budget, meaning the technique would have to be cut so the rest of the game could render fast enough.

This blog post is here in the hopes that the next time someone googles "how do i calculate variance" for use in a computer program, that they see this post, and implement it as a single pass. Fingers crossed! ๐Ÿ˜›

Using Low Discrepancy Sequences & Blue Noise in Loot Drop Tables for Games

I never thought I’d be much of a stats person but here we are. This post is low on formalism though, so may the gods of formalism have mercy on my soul!

This post includes the result of experiments showing the value of what is talked about, and includes some simple C++ that you can find at:

If you’ve ever played a game that involved grinding for loot, you might have looked online and found the drop rate for a specific item, only to find that if it says it drops one out of 100 times, that it takes you 200-300 runs to get it, while your friends get the drop in the first 10 runs.

What the heck is that about?!

That, my friends, is the nature of regular old random numbers – aka white noise – the kind of random numbers you get from rolling fair dice, flipping fair coins, hashing values using good hash algorithms, or using well made random number generators.

The core issue is that white noise random numbers can take on any valid value with equal probability at any time, regardless of whatever has happened before.

If you were betting on whether a fair coin would come up heads or tails after seeing 10 heads, if you say the next will be tails (because of course it will!) you will still only be right 50% of the time. If you flip the coin an infinite number of times, you will get an even number of heads or tails, but before reaching infinity, all bets are off.

This can be a problem for game designers too. They can collect statistics about how their randomized systems are behaving, analyze the results and come to the conclusion that everything is balanced and even. While that may be true when looking at global averages, the individual player experience may vary wildly and not be balanced at all.

Tangent: This is called variance and is the source of noise in raytraced rendering.

Tangent: There’s a fun related story here about the U.S. air force realizing there is no such thing as an average pilot:

In any case, is this “globally balanced but individually unbalanced” something we have to live with, or is there something we can do about it?

Luckily there is something we can do about it, and we can ensure that individual players have a more balanced, more pleasant, and more controlled experience, without sacrificing randomization.

Enter Low Discrepancy Sequences

A low discrepancy sequence is a sequence of numbers which are neither too close together nor too far apart.

If we put marks evenly spaced on a number line, the sequence of numbers at those marks would have zero discrepancy because they were evenly spaced. Low discrepancy numbers have a low discrepancy value that is greater than zero.

Examples of low discrepancy sequences that you may have heard of are: Sobol, Halton, Van Der Corput.

Some nice links of mine for learning more about low discrepancy sequences are:

Tangent: Going back to the raytraced noise example, regular sampling makes aliasing, and white noise sampling makes noise. Low discrepancy sequences sort of lay somewhere in the middle, gaining the benefits of both worlds, and actually having mystical powers of making numerical integration converge very quickly.

So what do low discrepancy sequences have to do with our problem?

If you use a low discrepancy sequence to generate 5 “random numbers” between 0 and 1, those 5 numbers will be roughly evenly spaced, which means that if you use those numbers on a loot table, the player is going to get a wider spread on the full possibilities of what the loot table has to offer.

If something has a very small percentage to drop, the player still has a low probability to get that drop, but if it’s a 1 in 100 drop, it’s more likely to happen at the 100th drop mark.

This is in constrast to white noise where the values may be clumped together and leave big holes in the 0 to 1 range, like: 0.114, 0.081, 0.093, 0.2, 0.95. There is a huge gap empty between 0.2 and 0.95, which is 75% of the possibilities!

There’s a problem with low discrepancy sequences though: They are deterministic – that is, they are predictable and not random. You get the same values from low discrepancy sequences every time.

Before admitting defeat though, there is another way to get randomization from this even though the sequences themselves are not random: You can shuffle the loot table!

Now, if you have thousands of players on an MMO server rolling numbers against a loot table, you probably just barfed in your mouth a little at my suggestion. There is a way to make a “shuffle iterator” though, so that you can get the benefits of shuffling a loot table, without actually having to keep copies of shuffled loot tables around. You’d use some unique player id (and more) as a random seed for the shuffle iterator, then could use a low discrepancy sequence to select loot. This way, each player would see different (randomized) loot drop sequences, but the loot rolls would still be low discrepancy.

Tangent: you can read more about a way to make shuffle iterators using low quality encryption in “Format Preserving Encryption” here:

But we aren’t done yet…

Enter Randomized Low Discrepancy Sequences

The low discrepancy sequences talked about in the last section were deterministic, but what if we were able to make number sequences that had low discrepancy but were randomized?

That exists, and it’s known as “blue noise” because blue noise is random numbers which have only high frequency components (like how blue light is high frequency).

The property of both being low discrepancy, but also randomized, is great for many reasons way outside the scope of this article. For our loot drop purposes, it means that the loot will be both unpredictable, but also a player will get a balanced personalized experience, instead of only the global averages being balanced.

Tangent: Here’s a link about how to generate a blue noise sequence:

The other shoe dropping is that blue noise can take a long time to generate, so is computationally expensive. In graphics, it’s common to generate blue noise in advance and just use the pre-made sequence. In loot drops, that is a less viable option because it makes your sequence deterministic and then you are back to shuffling the loot table.

Not real sure the answer here, but it may involve just keeping track of the last N loot drops, and using Mitchell’s best candidate algorithm to generate the N+1th value, adding that to the loot drop RNG list and removing the oldest one. If you get creative you might find a solution that fits your needs.

Prove it

Before we show experimental results, I wanted to defined a couple terms in regards to low discrepancy sequences.

  1. Progressive Sequence – A progressive sequence is a sequence of N values, where if you use less than N of the values, they still have the desired properties. For instance, if you make 100 blue noise distributed numbers, but only use the first 10, if it’s a progressive sequence, those first 10 will also be blue. If it isn’t a progressive sequence, you have to use all 100 before they are blue. This is also a property of deterministic low discrepancy sequences. For our loot drop purposes we NEED to use progressive sequences because other wise, the loot drops won’t be balanced until the very end, which kind of defeats the point.
  2. Open Sequence – An open sequence is one that you can always add more items to. If you regularly space 4 samples from 0 to 1 you are going to get 0, 0.25, 0.5, 0.75. If you want to add a 5th number you can’t! That means that this sequence is not open. Many low discrepancy sequences are open, and using Mitchell’s best candidate to generate blue noise does make an open sequence. For loot drops, we generally do want open sequences, because we usually don’t know how many times the player is going to loot something in advance.

The numbers below are from experiments done using the code that goes with this blog post. It’s ~380 lines of C++ in a single file using only standard includes. You can find it at:

I used the following sequences:

  • White Noise – Straight up vanilla RNG.
  • Blue Noise – Using Mitchell’s Best Candidate to generate a progressive, open, uniform blue noise distributed number sequence.
  • Golden Ratio – Starting at 0, i add the golden ratio to the previous loot drop value to get the next loot drop value. I use modulus to keep the value between 0 and 1. The golden ratio has some interesting & good properties as a low discrepancy sequence.
  • Sobol – This is the low discrepancy sequence that leads in numerical integration convergence speeds.

For each sequence type, I generated 10 random loot tables which had 2 to 6 items, each item having a random roll weighting between 1 and 20. I then rolled as many loot drops as i needed to make it so the actual loot drop percentages were within 1% of what the loot drop table said they should be.

Higher numbers mean it took more loot drops for the actual loot dropped to reach the probabilities the loot table said they should be. Lower numbers mean it took fewer loot drops to reach the desired probabilities. I did 10 runs so that you can see if things are consistently good or consistently bad. Just doing a single test isn’t enough to show if something just got lucky one time, or if it’s a good choice.

Here are the results….

  • White Noise: 50513, 7834, 1859, 516, 8824, 3650, 1380, 24461, 35, 12455
  • Blue Noise: 72, 77, 143, 9, 129, 308, 353, 236, 176, 205
  • Golden Ratio: 47, 34, 50, 76, 55, 51, 114, 77, 21, 105
  • Sobol: 216, 155, 161, 77, 13, 71, 56, 75, 127, 51

It’s important to note that the loot tables themselves are generated with white noise, which is a source of variance in the results above. 10 samples isn’t a whole lot, so in a real analysis you might want to do more (100,000 or more runs?) but hopefully you should see that white noise really is not great. I was also surprised to see that Sobol didn’t do that well compared to blue noise and golden ratio. It must just do better at higher dimensions.

One last thing I wanted to mention is that this isn’t limited to loot drop tables. You can use these concepts for randomized events, procedural content generation, and basically anything else you use random numbers for in games.

The important takeaway here is that even if things look right when looking at global averages, using white noise makes the individual experience very different from that global average. Having better control over crafting a player’s individual experience is possible though, and has the possibility of giving a game a more hand crafted feel, even though you are still using RNG.


A Fun 2d Rotation Matrix Derivation

A few weeks ago I joined NVIDIA as part of the graphics dev tech team.

The dev tech team’s scope is pretty wide, but it encompasses nearly all real time graphics programming needs that NVIDIA may have: making code samples for new tech, helping developers optimize their games and make them look prettier, working with graphics researchers, and contributing to graphics research as well.

Sometimes it’s like being a meta engine programmer (making stuff for engine programmers who make stuff for people making games), and other times it’s like working on an early prototype dev kit, where the dev kit is the PC itself.

I’m enjoying it quite a bit so far. It’s a bit surreal to be working along side really respectable folks I’ve seen present at siggraph, or who have written papers I’ve read, but at the same time it feels appropriate because I’m interested in the same areas and am looking forward to mixing in my own contributions and getting to collaborate on cool stuff.

Speaking of which, yes, my first contribution involved blue noise, and I’ll point it out when the thing it’s part of is public!

My new boss is a guy named Rahul, and I could tell things were going to be great, because when we were talking about math, he became visibly dettached from the real world while explaining something. Alternately, he was interested in learning about some obscure topics I had, like dual numbers, or abusing texture sampling to calculate polynomials.

He is the one that showed this logic chain to me.

Act 1 – The 2D Cross Product

So there is this thing that people call the “2d cross product” which is not really a cross product but if you have a 2d vector, it will give you a vector perpendicular to that vector. This is a really valuable tool in the toolbox for game developers, as you are often working in 2d coordinates, even though the world may be 3d, or you can sometimes simplify problems to 2d and bring them back into 3d after solving them.

You calculate the 2d cross product by flipping the x and y components of a vector and flipping the sign of the new x component:

newVec.x = -oldVec.y;
newVec.y = oldVec.x;

You could negate the new y instead of the new x to get the other perpendicular vector (there are 2!) but for this derivation, the way I have it above is important to the result. (I’ll revisit this at the end of the post)

As a simple example, if we take the vector (1,0), flip x and y, and negate the new x, we get (0,1), which is indeed perpendicular.

This works with arbitrary 2d vectors though. Doing it to (1,2) gives you (-2,1) which you can see in the image below is definitely perpendicular.

You can actually express this operation as a matrix:

\begin{bmatrix}0 & 1\\-1 & 0\end{bmatrix}

which you can see results in the same:

\begin{bmatrix}x & y\end{bmatrix} \cdot \begin{bmatrix}0 & 1\\-1 & 0\end{bmatrix} = \begin{bmatrix}-y & x\end{bmatrix}

Act 2 – Complex Numbers

Way back in 2014 I wrote a blog post on how to use imaginary (complex) numbers to do vector rotation:

The key take away relevant here is that if you multiply a vector by the imaginary number i, it’s a 90 degree rotation counter clockwise.

Strangely, that’s exactly what our matrix does too… could this matrix be i? Well, i*i is -1, so let’s see if that happens when we multiply this matrix by itself:

\begin{bmatrix}0 & 1\\-1 & 0\end{bmatrix} \cdot \begin{bmatrix}0 & 1\\-1 & 0\end{bmatrix} = \begin{bmatrix}-1 & 0\\0 & -1\end{bmatrix} = -1 * I

So yeah, that checks out… this matrix, which represents the “2d cross product” is also the imaginary number i, the square root of -1. That’s fun. (the I above is just the identity matrix)

That means that multiplying by this matrix is the same as if we were multiplying a vector by the complex number (0+1i).

We can see this is true by re-doing the example (1,2), and multiplying the values together using FOIL (first, outer, inner, last).

(1 + 2i)*(0+i) = \\ (1*0)+(1*i)+(2i*0)+(2i*i) = \\ 0 + i + 0 -2 = \\ -2 + i = \\ (-2, 1)

Act 3 – Complex Exponentials

Euler’s formula is a way of calculating points on a circle on the complex plane and is given as:

e^{i\theta} = \cos{\theta} + i\sin{\theta}

The value (0+1i) is the just the above formula when theta is 90 degrees, which is the amount of rotation we got when multiplying. We can easily verify that this is 90 degrees by remembering that cosine of 90 is 0, and sine of 90 is 1.

So, let’s replace our multiplication of (0+1i) with the right side of Euler’s formula. This way we can rotate by arbitrary angles, not just 90 degrees.

(x+iy) * (\cos{\theta}+i\sin{\theta}) = \\ (x \cos{\theta}) + (x i\sin{\theta}) + (iy \cos{\theta}) + (iy i\sin{\theta}) = \\ (x \cos{\theta} - y \sin{\theta}) + i(x\sin{\theta}+y\cos{\theta})

From here you can extract it back into matrix form to get:

(x \cos{\theta} - y \sin{\theta}) + i(x\sin{\theta}+y\cos{\theta}) = \\ \begin{bmatrix}x & y\end{bmatrix} \cdot \begin{bmatrix}\cos{\theta} & sin{\theta}\\-\sin{\theta} & \cos{\theta}\end{bmatrix}

Bam, there’s the 2d rotation matrix.

Other Fun Matrices

This post showed the matrix form of the imaginary number i, where i*i=-1.

\begin{bmatrix}0 & 1\\-1 & 0\end{bmatrix}

Dual numbers are another fun type of number where there is an \epsilon that is not zero, but \epsilon \cdot \epsilon is zero.

Dual numbers let you do forward mode automatic differentiation, and you can read some more about them at

A matrix form for them is:

\begin{bmatrix}0 & 1\\0 & 0\end{bmatrix}

Lastly are hyperbolic numbers (or split complex numbers), where you have an i that is not 1, but i*i is 1. I’m not sure what they are useful for but you can read more here:

A matrix form for them is:

\begin{bmatrix}-1 & 0\\0 & 1\end{bmatrix}

By the way, when I introduced the 2d cross product i said flip x and y and negate the new x, but said the other way is possible as well. If you do it the other way, that operation is multiplying by -i, instead of multiplying by i. It isn’t a special case that breaks any of the things in this post, it’s just a different value. It’s a rotation by -90 degrees.

IIR Audio & Data Filters – Featuring Biquads

The last post talked about finite impulse response filters (FIRs) where the output of the filter was based on the current and previous input samples. (

In this post we are going to talk about infinite impulse response filters (IIRs) where the output of the filter is based not only on current and previous input samples, but also previous output samples.

This seemingly minor change makes for much more powerful and interesting filtering possibilities, but as it isn’t stateless, that means it must be evaluated serially (no SIMD!), and so is more computationally expensive.

The simple standalone C++ code that goes with this post is at:

The interactive web demo that goes with this post is at:

IIR Difference Equation

So let’s start with an order 2 FIR difference equation:

y_n = a_0x_n + a_1x_{n-1} + a_2x_{n-2}

Now let’s say we want the difference equation to include the previous two output samples too. We can just complete the pattern, by including some previous y terms with coefficient multipliers on the left side of the equation.

y_n + b_1y_{n-1} + b_2y_{n-2} = a_0x_n + a_1x_{n-1} + a_2x_{n-2}

We can then move everything but y_n to the right side of the equation to get a function that gives us our current filtered output:

y_n = a_0x_n + a_1x_{n-1} + a_2x_{n-2} - b_1y_{n-1} - b_2y_{n-2}

You might be wondering why there is no b_0 term and the reason for that is because it would be a multiplier for y_n which is weird. Do we really need to scale the output like that? Sometimes people will include the b_0 term, and will divide both sides of the equation by b_0 to get an equation like this:

y_n = \frac{1}{b_0}(a_0x_n + a_1x_{n-1} + a_2x_{n-2} - b_1y_{n-1} - b_2y_{n-2})

Let’s just pretend that if b_0 exists, it’s value is always 1, and then we can move on without it actually being there, complicating our equations needlessly.

So, to repeat it, here is a difference equation for an order 2 IIR filter, which is built on an order 2 FIR filter.

y_n = a_0x_n + a_1x_{n-1} + a_2x_{n-2} - b_1y_{n-1} - b_2y_{n-2}

You can pull out the a_0 parameter as a gain parameter again if you want to, but the b parameters don’t get the same sort of benefit, so you can leave them in their raw form.

y_n = a_0(x_n + \alpha_1x_{n-1} + \alpha_2x_{n-2}) - b_1y_{n-1} - b_2y_{n-2}

IIR Transfer Function

To calculate the transfer function, lets start back from where we added the previous outputs on the left side of the equation:

y_n + b_1y_{n-1} + b_2y_{n-2} = a_0x_n + a_1x_{n-1} + a_2x_{n-2}

Next, let’s take the z transform:

y(z) + b_1y(z)z^{-1} + b_2y(z)z^{-2} = a_0x(z) + a_1x(z)z^{-1} + a_2x(z)z^{-2}

We then factor out y(z) and x(z) to get:

y(z)(1 + b_1z^{-1} + b_2z^{-2}) = x(z)(a_0 + a_1z^{-1} + a_2z^{-2})

Since the transfer function is just y(z) divided by x(z) we can do simple algebra to do that now!

\frac{y(z)}{x(z)}(1 + b_1z^{-1} + b_2z^{-2}) = a_0 + a_1z^{-1} + a_2z^{-2} \\ H(z) = \frac{y(z)}{x(z)} = \frac{a_0 + a_1z^{-1} + a_2z^{-2}}{1 + b_1z^{-1} + b_2z^{-2}}

You can factor out the a0 term to be gain if you want, to get a more familiar looking top of the equation:

H(z) = \frac{a_0(1 + \alpha_1z^{-1} + \alpha_2z^{-2})}{1 + b_1z^{-1} + b_2z^{-2}}

From there you can plug in frequencies and see what sort of frequency and phase response you get, just like in the last blog post for FIRs.

You might notice that the transfer function is quadratic in the numerator, and the denominator. This is in fact called a “biquadratic filter” or a “biquad” for short.

Often times higher order filters (like EQs) are made by chaining multiple biquads together. Biquads are a pretty important staple of DSP.

Pole Zero Plot

You might wonder why in the last post we called it a “Pole Zero Plot” when there were zeros but no poles.

IIRs add poles to the pole zero plot and they are where the function shoots to infinity. This happens in a fraction when you divide by zero, so a pole is a place where the denominator of the transfer function is zero.

To make it explicit:

  1. You solve the quadratic equation in the numerator to get the zeros of that quadratic equation, which are also the zeros of the filter.
  2. You solve the quadratic equation in the denominator to get the zeros of that quadratic equation, which are the also the POLES of the filter.

That makes things pretty easy for making a pole zero plot. Calculating the zeros of the filter is the same, and you use the same technique to find the poles.

In the last post, we saw that the zeros of an order 2 FIR filter were at:

z = \frac{-\alpha_1}{2} \pm \frac{\sqrt{\alpha_1^2-4\alpha_2}}{2}

That still works for IIRS too. For poles, all you need to do is replace the alphas with bs:

z = \frac{-b_1}{2} \pm \frac{\sqrt{b_1^2-4b_2}}{2}

Example Filters

Unlike FIRs which are relatively tame and can only make certain frequencies lower, IIRs can cause frequency amplitudes to get much higher and even shoot off to infinity. That makes for some cool sounds by adding distortion to certain frequency notes of an instrument but not others.

Check the links section for the “RBJ cookbook” if you want to get deeper into the filters, but here are a couple interesting filters I found.

This one boosts low frequencies and attenuates high frequencies.

This does the reverse and boosts high frequencies while attenuating low frequencies.

This makes use of both poles and zeros to make a really interesting sounding filter.

It’s also still fun to slowly move the controls back and forth while some simple instrumental loop plays. Filtering white noise is still really interesting too because white noise has all frequency content, which means filtering frequencies out or boosting them up will always affect white noise. That isn’t true of things with simpler frequency components. The extreme of this is the sine wave which only has a single frequency so is unaffected by other frequencies being boosted up or attenuated.

Crafting a Filter By Placing Zeros

Creating a biquad filter from zero and pole locations is pretty simple if you already can make an FIR filter by placing zeros. In fact, that is exactly how you place the zeros.

To place the poles, you do the exact same steps as placing the zeros, but the coefficients you get out are for b0, b1 and b2 instead of a0, a1 and a2.

Estimating Frequency Response From a Pole Zero Plot

Doing this from an FIR involved getting the distance from a point on the unit circle to every zero, multiplying all those distances together as a product and that is the amount the amplitude would be multiplied by.

Adding poles into the mix extends this.

As a second step, you get the distance from the point on the unit circle to every pole. You multiply all those pole distances together as a product and divide the previous product by this amount.

That’s all there is to it, and you should hopefully be able to see that this is why a frequency at the pole would shoot to infinity. The distance is zero so the frequency response shoots to infinity.


If you want to generate sinusoid waves of specific frequencies, you can use IIRs to do that by putting the poles on the unit circle, and leaving the zeros at the origin.

For a filter to be stable (not shoot off to infinity), the poles need to be inside of the unit circle, so putting them on the edge of the unit circle is playing with fire a bit, but the instability is also what makes the oscillator function.

We could talk about the algebra for calculating polynomials in the numerator and denominator of the transfer function to do this, but lets jump to the end and look at the simplified result of what to set the parameters to in the difference equation below:

y_n = a_0x_n + a_1x_{n-1} + a_2x_{n-2} - b_1y_{n-1} - b_2y_{n-2}

The parameters should be:

  • a_0 = 1
  • a_1 = 0
  • a_2 = 0
  • b_1 = 2 cos(\omega)
  • b_2 = 1

Where omega (\omega) is the amount of radians to advance the wave form per sample.

If you plug these into the difference equation, you can simplify it to this:

y_n = x_n - 2 cos(\omega)y_{n-1} - y_{n-2}

These oscillators don’t need input and just want a steady stream of zeros. Because of this, we can remove the input from the equation.

y_n = -2 cos(\omega)y_{n-1} - y_{n-2}

The last thing you need to do however is to give it some initial starting state. If you make it so y[-1] and y[-2] are zero, to calculate y[0] and y[1], the oscillator won’t work correctly.

This is because we need to initialize the state (prior output) to be as if it’s been running all along.

So, you can set y[-1] to be cos(-\omega*1) and y[-2] to be cos(-\omega*2). That will make it so the next sample will be cos(0) which means the wave will start at zero degrees and continue.

You could initialize the state to whatever phase you wanted to, by initializing y[-1] and y[-2] to the prior two values to the desired phase.

As a quick and dumb example, let’s look at a sine wave that advances 180 degrees (pi radians) per sample. That means b1 will be 2, which makes our difference equation:

y_n = -2y_{n-1} - y_{n-2}

We’ll initialize y[-2] to be cos(-2pi) or 1, and we’ll initialize y[-1] to be cos(-pi) or 0.

Following the difference equation starting at y[0] we get…

\begin{array}{|c|c|} \hline \text{y index} & \text{value} \\ \hline -2 & 1 \\ -1 & -1 \\ 0 & 1 \\ 1 & -1 \\ 2 & 1 \\ 3 & -1 \\ 4 & 1 \\ 5 & -1 \\ \hline \end{array}

180 degrees is nyquist, and we can see that it’s doing the right thing of flipping between 1 and -1. It works with less clean numbers too, and the simple c++ code that goes with this post shows that working (

Unfortunately, with less clean numbers, this method will start to drift from reality over time due to floating point imprecision. One way to deal with this is to reset the oscillator after every 360 degrees of advancement.

Nick Appleton (@nickappleton) has an alternate method if you are looking for a cheap oscillator.

First you make two complex numbers:

  • y = 1 + 0i
  • a = e^(i*omega)

Where omega is still the number of degrees the wave advances per sample. Another way to calculate a is: std::polar(1.0f, radiansPerSample)

Then, for each sample you do this:

y = y * a

the resulting complex number will have the cosine value in the real portion, and the sine value in the imaginary portion.

This has better numerical stability, which you can see in the c++ code output too.


Here are some good links where i got info about oscillators.

Click to access ece5655_chap8.pdf

There is a famous document called the “RBJ cookbook” which gives recipes for biquads that act in specific ways. RBJ stands for the author’s name Robert Bristow-Johnson. You can find it attached to the message at the link below!

Marc B Reynolds (@marc_b_reynolds) recently put out an interesting blog post talking about how it’s better to use integers to repeatedly add things (irrational numbers in his case) instead of using floats. There are some good lessons there that apply here as well I bet, probably especially for oscillators.