Casual Shadertoy Path Tracing 1: Basic Camera, Diffuse, Emissive

Below is a screenshot of the shadertoy that goes with this post. Click to view full size. That shadertoy can be found at: https://www.shadertoy.com/view/tsBBWW

When you see a photorealistic image that someone says is “ray traced” what they likely mean is that it is “path traced”.

Path tracing can be pretty heavy on math and require a lot of attention to detail, but it can make for some really great images.

Funny thing though. You can throw out a lot of the formalism and still get superb results.

That is the premise of this series of blog posts. The goal is to get you path tracing and to let you play around, have fun, create art, without having to worry about integrals, PDFs and the existential question of graphics: “To pi or not to pi?”. Unfortunately that means that our images may take longer to render, and may not be exactly “correct”, but you can learn the more complicated techniques later when and if you are interested.

Rendering is an artistic and creative endeavor, so lets ignore some of the details for now. We are not making a ground truth renderer, we are just having fun making cool renderings.

Let’s get to it!

Shadertoy.com

Shadertoy is a website that lets you write small programs in a c-like language called GLSL which are ran for every pixel in an image, on your GPU (not CPU). Shadertoy uses WebGL but manages everything except the pixel shader(s) for you.

These programs have very limited input including pixel coordinate, frame number, time, and mouse position, and give only a color as output.

Despite the simplicity, people have done amazing things, like making a playable level of doom 1.

[SH16C] Doom by Paul Malin https://www.shadertoy.com/view/lldGDr

… Or make beautiful photo realistic images and videos which render in real time.

Snail by Inigo Quilez https://www.shadertoy.com/view/ld3Gz2

Shadertoy was founded by demosceners which are people that have parties & competitions to make very small executables that generate impressive audio and visuals.

Besides demosceners, shadertoy.com is frequented by professional and hobbyist game developers, graphics researchers, people who work in movies, and all sorts of other people as well.

It’s a great place to play and learn because it lets you dive in without a lot of fuss. I have learned so much there it boggles my mind sometimes… all sorts of math tricks, graphics techniques, dual numbers for automatic differentiation, ray marching solid geometry, and I even did real time ray tracing back in 2013. Real time ray tracing may SEEM new to the world, but demo sceners are doing real time ray tracing on things like the comodore 64 and have been doing so for years or decades. The rest of us are just catching up!

In these articles we are going to be doing path tracing in shadertoy because it’s very easy to do so. You should head over there, create an account and look around a bit at some of the shadertoys. Shadertoy works best in chrome or firefox.

We’ll get started when you come back ๐Ÿ™‚

http://shadertoy.com

New Shader

To get started, after you log in, click on the “new” button in the upper right to create a new shader. You should be greeted by something that looks like this:

Go ahead and give the shader a name – the names must be globally unique! – as well as at least one tag, and a description. All of those are required so that you can click “Submit” and do the initial save for your shader.

After you have submitted your shader, the submit button will turn into a save button, and you will be good to go onto the next step.

Generating Rays

Path tracing is a type of ray tracing, which means we are going to shoot a ray into the world out of every pixel. For each individual ray, we are going to see what it hits and use that information to give a color to our pixel. Doing that for every pixel independently gives us our final image.

So, the first step to doing path tracing is to calculate the rays for each pixel. A ray has a starting point and a direction, so we are going to need to calculate where the ray starts, and what direction it is going in.

The ray is going to start at the camera location (you could think of this as the eye location), and it’s going to shoot at pixels on an imaginary rectangle in front of the camera, go through those pixels, and go out into the scene to see what they hit.

We are going to make this rectangle of pixels be from -1 to +1 on the x and y axis, and for now we’ll make it be 1 unit away on the z axis. We’ll say the camera is at the origin (0,0,0). Our goal is to calculate the 3d point on this rectangle for each pixel, then calculate the direction of the ray from the origin to that 3d point.

We’ll start by calculating each pixel’s percentage on the x and y axis from 0 to 1. We’ll show that in the red and green channel to make sure we are doing it correctly.

Put this code below in the code box in the right and click the play button (circled red in the screenshot) to see the results.

void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
    // calculate pixel coordinates as a percetange of the screen, from 0 to 1 on each axis
    vec2 uv = fragCoord/iResolution.xy;

    // show percentage as red and green
    fragColor = vec4(uv, 0.0f, 1.0f);
}

We can see from this picture that (0,0) is in the lower left, where the color is black. As the pixels go to the right, they get more red, which shows us that the x axis goes to the right. As the pixels go up, they get more green, which shows us that the y axis goes upward.

Visualizing values like this is a great way to debug shadertoy shaders. The usual graphics debuggers and debugging techniques are not available when doing webgl, or when using shadertoy, so this “printf” style debugging is the best way to see what’s going on if you are having problems.

Now that we have screen percentage from 0 to 1, we want to change it to being from -1 to 1 so that it’s actually just the x and y coordinate on the imaginary pixel rectangle.

void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
    // calculate 2d coordinates of the ray target on the imaginary pixel plane.
    // -1 to +1 on each axis.
    vec2 pixelTarget2D = (fragCoord/iResolution.xy) * 2.0f - 1.0f;

    // show percentage as red and green
    fragColor = vec4(pixelTarget2D, 0.0f, 1.0f);
}

We can see now that it’s black in the middle of the screen, and going to the right makes the pixels more red while going up makes the pixels more green. However, going left and down doesn’t change the color at all and just leaves it black. This is because colors are clamped to be between 0 and 1 before being displayed, making all negative numbers just be zero.

if you wanted to make sure the negative values were down there, and it really wasn’t just zero, you could visualize the absolute value like this.

void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
    // calculate 2d coordinates of the ray target on the imaginary pixel plane.
    // -1 to +1 on each axis.
    vec2 pixelTarget2D = (fragCoord/iResolution.xy) * 2.0f - 1.0f;
    
    // take absolute value of the pixel target so we can verify there are negative values in the lower left.
    pixelTarget2D = abs(pixelTarget2D);

    // show percentage as red and green
    fragColor = vec4(pixelTarget2D, 0.0f, 1.0f);
}

Ok so now we have the x,y location of the ray target on the imaginary pixel rectangle. We can make the z location be 1 since we want it to be 1 unit away, and that gives us a target for our ray. We now want to get a NORMALIZED vector from the camera position (the origin) to this ray target. That gives us our ray direction and we can visualize that ray to verify that it looks somewhat reasonable.

void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
    // The ray starts at the camera position (the origin)
    vec3 rayPosition = vec3(0.0f, 0.0f, 0.0f);
    
    // calculate coordinates of the ray target on the imaginary pixel plane.
    // -1 to +1 on x,y axis. 1 unit away on the z axis
    vec3 rayTarget = vec3((fragCoord/iResolution.xy) * 2.0f - 1.0f, 1.0f);
    
    // calculate a normalized vector for the ray direction.
    // it's pointing from the ray position to the ray target.
    vec3 rayDir = normalize(rayTarget - rayPosition);

    // show the ray direction
    fragColor = vec4(rayDir, 1.0f);
}

it’s a little subtle, but as you go right from the center of the screen, the pixels get more red and become a purpleish color. As you go up from the center, the pixels get more green and become a little more teal. This shows us that our ray directions seem sensible. As we move right on the x axis, the normalized ray direction also starts deviating from straight ahead and points more to the right, more positively on the x axis. Similarly for the y axis and the green color.

Going left and down from center, the blue gets darker, but no red and green show up due to the x and y axis being negative values again.

If you play around with the z coordinate of the ray target, you can make the ray direction changes more or less obvious due to basically zooming in or out the camera by moving the imaginary pixel rectangle, but we’ll talk more about that in a little bit.

The nearly constant blue color on the screen is due to the positive z value of the ray direction. All of the rays are aiming forward, which is what we’d expect.

We have successfully calculated a ray starting position and direction, and are ready to start shooting rays into the world!

Rendering Geometry

Next we want to use these rays to actually render some geometry. To do this we need functions that will tell us whether a ray intersects a shape or not, and how far down the ray the intersection happens.

We are going to use the functions “TestSphereTrace” and “TestQuadTrace” for ray intersection vs sphere, and ray intersection vs rectangle (quad). Apologies for not having them in a copy/pastable format here in the post. WordPress malfunctions when I try to put them into a code block, but they are in the final shader toy.

There are other ray vs object intersection functions that you can find. I adapted these from Christer Ericson’s book “Real-Time Collision Detection” (https://www.amazon.com/Real-Time-Collision-Detection-Interactive-Technology/dp/1558607323), and there a bunch on the net if you google for them. Ray marching can also be used as a numerical method for ray vs object intersection, which lets you have much richer shapes, infinitely repeating objects, and more.

Anyhow, instead of using the ray direction for the color for each pixel, we are going to test the ray against a scene made up of multiple objects, and give a different color to each object. We do that in the GetColorForRay function.

// The minimunm distance a ray must travel before we consider an intersection.
// This is to prevent a ray from intersecting a surface it just bounced off of.
const float c_minimumRayHitTime = 0.1f;

// the farthest we look for ray hits
const float c_superFar = 10000.0f;

struct SRayHitInfo
{
    float dist;
    vec3 normal;
};

float ScalarTriple(vec3 u, vec3 v, vec3 w)
{
    return dot(cross(u, v), w);
}

bool TestQuadTrace(in vec3 rayPos, in vec3 rayDir, inout SRayHitInfo info, in vec3 a, in vec3 b, in vec3 c, in vec3 d)
{
// ...
}

bool TestSphereTrace(in vec3 rayPos, in vec3 rayDir, inout SRayHitInfo info, in vec4 sphere)
{
// ...
}

vec3 GetColorForRay(in vec3 rayPos, in vec3 rayDir)
{
    SRayHitInfo hitInfo;
    hitInfo.dist = c_superFar;
    
    vec3 ret = vec3(0.0f, 0.0f, 0.0f);
    
	if (TestSphereTrace(rayPos, rayDir, hitInfo, vec4(-10.0f, 0.0f, 20.0f, 1.0f)))
    {
        ret = vec3(1.0f, 0.1f, 0.1f);
    } 
    
	if (TestSphereTrace(rayPos, rayDir, hitInfo, vec4(0.0f, 0.0f, 20.0f, 1.0f)))
    {
        ret = vec3(0.1f, 1.0f, 0.1f);
    }    
    
    {
        vec3 A = vec3(-15.0f, -15.0f, 22.0f);
        vec3 B = vec3( 15.0f, -15.0f, 22.0f);
        vec3 C = vec3( 15.0f,  15.0f, 22.0f);
        vec3 D = vec3(-15.0f,  15.0f, 22.0f);
        if (TestQuadTrace(rayPos, rayDir, hitInfo, A, B, C, D))
        {
            ret = vec3(0.7f, 0.7f, 0.7f);
        }
	}
    
	if (TestSphereTrace(rayPos, rayDir, hitInfo, vec4(10.0f, 0.0f, 20.0f, 1.0f)))
    {
        ret = vec3(0.1f, 0.1f, 1.0f);
    }       
    
    return ret;
}

void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
    // The ray starts at the camera position (the origin)
    vec3 rayPosition = vec3(0.0f, 0.0f, 0.0f);
    
    // calculate coordinates of the ray target on the imaginary pixel plane.
    // -1 to +1 on x,y axis. 1 unit away on the z axis
    vec3 rayTarget = vec3((fragCoord/iResolution.xy) * 2.0f - 1.0f, 1.0f);
    
    // calculate a normalized vector for the ray direction.
    // it's pointing from the ray position to the ray target.
    vec3 rayDir = normalize(rayTarget - rayPosition);
    
    // raytrace for this pixel
    vec3 color = GetColorForRay(rayPosition, rayDir);

    // show the result
    fragColor = vec4(color, 1.0f);
}

We see some things on the screen which is nice, but we have a ways to go yet before we are path tracing.

If you look at the code in the final shadertoy, TestSphereTrace and TestQuadTrace only return true if the ray intersects with the shape AND the intersection distance is closer than hitInfo.dist. A common mistake to make when first doing any kind of raytracing (path tracing or other types) is to accept the first thing a ray hits, but you actually want to test all objects and keep whichever hit is closest.

If you only kept the first hit found, not the closest hit, the blue ball would disappear, because the ray test for the grey quad comes before the blue ball, even though it’s farther away.

Correcting Aspect Ratio

You probably noticed that the spheres were stretched in the last image. The reason for that is that our imaginary pixel rectangle is a square (it’s from -1 to +1 on the x and y axis), while the image we are rendering is not a square. That makes the image get stretched horizontally.

The way to fix this is to either grow the imaginary pixel rectangle on the x axis, or shrink it on the y axis, so that the ratio of the width to the height is the same on the imaginary pixel rectangle as it is in our rendered image. I’m going to shrink the y axis. Here is the new mainImage function to use. Lines 10 through 12 were added to fix the problem.

void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
    // The ray starts at the camera position (the origin)
    vec3 rayPosition = vec3(0.0f, 0.0f, 0.0f);
    
    // calculate coordinates of the ray target on the imaginary pixel plane.
    // -1 to +1 on x,y axis. 1 unit away on the z axis
    vec3 rayTarget = vec3((fragCoord/iResolution.xy) * 2.0f - 1.0f, 1.0f);
    
    // correct for aspect ratio
	float aspectRatio = iResolution.x / iResolution.y;
    rayTarget.y /= aspectRatio;
    
    // calculate a normalized vector for the ray direction.
    // it's pointing from the ray position to the ray target.
    vec3 rayDir = normalize(rayTarget - rayPosition);
    
    // raytrace for this pixel
    vec3 color = GetColorForRay(rayPosition, rayDir);

    // show the result
    fragColor = vec4(color, 1.0f);
}

Camera Zoom (FOV)

I said before that the imaginary pixel rectangle would be 1 unit away, but if we make that distance smaller, it’s the same as zooming the camera out or increasing the field of view. If we make that distance larger, it’s the same as zooming the camera in or decreasing the field of view.

It turns out that having the pixel rectangle be 1 unit away gives you a field of view of 90 degrees.

The formula for calculating the distance for a specific field of view is on line 7 of the new mainImage function below and is used on line 11.

void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
    // The ray starts at the camera position (the origin)
    vec3 rayPosition = vec3(0.0f, 0.0f, 0.0f);
    
    // calculate the camera distance
	float cameraDistance = 1.0f / tan(c_FOVDegrees * 0.5f * c_pi / 180.0f);        
    
    // calculate coordinates of the ray target on the imaginary pixel plane.
    // -1 to +1 on x,y axis. 1 unit away on the z axis
    vec3 rayTarget = vec3((fragCoord/iResolution.xy) * 2.0f - 1.0f, cameraDistance);
    
    // correct for aspect ratio
	float aspectRatio = iResolution.x / iResolution.y;
    rayTarget.y /= aspectRatio;
    
    // calculate a normalized vector for the ray direction.
    // it's pointing from the ray position to the ray target.
    vec3 rayDir = normalize(rayTarget - rayPosition);
    
    // raytrace for this pixel
    vec3 color = GetColorForRay(rayPosition, rayDir);

    // show the result
    fragColor = vec4(color, 1.0f);
}

We are going to leave c_FOVDegrees at 90 degrees as we move forward, but here it is at 120 degrees

and here it is at 45 degrees

If you make the FOV too wide, you’ll start getting distortion at the edges. That’s a deep topic that cameras experience too that we won’t go into but I wanted to make sure you knew about that. You can get some real interesting renders by setting the FOV to really large values like 360 or 720 degrees!

Let’s Path Trace!

Now that we can shoot rays into the world, we are finally ready to do some path tracing!

To do shading we’ll need light sources, and our light sources will just be objects that have emissive lighting, aka glowing objects.

For this post we are only going to do diffuse light shading, which means no shiny reflections. We’ll do shiny reflections probably in the very next post in this series, but for now we’ll just do diffuse.

So, objects will have two fields that define their material:

  • a vec3 for emissive, specifying how much they glow in RGB
  • a vec3 for albedo, specifying what color they are under white light

How path tracing works in this setup is pretty simple. We’ll start out a pixel’s color at black, a “throughput” color at white, and we’ll shoot a ray out into the world, obeying the following rules:

  • When a ray hits an object, emissive*throughput is added to the pixel’s color.
  • When a ray hits an object, the throughput is multiplied by the object’s albedo, which affects the color of future emissive lights.
  • When a ray hits an object, a ray will be reflected in a random direction and the ray will continue (more info in a sec)
  • We will terminate when a ray misses all objects, or when N ray bounces have been reached. (N=8 in this shadertoy, but you can tune that value for speed vs correctness)

That is all there is to it!

The concept of “throughput” might seem strange at first but think of these scenarios as they increase in complexity:

  1. A ray hits a white ball, bounces off and hits a white light. The pixel should be white, because the white ball is lit by white light.
  2. A ray hits a red ball, bounces off and hits the same white light. The pixel should be red because red ball is lit by white light.
  3. A ray hits a white ball, bounces off and hits a red ball, bounces off and hits a white light. The pixel should be red because the white ball is lit by the red ball being lit by the white light.

When a ray bounces off a surface, all future lighting for that ray is multiplied by the color of that surface.

These simple rules are all you need to get soft shadows, bounce lighting, ambient occlusion, and all the rest of the cool things you see show up “automagically” in path tracers.

I said that the ray bounces off randomly but there are two different ways to handle this:

  1. Bounce randomly in the positive hemisphere of the normal and then multiply throughput by dot(newRayDir, surfaceNormal). This is the cosine theta term for diffuse lighting.
  2. Or, bounce randomly in a cosine weighted hemisphere direction of the surface normal. This uses importance sampling for the cosine theta term and is the better way to go.

To do #2, all we need to do is get a “random point on sphere” (also known as a random unit vector), add it to the normal, and normalize the result. It’s nice how simple it is.

You are probably wondering how to generate random numbers in a shader. There are many ways to handle it, but here’s how we are going to do it.

First, we are going to initialize a random seed for the current pixel, based on pixel position and frame number, so that each pixel gets different random numbers, and also so that each pixel gets different random numbers each frame.

We can just put this at the top of our mainImage function:

    // initialize a random number state based on frag coord and frame
    uint rngState = uint(uint(fragCoord.x) * uint(1973) + uint(fragCoord.y) * uint(9277) + uint(iFrame) * uint(26699)) | uint(1);

Then, we can toss these functions above our mainImage function, which take the rngState in, modify it, and generate a random number.

uint wang_hash(inout uint seed)
{
    seed = uint(seed ^ uint(61)) ^ uint(seed >> uint(16));
    seed *= uint(9);
    seed = seed ^ (seed >> 4);
    seed *= uint(0x27d4eb2d);
    seed = seed ^ (seed >> 15);
    return seed;
}

float RandomFloat01(inout uint state)
{
    return float(wang_hash(state)) / 4294967296.0;
}

vec3 RandomUnitVector(inout uint state)
{
    float z = RandomFloat01(state) * 2.0f - 1.0f;
    float a = RandomFloat01(state) * c_twopi;
    float r = sqrt(1.0f - z * z);
    float x = r * cos(a);
    float y = r * sin(a);
    return vec3(x, y, z);
}

To implement the path tracing rules we previously described, we are going to change some things…

  1. Move the ray vs object tests from GetColorForRay into a new function called TestSceneTrace. This lets us separate the scene testing logic from the iterative bounce raytracing we are going to do.
  2. Add albedo and emissive vec3’s to hit info and set both fields whenever we intersect an object. This is the object material data.
  3. Implement iterative raytracing in GetColorForRay as we described it
  4. Add a light to the scene so it isn’t completely dark
void TestSceneTrace(in vec3 rayPos, in vec3 rayDir, inout SRayHitInfo hitInfo)
{    
    {
        vec3 A = vec3(-15.0f, -15.0f, 22.0f);
        vec3 B = vec3( 15.0f, -15.0f, 22.0f);
        vec3 C = vec3( 15.0f,  15.0f, 22.0f);
        vec3 D = vec3(-15.0f,  15.0f, 22.0f);
        if (TestQuadTrace(rayPos, rayDir, hitInfo, A, B, C, D))
        {
            hitInfo.albedo = vec3(0.7f, 0.7f, 0.7f);
            hitInfo.emissive = vec3(0.0f, 0.0f, 0.0f);
        }
	}    
    
	if (TestSphereTrace(rayPos, rayDir, hitInfo, vec4(-10.0f, 0.0f, 20.0f, 1.0f)))
    {
        hitInfo.albedo = vec3(1.0f, 0.1f, 0.1f);
        hitInfo.emissive = vec3(0.0f, 0.0f, 0.0f);        
    } 
    
	if (TestSphereTrace(rayPos, rayDir, hitInfo, vec4(0.0f, 0.0f, 20.0f, 1.0f)))
    {
        hitInfo.albedo = vec3(0.1f, 1.0f, 0.1f);
        hitInfo.emissive = vec3(0.0f, 0.0f, 0.0f);        
    }    
    
	if (TestSphereTrace(rayPos, rayDir, hitInfo, vec4(10.0f, 0.0f, 20.0f, 1.0f)))
    {
        hitInfo.albedo = vec3(0.1f, 0.1f, 1.0f);
        hitInfo.emissive = vec3(0.0f, 0.0f, 0.0f);
    }           
    
    
	if (TestSphereTrace(rayPos, rayDir, hitInfo, vec4(10.0f, 10.0f, 20.0f, 5.0f)))
    {
        hitInfo.albedo = vec3(0.0f, 0.0f, 0.0f);
        hitInfo.emissive = vec3(1.0f, 0.9f, 0.7f) * 100.0f;
    }         
}

vec3 GetColorForRay(in vec3 startRayPos, in vec3 startRayDir, inout uint rngState)
{
    // initialize
    vec3 ret = vec3(0.0f, 0.0f, 0.0f);
    vec3 throughput = vec3(1.0f, 1.0f, 1.0f);
    vec3 rayPos = startRayPos;
    vec3 rayDir = startRayDir;
    
    for (int bounceIndex = 0; bounceIndex <= c_numBounces; ++bounceIndex)
    {
        // shoot a ray out into the world
        SRayHitInfo hitInfo;
        hitInfo.dist = c_superFar;
        TestSceneTrace(rayPos, rayDir, hitInfo);
        
        // if the ray missed, we are done
        if (hitInfo.dist == c_superFar)
            break;
        
		// update the ray position
        rayPos = (rayPos + rayDir * hitInfo.dist) + hitInfo.normal * c_rayPosNormalNudge;
        
        // calculate new ray direction, in a cosine weighted hemisphere oriented at normal
        rayDir = normalize(hitInfo.normal + RandomUnitVector(rngState));        
        
		// add in emissive lighting
        ret += hitInfo.emissive * throughput;
        
        // update the colorMultiplier
        throughput *= hitInfo.albedo;      
    }
 
    // return pixel color
    return ret;
}

When we run the shadertoy we see something like the above, but the specific dots we see move around each frame due to being random over time.

This might not look great but we are really close now. We just need to make pixels show their average value instead of only a single value.

Averaging Pixel Values

To average the pixel values We are going to need to make our shader into a multipass shader.

The first step is to make another pass. Click the + sign next to the “image” tab and select “Buffer A”

Now you should have a Buf A tab next to the image tab.

Move everything from the image tab to the Buf A tab. In the image tab, select “Buffer A” for texture “iChannel0” and put this code in to read and show pixels from Buffer A.

void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
    vec3 color = texture(iChannel0, fragCoord / iResolution.xy).rgb;
    fragColor = vec4(color, 1.0f);
}

The next step is to actually average the pixel values.

In the “Buf A” tab, also select “Buffer A” for texture “iChannel0” so that we can read Buffer A’s pixel from last frame, to average into the current frame’s output. Now we just add lines 27, 28, 29 below to average all the pixels together!

void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
    // initialize a random number state based on frag coord and frame
    uint rngState = uint(uint(fragCoord.x) * uint(1973) + uint(fragCoord.y) * uint(9277) + uint(iFrame) * uint(26699)) | uint(1);
    
    // The ray starts at the camera position (the origin)
    vec3 rayPosition = vec3(0.0f, 0.0f, 0.0f);
    
    // calculate the camera distance
	float cameraDistance = 1.0f / tan(c_FOVDegrees * 0.5f * c_pi / 180.0f);        
    
    // calculate coordinates of the ray target on the imaginary pixel plane.
    // -1 to +1 on x,y axis. 1 unit away on the z axis
    vec3 rayTarget = vec3((fragCoord/iResolution.xy) * 2.0f - 1.0f, cameraDistance);
    
    // correct for aspect ratio
	float aspectRatio = iResolution.x / iResolution.y;
    rayTarget.y /= aspectRatio;
    
    // calculate a normalized vector for the ray direction.
    // it's pointing from the ray position to the ray target.
    vec3 rayDir = normalize(rayTarget - rayPosition);
    
    // raytrace for this pixel
    vec3 color = GetColorForRay(rayPosition, rayDir, rngState);
    
    // average the frames together
    vec3 lastFrameColor = texture(iChannel0, fragCoord / iResolution.xy).rgb;
    color = mix(lastFrameColor, color, 1.0f / float(iFrame+1));

    // show the result
    fragColor = vec4(color, 1.0f);
}

After 5 minutes of rendering, the result looks pretty noisy, pixels are too bright and clip, and it isn’t a great scene, but we are in fact path tracing now! You can even see some neat lighting effects, like how you see some red on the wall next to the red ball, due to light reflecting off of it. The spheres are casting some nice looking shadows too.

We can clean up the scene a little bit, make something like a cornell box and get this render after 5 minutes

When Rays Miss

One last feature before we close up this first post in the series.

Right now when a ray misses the scene and flies out into the void, we stop raytracing, which implicitly is saying that the void outside the scene is black. Instead of having our background be black, let’s have it be an environment.

In the “Buf A” tab choose the forest cube map for texture iChannel1 and then add line 19 to GetColorForRay() below.

vec3 GetColorForRay(in vec3 startRayPos, in vec3 startRayDir, inout uint rngState)
{
    // initialize
    vec3 ret = vec3(0.0f, 0.0f, 0.0f);
    vec3 throughput = vec3(1.0f, 1.0f, 1.0f);
    vec3 rayPos = startRayPos;
    vec3 rayDir = startRayDir;
    
    for (int bounceIndex = 0; bounceIndex <= c_numBounces; ++bounceIndex)
    {
        // shoot a ray out into the world
        SRayHitInfo hitInfo;
        hitInfo.dist = c_superFar;
        TestSceneTrace(rayPos, rayDir, hitInfo);
        
        // if the ray missed, we are done
        if (hitInfo.dist == c_superFar)
        {
            ret += texture(iChannel1, rayDir).rgb * throughput;
            break;
        }
        
		// update the ray position
        rayPos = (rayPos + rayDir * hitInfo.dist) + hitInfo.normal * c_rayPosNormalNudge;
        
        // calculate new ray direction, in a cosine weighted hemisphere oriented at normal
        rayDir = normalize(hitInfo.normal + RandomUnitVector(rngState));        
        
		// add in emissive lighting
        ret += hitInfo.emissive * throughput;
        
        // update the colorMultiplier
        throughput *= hitInfo.albedo;      
    }
 
    // return pixel color
    return ret;
}

With those changes, here is a 30 second render.

Noise

In 30 seconds it looks about as noisy as the previous 5 minute render did. The 5 minute render looks so much worse because the scene has both very dark places and very bright places. The more different the lighting conditions are in different places in a scene, the more noise you will have. Also, smaller brighter lights will cause more noise than larger dimmer lights.

Learning more sophisticated path tracing techniques (like direct light sampling, also known as next event estimation) makes this not be true, but we are aiming for understandability in this path tracer, not convergence speed.

But, you might notice that as your scene is rendering, shadertoy is reporting 60fps, meaning that the scene rendering is being limited by vsync. The rendering would go faster than 60fps if it could, and so would converge faster, but it can’t.

There’s a shadertoy plugin for chrome that you can install to help this, and you can tell it to do up to 64 draws per frame, which really helps it converge more quickly. I’m on a ~5 year old gaming laptop with an nvidia 980m gpu in it, and I got this full screen render in a minute, which has converged pretty nicely! Click on it to view it full screen.

You can get the shadertoy plugin here: https://chrome.google.com/webstore/detail/shadertoy-unofficial-plug/ohicbclhdmkhoabobgppffepcopomhgl

An alternative to getting the shadertoy plugin for chrome is to just shoot N rays out for the pixel each frame, and average them, before putting them into the averaged buffer. That is more efficient than doing the multiple shader passes N times a frame since things are in the memory cache etc when iterating inside the shader. That means your image will converge more in the same amount of time.

Closing

The shadertoy that made this image, that contains everything talked about in this post, is at: https://www.shadertoy.com/view/tsBBWW

Try playing around with the path tracer, making a different scene, and modifying the code for other visual effects.

If you make something, I’d love to see it! You can find me on twitter at @Atrix256

There are going to be 3 or 4 more posts in this series that will be coming pretty quickly, on a variety of topics including depth of field and bokeh, shiny reflections, anti aliasing and fog/smoke. We also are not even sRGB correct yet! These things will improve the image significantly so stay tuned.

Some other awesome resources regarding path tracing are:

  1. Peter Shirley’s “Ray Tracing in One Weekend” (now a free PDF) https://www.realtimerendering.com/raytracing/Ray%20Tracing%20in%20a%20Weekend.pdf
  2. Morgan McGuire’s “Graphics Codex” https://graphicscodex.com/
  3. My more formal blog post on diffuse & emissive path tracing https://blog.demofox.org/2016/09/21/path-tracing-getting-started-with-diffuse-and-emissive/

Path tracing doesn’t only have to be used for realistic rendering though. Below is a stylized path tracing shadertoy I made, in vaporwave style.

“Neon Desert” stylize path tracer shadertoy: https://www.shadertoy.com/view/tdXBW8

Thanks for reading, part 2 coming soon!

Using Blue Noise For Raytraced Soft Shadows

Make sure and click the images in this post and view them at full size. Noise (especially blue noise) tends to disappear in smaller images.

There are 3 shadertoys that go with this post where you can see the techniques in action and see the source code of the implementations.

To go along with the blue noise ray marched fog and light shafts of the last post (Ray Marching Fog With Blue Noise), another fun usage case of blue noise is in raytracing soft shadows.

By soft shadows i mean the kinds of shadows you get from area lights, not the hard edged shadows you get from point lights and directional lights.

Soft shadows have a penumbra – the soft edges where it transitions from being fully in shadow to fully lit, like in the image below.

If you want to know more about why shadows work that way, give this post of mine a read: Why Are Some Shadows Soft And Other Shadows Hard?

So how do we use blue noise for this? We’re going to start with spherical directional lights first, then show how to extend it to spherical positional lights, and then to spherical spot lights.

Spherical Directional Light : White Noise

Shadertoy: https://www.shadertoy.com/view/3sfBWs

A directional light is a light that is shining from a specific direction, regardless of where you are at in the world.

These lights simulate light sources that are so far away, that any movement (translation) you do doesn’t measurably change your relative position to the light source because the sizes and distances involved are gigantic.

We are basically talking about the sun here.

Ok, so raytracing a shadow for a directional light like this just involves shooting a ray along the negative light direction (towards the sun) and seeing if it hits any geometry (the world, blocking the sun light). If it hits anything, the origin of the ray is in shadow. If it doesn’t hit anything, the origin of the ray is lit. The answer is binary and so the shadow edge is hard. There is no penumbra.

The problem with this model of lighting is that it pretends that the sun is an infinitely tiny point in the sky, when the sun is actually a circle, at least from our perspective. It is actually a sphere (or close enough to a sphere), but the “solid angle” 2d projection of it is a circle.

So, to have more realistic lighting and shadows, instead of shooting a ray at a single tiny point in space, we need to see if the ray origin can see the circle that is the spherical lighting source, and even more so, we need to know what percentage of the circle it can see.

The mathy explanation is we need to integrate visibility over the circle, but since our scene isn’t likely to be a closed form equation, we are going to need to resort to numerical methods and sample over the circle domain to get the shadow term.

The less mathy explanation is that we want to shoot rays at a couple places on that circle at random, and see what percentage of those rays were able to see the circle. We multiply the lighting by that percentage, and we automatically get soft shadows.

Ok so now that we know what we need to do, how do we do it?

So first up, we need to generate a “uniform point on a circle”, by which i mean generate a point where every point in the circle is equally likely to be chosen. If we have 2 uniform random numbers from 0 to 1 we can do that with the GLSL code below, assuming rng is a vec2 with two uniform random numbers in it. Any decent shader hash function or shader rng function should work for making those numbers, but i like using the “Hash without sine” functions from this shadertoy https://www.shadertoy.com/view/4djSRW

float pointRadius = c_lightRadius * sqrt(rng.x);
float pointAngle = rng.y * 2.0f * c_pi;
diskPoint = vec2(pointRadius*cos(pointAngle), pointRadius*sin(pointAngle));

Note c_lightRadius in the above. That value is the perceptual size of the circle in the sky. Like if you saw a photograph of the sky, this would be the radius of the circle of the sun if that photograph was 1 world unit away (the radius is in world space units under those strange viewing conditions). In the shadertoy demo I use a value of 0.1.

So now we have a uniform random point on a circle but what do we do with it? Well, to make it an actual point on the sun, we need a tangent basis so we have an X, Y and Z coordinate system to put this circle into.

Here is some GLSL to do that.

vec3 lightTangent = normalize(cross(c_lightDir, vec3(0.0f, 1.0f, 0.0f)));
vec3 lightBitangent = normalize(cross(lightTangent, c_lightDir));

This code assumes that the light direction is not ever going to point straight up. You could put in code to use a different vector if the light direction was ever too close to straight up if you want to fix that assumption. Also, c_lightDir is assumed to be normalized.

So now that we have a uniform random point on a circle, and we have a tangent basis aka coordinate system for that circle, it’s time to make a world space target for the ray to shoot at, so we can get a ray direction.

This part is pretty simple. We just pretend that the circle is 1 unit away and use the coordinate axes to convert the point on the circle into a world space point. From there, we subtract the ray’s position from the ray target position and normalize that vector, to get the ray direction.

vec3 rayTarget = rayPos + c_lightDir + diskPoint.x * lightTangent + diskPoint.y * lightBitangent;
vec3 shadowRayDir = normalize(rayTarget - rayPos);

Looking at that code, you might notice we are adding in the ray position, just to subtract it out again. We can skip a step and do this instead of the code above:

vec3 shadowRayDir = normalize(c_lightDir + diskPoint.x * lightTangent + diskPoint.y * lightBitangent);

If you do this, you do in fact get a penumbra. This is what made the image labeled “white noise raw” in the image at the top of this section, using 16 shadow rays per pixel.

Quick note about perf: 16 shadow rays per pixel is kind of a lot. One way to help perf would be to say “if the first 4 rays all agree as to whether we are in shadow or not, just return that answer without doing the 12 other rays”. That would make this code cheaper everywhere except when in the penumbra. Another way to help perf would be to do shadows at a lower resolution. Another way would be to use fewer rays, but filter spatially and/or temporally (over time) to hide that fact. I’ve also heard of people making low resolution shadow maps and skipping the raytracing for pixels very clearly no where near a penumbra.

Spherical Directional Light : Blue Noise Over Space, Low Discrepancy Over Time

Being the blue noise zealot I am, white noise has a strong stink to me, and i know that if white noise is being used, almost certainly better results can be had by either using blue noise, or low discrepancy sequences instead.

We are actually going to use blue noise in 2 different ways to improve the shadows. The first way is that instead of using white noise (uncorrelated) uniform random numbers on the circle, we are going to use blue noise (negatively correlated) uniform random numbers on the circle.

White noise clumps together making semi redundant samples, and also leaves big holes between samples, making for larger unknown areas. Blue noise, on the other hand, is randomized, but roughly evenly spaced samples. (More info: What the heck is blue noise?)

I have some blue noise in a circle points i generated by using mitchell’s best candidate algorithm, but when generating candidates, i made sure they were all in a circle before doing the rest of the logic (aka i used rejection sampling inside of MBC). I also made it not calculate distances toroidally, since wrap around doesn’t make sense in this context. The points are also generated in [-1,1] instead of the usual [0,1].

For details of Mitchell’s best candidate algorithm check out my blog post: Generating Blue Noise Sample Points With Mitchellโ€™s Best Candidate Algorithm

Below is that list of points I generated. These blue noise points are progressive, meaning that if you use the first N, regardless of what N is, you will have blue noise. So, this supports up to 64 samples, or any lower number of samples.

const vec2 BlueNoiseInDisk[64] = vec2[64](
    vec2(0.478712,0.875764),
    vec2(-0.337956,-0.793959),
    vec2(-0.955259,-0.028164),
    vec2(0.864527,0.325689),
    vec2(0.209342,-0.395657),
    vec2(-0.106779,0.672585),
    vec2(0.156213,0.235113),
    vec2(-0.413644,-0.082856),
    vec2(-0.415667,0.323909),
    vec2(0.141896,-0.939980),
    vec2(0.954932,-0.182516),
    vec2(-0.766184,0.410799),
    vec2(-0.434912,-0.458845),
    vec2(0.415242,-0.078724),
    vec2(0.728335,-0.491777),
    vec2(-0.058086,-0.066401),
    vec2(0.202990,0.686837),
    vec2(-0.808362,-0.556402),
    vec2(0.507386,-0.640839),
    vec2(-0.723494,-0.229240),
    vec2(0.489740,0.317826),
    vec2(-0.622663,0.765301),
    vec2(-0.010640,0.929347),
    vec2(0.663146,0.647618),
    vec2(-0.096674,-0.413835),
    vec2(0.525945,-0.321063),
    vec2(-0.122533,0.366019),
    vec2(0.195235,-0.687983),
    vec2(-0.563203,0.098748),
    vec2(0.418563,0.561335),
    vec2(-0.378595,0.800367),
    vec2(0.826922,0.001024),
    vec2(-0.085372,-0.766651),
    vec2(-0.921920,0.183673),
    vec2(-0.590008,-0.721799),
    vec2(0.167751,-0.164393),
    vec2(0.032961,-0.562530),
    vec2(0.632900,-0.107059),
    vec2(-0.464080,0.569669),
    vec2(-0.173676,-0.958758),
    vec2(-0.242648,-0.234303),
    vec2(-0.275362,0.157163),
    vec2(0.382295,-0.795131),
    vec2(0.562955,0.115562),
    vec2(0.190586,0.470121),
    vec2(0.770764,-0.297576),
    vec2(0.237281,0.931050),
    vec2(-0.666642,-0.455871),
    vec2(-0.905649,-0.298379),
    vec2(0.339520,0.157829),
    vec2(0.701438,-0.704100),
    vec2(-0.062758,0.160346),
    vec2(-0.220674,0.957141),
    vec2(0.642692,0.432706),
    vec2(-0.773390,-0.015272),
    vec2(-0.671467,0.246880),
    vec2(0.158051,0.062859),
    vec2(0.806009,0.527232),
    vec2(-0.057620,-0.247071),
    vec2(0.333436,-0.516710),
    vec2(-0.550658,-0.315773),
    vec2(-0.652078,0.589846),
    vec2(0.008818,0.530556),
    vec2(-0.210004,0.519896) 
);

Just doing that change, we get this strange result:

The problem is that every pixel is using the same blue noise sample sequence, and there are only 16 samples. That means there are 16 overlapping shadows for each sphere basically. Blue noise has done a good job in that those shadows are pretty different from each other, but the shadows aren’t good enough yet.

Now we need to bring in the second blue noise. We are going to tile a blue noise texture in screen space (you can get one here: http://momentsingraphics.de/BlueNoise.html), so that we can get a “blue noise random value” per pixel that is between 0 and 1. We are going to multiply that value by 2 * pi and use that as an angle for how to rotate our 2d blue noise samples.

Each pixel will have a different rotation amount, and each pixel will use a rotation that is very different from the rotation of neighboring pixels. This is due to blue noise textures having that property: pixel values are very different from neighbor pixel values.

A nice optimization here is that since all samples are being rotated by the same amount for a pixel, you can calculate the cosine and sine of the angle outside of the loop for shooting shadow rays, and just re-use them inside that loop for rotation.

If we do that, we end up with the image on the right labeled “blue noise raw”. Compare it vs “white noise raw” to see how the noise in the penumbra is way less noticeable. Same number of samples, same amount of computational complexity.

We aren’t done yet though… we have to now consider the axis of time, before we can declare victory.

Cutting right to the case, we are going to add frameNumber * goldenRatio to the blue noise value and fract it to bring it back to the 0 to 1 range. We want to do that before we multiply by 2 * pi to make it an angle.

If we do that, the blue noise value for each pixel becomes a low discrepancy sequence over time. It damages our blue noise over space property a little but it is a net win.

For a deeper discussion about this topic, check out these posts:
Animating Noise For Integration Over Time
Animating Noise For Integration Over Time 2: Uniform Over Time

The short answer to why this is better is that the distribution of a pixel’s value over time for shorter time durations is going to be closer to the actual distribution of the pixel. This is in contrast to white noise which does have the right average over larger sample counts, but for shorter sample counts may clump values together, and leave voids of unseen values.

In short, this makes the pixel more temporally stable, which is great for TAA and other temporal filtering/accumulation methods, but also is nicer even when viewing it without temporal filtering. You can turn on and off the ANIMATE_NOISE define in the shader and look at the “blue noise raw” panel to see how it looks when animated vs not animated without filtering.

Here are the final results, and the link to the shadertoy again.

Shadertoy: https://www.shadertoy.com/view/3sfBWs

By the way, noise is way more noticeable when it’s high contrast with the things around it. For instance, here’s the same image as the above again, but the ambient lighting is 10 times brighter (0.5, instead of 0.05). It’s way harder to see the noise, even the white noise!

Spherical Positional Light

With spherical directional lights explained, the hard part is done!

You only need to make 2 modifications to your code to get from spherical directional lights to spherical positional lights.

The first change is that instead of c_lightDir being a constant, wherever you use that vector, you should instead use the normalized vector pointing from the pixel being shaded to the position of the light (the center of the light).

The second change is that c_lightRadius isn’t constant either anymore. Unlike the sun, which doesn’t noticeably change size as you move around the world, spherical positional lights DO noticeably change size. As you get farther away from the light, the 2d projected circle for that light gets smaller. As you get closer, the circle gets larger.

The formula is actually real simple. If c_lightRadius is the radius of the sphere, and dist is the distance from the shaded pixel to the center of the light, the actual radius you should use for the circle when doing the rest of the “spherical directional light” logic is just: c_lightRadius / dist.

Making those changes, you can get this:

Shadertoy: https://www.shadertoy.com/view/ts2BRh

Spherical Spot Light

Now that you have a spherical positional light, it’s pretty easy to modify it into a spherical spotlight.

The first thing to do is to re-introduce a light direction. This light direction is the direction that the spot light is shining in. We are going to dot product the vector from the light to the pixel by this light direction to know what angle the pixel is from the light, to know if it’s in the light cone of the spot light or not.

The next thing to do is to define a “cosine theta inner” and a “cosine theta outer”. When the dot product in the last paragraph is less than “cosine theta outer”, then there is no light. If it’s greater than “cosine theta inner” then it’s full light. Between those values we want it to fade from lit to unlit, and for that i like to use smoothstep to do a non linear fade. Here’s some glsl code that does this:

vec3 lightDir = normalize(c_lightPos - hitPos);
float angleAtten = dot(lightDir, -c_lightDir);
angleAtten = smoothstep(c_cosThetaOuter, c_cosThetaInner, angleAtten);
lightColor *= angleAtten;

The rest of the code runs the same as a Spherical Positional Light, and you get a result like this:

Shadertoy: https://www.shadertoy.com/view/tsjfRh

Summary

This post showed how to use blue noise in two different ways for raytracing soft shadows.

  1. Blue noise sample points in a circle were used to sample a circle representing the light. Using blue noise here results in less error than white noise.
  2. A blue noise texture was used to rotate those 2d sample points per each pixel to make the resulting error be screen space blue noise, which looks better, is harder to detect, and is filtered more easily, compared to white noise, even for the same amount of error.

An important note here is that while blue noise converges at the same rate as white noise (but starts with lower error), low discrepancy sequences can converge quite a bit faster than white noise or blue noise.

If we could use a low discrepancy sequence for sampling the circle in step 1 above and converge so that there was no error, we could be done and wouldn’t need to do step 2.

Unfortunately, low discrepancy sequences have aliasing and so have worse looking error than blue noise before they converge. For the ray sample counts we have in budget right now for real time rendering, we definitely can’t afford to converge, even with an LDS, so that is why we reached for blue noise.

This is important to know, because for non real time rendering usage cases, you are likely better off with an LDS, than blue noise. Blue noise is just a neat trick you can do when you are really strapped for samples (aka can’t afford many rays).

One last thing – it turns out that sampling the projected 2d circle for 3d spherical lights is not quite the correct thing to do. It’s a decent approximation but for more information, check out this link (and note, you can apply blue noise the same way with better spherical light sampling):

https://schuttejoe.github.io/post/arealightsampling/

Happy raytracing and feel free to hit me up on shadertoy or on twitter if you have any questions @Atrix256

Ray Marching Fog With Blue Noise

The shadertoy that goes with this post is at: https://www.shadertoy.com/view/WsfBDf

I talk about blue noise a lot more often than I show usage cases of blue noise. Ray marching fog is a great usage case of blue noise that I haven’t shared yet.

Make sure and click the images in this post and view them at their full size to best see the noise. Blue noise tends to melt in thumbnail images, due to the low pass filtering involved in making an image smaller.

Real quick before going into it, here are some other usage cases of blue noise with corresponding shadertoys.

So here’s an algorithm for ray marching fog, which can give some simple scattering effects, like crepuscular lighting aka “god rays”.

  1. Render normally.
  2. Take N steps down the ray for the pixel starting from the camera, going to the depth value for the pixel.
  3. At each step, look in a shadow map to see if that world position is in shadow or not. Calculate the percentage of steps that were in shadow.
  4. Use this percentage to lerp between an “unlit fog color” and a “lit fog color”, use that as the fog color for that pixel.
  5. Use the usual distance fog calculations for that pixel.

Here is an image of doing the algorithm above, doing 256 steps.

If you decrease the step count to 16, the algorithm gets a lot faster, but you get much worse results that look like this:

That looks real bad, but you can break up banding by using noise. When you do the ray marching, you can roll a random number (white noise) for each pixel, and use that 0 to 1 value as a percentage of “one step distance” to push the ray start down the ray. Doing that, you get this result.

That result is better than the banding, but is pretty ugly. Using a tiled screen space blue noise texture as a source of the random numbers instead gives you this result, which is lots better.

As a bonus, here’s a result using Jorge Jimenez’s “Interleaved Gradient Noise” which is designed to work best when used with TAA that uses the “3×3 neighborhood history rejection” method.

Other Notes

There is a shadertoy that shows all this at https://www.shadertoy.com/view/WsfBDf

The step count used in the ray march defines how many “shades” there are between the lit and unlit fog. For instance, 2 steps will only let the lit percentage be: 0%, 50%, 100%. That only gives 3 shades of fog. In the images above where 16 steps were used, it gives 17 different shades (it’s always numsteps+1). If you drop the count lower, the noise will have fewer shades and be more noticeable in all cases.

If you use PCF sampling of the shadow map (if doing this in a raster setup), the fog light shafts get softer edges, which can look real nice, if you want that look.

If this technique looks interesting to you, you should also give this a read, for making better fake fog: https://www.iquilezles.org/www/articles/fog/fog.htm

These still images tell only part of the story. If you animate the noise well over both space AND time, where a single frame has good noise patterns, but each pixel is also blue or low discrepancy over time, you’ll have even better results. The shadertoy has a define for animating the noise ANIMATE_NOISE that you can turn on and off to see the difference. The result is better RAW like the shadertoy shows it, but it’s also better under temporal filtering, because it makes the value temporally converge to something closer to the correct value with less flickering.

The blue noise is animated by adding the golden ratio * the frame number to the blue noise texture value and using fract to bring it back between 0 and 1. Repeatedly adding the golden ratio to any starting value, and fract’ing it, will make a progressive low discrepancy sequence, which is exactly what we want. Doing this process makes each pixel be low discrepancy over time while the blue noise texture makes the pixels be blue noise (randomized low discrepancy) over space. Unfortunately, this damages the blue noise over space a bit, but it is a net win.

Interleaved gradient noise in this shadertoy is animated by scrolling the texture on each axis by 5.588238 pixels each frame. This is a value Jorge Jimenez (the maker of IGN) found through much manual effort, to try and find a scroll amount that made pixels be low discrepancy sequences. He hasn’t published this information but said I was free to share it so long as I gave him credit. Thanks Jorge!

For more info on animating noise, check out these two posts of mine:

The sampling in this setup has four dimensions, but isn’t vanilla four dimensional. It uses blue noise for the 2 dimensions of screen space, it uses low discrepancy sampling for the dimension of time, but it uses regular spaced samples on the dimension of distance down the ray.

I believe there are at least a couple different better ways to sample this setup but I’m not 100% sure what they are, or what the ideal one would be.

The “regular spaced samples over distance” seem like something that would be nice to look at fixing, but it’s also possible that a triangular distributed blue noise could be better for the blue noise over space.

For another rabbit hole to go down, check out this very related presentation slide deck.
“Low Complexity, High Fidelity – INSIDE Rendering”
https://www.gdcvault.com/play/1023002/Low-Complexity-High-Fidelity-INSIDE

Guess a Number Between 1 and 10

When you were a kid and had a gumball to share but 2 of your friends wanted the gumball, what would you do?

When I was a kid, and now that I have two kids of my own, a common way to resolve disputes like this is to say “Pick a number between 1 and 10 and whoever is closer gets the gumball”.

This works with any number of people and with any number of rewards because you can say the N closest people out of M people total get a reward. Ties can be broken by repeating the process.

It SEEMS like a purely random choice too, doesn’t it?

Well, if you want to win, the right move is to always guess 5 or 6, actually. It makes sense if you think about how 5.5 is right in the middle, so 5 and 6 are the numbers least far from all the other numbers.

If you are only playing against one other person, and both the real number and the guess are really random (uniform independent random numbers), guessing 5 will let you win 55% of the time, and tie 14% of the time. That means you will lose only 31% of the time, or less than one out of three times!

There is a way to make this game fair again though, and that is by letting the numbers wrap around between 1 and 10 as if they were on a circle instead of a number line.

So, if the right answer is 1, and you guess 10, without wrap around, the distance would be 9. With wrap around, the distance would be 1.

You calculate wrap around distance by saying “if the difference is greater than 5, then the difference is actually 10 minus that number”.

Playing that way, it puts all guesses on equal footing. Every number acts like a 5, because every number is 5 or less away from every other number.

In that scenario, guessing a 5 will cause you to win 41% of the time, tie 18% of the time, and lose 41% of the time.

The game is fair again, as is shown experimentally below.

The code that generated this output is at:
https://github.com/Atrix256/GuessNumberOneTen

Another way kids have to solve these sorts of situations is through “inka binka” and similar, where there is a song and a deterministic process to pick a winner.

My 5 year old already realized the same position wins every time, which i was pretty proud of hehe.

This sort of stuff is pretty neat though and makes me think people out there must be studying childhood computational sociology ๐Ÿ˜›

Basic Methods For Finding Zeroes and Mins / Maxes of Functions

There is ~500 lines of simple standalone C++ code that goes with this post, that generated all data used and implements the things talked about. You can find it at: https://github.com/Atrix256/BasicRootFinding

Simple equations can be solved using algebra without too much fuss:

3x-6=0 \\ 3x=6 \\ x=2

You can even solve more complicated equations by cleverly using identities, the quadratic equation, and other mathematical tools you might have in your tool belt.

Some equations are beyond our ability to solve them using simple algebra though, like high degree polynomials:

3x^5+2x^2-20x+2=0

We still can find where functions like that equal zero, but we can’t do it analytically, we have to do it numerically.

That is, we do things like roll a ball down hill and hope we find a zero.

(Apparently there are some analytical solutions to be had from that function. I don’t know how to get them though, so would have to resort to numerical methods!)

This post talks about a few methods for finding zeroes of equations we can’t solve analytically, and talks about how to use those methods to find minimum and maximum values of functions as well.

The Basic Idea

Let’s say I tell you that the value of a function at a specific point is “3” and that for every step to the right you take, it subtracts 1 from that value.

Then I ask: “Where does this function equal zero?”

If you answered “three steps to the right” you are correct, and guess what – you’ve just used Newton’s method for root finding.

There is an asterisk here though. The slope (how the function value changes as you take a step to the right. Also called the derivative.) was constant in the above example, but is not going to be constant in the functions we want to solve. That slope is going to change depending where you are on the graph.

To deal with this, Newton’s method takes the step it thinks it should take, and then asks again for the value and slope (derivative) and takes another step.

The hope is that with a good initial guess, Newton’s method will find a zero without too much effort.

We’re going to get a little more formal, while also showing some results, but all the rest of what we are going to talk about is based on Newton’s method (or is even simpler).

Newton’s Method

We’ve seen this informally, but formally, a single step of Newton’s method looks like this:

x = x - \frac{f(x)}{f'(x)}

That is, we divide the y value at our current point by the derivative at our current point, and subtract that from our current x to get our new x.

Bullet Points:

  • Requires f(x) and f'(x)
  • Requires a “good guess” for an initial x value
  • Converges Quadratically

More info: https://en.wikipedia.org/wiki/Newton’s_method

Secant Method

What if you don’t have the analytical first derivative to your function and still want to use Newton’s Method?

Well, you can use finite differences to get the first derivative: move the x over a little bit, see how the y value changes over distance, and use that as your numerical derivative, instead of the analytical one. (A blog post of mine on finite differences: https://blog.demofox.org/2015/08/02/finite-differences/)

If you do that, you are now using the secant Method.

m = \frac{f(x+e)-f(x-e)}{2e}

x = x - \frac{f(x)}{m}

The above uses “central differences” to calculate the first derivative numerically. e is a tuneable parameter, but should be small (without triggering numerical issues) – like perhaps maybe 0.01.

Bullet Points:

  • Requires f(x)
  • Requires a “good guess” for an initial x value
  • Converges at a rate of 1.618… (the golden ratio! ??!) so is slower than Newton’s method, but can be handy if you don’t have the first derivative.

More info: https://en.wikipedia.org/wiki/Secant_method

Halley’s Method (& Householder’s method)

If the first derivative helped us find a zero, surely the second derivative can help us find it even faster, right?

Yep. If you generalize Newton’s method to using the second derivative, you get Halley’s method.

x = x - \frac{2f(x)f'(x)}{2f'(x)^2-f(x)f''(x)}

Bullet Points:

  • Requires f(x), f'(x) and f”(x). You can get the derivatives numerically but it will slow down convergence.
  • Requires a “good guess” for an initial x value
  • Converges cubically (fast!!)

Both Newton and Halley are part of a larger family of methods called “Householder’s Method” which generalizes Newton’s method to any order derivative.

More info:
https://en.wikipedia.org/wiki/Halley%27s_method
https://en.wikipedia.org/wiki/Householder%27s_method

Bisection

Bisection is simpler than the other methods. You start with a left x and a right x, with the requirement that the signs of the y values for these x’s have to be different (one positive, one negative) which means that a zero must be between the two x’s.

The algorithm itself is just a binary search, where you look at the value in the middle of the left and right x, and update the left or right x to be the middle, based on the sign of the y value at the middle.

Bullet Points:

  • Requires f(x)
  • Requires a min x and a max x that contains a zero between them, and the y values at those x’s have to have opposite signs
  • Converges Linearly (slow!)

More info:
https://en.wikipedia.org/wiki/Bisection_method

Experimental Results: y = x^2 – 1

Here’s the first equation: y=x^2-1

You can see visually that there’s a zero at x=-1 and at x=1.

Here’s how the various methods did at finding the roots. the x axis is how many steps were taken and the y axis is the absolute value of y at that step, which also happens to be the error too, since we are finding zeros.

Halley is the winner which is unsurprising with it’s cubic convergence rate, then newton with it’s quadratic convergence rate, then secant with it’s golden ratio convergence rate, and bisect comes in last after a strong start, with it’s linear convergence rate. Our experimental results match the theory, neat!

The values in the legend next to the result name specify what parameters were used. For newton and Halley, the value shown is the initial guess. For secant, x is the initial guess and px is the previous initial guess. For bisect it shows the range given to the bisection algorithm.

Choosing different parameters for those can drastically affect performance of the algorithms. Here are some less well chosen values shown along with the previous results. The previous results are the blue shades.

Bisection is now so bad that it dwarfs everything else. Let’s remove that.

Removing bisection, the first worst data point dropped to 100 from 2500 for all the “more poorly chosen parameters” which are orange, yellow and green. The previous values are the blues and the grey line, which basically look flat.

In these larger values with worse parameters, we are still seeing Halley winning, Newton coming in second, secant coming next, and then bisection, but we are seeing much worse performance. These parameters are important to getting good performance!

Experimental Results: y = sin(x)

Here’s the next equation: y=sin(x)

Here is the convergence for each technique:

You can’t see it, but the orange, grey and blue are all basically overlapping. Looking at the actual data below though, you can see that the winners are in the same order again.

Experimental Results: y = x^2-x-1

This is a fun equation, because a zero is at the golden ratio (and the other is at -1/goldenRatio): y=x^2-x-1

Here is the convergence graph:

An interesting thing happens here if we choose to start at the value “0.5” for Newton and Halley. The dark blue line for Newton is under the grey line for Halley which is flat and doesn’t go up and down. The reason this happens is that the first derivative is 0 at 0.5 so the algorithm doesn’t know which direction to go and gets stuck.

Using other initial guess values causes it not to be stuck, but you can see that newton and secant starting at 0.75 has a pretty big jump in error in the beginning.

Error Case: x^2+1

For the next equation we have: y=x^2+1

This graph has no zeroes. Our bisect bounds also do not have their needs met since one side is supposed to be negative and the other positive, but this graph is positive everywhere. That means we don’t have any results for bisect on this one.

Here’s the convergence graph:

Newton goes absolutely off the rails for a few samples. Lets remove it to see what the others are doing:

Secant goes real bad at the end, so let’s remove it to see that Halley is pretty well behaved despite there not actually being a root to find:

Ray vs Sphere

Let’s do something more interesting… let’s use these methods to find where a ray intersects a sphere. We can do this analytically (there is a formula to calculate this!) but we can use this as a simpler example of a more complex usage case.

We need a formula that is zero where a ray hits a sphere, and we need to be able to calculate it’s first and second derivative analytically ideally.

I fought with this for a bit but came up with a solution.

Here are the variables involved:

  • rayPos and rayDir, both are vec3’s
  • spherePos which is a vec3 and a sphereRadius which is a scalar
  • t – how far down the ray we are, and is a scalar

We are going to make the sphere center be the origin by subtracting it out of the ray position (then we don’t need spherePos anymore), so then all we have to find is where the magnitude of the vector from the origin to the ray position at time t is the sphere radius. To simplify calculating the derivatives, we are going to square both sides of the equation.

f(t) = MagnitudeSquared(rayPos + rayDir * t) - sphereRadius^2

If we can find where that equation is zero, that will tell us the time t that the ray hits the sphere, and we can get the actual intersected position (then normal, etc) from that.

For a 3d vector, Magnitude squared is just the below which is the distance equation without the square root.

MagnitudeSquared(P) = P.x^2+P.y^2+P.z^2

If you expand f(t) out for just the x axis to get the pattern per axis, you get:

rayPos.x^2+2*rayPos.x*rayDir.x*t+rayDir.x^2*t^2 - sphereRadius^2

You would add a similar thing for y and z – but not do the sphere radius part because it’s already handled above.

We can take the derivative of the above with respect to t, to get:

2*rayPos.x*rayDir.x+2*rayDir.x^2*t

So the first derivative is that equation, but also adding in the y and z axes:

f'(t) = 2*rayPos.x*rayDir.x+2*rayDir.x^2*t + 2*rayPos.y*rayDir.y+2*rayDir.y^2*t + 2*rayPos.z*rayDir.z+2*rayDir.z^2*t

That looks a bit like spaghetti but is way cleaner in code with a loop iteration for each axis ๐Ÿ˜›

We can then take the derivative of that first derivative to get the second derivative (still, with respect to t). That gives us this:

f''(t) = 2*rayDir.x^2 + 2*rayDir.y^2 + 2*rayDir.z^2

Ok cool, now that we have our equations, I used (0,0,0) for the ray position, (0,0,1) for the ray direction, (0.2, 0.3, 5.0) for the sphere position, and a sphere radius of 2.0.

Let’s see how our methods did at finding the intersection time of the ray vs the sphere.

Once again, Halley wins, newton comes in second, then secant, then bisect. Neat!

Finding Function Mins & Maxes

For being one of the headlines of the blog post title, this is actually a really simple thing to explain now that you know the rest.

The minimum and maximum of a function – by definition! – has a derivative (slope) of zero, because it’s right where the function goes flat. The function has either climbed to the top of a mountain, gone flat, and is heading back down, or it has gone to the bottom of a valley, gone flat, and is heading back up. There is actually also a third option called a saddlepoint where it is pausing it’s ascent or descent momentarily and then continues.

In any case, to find the minimum or maximum of a function, we need to find where it’s derivative is zero. We do this by doing root finding on it’s first derivative.

From there, we can plug that x value found into the original f(x) function to get our extrema value.

The code that goes with this blog post uses this technique to find the maximum value for the function y=-x^2+x+1.

It finds that the derivative (y=-2x+1) is zero at x=0.5. Plugging 0.5 into the original function for x gives us a value of 1.25. That does look like the correct x and y value for the maximum of the function, doesn’t it?

Links

The code that goes with this blog post can be found at: https://github.com/Atrix256/BasicRootFinding

A good 10 minute video on Newton’s Method

Here’s a way to get an analytic derivative of a function without having to do it symbolically. Dual numbers are a form of automatic differentiation, just like backpropagation is.
https://blog.demofox.org/2014/12/30/dual-numbers-automatic-differentiation/

You can extend this stuff to higher dimensions using the gradient instead of the derivative, and maybe a Jacobian matrix or Hesian matrix for higher derivatives. That might make a good future blog post.

We actually do something similar to Newton’s method when doing sphere tracing (ray marching with a distance estimate). We divide the distance value at a point in space (the y value, or f(x)) by the length of the gradient (which is basically the slope aka generalized derivative) to know that a surface can’t be any closer than that value, so we can step that length in whatever direction our ray wants to move. This also has applications to drawing 2d graphics. You can read about that here:
https://www.iquilezles.org/www/articles/distance/distance.htm

Using White Noise to Choose Between Red Noise and Blue Noise

There is source code that goes with this post, which generated all the images and did all the tests. You can find it at https://github.com/Atrix256/DFTRandomFibonacci

I recently saw a really cool video from @Numberphile which mixed some of my very favorite things: Fibonacci numbers (aka the golden ratio), red noise and blue noise.

The link to the video is below and is very much worth watching.
“Random Fibonacci Numbers”

It had me thinking though… they are using a coin flip (white noise) to determine if they should add the last two numbers (red noise / low pass filter) or subtract them (blue noise / high pass filter).

I was curious what the DFT of that would look like, which would show what sort of frequency content that number sequence had.

BTW I have a post talking about dice, probability distributions and noise color here that is a bit related: https://blog.demofox.org/2019/07/30/dice-distributions-noise-colors/

White Noise

Just to prime things, here is 100 uniform white noise values on the number line and the DFT of those points. The points clump together and leave holes, and the frequency spectrum shows all frequencies.

Here is the average DFT of 100,000 such point sets. The average flattens out and the grey lines showing 1 standard deviation are flat as well.

Regular Fibonacci Numbers

The first 90 Fibonacci numbers look like the below on the number line:

And here’s their DFT, where 0hz (DC) is in the middle:

Nothing super interesting really.

Randomized Fibonacci Numbers

Here is 90 randomized Fibonacci numbers on the numberline, the dft, and the average DFT of 100,000 such number sets.

It’s interesting to see that the individual randomized Fibonacci has strong low frequency content, but other frequencies too, while the average DFT of the number sets shows only low frequency content.

I think what’s going on here is that since the numbers start out small and grow larger over time, that they will always start out clumped together (red noise), but then depending on the coin flip (white noise) which have different frequency content in each set of numbers, as the numbers grow larger. This means that the only thing common among them is low frequency content, but the rest is just white noise and averages out to be flat.

Maybe not that interesting of a result, but it’s an answer at least ๐Ÿ˜›

Prime Numbers

I got a tweet from Tariq wondering what DFTing the prime numbers would look like.

Here are the first 25 on the numberline, and the DFT:

Here are the first 100:

Here are the first 200:

Here are the first 1000:

I don’t really see much of a pattern, but I guess if someone did, they would have gotten some prize by now? ๐Ÿ™‚

“The next coin flip has to be tails!”

If you saw a fair coin flipped and 10 heads came up in a row, you’d probably either think that the coin was a 2 headed coin, or that the next flip HAD to be tails.

In the code that goes with this post (https://github.com/Atrix256/DFTRandomFibonacci check out DoCoinTossTest()), I have a random number generator generate bits until there are 10 ones in a row, and then count how many times the next random bit is a one again.

I do that test 10000 times, and ran that overall test 5 times. Here are the results!

At an infinite number of coin flips, you can expect an even numbers of heads and tails, but until you reach infinity, all bets are off. When a coin has come up heads 10 times in a row, it still has a 50/50 chance of heads in the next coin flip.

That’s the definition of white noise – independent, random events.

Red noise would tend to have similar values – so, heads would clump together, and tails would clump together. This makes more sense with dice, where similar values would be rolled.

Blue noise would tend to have dissimilar values – so heads and tails would specifically NOT clump. And with dice, you’d rarely get the same, or similar values, in 2 dice rolls.

White noise doesn’t care about what came before, and just gives you a new number based on the probability distribution.

Keep this in mind if you ever play roulette!

How Do I Calculate Variance in 1 Pass?

If you google “how do i calculate variance?” you’ll get some nice explanations that say:

  1. Calculate the mean (average) of your numbers
  2. Calculate the average of: each number minus the mean, squared

That’s fine for people trying to just understand how the math works, but if you are calculating variance in a computer program, you might not realize there is a way to do it in a single pass over the data.

That can be significant to the performance and even feasibility of your algorithm!

Here is how you calculate variance in one pass:

  1. Calculate the mean (average) of your numbers
  2. In the same loop, calculate the mean (average) of your numbers squared
  3. After the loop, variance is the absolute value of #2, minus #1 squared

That might look like word salad, so here’s a code snippet.

float Lerp(float a, float b, float t)
{
    return a * (1.0f - t) + b * t;
}

float Variance_1Pass(const std::vector & data)
{
    // get the average (value) and average (value*value)
    float average_value = {};
    float average_valueSquared = {};
    for (size_t index = 0; index < data.size(); ++index)
    {
        float value = data[index];
        average_value = Lerp(average_value, value, 1.0f / float(index + 1));
        average_valueSquared = Lerp(average_valueSquared, value * value, 1.0f / float(index + 1));
    }

    // variance is absolute value of average(value*value) - (average_value*average_value)
    return abs(average_valueSquared - (average_value * average_value));
}

There is code that goes with this post, that implements it both ways and shows you that they are equivalent. You can find it at: https://github.com/Atrix256/CalculateVariance1Pass/blob/master/main.cpp

If you are wondering why I'm using "lerp" to average numbers, check out this post: https://blog.demofox.org/2016/08/23/incremental-averaging/

It turns out this one pass method can have numerical problems though, so no free lunch. Here is a more numerically robust way to do it, which also allows you to incrementally calculate variance, as numbers come in (Thanks Bart!): https://www.johndcook.com/blog/standard_deviation/

Why might you want to calculate variance?

One reason is if you are analyzing or reporting data, the average value is important to know, but it's also important to know if the numbers were usually pretty close to the average, or if they had lots of spikes above and below the average. You can square root variance to get the standard deviation, which is in the same units as the data you are reporting.

Assuming your data is a Gaussian distribution (due to something called the central limit theorem, a surprising number of things are actually gaussian distributed – like even rolling a bunch of dice and adding them up), 68% of the data points are + or – 1 standard deviation from the mean.

As an example, if the average temperature at your house over a month was 75 degrees Farenheit with a standard deviation of 5 degrees, that means that 68% of the days had a temperature between 70 and 80 degrees.

If the average temperature was still 75 but had a variance of 25 degrees, that means that 68% of the days had a temperature between 50 and 100 degrees. That is quite a difference! Just reporting the average temperature doesn't convey this information the same way as reporting average and standard deviation (or variance) does.

For more info on that, check this out: https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule

Lastly, I mentioned that doing 2 passes to calculate variance could be a deal breaker for an algorithm.

An example of this is a graphics technique called "Variance Shadow Maps" (paper: http://www.punkuser.net/vsm/vsm_paper.pdf) which ultimately calculates a mean and a variance for how far a group if pixels is away from a light source. When rendering the variance shadow map from the point of the view of the light, each pixel stores the depth, as well as the depth squared. A fun property is that you can blur these values with neighboring pixels without harming the mathematical integrity of the values. The result is soft shadows. (more info on soft shadows: https://blog.demofox.org/2017/07/01/why-are-some-shadows-soft-and-other-shadows-hard/)

When lighting a pixel later on and using the shadow map, it knows the pixel's distance from the light source, and can read the two values from the filtered (blurred) shadow map, which allow it to get the mean and variance of the objects in the shadow map (the things that are casting shadows). It then uses something called Chebyshev's inequality to get an estimate for how much in shadow the pixel is.

That is a lot of setup explanation, but if having to do 2 pass variance calculations, instead of the 1 pass that it does do, you'd have to run over the full screen of pixels and do logic for each one (subtract the average pixel value), to calculate the variance. In real time graphics, having to do an extra "full screen pass" can be pretty costly, and can easily put a game over budget, meaning the technique would have to be cut so the rest of the game could render fast enough.

This blog post is here in the hopes that the next time someone googles "how do i calculate variance" for use in a computer program, that they see this post, and implement it as a single pass. Fingers crossed! ๐Ÿ˜›

Using Low Discrepancy Sequences & Blue Noise in Loot Drop Tables for Games

I never thought I’d be much of a stats person but here we are. This post is low on formalism though, so may the gods of formalism have mercy on my soul!

This post includes the result of experiments showing the value of what is talked about, and includes some simple C++ that you can find at: https://github.com/Atrix256/LootLDS

If you’ve ever played a game that involved grinding for loot, you might have looked online and found the drop rate for a specific item, only to find that if it says it drops one out of 100 times, that it takes you 200-300 runs to get it, while your friends get the drop in the first 10 runs.

What the heck is that about?!

That, my friends, is the nature of regular old random numbers – aka white noise – the kind of random numbers you get from rolling fair dice, flipping fair coins, hashing values using good hash algorithms, or using well made random number generators.

The core issue is that white noise random numbers can take on any valid value with equal probability at any time, regardless of whatever has happened before.

If you were betting on whether a fair coin would come up heads or tails after seeing 10 heads, if you say the next will be tails (because of course it will!) you will still only be right 50% of the time. If you flip the coin an infinite number of times, you will get an even number of heads or tails, but before reaching infinity, all bets are off.

This can be a problem for game designers too. They can collect statistics about how their randomized systems are behaving, analyze the results and come to the conclusion that everything is balanced and even. While that may be true when looking at global averages, the individual player experience may vary wildly and not be balanced at all.

Tangent: This is called variance and is the source of noise in raytraced rendering.

Tangent: There’s a fun related story here about the U.S. air force realizing there is no such thing as an average pilot:
https://www.thestar.com/news/insight/2016/01/16/when-us-air-force-discovered-the-flaw-of-averages.html

In any case, is this “globally balanced but individually unbalanced” something we have to live with, or is there something we can do about it?

Luckily there is something we can do about it, and we can ensure that individual players have a more balanced, more pleasant, and more controlled experience, without sacrificing randomization.

Enter Low Discrepancy Sequences

A low discrepancy sequence is a sequence of numbers which are neither too close together nor too far apart.

If we put marks evenly spaced on a number line, the sequence of numbers at those marks would have zero discrepancy because they were evenly spaced. Low discrepancy numbers have a low discrepancy value that is greater than zero.

Examples of low discrepancy sequences that you may have heard of are: Sobol, Halton, Van Der Corput.

Some nice links of mine for learning more about low discrepancy sequences are:
https://blog.demofox.org/2017/05/29/when-random-numbers-are-too-random-low-discrepancy-sequences/
https://github.com/Atrix256/SampleZoo

Tangent: Going back to the raytraced noise example, regular sampling makes aliasing, and white noise sampling makes noise. Low discrepancy sequences sort of lay somewhere in the middle, gaining the benefits of both worlds, and actually having mystical powers of making numerical integration converge very quickly.

So what do low discrepancy sequences have to do with our problem?

If you use a low discrepancy sequence to generate 5 “random numbers” between 0 and 1, those 5 numbers will be roughly evenly spaced, which means that if you use those numbers on a loot table, the player is going to get a wider spread on the full possibilities of what the loot table has to offer.

If something has a very small percentage to drop, the player still has a low probability to get that drop, but if it’s a 1 in 100 drop, it’s more likely to happen at the 100th drop mark.

This is in constrast to white noise where the values may be clumped together and leave big holes in the 0 to 1 range, like: 0.114, 0.081, 0.093, 0.2, 0.95. There is a huge gap empty between 0.2 and 0.95, which is 75% of the possibilities!

There’s a problem with low discrepancy sequences though: They are deterministic – that is, they are predictable and not random. You get the same values from low discrepancy sequences every time.

Before admitting defeat though, there is another way to get randomization from this even though the sequences themselves are not random: You can shuffle the loot table!

Now, if you have thousands of players on an MMO server rolling numbers against a loot table, you probably just barfed in your mouth a little at my suggestion. There is a way to make a “shuffle iterator” though, so that you can get the benefits of shuffling a loot table, without actually having to keep copies of shuffled loot tables around. You’d use some unique player id (and more) as a random seed for the shuffle iterator, then could use a low discrepancy sequence to select loot. This way, each player would see different (randomized) loot drop sequences, but the loot rolls would still be low discrepancy.

Tangent: you can read more about a way to make shuffle iterators using low quality encryption in “Format Preserving Encryption” here: https://blog.demofox.org/2013/07/06/fast-lightweight-random-shuffle-functionality-fixed/

But we aren’t done yet…

Enter Randomized Low Discrepancy Sequences

The low discrepancy sequences talked about in the last section were deterministic, but what if we were able to make number sequences that had low discrepancy but were randomized?

That exists, and it’s known as “blue noise” because blue noise is random numbers which have only high frequency components (like how blue light is high frequency).

The property of both being low discrepancy, but also randomized, is great for many reasons way outside the scope of this article. For our loot drop purposes, it means that the loot will be both unpredictable, but also a player will get a balanced personalized experience, instead of only the global averages being balanced.

Tangent: Here’s a link about how to generate a blue noise sequence: https://blog.demofox.org/2017/10/20/generating-blue-noise-sample-points-with-mitchells-best-candidate-algorithm/

The other shoe dropping is that blue noise can take a long time to generate, so is computationally expensive. In graphics, it’s common to generate blue noise in advance and just use the pre-made sequence. In loot drops, that is a less viable option because it makes your sequence deterministic and then you are back to shuffling the loot table.

Not real sure the answer here, but it may involve just keeping track of the last N loot drops, and using Mitchell’s best candidate algorithm to generate the N+1th value, adding that to the loot drop RNG list and removing the oldest one. If you get creative you might find a solution that fits your needs.

Prove it

Before we show experimental results, I wanted to defined a couple terms in regards to low discrepancy sequences.

  1. Progressive Sequence – A progressive sequence is a sequence of N values, where if you use less than N of the values, they still have the desired properties. For instance, if you make 100 blue noise distributed numbers, but only use the first 10, if it’s a progressive sequence, those first 10 will also be blue. If it isn’t a progressive sequence, you have to use all 100 before they are blue. This is also a property of deterministic low discrepancy sequences. For our loot drop purposes we NEED to use progressive sequences because other wise, the loot drops won’t be balanced until the very end, which kind of defeats the point.
  2. Open Sequence – An open sequence is one that you can always add more items to. If you regularly space 4 samples from 0 to 1 you are going to get 0, 0.25, 0.5, 0.75. If you want to add a 5th number you can’t! That means that this sequence is not open. Many low discrepancy sequences are open, and using Mitchell’s best candidate to generate blue noise does make an open sequence. For loot drops, we generally do want open sequences, because we usually don’t know how many times the player is going to loot something in advance.

The numbers below are from experiments done using the code that goes with this blog post. It’s ~380 lines of C++ in a single file using only standard includes. You can find it at: https://github.com/Atrix256/LootLDS

I used the following sequences:

  • White Noise – Straight up vanilla RNG.
  • Blue Noise – Using Mitchell’s Best Candidate to generate a progressive, open, uniform blue noise distributed number sequence.
  • Golden Ratio – Starting at 0, i add the golden ratio to the previous loot drop value to get the next loot drop value. I use modulus to keep the value between 0 and 1. The golden ratio has some interesting & good properties as a low discrepancy sequence.
  • Sobol – This is the low discrepancy sequence that leads in numerical integration convergence speeds.

For each sequence type, I generated 10 random loot tables which had 2 to 6 items, each item having a random roll weighting between 1 and 20. I then rolled as many loot drops as i needed to make it so the actual loot drop percentages were within 1% of what the loot drop table said they should be.

Higher numbers mean it took more loot drops for the actual loot dropped to reach the probabilities the loot table said they should be. Lower numbers mean it took fewer loot drops to reach the desired probabilities. I did 10 runs so that you can see if things are consistently good or consistently bad. Just doing a single test isn’t enough to show if something just got lucky one time, or if it’s a good choice.

Here are the results….

  • White Noise: 50513, 7834, 1859, 516, 8824, 3650, 1380, 24461, 35, 12455
  • Blue Noise: 72, 77, 143, 9, 129, 308, 353, 236, 176, 205
  • Golden Ratio: 47, 34, 50, 76, 55, 51, 114, 77, 21, 105
  • Sobol: 216, 155, 161, 77, 13, 71, 56, 75, 127, 51

It’s important to note that the loot tables themselves are generated with white noise, which is a source of variance in the results above. 10 samples isn’t a whole lot, so in a real analysis you might want to do more (100,000 or more runs?) but hopefully you should see that white noise really is not great. I was also surprised to see that Sobol didn’t do that well compared to blue noise and golden ratio. It must just do better at higher dimensions.

One last thing I wanted to mention is that this isn’t limited to loot drop tables. You can use these concepts for randomized events, procedural content generation, and basically anything else you use random numbers for in games.

The important takeaway here is that even if things look right when looking at global averages, using white noise makes the individual experience very different from that global average. Having better control over crafting a player’s individual experience is possible though, and has the possibility of giving a game a more hand crafted feel, even though you are still using RNG.

FRIENDS DON’T LET FRIENDS USE WHITE NOISE!

IIR Audio & Data Filters – Featuring Biquads

The last post talked about finite impulse response filters (FIRs) where the output of the filter was based on the current and previous input samples. (https://blog.demofox.org/2020/01/14/fir-audio-data-filters/)

In this post we are going to talk about infinite impulse response filters (IIRs) where the output of the filter is based not only on current and previous input samples, but also previous output samples.

This seemingly minor change makes for much more powerful and interesting filtering possibilities, but as it isn’t stateless, that means it must be evaluated serially (no SIMD!), and so is more computationally expensive.

The simple standalone C++ code that goes with this post is at: https://github.com/Atrix256/DSPIIR

The interactive web demo that goes with this post is at: http://demofox.org/DSPIIR/IIR.html

IIR Difference Equation

So let’s start with an order 2 FIR difference equation:

y_n = a_0x_n + a_1x_{n-1} + a_2x_{n-2}

Now let’s say we want the difference equation to include the previous two output samples too. We can just complete the pattern, by including some previous y terms with coefficient multipliers on the left side of the equation.

y_n + b_1y_{n-1} + b_2y_{n-2} = a_0x_n + a_1x_{n-1} + a_2x_{n-2}

We can then move everything but y_n to the right side of the equation to get a function that gives us our current filtered output:

y_n = a_0x_n + a_1x_{n-1} + a_2x_{n-2} - b_1y_{n-1} - b_2y_{n-2}

You might be wondering why there is no b_0 term and the reason for that is because it would be a multiplier for y_n which is weird. Do we really need to scale the output like that? Sometimes people will include the b_0 term, and will divide both sides of the equation by b_0 to get an equation like this:

y_n = \frac{1}{b_0}(a_0x_n + a_1x_{n-1} + a_2x_{n-2} - b_1y_{n-1} - b_2y_{n-2})

Let’s just pretend that if b_0 exists, it’s value is always 1, and then we can move on without it actually being there, complicating our equations needlessly.

So, to repeat it, here is a difference equation for an order 2 IIR filter, which is built on an order 2 FIR filter.

y_n = a_0x_n + a_1x_{n-1} + a_2x_{n-2} - b_1y_{n-1} - b_2y_{n-2}

You can pull out the a_0 parameter as a gain parameter again if you want to, but the b parameters don’t get the same sort of benefit, so you can leave them in their raw form.

y_n = a_0(x_n + \alpha_1x_{n-1} + \alpha_2x_{n-2}) - b_1y_{n-1} - b_2y_{n-2}

IIR Transfer Function

To calculate the transfer function, lets start back from where we added the previous outputs on the left side of the equation:

y_n + b_1y_{n-1} + b_2y_{n-2} = a_0x_n + a_1x_{n-1} + a_2x_{n-2}

Next, let’s take the z transform:

y(z) + b_1y(z)z^{-1} + b_2y(z)z^{-2} = a_0x(z) + a_1x(z)z^{-1} + a_2x(z)z^{-2}

We then factor out y(z) and x(z) to get:

y(z)(1 + b_1z^{-1} + b_2z^{-2}) = x(z)(a_0 + a_1z^{-1} + a_2z^{-2})

Since the transfer function is just y(z) divided by x(z) we can do simple algebra to do that now!

\frac{y(z)}{x(z)}(1 + b_1z^{-1} + b_2z^{-2}) = a_0 + a_1z^{-1} + a_2z^{-2} \\ H(z) = \frac{y(z)}{x(z)} = \frac{a_0 + a_1z^{-1} + a_2z^{-2}}{1 + b_1z^{-1} + b_2z^{-2}}

You can factor out the a0 term to be gain if you want, to get a more familiar looking top of the equation:

H(z) = \frac{a_0(1 + \alpha_1z^{-1} + \alpha_2z^{-2})}{1 + b_1z^{-1} + b_2z^{-2}}

From there you can plug in frequencies and see what sort of frequency and phase response you get, just like in the last blog post for FIRs.

You might notice that the transfer function is quadratic in the numerator, and the denominator. This is in fact called a “biquadratic filter” or a “biquad” for short.

Often times higher order filters (like EQs) are made by chaining multiple biquads together. Biquads are a pretty important staple of DSP.

Pole Zero Plot

You might wonder why in the last post we called it a “Pole Zero Plot” when there were zeros but no poles.

IIRs add poles to the pole zero plot and they are where the function shoots to infinity. This happens in a fraction when you divide by zero, so a pole is a place where the denominator of the transfer function is zero.

To make it explicit:

  1. You solve the quadratic equation in the numerator to get the zeros of that quadratic equation, which are also the zeros of the filter.
  2. You solve the quadratic equation in the denominator to get the zeros of that quadratic equation, which are the also the POLES of the filter.

That makes things pretty easy for making a pole zero plot. Calculating the zeros of the filter is the same, and you use the same technique to find the poles.

In the last post, we saw that the zeros of an order 2 FIR filter were at:

z = \frac{-\alpha_1}{2} \pm \frac{\sqrt{\alpha_1^2-4\alpha_2}}{2}

That still works for IIRS too. For poles, all you need to do is replace the alphas with bs:

z = \frac{-b_1}{2} \pm \frac{\sqrt{b_1^2-4b_2}}{2}

Example Filters

Unlike FIRs which are relatively tame and can only make certain frequencies lower, IIRs can cause frequency amplitudes to get much higher and even shoot off to infinity. That makes for some cool sounds by adding distortion to certain frequency notes of an instrument but not others.

Check the links section for the “RBJ cookbook” if you want to get deeper into the filters, but here are a couple interesting filters I found.

This one boosts low frequencies and attenuates high frequencies.

This does the reverse and boosts high frequencies while attenuating low frequencies.

This makes use of both poles and zeros to make a really interesting sounding filter.

It’s also still fun to slowly move the controls back and forth while some simple instrumental loop plays. Filtering white noise is still really interesting too because white noise has all frequency content, which means filtering frequencies out or boosting them up will always affect white noise. That isn’t true of things with simpler frequency components. The extreme of this is the sine wave which only has a single frequency so is unaffected by other frequencies being boosted up or attenuated.

Crafting a Filter By Placing Zeros

Creating a biquad filter from zero and pole locations is pretty simple if you already can make an FIR filter by placing zeros. In fact, that is exactly how you place the zeros.

To place the poles, you do the exact same steps as placing the zeros, but the coefficients you get out are for b0, b1 and b2 instead of a0, a1 and a2.

Estimating Frequency Response From a Pole Zero Plot

Doing this from an FIR involved getting the distance from a point on the unit circle to every zero, multiplying all those distances together as a product and that is the amount the amplitude would be multiplied by.

Adding poles into the mix extends this.

As a second step, you get the distance from the point on the unit circle to every pole. You multiply all those pole distances together as a product and divide the previous product by this amount.

That’s all there is to it, and you should hopefully be able to see that this is why a frequency at the pole would shoot to infinity. The distance is zero so the frequency response shoots to infinity.

Oscillators

If you want to generate sinusoid waves of specific frequencies, you can use IIRs to do that by putting the poles on the unit circle, and leaving the zeros at the origin.

For a filter to be stable (not shoot off to infinity), the poles need to be inside of the unit circle, so putting them on the edge of the unit circle is playing with fire a bit, but the instability is also what makes the oscillator function.

We could talk about the algebra for calculating polynomials in the numerator and denominator of the transfer function to do this, but lets jump to the end and look at the simplified result of what to set the parameters to in the difference equation below:

y_n = a_0x_n + a_1x_{n-1} + a_2x_{n-2} - b_1y_{n-1} - b_2y_{n-2}

The parameters should be:

  • a_0 = 1
  • a_1 = 0
  • a_2 = 0
  • b_1 = 2 cos(\omega)
  • b_2 = 1

Where omega (\omega) is the amount of radians to advance the wave form per sample.

If you plug these into the difference equation, you can simplify it to this:

y_n = x_n - 2 cos(\omega)y_{n-1} - y_{n-2}

These oscillators don’t need input and just want a steady stream of zeros. Because of this, we can remove the input from the equation.

y_n = -2 cos(\omega)y_{n-1} - y_{n-2}

The last thing you need to do however is to give it some initial starting state. If you make it so y[-1] and y[-2] are zero, to calculate y[0] and y[1], the oscillator won’t work correctly.

This is because we need to initialize the state (prior output) to be as if it’s been running all along.

So, you can set y[-1] to be cos(-\omega*1) and y[-2] to be cos(-\omega*2). That will make it so the next sample will be cos(0) which means the wave will start at zero degrees and continue.

You could initialize the state to whatever phase you wanted to, by initializing y[-1] and y[-2] to the prior two values to the desired phase.

As a quick and dumb example, let’s look at a sine wave that advances 180 degrees (pi radians) per sample. That means b1 will be 2, which makes our difference equation:

y_n = -2y_{n-1} - y_{n-2}

We’ll initialize y[-2] to be cos(-2pi) or 1, and we’ll initialize y[-1] to be cos(-pi) or 0.

Following the difference equation starting at y[0] we get…

\begin{array}{|c|c|} \hline \text{y index} & \text{value} \\ \hline -2 & 1 \\ -1 & -1 \\ 0 & 1 \\ 1 & -1 \\ 2 & 1 \\ 3 & -1 \\ 4 & 1 \\ 5 & -1 \\ \hline \end{array}

180 degrees is nyquist, and we can see that it’s doing the right thing of flipping between 1 and -1. It works with less clean numbers too, and the simple c++ code that goes with this post shows that working (https://github.com/Atrix256/DSPIIR).

Unfortunately, with less clean numbers, this method will start to drift from reality over time due to floating point imprecision. One way to deal with this is to reset the oscillator after every 360 degrees of advancement.

Nick Appleton (@nickappleton) has an alternate method if you are looking for a cheap oscillator.

First you make two complex numbers:

  • y = 1 + 0i
  • a = e^(i*omega)

Where omega is still the number of degrees the wave advances per sample. Another way to calculate a is: std::polar(1.0f, radiansPerSample)

Then, for each sample you do this:

y = y * a

the resulting complex number will have the cosine value in the real portion, and the sine value in the imaginary portion.

This has better numerical stability, which you can see in the c++ code output too.

Links

Here are some good links where i got info about oscillators.

http://dspfirst.gatech.edu/chapters/08feedbac/demos/recur/index.html

Click to access ece5655_chap8.pdf

There is a famous document called the “RBJ cookbook” which gives recipes for biquads that act in specific ways. RBJ stands for the author’s name Robert Bristow-Johnson. You can find it attached to the message at the link below!

https://www.musicdsp.org/en/latest/Filters/197-rbj-audio-eq-cookbook.html

Marc B Reynolds (@marc_b_reynolds) recently put out an interesting blog post talking about how it’s better to use integers to repeatedly add things (irrational numbers in his case) instead of using floats. There are some good lessons there that apply here as well I bet, probably especially for oscillators.

http://marc-b-reynolds.github.io/distribution/2020/01/24/Rank1Pre.html

Linear Fit Search

Binary search looks in the middle of a list to make a guess about where a search value is. If that guess is wrong, it can eliminate half of the list (based on whether the search value is less than or greater than the guess location) and try again. It repeats until it’s either found the search value, or runs out of list.

This algorithm works well but it is blind to the actual values it got when making guesses, beyond just checking if they were greater or less than the search value.

I recently wondered: If we knew the min and max value stored in the list, couldn’t we make a more intelligent guess as to where the search value might be? We could fit the data with a line, figure out where our guess would be on that line, and make that be our initial guess. As we iterate, we could use our incorrect guesses as new min or max values of the line as appropriate, updating our line fit as we went, and perhaps arrive at an answer more quickly.

Another way of looking at this: If the guess a binary search made is VERY far from the search value, maybe it should go farther than the midpoint when making the next guess? Or, if it was pretty close to the search value, maybe it shouldn’t go as far as the midpoint? Close vs far measurements depend on the overall magnitude of the numbers in the list, so you’d need to know what sort of values are stored. A min and a max value of the list can give you a rough idea of that, especially if you update those min / max values as you repeatedly cut the list with guesses.

This post explores that idea. The result is something that could be more attractive than binary search, depending on what kind of trade offs are being looked for. While I haven’t heard of this technique , I wouldn’t be surprised if it’s been tried before and written about. (Know of a source? let me know!).

UPDATE: @thouis from twitter mentioned the basic idea is called “interpolation search”. This post goes beyond that basic idea but you can read more about it here if you’d like ๐Ÿ™‚ https://www.techiedelight.com/interpolation-search/. He has a paper about interpolation search that you can read here (it has some relation to discrepancy, as in low discrepancy sequences, oddly!) https://erikdemaine.org/papers/InterpolationSearch_SODA2004/

The post goes a step further to address a problem that is encountered when using this algorithm, and also talks about other ways this algorithm might be extended or generalized.

An implementation, and the code that generated all the data for this post, can be found here: https://github.com/Atrix256/LinearFitSearch

Initial Problem / Other Possible Avenues

(Feel free to skip this section if you get lost. You won’t miss anything important about the algorithm itself)

If you are wise in the ways of numbers, you might be saying to yourself that this only works if you have roughly evenly distributed numbers – basically, a flat PDF, or a flat histogram. This is because by only knowing the min and max, you are doing a linear fit of the data, and making guesses as if your data is well represented by that line. The less like a line your data actually is, the less good this ought to work.

That is true, and I thought up this idea while trying to think of how to generate 1d blue noise more quickly, which is random but roughly evenly spaced values. For that usage case it does well, but there are many types of non linear data out there that you might want to search through.

Really what you want to do is learn the distribution of the values in the list, and use that knowledge to know where the value you are searching for is likely to be.

I didn’t go that direction in these experiments, but it seems like a data scientist would have plenty of tools in their tool box to attempt something like that. Markov chain Monte Carlo type algorithms come to mind.

There’s another way to look at the problem of searching for a value in a list, and that’s to look at it as strictly a function inversion problem.

If you look at your sorted list as a lookup table, where the index is the x value, and the value stored is the y value, a search tries to tell you the x value for a specific y value that you are searching for.

In this context you only care about integer values of x, and there might be duplicate values in the list, making it not a strictly monotonic function – not having each y value be larger than the last y value – but has a more relaxed version where each y value is >= the last y value.

Thinking about the search problem as a function inversion problem, ignoring the monotocity issue, there are far too many data points to do an analytic inverse, so you would be looking at numerical inverse solutions.

I also didn’t really explore that direction, so it’s another way to go that might yield some better fruit.

Lastly, you could see searching a sorted list as a root finding problem. If you are looking for where the function minus the search value equals zero, numerical root finding functions could maybe help you here. I also did not try anything in that direction.

If anyone ends up exploring any of the alternative avenues, I’d love to hear what kind of techniques you used and what your results were!

Linear Fit Search

The algorithm works like this…

  1. Start with a sorted list, and the minimum and maximum value stored in that list.
  2. Calculate a line fitting the min and max. For an equation y=mx+b, you are calculating m and b.
  3. Using the inverse of the function, which is x=(y-b)/m, make a guess for what index (x) the search value (y) is at by plugging the search value into that equation as y and getting an x. That x is the index you are guessing the value is at.
  4. If your guess was correct, you are done so exit. Otherwise, if the guess was too high, this is your new max. If the guess was too low, this is your new min. If you’ve run out of list to search, the value isn’t there, so exit.
  5. Goto 2

This algorithm assumes the sorted list looks like a line if you were to graph it, so it does better when the sorted list actually looks like a line.

Let’s see how it does for a linear list with values in it between 0 and 2000. (Click to see full size image)

The left image shows the items in the array.

In the middle image, x axis is the number of items in the list, and y axis is how many guesses it took to search for a random value. This shows the average of 100 runs.

In the right image, it shows the minimum and maximum guesses it took for each list size, for those same 100 runs.

The linear fit did pretty well didn’t it? At minimum it took zero guesses (the search value was less or equal to min or greater or equal to max), and at maximum it took 2 guesses to find the search value, regardless of list size.

Binary search took about the usual log2(N), as expected.

Let’s try a list made up of random numbers between 0 and 2000.

That looks pretty similar to the linear case, but the line fit search doesn’t beat binary search by quite as much. The randomness of the list makes it so the guesses are more often wrong, and so it takes a few extra guesses to find the right place.

Let’s try a quadratic function: y=2000x^2:

The average for line fit search still beats binary search, but if you look at the min/max graph, the line fit min and max entirely encompasses the binary search min and max. That means there is a ton of variance about whether it will be faster or slower than binary search, even though on average it will be faster.

Let’s try a cubic function: y=2000x^3:

While the average still (barely) beats binary search, the maximum for line fit search has gotten REALLY erratic.

Let’s try a log function:

Ouch, the line fit is actually doing worse now than the binary search.

Lastly, let’s go back to the linear list, but let’s make the last entry in the table be 200,000 instead of 2000:

Ouch! Linear fit search is super awful now. What happened?!

It turns out that this uneven histogram type of list is really a worst case scenario for the line fit search.

What is happening here is that it sees the min as 0 and the max as 200,000 so it thinks the line is very steep. On it’s first guess, everything it could search for (it searches for a random value between 0 and 2000), it will think the value is at index 0. It will very likely be wrong, and elminate index 0. The next round, it will choose index 1, be very likely wrong again, and repeat by picking 2 then 3 then 4 and so on. This data layout nearly forces this search to a more computationally expensive version of linear search. Binary search doesn’t have this problem because it doesn’t care what the values are, it just cuts the list in half repeatedly until it’s done.

Wouldn’t it be nice if we could know whether it’d be better to use binary search or linear fit search for a data set?

We’d have to analyze the data set to figure that out, and if we are going to go to all that trouble, we probably should just learn the shape of the data set in general and use that knowledge to make a better guess than either binary search or linear fit.

I think going that route could be fruitful, but I didn’t try it. Instead I came up with a Hybrid Search.

Here is my more readable, less optimized code for the linear fit search.

TestResults TestList_LineFit(const std::vector<size_t>& values, size_t searchValue)
{
    // The idea of this test is that we keep a fit of a line y=mx+b
    // of the left and right side known data points, and use that
    // info to make a guess as to where the value will be.
    //
    // When a guess is wrong, it becomes the new left or right of the line
    // depending on if it was too low (left) or too high (right).
    //
    // This function returns how many steps it took to find the value
    // but doesn't include the min and max reads at the beginning because
    // those could reasonably be done in advance.

    // get the starting min and max value.
    size_t minIndex = 0;
    size_t maxIndex = values.size() - 1;
    size_t min = values[minIndex];
    size_t max = values[maxIndex];

    TestResults ret;
    ret.found = true;
    ret.guesses = 0;

    // if we've already found the value, we are done
    if (searchValue < min)
    {
        ret.index = minIndex;
        ret.found = false;
        return ret;
    }
    if (searchValue > max)
    {
        ret.index = maxIndex;
        ret.found = false;
        return ret;
    }
    if (searchValue == min)
    {
        ret.index = minIndex;
        return ret;
    }
    if (searchValue == max)
    {
        ret.index = maxIndex;
        return ret;
    }

    // fit a line to the end points
    // y = mx + b
    // m = rise / run
    // b = y - mx
    float m = (float(max) - float(min)) / float(maxIndex - minIndex);
    float b = float(min) - m * float(minIndex);

    while (1)
    {
        // make a guess based on our line fit
        ret.guesses++;
        size_t guessIndex = size_t(0.5f + (float(searchValue) - b) / m);
        guessIndex = Clamp(minIndex + 1, maxIndex - 1, guessIndex);
        size_t guess = values[guessIndex];

        // if we found it, return success
        if (guess == searchValue)
        {
            ret.index = guessIndex;
            return ret;
        }

        // if we were too low, this is our new minimum
        if (guess < searchValue)
        {
            minIndex = guessIndex;
            min = guess;
        }
        // else we were too high, this is our new maximum
        else
        {
            maxIndex = guessIndex;
            max = guess;
        }

        // if we run out of places to look, we didn't find it
        if (minIndex + 1 >= maxIndex)
        {
            ret.index = minIndex;
            ret.found = false;
            return ret;
        }

        // fit a new line
        m = (float(max) - float(min)) / float(maxIndex - minIndex);
        b = float(min) - m * float(minIndex);
    }

    return ret;
}

Hybrid Search

Since binary search and linear fit search both have situationally good properties, I decided to try a hybrid of the two where it switches between the two for each guess. The first guess is a linear fit, the next is a binary search guess, then back to linear fit, and so on.

Here’s where that puts things with the previous worst case scneario: the linear data with a single huge outlier. New graph on top, old on bottom for comparison. Apologies that the colors aren’t consistent between old and new! ๐Ÿ˜›


There’s quite a bit of variance, and the linear fit min and max contains the binary search min and max, but on average it does beat the binary search now, which is kind of neat.

Let’s analyze the line fit worst performers to best performers and see how the hybrid search compares.

Here’s the log function:


The variance has decreased compared to line fit. The average beats binary search too, where the non hybrid test didn’t.

Next is the cubic function:


With the non hybrid approach, cubic on average was barely beating binary search and had a huge amount of variance. The hybrid average is beating binary search by a larger margin and the variance has dropped a lot.

Here’s quadratic:


The line fit search beat binary search, like the hybrid search does. It even beats it by roughly the same amount. The hybrid search has a lot less variance though, which is a nice property. You’ll have more consistent timings as you search.

Here’s random:


The hybrid search does a little worse both for average, and variance, than the linear fit search did.

Last is linear:


it’s impossible to see where the hybrid max line is, but it went up to 3, from the 2 that line fit max was at, which also brings the average up just a little bit. In my opinion, that isn’t so bad that we slightly damaged the perfectly linear and random cases in favor of making it much more robust in the general case.

Here is my more readable, less optimized code for the hybrid search. The only meaningful difference is on line 48 where it chooses to do a linear fit or binary search step, and line 72 where it toggles which one it does next.

TestResults TestList_HybridSearch(const std::vector<size_t>& values, size_t searchValue)
{
    // On even iterations, this does a line fit step.
    // On odd iterations, this does a binary search step.
    // Line fit can do better than binary search, but it can also get trapped in situations that it does poorly.
    // The binary search step is there to help it break out of those situations.

    // get the starting min and max value.
    size_t minIndex = 0;
    size_t maxIndex = values.size() - 1;
    size_t min = values[minIndex];
    size_t max = values[maxIndex];

    TestResults ret;
    ret.found = true;
    ret.guesses = 0;

    // if we've already found the value, we are done
    if (searchValue < min)
    {
        ret.index = minIndex;
        ret.found = false;
        return ret;
    }
    if (searchValue > max)
    {
        ret.index = maxIndex;
        ret.found = false;
        return ret;
    }
    if (searchValue == min)
    {
        ret.index = minIndex;
        return ret;
    }
    if (searchValue == max)
    {
        ret.index = maxIndex;
        return ret;
    }

    // fit a line to the end points
    // y = mx + b
    // m = rise / run
    // b = y - mx
    float m = (float(max) - float(min)) / float(maxIndex - minIndex);
    float b = float(min) - m * float(minIndex);

    bool doBinaryStep = false;
    while (1)
    {
        // make a guess based on our line fit, or by binary search, depending on the value of doBinaryStep
        ret.guesses++;
        size_t guessIndex = doBinaryStep ? (minIndex + maxIndex) / 2 : size_t(0.5f + (float(searchValue) - b) / m);
        guessIndex = Clamp(minIndex + 1, maxIndex - 1, guessIndex);
        size_t guess = values[guessIndex];

        // if we found it, return success
        if (guess == searchValue)
        {
            ret.index = guessIndex;
            return ret;
        }

        // if we were too low, this is our new minimum
        if (guess < searchValue)
        {
            minIndex = guessIndex;
            min = guess;
        }
        // else we were too high, this is our new maximum
        else
        {
            maxIndex = guessIndex;
            max = guess;
        }

        // if we run out of places to look, we didn't find it
        if (minIndex + 1 >= maxIndex)
        {
            ret.index = minIndex;
            ret.found = false;
            return ret;
        }

        // fit a new line
        m = (float(max) - float(min)) / float(maxIndex - minIndex);
        b = float(min) - m * float(minIndex);

        // toggle what search mode we are using
        doBinaryStep = !doBinaryStep;
    }

    return ret;
}

Random Odds and Ends

Just like binary search, the linear fit and hybrid search algorithms can return you the index to insert your value into the list, if not present.

Some folks may balk at the idea of having the min and max value of the list before you do a search, from the point of view that it’s sort of like 2 guesses that aren’t being counted against the graph. If that’s your point of view, you can add 2 to the values graphed and you can see that the hybrid search is still compelling. I think it’s perfectly reasonable that you’d know the min and max of a sorted list though. After all, we store the length, why not also the min and max?

It may not be optimal to do 1 step of line fit search and 1 step of binary search in the hybrid search method. It might be that by doing something like 1 binary step then 3 line fit steps, and repeating that pattern, may give you better results. It may also be a better idea to just do line fit search, but if you aren’t making good enough progress, throw in a binary search step. I didn’t explore this at all due to the “nice enough” results i got switching off every time.

I had a thought that it might be good to try doing an “online linear squares fit” while making guesses so that you learned the shape of the list while searching it. If that sounds interesting to you, give this a read: https://blog.demofox.org/2016/12/22/incremental-least-squares-curve-fitting/. I suspect that having a more localized fit (like in this post) performs better, but I might be wrong. I could also see doing a least squares fit of the data offline in advance so you had that data available, like a min and a max, before you started the search. A problem with doing a fit in general though is that you have to be able to invert the function of whatever you fit the data with. Quadratic or cubic seem like they are probably the limit of what you’d want to try to avoid ringing and the complexity of higher order function inversion.

You can make binary searches more cache friendly by putting them into binary trees stored in arrays. This makes it so for instance, that when you test index 0, you are really testing the half way point. If the search value is less than index 0, you look at index 1, else you look at index 2. The left and right child of an index is just index*2 and index*2+1. I bring this up, because the “fixed guess points” of a binary search make this possible. A linear fit search doesn’t have fixed guess points, which makes it not possible to do the same thing. I’m betting with some creativity, some better cache friendliness could be figured out for a linear fit search.

Following in that idea, is the concept of a cache oblivious b-tree. Check it out here: https://github.com/lodborg/cache-oblivious-btree

Another nice property of binary searching is that you can make it branchless and very SIMD friendly, or very friendly for simple hardware implementations. A linear fit search doesn’t seem as well suited for that, but again, maybe some creativity could help it be so. Here’s more about binary search operating like I just described: https://blog.demofox.org/2017/06/20/simd-gpu-friendly-branchless-binary-search/

Lastly, you might have noticed that the graph for the linear data set showed that the line fit and hybrid searches were taking fewer guesses as the list got larger. It looks impossible, and lets me make this dank meme:

What the heck is going on there?

The x axis of those graphs shows how large the list is, and the y axis is how many guesses are taken, but in all those linear lists of each size, the list linearly breaks up the range [0,2000]. It’s also always searching for random numbers in [0,2000]

In smaller lists, the numbers are more sparse, while in larger lists the numbers are more dense.

If you have a linear data set, and are using a linear fit to look for a number in that list that may or may not be there, a denser list will have the values there more often, and the first guess is going to more often be the correct location of the search value.

That’s what is happening, and that’s why it’s showing an improvement in the linear case as the list gets larger, because it’s also getting more dense.

Here’s a graph for a version of the test where the density is kept the same for each list. The lists are between [0,5*count] and the search values are in the same range.

It’s interesting and kind of cool that both the average and min/max are flat, but this is a best case scenario for the line fit (and hybrid) search, with the data actually being linear.

Performance

Ok finally we get to performance. Many of you fine folks were probably looking at the guess count graphs and thinking “So what? Where’s the perf measurements?” TL;DR I think this is a pareto frontier advancement but i’ll explain more.

here are the perf results but don’t be too quick to say “aha!”, because they need some explanation and context. These results are on my modern-ish gaming laptop.

Results:

  • Linear search takes ~1.5 nanoseconds per guess. (eg, increment the index and read the next value from the array)
  • Binary search takes ~5 nanoseconds per guess.
  • Both linear fit and hybrid search takes ~12 nanoseconds per guess.

So, from my tests, binary search would need to take 2.5 times as many guesses as linear fit or hybrid searching to break even. The only case where that is true in my tests is the purely linear list.

Now that I’ve said that, I don’t think the tests I’ve done are really a good apples to apples comparison.

What I did as a test was generate lists of the various types described above, generated a list of random numbers to search for in them, then had each search algorithm do all the searches and i divided the total time by the total number of guesses done to get a time per guesses for each algorithm.

It is true that the linear fit is slightly more complicated logic than a binary search, or the linear search, so computationally I do expect it to take longer, and the 2.5x as long seems like a fair measurement.

HOWEVER, searching the same list over and over is an unrealistic pattern for most applications. More of the list would be likely to be in the cache when doing multiple searches back to back like this, so memory reading would be under-reported in the profiling.

Because the linear fit (and hybrid) searches are more computationally expensive, but end up doing fewer guesses, they use more cpu, but less memory bandwidth. That means that the wins they give would show up in times when memory reads (or wherever the list was stored) were slower. Having the list in the cache is not a time when the reads are going to be slower, so I think the testing is stacked against the linear fit and hybrid testing.

That said, I can’t think of a better “canned performance test” to compare apples to apples. You really would need to drop it in, in a realistic usage case for searching in an application, and see if it was better or worse for that specific usage case.

If you were memory bandwidth bound, and thus had some compute to spare, this search seems like it could possibly be a nice option. Or, in exotic situations where reading a list was VERY VERY slow (remote servers, homomorphic encryption, data stored on disk not in memory?) this could be a better algorithm. In those exotic situations where reads are way more expensive that computation, you’d probably want to go further though, and use more advanced algorithms to really make every guess count, using a lot more CPU to do so.

Lastly on perf: none of this code has been optimized. I wrote it for clarity, not speed. It’s possible that the comparison landscape could change (either for better or worse) with optimized code.

If anyone investigates perf more deeply, I’d love to hear results and in what context those results were found. Thanks!

Quadratic Fit Search and Beyond?

An obvious questions is: can this search technique extend to quadratic and beyond?

I do think so. Let’s look at how that might work, and then i’ll point out some complications that make it more challenging.

Let’s think about the quadratic case. You’d need to start with a quadratic fit of the data, which would require 3 data samples from the list. Two data samples would be the first and last index just like the linear search, but where should the third data point be from?

One place it could be is in the middle of the list. If you can afford more processing time than that, you might consider picking whatever index gives the lowest error between the quadratic fit and the actual data stored in the array.

Now you have a quadratic fit of the data in the array and can begin searching. You have some y=f(x) function that is quadratic, and you invert it to get a x=f(y) function. All is well so far.

You make your first guess by pluggin your search value in for y and getting an x out which is your first guess for where the number is. When you read that number, if it is the search value, you are done. If it doesn’t match though, what do you do?

Your guess point is going to be between your min and max, but it might be to the left or the right of the third point you have in the quadratic fit. That is two possibilities.

Your guess may also be too low, or too high. That is two more possibilities, making for four possible outcomes to your guess.

Let’s say your guess was to the left of the “third point” and deal with these two outcomes first:

  • If your guess was less than the search value, it means that your guess is the new minimum.
  • If your guess was greater that the search value it means that your guess is the new maximum. A problem though is that your “third point” is now to the right of the search maximum. This isn’t so bad because it still fits real data on the curve but it seems a little weird.

If your guess was on the right of the “third point”, we have these two outcomes to deal with:

  • If your guess was less than the search value, the guess is the new minimum, and the “third point” in the quadratic fit is to the left and is less than the minimum.
  • If your guess was greater than the search value, the guess is the new maximum.

Are you with me so far? the “third point” seems oddly stationary at this point, but the next round of searching fixes that.

On the second step of searching (and beyond), we have some new possibilities to add to the previous four. The “third point” can either be less than the minimum or greater than the maximum. That is two possibilities.

And once again, we have two possibilities in regards to what our guess found: The guess value could be lower than the search value, or it could be higher.

Due to symmetry, let’s just consider the “third point” to be greater than our max, and then we can just consider the less than and greater than case:

  • If our guess was too small, it’s the new minimum.
  • If our guess was too large, it’s the new maximum, but the old maximum becomes the new “third point”. This moves the “third point” to be more local, giving us a more local quadratic fit of our data, which should help the search make better guesses.

So now, the “third point” moves around, and the quadratic fit is updated to be a localized fit, like we want it to be.

For the cubic case and above, I’ll leave that to you to sort out. It just is updating the minimum and maximums based on the guess value vs search value, and then doing a dance to make sure and keep the most local points around for the curve fit of the data, and throwing out the less local points to make room. I am pretty sure it’s extendable to any degree you want, and that one algorithm could be written to satisfy arbitrary degrees.

Now onto a complication!

Our very first step is to make an initial fit of data of whatever degree and then invert it. To invert the function, it needs to be monotonically increasing – aka there is no part on the graph where if you look at the point to the left, it’s higher. Each point on the graph should be higher than the point to the left.

The bad news is that if even looking at the quadratic case, making a quadratic curve pass through 3 data points A, B, C where A <= B <= C, the result is very often NOT going to be monotonic.

That means you are going to have a bad time trying to invert that function to make a guess for where a search value should be in the list.

I think a good plan of attack would be to fit it with a monotonic quadratic function that didn't necessarily pass through the 3 data points. That would affect the quality of your guess, but it might (probably should??) do better at guessing than a line fit, at the cost of being more computationally expensive. I'm not sure how to do that specifically, but I'd be surprised if there wasn't an algorithm for it.

For details on how even quadratic often isn't monotonic:
https://twitter.com/Atrix256/status/1108031089493184512

Some possibly good leads to dealing with this:

https://math.stackexchange.com/questions/3129051/how-to-restrict-coefficients-of-polynomial-so-the-function-is-strictly-monotoni

https://en.wikipedia.org/wiki/Monotone_cubic_interpolation

Closing

Thanks for reading. Hopefully you found it enjoyable.

If you use this, or do any related experimentation, I’d love to hear about it.

You can find me on twitter at https://twitter.com/Atrix256