Interpolating Data Over Arbitrary Shapes With Laplace’s Equation and Walk on Spheres

A very cool paper called “Monte Carlo Geometry Processing: A Grid-Free Approach to PDE-Based Methods on Volumetric Domains” was accepted to SIGGRAPH 2020.

The paper is here:
https://www.cs.cmu.edu/~kmcrane/Projects/MonteCarloGeometryProcessing/index.html

You can find the authors on twitter at
@rohansawhney1
@keenanisalive.

The paper shows some really interesting things so I set out to learn more about it and arrived at the topic of this blog post.

In this blog post we are going to take common ideas of interpolation, like:

  • linearly interpolating between two points on a line
  • linearly interpolating between 3 points on a triangle using barycentric coordinates – or interpolating between N points on a simplex (including on a line, which is the previous bullet point)
  • Bilinear interpolation between 4 pixels when you zoom into a texture – or N-linear interpolation between pixels for volume textures and mipmaps

…and we are going to generalize them to interpolating data over any shape, where values are specified on the boundary of the shape (not the vertices!), and interpolated through the interior of the shape. Furthermore, the shape doesn’t even need to be contiguous, or closed.

Unfortunately, unlike the interpolation methods I mentioned above, this interpolation isn’t as friendly for real time applications.

Despite the mathy nature of the topic, I’m going to try to keep it real easy to understand, so don’t get scared off!

Introduction – Laplace’s Equation / Dirichlet problem

Let’s start with the mathy bit, in a very math light way. The below describes the mathematics of our interpolation, which is a Dirichlet problem to solve Laplace’s Equation if you wanted some terminology.
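Written out as equations (using u for our interpolated function, g for the data on the border, \Omega for the shape and \partial \Omega for its border – those symbol names are just my choice of notation), the two boxes say:

\Delta u(p) = 0 \quad \text{if } p \text{ is inside } \Omega

u(p) = g(p) \quad \text{if } p \text{ is on the border } \partial \Omega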

Let’s talk about its two parts independently:

Section #1 (The Blue Box)

  • On the left, it says “The Laplacian of the function is 0…”
  • On the right, it says “…if the point is inside the shape (not on or outside the boundary)”

The Laplacian being zero inside the shape means that if you take an average of the values around any point inside the shape, they will average to the value at the center point.

If this were a 1 dimensional function, it would mean that if you pick a point on the graph and take a step to the right to find a value 1 higher than where you started, then taking the same sized step to the left must give you a value 1 lower than where you started (making sure to stay inside the boundaries). Below is a function y=3x+2 which has a Laplacian of 0.
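To put that step-left / step-right idea into a formula: in 1D the Laplacian is just the second derivative, and setting its finite difference approximation to zero says every point is the average of its two neighbors (h is the step size):

f''(x) \approx \frac{f(x-h) - 2f(x) + f(x+h)}{h^2} = 0 \quad \Rightarrow \quad f(x) = \frac{f(x-h) + f(x+h)}{2}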

In 1D that means the graph is a straight line (linear) but in higher dimensions, you can get more interesting shapes. Below is a 2d function z=e^x \sin(y) which has a Laplacian of 0.

You can see the graph here: https://www.wolframalpha.com/input/?i=z%3De%5Ex*sin%28y%29

You can see that it has a Laplacian of 0 here: https://www.wolframalpha.com/input/?i=laplacian+e%5Ex*sin%28y%29

I got that function from this great Khan Academy video: https://www.youtube.com/watch?v=JQSC0lCPG24

So, section 1 doesn’t exactly say that our function is linear, but it does give a certain smoothness to the interior that feels a bit like a linear function.

To me, I think of how a bilinear interpolation is linear on the x and y axis, but can go quadratic as soon as you get off of those axes. Having a Laplacian of zero seems to satisfy the spirit of that “linear on an axis” property, without being limited to a specific axis.

Section #2 (The Green Box)

  • On the left it says “The function has specific values…”
  • On the right it says “…if the point is on the border”

All this means is that there are values along the border, and it doesn’t matter how those values are defined.

Putting it Together

So section 2 says there are values on the border of the shape, and section 1 says that any point inside the shape is an average of the values around it – even if that point is near the border. This defines how our interpolation works.

So, let’s take a walk. Pretend you started at a border and walked across the shape in a straight line until you hit the border at the other side of the shape.

When you started walking, you’d have the value of the border where you started. As you walked into the shape, the value would mix with values from other nearby borders (basically weighted by their distance to you).

When you hit the midway point of your walk, the border you started at and the border you ended at would have equal weighting in the current value you had, but other border values would be contributing as well.

When you got closer to the border you are ending at, the value there would start to weigh more heavily into the value, until you finally reached the destination point of your walk, and the value was just the border value there.

This essentially is our interpolation, as described by those two mathematical statements.

How To Do It – Walk On Spheres Algorithm

If you have ray marched signed distance fields, prepare to have your mind blown. There is a huge similarity here 🙂

To interpolate data in a shape, where the data values are defined by values at the boundary, you need to be able to get a distance from a point to the nearest border. You don’t need to be able to get a direction, just the distance. You need a (signed) distance field.

If you can’t get an ACTUAL distance to the border, but can get a conservative estimate (an estimate that is guaranteed to be <= the actual distance), that works fine too.

Here is a neat article from Inigo Quilez that talks about how to get that kind of an estimate using function gradients: https://www.iquilezles.org/www/articles/distance/distance.htm

So the algorithm is to start at your point and…

  1. Get the distance from the point to the border. This defines a sphere representing the largest step you can take without leaving the shape. (sphere if 3d, circle if 2d, hypersphere if 4d or higher, line segment if 1d)
  2. If the distance to the border is smaller than some epsilon, return the value at the border there.
  3. Else, take a step of that distance in a uniform random direction
  4. GOTO 1

If you do that full loop 1 time, you will get a single value. This is 1 sample.

If you do that N times and average the result, you’ll get a value that approaches the correct value. This is where the “monte carlo” part mentioned in the paper comes in. We are doing monte carlo integration, where each sample is a random walk in the shape from the starting point to whatever part of the border it exits at.

Higher values of N give more accurate results with less error.

That is all there is to it!
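To make those steps concrete, below is a minimal standalone C++ sketch of the algorithm in 2D, doing one random walk per sample and averaging N samples. The unit circle shape and the sine wave border values are just stand-ins for illustration – swap in your own distance and border value functions.

#include <cmath>
#include <cstdio>
#include <random>

struct Vec2 { float x, y; };

// Example shape: a unit circle centered at the origin.
// For a point inside, the distance to the border is 1 - length(p).
float DistanceToBorder(const Vec2& p)
{
    return 1.0f - std::sqrt(p.x * p.x + p.y * p.y);
}

// Example border data: a sine wave around the border, like the circle scene in the shadertoy.
float BorderValue(const Vec2& p)
{
    return 0.5f + 0.5f * std::sin(6.0f * std::atan2(p.y, p.x));
}

// One random walk (one sample) of the walk on spheres algorithm, in 2D.
float WalkOnSpheresSample(Vec2 p, std::mt19937& rng, float epsilon = 0.01f)
{
    std::uniform_real_distribution<float> dist01(0.0f, 1.0f);
    while (true)
    {
        // the largest step we can take without leaving the shape
        float radius = DistanceToBorder(p);
        if (radius < epsilon)
            return BorderValue(p); // close enough: return the border value here

        // take a step of that size in a uniform random direction
        float angle = dist01(rng) * 6.28318530718f;
        p.x += std::cos(angle) * radius;
        p.y += std::sin(angle) * radius;
    }
}

int main()
{
    std::mt19937 rng(1337);
    Vec2 p = { 0.25f, 0.1f };

    // average N samples (monte carlo integration) to estimate the interpolated value at p
    const int N = 10000;
    float sum = 0.0f;
    for (int i = 0; i < N; ++i)
        sum += WalkOnSpheresSample(p, rng);
    std::printf("interpolated value at (%.2f, %.2f) = %f\n", p.x, p.y, sum / float(N));
    return 0;
}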

It works because the likelihood of a random walk reaching a specific boundary point depends on how far away that point is from your starting point (and on the shape of the boundary in between). Closer boundary points are more likely to be reached, while farther boundary points are less likely to be reached.

If you are doing this process for an image, you would run this N times for each pixel individually, to get a noisy result that got less noisy as you took more samples, just like path tracing. You could even use denoising on the image before it converged.

A non-obvious detail is that the shape doesn’t need to be convex. It can be concave, which means the random walk needs to “go around corners” to reach borders. The distance used for interpolation weighting is the “around the corner” distance, not a straight line distance where the straight line exits the shape and then re-enters it.

Another interesting thing is that the shape doesn’t strictly need to be closed. You could have a U shape with values along the U, and this algorithm works just fine. It just now applies to points inside the bowl of the U as well as points outside the bowl of the U, off into infinity. (Is this still interpolation? It isn’t quite extrapolation, I don’t think… hrm!)

You could also have multiple shapes that aren’t connected scattered around with values on them and this algorithm still works just fine.

Being Monte Carlo integration, you can apply ideas like importance sampling and Russian Roulette too. Importance sampling to reach boundaries more quickly. Russian Roulette to kill paths that are wandering away from useful areas, in an unbiased way.

I haven’t found a way to apply low discrepancy sequences for faster convergence though, nor have I found a way to apply blue noise for better perceptual error. The search continues though and there are some resources regarding LDS in the links section.

Examples – My Shadertoys

I wrote a shadertoy which uses this technique to do 1d interpolation from red to yellow. You could do the same thing with a lerp easily, but it is a simple demonstration of the technique.

The shadertoy is at https://www.shadertoy.com/view/ttByDw

Here it is after 1 sample. Every pixel has done one random walk on a number line.

Here it is after 10 seconds, which is 600 samples, because of vsync locking it to 60fps.

I also made a shadertoy which works in 2D using 2D SDFs, with a handful of scenes (there is a SCENE define in the common tab to control which is being viewed) at https://www.shadertoy.com/view/wtSyWm

From that shadertoy, here is a circle which uses a sine wave around the border to make sections of white and black. This is the noisy unconverged version.

Here it is more converged.

Here is a version using rainbow colors around the border:

Here is a setup using 3 lines to make a triangle, where each line has a different color on the inside and the outside. Even though the lines are boundaries, there is an “inside and outside”, so the boundary isn’t a typical boundary.

Here is the same setup, but opening the triangle into a U shape, to show it works with shapes that aren’t closed.

Here are 3 circles and two lines. The lines are black colored on the side facing the center of the screen, and white colored on the outside.

A 3d version will be shown in the next section!

Examples – Other Shadertoys

The website for the paper on monte carlo geometry processing links to 3 really interesting shadertoys. I want to explain what they are doing real quick, because it is pretty clever stuff.

The first is by inigo quilez (@iquilezles). https://www.shadertoy.com/view/WsXBzl

This shadertoy does the same things my 2d shadertoys were doing. There are two lines that cross each other, and have a different color on each side.

The next is also by inigo. https://www.shadertoy.com/view/WdXfzl

Believe it or not, this is also doing the same thing as the last shadertoy, and my 2d shadertoy. It’s just interpolating colors between lines. Below are the lines without the interpolation/diffusion!

The last shadertoy is by Ricky Reusser (@rickyreusser). https://www.shadertoy.com/view/wdffWj

This one is actually doing the algorithm in 3d. If you are looking at the outside of the shape, those yellow/black zebra stripes are the boundary conditions. The pinkish color in the middle is the interpolated values, but they are tinted slightly red, as a variation on the plain vanilla algorithm.

Closing

So, in this blog post we talked about how to use monte carlo integration and the walk on spheres algorithm to interpolate data between very arbitrary borders in any dimension.

This is a large part of the basis of the paper on monte carlo geometry processing, but it goes a lot farther than this as well. I don’t really understand how the other things shown in the paper use WOS to do what they do. As I learn more, I may make more shadertoys and make other blog posts.

Links

Since (signed) distance functions are really important to this technique, you might be wondering where you can go to read more about them. Luckily inigo quilez has you covered, and has been writing stuff about them for years. He talks about them in the context of ray marching, but that is an extremely similar use case. Basically just imagine that instead of a random direction walk, you walked a specific direction down a ray, from the ray origin, to see what the ray hit. That is ray marching in a nutshell.

Here is information from him about 2d distance functions:
https://www.iquilezles.org/www/articles/distfunctions2d/distfunctions2d.htm

Here is information about 3d distance functions:
https://www.iquilezles.org/www/articles/distfunctions/distfunctions.htm

Here is another write up on this topic
“Walk on Spheres Method in Julia” from Philip Zucker (@SandMouth)
https://www.philipzucker.com/walk-on-spheres-method-in-julia/

Apparently harmonic coordinates are related, and seem to be a generalization of barycentric coordinates.
“Harmonic Coordinates for Character Articulation” from Pixar
https://graphics.pixar.com/library/HarmonicCoordinatesB/paper.pdf

I believe these techniques can be applied to Poisson blending, where you paste pixel deltas (instead of absolute values) into a destination image, so that what you are pasting takes on the lighting and coloring of the destination image.
https://erkaman.github.io/posts/poisson_blending.html

Poisson blending and others are talked about in this next paper. WoS can solve Laplace’s equation, but it can also solve a generalization of it called the Poisson equation, so Poisson is very related.
“Poisson Image Editing”

(PDF: 2003_siggraph_perez.pdf)

“Walk on Spheres Method” on wikipedia
https://en.wikipedia.org/wiki/Walk-on-spheres_method

Regarding trying to apply low discrepancy sequences (or blue noise!) to the random walk on spheres, here is some interesting info about quasi random (aka LDS) random walks.
https://hpi.de/friedrich/research/rotor-router/theory.html

Thanks for reading!

Weighted Round Robin (Weighted Random Integers) Using The Golden Ratio Low Discrepancy Sequence

UPDATE 6/25/2020: The bottom of the post now has a section about using 2D low discrepancy sequences with an alias table and compares results vs the 1D methods. The code has been updated as well to include that stuff.

In this post we are going to talk about Round Robin first, then Weighted Round Robin.

The ~500 lines of standalone C++ that generated the data for this blog post can be found at https://github.com/Atrix256/WeightedRR

Round Robin

Let’s say you find yourself in one of these 2 equivalent scenarios:

  1. You want to repeatedly choose from a list of 10 objects in a fair way, where the objects are chosen at roughly the same rate – like, a loot drop table for killing monsters in a game.
  2. You have 10 queues of work that you want to process work items from in a fair way, where work is consumed from the queues at roughly the same rate.

In either case you want a stream of integers between 0 and 9 where ideally any section – large or small – should have roughly even counts of those numbers chosen. In other words, it should have a flat histogram for both large and small sections.

One way to deal with this is to just walk through the numbers 0 through 9 over and over and over. You could increment a number whenever you wanted the next index and use modulus to ensure it was between 0 and 9.

I’m calling this “Sequential” in the graphs and that would work, but depending on your usage case, it might be undesirable that it was so obviously going in order from 0 to 9.

Another way to deal with this would be to roll a random integer 0 through 9 and use that as your answer. In the code, I roll a random float between 0 and 1 and remap that to an integer 0 through 9 so it’s more like the other methods. I’m calling this “White Noise” in the graphs and that also would work in that the histogram is going to be flat after an infinite number of samples, but it won’t be very flat in small numbers of samples.

Yet another way to deal with this is to use a low discrepancy sequence (deterministic or randomized aka blue noise) such as the golden ratio additive recurrence low discrepancy sequence which looks like this:

float GetNextValue(float x)
{
  float newX = x + 1.61803398875f;
  return newX - floor(newX);
}

In the above, the first value you ever plug into this function can be any number. It can be 0.0, but it doesn’t have to be.

1.61803398875 is the golden ratio and since we are dealing with values between 0 and 1, you could also use 0.61803398875 instead which is the fractional part of the golden ratio, but is also 1 divided by the golden ratio (the golden ratio conjugate).

The golden ratio is a good choice here because it’s an irrational number, meaning that from a mathematical point of view (ignoring limitations of computers for a second) it won’t ever repeat values.

The golden ratio is also an excellent choice among irrational numbers because it is the most irrational number. Less irrational numbers – like pi – still don’t repeat but they have “near repeats” which make them less good of a choice. Then there are numbers like the square root of 2, which are pretty decently irrational – more than pi, but less than the golden ratio.

An alternate way to write the code is like this, which gives you “random access” to the sequence, but is more prone to having floating point precision issues.

float GetValue(int index, float seed)
{
  float newX = seed + float(index) * 1.61803398875f;
  return newX - floor(newX);
}

If you use this in our problem for generating floating point values between 0 and 1 and remapping to integers 0 through 9, you will get fairly even counts of occurrences of those integers (a fairly flat histogram) from both small and large sections of the sequence.
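For completeness, the remap from a sequence value in [0,1) to an integer 0 through 9 is just a multiply and a truncate, something like this (a sketch – the code in the repo may differ in its details):

// remap a sequence value in [0, 1) to an integer index in [0, numItems - 1]
int ValueToIndex(float value, int numItems)
{
    int index = int(value * float(numItems));
    return (index > numItems - 1) ? (numItems - 1) : index; // guard in case value rounds up to 1.0
}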

Here is a screenshot of the program generating a sequence of 80 items, using the techniques talked about so far.

It’s hard to tell what the histogram looks like from that, but it is still a bit interesting looking at the difference in patterns of the numbers chosen. There are seeming patterns even in the golden ratio because the values are quantized to 10 possible values.

Here are histograms of 100 samples. It shows that white noise is doing noticeably worse than the irrational numbers, but the irrational numbers don’t have a winner that is easy to discern with the naked eye. Sequential can’t even be seen because it is by definition a completely flat histogram and so is a flat line at y=0.1.

Here are histograms of 10,000 samples. White noise and pi look about the same, but golden ratio and square root of 2 are much closer to the actual flat histogram value / sequential sequence.

Here are histograms of 1,000,000 samples. White noise is way worse than everything else, but it’s hard to see a winner from the rest of the group.

Here we remove white noise from the graph to see the others better. Pi shows being the worst. Square root of two is next, and then the golden ratio which is closest to the sequential / flat histogram line.

Round Robin vs Random Integers

When using white noise, round robin is just random integers. It’s the same as rolling dice. So what is the difference?

In the last section I say that using a low discrepancy sequence is better than using white noise, because it reaches the target histogram more accurately with fewer samples – fewer rolls of the dice.

The downside to this though is that the numbers are no longer random… a person viewing the sequence of numbers can almost certainly guess what the next number is going to be.

So, that makes it not useful to use these for places where randomization is important – like when gambling or for cryptography.

But, where this is useful is for places where fairness matters but randomness does not.

One example could be pulling work items from a work queue. Who cares if it is possible to guess which thread is going to get assigned the next item of work? The important thing is that the work is spread evenly.

Another example is numerical integration via monte carlo integration, and other stochastic rendering situations. This is more what I care about. In these situations, you don’t care whether the choice can be predicted or not, but if you can reduce the amount of error vs the target distribution in fewer samples, that means you’ll have less noise in your resulting image, which is ::chef kiss:: oh so great. A better image for the same processing power.

If you were in a situation that wanted randomness, but you still wanted some of these benefits, you should check out “blue noise”, which is a randomized version of low discrepancy sequences. A good place to start might be my post on generating blue noise in 2 dimensions, which you could modify to only generate 1 dimensional blue noise and use for this purpose.

https://blog.demofox.org/2017/10/20/generating-blue-noise-sample-points-with-mitchells-best-candidate-algorithm/

Weighted Round Robin

Let’s say you find yourself in the same situation as before but now there are weighted probabilities involved.

  • For the loot drop situation, different loot should be awarded with different probabilities.
  • For the work queues, you could have different powered CPUs servicing the work queues maybe. More powerful processors should be more likely to get work served to them perhaps.

For our situation, we’ll say that the weight of item 0 is 1, the weight of item 1 is 2, and so on, up to the weight of item 9 being 10.

Let’s check out our options again.

First, we could go sequentially. This means that our sequence would go like this: 011222333344444… etc.

A possible issue with this (again, depending on your specific needs) is that it is even, but it goes through every occurrence of a number before moving to the next number. If it was desired that the numbers be visited in a more interleaved fashion, you’d want to try something else.

The next thing you could try is to take the item weights and divide them by the total of the item weights, so that they add up to 1.0. This would make them normalized. From here, we could roll a random floating point number between 0 and 1, find the item that has the range that contains that floating point number, and use that as the selected item.
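That lookup is just a walk along the normalized weights until the running total passes the rolled value. Here is a sketch of the idea (the code in the repo may differ in its details):

#include <vector>

// Given normalized weights (summing to 1.0) and a value in [0, 1),
// return the index of the item whose range contains that value.
int WeightedIndex(const std::vector<float>& normalizedWeights, float value)
{
    float cumulative = 0.0f;
    for (size_t i = 0; i < normalizedWeights.size(); ++i)
    {
        cumulative += normalizedWeights[i];
        if (value < cumulative)
            return int(i);
    }
    return int(normalizedWeights.size()) - 1; // fall back to the last item on float rounding
}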

That would solve the problem I mentioned, but would introduce the problem of white noise: while large sample counts have the right histogram (distribution), small sample counts don’t.

We could also use irrational numbers again, which would do better at reaching the correct histogram than white noise, while being deterministic like sequential but still shuffled “a bit”.

Here’s the program output for 80 item sequences of the various techniques:

Here are histograms with 100 samples. The irrationals are all doing pretty well. White noise is not doing great. Strangely, sequential is also not doing real great for the last value because the sequence length doesn’t divide 100 evenly this time!

Here are histograms with 10,000 samples. It’s harder to see for sure who is winning by nature of the graph, compared to the last section, but we can see white noise is not doing great.

Here are histograms with 1,000,000 samples. It’s nearly impossible to see anything about it.

Here is the graph of subtracting sequential from the others. It shows that white noise is by far the worst.

Removing white noise from that graph, we can see that pi is worst, golden ratio is best, and square root of 2 isn’t far behind the golden ratio.

Old Closing (Before Alias Table Section Added)

If you look at the code, you might be saying to yourself that the weighted version isn’t great because you need a loop to convert the 1D floating point value in [0,1] to the weighted item chosen. If you switched to using an alias table, it’d be an O(1) constant time operation, but then the 1D value would not be enough – you’d need a 1d value and a biased coin flip. AKA just a 2d value that is [0,1] on each axis. It would be interesting to compare an alias table / 2D LDS sequence to see how it converged versus this method. I really, really, should have done that in this post and should do it soon.

Another option though would be to use a “branchless, fixed step count, binary search” which would also be a constant time lookup. Basically, if you had 16 items you were searching through, you can do it with four branchless operations. This is great for shaders, SIMD programming, hardware, and similar. You can read more about that technique in my post here: https://blog.demofox.org/2017/06/20/simd-gpu-friendly-branchless-binary-search/
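As a sketch of that idea (not the exact code from that post): with 16 sorted cumulative weights, four fixed steps find the chosen item with no data-dependent branching.

// Find the first entry in a sorted array of 16 cumulative weights that is greater than 'value'.
// Four fixed steps, no data-dependent branches (the ternaries map to conditional moves / selects).
int BranchlessSearch16(const float sortedCumulative[16], float value)
{
    int index = 0;
    index += (sortedCumulative[index + 7] <= value) ? 8 : 0;
    index += (sortedCumulative[index + 3] <= value) ? 4 : 0;
    index += (sortedCumulative[index + 1] <= value) ? 2 : 0;
    index += (sortedCumulative[index + 0] <= value) ? 1 : 0;
    return index;
}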

Alias Tables & 2D Low Discrepancy Sequences

This section shows the effect of using 2d low discrepancy sequences with an alias table to find the weighted item in constant time O(1), instead of having to loop through the weights to find the correct item.

Using an alias table is faster, but having to use 2d low discrepancy sequences instead of 1d has the possibility to change things. The concept of evenly spaced and similar ideas gets a lot more open to interpretation when you leave the first dimension.
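As a refresher, the alias table lookup itself only needs those two values: one to pick a column, and one for the biased coin flip within that column. Roughly like this, assuming prob and alias tables built ahead of time (for example with the Vose method linked at the bottom of this post):

#include <vector>

// One O(1) weighted pick from an alias table.
// u1 picks a column, u2 is the biased coin flip within that column.
int AliasTableSample(const std::vector<float>& prob, const std::vector<int>& alias, float u1, float u2)
{
    int column = int(u1 * float(prob.size()));
    if (column > int(prob.size()) - 1)
        column = int(prob.size()) - 1; // guard in case u1 rounds up to 1.0
    return (u2 < prob[column]) ? column : alias[column];
}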

So, first things first, here’s a graph of 1d white noise compared with 2d white noise used to do a lookup into an alias table, for 10,000 samples. This is to show that the alias table works just like not using one, and that we are doing apples to apples comparisons. I’ve also added a “weights” line to the graphs as the actual ground truth of the histogram, instead of relying on sequential, which was imperfect.

I’m using three 2d low discrepancy sequences. They are:

  1. R2 – a generalization of the golden ratio to 2 dimensions. (more info here: http://extremelearning.com.au/unreasonable-effectiveness-of-quasirandom-sequences/)
  2. Sobol – a very strong and well known LDS, which is often the best LDS shown in the papers I’ve seen (except when reading blue noise papers, but that’s a different usage case and quality metric hehe)
  3. Golden Ratio / Square root of 2 – I use the golden ratio sequence on the x axis and the square root of 2 sequence on the y axis.

Here are the histograms for 100 samples, comparing the 3 LDSs vs white noise and the actual weights. 1D golden ratio is also here to compare how the 2d alias method techniques compare to the 1d golden ratio sampling. It seems to be showing that white noise and Sobol are the worst outliers, and everything else is better behaved.

Here are 10,000 samples. I’m switching to error (subtracted out “weights”) so it’s easier to see small differences. Now alias table white noise is the single outlier, and everything else seems to be playing pretty nicely and all just about even.

If we remove white noise we can zoom in and see things a little better. It looks like 1D golden ratio is the best, then Sobol. It’s hard to tell, but R2 seems like it might be doing better than golden ratio / square root of 2.

Here is the same for one million samples. 1D golden ratio seems to be the winner here and the others all seem to be doing about as well as each other.

Conclusion: If you are using an alias table, you definitely should use a decent 2d low discrepancy sequence. However, it looks like you will not do as well quality wise, as using 1d golden ratio sampling without an alias table. Maybe there is a different sampling pattern to use in the alias table case that will do better though. If you know of such a thing, please share!

I wanted to show one more thing though before we go. You might notice the graphs all call R2 “additive”. If you are wondering why that is, it’s because this 2D irrational low discrepancy sequence can be calculated in two different ways, just like the 1D golden ratio low discrepancy sequence can. You can either multiply an index by the irrational number(s) and fract it to get the result, or you can start at index 1, add the irrational number(s) in, fract it, then go to index 2, repeating until you get to the index you want.

Doing it the first way with a multiply causes you to run into numerical issues (error due to floating point rounding, due to finite memory storage for floats), while the second way has better precision. The second way is “additive” and I want to show you just how big a difference this makes by showing you the error of both kinds after 1 million samples, along with the error for white noise.

The error from R2 done via multiplication is so large that it dwarfs even the error from white noise. It isn’t as noticeable at lower sample counts, because the floating point numbers are smaller, so they hit less severe rounding problems, but it still exists at lower sample counts. This isn’t unique to R2 either… this is true of any irrational additive recurrence low discrepancy sequence (including the 1D golden ratio sequence).

You probably would have better luck with doubles.

Another thing you can do is make it into a fixed point “unsigned int” representation of the irrational so that you don’t waste storage bits on features you don’t need from floats. More info on that here:
http://marc-b-reynolds.github.io/distribution/2020/01/24/Rank1Pre.html
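For the 1D golden ratio sequence, that fixed point version is just adding a constant to an unsigned 32 bit integer and letting it wrap around – 2^32 divided by the golden ratio rounds to 0x9E3779B9. A minimal sketch:

#include <cstdint>

// Additive recurrence for the 1D golden ratio sequence, in 32 bit fixed point.
// The integer wrap-around does the fract() for us, and no rounding error accumulates in the recurrence.
uint32_t NextGoldenRatioFixedPoint(uint32_t state)
{
    return state + 0x9E3779B9u; // round(2^32 / golden ratio)
}

// convert to a float in [0, 1) only when you actually need one
float FixedPointToFloat(uint32_t value)
{
    return float(value) * (1.0f / 4294967296.0f);
}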

Links

If you found this post interesting, you might like this other post, which talks about using LDS and blue noise for making loot drop tables, and how that helps the average player experience be more predictable, compared to regular old random numbers.

“Using Low Discrepancy Sequences & Blue Noise in Loot Drop Tables for Games”
https://blog.demofox.org/2020/03/01/using-low-discrepancy-sequences-blue-noise-in-loot-drop-tables-for-games/

There is a great video from numberphile that talks about why the golden ratio is the most irrational number. Check it out 🙂

When making the alias table, I used the stable Vose method from this page: https://www.keithschwarz.com/darts-dice-coins/

Casual Shadertoy Path Tracing 3: Fresnel, Rough Refraction & Absorption, Orbit Camera

Posts in this series:

  1. Basic Camera, Diffuse, Emissive
  2. Image Improvement and Glossy Reflections
  3. Fresnel, Rough Refraction & Absorption, Orbit Camera

Below is a screenshot of the shadertoy that goes with this post. Click to view full size. That shadertoy can be found at:
https://www.shadertoy.com/view/ttfyzN

On the menu today is:

  • Fresnel – This makes objects shinier at grazing angles which increases realism.
  • Rough Refraction & Absorption – This makes it so we can have transparent objects, for various definitions of the term transparent.
  • Orbit Camera – This lets you control the camera with the mouse to be able to see the scene from different angles

Fresnel

First up is the fresnel effect. I said I wasn’t going to do it in this series, but it really adds a lot to the end result, and we are going to need it (and related parameters) for glossy reflections anyways.

Fresnel makes objects shinier when you view them at grazing angles and helps objects look more realistic. In the image below, the left sphere does not have fresnel, but the right sphere does. Fresnel adds a better sense of depth and shape.

The fresnel function we are going to use is this, so put this in common or buffer A above the raytracing shading code.

float FresnelReflectAmount(float n1, float n2, vec3 normal, vec3 incident, float f0, float f90)
{
        // Schlick approximation
        float r0 = (n1-n2) / (n1+n2);
        r0 *= r0;
        float cosX = -dot(normal, incident);
        if (n1 > n2)
        {
            float n = n1/n2;
            float sinT2 = n*n*(1.0-cosX*cosX);
            // Total internal reflection
            if (sinT2 > 1.0)
                return f90;
            cosX = sqrt(1.0-sinT2);
        }
        float x = 1.0-cosX;
        float ret = r0+(1.0-r0)*x*x*x*x*x;

        // adjust reflect multiplier for object reflectivity
        return mix(f0, f90, ret);
}

n1 is the “index of refraction” or “IOR” of the material the ray started in (air, and we are going to use 1.0). n2 is the “index of refraction” of the material of the object being hit. normal is the normal of the surface where the ray hit. incident is the ray direction when it hit the object. f0 is the minimum reflection of the object (when the ray and normal are 0 degrees apart), f90 is the maximum reflection of the object (when the ray and normal are 90 degrees apart).

For the index of refraction of objects, I used a value of “1.0” in the image with the 2 orange spheres above. For f90, the maximum reflection of the objects, we are going to just use “1.0” to make objects fully reflective at the edge.

Below are spheres with a minimum reflection (reflection chance!) of 0.02, where the IOR goes from 1 (on the left) to 2 (on the right). To me, the one on the right looks like a pearl. (this is the SCENE define value 2 scene in the shadertoy)

To apply fresnel to our shader, find this code:

// calculate whether we are going to do a diffuse or specular reflection ray 
float doSpecular = (RandomFloat01(rngState) < hitInfo.material.percentSpecular) ? 1.0f : 0.0f;

Change it to this:

// apply fresnel
float specularChance = hitInfo.material.percentSpecular;
if (specularChance > 0.0f)
{
    specularChance = FresnelReflectAmount(
        1.0,
        hitInfo.material.IOR,
        rayDir, hitInfo.normal, hitInfo.material.percentSpecular, 1.0f);  
}
      
// calculate whether we are going to do a diffuse or specular reflection ray 
float doSpecular = (RandomFloat01(rngState) < specularChance) ? 1.0f : 0.0f;

Besides that, a float for "IOR" needs to be added to SMaterialInfo too. Here is the scene from the end of last chapter, using fresnel and with everything using an IOR of 1.0.

And here is the scene from last chapter without fresnel. The difference is pretty subtle so you might want to open each image in a browser tab to flip back and forth. The biggest difference is on the edges of the yellow and pink sphere. Fresnel makes the biggest difference for objects that are a little bit shiny. Objects that are very shiny or not at all shiny won't have as noticeable fresnel effects.

Better Diffuse / Specular Selection

In the last post we used a random number tested against a percent chance for specular to choose whether a reflected ray was going to be a diffuse ray or a specular ray.

When doing that, we should have also divided the throughput by the probability of the ray actually chosen (to not make one count more than the other in the final average) but we didn’t do that. This is a “mathy detail” but easy enough to put in casually, so let’s do the right thing.

Find the place where the doSpecular float is set based on the random roll. Put this underneath that:

// get the probability for choosing the ray type we chose
float rayProbability = (doSpecular == 1.0f) ? specularChance : 1.0f - specularChance;
        
// avoid numerical issues causing a divide by zero, or nearly so (more important later, when we add refraction)
rayProbability = max(rayProbability, 0.001f);     

Then put this right before doing the russian roulette:

// since we chose randomly between diffuse and specular,
// we need to account for the times we didn't do one or the other.
throughput /= rayProbability;

The biggest difference from the previous image is that the pink and yellow spheres brighten up a bit, and so do their reflections, of course!

Rough Refraction & Absorption

Time for the fun! We’ll talk about the features and show the results in this section, then show how to get them into the path tracer in the next section.

Thinking about the last post for a second… in simple rendering, and old style raytracers, reflection works by using the reflect() formula/function to find the perfectly sharp mirror reflection angle for a ray hitting a surface, and traces that ray. In the last post we showed how to inject some randomness into that reflected ray, by using a material roughness value to lerp between the perfectly reflected ray and a random ray over the normal oriented hemisphere.

For refraction we are going to do much the same thing.

For simple rendering and old style raytracers, there is a refract() formula/function that finds a perfectly sharp refraction angle for a ray hitting a surface, where that surface has a specific IOR (index of refraction – the same parameter used for fresnel). The only difference is that since refracted rays go THROUGH an object instead of reflecting off an object, the random hemisphere is going to be oriented with the negative normal of the surface.

Below is an image of refractive spheres going from IOR 1 on the left, to 1.5 on the right. Notice how the background gets distorted as the IOR increases, and also notice how the light under the spheres gets focused. Those are “caustics” and get way more interesting with other shaped meshes. (this is SCENE define value 1 in the shadertoy)

You might notice some dark streaks on the ground under the spheres in the middle. When I first saw these, I thought it was a bug, but through experimentation found out that they are actually projections of the images in the sky (the top of the skybox).

To further demonstrate this, here’s a scene where the spheres are all the same IOR but they are different distances from the floor, which focuses/defocuses the image projected through the sphere, as well as the light above the spheres. (SCENE define value 4 in the shadertoy)

Changing the skybox, it looks like this. Notice the projection on the ground at the left has that circle shape?

Looking up at what is above the spheres, you can see the circle that was projected onto the ground. This is one of the coolest things about path tracing… you get a lot of techniques “for free” that are just emergent features of the math. It’s a lot different than approaching graphics in rasterization / non path traced rendering, where every feature you want, you basically have to write explicit code to approximate.

Going back to roughness, here are some refractive spheres using an IOR of 1.1, but varying in roughness from 0 on the left, to 0.5 on the right. Notice that the spheres themselves get obviously more rough, but also their shadow / caustics change with roughness too. (SCENE define value 6 in the shadertoy)

It’s important to notice that in these, it’s just the SURFACE of the sphere that has any roughness to it. If you were able to take one of these balls and crack it in half, the inside would be completely smooth and transparent. In a future post (next, I think!) we’ll talk about how to make some roughness inside of objects, which is also how you render smoke, fog, clouds, and also things like skin, milk, and wax.

Another fun feature you can add to transparent objects is “absorption” which means that light is absorbed over distance as it travels through the object.

We’re going to use Beer’s law to get a multiplier for light that travels through the object, to make that light decrease. The formula for that is just this:

\text{Multiplier} = e^{(-\text{absorb} \cdot \text{distance})}

For that, we can use a different absorption value per color channel so that different colors absorb quicker or slower as the light goes through the object.

Here are some spheres that have progressively more absorption. A percentage multiplier goes from 0 on the left to 1 on the right and is multiplied by (1.0, 2.0, 3.0) to get the absorption value. Notice how the spheres change but so do the caustics / shadows. This is SCENE define value 3 in the shadertoy.

You can combine roughness and absorption to make some interesting things. Here are the same spheres but with some absorption (SCENE define value 0).

Here’s the same scene from a different view, which shows how rich the shading for these objects is.

Implementing Rough Refraction & Absorption

Let’s implement this stuff!

The first thing to do is to add more fields to our SMaterialInfo struct.

We should have something roughly like this right now:

struct SMaterialInfo
{
    vec3 albedo;           // the color used for diffuse lighting
    vec3 emissive;         // how much the surface glows
    float percentSpecular; // percentage chance of doing specular instead of diffuse lighting
    float roughness;       // how rough the specular reflections are
    vec3 specularColor;    // the color tint of specular reflections
    float IOR;             // index of refraction. used by fresnel and refraction.
};

Change it to this:

struct SMaterialInfo
{
    // Note: diffuse chance is 1.0f - (specularChance+refractionChance)
    vec3  albedo;              // the color used for diffuse lighting
    vec3  emissive;            // how much the surface glows
    float specularChance;      // percentage chance of doing a specular reflection
    float specularRoughness;   // how rough the specular reflections are
    vec3  specularColor;       // the color tint of specular reflections
    float IOR;                 // index of refraction. used by fresnel and refraction.
    float refractionChance;    // percent chance of doing a refractive transmission
    float refractionRoughness; // how rough the refractive transmissions are
    vec3  refractionColor;     // absorption for beer's law    
};

You might notice that there is both a specular roughness as well as a refraction roughness. That could be combined into a single “surface roughness” which was used by both if desired. You lose the ability to make reflections have different roughness than refraction, but that wouldn’t be a huge loss.

Also, there is a specular chance, and refraction chance, with an implied diffuse chance making those sum to 1.0.

Instead of doing chances that way, another thing to try could be to make the diffuse chance be the sum of the albedo components, the specular chance be the sum of the specular color components, and the refraction chance be the sum of refraction color components. You could then divide them all by the sum of all 3 chances, so that they summed up to 1.0.

This would let you get rid of those percentages, and should be pretty good choices. The only weird thing is that the meaning of refraction color is sort of reversed compared to the others. The others are how much light to let through, but refraction color is how much light to block. You might have better luck subtracting refraction color from (1,1,1) and using that to make a refraction chance.

Since the light from rays is divided by the probability of choosing that type of ray (diffuse, specular, or refractive), it doesn’t really matter how you choose the probabilities for choosing the ray type, but when the probabilities better match the contribution to the pixel’s final color (more contribution = higher percentage), you are going to get a better (more converged) image more quickly. This is straight up importance sampling, and i’ll stop there so we can keep this casual 🙂
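If you wanted to try that alternate scheme from a couple of paragraphs up, it might look something like the sketch below. This is just an illustration of the idea (using plain structs for the colors), not what the shadertoy for this post does.

struct Color { float r, g, b; };
struct Chances { float diffuse, specular, refraction; };

// Derive the ray type probabilities from the material colors themselves, then normalize them to sum to 1.
Chances ChancesFromColors(const Color& albedo, const Color& specularColor, const Color& refractionColor)
{
    float diffuse  = albedo.r + albedo.g + albedo.b;
    float specular = specularColor.r + specularColor.g + specularColor.b;

    // refraction color is absorption (how much light to block), so flip it around first
    float refraction = (1.0f - refractionColor.r) + (1.0f - refractionColor.g) + (1.0f - refractionColor.b);

    float total = diffuse + specular + refraction;
    if (total <= 0.0f)
        return Chances{ 1.0f, 0.0f, 0.0f }; // degenerate material: just go diffuse

    return Chances{ diffuse / total, specular / total, refraction / total };
}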

The next change is to address a problem we are likely to hit. Now that our material has quite a few components, it would be real easy to forget to set one when setting material properties. If you ever forget to set one, you are either going to have uninitialized values, or values coming from a previous ray hit that is no longer valid. That is going to make some real hard to track bugs, along with the possibility of undefined behavior due to uninitialized values (AFAIK). So, I made a function that initializes a material roughly to zero, but mainly just to sane values. The idea is that you call this function to init a material, and then set the specific values you care about.

SMaterialInfo GetZeroedMaterial()
{
    SMaterialInfo ret;
    ret.albedo = vec3(0.0f, 0.0f, 0.0f);
    ret.emissive = vec3(0.0f, 0.0f, 0.0f);
    ret.specularChance = 0.0f;
    ret.specularRoughness = 0.0f;
    ret.specularColor = vec3(0.0f, 0.0f, 0.0f);
    ret.IOR = 1.0f;
    ret.refractionChance = 0.0f;
    ret.refractionRoughness = 0.0f;
    ret.refractionColor = vec3(0.0f, 0.0f, 0.0f);
    return ret;
}

Every time I set material info (like, every time a ray intersection test passes), I first call this function, and then set the specific values I care about. I also call it on the SRayHitInfo struct itself when I first declare it in GetColorForRay().

Next, also add a bool “fromInside” to the SRayHitInfo. We are going to need this to know when we intersect an object if we hit it from the inside, or the outside. Refraction made it so we can go inside of objects, and we need to know when that happens now.

struct SRayHitInfo
{
    bool fromInside;
    float dist;
    vec3 normal;
    SMaterialInfo material;
};

In TestQuadTrace(), where it modifies the ray hit info, make sure and set fromInside to false. We are going to say there is no inside to a quad. You could make it so the negative side of the quad (based on its normal) was inside, and maybe build boxes etc out of quads, but I decided not to deal with that complexity.

TestSphereTrace() already has the concept of whether the ray hit the sphere from the inside or not, but it doesn’t expose it to the caller. I just have it set the ray hit info fromInside bool based on that fromInside bool that already exists.

The various uses of roughness and percentSpecular are going to need to change to specularRoughness and specularChance.

The final shading inside the bounce for loop changes quite a bit, so I’m going to just paste it below, but it is commented pretty heavily so hopefully it makes sense.

The biggest thing worth explaining I think is how absorption happens. We don’t know how much absorption is going to happen when we enter an object because we don’t know how far the ray will travel through the object. Because of this, we can’t change the throughput to account for absorption when entering an object. We need to instead wait until we hit the far side of the object and then can calculate absorption and update the throughput to account for it. Another way of looking at this is that “when we hit an object from the inside, it means we should calculate and apply absorption”. This also handles the case of where a ray might bounce around inside of an object multiple times before leaving (due to specular reflection and fresnel happening INSIDE an object) – absorption would be calculated and applied at each internal specular bounce.

for (int bounceIndex = 0; bounceIndex <= c_numBounces; ++bounceIndex)
{
    // shoot a ray out into the world
    SRayHitInfo hitInfo;
    hitInfo.material = GetZeroedMaterial();
    hitInfo.dist = c_superFar;
    hitInfo.fromInside = false;
    TestSceneTrace(rayPos, rayDir, hitInfo);
    
    // if the ray missed, we are done
    if (hitInfo.dist == c_superFar)
    {
        ret += SRGBToLinear(texture(iChannel1, rayDir).rgb) * c_skyboxBrightnessMultiplier * throughput;
        break;
    }
    
    // do absorption if we are hitting from inside the object
    if (hitInfo.fromInside)
        throughput *= exp(-hitInfo.material.refractionColor * hitInfo.dist);
    
    // get the pre-fresnel chances
    float specularChance = hitInfo.material.specularChance;
    float refractionChance = hitInfo.material.refractionChance;
    //float diffuseChance = max(0.0f, 1.0f - (refractionChance + specularChance));
    
    // take fresnel into account for specularChance and adjust other chances.
    // specular takes priority.
    // chanceMultiplier makes sure we keep diffuse / refraction ratio the same.
    float rayProbability = 1.0f;
    if (specularChance > 0.0f)
    {
        specularChance = FresnelReflectAmount(
            hitInfo.fromInside ? hitInfo.material.IOR : 1.0,
            !hitInfo.fromInside ? hitInfo.material.IOR : 1.0,
            rayDir, hitInfo.normal, hitInfo.material.specularChance, 1.0f);
        
        float chanceMultiplier = (1.0f - specularChance) / (1.0f - hitInfo.material.specularChance);
        refractionChance *= chanceMultiplier;
        //diffuseChance *= chanceMultiplier;
    }
    
    // calculate whether we are going to do a diffuse, specular, or refractive ray
    float doSpecular = 0.0f;
    float doRefraction = 0.0f;
    float raySelectRoll = RandomFloat01(rngState);
    if (specularChance > 0.0f && raySelectRoll < specularChance)
    {
        doSpecular = 1.0f;
        rayProbability = specularChance;
    }
    else if (refractionChance > 0.0f && raySelectRoll < specularChance + refractionChance)
    {
        doRefraction = 1.0f;
        rayProbability = refractionChance;
    }
    else
    {
        rayProbability = 1.0f - (specularChance + refractionChance);
    }
    
    // numerical problems can cause rayProbability to become small enough to cause a divide by zero.
    rayProbability = max(rayProbability, 0.001f);
    
    // update the ray position
    if (doRefraction == 1.0f)
    {
        rayPos = (rayPos + rayDir * hitInfo.dist) - hitInfo.normal * c_rayPosNormalNudge;
    }
    else
    {
        rayPos = (rayPos + rayDir * hitInfo.dist) + hitInfo.normal * c_rayPosNormalNudge;
    }
     
    // Calculate a new ray direction.
    // Diffuse uses a normal oriented cosine weighted hemisphere sample.
    // Perfectly smooth specular uses the reflection ray.
    // Rough (glossy) specular lerps from the smooth specular to the rough diffuse by the material roughness squared
    // Squaring the roughness is just a convention to make roughness feel more linear perceptually.
    vec3 diffuseRayDir = normalize(hitInfo.normal + RandomUnitVector(rngState));
    
    vec3 specularRayDir = reflect(rayDir, hitInfo.normal);
    specularRayDir = normalize(mix(specularRayDir, diffuseRayDir, hitInfo.material.specularRoughness*hitInfo.material.specularRoughness));

    vec3 refractionRayDir = refract(rayDir, hitInfo.normal, hitInfo.fromInside ? hitInfo.material.IOR : 1.0f / hitInfo.material.IOR);
    refractionRayDir = normalize(mix(refractionRayDir, normalize(-hitInfo.normal + RandomUnitVector(rngState)), hitInfo.material.refractionRoughness*hitInfo.material.refractionRoughness));
            
    rayDir = mix(diffuseRayDir, specularRayDir, doSpecular);
    rayDir = mix(rayDir, refractionRayDir, doRefraction);
    
    // add in emissive lighting
    ret += hitInfo.material.emissive * throughput;
    
    // update the colorMultiplier. refraction doesn't alter the color until we hit the next thing, so we can do light absorption over distance.
    if (doRefraction == 0.0f)
        throughput *= mix(hitInfo.material.albedo, hitInfo.material.specularColor, doSpecular);
    
    // since we chose randomly between diffuse, specular, refract,
    // we need to account for the times we didn't do one or the other.
    throughput /= rayProbability;
    
    // Russian Roulette
    // As the throughput gets smaller, the ray is more likely to get terminated early.
    // Survivors have their value boosted to make up for fewer samples being in the average.
    {
        float p = max(throughput.r, max(throughput.g, throughput.b));
        if (RandomFloat01(rngState) > p)
            break;

        // Add the energy we 'lose' by randomly terminating paths
        throughput *= 1.0f / p;            
    }
}

Orbit Camera

Now that we can do all these cool renders, it kind of sucks that the camera is stuck in one place.

It really is a lot of fun being able to move the camera around and look at things from different angles. It’s also really helpful in debugging – like when we were able to see the circle on the ceiling of the skybox!

Luckily an orbit camera is pretty easy to add!

First drop this function in the buffer A tab, right above the mainImage function. You can leave the constants there or put them into the common tab, whichever you’d rather do.

// mouse camera control parameters
const float c_minCameraAngle = 0.01f;
const float c_maxCameraAngle = (c_pi - 0.01f);
const vec3 c_cameraAt = vec3(0.0f, 0.0f, 20.0f);
const float c_cameraDistance = 20.0f;

void GetCameraVectors(out vec3 cameraPos, out vec3 cameraFwd, out vec3 cameraUp, out vec3 cameraRight)
{
    // if the mouse is at (0,0) it hasn't been moved yet, so use a default camera setup
    vec2 mouse = iMouse.xy;
    if (dot(mouse, vec2(1.0f, 1.0f)) == 0.0f)
    {
        cameraPos = vec3(0.0f, 0.0f, -c_cameraDistance);
        cameraFwd = vec3(0.0f, 0.0f, 1.0f);
        cameraUp = vec3(0.0f, 1.0f, 0.0f);
        cameraRight = vec3(1.0f, 0.0f, 0.0f);
        return;
    }
    
    // otherwise use the mouse position to calculate camera position and orientation
    
    float angleX = -mouse.x * 16.0f / float(iResolution.x);
    float angleY = mix(c_minCameraAngle, c_maxCameraAngle, mouse.y / float(iResolution.y));
    
    cameraPos.x = sin(angleX) * sin(angleY) * c_cameraDistance;
    cameraPos.y = -cos(angleY) * c_cameraDistance;
    cameraPos.z = cos(angleX) * sin(angleY) * c_cameraDistance;
    
    cameraPos += c_cameraAt;
    
    cameraFwd = normalize(c_cameraAt - cameraPos);
    cameraRight = normalize(cross(vec3(0.0f, 1.0f, 0.0f), cameraFwd));
    cameraUp = normalize(cross(cameraFwd, cameraRight));   
}

The mainImage function also changes quite a bit between the seeding of rng and the raytracing – basically the stuff that controls jitter for TAA and calculates the camera vectors. The whole function is below, hopefully well enough commented to understand.

void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
    // initialize a random number state based on frag coord and frame
    uint rngState = uint(uint(fragCoord.x) * uint(1973) + uint(fragCoord.y) * uint(9277) + uint(iFrame) * uint(26699)) | uint(1);    
    
    // calculate subpixel camera jitter for anti aliasing
    vec2 jitter = vec2(RandomFloat01(rngState), RandomFloat01(rngState)) - 0.5f;

    // get the camera vectors
    vec3 cameraPos, cameraFwd, cameraUp, cameraRight;
    GetCameraVectors(cameraPos, cameraFwd, cameraUp, cameraRight);    
    vec3 rayDir;
    {   
        // calculate a screen position from -1 to +1 on each axis
        vec2 uvJittered = (fragCoord+jitter)/iResolution.xy;
        vec2 screen = uvJittered * 2.0f - 1.0f;
        
        // adjust for aspect ratio
        float aspectRatio = iResolution.x / iResolution.y;
        screen.y /= aspectRatio;
                
        // make a ray direction based on camera orientation and field of view angle
        float cameraDistance = tan(c_FOVDegrees * 0.5f * c_pi / 180.0f);       
        rayDir = vec3(screen, cameraDistance);
        rayDir = normalize(mat3(cameraRight, cameraUp, cameraFwd) * rayDir);
    }
    
    // raytrace for this pixel
    vec3 color = vec3(0.0f, 0.0f, 0.0f);
    for (int index = 0; index < c_numRendersPerFrame; ++index)
        color += GetColorForRay(cameraPos, rayDir, rngState) / float(c_numRendersPerFrame);

    // see if the space bar is pressed, so the render can be restarted
    // (key code 32 in the shadertoy keyboard texture - use whichever iChannel your keyboard is bound to)
    bool spacePressed = (texture(iChannel2, vec2(32.5 / 256.0, 0.25)).x > 0.1);
    
    // average the frames together
    vec4 lastFrameColor = texture(iChannel0, fragCoord / iResolution.xy);
    float blend = (iFrame < 2 || iMouse.z > 0.0 || lastFrameColor.a == 0.0f || spacePressed) ? 1.0f : 1.0f / (1.0f + (1.0f / lastFrameColor.a));
    color = mix(lastFrameColor.rgb, color, blend);

    // show the result
    fragColor = vec4(color, blend);
}

After you’ve done that, you can drag the left mouse in the image to move the camera around 🙂

If you follow these steps with the scene from last post, and change the central sphere to be a bit refractive, you can finally view the box from a different angle!

Closing

Before we go, I want to show some refractive spheres with IOR 1 and refraction roughness 0, fading from a purely refractive surface on the left to a purely diffuse surface on the right. This is SCENE define value 5 in the shadertoy.

Compare this one to the purely absorption scene shown before, and also the purely roughness scene.

All 3 of these obscured what was behind them more as the effect was turned up. All of them could also have the effect applied partially, to any level.

So, all 3 of these examples were semi transparent.

Which of these is it we are talking about when we are doing traditional alpha blending in a rasterized scene?

Well it turns out to be a bit ambiguous.

If the shadow of a semi transparent object is colored, that means absorption is the best model for it. If the shadow of a semi transparent object is just darkened, the partially diffuse surface may be the best model (could also be absorption though). If the object doesn’t cast a shadow (or barely does), it is an object made semi transparent by roughness on the surface of the object, or INSIDE the object (we’ll talk about that case later). That is especially true if the appearance (lighting) of the object changes significantly when the lighting behind it (or around it) changes, vs an object which is only lit from the front.

Kind of a strange thing thinking about alpha as all these different effects, but good to be aware of it too if doing realistic rendering.

Who knew transparency would be so opaque?

Oh PS – a color tip I got from an artist once was to never use pure primary colors (I think it was Ron Harvey @ monolith. If you are reading this Ron, hello!). This is nice from an artistic standpoint, but also, if you think about how lighting is multiplied by these various colors, whenever you make any color channel 0, it means none of that color channel will make it through that multiplication. Completely killing light of any kind feels wrong and does make things look subtly wrong (or not subtly!), because nothing in the real world is so perfect or pure that it doesn’t reflect SOME light of a different frequency than its main frequency or frequencies. If something were so impossibly pure, dust would fall on it and change that. It’s probably a good idea to stay away from 1 for similar reasons – nothing reflects perfectly in the real world, there is always some imperfection – and again, if something did, dust would fall on it and change it.

I think volumetrics and depth of field are next 🙂

Thanks for reading, and i’d love to see anything you make. You can find me at @Atrix256

Casual Shadertoy Path Tracing 2: Image Improvement and Glossy Reflections

Posts in this series:

  1. Basic Camera, Diffuse, Emissive
  2. Image Improvement and Glossy Reflections
  3. Fresnel, Rough Refraction & Absorption, Orbit Camera

Below is a screenshot of the shadertoy that goes with this post. Click to view full size. That shadertoy can be found at:
https://www.shadertoy.com/view/WsBBR3

In this blog post we are going to improve the quality of our rendered image from last time, and also add specular / glossy reflections, to end up with the image above. Compare that to the image below, which is where we are starting from, after part 1.

On the menu today is…

  • Anti Aliasing – This makes pixelated edges be smooth, which makes for a much better render
  • sRGB – The space we do lighting calculations in is not the same as the one the display wants. This fixes that.
  • Tone Mapping – This converts unbounded lighting to a displayable range.
  • Exposure – This lets us brighten or darken our image before tone mapping, just like a camera’s exposure setting does.
  • Glossy Reflections – We make some objects shiny

Anti Aliasing

Adding anti aliasing to a path tracer is super easy. All we need to do is add a random number between -0.5 and +0.5 to our pixel’s location, on the x and y axis.

Since the pixel color is the average over lots of frames, what this does is give us the “average color of the pixel”. That means if the edge of a shape goes through a pixel, the pixel will average the color of the shape, and also the color of whatever is in the background. That gives us smooth anti aliased edges.

To do this, find this code:

// calculate coordinates of the ray target on the imaginary pixel plane.
// -1 to +1 on x,y axis. 1 unit away on the z axis
vec3 rayTarget = vec3((fragCoord/iResolution.xy) * 2.0f - 1.0f, cameraDistance);

and replace it with this:

// calculate subpixel camera jitter for anti aliasing
vec2 jitter = vec2(RandomFloat01(rngState), RandomFloat01(rngState)) - 0.5f;
    
// calculate coordinates of the ray target on the imaginary pixel plane.
// -1 to +1 on x,y axis. 1 unit away on the z axis
vec3 rayTarget = vec3(((fragCoord+jitter)/iResolution.xy) * 2.0f - 1.0f, cameraDistance);

Doing that gives us this image below. You might want to click to view it full size and compare it vs last image, to see how the edges of the spheres are smooth now. Viewing the smaller image hides the pixelation.

sRGB

The colors that we send to a display are not the same as the colors that we see. This is because human vision is not linear.

With only 256 shades of color in 8 bit per color channel images, if we split the shades evenly, we’d find that there weren’t enough dark shades, because our eyes can see more detail in darker shades.

To make up for this, sRGB was made, which makes it so the values between 0 and 255 are not linearly spaced – there are more values in darker colors and fewer in the lighter colors.

If we do our lighting calculations in sRGB though, the math doesn’t work right because there was a non linear transform applied to our color channels.

To fix this, whenever we read a color from a texture, we should convert from sRGB to linear. Also, whenever we write out a pixel, we want to convert from linear to sRGB so it shows up correctly on the display.

We need to add these functions to our shadertoy. Make a new “common” tab and put it in there. The common tab is a header included by all other tabs, so things we put there are usable by all other tabs. In the shader that goes with this post, i put many utility functions, constants, and user controllable parameters in the common tab.

vec3 LessThan(vec3 f, float value)
{
    return vec3(
        (f.x < value) ? 1.0f : 0.0f,
        (f.y < value) ? 1.0f : 0.0f,
        (f.z < value) ? 1.0f : 0.0f);
}

vec3 LinearToSRGB(vec3 rgb)
{
    rgb = clamp(rgb, 0.0f, 1.0f);
    
    return mix(
        pow(rgb, vec3(1.0f / 2.4f)) * 1.055f - 0.055f,
        rgb * 12.92f,
        LessThan(rgb, 0.0031308f)
    );
}

vec3 SRGBToLinear(vec3 rgb)
{
    rgb = clamp(rgb, 0.0f, 1.0f);
    
    return mix(
        pow(((rgb + 0.055f) / 1.055f), vec3(2.4f)),
        rgb / 12.92f,
        LessThan(rgb, 0.04045f)
	);
}

ERRATA SRGBToLinear() just returned the input, in the original version of this post. That was incorrect, and correcting that changes the results you get from here on out. The only thing using the function was the skybox lookup, but that means you will get different results than are shown in the screenshots here. The problem isn’t on your end, it’s mine. Sorry about that! In fact, instead of skybox lighting being too bright as i talk about later on and having to multiply it by 0.5, it ends up being too dim and in the final shader you’ll see i multiply it by 2. Apologies and thank you for reading 🙂

There are two places we need to use this functionality. The first place is where we read the cube map when rays miss the scene geometry.

Find this code:

// if the ray missed, we are done
if (hitInfo.dist == c_superFar)
{
    ret += texture(iChannel1, rayDir).rgb * throughput;
    break;
}

Change it to this:

// if the ray missed, we are done
if (hitInfo.dist == c_superFar)
{
    ret += SRGBToLinear(texture(iChannel1, rayDir).rgb) * throughput;
    break;
}

The other place we want to change is where we write the final pixel color out.

In the “Image” tab, find this code:

vec3 color = texture(iChannel0, fragCoord / iResolution.xy).rgb;

and replace it with this:

vec3 color = texture(iChannel0, fragCoord / iResolution.xy).rgb;

// convert from linear to sRGB for display
color = LinearToSRGB(color);

Doing that, we get this image:

You might notice the balls changed color significantly. That’s because they were sRGB colors before but now they are linear colors. We can adjust their colors to make them look better though. If we change the 0.75s of their albedo to 0.5s, that gives us an image like this:

Tone Mapping & Exposure

The next thing to tackle is the fact that when we are calculating lighting, the range of light goes from 0 (perfect black) up to an unbounded amount of brightness, but our display can only display values between 0 and 1.

Tone mapping is the process of remapping the values to the 0 to 1 range. Tone mapping preserves detail in the darks while also making pixels that are too bright still have recognizable detail. We are going to use a luminance only fit of the ACES tone mapper. This tone mapping code was made by Krzysztof Narkowicz and you can read more about it on his website: https://knarkowicz.wordpress.com/2016/01/06/aces-filmic-tone-mapping-curve/

Here is the function to put in the common tab:

// ACES tone mapping curve fit to go from HDR to LDR
//https://knarkowicz.wordpress.com/2016/01/06/aces-filmic-tone-mapping-curve/
vec3 ACESFilm(vec3 x)
{
    float a = 2.51f;
    float b = 0.03f;
    float c = 2.43f;
    float d = 0.59f;
    float e = 0.14f;
    return clamp((x*(a*x + b)) / (x*(c*x + d) + e), 0.0f, 1.0f);
}

That function takes linear color in and gives linear color out, so we need to call it after we get the raw color, but before we convert that color to sRGB, in the image tab.

    vec3 color = texture(iChannel0, fragCoord / iResolution.xy).rgb;

    // convert unbounded HDR color range to SDR color range
    color = ACESFilm(color);

    // convert from linear to sRGB for display
    color = LinearToSRGB(color);
    
    fragColor = vec4(color, 1.0f);

Doing that we get this image.

The lighting on the back wall has mellowed out a bit and looks a lot more natural which is nice. It still seems a bit bright though.

To fix this we are going to do 2 things. Firstly, we are going to multiply the value we read from the skybox by 0.5 to darken the sky box a little. (i won’t paste code for that, just multiply the texture value by 0.5 after converting it to linear)

Secondly, we are going to apply EXPOSURE to the scene, by multiplying the color by a value of 0.5 before we put it through the tone mapper. That makes it so the entirety of the code in the “Image” tab is this:

void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
    vec3 color = texture(iChannel0, fragCoord / iResolution.xy).rgb;

    // apply exposure (how long the shutter is open)
    color *= c_exposure;

    // convert unbounded HDR color range to SDR color range
    color = ACESFilm(color);

    // convert from linear to sRGB for display
    color = LinearToSRGB(color);
    
    fragColor = vec4(color, 1.0f);
}

With those two changes, our render now looks like this:

Glossy Reflections

Now for the part that everyone is waiting for: glossy reflections to make shiny things!

If you’ve done simple lighting in shaders before, you’ve probably used reflect() to calculate a light reflection ray to do some specular highlights (shiny spots) on objects.

We are going to put albedo and emissive into an SMaterialInfo struct inside of SRayHitInfo, and we are also going to add these fields (a sketch of the resulting structs is shown after the list):

  • percentSpecular – A float between 0 and 1. What percentage of the light that hits this object is going to be reflected specularly instead of diffusely. This is the percentage chance that a ray hitting this surface is going to choose a specular reflection instead of a diffuse one
  • roughness – A float between 0 and 1. How rough the surface is, which controls how blurry the reflection is. A value of 0 is a very sharp clean mirror like reflection, and a value of 1 is so blurry it looks just like diffuse.
  • specularColor – A vec3 which is the color of specular reflections, just like how albedo is the color of diffuse reflections. This lets you have different colors for diffuse and specular reflections, letting you have colored metal and similar.
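
Here is a sketch of what those structs could look like after this change. The field names come from the bullets above and from how the material is used in the code later in this section, but the exact layout in the final shadertoy may differ a bit.

struct SMaterialInfo
{
    vec3 albedo;           // the color used for diffuse lighting
    vec3 emissive;         // how much the surface glows
    float percentSpecular; // percentage chance of doing a specular reflection instead of a diffuse one
    float roughness;       // how rough the specular reflections are. 0 is mirror like, 1 looks like diffuse
    vec3 specularColor;    // the color tint of specular reflections
};

struct SRayHitInfo
{
    float dist;
    vec3 normal;
    SMaterialInfo material;
};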

Here is how our shading is going to work:

  • We are going to roll a random number from 0 to 1. If it’s less than percentSpecular, we are doing a specular ray, else we are doing a diffuse ray.
  • Diffuse rays use cosine weighted random hemisphere samples like before.
  • A fully rough specular ray is going to use that diffuse ray direction.
  • A zero roughness specular ray is going to use a ray direction calculated from reflect().
  • A specular ray with roughness between 0 and 1 is going to square the roughness (personal preference, you don’t have to square it), and use that value to linearly interpolate between the specular and diffuse ray directions, normalizing the result.
  • Instead of always multiplying throughput by albedo, if it’s a specular ray, we are going to instead multiply it by specularColor

That is all there is to it! Find this code:

// update the ray position
rayPos = (rayPos + rayDir * hitInfo.dist) + hitInfo.normal * c_rayPosNormalNudge;

// calculate new ray direction, in a cosine weighted hemisphere oriented at normal
rayDir = normalize(hitInfo.normal + RandomUnitVector(rngState));

// add in emissive lighting
ret += hitInfo.emissive * throughput;

// update the colorMultiplier
throughput *= hitInfo.albedo;

And replace it with this:

// update the ray position
rayPos = (rayPos + rayDir * hitInfo.dist) + hitInfo.normal * c_rayPosNormalNudge;

// calculate whether we are going to do a diffuse or specular reflection ray 
float doSpecular = (RandomFloat01(rngState) < hitInfo.material.percentSpecular) ? 1.0f : 0.0f;

// Calculate a new ray direction.
// Diffuse uses a normal oriented cosine weighted hemisphere sample.
// Perfectly smooth specular uses the reflection ray.
// Rough (glossy) specular lerps from the smooth specular to the rough diffuse by the material roughness squared
// Squaring the roughness is just a convention to make roughness feel more linear perceptually.
vec3 diffuseRayDir = normalize(hitInfo.normal + RandomUnitVector(rngState));
vec3 specularRayDir = reflect(rayDir, hitInfo.normal);
specularRayDir = normalize(mix(specularRayDir, diffuseRayDir, hitInfo.material.roughness * hitInfo.material.roughness));
rayDir = mix(diffuseRayDir, specularRayDir, doSpecular);

// add in emissive lighting
ret += hitInfo.material.emissive * throughput;

// update the colorMultiplier
throughput *= mix(hitInfo.material.albedo, hitInfo.material.specularColor, doSpecular);

If we update our scene geometry and materials a bit, we now can get the final image:

The green balls in the back have a percentSpecular of 1.0, so they are all specular. Their roughness from left to right is: 0, 0.25, 0.5, 0.75, 1.0. Their specular color is (0.3, 1.0, 0.3) which makes them have a green tint. In PBR speak, this would be a green metal (we aren't using PBR specular calculations though!).

The yellow ball in the front left has a percent specular of 0.1, a roughness of 0.2 and a specular color of (0.9f, 0.9f, 0.9f). In PBR speak, this is a dielectric.

The pinkish ball in the front center has the same except a percent specular of 0.3, so is shinier. (Still a dielectric)

The blue and magenta ball in the front right is there to show that if you pick bad material values, you can get something that doesn't look very good. That ball has an albedo of pure blue, a percent specular of 0.5, a roughness of 0.5 and a specular color of pure red. That isn't a very realistic thing you could see in the real world – a blue object with red reflections – so it doesn't look very good in the render. Garbage in, garbage out. In PBR speak, you could get this by having a dual layer material… a blue very rough or diffuse dielectric, with a clear coat that acts like a metal? It doesn't make much sense in PBR which further shows that it isn't a very well motivated material, which is why it doesn't look very good.

If you feel like taking things a little further, our render could look better with fresnel, making glancing ray angles be more reflective. I'm not planning on doing that in this series but I do have a blog post on it here: https://blog.demofox.org/2017/01/09/raytracing-reflection-refraction-fresnel-total-internal-reflection-and-beers-law/

Bonus 1: Russian Roulette

Each ray will bounce up to 8 times, but there are times when it really doesn’t need 8 bounces to give the right look.

If we were somehow able to detect this, we could end a ray early, especially if we knew how to handle the probabilities correctly to not add bias from ending early.

Russian roulette is a method for doing just that.

We are going to find the maximum color channel value in our throughput, R, G or B, and that value will be the percentage chance that we keep going with our ray after each bounce. When we do keep going, we are going to divide our throughput by that chance of keeping the ray, which makes up for the rays that get killed by making the surviving bounces brighter.

What this means is that as the throughput gets lower, meaning future ray bounces are likely to impact the final image less, we become more likely to stop the ray bouncing around early.

This is mainly just a performance optimization, but is good because it lets you converge faster due to being able to get more samples more quickly.

Just put this code right under where we update the throughput by multiplying it by diffuse or specular.

// Russian Roulette
// As the throughput gets smaller, the ray is more likely to get terminated early.
// Survivors have their value boosted to make up for fewer samples being in the average.
{
    float p = max(throughput.r, max(throughput.g, throughput.b));
    if (RandomFloat01(rngState) > p)
        break;

    // Add the energy we 'lose' by randomly terminating paths
    throughput *= 1.0f / p;
}

To give an idea of how this affects performance, on my machine at (i forget) resolution and (i forget) renders per frame, i was getting 30 fps which is 33.3 milliseconds per frame. After I put in this Russian roulette code, it went up to 40 fps which is 25 milliseconds per frame. Same exact visual results, but about 8ms less render time per frame, which is a 25% reduction in frame time.

Bonus 2: Space Bar to Reset

It can be hard to get a full screen screenshot in shadertoy, because you have to click the button to make it full screen (see image below), and by then it already has some of the smaller render accumulated on the screen.

We can actually make it so when you press the space bar, that it resets the render.

The way you do this: instead of using iFrame to figure out how much to blend this frame into the next, you store the blend amount (1/framesRendered) in the alpha channel of each pixel. You can then assign the keyboard as a texture to iChannel2 (it’s under the Misc tab) and read that “keyboard texture” to see if the space key was pressed.
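
To see where the blend math in the code below comes from: if the last frame stored a blend of 1/N in its alpha channel, then this frame is frame N+1 and the new blend should be 1/(N+1). Since N = 1/alpha, that is 1/(1 + 1/alpha), which is the expression you’ll see in the replacement code.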

Find this code:

// average the frames together
vec3 lastFrameColor = texture(iChannel0, fragCoord / iResolution.xy).rgb;
color = mix(lastFrameColor, color, 1.0f / float(iFrame+1));

// show the result
fragColor = vec4(color, 1.0f);

And replace it with this (after assigning the keyboard as a texture in the iChannel2 slot!):

// see if space was pressed. if so we want to restart our render.
// This is useful for when we go fullscreen for a bigger image.
bool spacePressed = (texture(iChannel2, vec2(32.5/256.0,0.25)).x > 0.1);

// average the frames together
vec4 lastFrameColor = texture(iChannel0, fragCoord / iResolution.xy);
float blend = (lastFrameColor.a == 0.0f || spacePressed) ? 1.0f : 1.0f / (1.0f + (1.0f / lastFrameColor.a));
color = mix(lastFrameColor.rgb, color, blend);

// show the result
fragColor = vec4(color, blend);

Now, you can flip to full screen and press space bar to reset the render to get a better larger render.

Closing

Don’t forget… if you are trying to make a render and you are seeing 60fps, that means you probably could be converging faster and you are waiting for no good reason. Try turning c_numRendersPerFrame up until you aren’t at 60fps anymore. Anything lower than 60 means it’s already working near capacity, so pushing it down to 10 or 5 fps or lower isn’t going to help you converge any faster. There’s no benefit to bringing your shadertoy tab to a crawl.

Be on the lookout for the next post in this series. We have some more topics to casually implement, including: depth of field and bokeh, camera movement, participating media (fog), bloom, refraction & transparencies, and more.

Thanks for reading and share any cool things you make with me on twitter at @Atrix256

A Link Between Russian Roulette and Rejection Sampling / Importance Sampling

This post explains how Russian Roulette can be viewed as rejection sampling a PDF to do importance sampling.

The explanation sticks to simple 1D functions and has some ~500 lines of c++ to accompany it that made the data for the explanations.

The code is at: https://github.com/Atrix256/RussianRoulette1D

Background Topics

Here are some quick definitions and links to more info.

Monte Carlo Integration – Using random samples of a function to find the area under the curve if you graphed it. Aka using random numbers to integrate a function numerically.

Importance Sampling – Using random numbers in Monte Carlo integration where the probabilities of numbers drawn aren’t the same for each number. As the graph of the probabilities looks more similar to the graph of the function being sampled, Monte Carlo integration gets closer to the right answer a lot more quickly. At the extreme, it only takes one sample to get the exact answer!

PDF – Probability density function. This is a function that describes how probable different numbers are for being generated. It integrates to 1, but might have values that go above one. The area under the curve between 2 values is the percent chance probability of a random number falling between those two values.

Uniform Distribution – Random numbers (A PDF) where all possible values are equally likely. If you made a histogram of the numbers generated from this distribution, it would be flat, if you generated an infinite number of values.

Rejection Sampling – If you know the shape of a PDF you would like to draw random numbers from, but you aren’t sure how to actually generate random numbers following that PDF, rejection sampling is one way to do that.

Russian Roulette – If you have something like an infinitely recursive (or iterative) integral, like you find with the rendering equation, Russian Roulette is a way to stop the infinite process early while still getting the correct answer. It stops early without adding bias. More generally it lets you skip doing some of the work you have to do, without adding bias.

Here are the links:

Monte Carlo Integration Explanation in 1D

Generating Random Numbers From a Specific Distribution With Rejection Sampling

Russian Roulette and Splitting (PBRT book chapter 13.7)

You’ll also see the code for this post doing a “lerp” to average sample points. If you want more info about that, check this out:

Incremental Averaging
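
For reference, the “lerp” there is the same incremental averaging trick the shadertoy code in this series uses to blend frames together. In GLSL it looks like this, where N counts how many samples are in the average, including the new one:

// fold the Nth sample into a running average without storing all of the samples
average = mix(average, newSample, 1.0f / float(N));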

Integrating y=x^2 from 0 to 1

Let’s look at a couple different ways to integrate a function numerically, to get a sense of things.

We’re going to integrate y=x^2 from 0 to 1 three different ways. First we’ll use regular Monte Carlo with a uniform distribution, second we’ll importance sample it using rejection sampling, and thirdly, we’ll use Russian roulette which as you’ll see is also importance sampling using rejection sampling.

Simple Monte Carlo integration works like this:

  1. Take N random samples of a function between A and B
  2. Sum them up and divide by N to get the average height of the function between A and B
  3. Multiply by B-A to get the area under the curve

Doing that with the accompanying code, after 1000 samples the value it comes up with is 0.298374. The actual value is 1/3 or 0.33333… so it got fairly close. Check out the “// monte carlo” section in the code inside of TestFlat().
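
Here are those three steps as a minimal sketch. I’m writing it GLSL / shadertoy style to match the rest of this series instead of the accompanying C++, and it assumes the RandomFloat01() helper from the path tracing posts. The function name is just for illustration.

// estimate the integral of f(x) = x*x from A=0 to B=1 with plain Monte Carlo
float MonteCarloIntegrate(inout uint rngState, int numSamples)
{
    float sum = 0.0f;
    for (int i = 0; i < numSamples; ++i)
    {
        float x = RandomFloat01(rngState); // uniform random sample between A and B
        sum += x * x;                      // evaluate the function at that sample
    }
    float averageHeight = sum / float(numSamples);
    return averageHeight * (1.0f - 0.0f); // multiply by (B - A) to get the area under the curve
}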

We’ll do the same steps again but this time do importance sampling with rejection sampling.

Since we know y=x^2 is 0 when x is 0 and 1 where x is 1, we can use rejection sampling to make a probability density function that is similar.

Whenever we generate a random number “x” between 0 and 1 to use in the function, we’ll say there is an “x” percent chance that we keep that number, otherwise we throw it away and generate a new random number. Another way of saying this is that there is a 1-x percent chance that we are going to throw it away and generate a new random number.

That probability of keeping numbers looks like the function y=x when you do this. y=x isn’t a proper PDF though because the area under the curve is 0.5, not 1, so you need to multiply it by 2 to make it a proper PDF. This doesn’t actually change the probabilities, this is just making a “proper PDF” that describes the probabilities. So, the PDF for these random numbers is y=2x.

We have to change how we use these random numbers too though. After we calculate our y value by plugging x into y=x*x, we need to divide it by the PDF value for x before we add it to the sum (the sum which we later divide by N to get the average). You just plug x into the PDF function y=2x to get the PDF value. We divide by the PDF because what this does is make it so numbers that come up more often weigh less into the final average. For instance, if one number came up twice as often as another in your random numbers, you should make it count for half as much toward the final average, to make things equal like they are in plain vanilla Monte Carlo.

When we do this with 1000 samples, we get a value of 0.334112 (Monte Carlo was 0.298374). You might find it interesting that we had to generate 2052 potential x values for the 1000 we actually used. 1052 were thrown away.

Next up we are going to do Russian Roulette. How this works is we are going to have a probability where we just aren’t going to take a sample AT ALL. We are going to modify the first step of the Monte Carlo algorithm:

  1. Take N random samples of a function between A and B. There is some percent chance “p” we won’t actually do a sample, but will just use a value of 0. If we do take a sample, we are going to divide the value we got by 1-p which will make it a larger value to make up for the times we didn’t take a sample.
  2. Sum them up and divide by N to get the average height of the function between A and B
  3. Multiply by B-A to get the area under the curve

If we do that with a probability of 1-x of killing a sample (aka an “x” percent probability of using a sample. Is this sounding familiar?), and do 1000 attempts, we get a value of 0.327171. We didn’t do as well as we did with importance sampling using rejection sampling, but that used a full 1000 samples, while this method only ended up using 746 actual samples.

So:

In the rejection sampling algorithm, we used x as the percent probability to determine whether we actually plug that x into the function, or a 1-x percent probability to generate a different x value. We loop until success.

In the Russian roulette algorithm, we used 1-x as the percent probability to use 0 instead of taking a sample, or x as the probability to take a sample and divide it by (1-x) to make it larger to make up for samples we killed.

Do those sound pretty similar to you?

As further evidence, here’s the histogram of x values generated by each of those algorithms, along with a graph of the PDF y=2x, modifying the code to take 100k samples instead of 1000, to reduce the random noise in the graph (make it converge more). This shows that they are generating random numbers from the same PDF, which is y=2x.

Here is the absolute error of each technique for each sample. Rejection sampling a PDF of y=2x is showing lowest error, followed by Russian Roulette with a probability of 1-x to kill a sample, and lastly in the rear comes vanilla Monte Carlo. It’s important to remember that rejection sampling and monte carlo are taking new samples each time you take a step forward on the x axis, but for russian roulette, it’s sometimes not taking a sample and just using a value of 0 instead.

Looking at the standard deviation, the square root of variance, this tells us how erratic the samples are.

Unfortunately, the worst variance comes from the Russian Roulette algorithm. I don’t have a good sense of why this is or how it can have worse variance than Monte Carlo but come up with a better result. Mathematically, I know the reason is due to the zeroes used when a sample is killed, but intuitively, I don’t grok how this can be true or what it means exactly to be better converged, but have worse variance. Something to chew on I guess (or feel free to educate me if you know!)

Why Does Russian Roulette Boost Values?

Ok so let’s take a quick break to get some intuition for why Russian roulette boosts values up when they survive the random killing.

Let’s say we have a function y=1, so it’s just a function that is 1 everywhere.

If we wanted to integrate that from 0 to 1 with Monte Carlo using four uniform random samples, we’d get four 1s, sum them up to get 4, divide by the number of samples (4) and get our result of 1.

That’s the right answer: a function that has a y value of 1 everywhere, for x being between 0 and 1 is a square of length 1 on both the x and y axis. That square has an area of 1.

If we wanted to use Russian roulette here with a 25% chance to kill a sample, we could assume that one of our four samples was killed. That means we get 3 samples each getting a value of 1, and the sample we killed has a value of 0.

If we sum the samples we’d get 3. If we divide them by the number of samples taken, we’d get 3/4 or 0.75.

This is the incorrect answer!

Since we know we are going to kill one out of every four samples, what we could do is boost every survivor to make up for the samples we killed.

The algorithm says we divide by 1 minus the probability, which is in this case 0.75 or 3/4.

Dividing by 3/4 is the same as multiplying by 4/3.

So, going back to our samples we have four samples. Three of them were 1, multiplied by 4/3 to get 4/3. The last sample was killed so is 0.

Summing these samples up, we get 12/3 which reduces to 4.

Dividing that sum by the total number of samples (4) we get 1.

That is the RIGHT answer.

Russian roulette boosts the survivors to make up for the missing information of the samples that were killed.

Sure, it’s random, and the boost amount may not actually match the number of samples actually killed, but given enough samples it’ll be true, or close enough to true.

Bonus: use blue noise or low discrepancy sequences to make this be more true for smaller sample counts.

Iterative / Recursive Functions

You might be saying to yourself that this example is stupid, because nobody uses Russian Roulette this way, and you may in fact be doubting that it is Russian Roulette because it’s used in such a foreign setting.

The usual place you see it used if you are a graphics person is in terminating a ray while it’s bouncing around a scene so that you can kill it early without bias, when the ray isn’t likely to be contributing much value to the result if it continues.

Well, same thing applies. Using it in this setting just lets you say “ah screw it, i’m stopping here” with some probability. The rest of the time, the values you DO get are boosted up to make up for the slackers who quit early.

The only real difference is that because it’s recursive, each survival grows the weight of not only the current iteration but all future iterations too – this is because the current iteration relies also on the future iterations.

Let’s quickly see some data for a recursive/iterative function:

  1. calculate y = f(x) = sin(x*pi/2)
  2. Divide x by 2 and do it again
  3. Do this 1000 times
  4. The sum of the above is the result

We are going to integrate that from 0 to 1, meaning we are going to take random numbers between 0 and 1, plug them into the above, and the average will be the area under the curve.
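
Here is a sketch of that iterated function with Russian Roulette applied at each step, again GLSL style with the RandomFloat01() helper assumed, and with a function name that is just for illustration. The kill probability shown is 1-x; the (1-x)/5 variant discussed below only changes that one line. The accompanying C++ may organize this differently.

// one estimate of the iterated function, using Russian Roulette to stop the 1000 iterations early
float IteratedFunctionRR(inout uint rngState, float x)
{
    float sum = 0.0f;
    float weight = 1.0f;
    for (int i = 0; i < 1000; ++i)
    {
        float killProbability = 1.0f - x; // or (1.0f - x) / 5.0f for the gentler variant
        if (RandomFloat01(rngState) < killProbability)
            break;                            // stop iterating and drop the rest of the sum
        weight /= (1.0f - killProbability);   // boost this and all future iterations to stay unbiased
        sum += weight * sin(x * 3.14159265f / 2.0f);
        x /= 2.0f;
    }
    return sum;
}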

After doing this 10,000 times…

Monte Carlo gives us a value of 1.403612 with a standard deviation of 0.733499.

Since the basic function sin(x*pi/2) is 0 when x is 0 and 1 when x is 1, i used the same probabilities for Russian Roulette as before. That gave me a value of 1.285901, with a standard deviation of 3.493362. Instead of doing a full 1000 loops through the function though, for those 10000, it did 0 loops at minimum, 5 at max, and 0.71 on average. That is quite a savings on iteration count.

To be honest, i’m not sure what the right answer to this function is, so i can’t say how correct these answers are (A less lazy person would run Monte Carlo with a very large number of samples). They are pretty close to each other though which makes me think they must be in the ballpark. If that’s true, they have fairly similar error (at least in a spherical cow sense), but one did 1000 iterations, while the other did 0 or 1 on average, and 5 maximum. That is pretty cool! Admittedly the variance got worse again though.

Using a different probability for Russian Roulette, where the PDF more closely matched the actual shape of the function (the iterated function, not just a single incarnation of it?), would get you a better answer more quickly. i didn’t really experiment with that very much.

I did think that 0 to 1 iterations on average was pretty low though, so instead of making the probability be 1-x to kill a sample, i tried making it (1-x)/5. That way, the probability shape is still the same, but samples are less likely to be killed, so they should survive more iterations.

That did in fact work!

After doing that 10,000 times, Russian Roulette gave a value of 1.378251 with a standard deviation of 0.939193. The minimum number of iterations is still zero, but the maximum went up to 45, and the average went up to 4.89. We got to keep the shape of our PDF, but got deeper through the iterations. So, we are sort of importance sampling on 2 axes – fun 😛

The value changed and I’m unsure if it’s for better or worse, but it got closer to the plain Monte Carlo answer, which is a good sign. The variance also got better which is neat, but is still worse than the Monte Carlo variance.

Here’s a graph of the value of the functions as the number of samples grow:

Here’s a graph of the standard deviation. The probability 1-x has really bad variance, while dividing it by 5 has much better variance.

Closing

So hopefully you now can see a link between Russian Roulette and importance sampling using rejection sampling.

What I found neat about this perspective is that instead of Russian Roulette being some “one off” technique, I can merge it mentally with rejection sampling and importance sampling, and free up some brain space.

I also like this “off brand” of Russian Roulette.

Basically, rejection sampling is a “no go” for real time graphics in general, like for use in a shader. The reason for this is that rejection sampling takes an unknown number of iterations each time to generate a single valid sample.

If you instead use Russian Roulette, it’s a fixed upper bound amount of work, but you will get some randomly smaller number of samples of your data.

Another way to put it is that when generating numbers from a PDF…

  • Rejection sampling requires an unknown sized loop to generate a fixed number of samples
  • Russian Roulette has a fixed loop to generate an unknown number of samples

Engineering is all about trade offs, so having this trade off is pretty handy IMO.

Let’s do a quick, simple recap on Rejection Sampling vs Russian Roulette for integrating a y=f(x) function using a specific PDF.

I’m defining a function A(x) which is the probability of accepting a value “x” that was chosen. The function A is just a scaled down version of the PDF, such that all of its values are <= 1. This function gives you the probability of keeping a value “x” after randomly selecting it – for both rejection sampling, and russian roulette. For the PDF y=2x used earlier, for instance, A(x)=x works.

Rejection Sampling

  1. Draw a number x from a uniform distribution between a and b.
  2. Roll a random number from 0 to 1. If that number is greater than A(x), go to 1.
  3. calculate y=f(x)/pdf(x) to get a sample
  4. The result of the integral is the average y value, multiplied by (b-a)

The count of random numbers you had to generate is dynamic, to get a fixed number of samples.

The number of random x’s generated is >= the number of samples you got out.

The count on the left side of that inequality is what changes.

Russian Roulette

  1. Draw a number x from a uniform distribution between a and b.
  2. Roll a random number from 0 to 1. If that number is greater than A(x), y=0
  3. Else, y=f(x)/A(x)
  4. The result of the integral is the average y value, multiplied by (b-a)

The count of random samples you had to generate is fixed, to get a dynamic number of samples.

The number of random x’s generated is still >= the number of samples you got out.

The count on the right side of that inequality is what changes.
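
Here are both recaps as minimal GLSL style sketches for the earlier example of f(x)=x*x on 0 to 1, with pdf(x)=2x and A(x)=x, again assuming the RandomFloat01() helper and with function names that are just for illustration. The unbounded while loop in the first one is exactly why rejection sampling is a poor fit for shaders.

// rejection sampling: unknown amount of work, but always produces a usable sample
float RejectionSample(inout uint rngState)
{
    float x = RandomFloat01(rngState);
    while (RandomFloat01(rngState) > x)  // accept x with probability A(x) = x
        x = RandomFloat01(rngState);
    return (x * x) / (2.0f * x);         // y = f(x) / pdf(x)
}

// Russian Roulette: fixed amount of work, but sometimes the sample is just 0
float RussianRouletteSample(inout uint rngState)
{
    float x = RandomFloat01(rngState);
    if (RandomFloat01(rngState) > x)     // kill the sample with probability 1 - A(x)
        return 0.0f;
    return (x * x) / x;                  // y = f(x) / A(x)
}

Averaging many calls to either of those and multiplying by (b-a), which is 1 here, gives the estimate of the integral.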

Last thing: I could be wrong, but i believe that Rejection Sampling is a “Las Vegas” algorithm, while Russian Roulette is a “Monte Carlo” algorithm.

https://en.wikipedia.org/wiki/Las_Vegas_algorithm#Relation_to_Monte_Carlo_algorithms

Assuming i’m correct in my understanding there, Monte Carlo algorithms are preferred in real time graphics, games, and other real time situations. The ability to have knowledge or control over how long something will take to execute is in general way more important than the quality of the result you get. In modern times of temporal accumulation, this is even more true, since poor data every now and then can be soaked up by the temporal accumulation, to an extent.

And in those situations, you can do even better by using random or quasi random sequences which have better properties over both space and time, to help ensure that you get better results over smaller regions of space, and over shorter periods of time.

Casual Shadertoy Path Tracing 1: Basic Camera, Diffuse, Emissive

Posts in this series:

  1. Basic Camera, Diffuse, Emissive
  2. Image Improvement and Glossy Reflections
  3. Fresnel, Rough Refraction & Absorption, Orbit Camera

Below is a screenshot of the shadertoy that goes with this post. Click to view full size. That shadertoy can be found at: https://www.shadertoy.com/view/tsBBWW

When you see a photorealistic image that someone says is “ray traced” what they likely mean is that it is “path traced”.

Path tracing can be pretty heavy on math and require a lot of attention to detail, but it can make for some really great images.

Funny thing though. You can throw out a lot of the formalism and still get superb results.

That is the premise of this series of blog posts. The goal is to get you path tracing and to let you play around, have fun, create art, without having to worry about integrals, PDFs and the existential question of graphics: “To pi or not to pi?”. Unfortunately that means that our images may take longer to render, and may not be exactly “correct”, but you can learn the more complicated techniques later when and if you are interested.

Rendering is an artistic and creative endeavor, so let’s ignore some of the details for now. We are not making a ground truth renderer, we are just having fun making cool renderings.

Let’s get to it!

Shadertoy.com

Shadertoy is a website that lets you write small programs in a c-like language called GLSL which are run for every pixel in an image, on your GPU (not CPU). Shadertoy uses WebGL but manages everything except the pixel shader(s) for you.

These programs have very limited input including pixel coordinate, frame number, time, and mouse position, and give only a color as output.

Despite the simplicity, people have done amazing things, like making a playable level of doom 1.

[SH16C] Doom by Paul Malin https://www.shadertoy.com/view/lldGDr

… Or make beautiful photo realistic images and videos which render in real time.

Snail by Inigo Quilez https://www.shadertoy.com/view/ld3Gz2

Shadertoy was founded by demosceners which are people that have parties & competitions to make very small executables that generate impressive audio and visuals.

Besides demosceners, shadertoy.com is frequented by professional and hobbyist game developers, graphics researchers, people who work in movies, and all sorts of other people as well.

It’s a great place to play and learn because it lets you dive in without a lot of fuss. I have learned so much there it boggles my mind sometimes… all sorts of math tricks, graphics techniques, dual numbers for automatic differentiation, ray marching solid geometry, and I even did real time ray tracing back in 2013. Real time ray tracing may SEEM new to the world, but demosceners have been doing real time ray tracing on things like the Commodore 64 for years or decades. The rest of us are just catching up!

In these articles we are going to be doing path tracing in shadertoy because it’s very easy to do so. You should head over there, create an account and look around a bit at some of the shadertoys. Shadertoy works best in chrome or firefox.

We’ll get started when you come back 🙂

http://shadertoy.com

New Shader

To get started, after you log in, click on the “new” button in the upper right to create a new shader. You should be greeted by something that looks like this:

Go ahead and give the shader a name – the names must be globally unique! – as well as at least one tag, and a description. All of those are required so that you can click “Submit” and do the initial save for your shader.

After you have submitted your shader, the submit button will turn into a save button, and you will be good to go onto the next step.

Generating Rays

Path tracing is a type of ray tracing, which means we are going to shoot a ray into the world out of every pixel. For each individual ray, we are going to see what it hits and use that information to give a color to our pixel. Doing that for every pixel independently gives us our final image.

So, the first step to doing path tracing is to calculate the rays for each pixel. A ray has a starting point and a direction, so we are going to need to calculate where the ray starts, and what direction it is going in.

The ray is going to start at the camera location (you could think of this as the eye location), and it’s going to shoot at pixels on an imaginary rectangle in front of the camera, go through those pixels, and go out into the scene to see what they hit.

We are going to make this rectangle of pixels be from -1 to +1 on the x and y axis, and for now we’ll make it be 1 unit away on the z axis. We’ll say the camera is at the origin (0,0,0). Our goal is to calculate the 3d point on this rectangle for each pixel, then calculate the direction of the ray from the origin to that 3d point.

We’ll start by calculating each pixel’s percentage on the x and y axis from 0 to 1. We’ll show that in the red and green channel to make sure we are doing it correctly.

Put this code below in the code box on the right and click the play button (circled red in the screenshot) to see the results.

void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
    // calculate pixel coordinates as a percentage of the screen, from 0 to 1 on each axis
    vec2 uv = fragCoord/iResolution.xy;

    // show percentage as red and green
    fragColor = vec4(uv, 0.0f, 1.0f);
}

We can see from this picture that (0,0) is in the lower left, where the color is black. As the pixels go to the right, they get more red, which shows us that the x axis goes to the right. As the pixels go up, they get more green, which shows us that the y axis goes upward.

Visualizing values like this is a great way to debug shadertoy shaders. The usual graphics debuggers and debugging techniques are not available when doing webgl, or when using shadertoy, so this “printf” style debugging is the best way to see what’s going on if you are having problems.

Now that we have screen percentage from 0 to 1, we want to change it to being from -1 to 1 so that it’s actually just the x and y coordinate on the imaginary pixel rectangle.

void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
    // calculate 2d coordinates of the ray target on the imaginary pixel plane.
    // -1 to +1 on each axis.
    vec2 pixelTarget2D = (fragCoord/iResolution.xy) * 2.0f - 1.0f;

    // show percentage as red and green
    fragColor = vec4(pixelTarget2D, 0.0f, 1.0f);
}

We can see now that it’s black in the middle of the screen, and going to the right makes the pixels more red while going up makes the pixels more green. However, going left and down doesn’t change the color at all and just leaves it black. This is because colors are clamped to be between 0 and 1 before being displayed, making all negative numbers just be zero.

If you wanted to make sure the negative values were down there, and it really wasn’t just zero, you could visualize the absolute value like this.

void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
    // calculate 2d coordinates of the ray target on the imaginary pixel plane.
    // -1 to +1 on each axis.
    vec2 pixelTarget2D = (fragCoord/iResolution.xy) * 2.0f - 1.0f;
    
    // take absolute value of the pixel target so we can verify there are negative values in the lower left.
    pixelTarget2D = abs(pixelTarget2D);

    // show percentage as red and green
    fragColor = vec4(pixelTarget2D, 0.0f, 1.0f);
}

Ok so now we have the x,y location of the ray target on the imaginary pixel rectangle. We can make the z location be 1 since we want it to be 1 unit away, and that gives us a target for our ray. We now want to get a NORMALIZED vector from the camera position (the origin) to this ray target. That gives us our ray direction and we can visualize that ray to verify that it looks somewhat reasonable.

void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
    // The ray starts at the camera position (the origin)
    vec3 rayPosition = vec3(0.0f, 0.0f, 0.0f);
    
    // calculate coordinates of the ray target on the imaginary pixel plane.
    // -1 to +1 on x,y axis. 1 unit away on the z axis
    vec3 rayTarget = vec3((fragCoord/iResolution.xy) * 2.0f - 1.0f, 1.0f);
    
    // calculate a normalized vector for the ray direction.
    // it's pointing from the ray position to the ray target.
    vec3 rayDir = normalize(rayTarget - rayPosition);

    // show the ray direction
    fragColor = vec4(rayDir, 1.0f);
}

It’s a little subtle, but as you go right from the center of the screen, the pixels get more red and become a purplish color. As you go up from the center, the pixels get more green and become a little more teal. This shows us that our ray directions seem sensible. As we move right on the x axis, the normalized ray direction also starts deviating from straight ahead and points more to the right, more positively on the x axis. Similarly for the y axis and the green color.

Going left and down from center, the blue gets darker, but no red and green show up due to the x and y axis being negative values again.

If you play around with the z coordinate of the ray target, you can make the ray direction changes more or less obvious due to basically zooming in or out the camera by moving the imaginary pixel rectangle, but we’ll talk more about that in a little bit.

The nearly constant blue color on the screen is due to the positive z value of the ray direction. All of the rays are aiming forward, which is what we’d expect.

We have successfully calculated a ray starting position and direction, and are ready to start shooting rays into the world!

Rendering Geometry

Next we want to use these rays to actually render some geometry. To do this we need functions that will tell us whether a ray intersects a shape or not, and how far down the ray the intersection happens.

We are going to use the functions “TestSphereTrace” and “TestQuadTrace” for ray intersection vs sphere, and ray intersection vs rectangle (quad). Apologies for not having them in a copy/pastable format here in the post. WordPress malfunctions when I try to put them into a code block, but they are in the final shader toy.

There are other ray vs object intersection functions that you can find. I adapted these from Christer Ericson’s book “Real-Time Collision Detection” (https://www.amazon.com/Real-Time-Collision-Detection-Interactive-Technology/dp/1558607323), and there are a bunch on the net if you google for them. Ray marching can also be used as a numerical method for ray vs object intersection, which lets you have much richer shapes, infinitely repeating objects, and more.

Anyhow, instead of using the ray direction for the color for each pixel, we are going to test the ray against a scene made up of multiple objects, and give a different color to each object. We do that in the GetColorForRay function.

// The minimum distance a ray must travel before we consider an intersection.
// This is to prevent a ray from intersecting a surface it just bounced off of.
const float c_minimumRayHitTime = 0.1f;

// the farthest we look for ray hits
const float c_superFar = 10000.0f;

struct SRayHitInfo
{
    float dist;
    vec3 normal;
};

float ScalarTriple(vec3 u, vec3 v, vec3 w)
{
    return dot(cross(u, v), w);
}

bool TestQuadTrace(in vec3 rayPos, in vec3 rayDir, inout SRayHitInfo info, in vec3 a, in vec3 b, in vec3 c, in vec3 d)
{
// ...
}

bool TestSphereTrace(in vec3 rayPos, in vec3 rayDir, inout SRayHitInfo info, in vec4 sphere)
{
// ...
}

vec3 GetColorForRay(in vec3 rayPos, in vec3 rayDir)
{
    SRayHitInfo hitInfo;
    hitInfo.dist = c_superFar;
    
    vec3 ret = vec3(0.0f, 0.0f, 0.0f);
    
	if (TestSphereTrace(rayPos, rayDir, hitInfo, vec4(-10.0f, 0.0f, 20.0f, 1.0f)))
    {
        ret = vec3(1.0f, 0.1f, 0.1f);
    } 
    
	if (TestSphereTrace(rayPos, rayDir, hitInfo, vec4(0.0f, 0.0f, 20.0f, 1.0f)))
    {
        ret = vec3(0.1f, 1.0f, 0.1f);
    }    
    
    {
        vec3 A = vec3(-15.0f, -15.0f, 22.0f);
        vec3 B = vec3( 15.0f, -15.0f, 22.0f);
        vec3 C = vec3( 15.0f,  15.0f, 22.0f);
        vec3 D = vec3(-15.0f,  15.0f, 22.0f);
        if (TestQuadTrace(rayPos, rayDir, hitInfo, A, B, C, D))
        {
            ret = vec3(0.7f, 0.7f, 0.7f);
        }
	}
    
	if (TestSphereTrace(rayPos, rayDir, hitInfo, vec4(10.0f, 0.0f, 20.0f, 1.0f)))
    {
        ret = vec3(0.1f, 0.1f, 1.0f);
    }       
    
    return ret;
}

void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
    // The ray starts at the camera position (the origin)
    vec3 rayPosition = vec3(0.0f, 0.0f, 0.0f);
    
    // calculate coordinates of the ray target on the imaginary pixel plane.
    // -1 to +1 on x,y axis. 1 unit away on the z axis
    vec3 rayTarget = vec3((fragCoord/iResolution.xy) * 2.0f - 1.0f, 1.0f);
    
    // calculate a normalized vector for the ray direction.
    // it's pointing from the ray position to the ray target.
    vec3 rayDir = normalize(rayTarget - rayPosition);
    
    // raytrace for this pixel
    vec3 color = GetColorForRay(rayPosition, rayDir);

    // show the result
    fragColor = vec4(color, 1.0f);
}

We see some things on the screen which is nice, but we have a ways to go yet before we are path tracing.

If you look at the code in the final shadertoy, TestSphereTrace and TestQuadTrace only return true if the ray intersects with the shape AND the intersection distance is closer than hitInfo.dist. A common mistake to make when first doing any kind of raytracing (path tracing or other types) is to accept the first thing a ray hits, but you actually want to test all objects and keep whichever hit is closest.

If you only kept the first hit found, not the closest hit, the blue ball would disappear, because the ray test for the grey quad comes before the blue ball, even though it’s farther away.
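
Since the post doesn’t include the intersection functions themselves, here is a sketch of what a ray vs sphere test can look like in this style, including the closest hit check just described (sphere.xyz is the center, sphere.w is the radius). This is my own version of the standard quadratic solve, so the TestSphereTrace in the final shadertoy may differ in its details.

bool TestSphereTrace(in vec3 rayPos, in vec3 rayDir, inout SRayHitInfo info, in vec4 sphere)
{
    // vector from the sphere center to the ray origin
    vec3 m = rayPos - sphere.xyz;
    float b = dot(m, rayDir);
    float c = dot(m, m) - sphere.w * sphere.w;

    // ray origin outside the sphere and pointing away from it: no hit possible
    if (c > 0.0f && b > 0.0f)
        return false;

    float discr = b * b - c;
    if (discr < 0.0f)
        return false;

    // nearest intersection along the ray, falling back to the far one if we start inside the sphere
    float dist = -b - sqrt(discr);
    if (dist < c_minimumRayHitTime)
        dist = -b + sqrt(discr);

    // only accept hits that are far enough in front of the ray and closer than what we already hit
    if (dist < c_minimumRayHitTime || dist > info.dist)
        return false;

    info.dist = dist;
    info.normal = normalize((rayPos + rayDir * dist) - sphere.xyz);
    return true;
}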

Correcting Aspect Ratio

You probably noticed that the spheres were stretched in the last image. The reason for that is that our imaginary pixel rectangle is a square (it’s from -1 to +1 on the x and y axis), while the image we are rendering is not a square. That makes the image get stretched horizontally.

The way to fix this is to either grow the imaginary pixel rectangle on the x axis, or shrink it on the y axis, so that the ratio of the width to the height is the same on the imaginary pixel rectangle as it is in our rendered image. I’m going to shrink the y axis. Here is the new mainImage function to use. Lines 10 through 12 were added to fix the problem.

void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
    // The ray starts at the camera position (the origin)
    vec3 rayPosition = vec3(0.0f, 0.0f, 0.0f);
    
    // calculate coordinates of the ray target on the imaginary pixel plane.
    // -1 to +1 on x,y axis. 1 unit away on the z axis
    vec3 rayTarget = vec3((fragCoord/iResolution.xy) * 2.0f - 1.0f, 1.0f);
    
    // correct for aspect ratio
	float aspectRatio = iResolution.x / iResolution.y;
    rayTarget.y /= aspectRatio;
    
    // calculate a normalized vector for the ray direction.
    // it's pointing from the ray position to the ray target.
    vec3 rayDir = normalize(rayTarget - rayPosition);
    
    // raytrace for this pixel
    vec3 color = GetColorForRay(rayPosition, rayDir);

    // show the result
    fragColor = vec4(color, 1.0f);
}

Camera Zoom (FOV)

I said before that the imaginary pixel rectangle would be 1 unit away, but if we make that distance smaller, it’s the same as zooming the camera out or increasing the field of view. If we make that distance larger, it’s the same as zooming the camera in or decreasing the field of view.

It turns out that having the pixel rectangle be 1 unit away gives you a field of view of 90 degrees.
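
To see why that works: half of the field of view is the angle between the camera’s forward direction and a ray through the edge of the pixel rectangle, so tan(FOV/2) = (half the rectangle width) / distance = 1 / distance. Solving for the distance gives distance = 1 / tan(FOV/2), and for a 90 degree FOV that is 1 / tan(45 degrees) = 1.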

The formula for calculating the distance for a specific field of view is on line 7 of the new mainImage function below and is used on line 11.

void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
    // The ray starts at the camera position (the origin)
    vec3 rayPosition = vec3(0.0f, 0.0f, 0.0f);
    
    // calculate the camera distance
	float cameraDistance = 1.0f / tan(c_FOVDegrees * 0.5f * c_pi / 180.0f);        
    
    // calculate coordinates of the ray target on the imaginary pixel plane.
    // -1 to +1 on x,y axis. 1 unit away on the z axis
    vec3 rayTarget = vec3((fragCoord/iResolution.xy) * 2.0f - 1.0f, cameraDistance);
    
    // correct for aspect ratio
	float aspectRatio = iResolution.x / iResolution.y;
    rayTarget.y /= aspectRatio;
    
    // calculate a normalized vector for the ray direction.
    // it's pointing from the ray position to the ray target.
    vec3 rayDir = normalize(rayTarget - rayPosition);
    
    // raytrace for this pixel
    vec3 color = GetColorForRay(rayPosition, rayDir);

    // show the result
    fragColor = vec4(color, 1.0f);
}

We are going to leave c_FOVDegrees at 90 degrees as we move forward, but here it is at 120 degrees

and here it is at 45 degrees

If you make the FOV too wide, you’ll start getting distortion at the edges. That’s a deep topic that cameras experience too that we won’t go into but I wanted to make sure you knew about that. You can get some real interesting renders by setting the FOV to really large values like 360 or 720 degrees!

Let’s Path Trace!

Now that we can shoot rays into the world, we are finally ready to do some path tracing!

To do shading we’ll need light sources, and our light sources will just be objects that have emissive lighting, aka glowing objects.

For this post we are only going to do diffuse light shading, which means no shiny reflections. We’ll do shiny reflections probably in the very next post in this series, but for now we’ll just do diffuse.

So, objects will have two fields that define their material:

  • a vec3 for emissive, specifying how much they glow in RGB
  • a vec3 for albedo, specifying what color they are under white light

How path tracing works in this setup is pretty simple. We’ll start out a pixel’s color at black, a “throughput” color at white, and we’ll shoot a ray out into the world, obeying the following rules:

  • When a ray hits an object, emissive*throughput is added to the pixel’s color.
  • When a ray hits an object, the throughput is multiplied by the object’s albedo, which affects the color of future emissive lights.
  • When a ray hits an object, a ray will be reflected in a random direction and the ray will continue (more info in a sec)
  • We will terminate when a ray misses all objects, or when N ray bounces have been reached. (N=8 in this shadertoy, but you can tune that value for speed vs correctness)

That is all there is to it!

The concept of “throughput” might seem strange at first but think of these scenarios as they increase in complexity:

  1. A ray hits a white ball, bounces off and hits a white light. The pixel should be white, because the white ball is lit by white light.
  2. A ray hits a red ball, bounces off and hits the same white light. The pixel should be red because red ball is lit by white light.
  3. A ray hits a white ball, bounces off and hits a red ball, bounces off and hits a white light. The pixel should be red because the white ball is lit by the red ball being lit by the white light.

When a ray bounces off a surface, all future lighting for that ray is multiplied by the color of that surface.

These simple rules are all you need to get soft shadows, bounce lighting, ambient occlusion, and all the rest of the cool things you see show up “automagically” in path tracers.

I said that the ray bounces off randomly but there are two different ways to handle this:

  1. Bounce randomly in the positive hemisphere of the normal and then multiply throughput by dot(newRayDir, surfaceNormal). This is the cosine theta term for diffuse lighting.
  2. Or, bounce randomly in a cosine weighted hemisphere direction of the surface normal. This uses importance sampling for the cosine theta term and is the better way to go.

To do #2, all we need to do is get a “random point on sphere” (also known as a random unit vector), add it to the normal, and normalize the result. It’s nice how simple it is.

You are probably wondering how to generate random numbers in a shader. There are many ways to handle it, but here’s how we are going to do it.

First, we are going to initialize a random seed for the current pixel, based on pixel position and frame number, so that each pixel gets different random numbers, and also so that each pixel gets different random numbers each frame.

We can just put this at the top of our mainImage function:

    // initialize a random number state based on frag coord and frame
    uint rngState = uint(uint(fragCoord.x) * uint(1973) + uint(fragCoord.y) * uint(9277) + uint(iFrame) * uint(26699)) | uint(1);

Then, we can toss these functions above our mainImage function, which take the rngState in, modify it, and generate a random number.

uint wang_hash(inout uint seed)
{
    seed = uint(seed ^ uint(61)) ^ uint(seed >> uint(16));
    seed *= uint(9);
    seed = seed ^ (seed >> 4);
    seed *= uint(0x27d4eb2d);
    seed = seed ^ (seed >> 15);
    return seed;
}

float RandomFloat01(inout uint state)
{
    return float(wang_hash(state)) / 4294967296.0;
}

vec3 RandomUnitVector(inout uint state)
{
    float z = RandomFloat01(state) * 2.0f - 1.0f;
    float a = RandomFloat01(state) * c_twopi;
    float r = sqrt(1.0f - z * z);
    float x = r * cos(a);
    float y = r * sin(a);
    return vec3(x, y, z);
}

To implement the path tracing rules we previously described, we are going to change some things…

  1. Move the ray vs object tests from GetColorForRay into a new function called TestSceneTrace. This lets us separate the scene testing logic from the iterative bounce raytracing we are going to do.
  2. Add albedo and emissive vec3’s to hit info and set both fields whenever we intersect an object. This is the object material data.
  3. Implement iterative raytracing in GetColorForRay as we described it.
  4. Add a light to the scene so it isn’t completely dark.

void TestSceneTrace(in vec3 rayPos, in vec3 rayDir, inout SRayHitInfo hitInfo)
{
    // a light grey quad behind the spheres, acting as a wall
    {
        vec3 A = vec3(-15.0f, -15.0f, 22.0f);
        vec3 B = vec3( 15.0f, -15.0f, 22.0f);
        vec3 C = vec3( 15.0f,  15.0f, 22.0f);
        vec3 D = vec3(-15.0f,  15.0f, 22.0f);
        if (TestQuadTrace(rayPos, rayDir, hitInfo, A, B, C, D))
        {
            hitInfo.albedo = vec3(0.7f, 0.7f, 0.7f);
            hitInfo.emissive = vec3(0.0f, 0.0f, 0.0f);
        }
    }

    // a red, a green and a blue sphere
    if (TestSphereTrace(rayPos, rayDir, hitInfo, vec4(-10.0f, 0.0f, 20.0f, 1.0f)))
    {
        hitInfo.albedo = vec3(1.0f, 0.1f, 0.1f);
        hitInfo.emissive = vec3(0.0f, 0.0f, 0.0f);
    }

    if (TestSphereTrace(rayPos, rayDir, hitInfo, vec4(0.0f, 0.0f, 20.0f, 1.0f)))
    {
        hitInfo.albedo = vec3(0.1f, 1.0f, 0.1f);
        hitInfo.emissive = vec3(0.0f, 0.0f, 0.0f);
    }

    if (TestSphereTrace(rayPos, rayDir, hitInfo, vec4(10.0f, 0.0f, 20.0f, 1.0f)))
    {
        hitInfo.albedo = vec3(0.1f, 0.1f, 1.0f);
        hitInfo.emissive = vec3(0.0f, 0.0f, 0.0f);
    }

    // a large emissive sphere acting as the light
    if (TestSphereTrace(rayPos, rayDir, hitInfo, vec4(10.0f, 10.0f, 20.0f, 5.0f)))
    {
        hitInfo.albedo = vec3(0.0f, 0.0f, 0.0f);
        hitInfo.emissive = vec3(1.0f, 0.9f, 0.7f) * 100.0f;
    }
}

vec3 GetColorForRay(in vec3 startRayPos, in vec3 startRayDir, inout uint rngState)
{
    // initialize
    vec3 ret = vec3(0.0f, 0.0f, 0.0f);
    vec3 throughput = vec3(1.0f, 1.0f, 1.0f);
    vec3 rayPos = startRayPos;
    vec3 rayDir = startRayDir;
    
    for (int bounceIndex = 0; bounceIndex <= c_numBounces; ++bounceIndex)
    {
        // shoot a ray out into the world
        SRayHitInfo hitInfo;
        hitInfo.dist = c_superFar;
        TestSceneTrace(rayPos, rayDir, hitInfo);
        
        // if the ray missed, we are done
        if (hitInfo.dist == c_superFar)
            break;
        
        // update the ray position, nudged along the normal to avoid self intersection
        rayPos = (rayPos + rayDir * hitInfo.dist) + hitInfo.normal * c_rayPosNormalNudge;

        // calculate new ray direction, in a cosine weighted hemisphere oriented at normal
        rayDir = normalize(hitInfo.normal + RandomUnitVector(rngState));

        // add in emissive lighting
        ret += hitInfo.emissive * throughput;

        // update the throughput: future light is tinted by this surface's albedo
        throughput *= hitInfo.albedo;
    }
 
    // return pixel color
    return ret;
}

When we run the shadertoy we see something like the above, but the specific dots we see move around each frame due to being random over time.

This might not look great but we are really close now. We just need to make pixels show their average value instead of only a single value.

Averaging Pixel Values

To average the pixel values, we are going to need to make our shader into a multipass shader.

The first step is to make another pass. Click the + sign next to the “image” tab and select “Buffer A”.

Now you should have a Buf A tab next to the image tab.

Move everything from the image tab to the Buf A tab. In the image tab, select “Buffer A” for texture “iChannel0” and put this code in to read and show pixels from Buffer A.

void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
    vec3 color = texture(iChannel0, fragCoord / iResolution.xy).rgb;
    fragColor = vec4(color, 1.0f);
}

The next step is to actually average the pixel values.

In the “Buf A” tab, also select “Buffer A” for texture “iChannel0” so that we can read Buffer A’s pixel from last frame and average it into the current frame’s output. Now we just add the “average the frames together” lines near the bottom of the code below to average all the pixels together!

void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
    // initialize a random number state based on frag coord and frame
    uint rngState = uint(uint(fragCoord.x) * uint(1973) + uint(fragCoord.y) * uint(9277) + uint(iFrame) * uint(26699)) | uint(1);
    
    // The ray starts at the camera position (the origin)
    vec3 rayPosition = vec3(0.0f, 0.0f, 0.0f);
    
    // calculate the camera distance
	float cameraDistance = 1.0f / tan(c_FOVDegrees * 0.5f * c_pi / 180.0f);        
    
    // calculate coordinates of the ray target on the imaginary pixel plane.
    // -1 to +1 on x,y axis. 1 unit away on the z axis
    vec3 rayTarget = vec3((fragCoord/iResolution.xy) * 2.0f - 1.0f, cameraDistance);
    
    // correct for aspect ratio
	float aspectRatio = iResolution.x / iResolution.y;
    rayTarget.y /= aspectRatio;
    
    // calculate a normalized vector for the ray direction.
    // it's pointing from the ray position to the ray target.
    vec3 rayDir = normalize(rayTarget - rayPosition);
    
    // raytrace for this pixel
    vec3 color = GetColorForRay(rayPosition, rayDir, rngState);
    
    // average the frames together
    vec3 lastFrameColor = texture(iChannel0, fragCoord / iResolution.xy).rgb;
    color = mix(lastFrameColor, color, 1.0f / float(iFrame+1));

    // show the result
    fragColor = vec4(color, 1.0f);
}
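
A quick note on why that mix call is an average: if A_{N-1} is the average of the first N-1 frames and x_N is the new frame’s color, the running average can be written as

A_N = A_{N-1} + \frac{x_N - A_{N-1}}{N} = \text{lerp}(A_{N-1}, x_N, \frac{1}{N})

Since iFrame starts at 0, 1.0 / float(iFrame+1) is exactly that 1/N, so Buffer A always holds the true average of every frame rendered so far, without having to store a separate sum or count.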

After 5 minutes of rendering, the result looks pretty noisy, pixels are too bright and clip, and it isn’t a great scene, but we are in fact path tracing now! You can even see some neat lighting effects, like how you see some red on the wall next to the red ball, due to light reflecting off of it. The spheres are casting some nice looking shadows too.

We can clean up the scene a little bit, make something like a Cornell box, and get this render after 5 minutes.

When Rays Miss

One last feature before we close up this first post in the series.

Right now when a ray misses the scene and flies out into the void, we stop raytracing, which implicitly is saying that the void outside the scene is black. Instead of having our background be black, let’s have it be an environment.

In the “Buf A” tab, choose the forest cube map for texture iChannel1 and then add the environment lookup line to the ray miss case in GetColorForRay() below.

vec3 GetColorForRay(in vec3 startRayPos, in vec3 startRayDir, inout uint rngState)
{
    // initialize
    vec3 ret = vec3(0.0f, 0.0f, 0.0f);
    vec3 throughput = vec3(1.0f, 1.0f, 1.0f);
    vec3 rayPos = startRayPos;
    vec3 rayDir = startRayDir;
    
    for (int bounceIndex = 0; bounceIndex <= c_numBounces; ++bounceIndex)
    {
        // shoot a ray out into the world
        SRayHitInfo hitInfo;
        hitInfo.dist = c_superFar;
        TestSceneTrace(rayPos, rayDir, hitInfo);
        
        // if the ray missed, add in the environment (sky box) lighting and we are done
        if (hitInfo.dist == c_superFar)
        {
            ret += texture(iChannel1, rayDir).rgb * throughput;
            break;
        }
        
        // update the ray position, nudged along the normal to avoid self intersection
        rayPos = (rayPos + rayDir * hitInfo.dist) + hitInfo.normal * c_rayPosNormalNudge;

        // calculate new ray direction, in a cosine weighted hemisphere oriented at normal
        rayDir = normalize(hitInfo.normal + RandomUnitVector(rngState));

        // add in emissive lighting
        ret += hitInfo.emissive * throughput;

        // update the throughput: future light is tinted by this surface's albedo
        throughput *= hitInfo.albedo;
    }
 
    // return pixel color
    return ret;
}

With those changes, here is a 30 second render.

Noise

In 30 seconds it looks about as noisy as the previous 5 minute render did. The 5 minute render looks so much worse because the scene has both very dark places and very bright places. The more different the lighting conditions are in different places in a scene, the more noise you will have. Also, smaller brighter lights will cause more noise than larger dimmer lights.

More sophisticated path tracing techniques (like direct light sampling, also known as next event estimation) can get around this, but we are aiming for understandability in this path tracer, not convergence speed.

But, you might notice that as your scene is rendering, shadertoy is reporting 60fps, meaning that the scene rendering is being limited by vsync. The rendering would go faster than 60fps if it could, and so would converge faster, but it can’t.

There’s a shadertoy plugin for chrome that you can install to help with this, and you can tell it to do up to 64 draws per frame, which really helps it converge more quickly. I’m on a ~5 year old gaming laptop with an nvidia 980m gpu in it, and I got this full screen render in a minute, which has converged pretty nicely! Click on it to view it full screen.

You can get the shadertoy plugin here: https://chrome.google.com/webstore/detail/shadertoy-unofficial-plug/ohicbclhdmkhoabobgppffepcopomhgl

An alternative to getting the shadertoy plugin for chrome is to just shoot N rays out for the pixel each frame, and average them, before putting them into the averaged buffer. That is more efficient than doing the multiple shader passes N times a frame since things are in the memory cache etc when iterating inside the shader. That means your image will converge more in the same amount of time.

Closing

The shadertoy that made this image, that contains everything talked about in this post, is at: https://www.shadertoy.com/view/tsBBWW

Try playing around with the path tracer, making a different scene, and modifying the code for other visual effects.

If you make something, I’d love to see it! You can find me on twitter at @Atrix256

There are going to be 3 or 4 more posts in this series that will be coming pretty quickly, on a variety of topics including depth of field and bokeh, shiny reflections, anti aliasing and fog/smoke. We also are not even sRGB correct yet! These things will improve the image significantly so stay tuned.

Some other awesome resources regarding path tracing are:

  1. Peter Shirley’s “Ray Tracing in One Weekend” (now a free PDF) https://www.realtimerendering.com/raytracing/Ray%20Tracing%20in%20a%20Weekend.pdf
  2. Morgan McGuire’s “Graphics Codex” https://graphicscodex.com/
  3. My more formal blog post on diffuse & emissive path tracing https://blog.demofox.org/2016/09/21/path-tracing-getting-started-with-diffuse-and-emissive/

Path tracing doesn’t only have to be used for realistic rendering though. Below is a stylized path tracing shadertoy I made, in vaporwave style.

“Neon Desert” stylized path tracer shadertoy: https://www.shadertoy.com/view/tdXBW8

Thanks for reading, part 2 coming soon!

Using Blue Noise For Raytraced Soft Shadows

Make sure and click the images in this post and view them at full size. Noise (especially blue noise) tends to disappear in smaller images.

There are 3 shadertoys that go with this post where you can see the techniques in action and see the source code of the implementations.

To go along with the blue noise ray marched fog and light shafts of the last post (Ray Marching Fog With Blue Noise), another fun usage case of blue noise is in raytracing soft shadows.

By soft shadows I mean the kinds of shadows you get from area lights, not the hard edged shadows you get from point lights and directional lights.

Soft shadows have a penumbra – the soft edges where it transitions from being fully in shadow to fully lit, like in the image below.

If you want to know more about why shadows work that way, give this post of mine a read: Why Are Some Shadows Soft And Other Shadows Hard?

So how do we use blue noise for this? We’re going to start with spherical directional lights first, then show how to extend it to spherical positional lights, and then to spherical spot lights.

Spherical Directional Light : White Noise

Shadertoy: https://www.shadertoy.com/view/3sfBWs

A directional light is a light that is shining from a specific direction, regardless of where you are at in the world.

These lights simulate light sources that are so far away that any movement (translation) you do doesn’t measurably change your relative position to the light source, because the sizes and distances involved are gigantic.

We are basically talking about the sun here.

Ok, so raytracing a shadow for a directional light like this just involves shooting a ray along the negative light direction (towards the sun) and seeing if it hits any geometry (the world, blocking the sun light). If it hits anything, the origin of the ray is in shadow. If it doesn’t hit anything, the origin of the ray is lit. The answer is binary and so the shadow edge is hard. There is no penumbra.

The problem with this model of lighting is that it pretends that the sun is an infinitely tiny point in the sky, when the sun is actually a circle, at least from our perspective. It is actually a sphere (or close enough to a sphere), but the “solid angle” 2d projection of it is a circle.

So, to have more realistic lighting and shadows, instead of shooting a ray at a single tiny point in space, we need to see if the ray origin can see the circle that is the spherical lighting source, and even more so, we need to know what percentage of the circle it can see.

The mathy explanation is we need to integrate visibility over the circle, but since our scene isn’t likely to be a closed form equation, we are going to need to resort to numerical methods and sample over the circle domain to get the shadow term.

The less mathy explanation is that we want to shoot rays at a few random places on that circle, and see what percentage of those rays can reach it without hitting anything else. We multiply the lighting by that percentage, and we automatically get soft shadows.

Ok so now that we know what we need to do, how do we do it?

So first up, we need to generate a “uniform point on a circle”, by which I mean a point where every location in the circle is equally likely to be chosen. If we have 2 uniform random numbers from 0 to 1 we can do that with the GLSL code below, where rng is a vec2 holding those two random numbers. Any decent shader hash function or shader rng function should work for making those numbers, but I like using the “Hash without sine” functions from this shadertoy: https://www.shadertoy.com/view/4djSRW

// the sqrt makes the points uniform over the disk's area, instead of clumping at the center
float pointRadius = c_lightRadius * sqrt(rng.x);
float pointAngle = rng.y * 2.0f * c_pi;
vec2 diskPoint = vec2(pointRadius*cos(pointAngle), pointRadius*sin(pointAngle));

Note c_lightRadius in the above. That value is the perceptual size of the circle in the sky. Like if you saw a photograph of the sky, this would be the radius of the circle of the sun if that photograph was 1 world unit away (the radius is in world space units under those strange viewing conditions). In the shadertoy demo I use a value of 0.1.

So now we have a uniform random point on a circle but what do we do with it? Well, to make it an actual point on the sun, we need a tangent basis so we have an X, Y and Z coordinate system to put this circle into.

Here is some GLSL to do that.

vec3 lightTangent = normalize(cross(c_lightDir, vec3(0.0f, 1.0f, 0.0f)));
vec3 lightBitangent = normalize(cross(lightTangent, c_lightDir));

This code assumes that the light direction is never going to point straight up. If you want to remove that assumption, you could switch to a different “up” vector whenever the light direction gets too close to straight up. Also, c_lightDir is assumed to be normalized.

So now that we have a uniform random point on a circle, and we have a tangent basis aka coordinate system for that circle, it’s time to make a world space target for the ray to shoot at, so we can get a ray direction.

This part is pretty simple. We just pretend that the circle is 1 unit away and use the coordinate axes to convert the point on the circle into a world space point. From there, we subtract the ray’s position from the ray target position and normalize that vector, to get the ray direction.

vec3 rayTarget = rayPos + c_lightDir + diskPoint.x * lightTangent + diskPoint.y * lightBitangent;
vec3 shadowRayDir = normalize(rayTarget - rayPos);

Looking at that code, you might notice we are adding in the ray position, just to subtract it out again. We can skip a step and do this instead of the code above:

vec3 shadowRayDir = normalize(c_lightDir + diskPoint.x * lightTangent + diskPoint.y * lightBitangent);

If you do this, you do in fact get a penumbra. This is what made the image labeled “white noise raw” in the image at the top of this section, using 16 shadow rays per pixel.

Quick note about perf: 16 shadow rays per pixel is kind of a lot. One way to help perf would be to say “if the first 4 rays all agree as to whether we are in shadow or not, just return that answer without doing the 12 other rays”. That would make this code cheaper everywhere except when in the penumbra. Another way to help perf would be to do shadows at a lower resolution. Another way would be to use fewer rays, but filter spatially and/or temporally (over time) to hide that fact. I’ve also heard of people making low resolution shadow maps and skipping the raytracing for pixels very clearly nowhere near a penumbra.

Spherical Directional Light : Blue Noise Over Space, Low Discrepancy Over Time

Being the blue noise zealot I am, white noise has a strong stink to me, and I know that if white noise is being used, almost certainly better results can be had by using either blue noise or low discrepancy sequences instead.

We are actually going to use blue noise in 2 different ways to improve the shadows. The first way is that instead of using white noise (uncorrelated) uniform random numbers on the circle, we are going to use blue noise (negatively correlated) uniform random numbers on the circle.

White noise clumps together, making semi redundant samples, and also leaves big holes between samples, making for larger unknown areas. Blue noise, on the other hand, gives randomized, but roughly evenly spaced, samples. (More info: What the heck is blue noise?)

I have some blue-noise-in-a-circle points that I generated using Mitchell’s best candidate algorithm, but when generating candidates, I made sure they were all inside a circle before doing the rest of the logic (aka I used rejection sampling inside of MBC). I also made it not calculate distances toroidally, since wrap around doesn’t make sense in this context. The points are also generated in [-1,1] instead of the usual [0,1].

For details of Mitchell’s best candidate algorithm check out my blog post: Generating Blue Noise Sample Points With Mitchell’s Best Candidate Algorithm

Below is that list of points I generated. These blue noise points are progressive, meaning that if you use the first N, regardless of what N is, you will have blue noise. So, this supports up to 64 samples, or any lower number of samples.

const vec2 BlueNoiseInDisk[64] = vec2[64](
    vec2(0.478712,0.875764),
    vec2(-0.337956,-0.793959),
    vec2(-0.955259,-0.028164),
    vec2(0.864527,0.325689),
    vec2(0.209342,-0.395657),
    vec2(-0.106779,0.672585),
    vec2(0.156213,0.235113),
    vec2(-0.413644,-0.082856),
    vec2(-0.415667,0.323909),
    vec2(0.141896,-0.939980),
    vec2(0.954932,-0.182516),
    vec2(-0.766184,0.410799),
    vec2(-0.434912,-0.458845),
    vec2(0.415242,-0.078724),
    vec2(0.728335,-0.491777),
    vec2(-0.058086,-0.066401),
    vec2(0.202990,0.686837),
    vec2(-0.808362,-0.556402),
    vec2(0.507386,-0.640839),
    vec2(-0.723494,-0.229240),
    vec2(0.489740,0.317826),
    vec2(-0.622663,0.765301),
    vec2(-0.010640,0.929347),
    vec2(0.663146,0.647618),
    vec2(-0.096674,-0.413835),
    vec2(0.525945,-0.321063),
    vec2(-0.122533,0.366019),
    vec2(0.195235,-0.687983),
    vec2(-0.563203,0.098748),
    vec2(0.418563,0.561335),
    vec2(-0.378595,0.800367),
    vec2(0.826922,0.001024),
    vec2(-0.085372,-0.766651),
    vec2(-0.921920,0.183673),
    vec2(-0.590008,-0.721799),
    vec2(0.167751,-0.164393),
    vec2(0.032961,-0.562530),
    vec2(0.632900,-0.107059),
    vec2(-0.464080,0.569669),
    vec2(-0.173676,-0.958758),
    vec2(-0.242648,-0.234303),
    vec2(-0.275362,0.157163),
    vec2(0.382295,-0.795131),
    vec2(0.562955,0.115562),
    vec2(0.190586,0.470121),
    vec2(0.770764,-0.297576),
    vec2(0.237281,0.931050),
    vec2(-0.666642,-0.455871),
    vec2(-0.905649,-0.298379),
    vec2(0.339520,0.157829),
    vec2(0.701438,-0.704100),
    vec2(-0.062758,0.160346),
    vec2(-0.220674,0.957141),
    vec2(0.642692,0.432706),
    vec2(-0.773390,-0.015272),
    vec2(-0.671467,0.246880),
    vec2(0.158051,0.062859),
    vec2(0.806009,0.527232),
    vec2(-0.057620,-0.247071),
    vec2(0.333436,-0.516710),
    vec2(-0.550658,-0.315773),
    vec2(-0.652078,0.589846),
    vec2(0.008818,0.530556),
    vec2(-0.210004,0.519896) 
);

Just making that change, we get this strange result:

The problem is that every pixel is using the same blue noise sample sequence, and there are only 16 samples. That means there are 16 overlapping shadows for each sphere basically. Blue noise has done a good job in that those shadows are pretty different from each other, but the shadows aren’t good enough yet.

Now we need to bring in the second blue noise. We are going to tile a blue noise texture in screen space (you can get one here: http://momentsingraphics.de/BlueNoise.html), so that we can get a “blue noise random value” per pixel that is between 0 and 1. We are going to multiply that value by 2 * pi and use that as an angle for how to rotate our 2d blue noise samples.

Each pixel will have a different rotation amount, and each pixel will use a rotation that is very different from the rotation of neighboring pixels. This is due to blue noise textures having that property: pixel values are very different from neighbor pixel values.

A nice optimization here is that since all samples are being rotated by the same amount for a pixel, you can calculate the cosine and sine of the angle outside of the loop for shooting shadow rays, and just re-use them inside that loop for rotation.
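
To make that concrete, here is a small C++ sketch of the idea (mirroring what the shader does, not the shadertoy’s exact code); blueNoiseValue is assumed to be the tiled blue noise texture value for this pixel, in [0,1).

#include <math.h>

struct Vec2 { float x, y; };

void RotateDiskPoints(Vec2* points, int count, float blueNoiseValue)
{
    const float c_twopi = 6.28318530718f;
    float theta = blueNoiseValue * c_twopi;

    // cos and sin only need to be computed once per pixel, not once per sample
    float cosTheta = cosf(theta);
    float sinTheta = sinf(theta);

    // rotate every blue noise disk point by the same per pixel angle
    for (int i = 0; i < count; ++i)
    {
        Vec2 p = points[i];
        points[i].x = p.x * cosTheta - p.y * sinTheta;
        points[i].y = p.x * sinTheta + p.y * cosTheta;
    }
}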

If we do that, we end up with the image on the right labeled “blue noise raw”. Compare it vs “white noise raw” to see how the noise in the penumbra is way less noticeable. Same number of samples, same amount of computational complexity.

We aren’t done yet though… we have to now consider the axis of time, before we can declare victory.

Cutting right to the chase, we are going to add frameNumber * goldenRatio to the blue noise value and fract it to bring it back into the 0 to 1 range. We want to do that before we multiply by 2 * pi to make it an angle.

If we do that, the blue noise value for each pixel becomes a low discrepancy sequence over time. It damages our blue noise over space property a little but it is a net win.
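
Here is a small C++ sketch of that “animate the blue noise” step (assumed names, not the shadertoy’s exact code):

#include <math.h>

// turns a pixel's blue noise texture value into a value that is also
// low discrepancy over time, by adding frameNumber * golden ratio and fract'ing
float AnimateBlueNoise(float blueNoiseValue, int frameNumber)
{
    const float c_goldenRatio = 1.61803398875f;
    float value = blueNoiseValue + float(frameNumber) * c_goldenRatio;
    return value - floorf(value); // fract()
}

The result of that function is what gets multiplied by 2 * pi to become the per pixel rotation angle.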

For a deeper discussion about this topic, check out these posts:
Animating Noise For Integration Over Time
Animating Noise For Integration Over Time 2: Uniform Over Time

The short answer to why this is better is that the distribution of a pixel’s value over time for shorter time durations is going to be closer to the actual distribution of the pixel. This is in contrast to white noise which does have the right average over larger sample counts, but for shorter sample counts may clump values together, and leave voids of unseen values.

In short, this makes the pixel more temporally stable, which is great for TAA and other temporal filtering/accumulation methods, but also is nicer even when viewing it without temporal filtering. You can turn on and off the ANIMATE_NOISE define in the shader and look at the “blue noise raw” panel to see how it looks when animated vs not animated without filtering.

Here are the final results, and the link to the shadertoy again.

Shadertoy: https://www.shadertoy.com/view/3sfBWs

By the way, noise is way more noticeable when it’s high contrast with the things around it. For instance, here’s the same image as the above again, but the ambient lighting is 10 times brighter (0.5, instead of 0.05). It’s way harder to see the noise, even the white noise!

Spherical Positional Light

With spherical directional lights explained, the hard part is done!

You only need to make 2 modifications to your code to get from spherical directional lights to spherical positional lights.

The first change is that instead of c_lightDir being a constant, wherever you use that vector, you should instead use the normalized vector pointing from the pixel being shaded to the position of the light (the center of the light).

The second change is that c_lightRadius isn’t constant either anymore. Unlike the sun, which doesn’t noticeably change size as you move around the world, spherical positional lights DO noticeably change size. As you get farther away from the light, the 2d projected circle for that light gets smaller. As you get closer, the circle gets larger.

The formula is actually real simple. If c_lightRadius is the radius of the sphere, and dist is the distance from the shaded pixel to the center of the light, the actual radius you should use for the circle when doing the rest of the “spherical directional light” logic is just: c_lightRadius / dist.
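
Here is a small C++ sketch of those two changes (assumed names, not the shadertoy’s exact code):

#include <math.h>

struct Vec3 { float x, y, z; };

// change #1: the light direction is now per pixel, pointing from the shaded
// point toward the center of the light
Vec3 DirectionToLight(Vec3 shadedPos, Vec3 lightPos)
{
    Vec3 toLight = { lightPos.x - shadedPos.x, lightPos.y - shadedPos.y, lightPos.z - shadedPos.z };
    float len = sqrtf(toLight.x * toLight.x + toLight.y * toLight.y + toLight.z * toLight.z);
    return Vec3{ toLight.x / len, toLight.y / len, toLight.z / len };
}

// change #2: the apparent (projected) radius of the light shrinks with distance
float ProjectedLightRadius(float sphereRadius, float distToLightCenter)
{
    return sphereRadius / distToLightCenter;
}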

Making those changes, you can get this:

Shadertoy: https://www.shadertoy.com/view/ts2BRh

Spherical Spot Light

Now that you have a spherical positional light, it’s pretty easy to modify it into a spherical spotlight.

The first thing to do is to re-introduce a light direction. This light direction is the direction that the spot light is shining in. We are going to dot the vector from the light to the pixel with this light direction, which tells us what angle the pixel is at relative to the light, and so whether it’s inside the spot light’s cone or not.

The next thing to do is to define a “cosine theta inner” and a “cosine theta outer”. When the dot product from the last paragraph is less than “cosine theta outer”, there is no light. If it’s greater than “cosine theta inner”, it’s fully lit. Between those values we want it to fade from lit to unlit, and for that I like to use smoothstep to do a non-linear fade. Here’s some GLSL code that does this:

vec3 lightDir = normalize(c_lightPos - hitPos);
float angleAtten = dot(lightDir, -c_lightDir);
angleAtten = smoothstep(c_cosThetaOuter, c_cosThetaInner, angleAtten);
lightColor *= angleAtten;

The rest of the code runs the same as a Spherical Positional Light, and you get a result like this:

Shadertoy: https://www.shadertoy.com/view/tsjfRh

Summary

This post showed how to use blue noise in two different ways for raytracing soft shadows.

  1. Blue noise sample points in a circle were used to sample a circle representing the light. Using blue noise here results in less error than white noise.
  2. A blue noise texture was used to rotate those 2d sample points per each pixel to make the resulting error be screen space blue noise, which looks better, is harder to detect, and is filtered more easily, compared to white noise, even for the same amount of error.

An important note here is that while blue noise converges at the same rate as white noise (but starts with lower error), low discrepancy sequences can converge quite a bit faster than white noise or blue noise.

If we could use a low discrepancy sequence for sampling the circle in step 1 above and converge so that there was no error, we could be done and wouldn’t need to do step 2.

Unfortunately, low discrepancy sequences have aliasing and so have worse looking error than blue noise before they converge. For the ray sample counts we have in budget right now for real time rendering, we definitely can’t afford to converge, even with an LDS, so that is why we reached for blue noise.

This is important to know, because for non real time rendering use cases, you are likely better off with an LDS than with blue noise. Blue noise is just a neat trick you can do when you are really strapped for samples (aka can’t afford many rays).

One last thing – it turns out that sampling the projected 2d circle for 3d spherical lights is not quite the correct thing to do. It’s a decent approximation but for more information, check out this link (and note, you can apply blue noise the same way with better spherical light sampling):

https://schuttejoe.github.io/post/arealightsampling/

Happy raytracing and feel free to hit me up on shadertoy or on twitter if you have any questions @Atrix256

Ray Marching Fog With Blue Noise

The shadertoy that goes with this post is at: https://www.shadertoy.com/view/WsfBDf

I talk about blue noise a lot more often than I show usage cases of blue noise. Ray marching fog is a great usage case of blue noise that I haven’t shared yet.

Make sure and click the images in this post and view them at their full size to best see the noise. Blue noise tends to melt in thumbnail images, due to the low pass filtering involved in making an image smaller.

Real quick before going into it, here are some other usage cases of blue noise with corresponding shadertoys.

So here’s an algorithm for ray marching fog, which can give some simple scattering effects, like crepuscular lighting aka “god rays”.

  1. Render normally.
  2. Take N steps down the ray for the pixel starting from the camera, going to the depth value for the pixel.
  3. At each step, look in a shadow map to see if that world position is in shadow or not. Calculate the percentage of steps that were in shadow.
  4. Use this percentage to lerp between an “unlit fog color” and a “lit fog color”, use that as the fog color for that pixel.
  5. Use the usual distance fog calculations for that pixel.
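
Here is a C++ sketch of those steps (a sketch with assumed names, not the shadertoy’s code). The noise01 parameter is the per pixel jitter described a little further down, and inShadow stands in for whatever shadow map or visibility test you have at a distance t down the pixel’s ray.

float FogLitPercent(float startDist, float pixelDepth, int numSteps,
                    float noise01, bool (*inShadow)(float))
{
    // jitter the ray start by a fraction of one step to break up banding
    float stepSize = (pixelDepth - startDist) / float(numSteps);
    float t = startDist + noise01 * stepSize;

    // count how many steps could see the light
    int litCount = 0;
    for (int i = 0; i < numSteps; ++i, t += stepSize)
    {
        if (!inShadow(t))
            ++litCount;
    }
    return float(litCount) / float(numSteps);
}

The caller then lerps between the unlit and lit fog colors by that percentage, and applies the usual distance fog for the pixel.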

Here is an image of doing the algorithm above, doing 256 steps.

If you decrease the step count to 16, the algorithm gets a lot faster, but you get much worse results that look like this:

That looks real bad, but you can break up banding by using noise. When you do the ray marching, you can roll a random number (white noise) for each pixel, and use that 0 to 1 value as a percentage of “one step distance” to push the ray start down the ray. Doing that, you get this result.

That result is better than the banding, but is pretty ugly. Using a tiled screen space blue noise texture as a source of the random numbers instead gives you this result, which is lots better.

As a bonus, here’s a result using Jorge Jimenez’s “Interleaved Gradient Noise” which is designed to work best when used with TAA that uses the “3×3 neighborhood history rejection” method.

Other Notes

There is a shadertoy that shows all this at https://www.shadertoy.com/view/WsfBDf

The step count used in the ray march defines how many “shades” there are between the lit and unlit fog. For instance, 2 steps will only let the lit percentage be: 0%, 50%, 100%. That only gives 3 shades of fog. In the images above where 16 steps were used, it gives 17 different shades (it’s always numsteps+1). If you drop the count lower, the noise will have fewer shades and be more noticeable in all cases.

If you use PCF sampling of the shadow map (if doing this in a raster setup), the fog light shafts get softer edges, which can look real nice, if you want that look.

If this technique looks interesting to you, you should also give this a read, for making better fake fog: https://www.iquilezles.org/www/articles/fog/fog.htm

These still images tell only part of the story. If you animate the noise well over both space AND time, where a single frame has good noise patterns, but each pixel is also blue or low discrepancy over time, you’ll have even better results. The shadertoy has a define, ANIMATE_NOISE, that you can turn on and off to see the difference. The result is better raw, as the shadertoy shows it, but it’s also better under temporal filtering, because the value temporally converges to something closer to the correct value, with less flickering.

The blue noise is animated by adding the golden ratio * the frame number to the blue noise texture value and using fract to bring it back between 0 and 1. Repeatedly adding the golden ratio to any starting value, and fract’ing it, will make a progressive low discrepancy sequence, which is exactly what we want. Doing this process makes each pixel be low discrepancy over time while the blue noise texture makes the pixels be blue noise (randomized low discrepancy) over space. Unfortunately, this damages the blue noise over space a bit, but it is a net win.

Interleaved gradient noise in this shadertoy is animated by scrolling the texture on each axis by 5.588238 pixels each frame. This is a value Jorge Jimenez (the maker of IGN) found through much manual effort, to try and find a scroll amount that made pixels be low discrepancy sequences. He hasn’t published this information but said I was free to share it so long as I gave him credit. Thanks Jorge!

For more info on animating noise, check out these two posts of mine: Animating Noise For Integration Over Time, and Animating Noise For Integration Over Time 2: Uniform Over Time.

The sampling in this setup has four dimensions, but isn’t vanilla four dimensional. It uses blue noise for the 2 dimensions of screen space, it uses low discrepancy sampling for the dimension of time, but it uses regular spaced samples on the dimension of distance down the ray.

I believe there are at least a couple different better ways to sample this setup but I’m not 100% sure what they are, or what the ideal one would be.

The “regular spaced samples over distance” seem like something that would be nice to look at fixing, but it’s also possible that a triangular distributed blue noise could be better for the blue noise over space.

For another rabbit hole to go down, check out this very related presentation slide deck.
“Low Complexity, High Fidelity – INSIDE Rendering”
https://www.gdcvault.com/play/1023002/Low-Complexity-High-Fidelity-INSIDE

Guess a Number Between 1 and 10

When you were a kid and had a gumball to share but 2 of your friends wanted the gumball, what would you do?

When I was a kid, and now that I have two kids of my own, a common way to resolve disputes like this is to say “Pick a number between 1 and 10 and whoever is closer gets the gumball”.

This works with any number of people and with any number of rewards because you can say the N closest people out of M people total get a reward. Ties can be broken by repeating the process.

It SEEMS like a purely random choice too, doesn’t it?

Well, if you want to win, the right move is to always guess 5 or 6, actually. It makes sense if you think about how 5.5 is right in the middle, so 5 and 6 are the numbers least far from all the other numbers.

If you are only playing against one other person, and both the real number and the guess are really random (uniform independent random numbers), guessing 5 will let you win 55% of the time, and tie 14% of the time. That means you will lose only 31% of the time, or less than one out of three times!

There is a way to make this game fair again though, and that is by letting the numbers wrap around between 1 and 10 as if they were on a circle instead of a number line.

So, if the right answer is 1, and you guess 10, without wrap around, the distance would be 9. With wrap around, the distance would be 1.

You calculate wrap around distance by saying “if the difference is greater than 5, then the difference is actually 10 minus that number”.
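
Here is a small C++ sketch of that wrap around distance:

#include <stdlib.h>

// distance between two guesses on the 1..10 "circle"
int WrapDistance(int a, int b)
{
    int diff = abs(a - b);
    return (diff > 5) ? 10 - diff : diff;
}

For example, WrapDistance(1, 10) is 1 instead of 9.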

Playing that way, it puts all guesses on equal footing. Every number acts like a 5, because every number is 5 or less away from every other number.

In that scenario, guessing a 5 will cause you to win 41% of the time, tie 18% of the time, and lose 41% of the time.

The game is fair again, as is shown experimentally below.

The code that generated this output is at:
https://github.com/Atrix256/GuessNumberOneTen

Another way kids have to solve these sorts of situations is through “inka binka” and similar, where there is a song and a deterministic process to pick a winner.

My 5 year old already realized the same position wins every time, which I was pretty proud of hehe.

This sort of stuff is pretty neat though and makes me think people out there must be studying childhood computational sociology 😛

Basic Methods For Finding Zeroes and Mins / Maxes of Functions

There are ~500 lines of simple standalone C++ code that go with this post; it generated all of the data used and implements the things talked about. You can find it at: https://github.com/Atrix256/BasicRootFinding

Simple equations can be solved using algebra without too much fuss:

3x-6=0 \\ 3x=6 \\ x=2

You can even solve more complicated equations by cleverly using identities, the quadratic equation, and other mathematical tools you might have in your tool belt.

Some equations are beyond our ability to solve them using simple algebra though, like high degree polynomials:

3x^5+2x^2-20x+2=0

We still can find where functions like that equal zero, but we can’t do it analytically; we have to do it numerically.

That is, we do things like roll a ball down hill and hope we find a zero.

(Apparently there are some analytical solutions to be had from that function. I don’t know how to get them though, so would have to resort to numerical methods!)

This post talks about a few methods for finding zeroes of equations we can’t solve analytically, and talks about how to use those methods to find minimum and maximum values of functions as well.

The Basic Idea

Let’s say I tell you that the value of a function at a specific point is “3” and that for every step to the right you take, it subtracts 1 from that value.

Then I ask: “Where does this function equal zero?”

If you answered “three steps to the right” you are correct, and guess what – you’ve just used Newton’s method for root finding.

There is an asterisk here though. The slope (how the function value changes as you take a step to the right, also called the derivative) was constant in the above example, but it is not going to be constant in the functions we want to solve. That slope is going to change depending on where you are on the graph.

To deal with this, Newton’s method takes the step it thinks it should take, and then asks again for the value and slope (derivative) and takes another step.

The hope is that with a good initial guess, Newton’s method will find a zero without too much effort.

We’re going to get a little more formal, while also showing some results, but all the rest of what we are going to talk about is based on Newton’s method (or is even simpler).

Newton’s Method

We’ve seen this informally, but formally, a single step of Newton’s method looks like this:

x = x - \frac{f(x)}{f'(x)}

That is, we divide the y value at our current point by the derivative at our current point, and subtract that from our current x to get our new x.
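
Here is a minimal C++ sketch of that update loop (not the code from the repo), using the first test function from later in the post, y=x^2-1:

#include <stdio.h>

float Newton(float x, int numSteps)
{
    for (int i = 0; i < numSteps; ++i)
    {
        float f = x * x - 1.0f;   // f(x)
        float fPrime = 2.0f * x;  // f'(x)
        x = x - f / fPrime;
    }
    return x;
}

int main()
{
    printf("%f\n", Newton(0.5f, 8)); // converges toward the root at x = 1
    return 0;
}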

Bullet Points:

  • Requires f(x) and f'(x)
  • Requires a “good guess” for an initial x value
  • Converges Quadratically

More info: https://en.wikipedia.org/wiki/Newton%27s_method

Secant Method

What if you don’t have the analytical first derivative to your function and still want to use Newton’s Method?

Well, you can use finite differences to get the first derivative: move the x over a little bit, see how the y value changes over distance, and use that as your numerical derivative, instead of the analytical one. (A blog post of mine on finite differences: https://blog.demofox.org/2015/08/02/finite-differences/)

If you do that, you are now using the secant Method.

m = \frac{f(x+e)-f(x-e)}{2e}

x = x - \frac{f(x)}{m}

The above uses “central differences” to calculate the first derivative numerically. e is a tuneable parameter, but should be small (without triggering numerical issues) – perhaps 0.01.
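
A minimal C++ sketch of one step of this (not the repo code):

// one secant-style step, using central differences for the derivative.
// f is the function to find a zero of, e is the finite difference step size.
float SecantStep(float (*f)(float), float x, float e)
{
    float m = (f(x + e) - f(x - e)) / (2.0f * e);
    return x - f(x) / m;
}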

Bullet Points:

  • Requires f(x)
  • Requires a “good guess” for an initial x value
  • Converges at a rate of 1.618… (the golden ratio!), so it is slower than Newton’s method, but can be handy if you don’t have the first derivative.

More info: https://en.wikipedia.org/wiki/Secant_method

Halley’s Method (& Householder’s method)

If the first derivative helped us find a zero, surely the second derivative can help us find it even faster, right?

Yep. If you generalize Newton’s method to using the second derivative, you get Halley’s method.

x = x - \frac{2f(x)f'(x)}{2f'(x)^2-f(x)f''(x)}
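
A minimal C++ sketch of one Halley step, straight from that formula (not the repo code):

// f, fPrime, fPrimePrime are f(x), f'(x) and f''(x) evaluated at the current x
float HalleyStep(float x, float f, float fPrime, float fPrimePrime)
{
    return x - (2.0f * f * fPrime) / (2.0f * fPrime * fPrime - f * fPrimePrime);
}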

Bullet Points:

  • Requires f(x), f'(x) and f''(x). You can get the derivatives numerically but it will slow down convergence.
  • Requires a “good guess” for an initial x value
  • Converges cubically (fast!!)

Both Newton and Halley are part of a larger family of methods called “Householder’s Method” which generalizes Newton’s method to any order derivative.

More info:
https://en.wikipedia.org/wiki/Halley%27s_method
https://en.wikipedia.org/wiki/Householder%27s_method

Bisection

Bisection is simpler than the other methods. You start with a left x and a right x, with the requirement that the signs of the y values for these x’s have to be different (one positive, one negative) which means that a zero must be between the two x’s.

The algorithm itself is just a binary search, where you look at the value in the middle of the left and right x, and update the left or right x to be the middle, based on the sign of the y value at the middle.
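
A minimal C++ sketch (not the repo code); f(left) and f(right) are assumed to have opposite signs:

float Bisect(float (*f)(float), float left, float right, int numSteps)
{
    for (int i = 0; i < numSteps; ++i)
    {
        float mid = (left + right) * 0.5f;
        // keep whichever half still has the sign change (and so the zero) in it
        if ((f(left) < 0.0f) == (f(mid) < 0.0f))
            left = mid;
        else
            right = mid;
    }
    return (left + right) * 0.5f;
}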

Bullet Points:

  • Requires f(x)
  • Requires a min x and a max x that contains a zero between them, and the y values at those x’s have to have opposite signs
  • Converges Linearly (slow!)

More info:
https://en.wikipedia.org/wiki/Bisection_method

Experimental Results: y = x^2 - 1

Here’s the first equation: y=x^2-1

You can see visually that there’s a zero at x=-1 and at x=1.

Here’s how the various methods did at finding the roots. The x axis is how many steps were taken and the y axis is the absolute value of y at that step, which also happens to be the error, since we are finding zeros.

Halley is the winner, which is unsurprising with its cubic convergence rate, then Newton with its quadratic convergence rate, then secant with its golden ratio convergence rate, and bisect comes in last after a strong start, with its linear convergence rate. Our experimental results match the theory, neat!

The values in the legend next to the result name specify what parameters were used. For Newton and Halley, the value shown is the initial guess. For secant, x is the initial guess and px is the previous initial guess. For bisect it shows the range given to the bisection algorithm.

Choosing different parameters for those can drastically affect performance of the algorithms. Here are some less well chosen values shown along with the previous results. The previous results are the blue shades.

Bisection is now so bad that it dwarfs everything else. Let’s remove that.

Removing bisection, the worst data point drops from 2500 down to 100 for all the “more poorly chosen parameters”, which are orange, yellow and green. The previous values are the blues and the grey line, which basically look flat.

With these worse parameters we still see Halley winning, Newton coming in second, secant coming next, and then bisection, but the performance is much worse across the board. Choosing these parameters well is important for getting good performance!

Experimental Results: y = sin(x)

Here’s the next equation: y=sin(x)

Here is the convergence for each technique:

You can’t see it, but the orange, grey and blue are all basically overlapping. Looking at the actual data below though, you can see that the winners are in the same order again.

Experimental Results: y = x^2-x-1

This is a fun equation, because a zero is at the golden ratio (and the other is at -1/goldenRatio): y=x^2-x-1

Here is the convergence graph:

An interesting thing happens here if we choose to start at the value “0.5” for Newton and Halley. The dark blue line for Newton is under the grey line for Halley which is flat and doesn’t go up and down. The reason this happens is that the first derivative is 0 at 0.5 so the algorithm doesn’t know which direction to go and gets stuck.

Using other initial guess values causes it not to get stuck, but you can see that Newton and secant starting at 0.75 have a pretty big jump in error in the beginning.

Error Case: x^2+1

For the next equation we have: y=x^2+1

This graph has no zeroes. Our bisect bounds also can’t meet their requirement of having one negative and one positive y value, since this graph is positive everywhere. That means we don’t have any results for bisect on this one.

Here’s the convergence graph:

Newton goes absolutely off the rails for a few samples. Let’s remove it to see what the others are doing:

Secant goes real bad at the end, so let’s remove it to see that Halley is pretty well behaved despite there not actually being a root to find:

Ray vs Sphere

Let’s do something more interesting… let’s use these methods to find where a ray intersects a sphere. We can do this analytically (there is a formula to calculate this!) but we can use this as a simpler example of a more complex usage case.

We need a formula that is zero where a ray hits a sphere, and ideally we need to be able to calculate its first and second derivative analytically.

I fought with this for a bit but came up with a solution.

Here are the variables involved:

  • rayPos and rayDir, both are vec3’s
  • spherePos which is a vec3 and a sphereRadius which is a scalar
  • t – how far down the ray we are, and is a scalar

We are going to make the sphere center be the origin by subtracting it out of the ray position (then we don’t need spherePos anymore), so then all we have to find is where the magnitude of the vector from the origin to the ray position at time t is the sphere radius. To simplify calculating the derivatives, we are going to square both sides of the equation.

f(t) = MagnitudeSquared(rayPos + rayDir * t) - sphereRadius^2

If we can find where that equation is zero, that will tell us the time t that the ray hits the sphere, and we can get the actual intersected position (then normal, etc) from that.

For a 3d vector, Magnitude squared is just the below which is the distance equation without the square root.

MagnitudeSquared(P) = P.x^2+P.y^2+P.z^2

If you expand f(t) out for just the x axis to get the pattern per axis, you get:

rayPos.x^2+2*rayPos.x*rayDir.x*t+rayDir.x^2*t^2 - sphereRadius^2

You would add a similar thing for y and z – but not do the sphere radius part because it’s already handled above.

We can take the derivative of the above with respect to t, to get:

2*rayPos.x*rayDir.x+2*rayDir.x^2*t

So the first derivative is that equation, but also adding in the y and z axes:

f'(t) = 2*rayPos.x*rayDir.x+2*rayDir.x^2*t + 2*rayPos.y*rayDir.y+2*rayDir.y^2*t + 2*rayPos.z*rayDir.z+2*rayDir.z^2*t

That looks a bit like spaghetti but is way cleaner in code with a loop iteration for each axis 😛

We can then take the derivative of that first derivative to get the second derivative (still, with respect to t). That gives us this:

f''(t) = 2*rayDir.x^2 + 2*rayDir.y^2 + 2*rayDir.z^2
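
Here is a C++ sketch of those three equations, with the per axis loop mentioned above (a sketch, not the repo code). rayPos is assumed to already have the sphere center subtracted out.

struct Vec3 { float x, y, z; };

void RaySphereDerivatives(Vec3 rayPos, Vec3 rayDir, float sphereRadius, float t,
                          float& f, float& fPrime, float& fPrimePrime)
{
    f = -sphereRadius * sphereRadius;
    fPrime = 0.0f;
    fPrimePrime = 0.0f;

    float pos[3] = { rayPos.x, rayPos.y, rayPos.z };
    float dir[3] = { rayDir.x, rayDir.y, rayDir.z };
    for (int axis = 0; axis < 3; ++axis)
    {
        // per axis: p^2 + 2*p*d*t + d^2*t^2, and its first two derivatives in t
        f           += pos[axis] * pos[axis] + 2.0f * pos[axis] * dir[axis] * t + dir[axis] * dir[axis] * t * t;
        fPrime      += 2.0f * pos[axis] * dir[axis] + 2.0f * dir[axis] * dir[axis] * t;
        fPrimePrime += 2.0f * dir[axis] * dir[axis];
    }
}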

Ok cool, now that we have our equations, I used (0,0,0) for the ray position, (0,0,1) for the ray direction, (0.2, 0.3, 5.0) for the sphere position, and a sphere radius of 2.0.

Let’s see how our methods did at finding the intersection time of the ray vs the sphere.

Once again, Halley wins, Newton comes in second, then secant, then bisect. Neat!

Finding Function Mins & Maxes

For being one of the headlines of the blog post title, this is actually a really simple thing to explain now that you know the rest.

The minimum and maximum of a function – by definition! – has a derivative (slope) of zero, because it’s right where the function goes flat. The function has either climbed to the top of a mountain, gone flat, and is heading back down, or it has gone to the bottom of a valley, gone flat, and is heading back up. There is actually also a third option called a saddle point, where the function pauses its ascent or descent momentarily and then continues.

In any case, to find the minimum or maximum of a function, we need to find where its derivative is zero. We do this by doing root finding on its first derivative.

From there, we can plug that x value found into the original f(x) function to get our extrema value.

The code that goes with this blog post uses this technique to find the maximum value for the function y=-x^2+x+1.

It finds that the derivative (y=-2x+1) is zero at x=0.5. Plugging 0.5 into the original function for x gives us a value of 1.25. That does look like the correct x and y value for the maximum of the function, doesn’t it?
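
Here is a C++ sketch of that example (not the repo code): we run Newton’s method on the derivative, so the “function” whose zero we want is y'=-2x+1, and its own derivative is the constant -2.

#include <stdio.h>

int main()
{
    float x = 0.0f; // initial guess
    for (int i = 0; i < 8; ++i)
    {
        float fPrime = -2.0f * x + 1.0f; // first derivative of y = -x^2+x+1
        float fPrimePrime = -2.0f;       // second derivative
        x = x - fPrime / fPrimePrime;
    }
    float y = -x * x + x + 1.0f;
    printf("max at x = %f, y = %f\n", x, y); // prints x = 0.5, y = 1.25
    return 0;
}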

Links

The code that goes with this blog post can be found at: https://github.com/Atrix256/BasicRootFinding

A good 10 minute video on Newton’s Method

Here’s a way to get an analytic derivative of a function without having to do it symbolically. Dual numbers are a form of automatic differentiation, just like backpropagation is.
https://blog.demofox.org/2014/12/30/dual-numbers-automatic-differentiation/

You can extend this stuff to higher dimensions using the gradient instead of the derivative, and maybe a Jacobian matrix or Hessian matrix for higher derivatives. That might make a good future blog post.

We actually do something similar to Newton’s method when doing sphere tracing (ray marching with a distance estimate). We divide the distance value at a point in space (the y value, or f(x)) by the length of the gradient (which is basically the slope aka generalized derivative) to know that a surface can’t be any closer than that value, so we can step that length in whatever direction our ray wants to move. This also has applications to drawing 2d graphics. You can read about that here:
https://www.iquilezles.org/www/articles/distance/distance.htm