Calculating the Distance Between Points in “Wrap Around” (Toroidal) Space

Let’s say you are trying to find the distance between two points in 2D, but that these points are in a universe that “wraps around” like old video games – leaving the screen on the right, left, top or bottom side makes you re-appear on the opposite edge.

This universe is actually shaped like a toroid, also known as a doughnut. Strictly speaking it’s a “flat torus”, which can’t be built in 3D space without distorting distances, so not exactly a doughnut, but whatever.

If you imagine yourself on the surface of a doughnut, it would behave exactly this way. If you go “down” you end up where you previously considered “up”. If you go far enough “left” you end up where you previously considered “right”.

How would you calculate the distance between two points in a universe like this?

Let’s imagine the situation below where we are trying to find the distance between the red point and the green point:

One way to do this would be to pick one of the points (I’m picking red in this case) and clone it 8 times to surround the cell, as shown below. You’d calculate the distance from the green point to each of the 9 red points, and whichever distance was smallest would be the answer.

Something not so desirable about this is that it takes 9 distance calculations to find the minimum distance. You can work with squared distances instead of regular distances to avoid a square root on each of these distance calculations, but that’s still a bit of calculation to do.

Going up in dimensions makes the problem even worse. In 3D, it requires 27 distance calculations to find the shortest distance, and 81 distance calculations in 4D!

Luckily there’s a better way to approach this.

Let’s say that our universe (image) is 1 unit by 1 unit big (aka we are working in texture UVs). If you look at the image with 9 copies of the red dot, you can see that they are just the 9 possible combinations of having -1, +0, +1 on each axis added to the red dot’s coordinates. All possible combinations of the x and y axis having -1, +0 or +1 added to them are valid locations of the red dot.

Looking at the distance formula we can see that if we minimize each axis individually, that we will also end up with the minimal distance overall.

d = \sqrt{(x_2-x_1)^2+(y_2-y_1)^2}

So, the better way is to minimize each axis individually.

On the x axis you’d check whether the 1D distance between the red and green points is smallest when you subtract 1 from the red dot’s x axis position, leave it alone, or add 1.

Whichever x axis value of the red dot gives you the minimal x axis 1D distance is the x axis location to use.

You’d repeat for the y axis to get the y axis location to use (and would repeat for any further axes for higher dimensions).

This gives you the closest point which you can then plug into the distance formula to get the distance between the points in this wrap around space.

You can actually do better though.

Still working on each axis individually, you can calculate the absolute value of the 1D distance between the two points on that axis. If that distance is greater than 0.5, the real distance for that axis is 1 - distance.

The intuition here is that if you are in a 1D repeating space, and going from A to B covers more than half the distance, it means that you went the wrong way, and that going the other way around is shorter. The distance the other way is one minus whatever distance you just calculated, since the two ways around add up to 1, the full size of the axis!

Do that for each axis and use those 1d distances in the distance formula to get the actual distance.

This lets you minimize the distance without having to explicitly figure out which combination makes the point closest.

More importantly, it lets you efficiently calculate the distance between the two points in toroidal space (doughnut space!)

The computational complexity is a lot better. It’s now linear in the number of dimensions: O(N), instead of O(3^N).

Here is some C++ to show you how it would work in 2D.

// Assumes the space is 1 unit by 1 unit and wraps on both axes (texture UV style),
// so all coordinates are in [0, 1). Needs <cmath> for std::abs and std::sqrt.
float ToroidalDistance (float x1, float y1, float x2, float y2)
{
    float dx = std::abs(x2 - x1);
    float dy = std::abs(y2 - y1);

    // if the straight distance on an axis is more than half the axis,
    // going the other way around that axis is shorter
    if (dx > 0.5f)
        dx = 1.0f - dx;

    if (dy > 0.5f)
        dy = 1.0f - dy;

    return std::sqrt(dx*dx + dy*dy);
}
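Since each axis is handled independently, the same idea extends to any number of dimensions, which is where the O(N) cost comes from. Here is a sketch of an N dimensional version, assuming (like above) that every axis wraps with a period of 1:

#include <cmath>
#include <vector>

// Sketch of toroidal distance in any number of dimensions.
// Assumes a and b have the same size and all coordinates are in [0, 1).
float ToroidalDistanceND (const std::vector<float>& a, const std::vector<float>& b)
{
    float sumSquared = 0.0f;
    for (size_t i = 0; i < a.size(); ++i)
    {
        float d = std::abs(b[i] - a[i]);
        if (d > 0.5f)          // the other way around this axis is shorter
            d = 1.0f - d;
        sumSquared += d * d;
    }
    return std::sqrt(sumSquared);
}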

I hit this problem trying to make a tileable texture. I needed to place a few circles on a texture such that the circles weren’t too close to each other, even when the texture was tiled.

The calculations above gave me the basic tool needed to be able to calculate distances between points. Subtracting circle radii from the distance between points let me get toroidal distance between circles and make sure I didn’t place them too closely to each other.

That let me make an image that kept the distance constraints even when it was tiled.
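As a rough sketch of what that check might look like (the Circle struct and the minimum gap parameter are made up for illustration, they aren’t from the original texture generation code):

// Hypothetical circle representation: center in [0,1) UV space plus a radius.
struct Circle { float x, y, radius; };

// Returns true if two circles are at least minGap apart edge to edge in toroidal space,
// by taking the toroidal distance between centers and subtracting both radii.
bool CirclesFarEnoughApart (const Circle& a, const Circle& b, float minGap)
{
    float centerDistance = ToroidalDistance(a.x, a.y, b.x, b.y);
    float edgeDistance = centerDistance - a.radius - b.radius;
    return edgeDistance >= minGap;
}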

Here’s an example image by itself:

Here is the image tiled:

Half Tile Offset Streaming World Grids

A number of years ago I worked on an open world game that got cancelled. Despite it not being released, I learned a few things.

If you try to make a game where the player can walk around a large world without loading screens, chances are that the whole world won’t fit in memory at once.

Since it can’t all fit in memory at once, as the player moves around you are going to have to unload old chunks of the world to make room for the new chunks that the player is about to enter.

To simplify the problem, it’s really common to break the world up into a grid, and keep a radius of tiles loaded around the player at any one time.

In the above you can see that 9 tiles are kept in memory. If the player crosses the boundary to the cell to the left, we unload the cells on the right (red cells) and load new cells in on the left (green cells).

The idea is that we keep a border of cells loaded around the player at all times so that they never see the edge of the world, but we don’t have to keep the whole world in memory at the same time.

The size of the cells can vary depending on the actual needs of your game. If you can travel very quickly in your game, you may need larger cells.

Instead of just having 9 large cells loaded at any one time, you may instead opt to have smaller cells but have more layers of them loaded at any one time. The image below has 2 layers of cells loaded around the player, so it has to keep 5×5 = 25 cells loaded at any one time. This can give you more granularity if you need it.

The number of cells you have to keep in memory when keeping N layers of cells loaded around the player is (2N+1)^2.

1 layer means 9 cells, 2 layers mean 25 cells, 3 layers mean 49 cells, 4 layers mean 81 cells and so on.

This isn’t the only way to arrange your world tiles though. You can also make it so every row of tiles is offset by half a tile from the one above it. That gives you a setup like this:

With this setup, keeping a single layer of cells loaded around the player takes only 7 cells instead of 9. That might not sound like much, but it means that each cell’s memory budget is about 129% of what it was the other way (9/7 ≈ 1.29).

Alternatively, it means you can keep your cells at the same quality level but only have to load about 78% as much stuff from disk as the player moves around the world.

For keeping N layers of cells around the player you need to keep 3N^2+3N+1 cells in memory.

1 layer means 7 cells, 2 layers mean 19 cells, 3 layers mean 37 cells, 4 layers mean 61 cells.

Here’s a table to show you how the regular grid compares to the half offset grid for cell counts. The savings do get better with more layers, but not very quickly.

\begin{array}{c|c|c|c} \text{Layers} & \text{Regular Grid} & \text{Half Offset Grid} & \text{Relative Size} \\ \hline 1 & 9 & 7 & 77.8\%\\ 2 & 25 & 19 & 76\%\\ 3 & 49 & 37 & 75.5\%\\ 4 & 81 & 61 & 75.3\%\\ \end{array}
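If you want to extend the table further, here is a tiny sketch that prints both formulas side by side (just the two cell count formulas from above, nothing more):

#include <stdio.h>

// Prints loaded cell counts for a regular grid, (2N+1)^2, versus a
// half offset grid, 3N^2+3N+1, for N layers around the player.
int main ()
{
    for (int N = 1; N <= 10; ++N)
    {
        int regular = (2 * N + 1) * (2 * N + 1);
        int halfOffset = 3 * N * N + 3 * N + 1;
        printf("%d layers: %d vs %d cells (%0.1f%%)\n",
            N, regular, halfOffset, 100.0f * float(halfOffset) / float(regular));
    }
    return 0;
}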

Overall, this is a pretty cool technique that is pretty low cost if you do this early in the project. Once you have a lot of content divided into a regular grid, it can be a challenge to move over to this half tile offset grid.

Some Notes From Readers

@chrispewebb said that an issue he’s faced when using this method is having T junctions on LODed terrain, but that skirts should be able to help there.

@runevision pointed out that while the memory requirements are lowered, so is the shortest distance (radius) from the player to data that isn’t loaded. One idea to deal with this if it’s a problem could be to use smaller cell sizes and to do more layers to make up for it.

Generating Random Numbers From a Specific Distribution With Rejection Sampling

The last post showed how to transform uniformly generated random numbers into any random number distribution you desired.

It did so by turning the PDF (probability density function) into a CDF (cumulative distribution function) and then inverting it – either analytically (making a function) or numerically (making a look up table).

This post will show you how to generate numbers from a PDF as well, but will do so using rejection sampling.

Dice

Let’s say you wanted to simulate a fair five sided die but that you only had a six sided die.

You can use rejection sampling for this by rolling a six sided die and ignoring the roll any time a six came up. Doing that, you do in fact get a fair five sided die roll!

This shows doing that to get 10,000 five sided die rolls:

One disadvantage to this method is that you are throwing away die rolls which can be a source of inefficiency. In this setup it takes 1.2 six sided die rolls on average to get a valid five sided die roll since a roll will be thrown away 1/6 of the time.

Another disadvantage is that each time you need a new value, there are an unknown number of die rolls needed to get it. On average it’s true that you only need 1.2 die rolls, but in reality, it’s possible you may roll 10 sixes in a row. Heck it’s even technically possible (but very unlikely) that you could be rolling dice until the end of time and keep getting sixes. (Using PRNG’s in computers, this won’t happen, but it does take a variable number of rolls).

This is just to say: this algorithm has uneven and unpredictable execution time, and it needs an unknown (but somewhat predictable) amount of random numbers to work. This is true of the other sampling methods I talk about further down as well.

Instead of using a six sided die you could use a die of any size that is greater than (or equal to…) five. Here shows a twenty sided die simulating a five sided die:

It looks basically the same as using a six sided die, which makes sense (that shows that it works), but in this case, it actually took 4 rolls on average to make a valid five sided die roll, since the roll fails 15/20 times (3 out of 4 rolls will fail).
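Here is a minimal sketch of just the rejection loop on its own (the fuller test code, which also builds the histograms, is at the bottom of this post); the template parameters are only there for illustration:

#include <cstddef>
#include <random>

// Roll an N sided die (returning 0 to N-1) using an M sided die (M >= N),
// by re-rolling any time the result is out of range.
template <size_t N, size_t M>
size_t RollSmallerDie (std::mt19937& rng)
{
    std::uniform_int_distribution<size_t> dist(0, M - 1);
    size_t roll = dist(rng);
    while (roll >= N)      // rejected sample: roll again
        roll = dist(rng);
    return roll;
}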

Quick Asides:

  • If straying from rejection sampling ideas for a minute, in the case of the twenty sided die, you could use modulus to get a fair five sided die roll every time: ((roll - 1) \% 5) + 1. This works because 20 % 5 has no remainder. If there were a remainder, it would bias the rolls towards the numbers <= the remainder, making them more likely to come up than the other numbers. (See the code sketch just after this list.)
  • You could also get a four sided die roll at the same time if you didn’t want to waste any of this precious random information: ((roll - 1) / 5) + 1
  • Another algorithm to check out for discrete (integer) weighted random numbers is Vose’s method: Vose’s Method.
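As a tiny sketch of the first two asides (the function below is hypothetical, just to show the arithmetic):

// Turn one fair d20 roll (1 to 20) into both a fair d5 roll and a fair d4 roll,
// with no rejection, since 20 = 5 * 4 and each (d5, d4) pair maps to exactly one d20 value.
void SplitD20Roll (int d20Roll, int& d5Roll, int& d4Roll)
{
    d5Roll = ((d20Roll - 1) % 5) + 1;  // 1 to 5, each equally likely
    d4Roll = ((d20Roll - 1) / 5) + 1;  // 1 to 4, each equally likely
}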

Box Around PDF

Moving back into the world of continuous valued random numbers and PDF’s, a simple version of how rejection sampling can be used is like this:

  1. Graph your PDF
  2. Draw a box around the PDF
  3. Generate a (uniform) random point in that box
  4. If the point is under the curve of the PDF, use the x axis value as your random number, else throw it out and go to 1

That’s all there is to it!

This works because the x axis value of your 2d point is the random number you might be choosing. The y axis value of your 2d point is a probability of choosing that point. Since the PDF graph is higher in places that are more probable, those places are more likely to accept your 2d point than places that have lower PDF values.

Furthermore, the average number of rejected samples vs accepted samples is based on the area under the PDF compared to the area of the box.

The average number of points you need to generate per accepted sample is the area of the box divided by the area under the PDF.

Since PDF’s by definition have to integrate to 1, that means that you are dividing by 1. So, to simplify: The number of samples on average will be the same as the area of the box!

If it’s hard to come up with the exact size of the box for the PDF, the box doesn’t have to fit exactly, but of course the tighter you can fit the box around the PDF, the fewer rejected samples you’ll have.

You don’t actually need to graph the PDF and draw a box to do this though. Just generate a 2d random number (a random x and a random y) and reject the point if PDF(x) < y.
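Here is a minimal sketch of that loop by itself (the fuller version, which also tracks failure counts per histogram bucket, is in the code section at the end of this post):

#include <random>

// Box rejection sampling: generate uniform 2d points in [0,1) x [0,boxHeight)
// until one lands under the PDF curve, then return its x value.
// Assumes the PDF is defined on [0,1) and never exceeds boxHeight.
template <typename PDF_LAMBDA>
float SampleFromPDF (std::mt19937& rng, float boxHeight, const PDF_LAMBDA& PDF)
{
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    while (true)
    {
        float x = dist(rng);
        float y = dist(rng) * boxHeight;
        if (y <= PDF(x))   // accepted: x is a sample from the PDF
            return x;
        // rejected: try again
    }
}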

Here I'm using this technique with the PDF y=2x where x is in [0,1) and I'm using a box that goes from (0,0) to (1,2) to get 100,000 samples.

As expected, it took on average 2 points to get a single valid point since the area of the box is 2. Here are how many failed tests each histogram bucket had. Unsurprisingly, lower values of the PDF have more failed tests!

Moving to a more complex PDF, let’s look at y=\frac{x^3-10x^2+5x+11}{10.417}

Here are 10 million samples (lots of samples to minimize the noise), using a box height of 1.2, which unsurprisingly takes 1.2 samples on average to get a valid sample:

Here is the graph of the failure counts:

Here the box has a height of 2.8. It still works, but uses 2.8 samples on average which is less efficient:

Here’s the graph of failure counts:

Something interesting about this technique is that technically, the distribution you are sampling from doesn’t even have to be a PDF! If you have negative parts of the graph, they will be treated as zero, assuming your box has a minimum y of 0. Also, the fact that your function may not integrate to (have an area of) 1 doesn’t matter at all.

Here we take the PDF from the last examples, and take off the division by a constant, so that it doesn’t integrate to 1: y=x^3-10x^2+5x+11

The interesting thing is that we get as output a normalized PDF (the red line), even though the distribution we were using to sample was not normalized (the blue line, which is mostly hidden behind the yellow line).

Here are the rejection counts:

Generating One PDF from Another PDF

In the last section we showed how to enclose a PDF in a box, make uniformly random 2d points, and use them to generate points from the PDF.

By enclosing it in a box, all we were really doing is putting it under a uniform distribution that was scaled up to be larger than the PDF at all points.

Now here’s the interesting thing: We aren’t limited to using the uniform distribution!

To generalize this technique, if you are trying to sample from a PDF f(x), you can use any PDF g(x) to do so, so long as you multiply g(x) by a scalar value M so that M*g(x)>= f(x) for all values of x. In other words: scale up g so that it’s always bigger than f.

Using this more generalized technique has one or two more steps than the other way, but allows for a tighter fit of a generating function, resulting in fewer samples thrown away.

Here’s how to do it (a code sketch follows the list):

  1. Generate a random number from the distribution g, and call it x.
  2. Calculate the percentage chance of x being chosen by getting a ratio of how likely that number is to be chosen in each PDF: \frac{f(x)}{M*g(x)}
  3. Generate a uniform random number from 0 to 1. If it’s less than the value you just calculated, accept x as the random number, else reject it and go back to 1.
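Here is a minimal sketch of those three steps (the full version with histogram bookkeeping is in the code section below; SampleG stands in for however you draw samples from g, such as inverting its CDF):

#include <random>

// Sample from PDF f using samples from PDF g, where M*g(x) >= f(x) everywhere.
template <typename PDF_F, typename PDF_G, typename SAMPLE_G>
float SampleFUsingG (std::mt19937& rng, float M, const PDF_F& f, const PDF_G& g, const SAMPLE_G& SampleG)
{
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    while (true)
    {
        float x = SampleG(rng);                  // 1. draw a sample from g
        float acceptChance = f(x) / (M * g(x));  // 2. ratio of how likely we are to accept it
        if (dist(rng) <= acceptChance)           // 3. accept or reject
            return x;
    }
}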

Let’s see this in action!

We’ll generate numbers in a Gaussian distribution with a mean of 15 and a standard deviation of 5. We’ll truncate it to +/- 3 standard deviations so we want to generate random numbers from [0,30).

To generate these numbers, we’ll draw random numbers from the PDF y=x*0.002222. We’ll use an M value of 3 to scale up this PDF to always be greater than the Gaussian one.

Here is how it looks doing this with 20,000 samples:

We generate random numbers along the red line, multiply them by 3 to make them be the yellow line. Then, at whatever point we are at on the x axis, we divide the blue line value by the yellow line value and use that as an acceptance probability. Doing this and counting numbers in a histogram gives us our result – the green line. Since the end goal is the blue line, you can see it is indeed working! With a larger number of samples, the green line would more closely match the blue line.

Here’s the graph of the failed tests:

We have to take on average 3 samples before we get a valid random number. That shouldn’t be too surprising because both PDF’s start with area of 1, but we are multiplying one of them by 3 to make it always be larger than the other.

Something else interesting you might notice is that we have a lot fewer failed tests where the two PDF functions are more similar.

That is the power of this technique: If you can cheaply and easily generate samples that are “pretty close” to a harder distribution to sample from, you can use this technique to more cheaply sample from it.

Something to note is that just like in the last section, the target PDF doesn’t necessarily need to be a real PDF with only positive values and integrating to 1. It would work just the same with a non PDF function, just so long as the PDF generating the random numbers you start with is always above the function.

Some Other Notes

There is a family of techniques called “adaptive rejection sampling” that will change the PDF they are drawing from whenever there is a failed test.

Basically, if you imagine the PDF you are drawing from as being a bunch of line segments connected together, you could imagine that whenever you failed a test, you moved a line segment down to be closer to the curve, so that when you sampled from that area again, the chances would be lower that you’d fail the test again.

Taking this to the limit, your sampling PDF will eventually become the PDF you are trying to sample from, and at that point the rejection test will essentially never fail.

These techniques are a continued area of research.

Something else to note is that rejection sampling can be used to find random points within shapes.

For instance, a random point on a triangle, ellipse or circle could be done by putting a (tight) bounding box around the shape, generating points randomly in that box, and only accepting ones within the inner shape.

This can be extended to 3d shapes as well.

Some shapes have better ways to generate points within them that don’t involve iteration and rejected samples, but if all else fails, rejection sampling does indeed work!
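As a concrete example, here is a minimal sketch of picking a uniformly random point inside the unit circle by rejection sampling against its bounding square:

#include <random>

// Generate points in the bounding square [-1,1] x [-1,1] and keep the first one
// that is inside the unit circle (distance from the origin at most 1).
void RandomPointInUnitCircle (std::mt19937& rng, float& outX, float& outY)
{
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
    do
    {
        outX = dist(rng);
        outY = dist(rng);
    } while (outX * outX + outY * outY > 1.0f);
}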

At some point in the future I’d like to look into “Markov Chain Monte Carlo” more deeply. It seems like a very interesting technique to approach this same problem, but I have no idea if it’s used often in graphics, especially real time graphics.

Code

Here is the code that generated all the data from this post. The data was visualized with OpenOffice.

#define _CRT_SECURE_NO_WARNINGS
 
#include <stdio.h>
#include <random>
#include <array>
#include <unordered_map>
#include <algorithm>
#include <cmath>

template <size_t NUM_TEST_SAMPLES, size_t SIMULATED_DICE_SIDES, size_t ACTUAL_DICE_SIDES>
void TestDice (const char* fileName)
{
    // seed the random number generator
    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_int_distribution<size_t> dist(0, ACTUAL_DICE_SIDES-1);

    // generate the histogram
    std::array<size_t, SIMULATED_DICE_SIDES> histogram = { 0 };
    size_t rejectedSamples = 0;
    for (size_t i = 0; i < NUM_TEST_SAMPLES; ++i)
    {
        size_t roll = dist(rng);
        while (roll >= SIMULATED_DICE_SIDES)
        {
            ++rejectedSamples;
            roll = dist(rng);
        }
        histogram[roll]++;
    }

    // write the histogram and rejected sample count to a csv
    // an extra 0 data point forces the graph to include 0 in the scale. hack to make the data not look noisier than it really is.
    FILE *file = fopen(fileName, "w+t");
    fprintf(file, "Actual Count, Expected Count, , %0.2f samples needed per roll on average.\n", (float(NUM_TEST_SAMPLES) + float(rejectedSamples)) / float(NUM_TEST_SAMPLES));
    for (size_t value : histogram)
        fprintf(file, "%zu,%zu,0\n", value, (size_t)(float(NUM_TEST_SAMPLES) / float(SIMULATED_DICE_SIDES)));
    fclose(file);
}
 
template <size_t NUM_TEST_SAMPLES, size_t NUM_HISTOGRAM_BUCKETS, typename PDF_LAMBDA>
void Test (const char* fileName, float maxPDFValue, const PDF_LAMBDA& PDF)
{
    // seed the random number generator
    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
 
    // generate the histogram
    std::array<size_t, NUM_HISTOGRAM_BUCKETS> histogram = { 0 };
    std::array<size_t, NUM_HISTOGRAM_BUCKETS> failedTestCounts = { 0 };
    size_t rejectedSamples = 0;
    for (size_t i = 0; i < NUM_TEST_SAMPLES; ++i)
    {
        // Generate a sample from the PDF by generating a random 2d point.
        // If the y axis of the value is <= the value returned by PDF(x), accept it, else reject it.
        // NOTE: this takes an unknown number of iterations, and technically may NEVER finish.
        float pointX = 0.0f;
        float pointY = 0.0f;
        bool validPoint = false;
        while (!validPoint)
        {
            pointX = dist(rng);
            pointY = dist(rng) * maxPDFValue;
            float pdfValue = PDF(pointX);
            validPoint = (pointY <= pdfValue);

            // track number of failed tests per histogram bucket
            if (!validPoint)
            {
                size_t bin = (size_t)std::floor(pointX * float(NUM_HISTOGRAM_BUCKETS));
                failedTestCounts[std::min(bin, NUM_HISTOGRAM_BUCKETS - 1)]++;
                ++rejectedSamples;
            }
        }
 
        // increment the correct bin in the histogram
        size_t bin = (size_t)std::floor(pointX * float(NUM_HISTOGRAM_BUCKETS));
        histogram[std::min(bin, NUM_HISTOGRAM_BUCKETS -1)]++;
    }
 
    // write the histogram and pdf sample to a csv
    FILE *file = fopen(fileName, "w+t");
    fprintf(file, "PDF, Simulated PDF, Generating Function, Failed Tests, %0.2f samples needed per value on average.\n", (float(NUM_TEST_SAMPLES) + float(rejectedSamples)) / float(NUM_TEST_SAMPLES));
    for (size_t i = 0; i < NUM_HISTOGRAM_BUCKETS; ++i)
    {
        float x = (float(i) + 0.5f) / float(NUM_HISTOGRAM_BUCKETS);
        float pdfSample = PDF(x);
        fprintf(file, "%f,%f,%f,%f\n",
            pdfSample,
            NUM_HISTOGRAM_BUCKETS * float(histogram[i]) / float(NUM_TEST_SAMPLES),
            maxPDFValue,
            float(failedTestCounts[i])
        );
    }
    fclose(file);
}

template <size_t NUM_TEST_SAMPLES, size_t NUM_HISTOGRAM_BUCKETS, typename PDF_LAMBDA>
void TestNotPDF (const char* fileName, float maxPDFValue, float normalizationConstant, const PDF_LAMBDA& PDF)
{
    // seed the random number generator
    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
 
    // generate the histogram
    std::array<size_t, NUM_HISTOGRAM_BUCKETS> histogram = { 0 };
    std::array<size_t, NUM_HISTOGRAM_BUCKETS> failedTestCounts = { 0 };
    size_t rejectedSamples = 0;
    for (size_t i = 0; i < NUM_TEST_SAMPLES; ++i)
    {
        // Generate a sample from the PDF by generating a random 2d point.
        // If the y axis of the value is <= the value returned by PDF(x), accept it, else reject it.
        // NOTE: this takes an unknown number of iterations, and technically may NEVER finish.
        float pointX = 0.0f;
        float pointY = 0.0f;
        bool validPoint = false;
        while (!validPoint)
        {
            pointX = dist(rng);
            pointY = dist(rng) * maxPDFValue;
            float pdfValue = PDF(pointX);
            validPoint = (pointY <= pdfValue);

            // track number of failed tests per histogram bucket
            if (!validPoint)
            {
                size_t bin = (size_t)std::floor(pointX * float(NUM_HISTOGRAM_BUCKETS));
                failedTestCounts[std::min(bin, NUM_HISTOGRAM_BUCKETS - 1)]++;
                ++rejectedSamples;
            }
        }
 
        // increment the correct bin in the histogram
        size_t bin = (size_t)std::floor(pointX * float(NUM_HISTOGRAM_BUCKETS));
        histogram[std::min(bin, NUM_HISTOGRAM_BUCKETS -1)]++;
    }
 
    // write the histogram and pdf sample to a csv
    FILE *file = fopen(fileName, "w+t");
    fprintf(file, "Function, Simulated PDF, Scaled Simulated PDF, Generating Function, Failed Tests, %0.2f samples needed per value on average.\n", (float(NUM_TEST_SAMPLES) + float(rejectedSamples)) / float(NUM_TEST_SAMPLES));
    for (size_t i = 0; i < NUM_HISTOGRAM_BUCKETS; ++i)
    {
        float x = (float(i) + 0.5f) / float(NUM_HISTOGRAM_BUCKETS);
        float pdfSample = PDF(x);
        fprintf(file, "%f,%f,%f,%f,%f\n",
            pdfSample,
            NUM_HISTOGRAM_BUCKETS * float(histogram[i]) / float(NUM_TEST_SAMPLES),
            NUM_HISTOGRAM_BUCKETS * float(histogram[i]) / float(NUM_TEST_SAMPLES) * normalizationConstant,
            maxPDFValue,
            float(failedTestCounts[i])
        );
    }
    fclose(file);
}

template <size_t NUM_TEST_SAMPLES, size_t NUM_HISTOGRAM_BUCKETS, typename PDF_F_LAMBDA, typename PDF_G_LAMBDA, typename INVERSE_CDF_G_LAMBDA>
void TestPDFToPDF (const char* fileName, const PDF_F_LAMBDA& PDF_F, const PDF_G_LAMBDA& PDF_G, float M, const INVERSE_CDF_G_LAMBDA& Inverse_CDF_G, float rngRange)
{
    // We generate a sample from PDF F by generating a sample from PDF G, and accepting it with probability PDF_F(x)/(M*PDF_G(x))

    // seed the random number generator
    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
 
    // generate the histogram
    std::array<size_t, NUM_HISTOGRAM_BUCKETS> histogram = { 0 };
    std::array<size_t, NUM_HISTOGRAM_BUCKETS> failedTestCounts = { 0 };
    size_t rejectedSamples = 0;
    for (size_t i = 0; i < NUM_TEST_SAMPLES; ++i)
    {
        // generate random points until we have one that's accepted
        // NOTE: this takes an unknown number of iterations, and technically may NEVER finish.
        float sampleG = 0.0f;
        bool validPoint = false;
        while (!validPoint)
        {
            // Generate a sample from the soure PDF G
            sampleG = Inverse_CDF_G(dist(rng));

            // calculate the ratio of how likely we are to accept this sample
            float acceptChance = PDF_F(sampleG) / (M * PDF_G(sampleG));

            // see if we should accept it
            validPoint = dist(rng) <= acceptChance;

            // track number of failed tests per histogram bucket
            if (!validPoint)
            {
                size_t bin = (size_t)std::floor(sampleG * float(NUM_HISTOGRAM_BUCKETS) / rngRange);
                failedTestCounts[std::min(bin, NUM_HISTOGRAM_BUCKETS - 1)]++;
                ++rejectedSamples;
            }
        }

        // increment the correct bin in the histogram
        size_t bin = (size_t)std::floor(sampleG * float(NUM_HISTOGRAM_BUCKETS) / rngRange);
        histogram[std::min(bin, NUM_HISTOGRAM_BUCKETS - 1)]++;
    }
 
    // write the histogram and pdf sample to a csv
    FILE *file = fopen(fileName, "w+t");
    fprintf(file, "PDF F,PDF G,Scaled PDF G,Simulated PDF,Failed Tests,%0.2f samples needed per value on average.\n", (float(NUM_TEST_SAMPLES) + float(rejectedSamples)) / float(NUM_TEST_SAMPLES));
    for (size_t i = 0; i < NUM_HISTOGRAM_BUCKETS; ++i)
    {
        float x = (float(i) + 0.5f) * rngRange / float(NUM_HISTOGRAM_BUCKETS);
        
        fprintf(file, "%f,%f,%f,%f,%f\n",
            PDF_F(x),
            PDF_G(x),
            PDF_G(x)*M,
            NUM_HISTOGRAM_BUCKETS * float(histogram[i]) / (float(NUM_TEST_SAMPLES)*rngRange),
            float(failedTestCounts[i])
        );
    }
    fclose(file);
}
 
int main(int argc, char **argv)
{
    // Dice
    {
        // Simulate a 5 sided dice with a 6 sided dice
        TestDice<10000, 5, 6>("test1_5_6.csv");

        // Simulate a 5 sided dice with a 20 sided dice
        TestDice<10000, 5, 20>("test1_5_20.csv");
    }

    // PDF y=2x, simulated with a uniform distribution
    {
        auto PDF = [](float x) { return 2.0f * x; };

        Test<1000, 100>("test2_1k.csv", 2.0f, PDF);
        Test<100000, 100>("test2_100k.csv", 2.0f, PDF);
        Test<1000000, 100>("test2_1m.csv", 2.0f, PDF);
    }

    // PDF y=(x^3-10x^2+5x+11)/10.417, simulated with a uniform distribution
    {
        auto PDF = [](float x) {return (x*x*x - 10.0f*x*x + 5.0f*x + 11.0f) / (10.417f); };
        Test<10000000, 100>("test3_10m_1_15.csv", 1.15f, PDF);
        Test<10000000, 100>("test3_10m_1_5.csv", 1.5f, PDF);
        Test<10000000, 100>("test3_10m_2_8.csv", 2.8f, PDF);
    }

    // function (not PDF, Doesn't integrate to 1!) y=(x^3-10x^2+5x+11), simulated with a scaled up uniform distribution
    {
        auto PDF = [](float x) {return (x*x*x - 10.0f*x*x + 5.0f*x + 11.0f); };
        TestNotPDF<10000000, 100>("test4_10m_12_5.csv", 12.5f, 10.417f, PDF);
    }

    // Generate samples from PDF F using samples from PDF G.  random numbers are from 0 to 30.
    // F PDF = gaussian distribution, mean 15, std dev of 5.  Truncated to +/- 3 stddeviations.
    // G PDF = x*0.002222
    // G CDF = 0.001111 * x^2
    // G inverted CDF = (1000 * sqrt(x)) / sqrt(1111)
    // M = 3
    {
        // gaussian PDF F
        const float mean = 15.0f;
        const float stddev = 5.0f;
        auto PDF_F = [=] (float x) -> float
        {
            const float pi = 3.14159265358979f; // local constant instead of the non-standard std::_Pi
            return (1.0f / (stddev * std::sqrt(2.0f * pi))) * std::exp(-0.5f * std::pow((x - mean) / stddev, 2.0f));
        };

        // PDF G
        auto PDF_G = [](float x) -> float
        {
            return x * 0.002222f;
        };

        // Inverse CDF of G
        auto Inverse_CDF_G = [] (float x) -> float
        {
            return 1000.0f * std::sqrt(x) / std::sqrt(1111.0f);
        };

        TestPDFToPDF<20000, 100>("test5.csv", PDF_F, PDF_G, 3.0f, Inverse_CDF_G, 30.0f);
    }

    return 0;
}

Generating Random Numbers From a Specific Distribution By Inverting the CDF

The last post talked about the normal distribution and showed how to generate random numbers from that distribution by generating regular (uniform) random numbers and then counting the bits.

What would you do if you wanted to generate random numbers from a different, arbitrary distribution though? Let’s say the distribution is just defined by some function.

It turns out that in general this is a hard problem, but in practice there are a few ways to approach it. The below are the most common techniques for achieving this that I’ve seen.

  • Inverting the CDF (analytically or numerically)
  • Rejection Sampling
  • Markov Chain Monte Carlo
  • Ziggurat algorithm

This post talks about the first one listed: Inverting the CDF.

What Is A CDF?

The last post briefly explained that a PDF is a probability density function and that it describes the relative probability of numbers being chosen at random. A requirement of a PDF is that it has non negative value everywhere and also that the area under the curve is 1.

It needs to be non negative everywhere because a negative probability doesn’t make any sense. It needs to have an area under the curve of 1 because that means it represents the full 100% probability of all possible outcomes.

CDF stands for “Cumulative distribution function” and is related to the PDF.

A PDF is a function y=f(x) where y is the relative probability of the number x being chosen at random from the distribution.

A CDF is a function y=f(x) where y is the probability of the number x, or any lower number, being chosen at random from that distribution.

You get a CDF from a PDF by integrating the PDF.

Why Invert the CDF? (And Not the PDF?)

With both a PDF and a CDF, you plug in a number, and you get information about probabilities relating to that number.

To get a random number from a specific distribution, we want to do the opposite. We want to plug in a probability and get out the number corresponding to that probability.

Basically, we want to flip x and y in the equation and solve for y, so that we have a function that does this. That is what we have to do to invert the CDF.

Why invert the CDF though and not the PDF? Check out the images below from Wikipedia. The first is some Gaussian PDF’s and the second is the same distributions as CDF’s:


The issue is that if we flip x and y in a PDF, there would be multiple y values corresponding to the same x. This isn’t true in a CDF.

Let’s work through sampling some PDFs by inverting the CDF.

Example 0: y=1

This is the easiest case and represents uniform random numbers, where every number is evenly likely to be chosen.

Our PDF equation is: y=1 where x \in [0,1]. The graph looks like this:

If we integrate the pdf to get the cdf, we get y=x where x \in [0,1] which looks like this:

Now, to invert the cdf, we flip x and y, and then solve for y again. It’s trivially easy…

y=x \Leftarrow \text{CDF}\\ x=y \Leftarrow \text{Flip x and y}\\ y=x \Leftarrow \text{Solve for y again}

Now that we have our inverted CDF, which is y=x, we can generate uniform random numbers, plug them into that equation as x and get y which is the actual value drawn from our PDF.

You can see that since we are plugging in numbers from an even distribution and not doing anything to them at all, that the result is going to be an even distribution as well. So, we are in fact generating uniformly distributed random numbers using this inverted CDF, just like our PDF asked for.

This is so trivially simple it might be confusing. If so, don’t sweat it. Move onto the next example and you can come back to this later if you want to understand what I’m talking about here.

Note: The rest of the examples are going to have x in [0,1] as well but we are going to stop explicitly saying so. This process still works when x is in a different range of values, but for simplicity we’ll just have x be in [0,1] for the rest of the post.

Example 1: y=2x

The next easiest case for a PDF is y=2x which looks like this:

You might wonder why it’s y=2x instead of y=x. This is because the area under the curve y=x is 0.5. PDF’s need to have an area of 1, so I multiplied by 2 to make it have an area of 1.

What this PDF means is that small numbers are less likely to be picked than large numbers.

If we integrate the PDF y=2x to get the CDF, we get y=x^2 which looks like this:

Now let’s flip x and y and solve for y again.

y=x^2 \Leftarrow \text{CDF}\\ x=y^2 \Leftarrow \text{Flip x and y}\\ y=\sqrt{x} \Leftarrow \text{Solve for y again}

We now have our inverted CDF which is y=\sqrt{x} and looks like this:

Now, if we plug uniformly random numbers into that formula as x, we should get as output samples that follow the probability of our PDF.

We can use a histogram to see if this is really true. We can generate some random numbers, square root them, and count how many are in each range of values.
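In code, drawing one of those samples is just a square root of a uniform random number (a minimal sketch; the full test harness is in the code section at the bottom of this post):

#include <cmath>
#include <random>

// Sample the PDF y=2x on [0,1) by plugging a uniform random number into
// the inverted CDF y=sqrt(x).
float SampleLinearPDF (std::mt19937& rng)
{
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    return std::sqrt(dist(rng));
}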

Here is a histogram where I took 1,000 random numbers, square rooted them, and put their counts into 100 buckets. Bucket 1 counted how many numbers were in [0, 0.01), bucket 2 counted how many numbers were in [0.01, 0.02) and so on until bucket 100 which counted how many numbers were in [0.99, 1.0).

Increasing the number of samples to 100,000 it gets closer:

At 1,000,000 samples you can barely see a difference:

The reason it doesn’t match up at lower sample counts is just due to the nature of random numbers being random. It does match up, but you’ll have some variation with lower sample counts.

Example 2: y=3x^2

Let’s check out the PDF y=3x^2. The area under that curve where x is in [0,1) is 1.0 and it’s non negative everywhere in that range too, so it’s a valid PDF.

Integrating that, we get y=x^3 for the CDF. Then we invert the CDF:

y=x^3 \Leftarrow \text{CDF}\\ x=y^3 \Leftarrow \text{Flip x and y}\\ y=\sqrt[3]{x} \Leftarrow \text{Solve for y again}

And here is a 100,000 sample histogram vs the PDF to verify that we got the right answer:

Example 3: Numeric Solution

So far we’ve been able to invert the CDF to get a nice easy function to transform uniform distribution random numbers into numbers from the distribution described by the PDF.

Sometimes though, inverting a CDF isn’t possible, or gives a complex equation that is costly to evaluate. In these cases, you can actually invert the CDF numerically via a lookup table.

A lookup table may also be desired in cases where eg you have a pixel shader that is drawing numbers from a PDF, and instead of making N shaders for N different PDFs, you want to unify them all into a single shader. Passing a lookup table via a constant buffer, or perhaps even via a texture can be a decent solution here. (Note: if storing in a texture you may be interested in fitting the data with curves and using this technique to store it and recall it from the texture: GPU Texture Sampler Bezier Curve Evaluation)

Let’s invert a PDF numerically using a look up table to see how that would work.

Our PDF will be:

y=\frac{x^3-10x^2+5x+11}{10.417}

And looks like this:

It’s non negative in the range we care about and it integrates to 1.0 – or it integrates closely enough… the division by 10.417 is there for that reason, and using more digits would get it closer to 1.0.

What we are going to do is evaluate that PDF at N points to get a probability for those samples of numbers. That will give us a lookup table for our PDF.

We are then going to make each point be the sum of all the PDF samples to the left of it to make a lookup table for a CDF. We’ll also have to normalize the CDF table since it’s likely that our PDF samples don’t all add up (integrate) to 1.0. We do this by dividing every item in the CDF by the last entry in the CDF. If you look at the table after that, it will fully cover everything from 0% to 100% probability.

Below are some histogram comparisons of the lookup table technique vs the actual PDF.

Here are 100 million samples (lots of samples, to make it easier to see the data without much random noise), in 100 histogram buckets, with a lookup table of size 3, which is pretty low quality:

Increasing it to a lookup table of size 5 gives you this:

Here’s 10:

25:

And here’s 100:

So, not surprisingly, the size of the lookup table affects the quality of the results!

Code

Here is the code I used to generate the data in this post, which I visualized with OpenOffice. I visualized the function graphs using Wolfram Alpha.

#define _CRT_SECURE_NO_WARNINGS

#include <stdio.h>
#include <random>
#include <array>
#include <unordered_map>
#include <algorithm>
#include <cmath>

template <size_t NUM_TEST_SAMPLES, size_t NUM_HISTOGRAM_BUCKETS, typename PDF_LAMBDA, typename INVERSE_CDF_LAMBDA>
void Test (const char* fileName, const PDF_LAMBDA& PDF, const INVERSE_CDF_LAMBDA& inverseCDF)
{
    // seed the random number generator
    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);

    // generate the histogram
    std::array<size_t, NUM_HISTOGRAM_BUCKETS> histogram = { 0 };
    for (size_t i = 0; i < NUM_TEST_SAMPLES; ++i)
    {
        // put a uniform random number into the inverted CDF to sample the PDF
        float x = dist(rng);
        float y = inverseCDF(x);

        // increment the correct bin on the histogram
        size_t bin = (size_t)std::floor(y * float(NUM_HISTOGRAM_BUCKETS));
        histogram[std::min(bin, NUM_HISTOGRAM_BUCKETS -1)]++;
    }

    // write the histogram and pdf sample to a csv
    FILE *file = fopen(fileName, "w+t");
    fprintf(file, "PDF, Inverted CDF\n");
    for (size_t i = 0; i < NUM_HISTOGRAM_BUCKETS; ++i)
    {
        float x = (float(i) + 0.5f) / float(NUM_HISTOGRAM_BUCKETS);
        float pdfSample = PDF(x);
        fprintf(file, "%f,%f\n",
            pdfSample,
            NUM_HISTOGRAM_BUCKETS * float(histogram[i]) / float(NUM_TEST_SAMPLES)
        );
    }
    fclose(file);
}

template <size_t NUM_TEST_SAMPLES, size_t NUM_HISTOGRAM_BUCKETS, size_t LOOKUP_TABLE_SIZE, typename PDF_LAMBDA>
void TestPDFOnly (const char* fileName, const PDF_LAMBDA& PDF)
{
    // make the CDF lookup table by sampling the PDF
    // NOTE: we could integrate the buckets by averaging multiple samples instead of just the 1. This bucket integration is pretty low tech and low quality.
    std::array<float, LOOKUP_TABLE_SIZE> CDFLookupTable;
    float value = 0.0f;
    for (size_t i = 0; i < LOOKUP_TABLE_SIZE; ++i)
    {
        float x = float(i) / float(LOOKUP_TABLE_SIZE - 1); // The -1 is so we cover the full range from 0% to 100%
        value += PDF(x);
        CDFLookupTable[i] = value;
    }

    // normalize the CDF - make sure we span the probability range 0 to 1.
    for (float& f : CDFLookupTable)
        f /= value;

    // make our LUT based inverse CDF
    // We will binary search over the y's (which are sorted smallest to largest) looking for the x, which is implied by the index.
    // I'm sure there's a better & more clever lookup table setup for this situation but this should give you an idea of the technique
    auto inverseCDF = [&CDFLookupTable] (float y) {

        // there is an implicit entry of "0%" at index -1
        // (<= so that a value exactly equal to the first entry doesn't index out of bounds below)
        if (y <= CDFLookupTable[0])
        {
            float t = y / CDFLookupTable[0];
            return t / float(LOOKUP_TABLE_SIZE);
        }

        // get the lower bound in the lut using a binary search
        auto it = std::lower_bound(CDFLookupTable.begin(), CDFLookupTable.end(), y);

        // figure out where we are at in the table
        size_t index = it - CDFLookupTable.begin();

        // Linearly interpolate between the values
        // NOTE: could do other interpolation methods, like perhaps cubic (https://blog.demofox.org/2015/08/08/cubic-hermite-interpolation/)
        float t = (y - CDFLookupTable[index - 1]) / (CDFLookupTable[index] - CDFLookupTable[index - 1]);
        float fractionalIndex = float(index) + t;
        return fractionalIndex / float(LOOKUP_TABLE_SIZE);
    };

    // call the usual function to do the testing
    Test<NUM_TEST_SAMPLES, NUM_HISTOGRAM_BUCKETS>(fileName, PDF, inverseCDF);
}

int main (int argc, char **argv)
{
    // PDF: y=2x
    // inverse CDF: y=sqrt(x)
    {
        auto PDF = [] (float x) { return 2.0f * x; };
        auto inverseCDF = [] (float x) { return std::sqrt(x); };

        Test<1000, 100>("test1_1k.csv", PDF, inverseCDF);
        Test<100000, 100>("test1_100k.csv", PDF, inverseCDF);
        Test<1000000, 100>("test1_1m.csv", PDF, inverseCDF);
    }

    // PDF: y=3x^2
    // inverse CDF: y=cuberoot(x) aka y = pow(x, 1/3)
    {
        auto PDF = [] (float x) { return 3.0f * x * x; };
        auto inverseCDF = [](float x) { return std::pow(x, 1.0f / 3.0f); };

        Test<100000, 100>("test2_100k.csv", PDF, inverseCDF);
    }

    // PDF: y=(x^3-10x^2+5x+11)/10.417
    // Inverse CDF Numerically via a lookup table
    {
        auto PDF = [] (float x) {return (x*x*x - 10.0f*x*x + 5.0f*x + 11.0f) / (10.417f); };
        TestPDFOnly<100000000, 100, 3>("test3_100m_3.csv", PDF);
        TestPDFOnly<100000000, 100, 5>("test3_100m_5.csv", PDF);
        TestPDFOnly<100000000, 100, 10>("test3_100m_10.csv", PDF);
        TestPDFOnly<100000000, 100, 25>("test3_100m_25.csv", PDF);
        TestPDFOnly<100000000, 100, 100>("test3_100m_100.csv", PDF);
    }

    return 0;
}

Counting Bits & The Normal Distribution

I recently saw some interesting posts on twitter about the normal distribution:

I’m not really a statistics kind of guy, but knowing that probability distributions come up in graphics (like in PBR & path tracing), it seemed like a good time to upgrade my knowledge in this area while sharing an interesting technique for generating normal distribution random numbers.

Basics

Below is an image showing a few normal (aka Gaussian) distributions (from wikipedia).

Normal distributions are defined by these parameters:

  • \mu – “mu” is the mean. This is the average value of the distribution. This is where the center (peak) of the curve is on the x axis.
  • \sigma^2 – “sigma squared” is the variance, and is just the standard deviation squared. I find standard deviation more intuitive to think about.
  • \sigma – “sigma” is the standard deviation, which (surprise surprise!) is the square root of the variance. This controls the “width” of the graph. The area under the curve is 1.0, so as you increase standard deviation and make the graph wider, it also gets shorter.

Here’s a diagram of standard deviations to help understand them (also from wikipedia):

I find the standard deviation intuitive because 68.2% of the data is within one standard deviation from the mean (on the plus and minus side of the mean). 95.4% of the data is within two standard deviations of the mean.

Standard deviation is given in the same units as the data itself, so if a bell curve described scores on a test, with a mean of 80 and a standard deviation of 5, it means that 68.2% of the students got between 75 and 85 points on the test, and that 95.4% of the students got between 70 and 90 points on the test.

The normal distribution is what’s called a “probability density function” or pdf, which means that the y axis of the graph describes the likelihood of the number on the x axis being chosen at random.

This means that if you have a normal distribution that has a specific mean and variance (standard deviation), that numbers closer to the mean are more likely to be chosen randomly, while numbers farther away are less likely. The variance controls how the probability drops off as you get farther away from the mean.

Thinking about standard deviation again, 68.2% of the random numbers generated will be within 1 standard deviation of the mean (+1 std dev or -1 std dev). 95.4% will be within 2 standard deviations.

Generating Normal Distribution Random Numbers – Coin Flips

Generating uniform random numbers, where every number is as likely as every other number, is pretty simple. In the physical world, you can roll some dice or flip some coins. In the software world, you can use PRNGs.

How would you generate random numbers that follow a normal distribution though?

In C++, there is std::normal_distribution that can do this for you. There is also something called the Box-Muller transform that can turn uniformly distributed random numbers into normal distribution random numbers (info here: Generating Gaussian Random Numbers).

I want to talk about something else though and hopefully build some better intuition.

First let’s look at coin flips.

If you flip a fair coin a million times and keep a count of how many heads and tails you saw, you might get 500014 heads and 499986 tails (I got this with a PRNG – std::mt19937). That is a pretty uniform distribution of values in the range of [0,1]. (breadcrumb: pascal’s triangle row 2 is 1,1)

Let’s flip two coins at a time though and add our values together (say that heads is 0 and tails is 1). Here’s what that graph looks like:

Out of 1 million flips, 250639 had no tails, 500308 had one tail, and 249053 had two tails. It might seem weird that they aren’t all even, but it makes more sense when you look at the outcome of flipping two coins: we can get heads/heads (00), heads/tails (01), tails/heads (10) or tails/tails (11). Two of the four possibilities have a single tails, so it makes sense that flipping two coins and getting one coin being a tail would be twice as likely as getting no tails or two tails. (breadcrumb: pascal’s triangle row 3 is 1,2,1)

What happens when we sum 3 coins? With a million flips I got 125113 0’s, 375763 1’s, 373905 2’s and 125219 3’s.

If you work out the possible combinations, there is 1 way to get 0, 3 ways to get 1, 3 ways to get 2 and 1 way to get 3. Those numbers almost exactly follow that 1, 3, 3, 1 probability. (breadcrumb: pascal’s triangle row 4 is 1,3,3,1)

If we flip 100 coins and sum them, we get this:

That looks a bit like the normal distribution graphs at the beginning of this post doesn’t it?

Flipping and summing coins will get you something called the “Binomial Distribution”, and the interesting thing there is that the binomial distribution approaches the normal distribution the more coins you are summing together. At an infinite number of coins, it is the normal distribution.

Generating Normal Distribution Random Numbers – Dice Rolls

What if instead of flipping coins, we roll dice?

Well, rolling a 4 sided die a million times, you get each number roughly the same percentage of the time as you’d expect; roughly 25% each. 250125 0’s, 250103 1’s, 249700 2’s, 250072 3’s.

If we sum two 4 sided dice rolls we get this:

If we sum three 4 sided dice rolls we get this:

And if we sum one hundred we get this, which sure looks like a normal distribution:

This isn’t limited to four sided dice though, here’s one hundred 6 sided dice being summed:

With dice, the counts of each face follow a “multinomial distribution” instead of a binomial one, but as the number of dice goes to infinity, the distribution of the sum also approaches the normal distribution.

This means you can get a normal distribution with not only coins, but any sided dice in general.

An even stronger statement than that is the Central Limit Theorem, which says that if you have random numbers from ANY distribution, if you add enough of them together, you’ll often approach a normal distribution.

Strange huh?

Generating Normal Distribution Random Numbers – Counting Bits

Now comes a fun way of generating random numbers which follow a normal distribution. Are you ready for it?

Simply generate an N bit random number and return how many 1 bits are set.

That gives you a random number that follows a normal distribution!
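Here is a minimal sketch of that (std::popcount needs C++20; the code at the bottom of this post counts the bits manually in a loop instead):

#include <bit>
#include <cstdint>
#include <random>

// Count the set bits of a random 64 bit number. The result is an integer in [0, 64]
// that approximately follows a normal distribution with mean 32 and standard deviation 4.
int NormalishRandomNumber (std::mt19937_64& rng)
{
    uint64_t bits = rng();
    return std::popcount(bits);
}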

One problem with this is that you have very low “resolution” random numbers. Counting the bits of a 64 bit random number for instance, you can only return 0 through 64 so there are only 65 possible random numbers.

That is a pretty big limitation, but if you need normal distribution numbers calculated quickly and don’t mind if they are low resolution (like in a pixel shader?), this technique could work well for you.

Another problem though is that you don’t have control over the variance or the mean of the distribution.

That isn’t a super huge deal though because you can easily convert numbers from one normal distribution into another normal distribution.

To do so, you get your normal distribution random number. First you subtract the mean of the distribution to make it centered on 0 (have a mean of 0). You then divide it by the standard deviation to make it be part of a distribution which has a standard deviation of 1.

At this point you have a random number from a normal distribution which has a mean of 0 and a standard deviation of 1.

Next, you multiply the number by the standard deviation of the distribution you want, and lastly you add the mean of the distribution you want.

That’s pretty simple (and is implemented in the source code at the bottom of this post), but to do this you need to know what standard deviation (variance) and mean you are starting with.

If you have some way to generate random numbers in [0, N) and you are summing M of those numbers together, the mean is M*(N-1)/2. Note that if you instead are generating random numbers in [1,N], the mean instead is M*(N+1)/2.

The variance in either case is M*(N^2-1)/12. The standard deviation is the square root of that.
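For example, counting the set bits of a 64 bit random number is summing M=64 values in [0,2), so N=2. That gives a mean of 64*(2-1)/2 = 32 and a variance of 64*(2^2-1)/12 = 16, for a standard deviation of 4 – which are exactly the values the code below converts away from before converting to the desired mean and variance.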

Using that information you have everything you need to generate normal distribution random numbers of a specified mean and variance.

Thanks to @fahickman for the help on calculating mean and variance of dice roll sums.

Code

Here is the source code I used to generate the data which was used to generate the graphs in this post. There is also an implementation of the bit counting algorithm i mentioned, which converts to the desired mean and variance.

#define _CRT_SECURE_NO_WARNINGS

#include <array>
#include <random>
#include <stdint.h>
#include <stdio.h>
#include <limits>
#include <cmath>

const size_t c_maxNumSamples = 1000000;
const char* c_fileName = "results.csv";

template <size_t DiceRange, size_t DiceCount, size_t NumBuckets>
void DumpBucketCountsAddRandomNumbers (size_t numSamples, const std::array<size_t, NumBuckets>& bucketCounts)
{
    // open file for append if we can
    FILE* file = fopen(c_fileName, "a+t");
    if (!file)
        return;

    // write the info
    float mean = float(DiceCount) * float(DiceRange - 1.0f) / 2.0f;
    float variance = float(DiceCount) * float(DiceRange * DiceRange - 1) / 12.0f; // M*(N^2-1)/12, matching the formula in the post
    if (numSamples == 1)
    {
        fprintf(file, "\"%zu random numbers [0,%zu) added together (sum %zud%zu). %zu buckets.  Mean = %0.2f.  Variance = %0.2f.  StdDev = %0.2f.\"\n", DiceCount, DiceRange, DiceCount, DiceRange, NumBuckets, mean, variance, std::sqrt(variance));
        fprintf(file, "\"\"");
        for (size_t i = 0; i < NumBuckets; ++i)
            fprintf(file, ",\"%zu\"", i);
        fprintf(file, "\n");
    }
    fprintf(file, "\"%zu samples\",", numSamples);

    // report the samples
    for (size_t count : bucketCounts)
        fprintf(file, "\"%zu\",", count);

    fprintf(file, "\"\"\n");
    if (numSamples == c_maxNumSamples)
        fprintf(file, "\n");

    // close file
    fclose(file);
}

template <size_t DiceSides, size_t DiceCount>
void AddRandomNumbersTest ()
{
    std::mt19937 rng;
    rng.seed(std::random_device()());
    std::uniform_int_distribution<size_t> dist(size_t(0), DiceSides - 1);

    std::array<size_t, (DiceSides - 1) * DiceCount + 1> bucketCounts = { 0 };

    size_t nextDump = 1;
    for (size_t i = 0; i < c_maxNumSamples; ++i)
    {
        size_t sum = 0;
        for (size_t j = 0; j < DiceCount; ++j)
            sum += dist(rng);

        bucketCounts[sum]++;

        if (i + 1 == nextDump)
        {
            DumpBucketCountsAddRandomNumbers<DiceSides, DiceCount>(nextDump, bucketCounts);
            nextDump *= 10;
        }
    }
}

template <size_t NumBuckets>
void DumpBucketCountsCountBits (size_t numSamples, const std::array<size_t, NumBuckets>& bucketCounts)
{
    // open file for append if we can
    FILE* file = fopen(c_fileName, "a+t");
    if (!file)
        return;

    // write the info
    float mean = float(NumBuckets-1) * 1.0f / 2.0f;
    float variance = float(NumBuckets-1) * 3.0f / 12.0f;
    if (numSamples == 1)
    {
        fprintf(file, "\"%zu random bits (coin flips) added together. %zu buckets.  Mean = %0.2f.  Variance = %0.2f.  StdDev = %0.2f.\"\n", NumBuckets - 1, NumBuckets, mean, variance, std::sqrt(variance));
        fprintf(file, "\"\"");
        for (size_t i = 0; i < NumBuckets; ++i)
            fprintf(file, ",\"%zu\"", i);
        fprintf(file, "\n");
    }
    fprintf(file, "\"%zu samples\",", numSamples);

    // report the samples
    for (size_t count : bucketCounts)
        fprintf(file, "\"%zu\",", count);

    fprintf(file, "\"\"\n");
    if (numSamples == c_maxNumSamples)
        fprintf(file, "\n");

    // close file
    fclose(file);
}

template <size_t NumBits> // aka NumCoinFlips!
void CountBitsTest ()
{

    size_t maxValue = 0;
    for (size_t i = 0; i < NumBits; ++i)
        maxValue = (maxValue << 1) | 1;

    std::mt19937 rng;
    rng.seed(std::random_device()());
    std::uniform_int_distribution<size_t> dist(0, maxValue);

    std::array<size_t, NumBits + 1> bucketCounts = { 0 };

    size_t nextDump = 1;
    for (size_t i = 0; i < c_maxNumSamples; ++i)
    {
        size_t sum = 0;
        size_t number = dist(rng);
        while (number)
        {
            if (number & 1)
                ++sum;
            number = number >> 1;
        }

        bucketCounts[sum]++;

        if (i + 1 == nextDump)
        {
            DumpBucketCountsCountBits(nextDump, bucketCounts);
            nextDump *= 10;
        }
    }
}

float GenerateNormalRandomNumber (float mean, float variance)
{
    static std::mt19937 rng;
    static std::uniform_int_distribution<uint64_t> dist(0, (uint64_t)-1);

    static bool seeded = false;
    if (!seeded)
    {
        seeded = true;
        rng.seed(std::random_device()());
    }

    // generate our normal distributed random number from 0 to 64
    // by counting the set bits of a random 64 bit number.
    float sum = 0.0f;
    uint64_t number = dist(rng);
    while (number)
    {
        if (number & 1)
            sum += 1.0f;
        number = number >> 1;
    }

    // convert from: mean 32, variance 16, stddev 4
    // to: mean 0, variance 1, stddev 1
    float ret = sum;
    ret -= 32.0f;
    ret /= 4.0f;

    // convert to the specified mean and variance
    ret *= std::sqrt(variance);
    ret += mean;
    return ret;
}

void VerifyGenerateNormalRandomNumber (float mean, float variance)
{
    // open file for append if we can
    FILE* file = fopen(c_fileName, "a+t");
    if (!file)
        return;

    // write info
    fprintf(file, "\"Normal Distributed Random Numbers. mean = %0.2f.  variance = %0.2f.  stddev = %0.2f\"\n", mean, variance, std::sqrt(variance));

    // write some random numbers
    fprintf(file, "\"100 numbers\"");
    for (size_t i = 0; i < 100; ++i)
        fprintf(file, ",\"%f\"", GenerateNormalRandomNumber(mean, variance));
    fprintf(file, "\n\n");

    // close file
    fclose(file);
}

int main (int argc, char **argv)
{
    // clear out the file
    FILE* file = fopen(c_fileName, "w+t");
    if (file)
        fclose(file);

    // coin flips
    {
        // flip a fair coin 
        AddRandomNumbersTest<2, 1>();

        // flip two coins and sum them
        AddRandomNumbersTest<2, 2>();

        // sum 3 coin flips
        AddRandomNumbersTest<2, 3>();

        // sum 100 coin flips
        AddRandomNumbersTest<2, 100>();
    }

    // dice rolls
    {
        // roll a 4 sided die
        AddRandomNumbersTest<4, 1>();

        // sum two 4 sided dice
        AddRandomNumbersTest<4, 2>();

        // sum three 4 sided dice
        AddRandomNumbersTest<4, 3>();

        // sum one hundred 4 sided dice
        AddRandomNumbersTest<4, 100>();

        // sum one hundred 6 sided dice
        AddRandomNumbersTest<6, 100>();
    }

    CountBitsTest<8>();
    CountBitsTest<16>();
    CountBitsTest<32>();
    CountBitsTest<64>();

    VerifyGenerateNormalRandomNumber(0.0f, 20.0f);

    VerifyGenerateNormalRandomNumber(0.0f, 10.0f);

    VerifyGenerateNormalRandomNumber(5.0f, 10.0f);

    return 0;
}

WebGL PBR Implementation

Just want to see the demo? Click the link below. Warning: it loads quite a few images, some of which are ~10MB, so may take some time to load (it does report loading progress though):

http://demofox.org/WebGLPBR/

More Info

There is a great PBR (Physically Based Rendering) tutorial at: https://learnopengl.com/#!PBR/Theory

I followed that tutorial, making a WebGL PBR implementation as I went, but also making some C++ for pre-integrating diffuse and specular IBL (Image Based Lighting) and making the splitsum texture.

Pre-integrating the diffuse and specular (and using the splitsum texture) allows you to use an object’s surroundings as light sources, which is more in line with how real life works; we don’t just have point lights and directional lights in the real world, we have objects that glow because they are illuminated by light sources, and we have light sources which are in odd shapes.

It’s possible that there are one or more math errors or bugs in the C++ as well as my WebGL PBR implementation. At some point in the future I’ll dig deeper into the math of PBR and try to write up some simple blog posts about it, at which point I’ll have more confidence in its correctness than just “well, it looks right…”.

The source code for the C++ pre-integrations are on github:
IBL Diffuse Cube Map Integration
IBL Specular Cube Map Integration + Split Sum Texture

The WebGL PBR implementation is also on github:
WebGLPBR

Here are some screenshots:





Links

Learn WebGL2:

https://webgl2fundamentals.org/

Free PBR Materials:

http://freepbr.com/materials/rusted-iron-pbr-metal-material-alt/

PBR Links:

http://blog.selfshadow.com/publications/s2014-shading-course/frostbite/s2014_pbs_frostbite_slides.pdf

https://learnopengl.com/#!PBR/Theory

http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_b_notes.pdf

https://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf

http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_slides.pdf

A Tool To Debug Teams (Knoster)

In the professional world, programmers work in teams as a rule, with very few exceptions.

Those of us aiming to remain programmers, rather than going into management, are often focused on our specific trade or area of expertise, and so we spend less time learning about or thinking about what makes a team successful.

We learn some from personal experience – often realizing that certain things are bad for a team only by seeing the failures manifest in front of us – but we are definitely more likely to pick up a book on algorithms than a book on team management.

My mother-in-law is the opposite, however. Part of what she does is mentor people to become leaders of teams and large organizations, and she also consults with organizations in the field of education to fix budgetary and organizational problems they may be having.

She showed me an interesting chart the other day that is really eye opening. It’s a formalized look at how to identify some things that may be going wrong with a team.

The chart itself is from the educational sector (Tim Knoster in ~1990), and is meant to be used to “Manage Complex Change”, but looking at it, having been a professional programmer for 16 years, I can say it is definitely applicable to any team.

The chart is valuable whether you are leading a team, part of a team, or observing a team you are not a part of.

How you use this chart is you look on the right side to see what sort of problems your team may be having: confusion, anxiety, resistance, frustration, or false starts.

From there you scan left until you find the black box. That box is the element missing which is causing the problem for the team.

That’s all there is to it, it’s pretty simple. It actually seems like pretty obvious stuff too in hindsight, but I wouldn’t have been able to formalize something like that.

Obviously not every situation can be boiled down into a simple chart like this, and there are variations of this chart including more or different rows and columns, but this is a good start at trying to “debug a team” to figure out the source of an issue.

Want more details? Here are some links:

http://www.belb.org.uk/downloads/rc_knoster_managing_complex_change.pdf

http://www.d11.org/LRS/PersonalizedLearning/Documents/KnosterMANAGINGCOMPLEXCHANGE.pdf

http://nebula.wsimg.com/90f9e490329402583fea599cad009bb0?AccessKeyId=8AAC8D005153628DDDFA&disposition=0&alloworigin=1

Why Are Some Shadows Soft And Other Shadows Hard?

This is a quick post on why some shadows have soft edges, and other shadows have hard edges.

The picture below looks pretty normal right?

Let’s zoom into the shadows on the ground:

The shadows of the circular platforms on the right are sharp, but get softer as they go to the left.

Here you can see a similar effect with a light post, where the shadow is sharp on the left and soft on the right (click these images to zoom in if you want to):

And lastly you can see that the plants in this picture have a sharp shadow (and so does the curb), while the trees above it (out of the picture) cast a soft shadow:

Why are some shadows soft and some shadows hard?

The crux of what is going on here is that shadows that are nearer to the objects casting the shadow are sharper. Shadows that are farther from the objects casting the shadow are softer.

More plainly: Things closer to the ground have sharper edged shadows.

Go have another look at the pictures if you want (click them to see them full sized) and see how distance from the ground affects the sharpness of the shadow’s edge.

Why Does This Happen?

The reason this happens is actually pretty simple. Let’s look at the problem in 2d where we have a light source (the sun), the ground to cast a shadow on, and an object casting a shadow:

Now let’s think about where the ground would be completely in shadow. We can draw a line where all the ground to the left is completely in shadow. This is the point where all the ground to the right can “see” the sun, but all the ground to the left cannot see it. This area is called the “umbra”, which is Latin for shadow.

Now let’s think about where the ground would be completely lit up. We can draw a line where all the ground to the right is completely lit up by the sun. This is the point where all the ground to the right can “see” the sun completely, but all the ground to the left has some amount of the sun obscured, so it can only see some of the sun, if any of it at all.

This leaves us with the area in the middle of the two where the ground can see some of the sun, but not the whole sun. This area is called the “penumbra”, which in Latin literally means “almost shadow”. (You may remember the “pen” prefix from peninsula, which is also Latin, meaning “almost an island”)

So the penumbra is where the soft edge of a shadow is, but how is this related to distance?

Here is the situation when the shadow casting (brown) object gets closer to the ground. Note how the penumbra is a lot smaller.

Here it is when the shadow casting (brown) object gets farther away from the ground. Note how the penumbra gets larger!

Distance isn’t the only thing that can affect penumbra size though. Here you can see that a larger light makes a larger penumbra.

Here you can see how a smaller light makes a smaller penumbra.

If a light was infinitely small (a point light), it would not make a soft shadow edge, no matter how far or close the shadow was to the thing casting the shadow. While point lights do exist in computer graphics, you likely would still want to make a soft shadow for them if you are able to, as point lights can’t exist in real life.
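If you want to put rough numbers on this, the similar triangles in the diagrams above give a simple estimate. Here’s a minimal sketch of that relationship (my own formulation, with made up parameter names), assuming a 2d setup where the light, the occluder and the ground are parallel to each other:

// Estimate the width of the penumbra using similar triangles.
// lightWidth       = size of the light
// lightToOccluder  = distance from the light to the shadow casting object
// occluderToGround = distance from the shadow casting object to the ground
float PenumbraWidth (float lightWidth, float lightToOccluder, float occluderToGround)
{
    // A point light (lightWidth = 0) gives a width of 0: a hard shadow edge.
    // A larger light, or an occluder farther from the ground, gives a wider penumbra.
    return lightWidth * occluderToGround / lightToOccluder;
}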

If you’ve never noticed this property of shadows before, you will probably never be able to un-see this.

This is what it’s like being a graphics programmer (or an artist, photographer, etc, I’m sure!) – looking at and understanding how things like this work completely changes how you see the world. Lately, everywhere I look, I’m checking out the reflections and thinking about SSR (screen space reflections). Just check out the cool reflections below, that you probably didn’t even think anything of when you first saw the picture!

SIMD / GPU Friendly Branchless Binary Search

The other day I was thinking about how you might do a binary search branchlessly. I came up with a way, and I’m pretty sure I’m not the first to come up with it, but it was fun to think about and I wanted to share my solution.

Here it is searching a list of 8 items in 3 steps:

size_t BinarySearch8 (size_t needle, const size_t haystack[8])
{
    size_t ret = (haystack[4] <= needle) ? 4 : 0;
    ret += (haystack[ret + 2] <= needle) ? 2 : 0;
    ret += (haystack[ret + 1] <= needle) ? 1 : 0;
    return ret;
}

The three steps it does are:

  1. The list has 8 items in it. We test index 4 to see if we need to continue searching index 0 to 3, or index 4 to 7. The returned index becomes either 0xx or 1xx.
  2. The list has 4 items in it now. We test index 2 to see if we need to continue searching index 0 to 1, or index 2 to 3. The returned index becomes one of the following: 00x, 01x, 10x or 11x.
  3. The list has 2 items in it. We test index 1 to see if we need to take the left item or the right item. The returned index becomes: 000, 001, 010, 011, 100, 101, 110, or 111.

But Big O Complexity is Worse!

Usually a binary search can take up to O(log N) steps, where N is the number of items in the list. In this post’s solution, it always takes \log_2{N} steps.

It probably seems odd that this branchless version could be considered an improvement when it has big O complexity that is always the worst case of a regular binary search. That is strange, but in the world of SIMD and shader programs, going branchless can be a big win that is not captured by looking at big O complexity. (Note that cache coherency and thread contention are two other things not captured by looking at big O complexity).

Also, when working in something like video games or other interactive simulations, an even frame rate is more important than a high frame rate for making a game look and feel smooth. Because of this, if you have algorithms that have very fast common cases but much slower worst cases, you may actually prefer to use an algorithm that is slower in the common case but faster in the worst case just to keep performance more consistent. Using an algorithm such as this, which has a constant amount of work regardless of input, can be a good trade off there.

Lastly, in cryptographic applications, attackers can gather secret information by seeing how long certain operations take. For instance, if you use a shorter password than other people, an attacker may be able to detect that by seeing that it consistently takes you a little bit less time to login than other people. They now have an idea of the length of your password, and maybe will brute force you, knowing that you are low hanging fruit!

These timing based attacks can be thwarted by algorithms which run at a constant time regardless of input. This algorithm is one of those algorithms.

As an example of another algorithm that runs in constant time regardless of input, check out CORDIC math. I really want to write up a post on that someday, it’s pretty cool stuff.

Traditional Binary Searching

You might have noticed that if the item you are searching for isn’t in the list, the function doesn’t return anything indicating that, and you might think that’s strange.

This function actually just returns the largest index whose value isn’t greater than the value you are searching for. If all the numbers are greater than the value you are searching for, it returns zero.

This might seem odd but this can actually come in handy if the list you are searching represents something like animation data, where there are keyframes sorted by time, and you want to find which two keyframes you are between so that you can interpolate.

To actually test if your value was in the list, you could do an extra check:

    size_t searchValue = 3;
    size_t index = BinarySearch8(searchValue, list);
    bool found = (list[index] == searchValue);

If you need that extra check, it’s easy enough to add, and if you don’t need that extra check, it’s nice to not have it.

Without Ternary Operator

If in your setup you don’t have a ternary operator, or if the ternary operator isn’t branchless for you, you get the same results using multiplication:

size_t BinarySearch8 (size_t needle, const size_t haystack[8])
{
    size_t ret = (haystack[4] <= needle) * 4;
    ret += (haystack[ret + 2] <= needle) * 2;
    ret += (haystack[ret + 1] <= needle) * 1;
    return ret;
}

Note that on some platforms, the less than or equal test will be a branch! None of the platforms or compilers I tested had that issue but if you find yourself hitting that issue, you can do a branchless test via subtraction or similar.

Here is a godbolt link that lets you view the assembly for various compilers. When you open the link you’ll see clang doing this work branchlessly.
View Assembly

@adamjmiles from twitter also verified that GCN does it branchlessly, which you can see at the link below. Thanks for that!
View GCN Assembly

Something to keep in mind for the non GPU case though is that if you were doing this in SIMD, you’d be using SIMD intrinsics.

Larger Lists

It’s trivial to search larger numbers of values. Here it is searching 16 items in 4 steps:

size_t BinarySearch16 (size_t needle, const size_t haystack[16])
{
    size_t ret = (haystack[8] <= needle) ? 8 : 0;
    ret += (haystack[ret + 4] <= needle) ? 4 : 0;
    ret += (haystack[ret + 2] <= needle) ? 2 : 0;
    ret += (haystack[ret + 1] <= needle) ? 1 : 0;
    return ret;
}

And here it is searching 32 items in 5 steps:

size_t BinarySearch32 (size_t needle, const size_t haystack[32])
{
    size_t ret = (haystack[16] <= needle) ? 16 : 0;
    ret += (haystack[ret + 8] <= needle) ? 8 : 0;
    ret += (haystack[ret + 4] <= needle) ? 4 : 0;
    ret += (haystack[ret + 2] <= needle) ? 2 : 0;
    ret += (haystack[ret + 1] <= needle) ? 1 : 0;
    return ret;
}

Non Power of 2 Lists

Let’s say that your list is not a perfect power of two in length. GASP!

You can still use the technique, but you treat it as if it has the next power of 2 up items, and then make sure your indices stay in range. The nice part here is that you don’t have to do extra work on the index at each step of the way, only in the places where it’s possible for the index to go out of range.

Here it is searching an array of size 7 in 3 steps:

size_t BinarySearch7 (size_t needle, const size_t haystack[7])
{
    size_t ret = 0;
    size_t testIndex = 0;

    // test index is at most 4, so is within range.
    testIndex = ret + 4;
    ret = (haystack[testIndex] <= needle) ? testIndex : ret;

    // test index is at most 6, so is within range.
    testIndex = ret + 2;
    ret = (haystack[testIndex] <= needle) ? testIndex : ret;

    // test index is at most 7, so could be out of range.
    // use min() to make sure the index stays in range.
    testIndex = std::min<size_t>(ret + 1, 6);
    ret = (haystack[testIndex] <= needle) ? testIndex : ret;

    return ret;
}

There are some other techniques for dealing with non power of 2 sized lists that you can find in the links at the bottom, but there was one particularly interesting one that my friend and ex-boss James came up with.

Basically, you start out with something like this if you were searching a list of 7 items:

    // 7 because the list has 7 items in it.
    // 4 because it's half of the next power of 2 that is >= 7.
    ret = (haystack[4] <= needle) * (7-4);

The result is that instead of having ret go to either 0 or 4, it goes to 0 or 3.

From there, in both cases you have 4 items in your sublist remaining, so you don’t need to worry about the index going out of bounds from that point on.
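Putting that together, here’s a minimal sketch of what the full 7 item search could look like with that trick. This is my own completion of the idea for illustration, so treat it as a sketch rather than James’s actual code:

size_t BinarySearch7NoClamp (size_t needle, const size_t haystack[7])
{
    // 7 because the list has 7 items in it.
    // 4 because it's half of the next power of 2 that is >= 7.
    // ret becomes 0 or 3, and both remaining sublists are 4 items long,
    // so the later index math (at most +2 then +1) can never go out of bounds.
    size_t ret = (haystack[4] <= needle) * (7 - 4);
    ret += (haystack[ret + 2] <= needle) * 2;
    ret += (haystack[ret + 1] <= needle) * 1;
    return ret;
}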

Code

Here’s some working code demonstrating the ideas above, as well as its output.

#include <algorithm>
#include <stdio.h>
#include <stdlib.h>

size_t BinarySearch8 (size_t needle, const size_t haystack[8])
{
    // using ternary operator
    size_t ret = (haystack[4] <= needle) ? 4 : 0;
    ret += (haystack[ret + 2] <= needle) ? 2 : 0;
    ret += (haystack[ret + 1] <= needle) ? 1 : 0;
    return ret;
}

size_t BinarySearch8b (size_t needle, const size_t haystack[8])
{
    // using multiplication
    size_t ret = (haystack[4] <= needle) * 4;
    ret += (haystack[ret + 2] <= needle) * 2;
    ret += (haystack[ret + 1] <= needle) * 1;
    return ret;
}

size_t BinarySearch7 (size_t needle, const size_t haystack[7])
{
    // non perfect power of 2.  use min() to keep it from going out of bounds.
    size_t ret = 0;
    size_t testIndex = 0;

    // test index is 4, so is within range.
    testIndex = ret + 4;
    ret = (haystack[testIndex] <= needle) ? testIndex : ret;

    // test index is at most 6, so is within range.
    testIndex = ret + 2;
    ret = (haystack[testIndex] <= needle) ? testIndex : ret;

    // test index is at most 7, so could be out of range.
    // use min() to make sure the index stays in range.
    testIndex = std::min<size_t>(ret + 1, 6);
    ret = (haystack[testIndex] <= needle) ? testIndex : ret;

    return ret;
}

int main (int argc, char **argv)
{
    // search a list of size 8
    {
        // show the data
        printf("Seaching through a list with 8 items:\n");
        size_t data[8] = { 1, 3, 5, 6, 9, 11, 15, 21 };
        printf("data = [");
        for (size_t i = 0; i < sizeof(data)/sizeof(data[0]); ++i)
        {
            if (i > 0)
                printf(", ");
            printf("%zu", data[i]);
        }
        printf("]\n");

        // do some searches on it using ternary operation based function
        printf("\nTernary based searches:\n");
        #define FIND(needle) printf("Find " #needle ": index = %zu, value = %zu, found = %s\n", BinarySearch8(needle, data), data[BinarySearch8(needle, data)], data[BinarySearch8(needle, data)] == needle ? "true" : "false");
        FIND(2);
        FIND(3);
        FIND(0);
        FIND(22);
        FIND(16);
        FIND(15);
        FIND(21);
        #undef FIND

        // do some searches on it using multiplication based function
        printf("\nMultiplication based searches:\n");
        #define FIND(needle) printf("Find " #needle ": index = %zu, value = %zu, found = %s\n", BinarySearch8b(needle, data), data[BinarySearch8b(needle, data)], data[BinarySearch8b(needle, data)] == needle ? "true" : "false");
        FIND(2);
        FIND(3);
        FIND(0);
        FIND(22);
        FIND(16);
        FIND(15);
        FIND(21);
        #undef FIND

        printf("\n\n\n\n");
    }

    // search a list of size 7
    {
        // show the data
        printf("Seaching through a list with 7 items:\n");
        size_t data[7] = { 1, 3, 5, 6, 9, 11, 15};
        printf("data = [");
        for (size_t i = 0; i < sizeof(data)/sizeof(data[0]); ++i)
        {
            if (i > 0)
                printf(", ");
            printf("%zu", data[i]);
        }
        printf("]\n");

        // do some searches on it using ternary operation based function
        printf("\nTernary based searches:\n");
        #define FIND(needle) printf("Find " #needle ": index = %zu, value = %zu, found = %s\n", BinarySearch7(needle, data), data[BinarySearch7(needle, data)], data[BinarySearch7(needle, data)] == needle ? "true" : "false");
        FIND(2);
        FIND(3);
        FIND(0);
        FIND(22);
        FIND(16);
        FIND(15);
        FIND(21);
        #undef FIND

        printf("\n\n\n\n");
    }

    system("pause");
    return 0;
}

Closing

Another facet of binary searching is that it isn’t the most cache friendly algorithm out there. There might be some value in combining the above with the information in the link below.

Cache-friendly binary search

If you like this sort of thing, here is an interesting paper from this year (2017):
Array Layouts For Comparison-Based Searching

And further down the rabbit hole a bit, this talks about re-ordering the search array to fit things into a cache line better:
https://people.mpi-inf.mpg.de/~rgemulla/publications/schlegel09search.pdf

Taking the next step is Intel and Oracle’s FAST paper:
http://www.timkaldewey.de/pubs/FAST__TODS11.pdf

Florian Gross from twitch made me aware of the last two links and also mentioned his master’s thesis in this area (thank you Florian!):
https://www.researchgate.net/profile/Florian_Gross/publication/275971053_Index_Search_Algorithms_for_Databases_and_Modern_CPUs/links/554cffca0cf29f836c9cd539.pdf

@rygorous mentioned on twitter some improvements such as ternary and quaternary search, as well as a way to handle the case of non power of 2 sized lists without extra index checks:
https://twitter.com/rygorous/status/877418592752488449/photo/1

Thanks to everyone who gave feedback. It’s a very interesting topic, of which this post only scratches the surface!

Hopefully you found this interesting. Questions, comments, corrections, let me know!

When Random Numbers Are Too Random: Low Discrepancy Sequences

Random numbers can be useful in graphics and game development, but they have a pesky and sometimes undesirable habit of clumping together.

This is a problem in path tracing and monte carlo integration when you take N samples, but the samples aren’t well spread across the sampling range.

This can also be a problem for situations like when you are randomly placing objects in the world or generating treasure for a treasure chest. You don’t want your randomly placed trees to only be in one part of the forest, and you don’t want a player to get only trash items or only godly items when they open a treasure chest. Ideally you want to have some randomness, but you don’t want the random number generator to give you all of the same or similar random numbers.

The problem is that random numbers can be TOO random, like in the below where you can see clumps and large gaps between the 100 samples.

For cases like that, when you want random numbers that are a little bit more well distributed, you might find some use in low discrepancy sequences.

The standalone C++ code (one source file, standard headers, no libraries to link to) I used to generate the data and images are at the bottom of this post, as well as some links to more resources.

What Is Discrepancy?

In this context, discrepancy is a measurement of the highest or lowest density of points in a sequence. High discrepancy means that there is either a large area of empty space, or that there is an area that has a high density of points. Low discrepancy means that there are neither, and that your points are more or less pretty evenly distributed.

The lowest discrepancy possible has no randomness at all, and in the 1 dimensional case means that the points are evenly distributed on a grid. For monte carlo integration and the game dev usage cases I mentioned, we do want some randomness, we just want the random points to be spread out a little more evenly.

If more formal math notation is your thing, discrepancy is defined as:
D_N(P) = \sup_{B \in J} \left| \frac{A(B;P)}{N} - \lambda_s(B) \right|

You can read more about the formal definition here: Wikipedia:
Equidistributed sequence

For monte carlo integration specifically, this is the behavior each thing gives you:

  • High Discrepancy: Random Numbers / White Noise aka Uniform Distribution – At lower sample counts, convergence is slower (and variance is higher) due to the possibility of not getting good coverage over the area you are integrating. At higher sample counts, this problem disappears. (Hint: real time graphics and preview renderings use a smaller number of samples)
  • Lowest Discrepancy: Regular Grid – This will cause aliasing, unlike the other “random” based sampling, which trades aliasing for noise. Noise is preferred over aliasing.
  • Low Discrepancy: Low Discrepancy Sequences – At lower sample counts, this will have faster convergence by having better coverage of the sampling space, but will use randomness to get rid of aliasing by introducing noise.

Also interesting to note, Quasi Monte Carlo has provably better asymptotic convergence than regular monte carlo integration.

1 Dimensional Sequences

We’ll first look at 1 dimensional sequences.

Grid

Here are 100 samples evenly spaced:

Random Numbers (White Noise)

This is actually a high discrepancy sequence. To generate this, you just use a standard random number generator to pick 100 points between 0 and 1. I used std::mt19937 with a std::uniform_real_distribution from 0 to 1:

Subrandom Numbers

Subrandom numbers are ways to decrease the discrepancy of white noise.

One way to do this is to break the sampling space in half. You then generate even numbered samples in the first half of the space, and odd numbered samples in the second half of the space.

There’s no reason you can’t generalize this into more divisions of space though.

This splits the space into 4 regions:

8 regions:

16 regions:

32 regions:

There are other ways to generate subrandom numbers though. One way is to generate random numbers between 0 and 0.5, and add them to the last sample, plus 0.5. This gives you a random walk type setup.

Here is that:

Uniform Sampling + Jitter

If you take the first subrandom idea to the logical maximum, you break your sample space up into N sections and place one point within those N sections to make a low discrepancy sequence made up of N points.

Another way to look at this is that you do uniform sampling, but add some random jitter to the samples, between +/- half a uniform sample size, to keep the samples in their own areas.

This is that:

Interestingly, I have heard that Pixar invented this technique.

Irrational Numbers

Rational numbers are numbers which can be described as fractions, such as 0.75 which can be expressed as 3/4. Irrational numbers are numbers which CANNOT be described as fractions, such as pi, or the golden ratio, or the square root of a prime number.

Interestingly you can use irrational numbers to generate low discrepancy sequences. You start with some value (could be 0, or could be a random number), add the irrational number, and modulus against 1.0. To get the next sample you add the irrational value again, and modulus against 1.0 again. Rinse and repeat until you get as many samples as you want.

Some values work better than others though, and apparently the golden ratio is provably the best choice (1.61803398875…), says Wikipedia.
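In code, the whole technique is just an add and a modulus. Here’s a minimal sketch (the function name is made up; the full version lives in TestIrrational1D in the code at the bottom of this post):

#include <math.h>

// Advance an additive recurrence sequence: add the irrational value and wrap to [0,1).
// Passing the golden ratio (1.61803398875f) as the irrational value works best.
float NextIrrationalSample (float lastSample, float irrational)
{
    return fmodf(lastSample + irrational, 1.0f);
}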

Here is the golden ratio, using 4 different random (white noise) starting values:



Here I’ve used the square root of 2, with 4 different starting random numbers again:




Lastly, here is pi, with 4 random starting values:




Van der Corput Sequence

The Van der Corput sequence is the 1d equivalent of the Halton sequence which we’ll talk about later.

How you generate values in the Van der Corput sequence is you convert the index of your sample into some base.

For instance if it was base 2, you would convert your index to binary. If it was base 16, you would convert your index to hexadecimal.

Now, instead of treating the digits as if they are B^0, B^1, B^2, etc (where B is the base), you instead treat them as B^{-1}, B^{-2}, B^{-3} and so on. In other words, you multiply each digit by a fraction and add up the results.

To show a couple quick examples, let’s say we wanted sample 6 in the sequence of base 2.

First we convert 6 to binary which is 110. From right to left, we have 3 digits: a 0 in the 1’s place, a 1 in the 2’s place, and a 1 in the 4’s place. 0*1 + 1*2 + 1*4 = 6, so we can see that 110 is in fact 6 in binary.

To get the Van der Corput value for this, instead of treating it as the 1’s, 2’s and 4’s digit, we treat it as the 1/2, 1/4 and 1/8’s digit.

0 * 1/2 + 1 * 1/4 + 1 * 1/8 = 3/8.

So, sample 6 in the Van der Corput sequence using base 2 is 3/8.

Let’s try sample 21 in base 3.

First we convert 21 to base 3 which is 210. We can verify this is right by seeing that 0 * 1 + 1 * 3 + 2 * 9 = 21.

Instead of a 1’s, 3’s and 9’s digit, we are going to treat it like a 1/3, 1/9 and 1/27 digit.

0 * 1/3 + 1 * 1/9 + 2 * 1/27 = 5/27

So, sample 21 in the Van der Corput sequence using base 3 is 5/27.
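Here’s a minimal sketch of that process as a function, which is essentially what TestVanDerCorput in the code at the bottom of this post does:

// Returns sample 'index' of the Van der Corput sequence in the given base.
// Each digit of the index (in that base) is multiplied by 1/base, 1/base^2, 1/base^3, ...
float VanDerCorput (size_t index, size_t base)
{
    float result = 0.0f;
    float denominator = float(base);
    while (index > 0)
    {
        result += float(index % base) / denominator;
        index /= base;
        denominator *= float(base);
    }
    return result;
}

Plugging in VanDerCorput(6, 2) gives 3/8 and VanDerCorput(21, 3) gives 5/27, matching the worked examples above.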

Here is the Van der Corput sequence for base 2:

Here it is for base 3:

Base 4:

Base 5:

Sobol

One dimensional Sobol is actually just the Van der Corput sequence base 2 re-arranged a little bit, but it’s generated differently.

You start with 0 (either using it as sample 0 or sample -1, doesn’t matter which), and for each sample you do this:

  1. Calculate the Ruler function value for the current sample’s index (more info in a second)
  2. Make the direction vector by shifting 1 left (in binary) 31 - ruler times.
  3. XOR the last sample by the direction vector to get the new sample
  4. To interpret the sample as a floating point number you divide it by 2^{32}

That might sound completely different than the Van der Corput sequence but it actually is the same thing – just re-ordered.

In the final step when dividing by 2^{32}, we are really just interpreting the binary number as a fraction just like before, but it’s the LEFT most digit that is the 1/2 spot, not the RIGHT most digit.

The Ruler Function goes like: 0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, …

It’s pretty easy to calculate too. Calculating the ruler function for an index (starting at 1) is just the zero based index of the right most 1’s digit after converting the number to binary.

1 in binary is 001 so Ruler(1) is 0.
2 in binary is 010 so Ruler(2) is 1.
3 in binary is 011 so Ruler(3) is 0.
4 in binary is 100 so Ruler(4) is 2.
5 in binary is 101 so Ruler(5) is 0.
and so on.
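Here’s a minimal sketch of those steps, condensed from what Ruler and TestSobol1D do in the code at the bottom of this post (the function names here are made up):

// Zero based index of the right most set bit (the Ruler function).
size_t RulerFunction (size_t n)
{
    size_t ret = 0;
    while (n != 0 && (n & 1) == 0)
    {
        n >>= 1;
        ++ret;
    }
    return ret;
}

// Generates the next 1D Sobol sample.  'state' starts at 0 and is carried between
// calls.  'index' is the 1-based index of the sample being generated.
float NextSobol1D (size_t& state, size_t index)
{
    size_t direction = size_t(1) << (31 - RulerFunction(index));
    state ^= direction;
    return float(state) / 4294967296.0f;  // divide by 2^32
}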

Here is 1D Sobol:

Hammersley

In one dimension, the Hammersley sequence is the same as the base 2 Van der Corput sequence, and in the same order. If it sounds strange that they are the same, remember that Hammersley is a 2d sequence that I broke down into a 1d sequence for comparison. The one thing Hammersley has that makes it unique in the 1d case is that you can truncate bits.

It doesn’t seem that useful for 1d Hammersley to truncate bits but knowing that is useful info too I guess. Look at the 2d version of Hammersley to get a fairer look at it, because it’s meant to be a 2d sequence.

Here is Hammersley:

With 1 bit truncated:

With 2 bits truncated:

Poisson Disc

Poisson disc points are points which are densely packed, but have a minimum distance from each other.

Computer scientists are still working out good algorithms to generate these points efficiently.

I use “Mitchell’s Best-Candidate” which means that when you want to generate a new point in the sequence, you generate N new points, and choose whichever point is farthest away from the other points you’ve generated so far.

Here it is where N is 100:

2 Dimensional Sequences

Next up, let’s look at some 2 dimensional sequences.

Grid

Below is 2d uniform samples on a grid.

Note that uniform grid is not particularly low discrepancy for the 2d case! More info here: Is it expected that uniform points would have non zero discrepancy?

Random

Here are 100 random points:

Uniform Grid + Jitter

Here is a uniform grid that has random jitter applied to the points. Jittered grid is a pretty commonly used low discrepancy sampling technique that has good success.

Subrandom

Just like in 1 dimension, you can apply the subrandom ideas to 2 dimensions, where you divide the X and Y axis into so many sections, and randomly choose points in the sections.

If you divide X and Y into the same number of sections though, you are going to have a problem because some areas are not going to have any points in them.

@Reedbeta pointed out that instead of using i%x and i%y, you could use i%x and (i/x)%y to make it pick points in all regions.
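Here’s a minimal sketch of that indexing (the function name and signature are mine, for illustration), where x and y are the number of regions on each axis:

#include <array>
#include <random>
#include <vector>

std::vector<std::array<float, 2>> SubRandom2D (size_t numSamples, size_t x, size_t y, std::mt19937& rng)
{
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    std::vector<std::array<float, 2>> samples(numSamples);
    for (size_t i = 0; i < numSamples; ++i)
    {
        // pick the region for this sample, visiting all x*y regions before repeating
        size_t regionX = i % x;
        size_t regionY = (i / x) % y;

        // pick a random point within that region
        samples[i][0] = (float(regionX) + dist(rng)) / float(x);
        samples[i][1] = (float(regionY) + dist(rng)) / float(y);
    }
    return samples;
}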

Picking different numbers for X and Y can be another way to give good results. Here’s dividing X and Y into 2 and 3 sections respectively:

If you choose co-prime numbers of divisions for each axis, you get the maximal period before the pattern repeats. 2 and 3 are coprime so the last example is a good example of that, but here is 3 and 11:

Here is 3 and 97. 97 is large enough that with only doing 100 samples, we are almost doing jittered grid on the y axis.

Here is the other subrandom number from 1d, where we start with a random value for X and Y, and then add a random number between 0 and 0.5 to each, also adding 0.5, to make a “random walk” type setup again:

Halton

The Halton sequence is just the Van der Corput sequence, but using a different base on each axis.

Here is the Halton sequence where X and Y use bases 2 and 3:

Here it is using bases 5 and 7:

Here are bases 13 and 9:

Irrational Numbers

The irrational numbers technique can be used for 2d as well but I wasn’t able to find out how to make it give decent looking output that didn’t have an obvious diagonal pattern in them. Bart Wronski shared a neat paper that explains how to use the golden ratio in 2d with great success: Golden Ratio Sequences For Low-Discrepancy Sampling

This uses the golden ratio for the X axis and the square root of 2 for the Y axis. Below that is the same, with a random starting point, to make it give a different sequence.

Here X axis uses square root of 2 and Y axis uses square root of 3. Below that is a random starting point, which gives the same discrepancy.

Hammersley

In 2 dimensions, the Hammersley sequence uses the 1d Hammersley sequence for the X axis: Instead of treating the binary version of the index as binary, you treat it as fractions like you do for Van der Corput and sum up the fractions.

For the Y axis, you just reverse the bits and then do the same!
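Here’s a minimal sketch of a 2d Hammersley point following that description (the function names are made up; numBits is how many bits you treat the index as having, and 7 bits covers 128 samples, which is why 100 samples doesn’t fill the square):

// Treat bit 0 as the 1/2 digit, bit 1 as the 1/4 digit, and so on, and sum them.
float SumBitsAsFractions (size_t n)
{
    float result = 0.0f;
    float digit = 0.5f;
    while (n)
    {
        if (n & 1)
            result += digit;
        n >>= 1;
        digit *= 0.5f;
    }
    return result;
}

void Hammersley2D (size_t index, size_t numBits, float& x, float& y)
{
    // x axis: treat the binary index as fractions and sum them up
    x = SumBitsAsFractions(index);

    // y axis: reverse the bits of the index, then do the same
    size_t reversed = 0;
    for (size_t bit = 0; bit < numBits; ++bit)
        reversed = (reversed << 1) | ((index >> bit) & 1);
    y = SumBitsAsFractions(reversed);
}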

Here is the Hammersley sequence. Note we would have to take 128 samples (not just the 100 we did) if we wanted it to fill the entire square with samples.

Truncating bits in 2d is a bit useful. Here is 1 bit truncated:

2 bits truncated:

Poisson Disc

Using the same method we did for 1d, we can generate points in 2d space:

N Rooks

There is a sampling pattern called N-Rooks where you put N rooks onto a chess board and arrange them such that no two are in the same row or column.

A way to generate these samples is to realize that there will be only one rook per row, and that none of them will ever be in the same column. So, you make an array that has numbers 0 to N-1, and then shuffle the array. The index into the array is the row, and the value in the array is the column.
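Here’s a minimal sketch of generating those samples (my own take for illustration; whether you jitter within each cell or use the cell centers is a choice, and this sketch jitters):

#include <algorithm>
#include <array>
#include <numeric>
#include <random>
#include <vector>

std::vector<std::array<float, 2>> NRooks (size_t numSamples, std::mt19937& rng)
{
    // one rook per row: shuffle the column assignments
    std::vector<size_t> columns(numSamples);
    std::iota(columns.begin(), columns.end(), 0);
    std::shuffle(columns.begin(), columns.end(), rng);

    // row = sample index, column = shuffled value, jittered within each cell
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    std::vector<std::array<float, 2>> samples(numSamples);
    for (size_t i = 0; i < numSamples; ++i)
    {
        samples[i][0] = (float(i) + dist(rng)) / float(numSamples);
        samples[i][1] = (float(columns[i]) + dist(rng)) / float(numSamples);
    }
    return samples;
}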

Here are 100 rooks:

Sobol

Sobol in two dimensions is more complex to explain so I’ll link you to the source I used: Sobol Sequences Made Simple.

The 1D sobol already covered is used for the X axis, and then something more complex was used for the Y axis:

Links

Bart Wronski has a really great series on a related topic: Dithering in Games

Wikipedia: Low Discrepancy Sequence

Wikipedia: Halton Sequence

Wikipedia: Van der Corput Sequence

Using Fibonacci Sequence To Generate Colors

Deeper info and usage cases for low discrepancy sequences

Poisson-Disc Sampling

Low discrepancy sequences are related to blue noise. Where white noise contains all frequencies evenly, blue noise has more high frequencies and fewer low frequencies. Blue noise is essentially the ultimate in low discrepancy, but can be expensive to compute. Here are some pages on blue noise:

Free Blue Noise Textures

The problem with 3D blue noise

Stippling and Blue Noise

Vegetation placement in “The Witness”

Here are some links from @marc_b_reynolds:
Sobol (low-discrepancy) sequence in 1-3D, stratified in 2-4D.
Classic binary-reflected gray code.
Sobol.h

Weyl Sequence

Code

#define _CRT_SECURE_NO_WARNINGS

#include <windows.h>  // for bitmap headers and performance counter.  Sorry non windows people!
#include <vector>
#include <stdint.h>
#include <random>
#include <array>
#include <algorithm>
#include <stdlib.h>
#include <set>

typedef uint8_t uint8;

#define NUM_SAMPLES 100  // to simplify some 2d code, this must be a square
#define NUM_SAMPLES_FOR_COLORING 100

// Turning this on will slow things down significantly because it's an O(N^5) operation for 2d!
#define CALCULATE_DISCREPANCY 0

#define IMAGE1D_WIDTH 600
#define IMAGE1D_HEIGHT 50
#define IMAGE2D_WIDTH 300
#define IMAGE2D_HEIGHT 300
#define IMAGE_PAD   30

#define IMAGE1D_CENTERX ((IMAGE1D_WIDTH+IMAGE_PAD*2)/2)
#define IMAGE1D_CENTERY ((IMAGE1D_HEIGHT+IMAGE_PAD*2)/2)
#define IMAGE2D_CENTERX ((IMAGE2D_WIDTH+IMAGE_PAD*2)/2)
#define IMAGE2D_CENTERY ((IMAGE2D_HEIGHT+IMAGE_PAD*2)/2)

#define AXIS_HEIGHT 40
#define DATA_HEIGHT 20
#define DATA_WIDTH 2

#define COLOR_FILL SColor(255,255,255)
#define COLOR_AXIS SColor(0, 0, 0)

//======================================================================================
struct SImageData
{
    SImageData ()
        : m_width(0)
        , m_height(0)
    { }
  
    size_t m_width;
    size_t m_height;
    size_t m_pitch;
    std::vector<uint8> m_pixels;
};

struct SColor
{
    SColor (uint8 _R = 0, uint8 _G = 0, uint8 _B = 0)
        : R(_R), G(_G), B(_B)
    { }

    uint8 B, G, R;
};

//======================================================================================
bool SaveImage (const char *fileName, const SImageData &image)
{
    // open the file if we can
    FILE *file;
    file = fopen(fileName, "wb");
    if (!file) {
        printf("Could not save %s\n", fileName);
        return false;
    }
  
    // make the header info
    BITMAPFILEHEADER header;
    BITMAPINFOHEADER infoHeader;
  
    header.bfType = 0x4D42;
    header.bfReserved1 = 0;
    header.bfReserved2 = 0;
    header.bfOffBits = 54;
  
    infoHeader.biSize = 40;
    infoHeader.biWidth = (LONG)image.m_width;
    infoHeader.biHeight = (LONG)image.m_height;
    infoHeader.biPlanes = 1;
    infoHeader.biBitCount = 24;
    infoHeader.biCompression = 0;
    infoHeader.biSizeImage = (DWORD) image.m_pixels.size();
    infoHeader.biXPelsPerMeter = 0;
    infoHeader.biYPelsPerMeter = 0;
    infoHeader.biClrUsed = 0;
    infoHeader.biClrImportant = 0;
  
    header.bfSize = infoHeader.biSizeImage + header.bfOffBits;
  
    // write the data and close the file
    fwrite(&header, sizeof(header), 1, file);
    fwrite(&infoHeader, sizeof(infoHeader), 1, file);
    fwrite(&image.m_pixels[0], infoHeader.biSizeImage, 1, file);
    fclose(file);
 
    return true;
}

//======================================================================================
void ImageInit (SImageData& image, size_t width, size_t height)
{
    image.m_width = width;
    image.m_height = height;
    image.m_pitch = 4 * ((width * 24 + 31) / 32);
    image.m_pixels.resize(image.m_pitch * image.m_height);
    std::fill(image.m_pixels.begin(), image.m_pixels.end(), 0);
}

//======================================================================================
void ImageClear (SImageData& image, const SColor& color)
{
    uint8* row = &image.m_pixels[0];
    for (size_t rowIndex = 0; rowIndex < image.m_height; ++rowIndex)
    {
        SColor* pixels = (SColor*)row;
        std::fill(pixels, pixels + image.m_width, color);

        row += image.m_pitch;
    }
}

//======================================================================================
void ImageBox (SImageData& image, size_t x1, size_t x2, size_t y1, size_t y2, const SColor& color)
{
    for (size_t y = y1; y < y2; ++y)
    {
        uint8* row = &image.m_pixels[y * image.m_pitch];
        SColor* start = &((SColor*)row)[x1];
        std::fill(start, start + x2 - x1, color);
    }
}

//======================================================================================
float Distance (float x1, float y1, float x2, float y2)
{
    float dx = (x2 - x1);
    float dy = (y2 - y1);

    return std::sqrtf(dx*dx + dy*dy);
}

//======================================================================================
SColor DataPointColor (size_t sampleIndex)
{
    SColor ret;
    float percent = (float(sampleIndex) / (float(NUM_SAMPLES_FOR_COLORING) - 1.0f));

    ret.R = uint8((1.0f - percent) * 255.0f);
    ret.G = 0;
    ret.B = uint8(percent * 255.0f);

    float mag = (float)sqrt(ret.R*ret.R + ret.G*ret.G + ret.B*ret.B);
    ret.R = uint8((float(ret.R) / mag)*255.0f);
    ret.G = uint8((float(ret.G) / mag)*255.0f);
    ret.B = uint8((float(ret.B) / mag)*255.0f);

    return ret;
}

//======================================================================================
float RandomFloat (float min, float max)
{
    static std::random_device rd;
    static std::mt19937 mt(rd());
    std::uniform_real_distribution<float> dist(min, max);
    return dist(mt);
}

//======================================================================================
size_t Ruler (size_t n)
{
    size_t ret = 0;
    while (n != 0 && (n & 1) == 0)
    {
        n /= 2;
        ++ret;
    }
    return ret;
}

//======================================================================================
float CalculateDiscrepancy1D (const std::array<float, NUM_SAMPLES>& samples)
{
    // some info about calculating discrepancy
    // https://math.stackexchange.com/questions/1681562/how-to-calculate-discrepancy-of-a-sequence

    // Calculates the discrepancy of this data.
    // Assumes the data is [0,1) for valid sample range
    std::array<float, NUM_SAMPLES> sortedSamples = samples;
    std::sort(sortedSamples.begin(), sortedSamples.end());

    float maxDifference = 0.0f;
    for (size_t startIndex = 0; startIndex <= NUM_SAMPLES; ++startIndex)
    {
        // startIndex 0 = 0.0f.  startIndex 1 = sortedSamples[0]. etc

        float startValue = 0.0f;
        if (startIndex > 0)
            startValue = sortedSamples[startIndex - 1];

        for (size_t stopIndex = startIndex; stopIndex <= NUM_SAMPLES; ++stopIndex)
        {
            // stopIndex 0 = sortedSamples[0].  stopIndex N = 1.0f. etc

            float stopValue = 1.0f;
            if (stopIndex < NUM_SAMPLES)
                stopValue = sortedSamples[stopIndex];

            float length = stopValue - startValue;

            // open interval (startValue, stopValue)
            size_t countInside = 0;
            for (float sample : samples)
            {
                if (sample > startValue &&
                    sample < stopValue)
                {
                    ++countInside;
                }
            }
            float density = float(countInside) / float(NUM_SAMPLES);
            float difference = std::abs(density - length);
            if (difference > maxDifference)
                maxDifference = difference;

            // closed interval [startValue, stopValue]
            countInside = 0;
            for (float sample : samples)
            {
                if (sample >= startValue &&
                    sample <= stopValue)
                {
                    ++countInside;
                }
            }
            density = float(countInside) / float(NUM_SAMPLES);
            difference = std::abs(density - length);
            if (difference > maxDifference)
                maxDifference = difference;
        }
    }
    return maxDifference;
}

//======================================================================================
float CalculateDiscrepancy2D (const std::array<std::array<float, 2>, NUM_SAMPLES>& samples)
{
    // some info about calculating discrepancy
    // https://math.stackexchange.com/questions/1681562/how-to-calculate-discrepancy-of-a-sequence

    // Calculates the discrepancy of this data.
    // Assumes the data is [0,1) for valid sample range.

    // Get the sorted list of unique values on each axis
    std::set<float> setSamplesX;
    std::set<float> setSamplesY;
    for (const std::array<float, 2>& sample : samples)
    {
        setSamplesX.insert(sample[0]);
        setSamplesY.insert(sample[1]);
    }
    std::vector<float> sortedXSamples;
    std::vector<float> sortedYSamples;
    sortedXSamples.reserve(setSamplesX.size());
    sortedYSamples.reserve(setSamplesY.size());
    for (float f : setSamplesX)
        sortedXSamples.push_back(f);
    for (float f : setSamplesY)
        sortedYSamples.push_back(f);

    // Get the sorted list of samples on the X axis, for faster interval testing
    std::array<std::array<float, 2>, NUM_SAMPLES> sortedSamplesX = samples;
    std::sort(sortedSamplesX.begin(), sortedSamplesX.end(),
        [] (const std::array<float, 2>& itemA, const std::array<float, 2>& itemB)
        {
            return itemA[0] < itemB[0];
        }
    );

    // calculate discrepancy
    float maxDifference = 0.0f;
    for (size_t startIndexY = 0; startIndexY <= sortedYSamples.size(); ++startIndexY)
    {
        float startValueY = 0.0f;
        if (startIndexY > 0)
            startValueY = *(sortedYSamples.begin() + startIndexY - 1);

        for (size_t startIndexX = 0; startIndexX <= sortedXSamples.size(); ++startIndexX)
        {
            float startValueX = 0.0f;
            if (startIndexX > 0)
                startValueX = *(sortedXSamples.begin() + startIndexX - 1);

            for (size_t stopIndexY = startIndexY; stopIndexY <= sortedYSamples.size(); ++stopIndexY)
            {
                float stopValueY = 1.0f;
                if (stopIndexY < sortedYSamples.size())
                    stopValueY = sortedYSamples[stopIndexY];

                for (size_t stopIndexX = startIndexX; stopIndexX <= sortedXSamples.size(); ++stopIndexX)
                {
                    float stopValueX = 1.0f;
                    if (stopIndexX < sortedXSamples.size())
                        stopValueX = sortedXSamples[stopIndexX];

                    // calculate area
                    float length = stopValueX - startValueX;
                    float height = stopValueY - startValueY;
                    float area = length * height;

                    // open interval (startValue, stopValue)
                    size_t countInside = 0;
                    for (const std::array<float, 2>& sample : samples)
                    {
                        if (sample[0] > startValueX &&
                            sample[1] > startValueY &&
                            sample[0] < stopValueX &&
                            sample[1] < stopValueY)
                        {
                            ++countInside;
                        }
                    }
                    float density = float(countInside) / float(NUM_SAMPLES);
                    float difference = std::abs(density - area);
                    if (difference > maxDifference)
                        maxDifference = difference;

                    // closed interval [startValue, stopValue]
                    countInside = 0;
                    for (const std::array<float, 2>& sample : samples)
                    {
                        if (sample[0] >= startValueX &&
                            sample[1] >= startValueY &&
                            sample[0] <= stopValueX &&
                            sample[1] <= stopValueY)
                        {
                            ++countInside;
                        }
                    }
                    density = float(countInside) / float(NUM_SAMPLES);
                    difference = std::abs(density - area);
                    if (difference > maxDifference)
                        maxDifference = difference;
                }
            }
        }
    }

    return maxDifference;
}

//======================================================================================
void Test1D (const char* fileName, const std::array<float, NUM_SAMPLES>& samples)
{
    // create and clear the image
    SImageData image;
    ImageInit(image, IMAGE1D_WIDTH + IMAGE_PAD * 2, IMAGE1D_HEIGHT + IMAGE_PAD * 2);

    // setup the canvas
    ImageClear(image, COLOR_FILL);

    // calculate the discrepancy
    #if CALCULATE_DISCREPANCY
        float discrepancy = CalculateDiscrepancy1D(samples);
        printf("%s Discrepancy = %0.2f%%\n", fileName, discrepancy*100.0f);
    #endif

    // draw the sample points
    size_t i = 0;
    for (float f: samples)
    {
        size_t pos = size_t(f * float(IMAGE1D_WIDTH)) + IMAGE_PAD;
        ImageBox(image, pos, pos + 1, IMAGE1D_CENTERY - DATA_HEIGHT / 2, IMAGE1D_CENTERY + DATA_HEIGHT / 2, DataPointColor(i));
        ++i;
    }

    // draw the axes lines. horizontal first then the two vertical
    ImageBox(image, IMAGE_PAD, IMAGE1D_WIDTH + IMAGE_PAD, IMAGE1D_CENTERY, IMAGE1D_CENTERY + 1, COLOR_AXIS);
    ImageBox(image, IMAGE_PAD, IMAGE_PAD + 1, IMAGE1D_CENTERY - AXIS_HEIGHT / 2, IMAGE1D_CENTERY + AXIS_HEIGHT / 2, COLOR_AXIS);
    ImageBox(image, IMAGE1D_WIDTH + IMAGE_PAD, IMAGE1D_WIDTH + IMAGE_PAD + 1, IMAGE1D_CENTERY - AXIS_HEIGHT / 2, IMAGE1D_CENTERY + AXIS_HEIGHT / 2, COLOR_AXIS);

    // save the image
    SaveImage(fileName, image);
}

//======================================================================================
void Test2D (const char* fileName, const std::array<std::array<float,2>, NUM_SAMPLES>& samples)
{
    // create and clear the image
    SImageData image;
    ImageInit(image, IMAGE2D_WIDTH + IMAGE_PAD * 2, IMAGE2D_HEIGHT + IMAGE_PAD * 2);
    
    // setup the canvas
    ImageClear(image, COLOR_FILL);

    // calculate the discrepancy
    #if CALCULATE_DISCREPANCY
        float discrepancy = CalculateDiscrepancy2D(samples);
        printf("%s Discrepancy = %0.2f%%\n", fileName, discrepancy*100.0f);
    #endif

    // draw the sample points
    size_t i = 0;
    for (const std::array<float, 2>& sample : samples)
    {
        size_t posx = size_t(sample[0] * float(IMAGE2D_WIDTH)) + IMAGE_PAD;
        size_t posy = size_t(sample[1] * float(IMAGE2D_WIDTH)) + IMAGE_PAD;
        ImageBox(image, posx - 1, posx + 1, posy - 1, posy + 1, DataPointColor(i));
        ++i;
    }

    // horizontal lines
    ImageBox(image, IMAGE_PAD - 1, IMAGE2D_WIDTH + IMAGE_PAD + 1, IMAGE_PAD - 1, IMAGE_PAD, COLOR_AXIS);
    ImageBox(image, IMAGE_PAD - 1, IMAGE2D_WIDTH + IMAGE_PAD + 1, IMAGE2D_HEIGHT + IMAGE_PAD, IMAGE2D_HEIGHT + IMAGE_PAD + 1, COLOR_AXIS);

    // vertical lines
    ImageBox(image, IMAGE_PAD - 1, IMAGE_PAD, IMAGE_PAD - 1, IMAGE2D_HEIGHT + IMAGE_PAD + 1, COLOR_AXIS);
    ImageBox(image, IMAGE_PAD + IMAGE2D_WIDTH, IMAGE_PAD + IMAGE2D_WIDTH + 1, IMAGE_PAD - 1, IMAGE2D_HEIGHT + IMAGE_PAD + 1, COLOR_AXIS);

    // save the image
    SaveImage(fileName, image);
}

//======================================================================================
void TestUniform1D (bool jitter)
{
    // calculate the sample points
    const float c_cellSize = 1.0f / float(NUM_SAMPLES+1);
    std::array<float, NUM_SAMPLES> samples;
    for (size_t i = 0; i < NUM_SAMPLES; ++i)
    {
        samples[i] = float(i+1) / float(NUM_SAMPLES+1);
        if (jitter)
            samples[i] += RandomFloat(-c_cellSize*0.5f, c_cellSize*0.5f);
    }

    // save bitmap etc
    if (jitter)
        Test1D("1DUniformJitter.bmp", samples);
    else
        Test1D("1DUniform.bmp", samples);
}

//======================================================================================
void TestUniformRandom1D ()
{
    // calculate the sample points
    const float c_halfJitter = 1.0f / float((NUM_SAMPLES + 1) * 2);
    std::array<float, NUM_SAMPLES> samples;
    for (size_t i = 0; i < NUM_SAMPLES; ++i)
        samples[i] = RandomFloat(0.0f, 1.0f);

    // save bitmap etc
    Test1D("1DUniformRandom.bmp", samples);
}

//======================================================================================
void TestSubRandomA1D (size_t numRegions)
{
    const float c_randomRange = 1.0f / float(numRegions);

    // calculate the sample points
    const float c_halfJitter = 1.0f / float((NUM_SAMPLES + 1) * 2);
    std::array<float, NUM_SAMPLES> samples;
    for (size_t i = 0; i < NUM_SAMPLES; ++i)
    {
        samples[i] = RandomFloat(0.0f, c_randomRange);
        samples[i] += float(i % numRegions) / float(numRegions);
    }

    // save bitmap etc
    char fileName[256];
    sprintf(fileName, "1DSubRandomA_%zu.bmp", numRegions);
    Test1D(fileName, samples);
}

//======================================================================================
void TestSubRandomB1D ()
{
    // calculate the sample points
    std::array<float, NUM_SAMPLES> samples;
    float sample = RandomFloat(0.0f, 0.5f);
    for (size_t i = 0; i < NUM_SAMPLES; ++i)
    {
        sample = std::fmodf(sample + 0.5f + RandomFloat(0.0f, 0.5f), 1.0f);
        samples[i] = sample;
    }

    // save bitmap etc
    Test1D("1DSubRandomB.bmp", samples);
}

//======================================================================================
void TestVanDerCorput (size_t base)
{
    // calculate the sample points
    std::array<float, NUM_SAMPLES> samples;
    for (size_t i = 0; i < NUM_SAMPLES; ++i)
    {
        samples[i] = 0.0f;
        float denominator = float(base);
        size_t n = i;
        while (n > 0)
        {
            size_t multiplier = n % base;
            samples[i] += float(multiplier) / denominator;
            n = n / base;
            denominator *= base;
        }
    }

    // save bitmap etc
    char fileName[256];
    sprintf(fileName, "1DVanDerCorput_%zu.bmp", base);
    Test1D(fileName, samples);
}

//======================================================================================
void TestIrrational1D (float irrational, float seed)
{
    // calculate the sample points
    std::array<float, NUM_SAMPLES> samples;
    float sample = seed;
    for (size_t i = 0; i < NUM_SAMPLES; ++i)
    {
        sample = std::fmodf(sample + irrational, 1.0f);
        samples[i] = sample;
    }

    // save bitmap etc
    char irrationalStr[256];
    sprintf(irrationalStr, "%f", irrational);
    char seedStr[256];
    sprintf(seedStr, "%f", seed);
    char fileName[256];
    sprintf(fileName, "1DIrrational_%s_%s.bmp", &irrationalStr[2], &seedStr[2]);
    Test1D(fileName, samples);
}

//======================================================================================
void TestSobol1D ()
{
    // calculate the sample points
    std::array<float, NUM_SAMPLES> samples;
    size_t sampleInt = 0;
    for (size_t i = 0; i < NUM_SAMPLES; ++i)
    {
        size_t ruler = Ruler(i + 1);
        size_t direction = size_t(size_t(1) << size_t(31 - ruler));
        sampleInt = sampleInt ^ direction;
        samples[i] = float(sampleInt) / std::pow(2.0f, 32.0f);
    }

    // save bitmap etc
    Test1D("1DSobol.bmp", samples);
}

//======================================================================================
void TestHammersley1D (size_t truncateBits)
{
    // calculate the sample points
    std::array<float, NUM_SAMPLES> samples;
    size_t sampleInt = 0;
    for (size_t i = 0; i < NUM_SAMPLES; ++i)
    {
        size_t n = i >> truncateBits;
        float base = 1.0f / 2.0f;
        samples[i] = 0.0f;
        while (n)
        {
            if (n & 1)
                samples[i] += base;
            n /= 2;
            base /= 2.0f;
        }
    }

    // save bitmap etc
    char fileName[256];
    sprintf(fileName, "1DHammersley_%zu.bmp", truncateBits);
    Test1D(fileName, samples);
}

//======================================================================================
float MinimumDistance1D (const std::array<float, NUM_SAMPLES>& samples, size_t numSamples, float x)
{
    // Used by poisson.
    // This returns the minimum distance that point (x) is away from the sample points, from [0, numSamples).
    float minimumDistance = 0.0f;
    for (size_t i = 0; i < numSamples; ++i)
    {
        float distance = std::abs(samples[i] - x);
        if (i == 0 || distance < minimumDistance)
            minimumDistance = distance;
    }
    return minimumDistance;
}

//======================================================================================
void TestPoisson1D ()
{
    // every time we want to place a point, we generate this many points and choose the one farthest away from all the other points (largest minimum distance)
    const size_t c_bestOfAttempts = 100;
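    // This "best candidate" approach approximates a Poisson disk / blue noise distribution
    // without having to pick an explicit minimum distance up front.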

    // calculate the sample points
    std::array<float, NUM_SAMPLES> samples;
    for (size_t sampleIndex = 0; sampleIndex < NUM_SAMPLES; ++sampleIndex)
    {
        // generate some random points and keep the one that has the largest minimum distance from any of the existing points
        float bestX = 0.0f;
        float bestMinDistance = 0.0f;
        for (size_t attempt = 0; attempt < c_bestOfAttempts; ++attempt)
        {
            float attemptX = RandomFloat(0.0f, 1.0f);
            float minDistance = MinimumDistance1D(samples, sampleIndex, attemptX);

            if (minDistance > bestMinDistance)
            {
                bestX = attemptX;
                bestMinDistance = minDistance;
            }
        }
        samples[sampleIndex] = bestX;
    }

    // save bitmap etc
    Test1D("1DPoisson.bmp", samples);
}

//======================================================================================
void TestUniform2D (bool jitter)
{
    // calculate the sample points
    std::array<std::array<float, 2>, NUM_SAMPLES> samples;
    const size_t c_oneSide = size_t(std::sqrt(NUM_SAMPLES));
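    // NOTE: this assumes NUM_SAMPLES is a perfect square; only c_oneSide * c_oneSide
    // samples get written.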
    const float c_cellSize = 1.0f / float(c_oneSide+1);
    for (size_t iy = 0; iy < c_oneSide; ++iy)
    {
        for (size_t ix = 0; ix < c_oneSide; ++ix)
        {
            size_t sampleIndex = iy * c_oneSide + ix;

            samples[sampleIndex][0] = float(ix + 1) / float(c_oneSide + 1);
            if (jitter)
                samples[sampleIndex][0] += RandomFloat(-c_cellSize*0.5f, c_cellSize*0.5f);

            samples[sampleIndex][1] = float(iy + 1) / float(c_oneSide + 1);
            if (jitter)
                samples[sampleIndex][1] += RandomFloat(-c_cellSize*0.5f, c_cellSize*0.5f);
        }
    }

    // save bitmap etc
    if (jitter)
        Test2D("2DUniformJitter.bmp", samples);
    else
        Test2D("2DUniform.bmp", samples);
}

//======================================================================================
void TestUniformRandom2D ()
{
    // calculate the sample points
    std::array<std::array<float, 2>, NUM_SAMPLES> samples;
    for (size_t i = 0; i < NUM_SAMPLES; ++i)
    {
        samples[i][0] = RandomFloat(0.0f, 1.0f);
        samples[i][1] = RandomFloat(0.0f, 1.0f);
    }

    // save bitmap etc
    Test2D("2DUniformRandom.bmp", samples);
}

//======================================================================================
void TestSubRandomA2D (size_t regionsX, size_t regionsY)
{
    const float c_randomRangeX = 1.0f / float(regionsX);
    const float c_randomRangeY = 1.0f / float(regionsY);

    // calculate the sample points
    std::array<std::array<float, 2>, NUM_SAMPLES> samples;
    for (size_t i = 0; i < NUM_SAMPLES; ++i)
    {
        samples[i][0] = RandomFloat(0.0f, c_randomRangeX);
        samples[i][0] += float(i % regionsX) / float(regionsX);

        samples[i][1] = RandomFloat(0.0f, c_randomRangeY);
        samples[i][1] += float(i % regionsY) / float(regionsY);
    }

    // save bitmap etc
    char fileName[256];
    sprintf(fileName, "2DSubRandomA_%zu_%zu.bmp", regionsX, regionsY);
    Test2D(fileName, samples);
}

//======================================================================================
void TestSubRandomB2D ()
{
    // calculate the sample points
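    // Same additive recurrence as the 1D version, run independently on each axis.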
    float samplex = RandomFloat(0.0f, 0.5f);
    float sampley = RandomFloat(0.0f, 0.5f);
    std::array<std::array<float, 2>, NUM_SAMPLES> samples;
    for (size_t i = 0; i < NUM_SAMPLES; ++i)
    {
        samplex = std::fmodf(samplex + 0.5f + RandomFloat(0.0f, 0.5f), 1.0f);
        sampley = std::fmodf(sampley + 0.5f + RandomFloat(0.0f, 0.5f), 1.0f);
        samples[i][0] = samplex;
        samples[i][1] = sampley;
    }
    
    // save bitmap etc
    Test2D("2DSubRandomB.bmp", samples);
}

//======================================================================================
void TestHalton (size_t basex, size_t basey)
{
    // calculate the sample points
    // The Halton sequence is the van der Corput sequence run with a different (co-prime)
    // base on each axis.
    std::array<std::array<float, 2>, NUM_SAMPLES> samples;
    for (size_t i = 0; i < NUM_SAMPLES; ++i)
    {
        // x axis
        samples[i][0] = 0.0f;
        {
            float denominator = float(basex);
            size_t n = i;
            while (n > 0)
            {
                size_t multiplier = n % basex;
                samples[i][0] += float(multiplier) / denominator;
                n = n / basex;
                denominator *= basex;
            }
        }

        // y axis
        samples[i][1] = 0.0f;
        {
            float denominator = float(basey);
            size_t n = i;
            while (n > 0)
            {
                size_t multiplier = n % basey;
                samples[i][1] += float(multiplier) / denominator;
                n = n / basey;
                denominator *= basey;
            }
        }
    }

    // save bitmap etc
    char fileName[256];
    sprintf(fileName, "2DHalton_%zu_%zu.bmp", basex, basey);
    Test2D(fileName, samples);
}

//======================================================================================
void TestSobol2D ()
{
    // calculate the sample points

    // x axis
    std::array<std::array<float, 2>, NUM_SAMPLES> samples;
    size_t sampleInt = 0;
    for (size_t i = 0; i < NUM_SAMPLES; ++i)
    {
        size_t ruler = Ruler(i + 1);
        size_t direction = size_t(size_t(1) << size_t(31 - ruler));
        sampleInt = sampleInt ^ direction;
        samples[i][0] = float(sampleInt) / std::pow(2.0f, 32.0f);
    }

    // y axis
    // Code adapted from http://web.maths.unsw.edu.au/~fkuo/sobol/
    // uses numbers: new-joe-kuo-6.21201
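    // For the second Sobol dimension the direction numbers collapse to the simple
    // recurrence V[i] = V[i - 1] ^ (V[i - 1] >> 1), starting from the top bit, so only
    // about log2(NUM_SAMPLES) of them need to be computed.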

    // Direction numbers.
    // The "+ 1" covers the case where NUM_SAMPLES is a power of two, since then
    // Ruler(NUM_SAMPLES) equals log2(NUM_SAMPLES) and would index one past the end.
    std::vector<size_t> V;
    V.resize((size_t)std::ceil(std::log((double)NUM_SAMPLES) / std::log(2.0)) + 1);
    V[0] = size_t(1) << size_t(31);
    for (size_t i = 1; i < V.size(); ++i)
        V[i] = V[i - 1] ^ (V[i - 1] >> 1);

    // Samples
    sampleInt = 0;
    for (size_t i = 0; i < NUM_SAMPLES; ++i) {
        size_t ruler = Ruler(i + 1);
        sampleInt = sampleInt ^ V[ruler];
        samples[i][1] = float(sampleInt) / std::pow(2.0f, 32.0f);
    }

    // save bitmap etc
    Test2D("2DSobol.bmp", samples);
}

//======================================================================================
void TestHammersley2D (size_t truncateBits)
{
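    // 2D Hammersley: the x axis is the bit reversal of the sample index (the base 2
    // radical inverse) and the y axis is the index itself scaled into [0, 1).
    // truncateBits drops the lowest bits of the index so the points land on a coarser grid.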
    // figure out how many bits we are working in.
    size_t value = 1;
    size_t numBits = 0;
    while (value < NUM_SAMPLES)
    {
        value *= 2;
        ++numBits;
    }

    // calculate the sample points
    std::array<std::array<float, 2>, NUM_SAMPLES> samples;
    for (size_t i = 0; i < NUM_SAMPLES; ++i)
    {
        // x axis
        samples[i][0] = 0.0f;
        {
            size_t n = i >> truncateBits;
            float base = 1.0f / 2.0f;
            while (n)
            {
                if (n & 1)
                    samples[i][0] += base;
                n /= 2;
                base /= 2.0f;
            }
        }

        // y axis
        samples[i][1] = 0.0f;
        {
            size_t n = i >> truncateBits;
            size_t mask = size_t(1) << (numBits - 1 - truncateBits);

            float base = 1.0f / 2.0f;
            while (mask)
            {
                if (n & mask)
                    samples[i][1] += base;
                mask /= 2;
                base /= 2.0f;
            }
        }
    }


    // save bitmap etc
    char fileName[256];
    sprintf(fileName, "2DHammersley_%zu.bmp", truncateBits);
    Test2D(fileName, samples);
}

//======================================================================================
void TestRooks2D ()
{
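    // N-rooks sampling: one sample in every row and every column, like N rooks placed on a
    // chess board so that none can capture another. The y coordinates step through the rows
    // in order and the x coordinates are a random permutation of the same positions.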
    // make and shuffle rook positions
    std::random_device rd;
    std::mt19937 mt(rd());
    std::array<size_t, NUM_SAMPLES> rookPositions;
    for (size_t i = 0; i < NUM_SAMPLES; ++i)
        rookPositions[i] = i;
    std::shuffle(rookPositions.begin(), rookPositions.end(), mt);

    // calculate the sample points
    std::array<std::array<float, 2>, NUM_SAMPLES> samples;
    for (size_t i = 0; i < NUM_SAMPLES; ++i)
    {
        // x axis
        samples[i][0] = float(rookPositions[i]) / float(NUM_SAMPLES-1);

        // y axis
        samples[i][1] = float(i) / float(NUM_SAMPLES - 1);
    }

    // save bitmap etc
    Test2D("2DRooks.bmp", samples);
}

//======================================================================================
void TestIrrational2D (float irrationalx, float irrationaly, float seedx, float seedy)
{
    // calculate the sample points
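    // Same additive recurrence as the 1D version, with an independent irrational step
    // (and seed) on each axis.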
    std::array<std::array<float, 2>, NUM_SAMPLES> samples;
    float samplex = seedx;
    float sampley = seedy;
    for (size_t i = 0; i < NUM_SAMPLES; ++i)
    {
        samplex = std::fmodf(samplex + irrationalx, 1.0f);
        sampley = std::fmodf(sampley + irrationaly, 1.0f);

        samples[i][0] = samplex;
        samples[i][1] = sampley;
    }

    // save bitmap etc
    char irrationalxStr[256];
    sprintf(irrationalxStr, "%f", irrationalx);
    char irrationalyStr[256];
    sprintf(irrationalyStr, "%f", irrationaly);
    char seedxStr[256];
    sprintf(seedxStr, "%f", seedx);
    char seedyStr[256];
    sprintf(seedyStr, "%f", seedy);
    char fileName[256];
    sprintf(fileName, "2DIrrational_%s_%s_%s_%s.bmp", &irrationalxStr[2], &irrationalyStr[2], &seedxStr[2], &seedyStr[2]);
    Test2D(fileName, samples);
}

//======================================================================================
float MinimumDistance2D (const std::array<std::array<float, 2>, NUM_SAMPLES>& samples, size_t numSamples, float x, float y)
{
    // Used by poisson.
    // This returns the minimum distance that point (x,y) is away from the sample points, from [0, numSamples).
    float minimumDistance = 0.0f;
    for (size_t i = 0; i < numSamples; ++i)
    {
        float distance = Distance(samples[i][0], samples[i][1], x, y);
        if (i == 0 || distance < minimumDistance)
            minimumDistance = distance;
    }
    return minimumDistance;
}

//======================================================================================
void TestPoisson2D ()
{
    // every time we want to place a point, we generate this many points and choose the one farthest away from all the other points (largest minimum distance)
    const size_t c_bestOfAttempts = 100;

    // calculate the sample points
    std::array<std::array<float, 2>, NUM_SAMPLES> samples;
    for (size_t sampleIndex = 0; sampleIndex < NUM_SAMPLES; ++sampleIndex)
    {
        // generate some random points and keep the one that has the largest minimum distance from any of the existing points
        float bestX = 0.0f;
        float bestY = 0.0f;
        float bestMinDistance = 0.0f;
        for (size_t attempt = 0; attempt < c_bestOfAttempts; ++attempt)
        {
            float attemptX = RandomFloat(0.0f, 1.0f);
            float attemptY = RandomFloat(0.0f, 1.0f);
            float minDistance = MinimumDistance2D(samples, sampleIndex, attemptX, attemptY);

            if (minDistance > bestMinDistance)
            {
                bestX = attemptX;
                bestY = attemptY;
                bestMinDistance = minDistance;
            }
        }
        samples[sampleIndex][0] = bestX;
        samples[sampleIndex][1] = bestY;
    }

    // save bitmap etc
    Test2D("2DPoisson.bmp", samples);
}

//======================================================================================
int main (int argc, char **argv)
{
    // 1D tests
    {
        TestUniform1D(false);
        TestUniform1D(true);

        TestUniformRandom1D();

        TestSubRandomA1D(2);
        TestSubRandomA1D(4);
        TestSubRandomA1D(8);
        TestSubRandomA1D(16);
        TestSubRandomA1D(32);

        TestSubRandomB1D();

        TestVanDerCorput(2);
        TestVanDerCorput(3);
        TestVanDerCorput(4);
        TestVanDerCorput(5);

        // golden ratio mod 1 aka (sqrt(5) - 1)/2
        TestIrrational1D(0.618034f, 0.0f);
        TestIrrational1D(0.618034f, 0.385180f);
        TestIrrational1D(0.618034f, 0.775719f);
        TestIrrational1D(0.618034f, 0.287194f);

        // sqrt(2) - 1
        TestIrrational1D(0.414214f, 0.0f);
        TestIrrational1D(0.414214f, 0.385180f);
        TestIrrational1D(0.414214f, 0.775719f);
        TestIrrational1D(0.414214f, 0.287194f);

        // PI mod 1
        TestIrrational1D(0.141593f, 0.0f);
        TestIrrational1D(0.141593f, 0.385180f);
        TestIrrational1D(0.141593f, 0.775719f);
        TestIrrational1D(0.141593f, 0.287194f);
        
        TestSobol1D();

        TestHammersley1D(0);
        TestHammersley1D(1);
        TestHammersley1D(2);

        TestPoisson1D();
    }

    // 2D tests
    {
        TestUniform2D(false);
        TestUniform2D(true);

        TestUniformRandom2D();

        TestSubRandomA2D(2, 2);
        TestSubRandomA2D(2, 3);
        TestSubRandomA2D(3, 11);
        TestSubRandomA2D(3, 97);

        TestSubRandomB2D();

        TestHalton(2, 3);
        TestHalton(5, 7);
        TestHalton(13, 9);

        TestSobol2D();

        TestHammersley2D(0);
        TestHammersley2D(1);
        TestHammersley2D(2);

        TestRooks2D();

        // X axis = golden ratio mod 1 aka (sqrt(5)-1)/2
        // Y axis = sqrt(2) mod 1
        TestIrrational2D(0.618034f, 0.414214f, 0.0f, 0.0f);
        TestIrrational2D(0.618034f, 0.414214f, 0.775719f, 0.264045f);

        // X axis = sqrt(2) mod 1
        // Y axis = sqrt(3) mod 1
        TestIrrational2D(std::fmodf((float)std::sqrt(2.0f), 1.0f), std::fmodf((float)std::sqrt(3.0f), 1.0f), 0.0f, 0.0f);
        TestIrrational2D(std::fmodf((float)std::sqrt(2.0f), 1.0f), std::fmodf((float)std::sqrt(3.0f), 1.0f), 0.775719f, 0.264045f);

        TestPoisson2D();
    }

    #if CALCULATE_DISCREPANCY
        printf("\n");
        system("pause");
    #endif
}