# Calculating the Distance Between Points in “Wrap Around” (Toroidal) Space

Let’s say you are trying to find the distance between two points in 2D, but these points are in a universe that “wraps around” like old video games – leaving the screen on the right, left, top or bottom side makes you reappear on the opposite edge.

This universe is shaped like a toroid, also known as a doughnut. Strictly speaking, it’s a “flat torus”, which can’t be built as a smooth 3D object without stretching it, so it’s not exactly a doughnut, but whatever.

If you imagine yourself on the surface of a doughnut, it would behave exactly this way. If you go “down” you end up where you previously considered “up”. If you go far enough “left” you end up where you previously considered “right”.

How would you calculate the distance between two points in a universe like this?

Let’s imagine the situation below where we are trying to find the distance between the red point and the green point:

One way to do this would be to pick one of the points (I’m picking red in this case) and clone it 8 times so the copies surround the cell, as shown below. You’d calculate the distance from the green point to each of the 9 red points, and whichever distance was smallest would be the answer.

Something not so desirable about this is that it takes 9 distance calculations to find the minimum distance. You can work with squared distances instead of regular distances to avoid a square root on each of these distance calculations, but that’s still a bit of calculation to do.
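For reference, here is a minimal sketch of that 9-copy approach. It assumes a 1 by 1 universe. It compares squared distances so that only the winner needs a square root. The function name `ToroidalDistanceBruteForce` is hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Brute-force toroidal distance in a 1x1 wrapping space: try all 9 copies
// of point A (offsets of -1, 0, +1 on each axis) and keep the closest one to B.
float ToroidalDistanceBruteForce(float ax, float ay, float bx, float by)
{
    float bestDistSq = std::numeric_limits<float>::max();
    for (int ox = -1; ox <= 1; ++ox)
    {
        for (int oy = -1; oy <= 1; ++oy)
        {
            float dx = (ax + float(ox)) - bx;
            float dy = (ay + float(oy)) - by;
            // squared distance is enough for comparing candidates
            bestDistSq = std::min(bestDistSq, dx * dx + dy * dy);
        }
    }
    // a single square root, on the smallest squared distance
    return std::sqrt(bestDistSq);
}
```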

Going up in dimensions makes the problem even worse. In 3D, it requires 27 distance calculations to find the shortest distance, and 81 distance calculations in 4D!

Luckily there’s a better way to approach this.

Let’s say that our universe (image) is 1 unit by 1 unit big (aka we are working in texture UVs). If you look at the image with 9 copies of the red dot, you can see that they are just the 9 possible combinations of having -1, +0, +1 on each axis added to the red dot’s coordinates. All possible combinations of the x and y axis having -1, +0 or +1 added to them are valid locations of the red dot.

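Looking at the distance formula, we can see that if we minimize each axis individually, we also end up with the minimal distance overall. Each axis contributes its own non-negative squared term, and the offset chosen on one axis doesn’t affect the others, so minimizing every term minimizes the sum.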

$d = \sqrt{(x_2-x_1)^2+(y_2-y_1)^2}$

So, the better way is to minimize each axis individually.

On the x axis, you’d check whether the x axis distance between the red and green points is smallest when you subtract 1 from the red dot’s x position, leave it alone, or add 1.

Whichever x axis value of the red dot gives you the minimal x axis 1D distance is the x axis location to use.

You’d repeat for the y axis to get the y axis location to use (and would repeat for any further axes for higher dimensions).

This gives you the closest point which you can then plug into the distance formula to get the distance between the points in this wrap around space.
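The per-axis search above might be sketched like this, assuming a period of 1 on every axis. `ClosestWrappedCoord` and `ToroidalDistancePerAxis` are hypothetical names.

```cpp
#include <cmath>

// For one axis, return A's coordinate shifted by -1, 0, or +1,
// whichever lands closest to B's coordinate (period of 1 assumed).
float ClosestWrappedCoord(float a, float b)
{
    const float offsets[] = { -1.0f, 1.0f };
    float best = a;  // start with the unshifted position
    for (float offset : offsets)
    {
        if (std::abs((a + offset) - b) < std::abs(best - b))
            best = a + offset;
    }
    return best;
}

// Pick the best copy independently on each axis, then use the normal distance formula.
float ToroidalDistancePerAxis(float ax, float ay, float bx, float by)
{
    float dx = ClosestWrappedCoord(ax, bx) - bx;
    float dy = ClosestWrappedCoord(ay, by) - by;
    return std::sqrt(dx * dx + dy * dy);
}
```

This only needs 3 one-dimensional comparisons per axis instead of trying every combination of axes.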

You can actually do better though.

Still working on each axis individually, you can calculate the absolute value of the 1D distance between the two points on that axis. If that distance is greater than 0.5, the real distance for that axis is 1 - distance.

The intuition here is that in a 1D repeating space, if going from A to B covers more than half the size of the space, you went the wrong way, and going the other way is shorter. The distance going the other way is one minus the distance you just calculated, since going all the way around, from a point back to itself, covers a distance of 1!

Do that for each axis and use those 1d distances in the distance formula to get the actual distance.

This lets you minimize the distance without having to explicitly figure out which combination makes the point closest.

More importantly, it lets you efficiently calculate the distance between the two points in toroidal space (doughnut space!)

The computational complexity is a lot better. It’s now linear in the number of dimensions: $O(N)$, instead of $O(3^N)$.

Here is some C++ to show you how it would work in 2D.

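```cpp
#include <cmath>

// Distance between two points in a 1x1 wrap-around (toroidal) space
float ToroidalDistance (float x1, float y1, float x2, float y2)
{
    float dx = std::abs(x2 - x1);
    float dy = std::abs(y2 - y1);

    // more than half way across means going the other way around is shorter
    if (dx > 0.5f)
        dx = 1.0f - dx;

    if (dy > 0.5f)
        dy = 1.0f - dy;

    return std::sqrt(dx*dx + dy*dy);
}
```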
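The same idea extends to any number of dimensions, which is where the $O(N)$ cost shows up. Below is a sketch that also takes the size of the wrapping space as a parameter, in case it isn’t 1 (for example, a world measured in pixels). The name `ToroidalDistanceND` and the `period` parameter are illustrative assumptions. Coordinates are assumed to lie in [0, period).

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Toroidal distance in N dimensions. `period` is the size of the wrapping
// space on every axis (1.0 for texture UVs).
template <std::size_t N>
float ToroidalDistanceND(const std::array<float, N>& a, const std::array<float, N>& b, float period = 1.0f)
{
    float sumSq = 0.0f;
    for (std::size_t i = 0; i < N; ++i)
    {
        float d = std::abs(b[i] - a[i]);
        if (d > 0.5f * period)
            d = period - d;  // the other way around is shorter
        sumSq += d * d;
    }
    return std::sqrt(sumSq);
}
```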


I hit this problem trying to make a tileable texture. I needed to place a few circles on a texture such that the circles weren’t too close to each other, even when the texture was tiled.

The calculations above gave me the basic tool needed to be able to calculate distances between points. Subtracting circle radii from the distance between points let me get toroidal distance between circles and make sure I didn’t place them too closely to each other.
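Here is one way that check could look, as a hypothetical sketch. The names `Circle`, `ToroidalCircleGap` and `PlaceCircles` are made up for illustration. The placement strategy is also an assumption, not necessarily how the texture above was made: it tries random positions and keeps only those that respect a minimum gap.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

struct Circle { float x, y, radius; };

// Toroidal distance between two points in a 1x1 wrapping space.
float ToroidalDistance(float x1, float y1, float x2, float y2)
{
    float dx = std::abs(x2 - x1);
    float dy = std::abs(y2 - y1);
    if (dx > 0.5f) dx = 1.0f - dx;
    if (dy > 0.5f) dy = 1.0f - dy;
    return std::sqrt(dx * dx + dy * dy);
}

// Gap between the edges of two circles in wrap-around space (negative = overlap).
float ToroidalCircleGap(const Circle& a, const Circle& b)
{
    return ToroidalDistance(a.x, a.y, b.x, b.y) - a.radius - b.radius;
}

// Hypothetical placement loop: try random centers and keep a circle only if it is
// at least minGap away from every circle already placed, even across the wrap.
std::vector<Circle> PlaceCircles(std::size_t count, float radius, float minGap,
                                 std::size_t maxAttempts, unsigned seed)
{
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    std::vector<Circle> circles;
    for (std::size_t attempt = 0; attempt < maxAttempts && circles.size() < count; ++attempt)
    {
        Circle candidate{ dist(rng), dist(rng), radius };
        bool ok = true;
        for (const Circle& c : circles)
        {
            if (ToroidalCircleGap(candidate, c) < minGap) { ok = false; break; }
        }
        if (ok)
            circles.push_back(candidate);
    }
    return circles;
}
```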

That let me make an image that kept the distance constraints even when it was tiled.

Here’s an example image by itself:

Here is the image tiled:

# Generating Random Numbers From a Specific Distribution With Rejection Sampling

The last post showed how to transform uniformly generated random numbers into any random number distribution you desired.

It did so by turning the PDF (probability density function) into a CDF (cumulative distribution function) and then inverting it – either analytically (making a function) or numerically (making a look up table).

This post will show you how to generate numbers from a PDF as well, but will do so using rejection sampling.

# Dice

Let’s say you wanted to simulate a fair five sided die, but you only had a six sided die.

You can use rejection sampling for this by rolling the six sided die and ignoring the roll any time a six comes up. Doing that, you do in fact get a fair five sided die roll!

This shows doing that to get 10,000 five sided die rolls:

One disadvantage to this method is that you are throwing away die rolls, which can be a source of inefficiency. In this setup it takes 1.2 six sided die rolls on average to get a valid five sided die roll, since a roll is thrown away 1/6 of the time (6/5 = 1.2).

Another disadvantage is that each time you need a new value, an unknown number of die rolls is needed to get it. On average you only need 1.2 die rolls, but in reality you may roll 10 sixes in a row. Heck, it’s even technically possible (but very unlikely) that you could be rolling dice until the end of time and keep getting sixes. (Using PRNGs in computers, this won’t happen, but it does take a variable number of rolls.)

This is just to say: the execution time of this algorithm is uneven and unpredictable, and it needs an unknown (but statistically predictable) number of random numbers to work. This is true of the other sampling methods I talk about below as well.
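The six sided die version of this can be sketched in a few lines. It uses `std::mt19937` as the source of rolls, and the function name `RollD5WithD6` is hypothetical. The `while` loop is where the unpredictable number of rolls comes from.

```cpp
#include <random>

// Simulate a fair five sided die using a six sided die:
// roll, and if the result is a 6, throw it away and roll again.
int RollD5WithD6(std::mt19937& rng)
{
    std::uniform_int_distribution<int> d6(1, 6);
    int roll = d6(rng);
    while (roll == 6)   // reject sixes; may loop any number of times
        roll = d6(rng);
    return roll;        // uniformly distributed over 1..5
}
```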

Instead of using a six sided die, you could use a die of any size that is greater than (or equal to…) five. Here is a twenty sided die simulating a five sided die:

It looks basically the same as using a six sided die, which makes sense (and shows that it works), but in this case it took 4 rolls on average to make a valid five sided die roll, since the roll fails 15 out of 20 times (3 out of 4 rolls fail).

Quick Asides:

• Straying from rejection sampling ideas for a minute: in the case of the twenty sided die, you could use modulus to get a fair five sided die roll every time: $((roll - 1) \% 5) + 1$. This works because 20 % 5 has no remainder. If there were a remainder, it would bias the rolls towards the numbers <= the remainder, making them more likely to come up than the other numbers.
• You could also get a four sided die roll from the same roll if you didn’t want to waste any of this precious random information: $((roll - 1) / 5) + 1$, using integer division.
• Another algorithm to check out for discrete (integer) weighted random numbers is Vose’s method.
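The two modulus and division asides above could look like this in code, assuming `roll` is a fair d20 result in 1..20. The function names are hypothetical. Writing `roll - 1` as $5q + r$ with $q \in 0..3$ and $r \in 0..4$ shows that every (d5, d4) pair comes from exactly one d20 roll, so the two results are also independent.

```cpp
// Fair d5 from a d20 roll in 1..20: works because 20 % 5 == 0.
int D5FromD20(int roll)
{
    return ((roll - 1) % 5) + 1;
}

// Fair d4 from the same d20 roll, using integer division.
int D4FromD20(int roll)
{
    return ((roll - 1) / 5) + 1;
}
```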

# Box Around PDF

Moving back into the world of continuous valued random numbers and PDFs, a simple version of how rejection sampling can be used is like this:

1. Graph the PDF
2. Draw a box around the PDF
3. Generate a (uniform) random point in that box
4. If the point is under the curve of the PDF, use the x axis value as your random number, else throw it out and go back to step 3

That’s all there is to it!

This works because the x axis value of your 2d point is the random number you might be choosing, and the y axis value decides whether you keep it. Since y is uniform over the height of the box, the chance that the point lands under the curve at a given x is proportional to PDF(x). So places where the PDF is higher are more likely to accept your 2d point than places that have lower PDF values.

Furthermore, the average number of rejected samples vs accepted samples depends on the area under the PDF compared to the area of the box.

On average, the number of samples needed per accepted value will be the area of the box divided by the area under the PDF.

Since PDFs by definition have to integrate to 1, that means that you are dividing by 1. So, to simplify: the average number of samples needed will be the same as the area of the box!
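As a quick sketch of where that comes from, write $A_{box}$ for the area of the box and $f$ for the PDF. Each attempt picks a uniform point in the box, so it is accepted with probability

$$
p = \frac{\int f(x)\,dx}{A_{box}} = \frac{1}{A_{box}}
$$

The number of attempts until the first acceptance follows a geometric distribution, whose mean is $1/p = A_{box}$. The dice example works the same way in its discrete form: 5 of the 6 faces are accepted, so it takes $6/5 = 1.2$ rolls on average.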

If it’s hard to come up with the exact size of the box for the PDF, the box doesn’t have to fit exactly, as long as it fully contains the PDF. Of course, the tighter you can fit the box around the PDF, the fewer rejected samples you’ll have.

You don’t actually need to graph the PDF and draw a box to do this though. Just generate a 2d random point (a random x within the PDF’s domain and a random y between 0 and the box height) and reject the point if $PDF(x) < y$.
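That simplified loop can be sketched as a small template function. The name `SampleWithBox` and its parameters are hypothetical. `boxHeight` must be at least the maximum of the PDF over `[xMin, xMax]`.

```cpp
#include <random>

// Rejection sampling with a box: x uniform in [xMin, xMax), y uniform in [0, boxHeight).
// Accept x when the point falls under the curve, otherwise try a new point.
template <typename PDF>
float SampleWithBox(std::mt19937& rng, const PDF& pdf, float xMin, float xMax, float boxHeight)
{
    std::uniform_real_distribution<float> distX(xMin, xMax);
    std::uniform_real_distribution<float> distY(0.0f, boxHeight);
    while (true)
    {
        float x = distX(rng);
        float y = distY(rng);
        if (y <= pdf(x))  // under the curve: accept
            return x;
        // above the curve: reject and loop
    }
}
```

For example, `SampleWithBox(rng, [](float x) { return 2.0f * x; }, 0.0f, 1.0f, 2.0f)` draws from $y=2x$ using the (0,0) to (1,2) box described next.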

Here I'm using this technique with the PDF $y=2x$ where x is in [0,1) and I'm using a box that goes from (0,0) to (1,2) to get 100,000 samples.

As expected, it took 2 points on average to get a single valid point, since the area of the box is 2. Here is how many failed tests each histogram bucket had. Unsurprisingly, lower values of the PDF have more failed tests!

Moving to a more complex PDF, let’s look at $y=\frac{x^3-10x^2+5x+11}{10.417}$

Here are 10 million samples (lots of samples to minimize the noise), using a box height of 1.2, which unsurprisingly takes 1.2 samples on average to get a valid sample:

Here is the graph of the failure counts:

Here the box has a height of 2.8. It still works, but uses 2.8 samples on average which is less efficient:

Here’s the graph of failure counts:

Something interesting about this technique is that technically, the distribution you are sampling from doesn’t even have to be a PDF! If you have negative parts of the graph, they will be treated as zero, assuming your box has a minimum y of 0. Also, the fact that your function may not integrate to (have an area of) 1 doesn’t matter at all.

Here we take the PDF from the last examples, and take off the division by a constant, so that it doesn’t integrate to 1: $y=x^3-10x^2+5x+11$

The interesting thing is that we get as output a normalized PDF (the red line), even though the distribution we were using to sample was not normalized (the blue line, which is mostly hidden behind the yellow line).

Here are the rejection counts:

## Generating One PDF from Another PDF

In the last section we showed how to enclose a PDF in a box, make uniformly random 2d points, and use them to generate points from the PDF.

By enclosing it in a box, all we were really doing is putting it under a uniform distribution that was scaled up to be at least as large as the PDF at all points.

Now here’s the interesting thing: We aren’t limited to using the uniform distribution!

To generalize this technique, if you are trying to sample from a PDF $f(x)$, you can use any PDF $g(x)$ to do so, as long as you multiply $g(x)$ by a scalar value $M$ so that $M \cdot g(x) \geq f(x)$ for all values of x. In other words: scale up g so that it’s never smaller than f.

Using this more generalized technique has one or two more steps than the other way, but allows for a tighter fit of a generating function, resulting in fewer samples thrown away.

Here’s how to do it:

1. Generate a random number from the distribution g, and call it x.
2. Calculate the probability of accepting x, which is the ratio of how likely that number is to be chosen by each PDF: $\frac{f(x)}{M \cdot g(x)}$
3. Generate a uniform random number from 0 to 1. If it’s less than the value you just calculated, accept x as the random number, else reject it and go back to 1.
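Those three steps can be sketched generically, assuming you already have some way to draw samples from g. Here that is a caller-supplied function, and `SampleFFromG` and `SampleProposalTwoX` are hypothetical names. As a small example target, take $f(x)=3x^2$ with proposal $g(x)=2x$ on [0,1], sampled by inverting its CDF as $\sqrt{u}$. Here $M=1.5$ works because $3x^2 \le 1.5 \cdot 2x$ on [0,1].

```cpp
#include <cmath>
#include <random>

// Example proposal: samples from g(x) = 2x on [0,1] via its inverse CDF, sqrt(u).
float SampleProposalTwoX(std::mt19937& rng)
{
    std::uniform_real_distribution<float> uniform01(0.0f, 1.0f);
    return std::sqrt(uniform01(rng));
}

// Sample from target f using proposal g, assuming M * g(x) >= f(x) everywhere.
// sampleG(rng) must return values distributed according to g.
template <typename F, typename G, typename SampleG>
float SampleFFromG(std::mt19937& rng, const F& f, const G& g, float M, const SampleG& sampleG)
{
    std::uniform_real_distribution<float> uniform01(0.0f, 1.0f);
    while (true)
    {
        float x = sampleG(rng);                  // step 1: x drawn from g
        float acceptChance = f(x) / (M * g(x));  // step 2 (0/0 = NaN fails the test below)
        if (uniform01(rng) < acceptChance)       // step 3: accept, or loop and try again
            return x;
    }
}
```

With this example, the acceptance chance simplifies to $x$. So values are produced with density proportional to $2x \cdot x$, which normalizes to $3x^2$.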

Let’s see this in action!

We’ll generate numbers in a Gaussian distribution with a mean of 15 and a standard deviation of 5. We’ll truncate it to +/- 3 standard deviations so we want to generate random numbers from [0,30).

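To generate these numbers, we’ll draw random numbers from the PDF $y=0.002222x$. We’ll use an $M$ value of 3 to scale up this PDF so that it is greater than the Gaussian one almost everywhere. (The exception is a sliver below about x = 0.15, where $0.002222x$ drops to zero but the truncated Gaussian does not. There the acceptance ratio exceeds 1, every sample is accepted, and that sliver ends up very slightly under-sampled.)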

Here is how it looks doing this with 20,000 samples:

We generate random numbers from the PDF shown as the red line, and scaling that PDF by M = 3 gives the yellow line. Then, at whatever point we are at on the x axis, we divide the blue line value by the yellow line value and use that as an acceptance probability. Doing this and counting numbers in a histogram gives us our result – the green line. Since the end goal is the blue line, you can see it is indeed working! With a larger number of samples, the green line would more closely match the blue line.

Here’s the graph of the failed tests:

We have to take 3 samples on average before we get a valid random number. That shouldn’t be too surprising, because both PDFs start with an area of 1, but we are multiplying one of them by 3 to make it always be larger than the other.

Something else interesting you might notice is that we have a lot fewer failed tests where the two PDF functions are more similar.

That is the power of this technique: If you can cheaply and easily generate samples that are “pretty close” to a harder distribution to sample from, you can use this technique to more cheaply sample from it.

Something to note is that, just like in the last section, the target PDF doesn’t need to be a real PDF with only non-negative values that integrates to 1. It works just the same with a non-PDF function, as long as the scaled generating PDF, $M \cdot g(x)$, is always above the function.

# Some Other Notes

There is a family of techniques called “adaptive rejection sampling” that change the PDF they are drawing from whenever there is a failed test.

Basically, if you imagine the PDF you are drawing from as being a bunch of line segments connected together, you could imagine that whenever you failed a test, you moved a line segment down to be closer to the curve, so that when you sampled from that area again, the chances would be lower that you’d fail the test again.

Taking this to the limit, your sampling PDF will eventually become the PDF you are trying to sample from, at which point the rejection test always passes and becomes a no-op.

These techniques are an ongoing area of research.

Something else to note is that rejection sampling can be used to find random points within shapes.

For instance, you could find a random point in a triangle, ellipse or circle by putting a (tight) bounding box around the shape, generating points randomly in that box, and only accepting the ones within the inner shape.

This can be extended to 3d shapes as well.

Some shapes have better ways to generate points within them that don’t involve iteration and rejected samples, but if all else fails, rejection sampling does indeed work!
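As an example of the shape case, here is a sketch of a uniform random point inside a circle, found by rejection from its bounding square. `Point2D` and `RandomPointInCircle` are hypothetical names. The square has area 4 and the unit circle has area $\pi$, so by the box-area reasoning earlier this should take about $4/\pi \approx 1.27$ attempts per point on average.

```cpp
#include <random>

struct Point2D { float x, y; };

// Uniform random point inside a circle: sample the unit circle's bounding square
// [-1,1]^2, keep points inside the unit circle, then scale and offset the result.
Point2D RandomPointInCircle(std::mt19937& rng, float cx, float cy, float radius)
{
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
    while (true)
    {
        float x = dist(rng);
        float y = dist(rng);
        if (x * x + y * y <= 1.0f)  // inside the unit circle: accept
            return { cx + x * radius, cy + y * radius };
    }
}
```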

At some point in the future I’d like to look into “Markov Chain Monte Carlo” more deeply. It seems like a very interesting technique to approach this same problem, but I have no idea if it’s used often in graphics, especially real time graphics.

# Code

Here is the code that generated all the data from this post. The data was visualized with OpenOffice.

```cpp
#define _CRT_SECURE_NO_WARNINGS

#include <stdio.h>
#include <algorithm>
#include <array>
#include <cmath>
#include <random>
#include <unordered_map>

template <size_t NUM_TEST_SAMPLES, size_t SIMULATED_DICE_SIDES, size_t ACTUAL_DICE_SIDES>
void TestDice (const char* fileName)
{
    // seed the random number generator
    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_int_distribution<size_t> dist(0, ACTUAL_DICE_SIDES-1);

    // generate the histogram, rejecting rolls the simulated die can't produce
    std::array<size_t, SIMULATED_DICE_SIDES> histogram = { 0 };
    size_t rejectedSamples = 0;
    for (size_t i = 0; i < NUM_TEST_SAMPLES; ++i)
    {
        size_t roll = dist(rng);
        while (roll >= SIMULATED_DICE_SIDES)
        {
            ++rejectedSamples;
            roll = dist(rng);
        }
        histogram[roll]++;
    }

    // write the histogram and rejected sample count to a csv
    // an extra 0 data point forces the graph to include 0 in the scale. hack to make the data not look noisier than it really is.
    FILE *file = fopen(fileName, "w");
    if (!file)
        return;
    fprintf(file, "Actual Count, Expected Count, , %0.2f samples needed per roll on average.\n", (float(NUM_TEST_SAMPLES) + float(rejectedSamples)) / float(NUM_TEST_SAMPLES));
    for (size_t value : histogram)
        fprintf(file, "%zu,%zu,0\n", value, (size_t)(float(NUM_TEST_SAMPLES) / float(SIMULATED_DICE_SIDES)));
    fclose(file);
}

template <size_t NUM_TEST_SAMPLES, size_t NUM_HISTOGRAM_BUCKETS, typename PDF_LAMBDA>
void Test (const char* fileName, float maxPDFValue, const PDF_LAMBDA& PDF)
{
    // seed the random number generator
    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);

    // generate the histogram
    std::array<size_t, NUM_HISTOGRAM_BUCKETS> histogram = { 0 };
    std::array<size_t, NUM_HISTOGRAM_BUCKETS> failedTestCounts = { 0 };
    size_t rejectedSamples = 0;
    for (size_t i = 0; i < NUM_TEST_SAMPLES; ++i)
    {
        // Generate a sample from the PDF by generating a random 2d point.
        // If the y axis of the value is <= the value returned by PDF(x), accept it, else reject it.
        // NOTE: this takes an unknown number of iterations, and technically may NEVER finish.
        float pointX = 0.0f;
        float pointY = 0.0f;
        bool validPoint = false;
        while (!validPoint)
        {
            pointX = dist(rng);
            pointY = dist(rng) * maxPDFValue;
            float pdfValue = PDF(pointX);
            validPoint = (pointY <= pdfValue);

            // track number of failed tests per histogram bucket
            if (!validPoint)
            {
                size_t bin = (size_t)std::floor(pointX * float(NUM_HISTOGRAM_BUCKETS));
                failedTestCounts[std::min(bin, NUM_HISTOGRAM_BUCKETS - 1)]++;
                ++rejectedSamples;
            }
        }

        // increment the correct bin in the histogram
        size_t bin = (size_t)std::floor(pointX * float(NUM_HISTOGRAM_BUCKETS));
        histogram[std::min(bin, NUM_HISTOGRAM_BUCKETS - 1)]++;
    }

    // write the histogram and pdf sample to a csv
    FILE *file = fopen(fileName, "w");
    if (!file)
        return;
    fprintf(file, "PDF, Simulated PDF, Generating Function, Failed Tests, %0.2f samples needed per value on average.\n", (float(NUM_TEST_SAMPLES) + float(rejectedSamples)) / float(NUM_TEST_SAMPLES));
    for (size_t i = 0; i < NUM_HISTOGRAM_BUCKETS; ++i)
    {
        float x = (float(i) + 0.5f) / float(NUM_HISTOGRAM_BUCKETS);
        float pdfSample = PDF(x);
        fprintf(file, "%f,%f,%f,%f\n",
            pdfSample,
            NUM_HISTOGRAM_BUCKETS * float(histogram[i]) / float(NUM_TEST_SAMPLES),
            maxPDFValue,
            float(failedTestCounts[i])
        );
    }
    fclose(file);
}

template <size_t NUM_TEST_SAMPLES, size_t NUM_HISTOGRAM_BUCKETS, typename PDF_LAMBDA>
void TestNotPDF (const char* fileName, float maxPDFValue, float normalizationConstant, const PDF_LAMBDA& PDF)
{
    // seed the random number generator
    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);

    // generate the histogram
    std::array<size_t, NUM_HISTOGRAM_BUCKETS> histogram = { 0 };
    std::array<size_t, NUM_HISTOGRAM_BUCKETS> failedTestCounts = { 0 };
    size_t rejectedSamples = 0;
    for (size_t i = 0; i < NUM_TEST_SAMPLES; ++i)
    {
        // Generate a sample from the PDF by generating a random 2d point.
        // If the y axis of the value is <= the value returned by PDF(x), accept it, else reject it.
        // NOTE: this takes an unknown number of iterations, and technically may NEVER finish.
        float pointX = 0.0f;
        float pointY = 0.0f;
        bool validPoint = false;
        while (!validPoint)
        {
            pointX = dist(rng);
            pointY = dist(rng) * maxPDFValue;
            float pdfValue = PDF(pointX);
            validPoint = (pointY <= pdfValue);

            // track number of failed tests per histogram bucket
            if (!validPoint)
            {
                size_t bin = (size_t)std::floor(pointX * float(NUM_HISTOGRAM_BUCKETS));
                failedTestCounts[std::min(bin, NUM_HISTOGRAM_BUCKETS - 1)]++;
                ++rejectedSamples;
            }
        }

        // increment the correct bin in the histogram
        size_t bin = (size_t)std::floor(pointX * float(NUM_HISTOGRAM_BUCKETS));
        histogram[std::min(bin, NUM_HISTOGRAM_BUCKETS - 1)]++;
    }

    // write the histogram and pdf sample to a csv
    FILE *file = fopen(fileName, "w");
    if (!file)
        return;
    fprintf(file, "Function, Simulated PDF, Scaled Simulated PDF, Generating Function, Failed Tests, %0.2f samples needed per value on average.\n", (float(NUM_TEST_SAMPLES) + float(rejectedSamples)) / float(NUM_TEST_SAMPLES));
    for (size_t i = 0; i < NUM_HISTOGRAM_BUCKETS; ++i)
    {
        float x = (float(i) + 0.5f) / float(NUM_HISTOGRAM_BUCKETS);
        float pdfSample = PDF(x);
        fprintf(file, "%f,%f,%f,%f,%f\n",
            pdfSample,
            NUM_HISTOGRAM_BUCKETS * float(histogram[i]) / float(NUM_TEST_SAMPLES),
            NUM_HISTOGRAM_BUCKETS * float(histogram[i]) / float(NUM_TEST_SAMPLES) * normalizationConstant,
            maxPDFValue,
            float(failedTestCounts[i])
        );
    }
    fclose(file);
}

template <size_t NUM_TEST_SAMPLES, size_t NUM_HISTOGRAM_BUCKETS, typename PDF_F_LAMBDA, typename PDF_G_LAMBDA, typename INVERSE_CDF_G_LAMBDA>
void TestPDFToPDF (const char* fileName, const PDF_F_LAMBDA& PDF_F, const PDF_G_LAMBDA& PDF_G, float M, const INVERSE_CDF_G_LAMBDA& Inverse_CDF_G, float rngRange)
{
    // We generate a sample from PDF F by generating a sample from PDF G, and accepting it with probability PDF_F(x)/(M*PDF_G(x))

    // seed the random number generator
    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);

    // generate the histogram
    std::array<size_t, NUM_HISTOGRAM_BUCKETS> histogram = { 0 };
    std::array<size_t, NUM_HISTOGRAM_BUCKETS> failedTestCounts = { 0 };
    size_t rejectedSamples = 0;
    for (size_t i = 0; i < NUM_TEST_SAMPLES; ++i)
    {
        // generate random points until we have one that's accepted
        // NOTE: this takes an unknown number of iterations, and technically may NEVER finish.
        float sampleG = 0.0f;
        bool validPoint = false;
        while (!validPoint)
        {
            // Generate a sample from the source PDF G
            sampleG = Inverse_CDF_G(dist(rng));

            // calculate the ratio of how likely we are to accept this sample
            float acceptChance = PDF_F(sampleG) / (M * PDF_G(sampleG));

            // see if we should accept it
            validPoint = dist(rng) <= acceptChance;

            // track number of failed tests per histogram bucket
            if (!validPoint)
            {
                size_t bin = (size_t)std::floor(sampleG * float(NUM_HISTOGRAM_BUCKETS) / rngRange);
                failedTestCounts[std::min(bin, NUM_HISTOGRAM_BUCKETS - 1)]++;
                ++rejectedSamples;
            }
        }

        // increment the correct bin in the histogram
        size_t bin = (size_t)std::floor(sampleG * float(NUM_HISTOGRAM_BUCKETS) / rngRange);
        histogram[std::min(bin, NUM_HISTOGRAM_BUCKETS - 1)]++;
    }

    // write the histogram and pdf sample to a csv
    FILE *file = fopen(fileName, "w");
    if (!file)
        return;
    fprintf(file, "PDF F,PDF G,Scaled PDF G,Simulated PDF,Failed Tests,%0.2f samples needed per value on average.\n", (float(NUM_TEST_SAMPLES) + float(rejectedSamples)) / float(NUM_TEST_SAMPLES));
    for (size_t i = 0; i < NUM_HISTOGRAM_BUCKETS; ++i)
    {
        float x = (float(i) + 0.5f) * rngRange / float(NUM_HISTOGRAM_BUCKETS);

        fprintf(file, "%f,%f,%f,%f,%f\n",
            PDF_F(x),
            PDF_G(x),
            PDF_G(x)*M,
            NUM_HISTOGRAM_BUCKETS * float(histogram[i]) / (float(NUM_TEST_SAMPLES)*rngRange),
            float(failedTestCounts[i])
        );
    }
    fclose(file);
}

int main(int argc, char **argv)
{
    // Dice
    {
        // Simulate a 5 sided dice with a 6 sided dice
        TestDice<10000, 5, 6>("test1_5_6.csv");

        // Simulate a 5 sided dice with a 20 sided dice
        TestDice<10000, 5, 20>("test1_5_20.csv");
    }

    // PDF y=2x, simulated with a uniform distribution
    {
        auto PDF = [](float x) { return 2.0f * x; };

        Test<1000, 100>("test2_1k.csv", 2.0f, PDF);
        Test<100000, 100>("test2_100k.csv", 2.0f, PDF);
        Test<1000000, 100>("test2_1m.csv", 2.0f, PDF);
    }

    // PDF y=(x^3-10x^2+5x+11)/10.417, simulated with a uniform distribution
    {
        auto PDF = [](float x) {return (x*x*x - 10.0f*x*x + 5.0f*x + 11.0f) / (10.417f); };
        Test<10000000, 100>("test3_10m_1_15.csv", 1.15f, PDF);
        Test<10000000, 100>("test3_10m_1_5.csv", 1.5f, PDF);
        Test<10000000, 100>("test3_10m_2_8.csv", 2.8f, PDF);
    }

    // function (not PDF, Doesn't integrate to 1!) y=(x^3-10x^2+5x+11), simulated with a scaled up uniform distribution
    {
        auto PDF = [](float x) {return (x*x*x - 10.0f*x*x + 5.0f*x + 11.0f); };
        TestNotPDF<10000000, 100>("test4_10m_12_5.csv", 12.5f, 10.417f, PDF);
    }

    // Generate samples from PDF F using samples from PDF G.  random numbers are from 0 to 30.
    // F PDF = gaussian distribution, mean 15, std dev of 5.  Truncated to +/- 3 stddeviations.
    // G PDF = x*0.002222
    // G CDF = 0.001111 * x^2
    // G inverted CDF = (1000 * sqrt(x)) / sqrt(1111)
    // M = 3
    {
        // gaussian PDF F
        const float mean = 15.0f;
        const float stddev = 5.0f;
        const float pi = 3.14159265358979f;  // portable replacement for MSVC's non-standard std::_Pi
        auto PDF_F = [=] (float x) -> float
        {
            return (1.0f / (stddev * std::sqrt(2.0f * pi))) * std::exp(-0.5f * std::pow((x - mean) / stddev, 2.0f));
        };

        // PDF G
        auto PDF_G = [](float x) -> float
        {
            return x * 0.002222f;
        };

        // Inverse CDF of G
        auto Inverse_CDF_G = [] (float x) -> float
        {
            return 1000.0f * std::sqrt(x) / std::sqrt(1111.0f);
        };

        TestPDFToPDF<20000, 100>("test5.csv", PDF_F, PDF_G, 3.0f, Inverse_CDF_G, 30.0f);
    }

    return 0;
}
```


# Generating Random Numbers From a Specific Distribution By Inverting the CDF

The last post talked about the normal distribution and showed how to generate random numbers from that distribution by generating regular (uniform) random numbers and then counting the bits.

What would you do if you wanted to generate random numbers from a different, arbitrary distribution, though? For instance, let’s say the distribution is defined by a function.

It turns out that in general this is a hard problem, but in practice there are a few ways to approach it. Below are the most common techniques for achieving this that I’ve seen.

• Inverting the CDF (analytically or numerically)
• Rejection Sampling
• Markov Chain Monte Carlo
• Ziggurat algorithm

This post talks about the first one listed: Inverting the CDF.

# What Is A CDF?

The last post briefly explained that a PDF is a probability density function and that it describes the relative probability of numbers being chosen at random. A PDF is required to be non-negative everywhere, and the area under its curve must be 1.

It needs to be non-negative everywhere because a negative probability doesn’t make any sense. It needs to have an area under the curve of 1 because that means it represents the full 100% probability of all possible outcomes.

CDF stands for “Cumulative distribution function” and is related to the PDF.

A PDF is a function y=f(x) where y is the relative likelihood (the probability density) of the number x being chosen at random from the distribution.

A CDF is a function y=f(x) where y is the probability of the number x, or any lower number, being chosen at random from that distribution.

You get a CDF from a PDF by integrating the PDF. From there you make sure that the CDF has a starting y value of 0, and an ending value of 1. You might have to do a bias (addition or subtraction) and/or scale (multiplication or division) to make that happen.

# Why Invert the CDF? (And Not the PDF?)

With both a PDF and a CDF, you plug in a number, and you get information about probabilities relating to that number.

To get a random number from a specific distribution, we want to do the opposite. We want to plug in a probability and get out the number corresponding to that probability.

Basically, we want to flip x and y in the equation and solve for y, so that we have a function that does this. That is what we have to do to invert the CDF.

Why invert the CDF, though, and not the PDF? Check out the images below from Wikipedia. The first shows some Gaussian PDFs and the second shows the same distributions as CDFs:

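The issue is that if we flip x and y in a PDF, there could be multiple y values corresponding to the same x. This isn’t true of a CDF: because it accumulates non-negative probability, it never decreases as x increases, so wherever the PDF is non-zero, each probability maps back to a single number.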

Let’s work through sampling some PDFs by inverting the CDF.

# Example 0: y=1

This is the easiest case and represents uniform random numbers, where every number is equally likely to be chosen.

Our PDF equation is: $y=1$ where $x \in [0,1]$. The graph looks like this:

If we integrate the pdf to get the cdf, we get $y=x$ where $x \in [0,1]$ which looks like this:

Now, to invert the cdf, we flip x and y, and then solve for y again. It’s trivially easy…

$$
\begin{aligned}
y &= x && \Leftarrow \text{CDF}\\
x &= y && \Leftarrow \text{Flip x and y}\\
y &= x && \Leftarrow \text{Solve for y again}
\end{aligned}
$$

Now that we have our inverted CDF, which is $y=x$, we can generate uniform random numbers, plug them into that equation as x and get y which is the actual value drawn from our PDF.

You can see that since we are plugging in numbers from a uniform distribution and not doing anything to them at all, the result is going to be uniformly distributed as well. So, we are in fact generating uniformly distributed random numbers using this inverted CDF, just like our PDF asked for.

This is so trivially simple it might be confusing. If so, don’t sweat it. Move onto the next example and you can come back to this later if you want to understand what I’m talking about here.

Note: The rest of the examples are going to have x in [0,1] as well but we are going to stop explicitly saying so. This process still works when x is in a different range of values, but for simplicity we’ll just have x be in [0,1] for the rest of the post.

# Example 1: y=2x

The next easiest case for a PDF is $y=2x$ which looks like this:

You might wonder why it’s $y=2x$ instead of $y=x$. This is because the area under the curve $y=x$ is 0.5. PDF’s need to have an area of 1, so I multiplied by 2 to make it have an area of 1.

What this PDF means is that small numbers are less likely to be picked than large numbers.

If we integrate the PDF $y=2x$ to get the CDF, we get $y=x^2$ which looks like this:

Now let’s flip x and y and solve for y again.

$$
\begin{aligned}
y &= x^2 && \Leftarrow \text{CDF}\\
x &= y^2 && \Leftarrow \text{Flip x and y}\\
y &= \sqrt{x} && \Leftarrow \text{Solve for y again}
\end{aligned}
$$

We now have our inverted CDF which is $y=\sqrt{x}$ and looks like this:

Now, if we plug uniformly random numbers into that formula as x, we should get as output samples that follow the probability of our PDF.

We can use a histogram to see if this is really true. We can generate some random numbers, square root them, and count how many are in each range of values.
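A sketch of that experiment might look like the following. It uses `std::mt19937` for the uniform numbers, and the names `SampleTwoX` and `HistogramOfTwoX` are hypothetical. Bucket i covers [i/B, (i+1)/B), so with the CDF $x^2$ it should receive a fraction $((i+1)^2 - i^2)/B^2$ of the samples.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <random>

// Draw from the PDF y = 2x on [0,1] by plugging a uniform number into the inverted CDF, sqrt(u).
float SampleTwoX(std::mt19937& rng)
{
    std::uniform_real_distribution<float> uniform01(0.0f, 1.0f);
    return std::sqrt(uniform01(rng));
}

// Count numSamples draws into NUM_BUCKETS equal-width buckets over [0,1).
template <std::size_t NUM_BUCKETS>
std::array<std::size_t, NUM_BUCKETS> HistogramOfTwoX(std::mt19937& rng, std::size_t numSamples)
{
    std::array<std::size_t, NUM_BUCKETS> counts{};
    for (std::size_t i = 0; i < numSamples; ++i)
    {
        std::size_t bucket = std::size_t(SampleTwoX(rng) * float(NUM_BUCKETS));
        if (bucket >= NUM_BUCKETS)
            bucket = NUM_BUCKETS - 1;  // a sample of exactly 1.0 goes in the last bucket
        ++counts[bucket];
    }
    return counts;
}
```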

Here is a histogram where I took 1,000 random numbers, square rooted them, and put their counts into 100 buckets. Bucket 1 counted how many numbers were in [0, 0.01), bucket 2 counted how many numbers were in [0.01, 0.02) and so on until bucket 100 which counted how many numbers were in [0.99, 1.0).

Increasing the number of samples to 100,000 it gets closer:

At 1,000,000 samples you can barely see a difference:

The reason it doesn’t match up at lower sample counts is just due to the nature of random numbers being random. It does match up, but you’ll have some variation with lower sample counts.

# Example 2: y=3x^2

Let’s check out the PDF $y=3x^2$. The area under that curve where x is in [0,1) is 1.0 and it’s non negative everywhere in that range too, so it’s a valid PDF.

Integrating that, we get $y=x^3$ for the CDF. Then we invert the CDF:

$$
\begin{aligned}
y &= x^3 && \Leftarrow \text{CDF}\\
x &= y^3 && \Leftarrow \text{Flip x and y}\\
y &= \sqrt[3]{x} && \Leftarrow \text{Solve for y again}
\end{aligned}
$$
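Here is a short check of why plugging uniform random numbers into the inverted CDF works for this example. Let $U$ be a uniform random number in [0,1] and $Y=\sqrt[3]{U}$ the output. Because the cube root never decreases, $Y \le y$ exactly when $U \le y^3$:

$$
P(Y \le y) = P\left(\sqrt[3]{U} \le y\right) = P\left(U \le y^3\right) = y^3, \qquad y \in [0,1]
$$

That is exactly the CDF we started from, and differentiating it gives back the PDF $3y^2$. The same argument covers the $y=2x$ example, with the square root in place of the cube root.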

And here is a 100,000 sample histogram vs the PDF to verify that we got the right answer:

# Example 3: Numeric Solution

So far we’ve been able to invert the CDF to get a nice easy function to transform uniform distribution random numbers into numbers from the distribution described by the PDF.

Sometimes though, inverting a CDF isn’t possible, or gives a complex equation that is costly to evaluate. In these cases, you can actually invert the CDF numerically via a lookup table.

A lookup table may also be desired in cases where, for example, you have a pixel shader that is drawing numbers from a PDF, and instead of making N shaders for N different PDFs, you want to unify them all into a single shader. Passing a lookup table via a constant buffer, or perhaps even via a texture, can be a decent solution here. (Note: if storing it in a texture, you may be interested in fitting the data with curves and using the technique from the post “GPU Texture Sampler Bezier Curve Evaluation” to store it in and recall it from the texture.)

Let’s invert a PDF numerically using a lookup table to see how that would work.

Our PDF will be:

$y=\frac{x^3-10x^2+5x+11}{10.417}$

It looks like this:

It’s non-negative in the range we care about, and it integrates to 1.0, or at least closely enough. The numerator integrates to exactly 125/12 ≈ 10.41667 over [0,1], so dividing by 10.417 gets the total area very close to 1.0, and using more digits would get it closer.

What we are going to do is evaluate that PDF at N evenly spaced points to get the probability density at each of those points. That will give us a lookup table for our PDF.

We are then going to make each entry be the sum of its own PDF sample and all the PDF samples to the left of it, which makes a lookup table for the CDF. We’ll also have to normalize the CDF table, since it’s likely that our PDF samples don’t add up (integrate) to 1.0. We do this by dividing every item in the CDF by the last entry in the CDF. After that, the last entry is exactly 1.0, so together with an implicit 0% entry before the first one, the table covers everything from 0% to 100% probability.
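The table-building step can be sketched on its own. This is a minimal sketch, assuming evenly spaced samples of x over [0,1] like the full code at the end of this post uses; `BuildCDFTable` is a hypothetical helper name, not something from the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: sample a PDF at `size` evenly spaced points in [0,1],
// turn the samples into a running sum, then normalize so the last entry is 1.0.
std::vector<float> BuildCDFTable(const std::function<float(float)>& pdf, std::size_t size)
{
    assert(size >= 2);  // need at least two points to span [0,1]
    std::vector<float> table(size);
    float runningSum = 0.0f;
    for (std::size_t i = 0; i < size; ++i)
    {
        float x = float(i) / float(size - 1);  // 0 .. 1 inclusive
        runningSum += pdf(x);                  // each entry includes itself and everything to its left
        table[i] = runningSum;
    }
    for (float& entry : table)
        entry /= runningSum;                   // normalize: the last entry becomes exactly 1.0
    return table;
}
```

With the PDF $y=2x$ and a table size of 5, the entries come out as 0, 0.1, 0.3, 0.6 and 1.0.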

Below are some histogram comparisons of the lookup table technique vs the actual PDF.

Here are 100 million samples (enough to keep random noise from obscuring the data), in 100 histogram buckets, with a lookup table size of 3, which is pretty low quality:

Increasing it to a lookup table of size 5 gives you this:

Here’s 10:

25:

And here’s 100:

So, not surprisingly, the size of the lookup table affects the quality of the results!

## Code

Here is the code I used to generate the data in this post, which I visualized with OpenOffice. I visualized the function graphs using Wolfram Alpha.

```cpp
#define _CRT_SECURE_NO_WARNINGS

#include <stdio.h>
#include <random>
#include <array>
#include <algorithm>
#include <cmath>

template <size_t NUM_TEST_SAMPLES, size_t NUM_HISTOGRAM_BUCKETS, typename PDF_LAMBDA, typename INVERSE_CDF_LAMBDA>
void Test (const char* fileName, const PDF_LAMBDA& PDF, const INVERSE_CDF_LAMBDA& inverseCDF)
{
    // seed the random number generator
    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);

    // generate the histogram
    std::array<size_t, NUM_HISTOGRAM_BUCKETS> histogram = { 0 };
    for (size_t i = 0; i < NUM_TEST_SAMPLES; ++i)
    {
        // put a uniform random number into the inverted CDF to sample the PDF
        float x = dist(rng);
        float y = inverseCDF(x);

        // increment the correct bin on the histogram
        size_t bin = (size_t)std::floor(y * float(NUM_HISTOGRAM_BUCKETS));
        histogram[std::min(bin, NUM_HISTOGRAM_BUCKETS - 1)]++;
    }

    // write the histogram and pdf sample to a csv
    FILE *file = fopen(fileName, "w+t");
    if (!file)
        return;
    fprintf(file, "PDF, Inverted CDF\n");
    for (size_t i = 0; i < NUM_HISTOGRAM_BUCKETS; ++i)
    {
        float x = (float(i) + 0.5f) / float(NUM_HISTOGRAM_BUCKETS);
        float pdfSample = PDF(x);
        fprintf(file, "%f,%f\n",
            pdfSample,
            NUM_HISTOGRAM_BUCKETS * float(histogram[i]) / float(NUM_TEST_SAMPLES)
        );
    }
    fclose(file);
}

template <size_t NUM_TEST_SAMPLES, size_t NUM_HISTOGRAM_BUCKETS, size_t LOOKUP_TABLE_SIZE, typename PDF_LAMBDA>
void TestPDFOnly (const char* fileName, const PDF_LAMBDA& PDF)
{
    // make the CDF lookup table by sampling the PDF
    // NOTE: we could integrate the buckets by averaging multiple samples instead of just the 1. This bucket integration is pretty low tech and low quality.
    std::array<float, LOOKUP_TABLE_SIZE> CDFLookupTable;
    float value = 0.0f;
    for (size_t i = 0; i < LOOKUP_TABLE_SIZE; ++i)
    {
        float x = float(i) / float(LOOKUP_TABLE_SIZE - 1); // The -1 is so we cover the full range from 0% to 100%
        value += PDF(x);
        CDFLookupTable[i] = value;
    }

    // normalize the CDF - make sure we span the probability range 0 to 1.
    for (float& f : CDFLookupTable)
        f /= value;

    // make our LUT based inverse CDF
    // We will binary search over the y's (which are sorted smallest to largest) looking for the x, which is implied by the index.
    // I'm sure there's a better & more clever lookup table setup for this situation but this should give you an idea of the technique
    auto inverseCDF = [&CDFLookupTable] (float y) {

        // there is an implicit entry of "0%" at index -1
        if (y < CDFLookupTable[0])
        {
            float t = y / CDFLookupTable[0];
            return t / float(LOOKUP_TABLE_SIZE);
        }

        // binary search for the first entry strictly greater than y.
        // upper_bound (not lower_bound) guarantees index >= 1 here, so index - 1 is always valid,
        // and the two entries we interpolate between are never equal.
        auto it = std::upper_bound(CDFLookupTable.begin(), CDFLookupTable.end(), y);

        // guard against y == 1.0, which would run off the end of the table
        if (it == CDFLookupTable.end())
            return 1.0f;

        // figure out where we are at in the table
        size_t index = it - CDFLookupTable.begin();

        // Linearly interpolate between the values
        // NOTE: could do other interpolation methods, like perhaps cubic (https://blog.demofox.org/2015/08/08/cubic-hermite-interpolation/)
        float t = (y - CDFLookupTable[index - 1]) / (CDFLookupTable[index] - CDFLookupTable[index - 1]);
        float fractionalIndex = float(index) + t;
        return fractionalIndex / float(LOOKUP_TABLE_SIZE);
    };

    // call the usual function to do the testing
    Test<NUM_TEST_SAMPLES, NUM_HISTOGRAM_BUCKETS>(fileName, PDF, inverseCDF);
}

int main (int argc, char **argv)
{
    // PDF: y=2x
    // inverse CDF: y=sqrt(x)
    {
        auto PDF = [] (float x) { return 2.0f * x; };
        auto inverseCDF = [] (float x) { return std::sqrt(x); };

        Test<1000, 100>("test1_1k.csv", PDF, inverseCDF);
        Test<100000, 100>("test1_100k.csv", PDF, inverseCDF);
        Test<1000000, 100>("test1_1m.csv", PDF, inverseCDF);
    }

    // PDF: y=3x^2
    // inverse CDF: y=cuberoot(x) aka y = pow(x, 1/3)
    {
        auto PDF = [] (float x) { return 3.0f * x * x; };
        auto inverseCDF = [](float x) { return std::pow(x, 1.0f / 3.0f); };

        Test<100000, 100>("test2_100k.csv", PDF, inverseCDF);
    }

    // PDF: y=(x^3-10x^2+5x+11)/10.417
    // Inverse CDF Numerically via a lookup table
    {
        auto PDF = [] (float x) {return (x*x*x - 10.0f*x*x + 5.0f*x + 11.0f) / (10.417f); };
        TestPDFOnly<100000000, 100, 3>("test3_100m_3.csv", PDF);
        TestPDFOnly<100000000, 100, 5>("test3_100m_5.csv", PDF);
        TestPDFOnly<100000000, 100, 10>("test3_100m_10.csv", PDF);
        TestPDFOnly<100000000, 100, 25>("test3_100m_25.csv", PDF);
        TestPDFOnly<100000000, 100, 100>("test3_100m_100.csv", PDF);
    }

    return 0;
}
```


# Counting Bits & The Normal Distribution

I recently saw some interesting posts on Twitter about the normal distribution:

I’m not really a statistics kind of guy, but knowing that probability distributions come up in graphics (like in PBR & path tracing), it seemed like a good time to upgrade my knowledge in this area while sharing an interesting technique for generating normal distribution random numbers.

## Basics

Below is an image showing a few normal (aka Gaussian) distributions (from Wikipedia).

Normal distributions are defined by these parameters:

• $\mu$ – “mu” is the mean. This is the average value of the distribution. This is where the center (peak) of the curve is on the x axis.
• $\sigma^2$ – “sigma squared” is the variance, and is just the standard deviation squared. I find standard deviation more intuitive to think about.
• $\sigma$ – “sigma” is the standard deviation, which (surprise surprise!) is the square root of the variance. This controls the “width” of the graph. The area under the curve is 1.0, so as you increase standard deviation and make the graph wider, it also gets shorter.

Here’s a diagram of standard deviations to help understand them (also from Wikipedia):

I find the standard deviation intuitive because 68.2% of the data is within one standard deviation from the mean (on the plus and minus side of the mean). 95.4% of the data is within two standard deviations of the mean.

Standard deviation is given in the same units as the data itself, so if a bell curve described scores on a test, with a mean of 80 and a standard deviation of 5, it means that 68.2% of the students got between 75 and 85 points on the test, and that 95.4% of the students got between 70 and 90 points on the test.

The normal distribution is what’s called a “probability density function” or pdf, which means that the y axis of the graph describes the relative likelihood (the probability density) of the number on the x axis being chosen at random.

This means that if you have a normal distribution with a specific mean and variance (standard deviation), numbers closer to the mean are more likely to be chosen randomly, while numbers farther away are less likely. The variance controls how quickly the probability drops off as you get farther away from the mean.

Thinking about standard deviation again, 68.2% of the random numbers generated will be within 1 standard deviation of the mean (+1 std dev or -1 std dev). 95.4% will be within 2 standard deviations.

## Generating Normal Distribution Random Numbers – Coin Flips

Generating uniform random numbers, where every number is as likely as every other number, is pretty simple. In the physical world, you can roll some dice or flip some coins. In the software world, you can use PRNGs.

How would you generate random numbers that follow a normal distribution though?

In C++, there is std::normal_distribution that can do this for you. There is also something called the Box-Muller transform that can turn uniformly distributed random numbers into normal distribution random numbers (info here: Generating Gaussian Random Numbers).

I want to talk about something else though and hopefully build some better intuition.

First let’s look at coin flips.

If you flip a fair coin a million times and keep a count of how many heads and tails you saw, you might get 500014 heads and 499986 tails (I got this with a PRNG – std::mt19937). That is a pretty uniform distribution of values in the range of [0,1]. (breadcrumb: Pascal’s triangle row 2 is 1,1)

Let’s flip two coins at a time though and add our values together (say that heads is 0 and tails is 1). Here’s what that graph looks like:

Out of 1 million flips, 250639 had no tails, 500308 had one tail, and 249053 had two tails. It might seem weird that they aren’t all even, but it makes more sense when you look at the outcomes of flipping two coins: we can get heads/heads (00), heads/tails (01), tails/heads (10) or tails/tails (11). Two of the four possibilities have exactly one tail, so getting one tail is twice as likely as getting no tails or two tails. (breadcrumb: Pascal’s triangle row 3 is 1,2,1)

What happens when we sum 3 coins? With a million flips I got 125113 0’s, 375763 1’s, 373905 2’s and 125219 3’s.

If you work out the possible combinations, there is 1 way to get 0, 3 ways to get 1, 3 ways to get 2 and 1 way to get 3. The counts above closely follow that 1:3:3:1 ratio. (breadcrumb: Pascal’s triangle row 4 is 1,3,3,1)
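Those counts are binomial coefficients, the rows of Pascal’s triangle from the breadcrumbs. Here is a minimal sketch that builds the counts for any number of coins by adding one coin at a time; `PascalRow` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the number of ways to get 0, 1, ..., numCoins tails
// when flipping numCoins fair coins (one row of Pascal's triangle).
std::vector<std::uint64_t> PascalRow(unsigned numCoins)
{
    std::vector<std::uint64_t> row{1};  // zero coins: one way to get zero tails
    for (unsigned coin = 0; coin < numCoins; ++coin)
    {
        // Adding a coin: every existing outcome either keeps its tail count (heads)
        // or gains one tail (tails).
        std::vector<std::uint64_t> next(row.size() + 1, 0);
        for (std::size_t k = 0; k < row.size(); ++k)
        {
            next[k] += row[k];      // new coin is heads
            next[k + 1] += row[k];  // new coin is tails
        }
        row = next;
    }
    return row;
}
```

Dividing each count by $2^n$ gives the expected fraction of results. For three coins that’s 1/8, 3/8, 3/8 and 1/8, or 12.5% and 37.5%, which lines up with the million-flip counts above.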

If we flip 100 coins and sum them, we get this:

That looks a bit like the normal distribution graphs at the beginning of this post doesn’t it?

Flipping and summing coins gives you something called the “Binomial Distribution”, and the interesting thing there is that the binomial distribution approaches the normal distribution the more coins you sum together. In the limit of an infinite number of coins, it becomes the normal distribution.

## Generating Normal Distribution Random Numbers – Dice Rolls

What if instead of flipping coins, we roll dice?

Well, rolling a 4 sided die a million times, you get each number roughly 25% of the time, as you’d expect: 250125 0’s, 250103 1’s, 249700 2’s, 250072 3’s.

If we sum two 4 sided dice rolls we get this:

If we sum three 4 sided dice rolls we get this:

And if we sum one hundred we get this, which sure looks like a normal distribution:

This isn’t limited to four sided dice though; here’s one hundred 6 sided dice being summed:

With dice, the counts of each face follow a “multinomial distribution” instead of a “binomial distribution”, and as the number of dice goes to infinity, their sum also approaches the normal distribution.

This means you can get a normal distribution with not only coins, but any sided dice in general.

An even stronger statement than that is the Central Limit Theorem, which says that if you take independent random numbers from pretty much ANY distribution (as long as it has a finite variance) and add enough of them together, the sum approaches a normal distribution.

Strange huh?

## Generating Normal Distribution Random Numbers – Counting Bits

Now comes a fun way of generating random numbers which follow a normal distribution. Are you ready for it?

Simply generate an N bit random number and return how many 1 bits are set.

That gives you a random number that (approximately) follows a normal distribution!
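Here is a minimal sketch of the bit counting idea, assuming a 64 bit random number from std::mt19937_64 and using std::bitset::count to count the 1 bits (std::popcount from `<bit>` is the C++20 alternative). `CountBitsRandom` is a hypothetical name.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper: returns the number of 1 bits in a 64 bit uniform random number.
// The result is in [0, 64] and is the sum of 64 fair coin flips (binomial),
// which approximates a normal distribution.
int CountBitsRandom(std::mt19937_64& rng)
{
    std::uint64_t number = rng();  // mt19937_64 produces 64 uniformly random bits
    return int(std::bitset<64>(number).count());
}
```

Since the result is 64 coin flips summed, it has a mean of 32 and a variance of 16 (a standard deviation of 4).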

One problem with this is that you have very low “resolution” random numbers. Counting the bits of a 64 bit random number for instance, you can only return 0 through 64 so there are only 65 possible random numbers.

That is a pretty big limitation, but if you need normal distribution numbers calculated quickly and don’t mind if they are low resolution (like in a pixel shader?), this technique could work well for you.

Another problem though is that you don’t have control over the variance or the mean of the distribution.

That isn’t a super huge deal though because you can easily convert numbers from one normal distribution into another normal distribution.

To do so, you get your normal distribution random number. First you subtract the mean of the distribution to make it centered on 0 (have a mean of 0). You then divide it by the standard deviation to make it be part of a distribution which has a standard deviation of 1.

At this point you have a random number from a normal distribution which has a mean of 0 and a standard deviation of 1.

Next, you multiply the number by the standard deviation of the distribution you want, and lastly you add the mean of the distribution you want.
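Those steps fit in one small function. This is a minimal sketch; `RemapNormal` is a hypothetical name, and it takes standard deviations, so pass the square root if you have variances.

```cpp
#include <cassert>

// Hypothetical helper: converts a sample from a normal distribution with
// (fromMean, fromStdDev) into a sample from one with (toMean, toStdDev).
float RemapNormal(float x, float fromMean, float fromStdDev, float toMean, float toStdDev)
{
    assert(fromStdDev > 0.0f);
    float standard = (x - fromMean) / fromStdDev;  // now mean 0, standard deviation 1
    return standard * toStdDev + toMean;           // now the desired mean and standard deviation
}
```

For the 64 bit count, fromMean is 32 and fromStdDev is 4, so a count of 36 maps to 7 in a distribution with mean 5 and standard deviation 2.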

That’s pretty simple (and is implemented in the source code at the bottom of this post), but to do this you need to know what standard deviation (variance) and mean you are starting with.

If you have some way to generate random integers in [0, N) and you are summing M of those numbers together, the mean is $M*(N-1)/2$. Note that if you are instead generating random integers in [1,N], the mean is $M*(N+1)/2$.

The variance in either case is $M*(N^2-1)/12$. The standard deviation is the square root of that.
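One way to sanity check those formulas is to enumerate every equally likely outcome for a small number of dice and compute the mean and variance directly. This is a minimal sketch, assuming M and N are small enough that all $N^M$ outcomes can be enumerated; the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical struct holding the mean and variance of a sum of dice.
struct SumStats { double mean; double variance; };

// Closed form for the sum of M independent uniform integers in [0, N).
SumStats ClosedFormStats(double M, double N)
{
    return { M * (N - 1.0) / 2.0, M * (N * N - 1.0) / 12.0 };
}

// Brute force: enumerate all N^M equally likely outcomes and measure the sum directly.
SumStats BruteForceStats(std::size_t M, std::size_t N)
{
    assert(N >= 1);
    std::size_t outcomes = 1;
    for (std::size_t i = 0; i < M; ++i)
        outcomes *= N;

    double sum = 0.0, sumSquares = 0.0;
    for (std::size_t o = 0; o < outcomes; ++o)
    {
        // decode outcome o as M base-N digits (one per roll) and add them up
        std::size_t value = o, total = 0;
        for (std::size_t i = 0; i < M; ++i)
        {
            total += value % N;
            value /= N;
        }
        sum += double(total);
        sumSquares += double(total) * double(total);
    }
    double mean = sum / double(outcomes);
    return { mean, sumSquares / double(outcomes) - mean * mean };  // Var = E[S^2] - E[S]^2
}
```

For two numbers in [0,4) both give a mean of 3 and a variance of 2.5, and for three numbers in [0,6) both give a mean of 7.5 and a variance of 8.75.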

Using that information you have everything you need to generate normal distribution random numbers of a specified mean and variance.

Thanks to @fahickman for the help on calculating mean and variance of dice roll sums.

## Code

Here is the source code I used to generate the data which was used to generate the graphs in this post. There is also an implementation of the bit counting algorithm I mentioned, which converts to the desired mean and variance.

```cpp
#define _CRT_SECURE_NO_WARNINGS

#include <array>
#include <cmath>
#include <random>
#include <stdint.h>
#include <stdio.h>
#include <limits>

const size_t c_maxNumSamples = 1000000;
const char* c_fileName = "results.csv";

template <size_t DiceRange, size_t DiceCount, size_t NumBuckets>
void DumpBucketCountsAddRandomNumbers (size_t numSamples, const std::array<size_t, NumBuckets>& bucketCounts)
{
    // open file for append if we can
    FILE* file = fopen(c_fileName, "a+t");
    if (!file)
        return;

    // write the info
    // summing M numbers in [0,N): mean = M*(N-1)/2, variance = M*(N^2-1)/12
    float mean = float(DiceCount) * float(DiceRange - 1) / 2.0f;
    float variance = float(DiceCount) * float(DiceRange * DiceRange - 1) / 12.0f;
    if (numSamples == 1)
    {
        fprintf(file, "\"%zu random numbers [0,%zu) added together (sum %zud%zu). %zu buckets.  Mean = %0.2f.  Variance = %0.2f.  StdDev = %0.2f.\"\n", DiceCount, DiceRange, DiceCount, DiceRange, NumBuckets, mean, variance, std::sqrt(variance));
        fprintf(file, "\"\"");
        for (size_t i = 0; i < NumBuckets; ++i)
            fprintf(file, ",\"%zu\"", i);
        fprintf(file, "\n");
    }
    fprintf(file, "\"%zu samples\",", numSamples);

    // report the samples
    for (size_t count : bucketCounts)
        fprintf(file, "\"%zu\",", count);

    fprintf(file, "\"\"\n");
    if (numSamples == c_maxNumSamples)
        fprintf(file, "\n");

    // close file
    fclose(file);
}

template <size_t DiceSides, size_t DiceCount>
void AddRandomNumbersTest ()
{
    std::mt19937 rng;
    rng.seed(std::random_device()());
    std::uniform_int_distribution<size_t> dist(size_t(0), DiceSides - 1);

    std::array<size_t, (DiceSides - 1) * DiceCount + 1> bucketCounts = { 0 };

    size_t nextDump = 1;
    for (size_t i = 0; i < c_maxNumSamples; ++i)
    {
        size_t sum = 0;
        for (size_t j = 0; j < DiceCount; ++j)
            sum += dist(rng);

        bucketCounts[sum]++;

        // dump the histogram at 1, 10, 100, ... samples
        if (i + 1 == nextDump)
        {
            DumpBucketCountsAddRandomNumbers<DiceSides, DiceCount>(nextDump, bucketCounts);
            nextDump *= 10;
        }
    }
}

template <size_t NumBuckets>
void DumpBucketCountsCountBits (size_t numSamples, const std::array<size_t, NumBuckets>& bucketCounts)
{
    // open file for append if we can
    FILE* file = fopen(c_fileName, "a+t");
    if (!file)
        return;

    // write the info
    float mean = float(NumBuckets-1) * 1.0f / 2.0f;
    float variance = float(NumBuckets-1) * 3.0f / 12.0f;
    if (numSamples == 1)
    {
        fprintf(file, "\"%zu random bits (coin flips) added together. %zu buckets.  Mean = %0.2f.  Variance = %0.2f.  StdDev = %0.2f.\"\n", NumBuckets - 1, NumBuckets, mean, variance, std::sqrt(variance));
        fprintf(file, "\"\"");
        for (size_t i = 0; i < NumBuckets; ++i)
            fprintf(file, ",\"%zu\"", i);
        fprintf(file, "\n");
    }
    fprintf(file, "\"%zu samples\",", numSamples);

    // report the samples
    for (size_t count : bucketCounts)
        fprintf(file, "\"%zu\",", count);

    fprintf(file, "\"\"\n");
    if (numSamples == c_maxNumSamples)
        fprintf(file, "\n");

    // close file
    fclose(file);
}

template <size_t NumBits> // aka NumCoinFlips!
void CountBitsTest ()
{
    // all NumBits low bits set (uint64_t so that 64 bits works on any platform)
    uint64_t maxValue = 0;
    for (size_t i = 0; i < NumBits; ++i)
        maxValue = (maxValue << 1) | 1;

    std::mt19937 rng;
    rng.seed(std::random_device()());
    std::uniform_int_distribution<uint64_t> dist(0, maxValue);

    std::array<size_t, NumBits + 1> bucketCounts = { 0 };

    size_t nextDump = 1;
    for (size_t i = 0; i < c_maxNumSamples; ++i)
    {
        size_t sum = 0;
        uint64_t number = dist(rng);
        while (number)
        {
            if (number & 1)
                ++sum;
            number = number >> 1;
        }

        bucketCounts[sum]++;

        if (i + 1 == nextDump)
        {
            DumpBucketCountsCountBits(nextDump, bucketCounts);
            nextDump *= 10;
        }
    }
}

float GenerateNormalRandomNumber (float mean, float variance)
{
    static std::mt19937 rng;
    static std::uniform_int_distribution<uint64_t> dist(0, (uint64_t)-1);

    static bool seeded = false;
    if (!seeded)
    {
        seeded = true;
        rng.seed(std::random_device()());
    }

    // generate our normal distributed random number from 0 to 64 (65 possible values)
    float sum = 0.0f;
    uint64_t number = dist(rng);
    while (number)
    {
        if (number & 1)
            sum += 1.0f;
        number = number >> 1;
    }

    // convert from: mean 32, variance 16, stddev 4
    // to: mean 0, variance 1, stddev 1
    float ret = sum;
    ret -= 32.0f;
    ret /= 4.0f;

    // convert to the specified mean and variance
    ret *= std::sqrt(variance);
    ret += mean;
    return ret;
}

void VerifyGenerateNormalRandomNumber (float mean, float variance)
{
    // open file for append if we can
    FILE* file = fopen(c_fileName, "a+t");
    if (!file)
        return;

    // write info
    fprintf(file, "\"Normal Distributed Random Numbers. mean = %0.2f.  variance = %0.2f.  stddev = %0.2f\"\n", mean, variance, std::sqrt(variance));

    // write some random numbers
    fprintf(file, "\"100 numbers\"");
    for (size_t i = 0; i < 100; ++i)
        fprintf(file, ",\"%f\"", GenerateNormalRandomNumber(mean, variance));
    fprintf(file, "\n\n");

    // close file
    fclose(file);
}

int main (int argc, char **argv)
{
    // clear out the file
    FILE* file = fopen(c_fileName, "w+t");
    if (file)
        fclose(file);

    // coin flips
    {
        // flip a fair coin
        AddRandomNumbersTest<2, 1>();

        // flip two coins and sum them
        AddRandomNumbersTest<2, 2>();

        // sum 3 coin flips
        AddRandomNumbersTest<2, 3>();

        // sum 100 coin flips
        AddRandomNumbersTest<2, 100>();
    }

    // dice rolls
    {
        // roll a 4 sided die
        AddRandomNumbersTest<4, 1>();

        // sum two 4 sided dice
        AddRandomNumbersTest<4, 2>();

        // sum three 4 sided dice
        AddRandomNumbersTest<4, 3>();

        // sum one hundred 4 sided dice
        AddRandomNumbersTest<4, 100>();

        // sum one hundred 6 sided dice
        AddRandomNumbersTest<6, 100>();
    }

    CountBitsTest<8>();
    CountBitsTest<16>();
    CountBitsTest<32>();
    CountBitsTest<64>();

    VerifyGenerateNormalRandomNumber(0.0f, 20.0f);

    VerifyGenerateNormalRandomNumber(0.0f, 10.0f);

    VerifyGenerateNormalRandomNumber(5.0f, 10.0f);

    return 0;
}
```


# When Random Numbers Are Too Random: Low Discrepancy Sequences

Random numbers can be useful in graphics and game development, but they have a pesky and sometimes undesirable habit of clumping together.

This is a problem in path tracing and Monte Carlo integration when you take N samples, but the samples aren’t well spread across the sampling range.

This can also be a problem for situations like when you are randomly placing objects in the world or generating treasure for a treasure chest. You don’t want your randomly placed trees to only be in one part of the forest, and you don’t want a player to get only trash items or only godly items when they open a treasure chest. Ideally you want to have some randomness, but you don’t want the random number generator to give you all of the same or similar random numbers.

The problem is that random numbers can be TOO random, like in the below where you can see clumps and large gaps between the 100 samples.

For cases like that, when you want random numbers that are a little bit more well distributed, you might find some use in low discrepancy sequences.

The standalone C++ code (one source file, standard headers, no libraries to link to) I used to generate the data and images are at the bottom of this post, as well as some links to more resources.

# What Is Discrepancy?

In this context, discrepancy is a measurement of the highest or lowest density of points in a sequence. High discrepancy means that there is either a large area of empty space, or that there is an area that has a high density of points. Low discrepancy means that there are neither, and that your points are more or less pretty evenly distributed.

The lowest discrepancy possible has no randomness at all, and in the 1 dimensional case means that the points are evenly distributed on a grid. For Monte Carlo integration and the game dev use cases I mentioned, we do want some randomness; we just want the random points to be spread out a little more evenly.

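If more formal math notation is your thing, discrepancy is defined as:

$D_{N}(P)=\sup _{{B\in J}}\left|{\frac {A(B;P)}{N}}-\lambda _{s}(B)\right|$

where $P$ is a set of $N$ points in the $s$ dimensional unit cube, $J$ is the set of axis aligned boxes inside that cube, $A(B;P)$ is the number of points of $P$ that fall inside box $B$, and $\lambda_s(B)$ is the volume of $B$. In words: it is the worst case difference, over all boxes, between the fraction of points inside a box and the fraction of space that box covers.

More info: Equidistributed sequence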

For Monte Carlo integration specifically, this is the behavior each thing gives you:

• High Discrepancy: Random Numbers / White Noise aka Uniform Distribution – At lower sample counts, convergence is slower (with higher variance) due to the possibility of not getting good coverage over the area you are integrating. At higher sample counts, this problem disappears. (Hint: real time graphics and preview renderings use a smaller number of samples)
• Lowest Discrepancy: Regular Grid – This will cause aliasing, unlike the other “random” based sampling, which trades aliasing for noise. Noise is generally preferred over aliasing.
• Low Discrepancy: Low Discrepancy Sequences – At lower sample counts, this converges faster by covering the sampling space better, but still uses randomness to get rid of aliasing by introducing noise.

Also interesting to note, Quasi-Monte Carlo (Monte Carlo integration using low discrepancy sequences) has provably better asymptotic convergence than regular Monte Carlo integration.

# 1 Dimensional Sequences

We’ll first look at 1 dimensional sequences.

## Grid

Here are 100 samples evenly spaced:

## Random Numbers (White Noise)

This is actually a high discrepancy sequence. To generate this, you just use a standard random number generator to pick 100 points between 0 and 1. I used std::mt19937 with a std::uniform_real_distribution from 0 to 1:

## Subrandom Numbers

Subrandom numbers are ways to decrease the discrepancy of white noise.

One way to do this is to break the sampling space in half. You then generate even numbered samples in the first half of the space, and odd numbered samples in the second half of the space.

There’s no reason you can’t generalize this into more divisions of space though.

This splits the space into 4 regions:

8 regions:

16 regions:

32 regions:

There are other ways to generate subrandom numbers though. One way is to generate random numbers between 0 and 0.5, and add each one, plus 0.5, to the last sample, wrapping the result back into [0,1) with a modulus of 1.0. This gives you a random walk type setup.

Here is that:

## Uniform Sampling + Jitter

If you take the first subrandom idea to the logical maximum, you break your sample space up into N sections and place one point within those N sections to make a low discrepancy sequence made up of N points.

Another way to look at this is that you do uniform sampling, but add some random jitter to the samples, between +/- half a uniform sample size, to keep the samples in their own areas.
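Here is a minimal sketch of uniform sampling plus jitter in 1D, assuming std::mt19937 for the randomness. Centering sample i at (i + 0.5)/N and jittering it by +/- half a cell is the same as (i + r)/N with r uniform in [0,1), which is the form the code uses; `JitteredSamples` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: split [0,1) into numSamples equal cells and place one
// uniformly random point inside each cell.
std::vector<float> JitteredSamples(std::size_t numSamples, std::mt19937& rng)
{
    std::uniform_real_distribution<float> jitter(0.0f, 1.0f);
    std::vector<float> samples(numSamples);
    for (std::size_t i = 0; i < numSamples; ++i)
        samples[i] = (float(i) + jitter(rng)) / float(numSamples);  // stays inside cell i
    return samples;
}
```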

This is that:

Interestingly, I have heard that Pixar invented this technique.

## Irrational Numbers

Rational numbers are numbers which can be described as fractions, such as 0.75 which can be expressed as 3/4. Irrational numbers are numbers which CANNOT be described as fractions, such as pi, or the golden ratio, or the square root of a prime number.

Interestingly you can use irrational numbers to generate low discrepancy sequences. You start with some value (could be 0, or could be a random number), add the irrational number, and modulus against 1.0. To get the next sample you add the irrational value again, and modulus against 1.0 again. Rinse and repeat until you get as many samples as you want.
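The add-and-wrap loop described above can be sketched as below. It uses double precision (an assumption, to slow down the accumulation of rounding error) and std::fmod for the modulus; `IrrationalSequence` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: additive recurrence x[n+1] = fract(x[n] + irrational).
// `start` is the starting value in [0,1), e.g. 0 or a random number.
std::vector<double> IrrationalSequence(std::size_t numSamples, double start, double irrational)
{
    assert(start >= 0.0 && start < 1.0);
    std::vector<double> samples(numSamples);
    double value = start;
    for (std::size_t i = 0; i < numSamples; ++i)
    {
        value = std::fmod(value + irrational, 1.0);  // add the irrational number, wrap into [0,1)
        samples[i] = value;
    }
    return samples;
}
```

With the golden ratio $(1+\sqrt{5})/2$ and a starting value of 0, the first three samples are approximately 0.618, 0.236 and 0.854.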

Some values work better than others though, and according to Wikipedia, the golden ratio (1.61803398875…) is provably the best choice.

Here is the golden ratio, using 4 different random (white noise) starting values:

Here I’ve used the square root of 2, with 4 different starting random numbers again:

Lastly, here is pi, with 4 random starting values:

## Van der Corput Sequence

The Van der Corput sequence is the 1d equivalent of the Halton sequence which we’ll talk about later.

How you generate values in the Van der Corput sequence is you convert the index of your sample into some base.

For instance if it was base 2, you would convert your index to binary. If it was base 16, you would convert your index to hexadecimal.

Now, instead of treating the digits as if they are $B^0$, $B^1$, $B^2$, etc (where B is the base), you instead treat them as $B^{-1}$, $B^{-2}$, $B^{-3}$ and so on. In other words, you multiply each digit by a fraction and add up the results.

To show a couple quick examples, let’s say we wanted sample 6 in the sequence of base 2.

First we convert 6 to binary which is 110. From right to left, we have 3 digits: a 0 in the 1’s place, a 1 in the 2’s place, and a 1 in the 4’s place. $0*1 + 1*2 + 1*4 = 6$, so we can see that 110 is in fact 6 in binary.

To get the Van der Corput value for this, instead of treating it as the 1’s, 2’s and 4’s digit, we treat it as the 1/2, 1/4 and 1/8’s digit.

$0 * 1/2 + 1 * 1/4 + 1 * 1/8 = 3/8$.

So, sample 6 in the Van der Corput sequence using base 2 is 3/8.

Let’s try sample 21 in base 3.

First we convert 21 to base 3 which is 210. We can verify this is right by seeing that $0 * 1 + 1 * 3 + 2 * 9 = 21$.

Instead of a 1’s, 3’s and 9’s digit, we are going to treat it like a 1/3, 1/9 and 1/27 digit.

$0 * 1/3 + 1 * 1/9 + 2 * 1/27 = 5/27$

So, sample 21 in the Van der Corput sequence using base 3 is 5/27.
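The digit mirroring fits in a short loop. This is a minimal sketch; `VanDerCorput` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: write `index` in `base`, then mirror its digits across the
// point, so the B^0 digit gets weight B^-1, the B^1 digit gets weight B^-2, and so on.
double VanDerCorput(std::uint64_t index, std::uint64_t base)
{
    assert(base >= 2);
    double result = 0.0;
    double digitWeight = 1.0 / double(base);  // weight of the first digit after the point
    while (index > 0)
    {
        std::uint64_t digit = index % base;   // lowest remaining digit of index
        result += double(digit) * digitWeight;
        index /= base;
        digitWeight /= double(base);
    }
    return result;
}
```

It reproduces both worked examples: VanDerCorput(6, 2) is 3/8 and VanDerCorput(21, 3) is 5/27. The Halton sequence later in this post uses the same kind of function with a different base per axis.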

Here is the Van der Corput sequence for base 2:

Here it is for base 3:

Base 4:

Base 5:

## Sobol

One dimensional Sobol is actually just the Van der Corput sequence base 2 re-arranged a little bit, but it’s generated differently.

You start with 0 (either using it as sample 0 or sample -1, doesn’t matter which), and for each sample you do this:

1. Calculate the Ruler function value for the current sample’s index (more info in a second)
2. Make the direction vector by shifting 1 left (in binary) $31 - \text{ruler}$ times.
3. XOR the last sample by the direction vector to get the new sample
4. To interpret the sample as a floating point number you divide it by $2^{32}$

That might sound completely different than the Van der Corput sequence but it actually is the same thing – just re-ordered.

In the final step when dividing by $2^{32}$, we are really just interpreting the binary number as a fraction just like before, but it’s the LEFT most digit that is the 1/2 spot, not the RIGHT most digit.

The Ruler Function goes like: 0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, …

It’s pretty easy to calculate too. The ruler function for an index (starting at 1) is just the zero based index of the rightmost 1 bit after converting the number to binary.

1 in binary is 001 so Ruler(1) is 0.
2 in binary is 010 so Ruler(2) is 1.
3 in binary is 011 so Ruler(3) is 0.
4 in binary is 100 so Ruler(4) is 2.
5 in binary is 101 so Ruler(5) is 0.
and so on.
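Putting the four steps and the ruler function together gives a short generator. This is a minimal sketch, using 0 as sample 0 and 32 bit integers; `Ruler` and `Sobol1D` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: zero based index of the rightmost 1 bit of index (index >= 1).
unsigned Ruler(std::uint32_t index)
{
    assert(index != 0);
    unsigned ruler = 0;
    while ((index & 1u) == 0)
    {
        index >>= 1;
        ++ruler;
    }
    return ruler;
}

// Hypothetical helper: 1D Sobol. Sample 0 is 0; sample i XORs the previous sample
// with the direction vector 1 << (31 - Ruler(i)), then divides by 2^32.
std::vector<double> Sobol1D(std::size_t numSamples)
{
    std::vector<double> samples;
    samples.reserve(numSamples);
    std::uint32_t sample = 0;
    for (std::uint32_t i = 0; i < numSamples; ++i)
    {
        if (i > 0)
            sample ^= std::uint32_t(1) << (31 - Ruler(i));  // XOR in the direction vector
        samples.push_back(double(sample) / 4294967296.0);   // interpret as a fraction: divide by 2^32
    }
    return samples;
}
```

The first eight values are 0, 0.5, 0.75, 0.25, 0.375, 0.875, 0.625 and 0.125: the same values as the first eight of base 2 Van der Corput (0, 0.5, 0.25, 0.75, 0.125, 0.625, 0.375, 0.875), just in a different order.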

Here is 1D Sobol:

## Hammersley

In one dimension, the Hammersley sequence is the same as the base 2 Van der Corput sequence, and in the same order. If it seems strange that they’re the same, that’s because Hammersley is a 2d sequence that I broke down into a 1d sequence for comparison. The one thing Hammersley has that makes it unique in the 1d case is that you can truncate bits.

Truncating bits doesn’t seem that useful for 1d Hammersley, but it’s still good to know about. Look at the 2d version of Hammersley for a fairer look at it, because it’s meant to be a 2d sequence.

Here is Hammersley:

With 1 bit truncated:

With 2 bits truncated:

## Poisson Disc

Poisson disc points are points which are densely packed, but have a minimum distance from each other.

Computer scientists are still working out good algorithms to generate these points efficiently.

I use “Mitchell’s Best-Candidate” which means that when you want to generate a new point in the sequence, you generate N new points, and choose whichever point is farthest away from the other points you’ve generated so far.
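Here is a minimal sketch of Mitchell’s best-candidate algorithm in 1D. It assumes plain (non-wrapping) distance on [0,1] and std::mt19937; `BestCandidate1D` is a hypothetical name. It costs O(numSamples² × numCandidates), which is fine for 100 points but slow for large counts.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Hypothetical helper: for each new point, generate numCandidates random candidates
// and keep the one whose nearest existing point is farthest away.
std::vector<float> BestCandidate1D(std::size_t numSamples, std::size_t numCandidates, std::mt19937& rng)
{
    assert(numCandidates > 0);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    std::vector<float> samples;
    for (std::size_t i = 0; i < numSamples; ++i)
    {
        float bestCandidate = 0.0f;
        float bestDistance = -1.0f;
        for (std::size_t c = 0; c < numCandidates; ++c)
        {
            float candidate = dist(rng);
            // distance from this candidate to its closest existing sample
            float closest = std::numeric_limits<float>::max();
            for (float s : samples)
                closest = std::fmin(closest, std::fabs(candidate - s));
            if (closest > bestDistance)
            {
                bestDistance = closest;
                bestCandidate = candidate;
            }
        }
        samples.push_back(bestCandidate);
    }
    return samples;
}
```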

Here it is where N is 100:

# 2 Dimensional Sequences

Next up, let’s look at some 2 dimensional sequences.

## Grid

Below is 2d uniform samples on a grid.

Note that uniform grid is not particularly low discrepancy for the 2d case! More info here: Is it expected that uniform points would have non zero discrepancy?

## Random

Here are 100 random points:

## Uniform Grid + Jitter

Here is a uniform grid that has random jitter applied to the points. Jittered grid is a pretty commonly used low discrepancy sampling technique that has good success.

## Subrandom

Just like in 1 dimension, you can apply the subrandom ideas to 2 dimensions, where you divide the X and Y axes into some number of sections and randomly choose points in the sections.

If you divide X and Y into the same number of sections though, you are going to have a problem: sample i always lands in the same section number on both axes, so some areas are not going to have any points in them.

@Reedbeta pointed out that instead of using i%x and i%y, you could use i%x and (i/x)%y to make it pick points in all regions.

Picking different numbers for X and Y can be another way to give good results. Here’s dividing X and Y into 2 and 3 sections respectively:

If you choose coprime numbers of divisions for each axis, the pattern has the maximal period (X*Y samples) before it repeats. 2 and 3 are coprime, so the last example is a good example of that, but here is 3 and 11:

Here is 3 and 97. 97 is large enough that with only doing 100 samples, we are almost doing jittered grid on the y axis.
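The (i/x)%y indexing suggested by @Reedbeta can be sketched like this, assuming std::mt19937 for the randomness inside each section; `Subrandom2D` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: divide X into divisionsX sections and Y into divisionsY sections.
// Sample i goes into section i % divisionsX on X and (i / divisionsX) % divisionsY on Y,
// then gets a random position inside that section.
std::vector<std::pair<float, float>> Subrandom2D(std::size_t numSamples, std::size_t divisionsX,
                                                 std::size_t divisionsY, std::mt19937& rng)
{
    assert(divisionsX > 0 && divisionsY > 0);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    std::vector<std::pair<float, float>> samples;
    for (std::size_t i = 0; i < numSamples; ++i)
    {
        std::size_t cellX = i % divisionsX;
        std::size_t cellY = (i / divisionsX) % divisionsY;
        float x = (float(cellX) + dist(rng)) / float(divisionsX);
        float y = (float(cellY) + dist(rng)) / float(divisionsY);
        samples.emplace_back(x, y);
    }
    return samples;
}
```

With 4 sections on each axis and 16 samples, each of the 16 sections gets exactly one point, whereas i%x and i%y would only ever fill the 4 diagonal sections.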

Here is the other subrandom number from 1d, where we start with a random value for X and Y, and then add a random number between 0 and 0.5 to each, also adding 0.5, to make a “random walk” type setup again:

## Halton

The Halton sequence is just the Van der Corput sequence, but using a different base on each axis.

Here is the Halton sequence where X and Y use bases 2 and 3:

Here it is using bases 5 and 7:

Here are bases 13 and 9:

## Irrational Numbers

The irrational numbers technique can be used for 2d as well but I wasn’t able to find out how to make it give decent looking output that didn’t have an obvious diagonal pattern in them. Bart Wronski shared a neat paper that explains how to use the golden ratio in 2d with great success: Golden Ratio Sequences For Low-Discrepancy Sampling

This uses the golden ratio for the X axis and the square root of 2 for the Y axis. Below that is the same, with a random starting point, to make it give a different sequence.

Here X axis uses square root of 2 and Y axis uses square root of 3. Below that is a random starting point, which gives the same discrepancy.

## Hammersley

In 2 dimensions, the Hammersley sequence uses the 1d Hammersley sequence for the X axis: Instead of treating the binary version of the index as binary, you treat it as fractions like you do for Van der Corput and sum up the fractions.

For the Y axis, you just reverse the bits and then do the same!
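Here is a minimal sketch of the 2d Hammersley point for index i out of $2^{numBits}$ samples, following the description above: the X axis reads the index bits as fractions (1/2, 1/4, …), and the Y axis reverses the bits first. The function names are hypothetical, and bit truncation is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper: read the low numBits bits as a binary fraction,
// with the lowest bit in the 1/2 place (base 2 Van der Corput).
double BitsToFraction(std::uint32_t bits, unsigned numBits)
{
    double result = 0.0, weight = 0.5;
    for (unsigned b = 0; b < numBits; ++b, weight *= 0.5)
        if (bits & (std::uint32_t(1) << b))
            result += weight;
    return result;
}

// Hypothetical helper: reverse the low numBits bits of value.
std::uint32_t ReverseBits(std::uint32_t value, unsigned numBits)
{
    std::uint32_t reversed = 0;
    for (unsigned b = 0; b < numBits; ++b)
        if (value & (std::uint32_t(1) << b))
            reversed |= std::uint32_t(1) << (numBits - 1 - b);
    return reversed;
}

// Hypothetical helper: 2d Hammersley point `index` out of 2^numBits.
std::pair<double, double> Hammersley2D(std::uint32_t index, unsigned numBits)
{
    assert(numBits >= 1 && numBits < 32 && index < (std::uint32_t(1) << numBits));
    double x = BitsToFraction(index, numBits);
    double y = BitsToFraction(ReverseBits(index, numBits), numBits);  // works out to index / 2^numBits
    return { x, y };
}
```

Reversing the bits and then reading them as fractions works out to index / 2^numBits, so point 6 out of 8 (numBits = 3) is (0.375, 0.75). For 100 samples you’d need numBits = 7, which is where the 128 comes from.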

Here is the Hammersley sequence. Note we would have to take 128 samples (not just the 100 we did) if we wanted it to fill the entire square with samples.

Truncating bits in 2d is a bit useful. Here is 1 bit truncated:

2 bits truncated:

## Poisson Disc

Using the same method we did for 1d, we can generate points in 2d space:

## N Rooks

There is a sampling pattern called N-Rooks where you put N rooks onto a chess board and arrange them such that no two are in the same row or column.

A way to generate these samples is to realize that there will be only one rook per row, and that none of them will ever be in the same column. So, you make an array that has numbers 0 to N-1, and then shuffle the array. The index into the array is the row, and the value in the array is the column.
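The sketch below performs that shuffle. It takes a caller-supplied seed for std::mt19937 rather than using std::random_device as TestRooks2D does, so results are reproducible with a given standard library. Each row gets exactly one rook, and because the columns are a permutation of 0 to N-1, each column does too. `NRooks` is a hypothetical name. It maps row and column to [0, 1] with the same value / (N - 1) scaling as TestRooks2D.

```cpp
#include <algorithm>  // std::shuffle
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>    // std::iota
#include <random>
#include <vector>

// N-Rooks: shuffle the columns 0..N-1; the index is the row, the value is the column.
std::vector<std::array<double, 2>> NRooks (std::size_t n, unsigned seed)
{
    assert(n >= 2);
    std::vector<std::size_t> columns(n);
    std::iota(columns.begin(), columns.end(), std::size_t(0));
    std::mt19937 rng(seed);
    std::shuffle(columns.begin(), columns.end(), rng);

    std::vector<std::array<double, 2>> points(n);
    for (std::size_t row = 0; row < n; ++row)
    {
        points[row][0] = double(columns[row]) / double(n - 1);  // x = column
        points[row][1] = double(row) / double(n - 1);           // y = row
    }
    return points;
}
```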

Here are 100 rooks:

## Sobol

Sobol in two dimensions is more complex to explain so I’ll link you to the source I used: Sobol Sequences Made Simple.

The 1D Sobol sequence already covered is used for the X axis, and something more complex is used for the Y axis:
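The sketch below follows the 2D construction used by TestSobol2D, under these assumptions:

- indices stay below 2^32;
- points are generated in gray code order, where each step XORs in one direction number picked by the ruler function of i + 1;
- the X axis uses the direction numbers 1 << (31 - k);
- the Y axis uses V[0] = 1 << 31 and V[k] = V[k-1] ^ (V[k-1] >> 1).

It uses std::uint32_t so the shifts have a fixed width. The names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// The "ruler" function: number of trailing zero bits of n (n must be nonzero).
std::size_t RulerU32 (std::uint32_t n)
{
    assert(n != 0);
    std::size_t count = 0;
    while ((n & 1u) == 0)
    {
        n >>= 1;
        ++count;
    }
    return count;
}

// First numSamples points of a 2D Sobol sequence, in gray code order.
std::vector<std::array<double, 2>> Sobol2D (std::size_t numSamples)
{
    assert(std::uint64_t(numSamples) <= 0xFFFFFFFFull);

    // direction numbers for each axis, one per bit
    std::array<std::uint32_t, 32> dirX{};
    std::array<std::uint32_t, 32> dirY{};
    for (std::size_t k = 0; k < 32; ++k)
        dirX[k] = std::uint32_t(1) << (31 - k);
    dirY[0] = std::uint32_t(1) << 31;
    for (std::size_t k = 1; k < 32; ++k)
        dirY[k] = dirY[k - 1] ^ (dirY[k - 1] >> 1);

    std::vector<std::array<double, 2>> points(numSamples);
    std::uint32_t x = 0;
    std::uint32_t y = 0;
    for (std::size_t i = 0; i < numSamples; ++i)
    {
        // flip the direction number picked by the ruler function of i + 1
        const std::size_t k = RulerU32(std::uint32_t(i + 1));
        x ^= dirX[k];
        y ^= dirY[k];
        points[i] = { double(x) / 4294967296.0, double(y) / 4294967296.0 };
    }
    return points;
}
```

The first four points come out as (0.5, 0.5), (0.75, 0.25), (0.25, 0.75) and (0.375, 0.375).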

## Links

Bart Wronski has a really great series on a related topic: Dithering in Games

Wikipedia: Low Discrepancy Sequence

Wikipedia: Halton Sequence

Wikipedia: Van der Corput Sequence

Using Fibonacci Sequence To Generate Colors

Deeper info and usage cases for low discrepancy sequences

Poisson-Disc Sampling

Low discrepancy sequences are related to blue noise. Where white noise contains all frequencies evenly, blue noise has more high frequencies and fewer low frequencies. Blue noise is essentially the ultimate in low discrepancy, but can be expensive to compute. Here are some pages on blue noise:

Free Blue Noise Textures

The problem with 3D blue noise

Stippling and Blue Noise

Vegetation placement in “The Witness”

Here are some links from @marc_b_reynolds:

Sobol (low-discrepancy) sequence in 1-3D, stratified in 2-4D.

Classic binary-reflected gray code.

Sobol.h

Weyl Sequence

## Code

#define _CRT_SECURE_NO_WARNINGS

#include <windows.h>  // for bitmap headers and performance counter.  Sorry non windows people!
#include <vector>
#include <stdint.h>
#include <random>
#include <array>
#include <algorithm>
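#include <stdlib.h>
#include <stdio.h>   // fopen, fwrite, printf, sprintf
#include <cmath>     // sqrt, pow, fmodf, ceil, log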
#include <set>

typedef uint8_t uint8;

#define NUM_SAMPLES 100  // to simplify some 2d code, this must be a square
#define NUM_SAMPLES_FOR_COLORING 100

// Turning this on will slow things down significantly because it's an O(N^5) operation for 2d!
#define CALCULATE_DISCREPANCY 0

#define IMAGE1D_WIDTH 600
#define IMAGE1D_HEIGHT 50
#define IMAGE2D_WIDTH 300
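#define IMAGE2D_HEIGHT 300
#define IMAGE_PAD 30  // empty border around the plotted area, in pixels

#define IMAGE1D_CENTERY ((IMAGE1D_HEIGHT + IMAGE_PAD * 2) / 2)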

#define AXIS_HEIGHT 40
#define DATA_HEIGHT 20
#define DATA_WIDTH 2

#define COLOR_FILL SColor(255,255,255)
#define COLOR_AXIS SColor(0, 0, 0)

//======================================================================================
struct SImageData
{
SImageData ()
: m_width(0)
, m_height(0)
, m_pitch(0)
{ }

size_t m_width;
size_t m_height;
size_t m_pitch;
std::vector<uint8> m_pixels;
};

struct SColor
{
SColor (uint8 _R = 0, uint8 _G = 0, uint8 _B = 0)
: R(_R), G(_G), B(_B)
{ }

uint8 B, G, R;
};

//======================================================================================
bool SaveImage (const char *fileName, const SImageData &image)
{
// open the file if we can
FILE *file;
file = fopen(fileName, "wb");
if (!file) {
printf("Could not save %s\n", fileName);
return false;
}

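// make the header info
BITMAPFILEHEADER header;
BITMAPINFOHEADER infoHeader;

header.bfType = 0x4D42;  // "BM"
header.bfReserved1 = 0;
header.bfReserved2 = 0;
header.bfOffBits = (DWORD)(sizeof(BITMAPFILEHEADER) + sizeof(BITMAPINFOHEADER));

infoHeader.biSize = (DWORD)sizeof(BITMAPINFOHEADER);
infoHeader.biWidth = (LONG)image.m_width;
infoHeader.biHeight = (LONG)image.m_height;
infoHeader.biPlanes = 1;
infoHeader.biBitCount = 24;
infoHeader.biCompression = BI_RGB;
infoHeader.biSizeImage = (DWORD)image.m_pixels.size();
infoHeader.biXPelsPerMeter = 0;
infoHeader.biYPelsPerMeter = 0;
infoHeader.biClrUsed = 0;
infoHeader.biClrImportant = 0;

header.bfSize = header.bfOffBits + infoHeader.biSizeImage;

// write the data and close the file
fwrite(&header, sizeof(header), 1, file);
fwrite(&infoHeader, sizeof(infoHeader), 1, file);
fwrite(&image.m_pixels[0], infoHeader.biSizeImage, 1, file);
fclose(file);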

return true;
}

//======================================================================================
void ImageInit (SImageData& image, size_t width, size_t height)
{
image.m_width = width;
image.m_height = height;
image.m_pitch = 4 * ((width * 24 + 31) / 32);
image.m_pixels.resize(image.m_pitch * image.m_height);
std::fill(image.m_pixels.begin(), image.m_pixels.end(), 0);
}

//======================================================================================
void ImageClear (SImageData& image, const SColor& color)
{
uint8* row = &image.m_pixels[0];
for (size_t rowIndex = 0; rowIndex < image.m_height; ++rowIndex)
{
SColor* pixels = (SColor*)row;
std::fill(pixels, pixels + image.m_width, color);

row += image.m_pitch;
}
}

//======================================================================================
void ImageBox (SImageData& image, size_t x1, size_t x2, size_t y1, size_t y2, const SColor& color)
{
for (size_t y = y1; y < y2; ++y)
{
uint8* row = &image.m_pixels[y * image.m_pitch];
SColor* start = &((SColor*)row)[x1];
std::fill(start, start + x2 - x1, color);
}
}

//======================================================================================
float Distance (float x1, float y1, float x2, float y2)
{
float dx = (x2 - x1);
float dy = (y2 - y1);

return std::sqrtf(dx*dx + dy*dy);
}

//======================================================================================
SColor DataPointColor (size_t sampleIndex)
{
SColor ret;
float percent = (float(sampleIndex) / (float(NUM_SAMPLES_FOR_COLORING) - 1.0f));

ret.R = uint8((1.0f - percent) * 255.0f);
ret.G = 0;
ret.B = uint8(percent * 255.0f);

float mag = (float)sqrt(ret.R*ret.R + ret.G*ret.G + ret.B*ret.B);
ret.R = uint8((float(ret.R) / mag)*255.0f);
ret.G = uint8((float(ret.G) / mag)*255.0f);
ret.B = uint8((float(ret.B) / mag)*255.0f);

return ret;
}

//======================================================================================
float RandomFloat (float min, float max)
{
static std::random_device rd;
static std::mt19937 mt(rd());
std::uniform_real_distribution<float> dist(min, max);
return dist(mt);
}

//======================================================================================
size_t Ruler (size_t n)
{
size_t ret = 0;
while (n != 0 && (n & 1) == 0)
{
n /= 2;
++ret;
}
return ret;
}

//======================================================================================
float CalculateDiscrepancy1D (const std::array<float, NUM_SAMPLES>& samples)
{
// some info about calculating discrepancy
// https://math.stackexchange.com/questions/1681562/how-to-calculate-discrepancy-of-a-sequence

// Calculates the discrepancy of this data.
// Assumes the data is [0,1) for valid sample range
std::array<float, NUM_SAMPLES> sortedSamples = samples;
std::sort(sortedSamples.begin(), sortedSamples.end());

float maxDifference = 0.0f;
for (size_t startIndex = 0; startIndex <= NUM_SAMPLES; ++startIndex)
{
// startIndex 0 = 0.0f.  startIndex 1 = sortedSamples[0]. etc

float startValue = 0.0f;
if (startIndex > 0)
startValue = sortedSamples[startIndex - 1];

for (size_t stopIndex = startIndex; stopIndex <= NUM_SAMPLES; ++stopIndex)
{
// stopIndex 0 = sortedSamples[0].  stopIndex N = 1.0f. etc

float stopValue = 1.0f;
if (stopIndex < NUM_SAMPLES)
stopValue = sortedSamples[stopIndex];

float length = stopValue - startValue;

// open interval (startValue, stopValue)
size_t countInside = 0;
for (float sample : samples)
{
if (sample > startValue &&
sample < stopValue)
{
++countInside;
}
}
float density = float(countInside) / float(NUM_SAMPLES);
float difference = std::abs(density - length);
if (difference > maxDifference)
maxDifference = difference;

// closed interval [startValue, stopValue]
countInside = 0;
for (float sample : samples)
{
if (sample >= startValue &&
sample <= stopValue)
{
++countInside;
}
}
density = float(countInside) / float(NUM_SAMPLES);
difference = std::abs(density - length);
if (difference > maxDifference)
maxDifference = difference;
}
}
return maxDifference;
}

//======================================================================================
float CalculateDiscrepancy2D (const std::array<std::array<float, 2>, NUM_SAMPLES>& samples)
{
// some info about calculating discrepancy
// https://math.stackexchange.com/questions/1681562/how-to-calculate-discrepancy-of-a-sequence

// Calculates the discrepancy of this data.
// Assumes the data is [0,1) for valid sample range.

// Get the sorted list of unique values on each axis
std::set<float> setSamplesX;
std::set<float> setSamplesY;
for (const std::array<float, 2>& sample : samples)
{
setSamplesX.insert(sample[0]);
setSamplesY.insert(sample[1]);
}
std::vector<float> sortedXSamples;
std::vector<float> sortedYSamples;
sortedXSamples.reserve(setSamplesX.size());
sortedYSamples.reserve(setSamplesY.size());
for (float f : setSamplesX)
sortedXSamples.push_back(f);
for (float f : setSamplesY)
sortedYSamples.push_back(f);

// Get the sorted list of samples on the X axis, for faster interval testing
std::array<std::array<float, 2>, NUM_SAMPLES> sortedSamplesX = samples;
std::sort(sortedSamplesX.begin(), sortedSamplesX.end(),
[] (const std::array<float, 2>& itemA, const std::array<float, 2>& itemB)
{
return itemA[0] < itemB[0];
}
);

// calculate discrepancy
float maxDifference = 0.0f;
for (size_t startIndexY = 0; startIndexY <= sortedYSamples.size(); ++startIndexY)
{
float startValueY = 0.0f;
if (startIndexY > 0)
startValueY = *(sortedYSamples.begin() + startIndexY - 1);

for (size_t startIndexX = 0; startIndexX <= sortedXSamples.size(); ++startIndexX)
{
float startValueX = 0.0f;
if (startIndexX > 0)
startValueX = *(sortedXSamples.begin() + startIndexX - 1);

for (size_t stopIndexY = startIndexY; stopIndexY <= sortedYSamples.size(); ++stopIndexY)
{
float stopValueY = 1.0f;
if (stopIndexY < sortedYSamples.size())
stopValueY = sortedYSamples[stopIndexY];

for (size_t stopIndexX = startIndexX; stopIndexX <= sortedXSamples.size(); ++stopIndexX)
{
float stopValueX = 1.0f;
if (stopIndexX < sortedXSamples.size())
stopValueX = sortedXSamples[stopIndexX];

// calculate area
float length = stopValueX - startValueX;
float height = stopValueY - startValueY;
float area = length * height;

// open interval (startValue, stopValue)
size_t countInside = 0;
for (const std::array<float, 2>& sample : samples)
{
if (sample[0] > startValueX &&
sample[1] > startValueY &&
sample[0] < stopValueX &&
sample[1] < stopValueY)
{
++countInside;
}
}
float density = float(countInside) / float(NUM_SAMPLES);
float difference = std::abs(density - area);
if (difference > maxDifference)
maxDifference = difference;

// closed interval [startValue, stopValue]
countInside = 0;
for (const std::array<float, 2>& sample : samples)
{
if (sample[0] >= startValueX &&
sample[1] >= startValueY &&
sample[0] <= stopValueX &&
sample[1] <= stopValueY)
{
++countInside;
}
}
density = float(countInside) / float(NUM_SAMPLES);
difference = std::abs(density - area);
if (difference > maxDifference)
maxDifference = difference;
}
}
}
}

return maxDifference;
}

//======================================================================================
void Test1D (const char* fileName, const std::array<float, NUM_SAMPLES>& samples)
{
// create and clear the image
SImageData image;
ImageInit(image, IMAGE1D_WIDTH + IMAGE_PAD * 2, IMAGE1D_HEIGHT + IMAGE_PAD * 2);

// setup the canvas
ImageClear(image, COLOR_FILL);

// calculate the discrepancy
#if CALCULATE_DISCREPANCY
float discrepancy = CalculateDiscrepancy1D(samples);
printf("%s Discrepancy = %0.2f%%\n", fileName, discrepancy*100.0f);
#endif

// draw the sample points
size_t i = 0;
for (float f: samples)
{
size_t pos = size_t(f * float(IMAGE1D_WIDTH)) + IMAGE_PAD;
ImageBox(image, pos, pos + 1, IMAGE1D_CENTERY - DATA_HEIGHT / 2, IMAGE1D_CENTERY + DATA_HEIGHT / 2, DataPointColor(i));
++i;
}

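// draw the axes lines. horizontal first then the two vertical
ImageBox(image, IMAGE_PAD, IMAGE1D_WIDTH + IMAGE_PAD, IMAGE1D_CENTERY, IMAGE1D_CENTERY + 1, COLOR_AXIS);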
ImageBox(image, IMAGE_PAD, IMAGE_PAD + 1, IMAGE1D_CENTERY - AXIS_HEIGHT / 2, IMAGE1D_CENTERY + AXIS_HEIGHT / 2, COLOR_AXIS);
ImageBox(image, IMAGE1D_WIDTH + IMAGE_PAD, IMAGE1D_WIDTH + IMAGE_PAD + 1, IMAGE1D_CENTERY - AXIS_HEIGHT / 2, IMAGE1D_CENTERY + AXIS_HEIGHT / 2, COLOR_AXIS);

// save the image
SaveImage(fileName, image);
}

//======================================================================================
void Test2D (const char* fileName, const std::array<std::array<float,2>, NUM_SAMPLES>& samples)
{
// create and clear the image
SImageData image;
ImageInit(image, IMAGE2D_WIDTH + IMAGE_PAD * 2, IMAGE2D_HEIGHT + IMAGE_PAD * 2);

// setup the canvas
ImageClear(image, COLOR_FILL);

// calculate the discrepancy
#if CALCULATE_DISCREPANCY
float discrepancy = CalculateDiscrepancy2D(samples);
printf("%s Discrepancy = %0.2f%%\n", fileName, discrepancy*100.0f);
#endif

// draw the sample points
size_t i = 0;
for (const std::array<float, 2>& sample : samples)
{
size_t posx = size_t(sample[0] * float(IMAGE2D_WIDTH)) + IMAGE_PAD;
size_t posy = size_t(sample[1] * float(IMAGE2D_HEIGHT)) + IMAGE_PAD;
ImageBox(image, posx - 1, posx + 1, posy - 1, posy + 1, DataPointColor(i));
++i;
}

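// horizontal lines
ImageBox(image, IMAGE_PAD - 1, IMAGE2D_WIDTH + IMAGE_PAD + 1, IMAGE_PAD - 1, IMAGE_PAD, COLOR_AXIS);
ImageBox(image, IMAGE_PAD - 1, IMAGE2D_WIDTH + IMAGE_PAD + 1, IMAGE2D_HEIGHT + IMAGE_PAD, IMAGE2D_HEIGHT + IMAGE_PAD + 1, COLOR_AXIS);

// vertical lines
ImageBox(image, IMAGE_PAD - 1, IMAGE_PAD, IMAGE_PAD - 1, IMAGE2D_HEIGHT + IMAGE_PAD + 1, COLOR_AXIS);
ImageBox(image, IMAGE2D_WIDTH + IMAGE_PAD, IMAGE2D_WIDTH + IMAGE_PAD + 1, IMAGE_PAD - 1, IMAGE2D_HEIGHT + IMAGE_PAD + 1, COLOR_AXIS);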

// save the image
SaveImage(fileName, image);
}

//======================================================================================
void TestUniform1D (bool jitter)
{
// calculate the sample points
const float c_cellSize = 1.0f / float(NUM_SAMPLES+1);
std::array<float, NUM_SAMPLES> samples;
for (size_t i = 0; i < NUM_SAMPLES; ++i)
{
samples[i] = float(i+1) / float(NUM_SAMPLES+1);
if (jitter)
samples[i] += RandomFloat(-c_cellSize*0.5f, c_cellSize*0.5f);
}

// save bitmap etc
if (jitter)
Test1D("1DUniformJitter.bmp", samples);
else
Test1D("1DUniform.bmp", samples);
}

//======================================================================================
void TestUniformRandom1D ()
{
// calculate the sample points
const float c_halfJitter = 1.0f / float((NUM_SAMPLES + 1) * 2);
std::array<float, NUM_SAMPLES> samples;
for (size_t i = 0; i < NUM_SAMPLES; ++i)
samples[i] = RandomFloat(0.0f, 1.0f);

// save bitmap etc
Test1D("1DUniformRandom.bmp", samples);
}

//======================================================================================
void TestSubRandomA1D (size_t numRegions)
{
const float c_randomRange = 1.0f / float(numRegions);

// calculate the sample points
const float c_halfJitter = 1.0f / float((NUM_SAMPLES + 1) * 2);
std::array<float, NUM_SAMPLES> samples;
for (size_t i = 0; i < NUM_SAMPLES; ++i)
{
samples[i] = RandomFloat(0.0f, c_randomRange);
samples[i] += float(i % numRegions) / float(numRegions);
}

// save bitmap etc
char fileName[256];
sprintf(fileName, "1DSubRandomA_%zu.bmp", numRegions);
Test1D(fileName, samples);
}

//======================================================================================
void TestSubRandomB1D ()
{
// calculate the sample points
std::array<float, NUM_SAMPLES> samples;
float sample = RandomFloat(0.0f, 0.5f);
for (size_t i = 0; i < NUM_SAMPLES; ++i)
{
sample = std::fmodf(sample + 0.5f + RandomFloat(0.0f, 0.5f), 1.0f);
samples[i] = sample;
}

// save bitmap etc
Test1D("1DSubRandomB.bmp", samples);
}

//======================================================================================
void TestVanDerCorput (size_t base)
{
// calculate the sample points
std::array<float, NUM_SAMPLES> samples;
for (size_t i = 0; i < NUM_SAMPLES; ++i)
{
samples[i] = 0.0f;
float denominator = float(base);
size_t n = i;
while (n > 0)
{
size_t multiplier = n % base;
samples[i] += float(multiplier) / denominator;
n = n / base;
denominator *= base;
}
}

// save bitmap etc
char fileName[256];
sprintf(fileName, "1DVanDerCorput_%zu.bmp", base);
Test1D(fileName, samples);
}

//======================================================================================
void TestIrrational1D (float irrational, float seed)
{
// calculate the sample points
std::array<float, NUM_SAMPLES> samples;
float sample = seed;
for (size_t i = 0; i < NUM_SAMPLES; ++i)
{
sample = std::fmodf(sample + irrational, 1.0f);
samples[i] = sample;
}

// save bitmap etc
char irrationalStr[256];
sprintf(irrationalStr, "%f", irrational);
char seedStr[256];
sprintf(seedStr, "%f", seed);
char fileName[256];
sprintf(fileName, "1DIrrational_%s_%s.bmp", &irrationalStr[2], &seedStr[2]);
Test1D(fileName, samples);
}

//======================================================================================
void TestSobol1D ()
{
// calculate the sample points
std::array<float, NUM_SAMPLES> samples;
size_t sampleInt = 0;
for (size_t i = 0; i < NUM_SAMPLES; ++i)
{
size_t ruler = Ruler(i + 1);
size_t direction = size_t(size_t(1) << size_t(31 - ruler));
sampleInt = sampleInt ^ direction;
samples[i] = float(sampleInt) / std::pow(2.0f, 32.0f);
}

// save bitmap etc
Test1D("1DSobol.bmp", samples);
}

//======================================================================================
void TestHammersley1D (size_t truncateBits)
{
// calculate the sample points
std::array<float, NUM_SAMPLES> samples;
size_t sampleInt = 0;
for (size_t i = 0; i < NUM_SAMPLES; ++i)
{
size_t n = i >> truncateBits;
float base = 1.0f / 2.0f;
samples[i] = 0.0f;
while (n)
{
if (n & 1)
samples[i] += base;
n /= 2;
base /= 2.0f;
}
}

// save bitmap etc
char fileName[256];
sprintf(fileName, "1DHammersley_%zu.bmp", truncateBits);
Test1D(fileName, samples);
}

//======================================================================================
float MinimumDistance1D (const std::array<float, NUM_SAMPLES>& samples, size_t numSamples, float x)
{
// Used by poisson.
// This returns the minimum distance that point (x) is away from the sample points, from [0, numSamples).
float minimumDistance = 0.0f;
for (size_t i = 0; i < numSamples; ++i)
{
float distance = std::abs(samples[i] - x);
if (i == 0 || distance < minimumDistance)
minimumDistance = distance;
}
return minimumDistance;
}

//======================================================================================
void TestPoisson1D ()
{
// every time we want to place a point, we generate this many points and choose the one farthest away from all the other points (largest minimum distance)
const size_t c_bestOfAttempts = 100;

// calculate the sample points
std::array<float, NUM_SAMPLES> samples;
for (size_t sampleIndex = 0; sampleIndex < NUM_SAMPLES; ++sampleIndex)
{
// generate some random points and keep the one that has the largest minimum distance from any of the existing points
float bestX = 0.0f;
float bestMinDistance = 0.0f;
for (size_t attempt = 0; attempt < c_bestOfAttempts; ++attempt)
{
float attemptX = RandomFloat(0.0f, 1.0f);
float minDistance = MinimumDistance1D(samples, sampleIndex, attemptX);

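// always accept the first attempt, so the very first sample (which has nothing to measure against) is random too
if (attempt == 0 || minDistance > bestMinDistance)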
{
bestX = attemptX;
bestMinDistance = minDistance;
}
}
samples[sampleIndex] = bestX;
}

// save bitmap etc
Test1D("1DPoisson.bmp", samples);
}

//======================================================================================
void TestUniform2D (bool jitter)
{
// calculate the sample points
std::array<std::array<float, 2>, NUM_SAMPLES> samples;
const size_t c_oneSide = size_t(std::sqrt(NUM_SAMPLES));
const float c_cellSize = 1.0f / float(c_oneSide+1);
for (size_t iy = 0; iy < c_oneSide; ++iy)
{
for (size_t ix = 0; ix < c_oneSide; ++ix)
{
size_t sampleIndex = iy * c_oneSide + ix;

samples[sampleIndex][0] = float(ix + 1) / (float(c_oneSide + 1));
if (jitter)
samples[sampleIndex][0] += RandomFloat(-c_cellSize*0.5f, c_cellSize*0.5f);

samples[sampleIndex][1] = float(iy + 1) / (float(c_oneSide) + 1.0f);
if (jitter)
samples[sampleIndex][1] += RandomFloat(-c_cellSize*0.5f, c_cellSize*0.5f);
}
}

// save bitmap etc
if (jitter)
Test2D("2DUniformJitter.bmp", samples);
else
Test2D("2DUniform.bmp", samples);
}

//======================================================================================
void TestUniformRandom2D ()
{
// calculate the sample points
std::array<std::array<float, 2>, NUM_SAMPLES> samples;
const size_t c_oneSide = size_t(std::sqrt(NUM_SAMPLES));
const float c_halfJitter = 1.0f / float((c_oneSide + 1) * 2);
for (size_t i = 0; i < NUM_SAMPLES; ++i)
{
samples[i][0] = RandomFloat(0.0f, 1.0f);
samples[i][1] = RandomFloat(0.0f, 1.0f);
}

// save bitmap etc
Test2D("2DUniformRandom.bmp", samples);
}

//======================================================================================
void TestSubRandomA2D (size_t regionsX, size_t regionsY)
{
const float c_randomRangeX = 1.0f / float(regionsX);
const float c_randomRangeY = 1.0f / float(regionsY);

// calculate the sample points
std::array<std::array<float, 2>, NUM_SAMPLES> samples;
for (size_t i = 0; i < NUM_SAMPLES; ++i)
{
samples[i][0] = RandomFloat(0.0f, c_randomRangeX);
samples[i][0] += float(i % regionsX) / float(regionsX);

samples[i][1] = RandomFloat(0.0f, c_randomRangeY);
samples[i][1] += float(i % regionsY) / float(regionsY);
}

// save bitmap etc
char fileName[256];
sprintf(fileName, "2DSubRandomA_%zu_%zu.bmp", regionsX, regionsY);
Test2D(fileName, samples);
}

//======================================================================================
void TestSubRandomB2D ()
{
// calculate the sample points
float samplex = RandomFloat(0.0f, 0.5f);
float sampley = RandomFloat(0.0f, 0.5f);
std::array<std::array<float, 2>, NUM_SAMPLES> samples;
for (size_t i = 0; i < NUM_SAMPLES; ++i)
{
samplex = std::fmodf(samplex + 0.5f + RandomFloat(0.0f, 0.5f), 1.0f);
sampley = std::fmodf(sampley + 0.5f + RandomFloat(0.0f, 0.5f), 1.0f);
samples[i][0] = samplex;
samples[i][1] = sampley;
}

// save bitmap etc
Test2D("2DSubRandomB.bmp", samples);
}

//======================================================================================
void TestHalton (size_t basex, size_t basey)
{
// calculate the sample points
std::array<std::array<float, 2>, NUM_SAMPLES> samples;
const size_t c_oneSide = size_t(std::sqrt(NUM_SAMPLES));
const float c_halfJitter = 1.0f / float((c_oneSide + 1) * 2);
for (size_t i = 0; i < NUM_SAMPLES; ++i)
{
// x axis
samples[i][0] = 0.0f;
{
float denominator = float(basex);
size_t n = i;
while (n > 0)
{
size_t multiplier = n % basex;
samples[i][0] += float(multiplier) / denominator;
n = n / basex;
denominator *= basex;
}
}

// y axis
samples[i][1] = 0.0f;
{
float denominator = float(basey);
size_t n = i;
while (n > 0)
{
size_t multiplier = n % basey;
samples[i][1] += float(multiplier) / denominator;
n = n / basey;
denominator *= basey;
}
}
}

// save bitmap etc
char fileName[256];
sprintf(fileName, "2DHalton_%zu_%zu.bmp", basex, basey);
Test2D(fileName, samples);
}

//======================================================================================
void TestSobol2D ()
{
// calculate the sample points

// x axis
std::array<std::array<float, 2>, NUM_SAMPLES> samples;
size_t sampleInt = 0;
for (size_t i = 0; i < NUM_SAMPLES; ++i)
{
size_t ruler = Ruler(i + 1);
size_t direction = size_t(size_t(1) << size_t(31 - ruler));
sampleInt = sampleInt ^ direction;
samples[i][0] = float(sampleInt) / std::pow(2.0f, 32.0f);
}

// y axis
// uses numbers: new-joe-kuo-6.21201

// Direction numbers
std::vector<size_t> V;
V.resize((size_t)ceil(log((double)NUM_SAMPLES) / log(2.0)));
V[0] = size_t(1) << size_t(31);
for (size_t i = 1; i < V.size(); ++i)
V[i] = V[i - 1] ^ (V[i - 1] >> 1);

// Samples
sampleInt = 0;
for (size_t i = 0; i < NUM_SAMPLES; ++i) {
size_t ruler = Ruler(i + 1);
sampleInt = sampleInt ^ V[ruler];
samples[i][1] = float(sampleInt) / std::pow(2.0f, 32.0f);
}

// save bitmap etc
Test2D("2DSobol.bmp", samples);
}

//======================================================================================
void TestHammersley2D (size_t truncateBits)
{
// figure out how many bits we are working in.
size_t value = 1;
size_t numBits = 0;
while (value < NUM_SAMPLES)
{
value *= 2;
++numBits;
}

// calculate the sample points
std::array<std::array<float, 2>, NUM_SAMPLES> samples;
size_t sampleInt = 0;
for (size_t i = 0; i < NUM_SAMPLES; ++i)
{
// x axis
samples[i][0] = 0.0f;
{
size_t n = i >> truncateBits;
float base = 1.0f / 2.0f;
while (n)
{
if (n & 1)
samples[i][0] += base;
n /= 2;
base /= 2.0f;
}
}

// y axis
samples[i][1] = 0.0f;
{
size_t n = i >> truncateBits;
size_t mask = size_t(1) << (numBits - 1 - truncateBits);

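// walk the bits from the highest one in use down to the lowest, which reverses them
float base = 1.0f / 2.0f;
while (mask)
{
if (n & mask)
samples[i][1] += base;
mask /= 2;
base /= 2.0f;
}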
}
}

// save bitmap etc
char fileName[256];
sprintf(fileName, "2DHammersley_%zu.bmp", truncateBits);
Test2D(fileName, samples);
}

//======================================================================================
void TestRooks2D ()
{
// make and shuffle rook positions
std::random_device rd;
std::mt19937 mt(rd());
std::array<size_t, NUM_SAMPLES> rookPositions;
for (size_t i = 0; i < NUM_SAMPLES; ++i)
rookPositions[i] = i;
std::shuffle(rookPositions.begin(), rookPositions.end(), mt);

// calculate the sample points
std::array<std::array<float, 2>, NUM_SAMPLES> samples;
for (size_t i = 0; i < NUM_SAMPLES; ++i)
{
// x axis
samples[i][0] = float(rookPositions[i]) / float(NUM_SAMPLES-1);

// y axis
samples[i][1] = float(i) / float(NUM_SAMPLES - 1);
}

// save bitmap etc
Test2D("2DRooks.bmp", samples);
}

//======================================================================================
void TestIrrational2D (float irrationalx, float irrationaly, float seedx, float seedy)
{
// calculate the sample points
std::array<std::array<float, 2>, NUM_SAMPLES> samples;
float samplex = seedx;
float sampley = seedy;
for (size_t i = 0; i < NUM_SAMPLES; ++i)
{
samplex = std::fmodf(samplex + irrationalx, 1.0f);
sampley = std::fmodf(sampley + irrationaly, 1.0f);

samples[i][0] = samplex;
samples[i][1] = sampley;
}

// save bitmap etc
char irrationalxStr[256];
sprintf(irrationalxStr, "%f", irrationalx);
char irrationalyStr[256];
sprintf(irrationalyStr, "%f", irrationaly);
char seedxStr[256];
sprintf(seedxStr, "%f", seedx);
char seedyStr[256];
sprintf(seedyStr, "%f", seedy);
char fileName[256];
sprintf(fileName, "2DIrrational_%s_%s_%s_%s.bmp", &irrationalxStr[2], &irrationalyStr[2], &seedxStr[2], &seedyStr[2]);
Test2D(fileName, samples);
}

//======================================================================================
float MinimumDistance2D (const std::array<std::array<float, 2>, NUM_SAMPLES>& samples, size_t numSamples, float x, float y)
{
// Used by poisson.
// This returns the minimum distance that point (x,y) is away from the sample points, from [0, numSamples).
float minimumDistance = 0.0f;
for (size_t i = 0; i < numSamples; ++i)
{
float distance = Distance(samples[i][0], samples[i][1], x, y);
if (i == 0 || distance < minimumDistance)
minimumDistance = distance;
}
return minimumDistance;
}

//======================================================================================
void TestPoisson2D ()
{
// every time we want to place a point, we generate this many points and choose the one farthest away from all the other points (largest minimum distance)
const size_t c_bestOfAttempts = 100;

// calculate the sample points
std::array<std::array<float, 2>, NUM_SAMPLES> samples;
for (size_t sampleIndex = 0; sampleIndex < NUM_SAMPLES; ++sampleIndex)
{
// generate some random points and keep the one that has the largest minimum distance from any of the existing points
float bestX = 0.0f;
float bestY = 0.0f;
float bestMinDistance = 0.0f;
for (size_t attempt = 0; attempt < c_bestOfAttempts; ++attempt)
{
float attemptX = RandomFloat(0.0f, 1.0f);
float attemptY = RandomFloat(0.0f, 1.0f);
float minDistance = MinimumDistance2D(samples, sampleIndex, attemptX, attemptY);

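// always accept the first attempt, so the very first sample (which has nothing to measure against) is random too
if (attempt == 0 || minDistance > bestMinDistance)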
{
bestX = attemptX;
bestY = attemptY;
bestMinDistance = minDistance;
}
}
samples[sampleIndex][0] = bestX;
samples[sampleIndex][1] = bestY;
}

// save bitmap etc
Test2D("2DPoisson.bmp", samples);
}

//======================================================================================
int main (int argc, char **argv)
{
// 1D tests
{
TestUniform1D(false);
TestUniform1D(true);

TestUniformRandom1D();

TestSubRandomA1D(2);
TestSubRandomA1D(4);
TestSubRandomA1D(8);
TestSubRandomA1D(16);
TestSubRandomA1D(32);

TestSubRandomB1D();

TestVanDerCorput(2);
TestVanDerCorput(3);
TestVanDerCorput(4);
TestVanDerCorput(5);

// golden ratio mod 1 aka (sqrt(5) - 1)/2
TestIrrational1D(0.618034f, 0.0f);
TestIrrational1D(0.618034f, 0.385180f);
TestIrrational1D(0.618034f, 0.775719f);
TestIrrational1D(0.618034f, 0.287194f);

// sqrt(2) - 1
TestIrrational1D(0.414214f, 0.0f);
TestIrrational1D(0.414214f, 0.385180f);
TestIrrational1D(0.414214f, 0.775719f);
TestIrrational1D(0.414214f, 0.287194f);

// PI mod 1
TestIrrational1D(0.141593f, 0.0f);
TestIrrational1D(0.141593f, 0.385180f);
TestIrrational1D(0.141593f, 0.775719f);
TestIrrational1D(0.141593f, 0.287194f);

TestSobol1D();

TestHammersley1D(0);
TestHammersley1D(1);
TestHammersley1D(2);

TestPoisson1D();
}

// 2D tests
{
TestUniform2D(false);
TestUniform2D(true);

TestUniformRandom2D();

TestSubRandomA2D(2, 2);
TestSubRandomA2D(2, 3);
TestSubRandomA2D(3, 11);
TestSubRandomA2D(3, 97);

TestSubRandomB2D();

TestHalton(2, 3);
TestHalton(5, 7);
TestHalton(13, 9);

TestSobol2D();

TestHammersley2D(0);
TestHammersley2D(1);
TestHammersley2D(2);

TestRooks2D();

// X axis = golden ratio mod 1 aka (sqrt(5)-1)/2
// Y axis = sqrt(2) mod 1
TestIrrational2D(0.618034f, 0.414214f, 0.0f, 0.0f);
TestIrrational2D(0.618034f, 0.414214f, 0.775719f, 0.264045f);

// X axis = sqrt(2) mod 1
// Y axis = sqrt(3) mod 1
TestIrrational2D(std::fmodf((float)std::sqrt(2.0f), 1.0f), std::fmodf((float)std::sqrt(3.0f), 1.0f), 0.0f, 0.0f);
TestIrrational2D(std::fmodf((float)std::sqrt(2.0f), 1.0f), std::fmodf((float)std::sqrt(3.0f), 1.0f), 0.775719f, 0.264045f);

TestPoisson2D();
}

#if CALCULATE_DISCREPANCY
printf("\n");
system("pause");
#endif
}


# Improved Storage Space Efficiency of GPU Texture Sampler Bezier Curve Evaluation

This is an extension of a paper I wrote, which shows how to use the linear texture sampling capabilities of the GPU to calculate points on Bezier curves. The technique also works for polynomials in general, rational polynomials, and surfaces and volumes made by tensor products. You store the control points in the texture, then sample along the texture’s diagonal to get points on the curve:
GPU Texture Sampler Bezier Curve Evaluation

This extension improves storage efficiency, allowing a higher density of curve data per pixel. The post also covers some caveats and limitations.

This post is divided into the following sections:

1. Basic Idea of Extension
2. 2D Texture / Quadratic Piecewise Curves
3. 2D Texture / Quadratic Piecewise Curves – C0 Continuity
4. 2D Texture / Quadratic Piecewise Curves – Storage Efficiency
5. Real World Limitations
6. 3D Texture / Cubic Piecewise Curves
7. 3D Texture / Cubic Piecewise Curves – Multiple Curves?
8. 3D Texture / Cubic Piecewise Curves – C0 Continuity
9. 3D Texture / Cubic Piecewise Curves – Storage Efficiency
10. Generalizing The Unit Hyper Cube
11. Closing
12. Code

# 1. Basic Idea of Extension

Let’s talk about the base technique before going into the details of the extension.

The image below shows how bilinear interpolation across the diagonal between pixels can calculate points on curves. Bilinear interpolation is exactly equivalent to the De Casteljau algorithm when the u and v coordinates are the same value.

Linear interpolation between two values A and B at time t is done with this formula:
$A(1-t) + Bt$

I’ve found it useful to replace (1-t) with its own symbol, s. That turns the formula into this:
$As + Bt$

Now, if you bilinearly interpolate between 4 values, you have two rows. One row has A,B in it and the other row has C,D in it. To bilinearly interpolate between these four values at time (t,t), the formula is this:
$(As + Bt)s + (Cs+Dt)t$

If you expand that and collect like terms you come up with this equation:
$As^2 + (B+C)st + Dt^2$

The last step is to make B and C the same value (make them both into B) and then rename D to C, since that letter is now unused. The resulting formula is the formula for a quadratic Bezier curve. This shows that bilinear interpolation can be made mathematically identical to the quadratic Bezier formula. (Note: there are extensions to get higher order curves and surfaces as well.)
$As^2 + 2Bst + Ct^2$

However, for this extension we are going to take one step back to the prior equation:
$As^2 + (B+C)st + Dt^2$

What you may notice is that the two off-diagonal corner values of the 2×2 bilinear interpolation (B and C) don’t have to equal the middle control point of the quadratic Bezier curve exactly; they only have to AVERAGE to that value.
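The code below is a small numeric check of the derivation. It uses plain double arithmetic on the CPU in place of the GPU's bilinear filter, so it ignores any precision limits of hardware texture filtering. `BilinearDiagonal` evaluates $(As + Bt)s + (Cs + Dt)t$ as two lerps plus a third lerp between them. It matches `QuadraticBezier` whenever B and C average to the middle control point, whether or not they are equal. All names are hypothetical.

```cpp
#include <cassert>

// Linear interpolation written with s = 1 - t, as in the text: A*s + B*t.
double Lerp (double a, double b, double t)
{
    const double s = 1.0 - t;
    return a * s + b * t;
}

// Bilinear interpolation of a 2x2 block (row 0: A B, row 1: C D) at (t, t):
// (A*s + B*t)*s + (C*s + D*t)*t
double BilinearDiagonal (double a, double b, double c, double d, double t)
{
    assert(t >= 0.0 && t <= 1.0);
    return Lerp(Lerp(a, b, t), Lerp(c, d, t), t);
}

// Quadratic Bezier curve: A*s^2 + 2*B*s*t + C*t^2
double QuadraticBezier (double c0, double c1, double c2, double t)
{
    const double s = 1.0 - t;
    return c0 * s * s + 2.0 * c1 * s * t + c2 * t * t;
}
```

For instance, with control points 1, 4, -2, both BilinearDiagonal(1, 4, 4, -2, t) and BilinearDiagonal(1, 7, 1, -2, t) trace the same curve, since 4 + 4 and 7 + 1 both average to 4.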

This is interesting because, before this extension, I would encode two different piecewise quadratic curves (C0-C2 and C3-C5) into a 2d texture like this:

$A = C_0 \\ B = C_1 \\ C = C_1 \\ D = C_2 \\ E = C_3 \\ F = C_4 \\ G = C_4 \\ H = C_5\\$

That uses 8 pixels to store the 6 control points of the two quadratic curves.

With the ideas of this extension, one way it could look now is this:

$A = C_0 \\ B + C = 2*C_1 : B = 2*C_1 - C_3 \\ D = C_2 \\ C = C_3 \\ D + E = 2*C_4 : E = 2*C_4 - C_2\\ F = C_5\\$

The result is that 6 pixels are used instead of 8, for storing the 6 control points of the two quadratic curves.
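As a sketch under the same assumptions (hypothetical helper names, values held in plain Python lists rather than a real texture), the 6 pixel layout can be packed and then read back by sampling each diagonal. The layout assumed here (A, B in row 0; C, D in row 1; E, F in row 2, indexed as `tex[y][x]`) matches the pixel equations in the next section.

```python
from math import isclose

# Sketch: helper names are illustrative, not from the original source code.

def lerp(a, b, t):
    return a * (1 - t) + b * t

def quadratic_bezier(c0, c1, c2, t):
    s = 1 - t
    return c0 * s * s + 2 * c1 * s * t + c2 * t * t

def encode_two_quadratics(c):
    """Pack control points c[0..5] into a 3x2 texture, tex[y][x] = Pyx."""
    return [
        [c[0], 2 * c[1] - c[3]],  # A = C0, B = 2*C1 - C3
        [c[3], c[2]],             # C = C3, D = C2
        [2 * c[4] - c[2], c[5]],  # E = 2*C4 - C2, F = C5
    ]

def sample_curve(tex, first_row, t):
    """Bilinear sample on the diagonal from (x=0, y=first_row) to (x=1, y=first_row+1)."""
    row0 = lerp(tex[first_row][0], tex[first_row][1], t)
    row1 = lerp(tex[first_row + 1][0], tex[first_row + 1][1], t)
    return lerp(row0, row1, t)

c = [1, 4, 2, 5, 0, 3]
tex = encode_two_quadratics(c)
print(tex)  # [[1, 3], [5, 2], [-2, 3]]
for t in [0.0, 0.25, 0.5, 0.75, 1.0]:
    assert isclose(sample_curve(tex, 0, t), quadratic_bezier(c[0], c[1], c[2], t), abs_tol=1e-12)
    assert isclose(sample_curve(tex, 1, t), quadratic_bezier(c[3], c[4], c[5], t), abs_tol=1e-12)
```

Note that this packed texture already contains a negative value (-2), which matters once real texture formats come into play.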

That isn’t the only result though, so let’s explore the details (:

# 2. 2D Texture / Quadratic Piecewise Curves

Let’s start by more formally looking at the 2d texture / quadratic curve case.

We are going to number the pixels by their texture coordinate location (in the form of Pyx) instead of using letters. Later on that will help show a pattern of generalization. We are still using the same notation for control points where C0 is the first control point, C1 is the second control point and so on.

Looking at a single quadratic curve, we have this texture, which has these constraints on its pixel values:

$P_{00} = C_0 \\ P_{01}+P_{10} = 2*C_1 \\ P_{11} = C_2 \\$

To analyze this, let’s make an augmented matrix. The left matrix is a 3×4 matrix where each column is a pixel and each row is the left side of the equation for a constraint. The right matrix is a 3×3 matrix where each column is a control point and each row is the right side of the equation for a constraint. The first row of the matrix holds column labels to make it easier to see what’s going on.

Note that I put my pixel columns and control point columns in ascending order in the matrix, but if you put them in a different order, you’d get the same (or equivalent) results as I did. This order is just my convention.

$\left[\begin{array}{rrrr|rrr} P_{00} & P_{01} & P_{10} & P_{11} & C_0 & C_1 & C_2 \\ 1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 2 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 1 \end{array}\right]$

The next step would be to put this matrix into reduced row echelon form to solve the equations to see what the values of the pixels need to be, but the matrix is in fact already in rref form! (For more information on rref, check out my last post: Solving N equations and N unknowns: The Fine Print (Gauss Jordan Elimination))

What we can see by looking at the rref of the matrix is that either P01 or P10 can be a free variable – meaning we can choose whatever value we want for it. After we choose a value for either of those variables (pixels), the rest of the pixels are fully defined.

Deciding that P10 is the free variable (by the usual convention that a free variable is one whose column has no leading nonzero entry), the second equation (constraint) becomes P01 = 2*C1-P10.

If we choose the value C1 for P10, that means that P01 must equal C1 too (this is how the original technique worked). If we choose 0 for P10, that means that P01 must equal 2*C1. This is because P01 must always equal 2*C1-P10. We then are in the new territory of this extension, where the pixels representing the middle control point have some freedom about what values they can take on, so long as they average to the middle control point value.

Let’s add a row of pixels and try encoding a second quadratic curve:

$P_{00} = C_0 \\ P_{01}+P_{10} = 2*C_1 \\ P_{11} = C_2 \\ P_{10} = C_3 \\ P_{11}+P_{20} = 2*C_4 \\ P_{21} = C_5$

Let’s again make an augmented matrix with pixels on the left and control points on the right.

$\left[\begin{array}{rrrrrr|rrrrrr} P_{00} & P_{01} & P_{10} & P_{11} & P_{20} & P_{21} & C_0 & C_1 & C_2 & C_3 & C_4 & C_5\\ 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 1 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \end{array}\right]$

Putting that into rref to solve for the pixel values we get this:

$\left[\begin{array}{rrrrrr|rrrrrr} P_{00} & P_{01} & P_{10} & P_{11} & P_{20} & P_{21} & C_0 & C_1 & C_2 & C_3 & C_4 & C_5\\ 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & -1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & -1 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \end{array}\right]$

We got the identity matrix on the left, so we don’t have any inconsistencies or free variables.
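To reproduce these rref results yourself, a small exact Gauss-Jordan elimination is enough. This sketch uses Python's `fractions.Fraction` to avoid floating point round-off; the `rref` function is my own helper, applied here to the two-curve augmented matrix above (without the label row).

```python
from fractions import Fraction

# Sketch: a hypothetical helper, not from the original source code.

def rref(matrix):
    """Reduced row echelon form of a list-of-lists matrix, using exact fractions."""
    m = [[Fraction(v) for v in row] for row in matrix]
    rows, cols = len(m), len(m[0])
    r = 0  # next pivot row
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        lead = m[r][c]
        m[r] = [v / lead for v in m[r]]  # scale the pivot to 1
        for i in range(rows):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == rows:
            break
    return m

# Columns: P00 P01 P10 P11 P20 P21 | C0 C1 C2 C3 C4 C5
augmented = [
    [1, 0, 0, 0, 0, 0,  1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0,  0, 2, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0,  0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0,  0, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 1, 0,  0, 0, 0, 0, 2, 0],
    [0, 0, 0, 0, 0, 1,  0, 0, 0, 0, 0, 1],
]
for row in rref(augmented):
    print([str(v) for v in row])
```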

If we turn that matrix back into equations we get this:

$P_{00} = C_0 \\ P_{01} = 2*C_1 - C_3 \\ P_{10} = C_3 \\ P_{11} = C_2 \\ P_{20} = 2*C_4 - C_2 \\ P_{21} = C_5$

We were successful! We can store two piecewise Bezier curves in 6 pixels by setting the pixel values to these specific values.

The last example we’ll show is the next stage, where it falls apart. We’ll add another row of pixels and try to encode 3 Bezier curves (9 control points) into those 8 pixels.

$P_{00} = C_0 \\ P_{01}+P_{10} = 2*C_1 \\ P_{11} = C_2 \\ P_{10} = C_3 \\ P_{11}+P_{20} = 2*C_4 \\ P_{21} = C_5 \\ P_{20} = C_6 \\ P_{21}+P_{30} = 2*C_7 \\ P_{31} = C_8$

This is the augmented matrix with pixels on the left and control points on the right:

$\left[\begin{array}{rrrrrrrr|rrrrrrrrr} P_{00} & P_{01} & P_{10} & P_{11} & P_{20} & P_{21} & P_{30} & P_{31} & C_0 & C_1 & C_2 & C_3 & C_4 & C_5 & C_6 & C_7 & C_8\\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ \end{array}\right]$

The rref form is:

$\left[\begin{array}{rrrrrrrr|rrrrrrrrr} P_{00} & P_{01} & P_{10} & P_{11} & P_{20} & P_{21} & P_{30} & P_{31} & C_0 & C_1 & C_2 & C_3 & C_4 & C_5 & C_6 & C_7 & C_8\\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & -1 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & -1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & -2 & 0 & 1 & 0 & 0 \\ \end{array}\right]$

Let’s turn that back into equations.

$P_{00} = C_0 \\ P_{01} = 2*C_1 - C_3 \\ P_{10} = C_3 \\ P_{11} = 2*C_4 - C_6 \\ P_{20} = C_6 \\ P_{21} = C_5 \\ P_{30} = 2*C_7 - C_5 \\ P_{31} = C_8 \\ 0 = C_2 - 2*C_4 + C_6$

We have a problem unfortunately! The bottom row says this:

$0 = C_2 - 2*C_4 + C_6$

That means that we can only store these curves in this pixel configuration if we limit the values of the control points 2,4,6 to values that make that last equation true.

Since my desire is to be able to store curves in textures without “unusual” restrictions on what the control points can be, I’m going to count this as a failure for a general case solution.

It only gets worse from here for the case of trying to add another row of pixels for each curve you want to add.
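One way to see how quickly this gets worse is to count free variables and control point constraints as curves are added. The sketch below (hypothetical helper names, exact arithmetic via `fractions.Fraction`) builds only the left, pixel side of the system. Because each constraint row has its own control point on the right side, every linearly dependent row on the left turns into one restriction on the control points, so constraints = rows - rank and free variables = pixels - rank. For 1 to 4 curves it reports (free variables, constraints) of (1, 0), (0, 0), (0, 1) and (0, 2).

```python
from fractions import Fraction

# Sketch: helper names are illustrative, not from the original source code.

def rank(rows):
    """Rank of a small integer matrix via exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            if m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def stacked_quadratic_rows(n):
    """Pixel-side rows for n quadratic curves stacked one pixel row apart.
    Pixel Pyx maps to column 2*y + x."""
    cols = 2 * (n + 1)

    def row(*pixels):
        out = [0] * cols
        for y, x in pixels:
            out[2 * y + x] = 1
        return out

    rows = []
    for k in range(n):
        rows.append(row((k, 0)))              # start point
        rows.append(row((k, 1), (k + 1, 0)))  # pair averaging to the middle point
        rows.append(row((k + 1, 1)))          # end point
    return rows

def analyze(n):
    rows = stacked_quadratic_rows(n)
    r = rank(rows)
    return len(rows[0]) - r, len(rows) - r  # (free variables, constraints)

for n in range(1, 5):
    print(n, "curves -> (free variables, constraints) =", analyze(n))
```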

It looks like storing two quadratic curves in a 2×3 group of pixels is the optimal (most data dense) storage. If you try to store more curves this way, it puts restrictions on the control points. If you store fewer (a single curve in 2×2), you have a free variable, which means you aren’t making full use of all of the pixels you have.

This means that if you are storing piecewise quadratic curves in 2d textures, doing it this way uses 3/4 as many pixels as the original technique: 1 pixel per control point stored, instead of 1.333 pixels per control point stored.

This isn’t the end of the story though, so let’s continue (:

# 3. 2D Texture / Quadratic Piecewise Curves – C0 Continuity

If we add the requirement that our piecewise curves must be connected (aka that they have C0 continuity), we can actually do something pretty interesting. Take a look at this setup:

$P_{00} = C_0 \\ P_{01}+P_{10} = 2*C_1 \\ P_{11} = C_2 \\ P_{11} = C_3 \\ P_{10}+P_{21} = 2*C_4 \\ P_{20} = C_5$

Putting this into matrix form looks like this:

$\left[\begin{array}{rrrrrr|rrrrrr} P_{00} & P_{01} & P_{10} & P_{11} & P_{20} & P_{21} & C_0 & C_1 & C_2 & C_3 & C_4 & C_5\\ 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 1 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ \end{array}\right]$

In rref it becomes this:

$\left[\begin{array}{rrrrrr|rrrrrr} P_{00} & P_{01} & P_{10} & P_{11} & P_{20} & P_{21} & C_0 & C_1 & C_2 & C_3 & C_4 & C_5\\ 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0 & -1 & 0 & 2 & 0 & 0 & -2 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 & 0 & 0 \\ \end{array}\right]$

Turning the rref back into equations we get:

$P_{00} = C_0 \\ P_{01} - P_{21} = 2*C_1-2*C_4 \\ P_{10} + P_{21} = 2*C_4 \\ P_{11} = C_3 \\ P_{20} = C_5 \\ 0 = C_2 - C_3$

P21 is a free variable, so we can set it to whatever we want. Once we choose a value, the pixel values P01 and P10 will be fully defined.

The bottom equation might have you worried, because it looks like an inconsistency (aka restriction) but it is actually expected.

That last equation says 0 = C2-C3, which can be rearranged into C2 = C3. That just means that the end of our first curve has to equal the beginning of our second curve. That is just the C0 continuity we already agreed to.

So, it worked! Let’s try adding a row of pixels and another curve to see what happens.

$P_{00} = C_0 \\ P_{01}+P_{10} = 2*C_1 \\ P_{11} = C_2 \\ P_{11} = C_3 \\ P_{10}+P_{21} = 2*C_4 \\ P_{20} = C_5\\ P_{20} = C_6\\ P_{21}+P_{30} = 2*C_7\\ P_{31} = C_8$

Putting that into matrix form:

$\left[\begin{array}{rrrrrrrr|rrrrrrrrr} P_{00} & P_{01} & P_{10} & P_{11} & P_{20} & P_{21} & P_{30} & P_{31} & C_0 & C_1 & C_2 & C_3 & C_4 & C_5 & C_6 & C_7 & C_8\\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1\\ \end{array}\right]$

And in rref:

$\left[\begin{array}{rrrrrrrr|rrrrrrrrr} P_{00} & P_{01} & P_{10} & P_{11} & P_{20} & P_{21} & P_{30} & P_{31} & C_0 & C_1 & C_2 & C_3 & C_4 & C_5 & C_6 & C_7 & C_8\\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 2 & 0 & 0 & -2 & 0 & 0 & 2 & 0\\ 0 & 0 & 1 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & -2 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 & 0 & 0\\ \end{array}\right]$

Turning the rref back into equations:

$P_{00} = C_0 \\ P_{01}+P_{30} = 2*C_1 - 2*C_4+2*C_7 \\ P_{10}-P_{30} = 2*C_4-2*C_7 \\ P_{11} = C_3 \\ P_{20} = C_6 \\ P_{21} + P_{30} = 2*C_7\\ P_{31} = C_8\\ 0 = C_2 - C_3\\ 0 = C_5 - C_6\\$

We see that P30 is a free variable, and the last two rows show us we have the C0 continuity requirements: C2 = C3 and C5 = C6.

The previous section, without C0 continuity, reached its limit of storage efficiency after storing two curves (6 control points) in 6 pixels.

When we add the C0 continuity requirement, we are able to take it further and store 3 curves in 8 pixels. Technically those 3 curves have 9 control points, but because the end point of each curve has to be the same as the start point of the next curve, in reality there are only 3 control points for the first curve and then 2 additional control points for each additional curve. That makes 7 control points for 3 curves, not 9.

Unlike the last section, using this zigzag pattern with C0 continuity, you can encode any number of curves. I am not sure how to prove it, but from observation, there is no sign of any shrinking of capacity as we increase the number of curves, adding one more row of two pixels for each curve. If you know how to prove this more formally, please let me know!
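The same counting idea can be pointed at the zigzag layout. This is not a proof, just a numerical check under the same assumptions as before (hypothetical helper names, exact rank via `fractions.Fraction`, 3 explicit control points per curve as in the equations above). For every curve count N it tries, it should find exactly 1 free variable and N-1 control point constraints, which matches the N-1 C0 equalities (C2 = C3, C5 = C6, and so on).

```python
from fractions import Fraction

# Sketch: helper names are illustrative, not from the original source code.

def rank(rows):
    """Rank of a small integer matrix via exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            if m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def zigzag_quadratic_rows(n):
    """Pixel-side rows for n zigzag curves, 3 explicit control points per curve.
    Pixel Pyx maps to column 2*y + x."""
    cols = 2 * (n + 1)

    def row(*pixels):
        out = [0] * cols
        for y, x in pixels:
            out[2 * y + x] = 1
        return out

    rows = []
    for k in range(n):
        x = k % 2  # column the curve starts in; flips every curve
        rows.append(row((k, x)))                  # start point
        rows.append(row((k, 1 - x), (k + 1, x)))  # pair averaging to the middle point
        rows.append(row((k + 1, 1 - x)))          # end point
    return rows

def analyze(n):
    rows = zigzag_quadratic_rows(n)
    r = rank(rows)
    return len(rows[0]) - r, len(rows) - r  # (free variables, constraints)

for n in range(1, 6):
    print(n, "curves -> (free variables, constraints) =", analyze(n))
```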

Note that instead of explicitly having 3 control points per curve, where the first control point of a curve has to equal the last control point of the previous curve, you can instead describe the piecewise curves with fewer control points. You need 3 control points for the first curve, and then 2 control points for each curve after that.

Mathematically both ways are equivalent and you’ll get the same answer. The accompanying source code uses the shorter form, but I show this example the longer way to show more explicitly how things work.
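Here is a sketch of the shorter form: 2N+1 shared control points packed into an (N+1)×2 texture. This is my own illustration, not the accompanying source code; the function names and the choice to make the first off-diagonal pixel the free variable (defaulting it to the first middle control point) are assumptions. The text above picks P21 or P30 as the free variable instead; once any one off-diagonal pixel is chosen, the rest follow from the requirement that each pair averages to its curve's middle control point.

```python
from math import isclose

# Sketch: helper names and the free-variable choice are illustrative assumptions.

def lerp(a, b, t):
    return a * (1 - t) + b * t

def quadratic_bezier(c0, c1, c2, t):
    s = 1 - t
    return c0 * s * s + 2 * c1 * s * t + c2 * t * t

def encode_c0_quadratics(cp, free_value=None):
    """cp holds 2N+1 shared control points; curve k uses cp[2k], cp[2k+1], cp[2k+2].
    Returns an (N+1) x 2 texture indexed tex[y][x]."""
    if len(cp) < 3 or len(cp) % 2 == 0:
        raise ValueError("need 2N+1 control points with N >= 1")
    n = (len(cp) - 1) // 2
    tex = [[0, 0] for _ in range(n + 1)]
    for y in range(n + 1):
        tex[y][y % 2] = cp[2 * y]  # curve endpoints zigzag between columns
    q = cp[1] if free_value is None else free_value  # the single free variable
    for k in range(n):
        tex[k][1 - k % 2] = q      # off-endpoint pixel of row k
        q = 2 * cp[2 * k + 1] - q  # so this pair averages to curve k's middle point
    tex[n][1 - n % 2] = q
    return tex

def sample_c0_curve(tex, k, t):
    """Sample curve k on the diagonal of rows k and k+1, starting in column k % 2."""
    x = k % 2
    row0 = lerp(tex[k][x], tex[k][1 - x], t)
    row1 = lerp(tex[k + 1][x], tex[k + 1][1 - x], t)
    return lerp(row0, row1, t)

cp = [0, 3, 1, -2, 4, 5, 2]  # 3 curves, 7 control points, 8 pixels
tex = encode_c0_quadratics(cp)
print(tex)  # [[0, 3], [3, 1], [4, -7], [17, 2]]
for k in range(3):
    for t in (0.0, 0.3, 0.5, 1.0):
        assert isclose(sample_c0_curve(tex, k, t),
                       quadratic_bezier(cp[2 * k], cp[2 * k + 1], cp[2 * k + 2], t),
                       abs_tol=1e-9)
```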

# 4. 2D Texture / Quadratic Piecewise Curves – Storage Efficiency

Let’s compare the storage efficiency of the last two sections to each other, as well as to the original technique.

$\begin{array}{|cccccc|} \hline & & \rlap{\text{2d / Quadratic - Extension}} & & & \\ \hline \text{Curves} & \text{Dimensions} & \text{Pixels} & \text{Control Points} & \text{Pixels Per Control Point} & \text{Pixels Per Curve} \\ \hline 1 & 2x2 & 4 & 3 & 1.33 & 4.00 \\ 2 & 2x3 & 6 & 6 & 1.00 & 3.00 \\ 3 & 2x5 & 10 & 9 & 1.11 & 3.33 \\ 4 & 2x6 & 12 & 12 & 1.00 & 3.00 \\ 5 & 2x8 & 16 & 15 & 1.07 & 3.20 \\ 6 & 2x9 & 18 & 18 & 1.00 & 3.00 \\ \hline \end{array}$

$\begin{array}{|cccccc|} \hline & & \rlap{\text{2d / Quadratic - Extension + C0 Continuity}} & & & \\ \hline \text{Curves} & \text{Dimensions} & \text{Pixels} & \text{Control Points} & \text{Pixels Per Control Point} & \text{Pixels Per Curve} \\ \hline 1 & 2x2 & 4 & 3 & 1.33 & 4.00 \\ 2 & 2x3 & 6 & 5 & 1.20 & 3.00 \\ 3 & 2x4 & 8 & 7 & 1.14 & 2.67 \\ 4 & 2x5 & 10 & 9 & 1.11 & 2.50 \\ 5 & 2x6 & 12 & 11 & 1.09 & 2.40 \\ 6 & 2x7 & 14 & 13 & 1.08 & 2.33 \\ \hline \end{array}$

$\begin{array}{|cccccc|} \hline & & \rlap{\text{2d / Quadratic - Original Technique}} & & & \\ \hline \text{Curves} & \text{Dimensions} & \text{Pixels} & \text{Control Points} & \text{Pixels Per Control Point} & \text{Pixels Per Curve} \\ \hline 1 & 2x2 & 4 & 3 & 1.33 & 4.00 \\ 2 & 2x4 & 8 & 6 & 1.33 & 4.00 \\ 3 & 2x6 & 12 & 9 & 1.33 & 4.00 \\ 4 & 2x8 & 16 & 12 & 1.33 & 4.00 \\ 5 & 2x10 & 20 & 15 & 1.33 & 4.00 \\ 6 & 2x12 & 24 & 18 & 1.33 & 4.00 \\ \hline \end{array}$

The tables show that the first method uses fewer pixels per control point, while the second method uses fewer pixels per curve.

The first method can get you to what I believe to be the maximum density of 1 pixel per control point if you store an even number of curves. It can also give you a curve for every 3 pixels of storage.

The second method approaches 1 pixel per control point as you store more and more curves, and also approaches 2 pixels of storage per curve stored. Note that the second method’s table uses the convention that 3 control points are used for the first curve, and 2 additional control points for each curve after that.

The deciding factor for which method to use is probably going to be whether or not you want to force C0 continuity of your curve data. If so, you’d use the second technique, else you’d use the first.

The original technique uses a constant 1.33 pixels per control point, and 4 pixels to store each curve. Those numbers show how this extension improves on the storage efficiency of the original technique.
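The table rows can be regenerated from simple pixel and control point counts. The formulas below are my reading of the tables rather than something stated explicitly: the extension packs pairs of curves into 2×3 blocks (an odd leftover curve takes a 2×2 block), the C0 version uses N+1 rows of 2 pixels, and the original technique uses a 2×2 block per curve. The function names are hypothetical.

```python
# Sketch: counting formulas inferred from the tables above.

def extension_2d(n):
    """Pairs of curves share a 2x3 block; a leftover single curve uses 2x2."""
    return 6 * (n // 2) + 4 * (n % 2), 3 * n  # (pixels, control points)

def extension_c0_2d(n):
    """Zigzag with C0 continuity: N+1 rows of 2 pixels, 3 + 2*(N-1) control points."""
    return 2 * (n + 1), 2 * n + 1

def original_2d(n):
    """Original technique: a 2x2 block (4 pixels, 3 control points) per curve."""
    return 4 * n, 3 * n

methods = [("extension", extension_2d),
           ("extension + C0", extension_c0_2d),
           ("original", original_2d)]
for name, method in methods:
    for n in range(1, 7):
        pixels, cps = method(n)
        print(f"{name:15} {n} curves: {pixels:2} px, {cps:2} control points, "
              f"{pixels / cps:.2f} px/cp, {pixels / n:.2f} px/curve")
```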

# 5. Real World Limitations

Unfortunately, this extension has a problem that the original technique does not have.

While the stuff above is correct mathematically, there are limitations on the values we can store in actual textures. For instance, if we have 8 bit uint8 color channels we can only store values 0 to 255.

Looking at one of the equations $P_{01} = 2*C_1 - C_3$, if C1 is 255 and C3 is 0, we are going to need to store 510 in the 8 bits we have for P01, which we can’t. If C1 is 0 and C3 is not zero, we are going to have to store a negative value in the 8 bits we have for P01, which we can’t.
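A quick sketch of the problem, assuming 8 bit unsigned channels and the 6 pixel two-curve layout from earlier (helper names are hypothetical). The off-diagonal pixels have the form 2*Ca - Cb, which for inputs in 0..255 can land anywhere from -255 to 510.

```python
# Sketch: helper names are illustrative, not from the original source code.

def encode_two_quadratics(c):
    """6 pixel layout for two curves, tex[y][x] = Pyx."""
    return [
        [c[0], 2 * c[1] - c[3]],  # P00, P01
        [c[3], c[2]],             # P10, P11
        [2 * c[4] - c[2], c[5]],  # P20, P21
    ]

def fits_uint8(tex):
    """True if every pixel value fits an 8 bit unsigned channel (0..255)."""
    return all(0 <= v <= 255 for row in tex for v in row)

# Range of 2*Ca - Cb when both control points are valid uint8 values.
values = [2 * a - b for a in range(256) for b in range(256)]
print(min(values), max(values))  # -255 510

print(fits_uint8(encode_two_quadratics([10, 20, 30, 20, 40, 50])))  # True
print(fits_uint8(encode_two_quadratics([0, 255, 0, 0, 0, 0])))      # False: P01 = 510
print(fits_uint8(encode_two_quadratics([0, 0, 0, 10, 0, 0])))       # False: P01 = -10
```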

This becomes less of a problem when using 16 bit floats per color channel, and is basically solved when using 32 bit floats per color channel, but that makes the technique hungrier for storage and less efficient again.

While that limits the usefulness of this extension, there are situations where it would still be appropriate, like if you already have your data stored in 16 or 32 bit color channels, as some data (e.g. position data) would require.

The extension goes further, into 3d textures and beyond though, so let’s explore a little bit more.

# 6. 3D Texture / Cubic Piecewise Curves

The original technique talks about how to use a 2x2x2 3d volume texture to store a cubic Bezier curve (per color channel) and to retrieve it by doing a trilinear interpolated texture read.

If you have four control points A,B,C,D then the first slice of the volume texture will be a 2d texture storing the quadratic Bezier curve defined by A,B,C and the second slice will store B,C,D. You still sample along the diagonal of the texture but this time it’s a 3d diagonal instead of 2d. Here is that setup, where the texture is sampled along the diagonal from A to D:

$A = C_0 \\ B = C_1 \\ C = C_2 \\ D = C_3$

Let’s look at what this extension means for 3d textures / cubic curves.

The equation for a cubic Bezier curve looks like this:

$As^3 + 3Bs^2t + 3Cst^2 + Dt^3$

If we derive that from trilinear interpolation between 8 points A,B,C,D,E,F,G,H, the second to last step would look like this:

$As^3 + (B+C+E)s^2t + (D+F+G)st^2 + Ht^3$

So, similar to our 2d setup, we have some freedom about our values.

In the original technique, B,C,E would have to be equal to the second control point, and D,F,G would have to be equal to the third control point. With the new extension, in both cases, those groups of values only have to AVERAGE to their specific control points. Once again, this gives us some freedoms for the values we can use, and lets us use our pixels more efficiently.

Here is the setup, again using texture coordinates (in the form Pzyx) for the pixels instead of letters.

$P_{000} = C_0\\ P_{001}+P_{010}+P_{100} = 3*C_1\\ P_{011}+P_{101}+P_{110} = 3*C_2\\ P_{111} = C_3$

Here’s how the equations look in matrix form, which also happens to already be in rref:

$\left[\begin{array}{rrrrrrrr|rrrr} P_{000} & P_{001} & P_{010} & P_{011} & P_{100} & P_{101} & P_{110} & P_{111} & C_0 & C_1 & C_2 & C_3 \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 1 & 1 & 0 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\ \end{array}\right]$

P010 and P100 are free variables and so are P101 and P110, making a total of four free variables. They can be set to any value desired, which will then define the value that P001 and P011 need to be.
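A numerical check of the 3d case, in the same hedged spirit (helper names are mine): pick the four free pixels P010, P100, P101 and P110 arbitrarily, solve for P001 and P011 so each group of three averages to its control point, and compare a trilinear sample at (t,t,t) against the cubic Bezier formula.

```python
from math import isclose

# Sketch: helper names are illustrative, not from the original source code.

def lerp(a, b, t):
    return a * (1 - t) + b * t

def trilinear_diagonal(p, t):
    """Trilinearly sample a 2x2x2 block p[z][y][x] at (t, t, t): x, then y, then z."""
    along_x = [[lerp(p[z][y][0], p[z][y][1], t) for y in range(2)] for z in range(2)]
    along_y = [lerp(along_x[z][0], along_x[z][1], t) for z in range(2)]
    return lerp(along_y[0], along_y[1], t)

def cubic_bezier(c0, c1, c2, c3, t):
    s = 1 - t
    return c0 * s**3 + 3 * c1 * s * s * t + 3 * c2 * s * t * t + c3 * t**3

def encode_cubic(c, p010, p100, p101, p110):
    """Use the four free pixels as given, then solve for P001 and P011."""
    p001 = 3 * c[1] - p010 - p100
    p011 = 3 * c[2] - p101 - p110
    return [[[c[0], p001], [p010, p011]],  # z = 0 slice: rows y = 0, 1
            [[p100, p101], [p110, c[3]]]]  # z = 1 slice

c = (1.0, -2.0, 5.0, 3.0)
original = encode_cubic(c, c[1], c[1], c[2], c[2])  # the original technique
extended = encode_cubic(c, 7.0, -1.0, 0.5, 2.0)     # arbitrary free values
for p in (original, extended):
    for t in (0.0, 0.2, 0.5, 0.8, 1.0):
        assert isclose(trilinear_diagonal(p, t), cubic_bezier(*c, t), abs_tol=1e-9)
```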

Let’s add another piecewise cubic Bezier curve, and another row of pixels to the texture to see what happens.

$P_{000} = C_{0}\\ P_{001} + P_{010} + P_{100} = 3C_{1}\\ P_{011} + P_{101} + P_{110} = 3C_{2}\\ P_{111} = C_{3}\\ P_{010} = C_{4}\\ P_{011} + P_{020} + P_{110} = 3C_{5}\\ P_{021} + P_{111} + P_{120} = 3C_{6}\\ P_{121} = C_{7}\\$

Here are the equations in matrix form:

$\left[\begin{array}{rrrrrrrrrrrr|rrrrrrrr} P_{000} & P_{001} & P_{010} & P_{011} & P_{020} & P_{021} & P_{100} & P_{101} & P_{110} & P_{111} & P_{120} & P_{121} & C_{0} & C_{1} & C_{2} & C_{3} & C_{4} & C_{5} & C_{6} & C_{7} \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ \end{array}\right]$

Here it is in rref:

$\left[\begin{array}{rrrrrrrrrrrr|rrrrrrrr} P_{000} & P_{001} & P_{010} & P_{011} & P_{020} & P_{021} & P_{100} & P_{101} & P_{110} & P_{111} & P_{120} & P_{121} & C_{0} & C_{1} & C_{2} & C_{3} & C_{4} & C_{5} & C_{6} & C_{7} \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 & -1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & -3 & 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ \end{array}\right]$

Putting that back into equations we have this:

$P_{000} = C_{0}\\ P_{001} + P_{100} = 3C_{1} + -C_{4}\\ P_{010} = C_{4}\\ P_{011} + P_{101} + P_{110} = 3C_{2}\\ P_{020} + -P_{101} = -3C_{2} + 3C_{5}\\ P_{021} + P_{120} = -C_{3} + 3C_{6}\\ P_{111} = C_{3}\\ P_{121} = C_{7}\\$

The result is that we still have four free variables: P100, P101, P110 and P120. When we give values to those pixels, we will then be able to calculate the values for P001, P011, P020 and P021.

There is a limit to this pattern though. Where the 2d / quadratic case could follow the pattern for at most 2 curves, the 3d / cubic case can follow it for at most 3. As soon as you try to put 4 curves in this pattern, it fails by having constraints. Interestingly, there are still 4 free variables when putting 3 curves in there, so unlike the 2d case, where the free variables disappeared as we put more curves in, there is no warning sign that the failure is about to happen.
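Counting free variables and constraints again (the same kind of hypothetical helpers as the 2d sketches, redefined so the block stands alone) reproduces this behavior for the stacked cubic pattern. The pixel columns use the Pzyx order from the matrices above. For 1, 2 and 3 curves it should report 4 free variables and no constraints, and for 4 curves 5 free variables and 1 constraint.

```python
from fractions import Fraction

# Sketch: helper names are illustrative, not from the original source code.

def rank(rows):
    """Rank of a small integer matrix via exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            if m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def stacked_cubic_rows(n):
    """Pixel-side rows for n cubic curves stacked one pixel row apart in a
    2 x (n+1) x 2 texture. Pixel (z, y, x) maps to column z*2*(n+1) + 2*y + x."""
    cols = 4 * (n + 1)

    def row(*pixels):
        out = [0] * cols
        for z, y, x in pixels:
            out[z * 2 * (n + 1) + 2 * y + x] = 1
        return out

    rows = []
    for k in range(n):
        rows.append(row((0, k, 0)))                                # start point
        rows.append(row((0, k, 1), (0, k + 1, 0), (1, k, 0)))      # 3 * second control point
        rows.append(row((0, k + 1, 1), (1, k, 1), (1, k + 1, 0)))  # 3 * third control point
        rows.append(row((1, k + 1, 1)))                            # end point
    return rows

def analyze(n):
    rows = stacked_cubic_rows(n)
    r = rank(rows)
    return len(rows[0]) - r, len(rows) - r  # (free variables, constraints)

for n in range(1, 5):
    print(n, "curves -> (free variables, constraints) =", analyze(n))
```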

If you know how to more formally analyze when these patterns of equations will fail, please let me know!

# 7. 3D Texture / Cubic Piecewise Curves – Multiple Curves?

Looking at the 3d texture case of 2x2x2 storing a single curve, I saw that there were 4 free variables. Since it takes 4 control points to define a cubic curve, I wondered if we could use those 4 free variables to encode another cubic curve.

Here’s a setup where the x axis is flipped for the second curve. It’s a little bit hard to tell from the diagram, but the blue line does still go through the center of the 3d cube. It goes from P001 to P110, while the first curve still goes from P000 to P111.

Here’s what the equations look like:

$P_{000} = C_{0}\\ P_{001} + P_{010} + P_{100} = 3*C_{1}\\ P_{011} + P_{101} + P_{110} = 3*C_{2}\\ P_{111} = C_{3}\\ P_{001} = C_{4}\\ P_{000} + P_{011} + P_{101} = 3*C_{5}\\ P_{010} + P_{100} + P_{111} = 3*C_{6}\\ P_{110} = C_{7}\\$

And in matrix form:

$\left[\begin{array}{rrrrrrrr|rrrrrrrr} P_{000} & P_{001} & P_{010} & P_{011} & P_{100} & P_{101} & P_{110} & P_{111} & C_{0} & C_{1} & C_{2} & C_{3} & C_{4} & C_{5} & C_{6} & C_{7} \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 3 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 1 & 1 & 0 & 0 & 0 & 3 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ \end{array}\right]$

After putting the matrix in rref to solve the equations, we get this matrix:

$\left[\begin{array}{rrrrrrrr|rrrrrrrr} P_{000} & P_{001} & P_{010} & P_{011} & P_{100} & P_{101} & P_{110} & P_{111} & C_{0} & C_{1} & C_{2} & C_{3} & C_{4} & C_{5} & C_{6} & C_{7} \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -3 & 0 & 0 & 3 & 0 & 1 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 3 & 0 & 0 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 3 & 0 & 0 & -3 & 0 & -1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & \frac{1}{3} & \frac{-1}{3} & 0 & -1 & 0 \\ \end{array}\right]$

Which is this set of equations:

$P_{000} = -3C_{2} + 3C_{5} + C_{7}\\ P_{001} = C_{4}\\ P_{010} + P_{100} = -C_{3} + 3C_{6}\\ P_{011} + P_{101} = 3C_{2} + -C_{7}\\ P_{110} = C_{7}\\ P_{111} = C_{3}\\ 0 = C_{0} + 3C_{2} - 3C_{5} - C_{7}\\ 0 = C_{1} + C_{3}/3 - C_{4}/3 + -C_{6}\\$

In the end there are 2 free variables, but also 2 constraints on the values that the control points can take. The constraints mean it doesn’t work which is unfortunate. That would have been a nice way to bring the 3d / cubic case to using 1 pixel per control point!

I also tried other variations like flipping y or z along with x (flipping all three just makes the first curve in the reverse direction!) but couldn’t find anything that worked. Too bad!
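The counting sketch can also try every starting corner for the second curve in one 2×2×2 cube. Pixels are indexed by their zyx bits (so P001 is index 1), and row k of a curve sums the pixels that are k axis-steps from its starting corner. Starting at 0b001 is the x-flipped setup above; 0 and 7 are the first curve itself and its reverse. The helper names are hypothetical, and under these assumptions every variation leaves at least one constraint on the control points.

```python
from fractions import Fraction

# Sketch: helper names are illustrative, not from the original source code.

def rank(rows):
    """Rank of a small integer matrix via exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            if m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def diagonal_rows(start):
    """Four rows for a cubic along the cube diagonal from corner `start` (bits zyx):
    row k has a 1 for every pixel that is k axis-steps away from the start."""
    return [[1 if bin(p ^ start).count("1") == k else 0 for p in range(8)]
            for k in range(4)]

def analyze(flip):
    """(free variables, constraints) for curve 1 from P000 plus curve 2 from corner `flip`."""
    rows = diagonal_rows(0) + diagonal_rows(flip)
    r = rank(rows)
    return 8 - r, len(rows) - r

for flip in range(8):
    print(f"second curve starts at P{flip:03b}:", analyze(flip))
```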

# 8. 3D Texture / Cubic Piecewise Curves – C0 Continuity

Since the regular 3d texture / cubic curve pattern has a limit (3 curves), let’s look at the C0 continuity version like we did for the 2d texture / quadratic case where we sample zig zag style.

Since the sampling has to pass through the center of the cube, we need to flip both x and z for each curve.

That gives us a setup like this:

Here are the constraints for the pixel values:

$P_{000} = C_{0}\\ P_{001} + P_{010} + P_{100} = 3C_{1}\\ P_{011} + P_{101} + P_{110} = 3C_{2}\\ P_{111} = C_{3}\\ P_{011} + P_{110} + P_{121} = 3C_{4}\\ P_{010} + P_{021} + P_{120} = 3C_{5}\\ P_{020} = C_{6}\\$

Which looks like this in matrix form:

$\left[\begin{array}{rrrrrrrrrrrr|rrrrrrr} P_{000} & P_{001} & P_{010} & P_{011} & P_{020} & P_{021} & P_{100} & P_{101} & P_{110} & P_{111} & P_{120} & P_{121} & C_{0} & C_{1} & C_{2} & C_{3} & C_{4} & C_{5} & C_{6} \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ \end{array}\right]$

Here is the matrix solved in rref:

$\left[\begin{array}{rrrrrrrrrrrr|rrrrrrr} P_{000} & P_{001} & P_{010} & P_{011} & P_{020} & P_{021} & P_{100} & P_{101} & P_{110} & P_{111} & P_{120} & P_{121} & C_{0} & C_{1} & C_{2} & C_{3} & C_{4} & C_{5} & C_{6} \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & -1 & 1 & 0 & 0 & 0 & -1 & 0 & 0 & 3 & 0 & 0 & 0 & -3 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & -1 & 0 & 0 & 3 & 0 & -3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ \end{array}\right]$

And here is that matrix put back into equations form:

$P_{000} = C_{0}\\ P_{001} - P_{021} + P_{100} - P_{120} = 3C_{1} - 3C_{5}\\ P_{010} + P_{021} + P_{120} = 3C_{5}\\ P_{011} + P_{110} + P_{121} = 3C_{4}\\ P_{020} = C_{6}\\ P_{101} - P_{121} = 3C_{2} - 3C_{4}\\ P_{111} = C_{3}\\$

It worked! It also has 5 free variables.

This pattern works for as many curves as I tried (21 of them), and each time you add another curve / row of this pattern you gain another free variable.

So, storing 3 curves results in 6 free variables, 4 curves has 7 free variables, 5 curves has 8 free variables and so on.
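A rank check for the cubic zigzag, written in the shorter form with shared endpoints (3N+1 control points for N curves). As with the earlier sketches, the helper names are hypothetical and this is a numerical check rather than a proof. It should report N+3 free variables and no constraints for each curve count it tries, e.g. 5 free variables for the 2 curve setup shown above.

```python
from fractions import Fraction

# Sketch: helper names are illustrative, not from the original source code.

def rank(rows):
    """Rank of a small integer matrix via exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            if m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def zigzag_cubic_rows(n):
    """Pixel-side rows for n C0-continuous cubics zigzagging through a 2 x (n+1) x 2
    texture. Curve k starts at (z, y, x) = (k%2, k, k%2) and ends at the opposite
    corner of rows k..k+1. Shared endpoints are written once: 3n + 1 rows."""
    cols = 4 * (n + 1)

    def index(z, y, x):
        return z * 2 * (n + 1) + 2 * y + x  # Pzyx column order

    rows = []
    for k in range(n):
        corner = k % 2  # z and x of the start corner flip every curve
        groups = [[] for _ in range(4)]
        for z in range(2):
            for y in (k, k + 1):
                for x in range(2):
                    steps = (z != corner) + (y - k) + (x != corner)
                    groups[steps].append(index(z, y, x))
        # curve k > 0 starts where curve k - 1 ended, so skip its start row
        for d in range(0 if k == 0 else 1, 4):
            row = [0] * cols
            for i in groups[d]:
                row[i] = 1
            rows.append(row)
    return rows

def analyze(n):
    rows = zigzag_cubic_rows(n)
    r = rank(rows)
    return len(rows[0]) - r, len(rows) - r  # (free variables, constraints)

for n in range(1, 6):
    print(n, "curves -> (free variables, constraints) =", analyze(n))
```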

# 9. 3D Texture / Cubic Piecewise Curves – Storage Efficiency

Let’s compare storage efficiency of these 3d texture / cubic curve techniques like we did for the 2d texture / quadratic curve techniques.

$\begin{array}{|cccccc|} \hline & & \rlap{\text{3d / Cubic - Extension}} & & & \\ \hline \text{Curves} & \text{Dimensions} & \text{Pixels} & \text{Control Points} & \text{Pixels Per Control Point} & \text{Pixels Per Curve} \\ \hline 1 & 2x2x2 & 8 & 4 & 2.00 & 8.00 \\ 2 & 2x3x2 & 12 & 8 & 1.50 & 6.00 \\ 3 & 2x4x2 & 16 & 12 & 1.33 & 5.33 \\ 4 & 2x6x2 & 24 & 16 & 1.50 & 6.00 \\ 5 & 2x7x2 & 28 & 20 & 1.40 & 5.60 \\ 6 & 2x8x2 & 32 & 24 & 1.33 & 5.33 \\ \hline \end{array}$

$\begin{array}{|cccccc|} \hline & & \rlap{\text{3d / Cubic - Extension + C0 Continuity}} & & & \\ \hline \text{Curves} & \text{Dimensions} & \text{Pixels} & \text{Control Points} & \text{Pixels Per Control Point} & \text{Pixels Per Curve} \\ \hline 1 & 2x2x2 & 8 & 4 & 2.00 & 8.00 \\ 2 & 2x3x2 & 12 & 7 & 1.71 & 6.00 \\ 3 & 2x4x2 & 16 & 10 & 1.60 & 5.33 \\ 4 & 2x5x2 & 20 & 13 & 1.54 & 5.00 \\ 5 & 2x6x2 & 24 & 16 & 1.50 & 4.80 \\ 6 & 2x7x2 & 28 & 19 & 1.47 & 4.67 \\ \hline \end{array}$

$\begin{array}{|cccccc|} \hline & & \rlap{\text{3d / Cubic - Original Technique}} & & & \\ \hline \text{Curves} & \text{Dimensions} & \text{Pixels} & \text{Control Points} & \text{Pixels Per Control Point} & \text{Pixels Per Curve} \\ \hline 1 & 2x2x2 & 8 & 4 & 2.00 & 8.00 \\ 2 & 2x4x2 & 16 & 8 & 2.00 & 8.00 \\ 3 & 2x6x2 & 24 & 12 & 2.00 & 8.00 \\ 4 & 2x8x2 & 32 & 16 & 2.00 & 8.00 \\ 5 & 2x10x2 & 40 & 20 & 2.00 & 8.00 \\ 6 & 2x12x2 & 48 & 24 & 2.00 & 8.00 \\ \hline \end{array}$

The original technique had a constant 2 pixels per control point and 8 pixels per cubic curve.

The basic extension lets you bring that down to 1.33 pixels per control point, and 5.33 pixels per curve.

If C0 continuity is desired, as you store more and more curves the extension can bring things down towards 1.33 pixels per control point, and 4 pixels per curve. (Remember that with the C0 extension you have 4 control points for the first curve and then 3 more for each subsequent curve, so that 1.33 pixels per control point isn’t exactly an apples to apples comparison vs the basic extension).

The pattern continues for 4D textures and higher (and for higher than cubic curves too!), but the 2d and 3d cases for quadratic / cubic curves are the most likely use cases. That is both because 4d textures and higher are kind of excessive (you’d probably need to do multiple texture reads to simulate them), and because when fitting curves to data, quadratic and cubic curves tend to do well, in that they don’t usually overfit the data or have as many problems with ringing.

Despite that, I do think it’s useful to look at it from an N dimensional point of view to see the larger picture, so let’s do that next.

# 10. Generalizing The Unit Hyper Cube

Let’s ignore the zig zag sampling pattern and storing multiple curves in a texture and just get back to the basic idea.

Given an N dimensional texture that is 2x2x…x2, which you are going to sample across the diagonal to get a degree N Bezier curve, how do you know what values to put in which pixels to use this technique?

You could derive it from N-linear interpolation, but that is a lot of work.

The good news is that there turns out to be a simple pattern, and it’s also pretty interesting.

Let’s check out the 1d, 2d and 3d cases to see what patterns we might be able to see.

1d Texture / linear Bezier / linear interpolation:

$P_{0} = C_0 \\ P_{1} = C_1 \\$

$\left[\begin{array}{rr|rr} P_{0} & P_{1} & C_0 & C_1\\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ \end{array}\right]$

2d Texture / Quadratic Bezier:

$P_{00} = C_0 \\ P_{01}+P_{10} = 2*C_1 \\ P_{11} = C_2 \\$

$\left[\begin{array}{rrrr|rrr} P_{00} & P_{01} & P_{10} & P_{11} & C_0 & C_1 & C_2 \\ 1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 2 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 1 \end{array}\right]$

3d Texture / Cubic Bezier:

$P_{000} = C_0\\ P_{001}+P_{010}+P_{100} = 3*C_1\\ P_{011}+P_{101}+P_{110} = 3*C_2\\ P_{111} = C_3$

$\left[\begin{array}{rrrrrrrr|rrrr} P_{000} & P_{001} & P_{010} & P_{011} & P_{100} & P_{101} & P_{110} & P_{111} & C_0 & C_1 & C_2 & C_3 \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 1 & 1 & 0 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\ \end{array}\right]$

The first pattern you might see is that, for an N dimensional hypercube, the right side of the equations looks like the identity matrix, except that instead of 1s along the diagonal it has values from Pascal’s Triangle (the binomial coefficients).

To simplify this a bit, notice that the number on the right side of each equation equals the sum of the coefficients on its left side. If we divide each equation by that number, the right side becomes the identity matrix and the coefficients on the left side sum to 1. That lets us forget about Pascal’s triangle numbers (they still show up implicitly as the divisors of the left side coefficients, but there’s no need to calculate them explicitly).

But then we are still left with the matrix on the left. How do we know which pixels belong in which rows?

It turns out there is another interesting pattern here, which all of the matrices above follow:

• Row 0 has a “1” wherever the pixel coordinate has 0 ones set
• Row 1 has a “1” wherever the pixel coordinate has 1 ones set
• Row 2 has a “1” wherever the pixel coordinate has 2 ones set
• Row 3 has a “1” wherever the pixel coordinate has 3 ones set
• ….

That pattern continues indefinitely, but don’t forget that the numbers (coefficients) on the left side of the equation must add up to one.
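Written as a single formula for an N dimensional 2x2x…x2 texture, the pattern says that $C_k$ is the average of every pixel whose coordinate $p$ has exactly $k$ one bits (written $|p|$ below). There are $\binom{N}{k}$ such pixels, which is where the Pascal’s triangle numbers come from, and it is why each row’s coefficients sum to 1:

$C_k = \frac{1}{\binom{N}{k}} \sum_{p \in \{0,1\}^N,\ |p| = k} P_p, \qquad k = 0, 1, \ldots, N$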

Here is the matrix form of 1d / linear, 2d / quadratic, and 3d / cubic again with the right matrix being the identity matrix, and the equations below them. Notice the pattern about counts of one bits in each row!

1D:

$\left[\begin{array}{rr|rr} P_{0} & P_{1} & C_0 & C_1\\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ \end{array}\right]$

$P_0 = C_0 \\ P_1 = C_1 \\$

2D:

$\left[\begin{array}{rrrr|rrr} P_{00} & P_{01} & P_{10} & P_{11} & C_0 & C_1 & C_2 \\ 1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & \frac{1}{2} & \frac{1}{2} & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 1 \end{array}\right]$

$P_{00} = C_0 \\ P_{01}/2 + P_{10}/2 = C_1 \\ P_{11} = C_2 \\$

3D:

$\left[\begin{array}{rrrrrrrr|rrrr} P_{000} & P_{001} & P_{010} & P_{011} & P_{100} & P_{101} & P_{110} & P_{111} & C_0 & C_1 & C_2 & C_3 \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & \frac{1}{3} & \frac{1}{3} & 0 & \frac{1}{3} & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & \frac{1}{3} & 0 & \frac{1}{3} & \frac{1}{3} & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\ \end{array}\right]$

$P_{000} = C_0 \\ P_{001}/3 + P_{010}/3 + P_{100}/3 = C_1 \\ P_{011}/3 + P_{101}/3 + P_{110}/3 = C_2 \\ P_{111} = C_3 \\$
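The one-bit-count pattern is easy to turn into code. The sketch below builds the normalized equations for any N by counting the one bits in each pixel index, where bit d of the index is the pixel’s coordinate on axis d, printed most significant axis first like P011 above. It is separate from the test program at the end of the post, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

// Number of one bits in a pixel index (bit d is the pixel's coordinate on axis d).
int CountOneBits (uint32_t value)
{
    int count = 0;
    for (; value != 0; value >>= 1)
        count += int(value & 1u);
    return count;
}

// Binomial coefficient "n choose k" via the exact multiplicative formula.
uint64_t Binomial (int n, int k)
{
    uint64_t result = 1;
    for (int i = 1; i <= k; ++i)
        result = result * uint64_t(n - k + i) / uint64_t(i);
    return result;
}

// Normalized equations for a 2x2x...x2 texture with numDims axes.
// rows[k][pixel] is the denominator d of that pixel's coefficient 1/d in the
// equation for control point C_k, or 0 if the pixel is not in that equation.
std::vector<std::vector<uint64_t>> BuildCoefficientDenominators (int numDims)
{
    assert(numDims >= 1 && numDims <= 20);
    const uint32_t numPixels = 1u << numDims;
    std::vector<std::vector<uint64_t>> rows(numDims + 1, std::vector<uint64_t>(numPixels, 0));
    for (uint32_t pixel = 0; pixel < numPixels; ++pixel)
    {
        int k = CountOneBits(pixel);
        rows[k][pixel] = Binomial(numDims, k);
    }
    return rows;
}

// Prints the equations in plain text, e.g. "P001/3 + P010/3 + P100/3 = C1".
void PrintNormalizedEquations (int numDims)
{
    std::vector<std::vector<uint64_t>> rows = BuildCoefficientDenominators(numDims);
    for (size_t k = 0; k < rows.size(); ++k)
    {
        bool first = true;
        for (size_t pixel = 0; pixel < rows[k].size(); ++pixel)
        {
            if (rows[k][pixel] == 0)
                continue;
            fputs(first ? "P" : " + P", stdout);
            // most significant axis first, so pixel 3 prints as 011 when numDims is 3
            for (int d = numDims - 1; d >= 0; --d)
                printf("%d", int((pixel >> d) & 1u));
            if (rows[k][pixel] != 1)
                printf("/%llu", (unsigned long long)rows[k][pixel]);
            first = false;
        }
        printf(" = C%zu\n", k);
    }
}
```

For N = 3, `PrintNormalizedEquations(3)` prints the same four equations as the 3D case above, just without the MathJax formatting.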

Here are the formulas for linear, quadratic and cubic Bezier curves to show a different way of looking at this. Below each is the same formula but with the 1d, 2d and 3d pixels in the formula instead of the control points, using the formulas above which relate pixel values to control point values. Note that I have replaced (1-t) with s for easier reading.

$f(t) = As + Bt \\ \\ f(t) = P_0s + P_1t\\$

$f(t) = As^2 + 2Bst + Ct^2 \\ \\ f(t) = P_{00}s^2 + (P_{01}+P_{10})st + P_{11}t^2 \\$

$f(t) = As^3 + 3Bs^2t + 3Cst^2 + Dt^3 \\ f(t) = P_{000}s^3 + (P_{001}+P_{010}+P_{100})s^2t + (P_{011}+P_{101}+P_{110})st^2 + P_{111}t^3$
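Why does averaging by one-bit count work? Sampling an N-linear interpolation at $(t, t, \ldots, t)$ weights each pixel $P_p$ by $s^{N-|p|}t^{|p|}$ (a factor of $t$ for every axis where the coordinate is 1 and $s$ where it is 0), so grouping pixels by their number of one bits $|p|$ gives exactly the Bernstein form above, with $\binom{N}{k} C_k = \sum_{|p|=k} P_p$. The sketch below checks this numerically for random pixel values. It is in the same spirit as the article’s test code but independent of it, and the names `SampleDiagonal`, `EvaluateBezierFromPixels` and `LargestDifference` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// N-linear interpolation of a 2x2x...x2 texture, sampled at (t, t, ..., t).
// Bit d of a pixel index is its coordinate on axis d; each pass lerps the pairs
// that differ only in the lowest remaining bit, collapsing one axis at a time.
float SampleDiagonal (std::vector<float> pixels, int numDims, float t)
{
    assert(pixels.size() == (size_t(1) << numDims));
    for (int d = 0; d < numDims; ++d)
    {
        std::vector<float> next(pixels.size() / 2);
        for (size_t i = 0; i < next.size(); ++i)
            next[i] = pixels[2 * i] * (1.0f - t) + pixels[2 * i + 1] * t;
        pixels.swap(next);
    }
    return pixels[0];
}

// Degree numDims Bezier curve in Bernstein form, where control point C_k is the
// average of the pixels whose index has k one bits. counts[k] ends up equal to
// "numDims choose k", the binomial coefficient of the Bernstein basis.
float EvaluateBezierFromPixels (const std::vector<float>& pixels, int numDims, float t)
{
    std::vector<float> sums(numDims + 1, 0.0f);
    std::vector<int> counts(numDims + 1, 0);
    for (size_t p = 0; p < pixels.size(); ++p)
    {
        int k = 0;
        for (size_t bits = p; bits != 0; bits >>= 1)
            k += int(bits & 1u);
        sums[k] += pixels[p];
        counts[k] += 1;
    }

    float s = 1.0f - t;
    float result = 0.0f;
    for (int k = 0; k <= numDims; ++k)
    {
        float controlPoint = sums[k] / float(counts[k]);
        result += float(counts[k]) * controlPoint * std::pow(s, float(numDims - k)) * std::pow(t, float(k));
    }
    return result;
}

// Largest absolute difference between the two evaluations, over numSamples
// evenly spaced t values, for random pixel values in [-10, 10].
float LargestDifference (int numDims, int numSamples, unsigned seed)
{
    assert(numSamples >= 2);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> dist(-10.0f, 10.0f);
    std::vector<float> pixels(size_t(1) << numDims);
    for (float& p : pixels)
        p = dist(rng);

    float largest = 0.0f;
    for (int i = 0; i < numSamples; ++i)
    {
        float t = float(i) / float(numSamples - 1);
        float difference = std::fabs(SampleDiagonal(pixels, numDims, t) - EvaluateBezierFromPixels(pixels, numDims, t));
        largest = std::fmax(largest, difference);
    }
    return largest;
}
```

`LargestDifference(3, 1000, 1234u)` should come out tiny, reflecting only float rounding error, and the same holds for higher dimensions.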

I think it’s really interesting how, in the last equation for example, “3B” literally becomes 3 pixel values which could each be B. In the plain vanilla technique they all were B. In this extension, the only requirement is that they average to B.

It’s also interesting to notice that if you take all the N bit numbers and count how many of them have each possible number of bits turned on, the resulting counts are the binomial coefficients, a row of Pascal’s triangle. That is nothing new, but it suggests a fun way to turn a set of uniformly random numbers (white noise) into an approximately Gaussian distribution, just by counting how many one bits each number has. That isn’t new either, and there are better algorithms, but I still think it’s an interesting idea, and it may be useful in a pinch since it seems computationally inexpensive.
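A quick sketch of that bit counting idea, assuming `std::mt19937` as the source of uniform 32 bit integers. The number of one bits in a uniformly random 32 bit value follows a binomial distribution with n = 32 and p = 1/2 (mean 16, variance 8). That only approximates a Gaussian and can only produce the integers 0 through 32, so you would still need to shift and scale it. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Counts the one bits in a 32 bit value.
int CountOneBits32 (uint32_t value)
{
    int count = 0;
    for (; value != 0; value &= value - 1)   // clears the lowest set bit each step
        ++count;
    return count;
}

// Draws numSamples uniformly random 32 bit integers and histograms how many one
// bits each has. The counts follow a binomial(32, 1/2) distribution.
std::vector<int> PopcountHistogram (int numSamples, uint32_t seed)
{
    std::mt19937 rng(seed);   // mt19937 outputs uniformly distributed 32 bit values
    std::vector<int> histogram(33, 0);
    for (int i = 0; i < numSamples; ++i)
        histogram[CountOneBits32(uint32_t(rng()))] += 1;
    return histogram;
}

// Mean and variance of a histogram indexed by bit count.
void HistogramStats (const std::vector<int>& histogram, double& mean, double& variance)
{
    double total = 0.0, sum = 0.0, sumSquares = 0.0;
    for (size_t k = 0; k < histogram.size(); ++k)
    {
        total += histogram[k];
        sum += double(k) * histogram[k];
        sumSquares += double(k) * double(k) * histogram[k];
    }
    assert(total > 0.0);
    mean = sum / total;
    variance = sumSquares / total - mean * mean;
}
```

With a fixed seed and 100,000 samples, the measured mean and variance should land close to 16 and 8.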

# 11. Closing

This extension makes storage a bit more efficient than the plain vanilla technique, especially if you are interested in C0 continuous curves.

The extension does come at a price though: you may find yourself needing to store a value that is outside the range a common data format can hold (such as a negative number, or a number larger than 255, in a uint8).
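To make the range problem concrete, take the two-curve equations for the basic 2D quadratic extension that are listed in the code comments later in this post: P10 = C3 and P01 + P10 = 2 * C1, so P01 is forced to be 2 * C1 - C3. The hypothetical helper below computes that forced pixel and checks whether it fits in a uint8 texel. With C1 = 10 and C3 = 200 it gives -180, and with C1 = 250 and C3 = 0 it gives 500, so neither pixel can be stored as a uint8 even though every control point can.

```cpp
#include <cassert>

// Basic 2D quadratic extension with two curves in a 2x3 texture:
// P10 = C3 and P01 + P10 = 2 * C1, so P01 = 2 * C1 - C3.
// Writes the forced P01 value to outP01 and returns whether it fits in a uint8.
bool RequiredPixelFitsInUint8 (int c1, int c3, int& outP01)
{
    // the control points themselves are assumed to be valid uint8 values
    assert(c1 >= 0 && c1 <= 255 && c3 >= 0 && c3 <= 255);
    outP01 = 2 * c1 - c3;
    return outP01 >= 0 && outP01 <= 255;
}
```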

Even so, if these three criteria are met:

1. You are already storing data in textures. (Counterpoint: compute is usually preferred over texture lookups these days.)
2. You are relying on the texture interpolator to interpolate values between data points. (Counterpoint: if you don’t want the interpolation, use a buffer instead so that more of the data you actually care about fits in the cache.)
3. You are storing data as 16 or 32 bit real numbers. (Counterpoint: uint8 is already half the size of 16 bit and a quarter the size of 32 bit data.)

Then this may be an attractive solution for you, even over the plain vanilla technique.

For future work, I think it would be interesting to see how this line of thinking applies to surfaces.

I also think there is probably some fertile ground looking into what happens when sampling off of the diagonal of the textures. Intuitively it seems you might be able to store some special case higher order curves in lower dimension / storage textures, but I haven’t looked into it yet.

A common use case when encoding data in a texture would probably be putting curves side by side on the x axis of the texture. It could be interesting to look into whether curves need to be completely separate from each other horizontally (aka 2 pixels of width for each track of curves in the texture), or if you could perhaps fit two curves side by side in a 3 pixel width, or similar ideas.

Lastly, when looking at these groups of points on these N dimensional hyper-cubes, I can’t help but wonder what kinds of shapes they are. Are they simplices? If so, is there a pattern to the dimensions they are of?

It’s a bit hard to visualize, but here’s what I found looking at the first few rows of Pascal’s triangle and the corresponding hypercubes:

• Dimension 1 (line) : Row 2 = 1,1. Those are both points, so they are simplices of dimension 0.
• Dimension 2 (square) : Row 3 = 1,2,1. The 1’s are points and the 2 is a line, so they are simplices of dimension 0, 1, 0.
• Dimension 3 (cube) : Row 4 = 1,3,3,1. The 1’s are still points. The 3’s are in fact triangles, I checked. So, simplices of dimension 0, 2, 2, 0.
• Dimension 4 (hypercube) : Row 5 = 1,4,6,4,1. The 1’s are points. The 4’s are tetrahedrons. The 6 is a 3 dimensional object. I’m not sure of its shape, but it can’t be a simplex, since a 3 dimensional simplex only has 4 points. Possibly it’s two simplices fused together somehow. I don’t really know. So, the dimensions anyway are: 0, 3, 3, 3, 0.
• Beyond? That’s as far as I looked. If you look further / deeper and find anything interesting please share!
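One way to poke at this question is to compare pairwise distances. For 0/1 coordinates, the squared distance between two vertices is just the number of axes where they differ, and a group of points that are all the same distance from each other forms a regular simplex. The sketch below (hypothetical names, not from the test program) groups the vertices of an N dimensional unit hypercube by their number of one bits and collects the distinct squared distances within each group. For the middle group of the 4D hypercube it finds two distances, 2 and 4: each of the six points is √2 away from four of the others and 2 away from the one opposite it, which is the vertex layout of a regular octahedron.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Number of one bits in a value.
int CountBits (uint32_t value)
{
    int count = 0;
    for (; value != 0; value >>= 1)
        count += int(value & 1u);
    return count;
}

// Vertices of the unit hypercube in numDims dimensions that have exactly k
// coordinates equal to one, encoded as bit masks (bit d = coordinate on axis d).
std::vector<uint32_t> VerticesWithOnes (int numDims, int k)
{
    assert(numDims >= 1 && numDims <= 16);
    std::vector<uint32_t> result;
    for (uint32_t v = 0; v < (1u << numDims); ++v)
    {
        if (CountBits(v) == k)
            result.push_back(v);
    }
    return result;
}

// Distinct squared distances between all pairs of vertices in a group. For 0/1
// coordinates the squared distance is the number of axes where the vertices differ.
std::set<int> PairwiseSquaredDistances (const std::vector<uint32_t>& vertices)
{
    std::set<int> distances;
    for (size_t i = 0; i < vertices.size(); ++i)
        for (size_t j = i + 1; j < vertices.size(); ++j)
            distances.insert(CountBits(vertices[i] ^ vertices[j]));
    return distances;
}
```

For example, the k = 1 groups of the cube and the 4D hypercube both give the single squared distance 2, matching the triangles and tetrahedrons listed above.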

Update: PBS Infinite Series ended up posting a video on the topic of hypercube slices and pascal’s triangle (seriously!). Give it a watch if you are interested in how these things relate: Dissecting Hypercubes with Pascal’s Triangle | Infinite Series

If you have a use case for this or any of the related techniques, I’d love to hear about it.

# 12. Code

It’s easy to talk about things and claim that everything is correct, when in fact, the moment you try it, everything falls apart.

I wrote some simple, standalone C++ code that goes through the cases we talked about, does the same math we did, and verifies that the texture interpolation is equivalent to actually calculating Bezier curves (using Bernstein polynomials).

You can also use this code as a starting point to explore higher curve counts or other storage patterns. It uses only standard headers and no libraries, so it should be really easy to drop into a compiler and start experimenting.

Here’s some example output, which shows 6 cubic curves stored in a 3d texture using the zig zag sampling pattern.

Here’s the code:

#define _CRT_SECURE_NO_WARNINGS

#include <stdio.h>
#include <stdlib.h>
#include <array>
#include <algorithm>
#include <unordered_set>
#include <random>
#include <vector>
#include <cstdint>
#include <cmath>

#define SHOW_MATHJAX_MATRIX() 0
#define SHOW_MATHJAX_EQUATIONS() 0
#define SHOW_EQUATIONS_BEFORE_SOLVE() 0
#define EQUALITY_TEST_SAMPLES 1000

typedef int32_t TINT;

TINT CalculateGCD (TINT smaller, TINT larger);
TINT CalculateLCM (TINT smaller, TINT larger);

// A rational number, to handle fractional numbers without typical floating point issues
struct CRationalNumber
{
CRationalNumber (TINT numerator = 0, TINT denominator = 1)
: m_numerator(numerator)
, m_denominator(denominator)
{ }

TINT m_numerator;
TINT m_denominator;

CRationalNumber Reciprocal () const
{
return CRationalNumber(m_denominator, m_numerator);
}

void Reduce ()
{
if (m_numerator != 0 && m_denominator != 0)
{
TINT div = CalculateGCD(m_numerator, m_denominator);
m_numerator /= div;
m_denominator /= div;
}

if (m_denominator < 0)
{
m_numerator *= -1;
m_denominator *= -1;
}

if (m_numerator == 0)
m_denominator = 1;
}

bool IsZero () const
{
return m_numerator == 0 && m_denominator != 0;
}

// NOTE: the functions below assume Reduce() has happened
bool IsOne () const
{
return m_numerator == 1 && m_denominator == 1;
}

bool IsMinusOne () const
{
return m_numerator == -1 && m_denominator == 1;
}

bool IsWholeNumber () const
{
return m_denominator == 1;
}
};

// Define a vector as an array of rational numbers
template<size_t N>
using TVector = std::array<CRationalNumber, N>;

// Define a matrix as an array of vectors
template<size_t M, size_t N>
using TMatrix = std::array<TVector<N>, M>;

//===================================================================================================================================
//                                              GCD / LCM
//===================================================================================================================================

// from my blog post: https://blog.demofox.org/2015/01/24/programmatically-calculating-gcd-and-lcm/

TINT CalculateGCD (TINT smaller, TINT larger)
{
// make sure smaller <= larger before starting
if (larger < smaller)
std::swap(smaller, larger);

// loop
while (1)
{
// if the remainder of larger / smaller is 0, smaller divides larger evenly
// so return smaller as the GCD
TINT remainder = larger % smaller;
if (remainder == 0)
return smaller;

// otherwise, the new larger number is the old smaller number, and
// the new smaller number is the remainder
larger = smaller;
smaller = remainder;
}
}

TINT CalculateLCM (TINT A, TINT B)
{
// LCM(A,B) = (A/GCD(A,B))*B
return (A / CalculateGCD(A, B))*B;
}

//===================================================================================================================================
//                                              RATIONAL NUMBER MATH
//===================================================================================================================================

void CommonDenominators (CRationalNumber& a, CRationalNumber& b)
{
TINT lcm = CalculateLCM(a.m_denominator, b.m_denominator);

a.m_numerator *= lcm / a.m_denominator;
b.m_numerator *= lcm / b.m_denominator;

a.m_denominator = lcm;
b.m_denominator = lcm;
}

bool operator == (const CRationalNumber& a, const CRationalNumber& b)
{
CRationalNumber _a(a), _b(b);
CommonDenominators(_a, _b);
return _a.m_numerator == _b.m_numerator;
}

void operator *= (CRationalNumber& a, const CRationalNumber& b)
{
a.m_numerator *= b.m_numerator;
a.m_denominator *= b.m_denominator;
}

CRationalNumber operator * (const CRationalNumber& a, const CRationalNumber& b)
{
return CRationalNumber(a.m_numerator * b.m_numerator, a.m_denominator * b.m_denominator);
}

void operator -= (CRationalNumber& a, const CRationalNumber& b)
{
CRationalNumber _b(b);
CommonDenominators(a, _b);
a.m_numerator -= _b.m_numerator;
}

//===================================================================================================================================
//                                              GAUSS-JORDAN ELIMINATION CODE
//===================================================================================================================================

// From my blog post: https://blog.demofox.org/2017/04/10/solving-n-equations-and-n-unknowns-the-fine-print-gauss-jordan-elimination/

// Make a specific row have a 1 in the colIndex, and make all other rows have 0 there
template <size_t M, size_t N>
bool MakeRowClaimVariable (TMatrix<M, N>& matrix, size_t rowIndex, size_t colIndex)
{
// Find a row that has a non zero value in this column and swap it with this row
{
// Find a row that has a non zero value
size_t nonZeroRowIndex = rowIndex;
while (nonZeroRowIndex < M && matrix[nonZeroRowIndex][colIndex].IsZero())
++nonZeroRowIndex;

// If there isn't one, nothing to do
if (nonZeroRowIndex == M)
return false;

// Otherwise, swap the row
if (rowIndex != nonZeroRowIndex)
std::swap(matrix[rowIndex], matrix[nonZeroRowIndex]);
}

// Scale this row so that it has a leading one
CRationalNumber scale = matrix[rowIndex][colIndex].Reciprocal();
for (size_t normalizeColIndex = colIndex; normalizeColIndex < N; ++normalizeColIndex)
{
matrix[rowIndex][normalizeColIndex] *= scale;
matrix[rowIndex][normalizeColIndex].Reduce();
}

// Make sure all rows except this one have a zero in this column.
// Do this by subtracting this row from other rows, multiplied by a multiple that makes the column disappear.
for (size_t eliminateRowIndex = 0; eliminateRowIndex < M; ++eliminateRowIndex)
{
if (eliminateRowIndex == rowIndex)
continue;

CRationalNumber scale = matrix[eliminateRowIndex][colIndex];
for (size_t eliminateColIndex = 0; eliminateColIndex < N; ++eliminateColIndex)
{
matrix[eliminateRowIndex][eliminateColIndex] -= matrix[rowIndex][eliminateColIndex] * scale;
matrix[eliminateRowIndex][eliminateColIndex].Reduce();
}
}

return true;
}

// make matrix into reduced row echelon form
template <size_t M, size_t N>
void GaussJordanElimination (TMatrix<M, N>& matrix)
{
size_t rowIndex = 0;
for (size_t colIndex = 0; colIndex < N; ++colIndex)
{
if (MakeRowClaimVariable(matrix, rowIndex, colIndex))
{
++rowIndex;
if (rowIndex == M)
return;
}
}
}

//===================================================================================================================================
//                                                           Shared Testing Code
//===================================================================================================================================

template <size_t M, size_t N, typename LAMBDA>
void PrintEquations (
TMatrix<M, N>& augmentedMatrix,
size_t numPixels,
LAMBDA& pixelIndexToCoordinates
)
{
char pixelCoords[10];

#if SHOW_MATHJAX_MATRIX()
// print the matrix opening stuff
printf("\\left[\\begin{array}{");
for (size_t i = 0; i < N; ++i)
{
if (i == numPixels)
printf("|");
printf("r");
}
printf("}\n");
for (size_t i = 0; i < numPixels; ++i)
{
pixelIndexToCoordinates(i, pixelCoords);
if (i == 0)
printf("P_{%s}", pixelCoords);
else
printf(" & P_{%s}", pixelCoords);
}
for (size_t i = numPixels; i < N; ++i)
{
printf(" & C_{%zu}", i-numPixels);
}
printf(" \\\\\n");

// Print the matrix
for (const TVector<N>& row : augmentedMatrix)
{
bool first = true;
for (const CRationalNumber& n : row)
{
if (first)
first = false;
else
printf(" & ");

if (n.IsWholeNumber())
printf("%i", n.m_numerator);
else
printf("\\frac{%i}{%i}", n.m_numerator, n.m_denominator);
}
printf(" \\\\\n");
}

// print the matrix closing stuff
printf("\\end{array}\\right]\n");
#endif

// print equations
for (const TVector<N>& row : augmentedMatrix)
{
// indent
#if SHOW_MATHJAX_EQUATIONS() == 0
printf("    ");
#endif

// left side of the equation
bool leftHasATerm = false;
for (size_t i = 0; i < numPixels; ++i)
{
if (!row[i].IsZero())
{
if (leftHasATerm)
printf(" + ");
pixelIndexToCoordinates(i, pixelCoords);

#if SHOW_MATHJAX_EQUATIONS()
if (row[i].IsOne())
printf("P_{%s}", pixelCoords);
else if (row[i].IsMinusOne())
printf("-P_{%s}", pixelCoords);
else if (row[i].IsWholeNumber())
printf("%iP_{%s}", row[i].m_numerator, pixelCoords);
else if (row[i].m_numerator == 1)
printf("P_{%s}/%i", pixelCoords, row[i].m_denominator);
else
printf("P_{%s} * %i/%i", pixelCoords, row[i].m_numerator, row[i].m_denominator);
#else
if (row[i].IsOne())
printf("P%s", pixelCoords);
else if (row[i].IsMinusOne())
printf("-P%s", pixelCoords);
else if (row[i].IsWholeNumber())
printf("%iP%s", row[i].m_numerator, pixelCoords);
else if (row[i].m_numerator == 1)
printf("P%s/%i", pixelCoords, row[i].m_denominator);
else
printf("P%s * %i/%i", pixelCoords, row[i].m_numerator, row[i].m_denominator);
#endif
leftHasATerm = true;
}
}
if (!leftHasATerm)
printf("0 = ");
else
printf(" = ");

// right side of the equation
bool rightHasATerm = false;
for (size_t i = numPixels; i < N; ++i)
{
if (!row[i].IsZero())
{
if (rightHasATerm)
printf(" + ");

#if SHOW_MATHJAX_EQUATIONS()
if (row[i].IsOne())
printf("C_{%zu}", i - numPixels);
else if (row[i].IsMinusOne())
printf("-C_{%zu}", i - numPixels);
else if (row[i].IsWholeNumber())
printf("%iC_{%zu}", row[i].m_numerator, i - numPixels);
else if (row[i].m_numerator == 1)
printf("C_{%zu}/%i", i - numPixels, row[i].m_denominator);
else
printf("C_{%zu} * %i/%i", i - numPixels, row[i].m_numerator, row[i].m_denominator);
#else
if (row[i].IsOne())
printf("C%zu", i - numPixels);
else if (row[i].IsMinusOne())
printf("-C%zu", i - numPixels);
else if (row[i].IsWholeNumber())
printf("%iC%zu", row[i].m_numerator, i - numPixels);
else if (row[i].m_numerator == 1)
printf("C%zu/%i", i - numPixels, row[i].m_denominator);
else
printf("C%zu * %i/%i", i - numPixels, row[i].m_numerator, row[i].m_denominator);
#endif
rightHasATerm = true;
}
}

#if SHOW_MATHJAX_EQUATIONS()
printf(" \\\\\n");
#else
printf("\n");
#endif
}
}

template <size_t M, size_t N, typename LAMBDA>
bool SolveMatrixAndPrintEquations (
TMatrix<M, N>& augmentedMatrix,
size_t numPixels,
std::unordered_set<size_t>& freeVariables,
LAMBDA& pixelIndexToCoordinates
)
{
#if SHOW_EQUATIONS_BEFORE_SOLVE()
printf("   Initial Equations:\n");
PrintEquations(augmentedMatrix, numPixels, pixelIndexToCoordinates);
printf("   Solved Equations:\n");
#endif

// put augmented matrix into rref
GaussJordanElimination(augmentedMatrix);

// Print equations
PrintEquations(augmentedMatrix, numPixels, pixelIndexToCoordinates);

// Get free variables and check for control point constraint
bool constraintFound = false;
for (const TVector<N>& row : augmentedMatrix)
{
bool leftHasATerm = false;
for (size_t i = 0; i < numPixels; ++i)
{
if (!row[i].IsZero())
{
if (leftHasATerm)
freeVariables.insert(i);
else
leftHasATerm = true;
}
}

bool rightHasATerm = false;
for (size_t i = numPixels; i < N; ++i)
{
if (!row[i].IsZero())
rightHasATerm = true;
}

if (!leftHasATerm && rightHasATerm)
constraintFound = true;
}

printf("  %zu free variables.\n", freeVariables.size());

if (constraintFound)
{
printf("  Constraint Found.  This configuration doesn't work for the general case!\n\n");
return false;
}

return true;
}

float lerp (float t, float a, float b)
{
return a * (1.0f - t) + b * t;
}

template <size_t NUMPIXELS, size_t NUMCONTROLPOINTS, size_t NUMEQUATIONS>
void FillInPixelsAndControlPoints (
std::array<float, NUMPIXELS>& pixels,
std::array<float, NUMCONTROLPOINTS>& controlPoints,
const TMatrix<NUMEQUATIONS, NUMPIXELS+ NUMCONTROLPOINTS>& augmentedMatrix,
const std::unordered_set<size_t>& freeVariables)
{
// come up with random values for the control points and free variable pixels
static std::random_device rd;
static std::mt19937 mt(rd());
static std::uniform_real_distribution<float> dist(-10.0f, 10.0f);
for (float& cp : controlPoints)
cp = dist(mt);
for (size_t var : freeVariables)
pixels[var] = dist(mt);

// fill in the non free variable pixels per the equations
for (const TVector<NUMPIXELS + NUMCONTROLPOINTS>& row : augmentedMatrix)
{
// the first non zero value is the non free pixel we need to set.
// all other non zero values are free variables that we previously calculated values for
bool foundPixel = false;
size_t pixelIndex = 0;
for (size_t i = 0; i < NUMPIXELS; ++i)
{
if (!row[i].IsZero())
{
// we are setting the first pixel we find
if (!foundPixel)
{
pixelIndex = i;
foundPixel = true;
}
// subtract out all free variables which is the same as moving them to the right side of the equation
else
{
pixels[pixelIndex] -= pixels[i] * float(row[i].m_numerator) / float(row[i].m_denominator);
}
}
}

// if there is no pixel value to set on the left side of the equation, ignore this row
if (!foundPixel)
continue;

// add in the values from the right side of the equation
for (size_t i = NUMPIXELS; i < NUMPIXELS + NUMCONTROLPOINTS; ++i)
{
if (!row[i].IsZero())
pixels[pixelIndex] += controlPoints[i - NUMPIXELS] * float(row[i].m_numerator) / float(row[i].m_denominator);
}
}
}

size_t TextureCoordinateToPixelIndex2d (size_t width, size_t height, size_t y, size_t x)
{
return y * width + x;
};

void PixelIndexToTextureCoordinate2d (size_t width, size_t height, size_t pixelIndex, size_t& y, size_t& x)
{
x = pixelIndex % width;
y = pixelIndex / width;
}

size_t TextureCoordinateToPixelIndex3d (size_t width, size_t height, size_t depth, size_t z, size_t y, size_t x)
{
return
z * width * height +
y * width +
x;
};

void PixelIndexToTextureCoordinate3d (size_t width, size_t height, size_t depth, size_t pixelIndex, size_t& z, size_t& y, size_t& x)
{
x = pixelIndex % width;

pixelIndex = pixelIndex / width;

y = pixelIndex % height;

pixelIndex = pixelIndex / height;

z = pixelIndex;
}

void PiecewiseCurveTime (float time, size_t numCurves, size_t& outCurveIndex, float& outTime)
{
time *= float(numCurves);
outCurveIndex = size_t(time);

if (outCurveIndex == numCurves)
{
outCurveIndex = numCurves - 1;
outTime = 1.0f;
}
else
{
outTime = std::fmod(time, 1.0f);
}
}

//===================================================================================================================================
//                                                       2D Textures / Quadratic Curves
//===================================================================================================================================
//
// Find the limitations of this pattern and show equivalence to Bernstein Polynomials (Bezier Curve Equations). Pattern details below.
//
//  --- For first curve, do:
//
//  P00 P01
//  P10 P11
//
//  P00 = C0                        0
//  P01 + P10 = 2 * C1              1 2
//  P11 = C2                        3
//
//  --- For each additional curve, add two points to the end like this:
//
//  P00 P01
//  P10 P11
//  P20 P21
//
//  P00 = C0                        0
//  P01 + P10 = 2 * C1              1 2
//  P11 = C2                        3
//
//  P10 = C3                        2
//  P11 + P20 = 2 * C4              3 4
//  P21 = C5                        5
//
//  and so on...
//  each equation is then multiplied by a value so the right side is identity and left side coefficients add up to 1.
//
//  --- Other details:
//
//  * 3 control points per curve.
//  * image width is 2
//  * image height is 1 + NumCurves.
//  * there are 3 equations per curve, so 3 rows in the augmented matrix per curve.
//  * augmented matrix columns = num pixels (left columns) + num control points (right columns)
//

template <size_t N>
float EvaluateBernsteinPolynomial2DQuadratic (float totalTime, const std::array<float, N>& coefficients)
{
const size_t c_numCurves = N / 3;

float t;
size_t startCurve;
PiecewiseCurveTime(totalTime, c_numCurves, startCurve, t);

size_t offset = startCurve * 3;

float s = 1.0f - t;
return
coefficients[offset + 0] * s * s +
coefficients[offset + 1] * s * t * 2.0f +
coefficients[offset + 2] * t * t;
}

template <size_t N>
float EvaluateLinearInterpolation2DQuadratic (float totalTime, const std::array<float, N>& pixels)
{
const size_t c_numCurves = (N / 2) - 1;

float t;
size_t startRow;
PiecewiseCurveTime(totalTime, c_numCurves, startRow, t);

float row0 = lerp(t, pixels[startRow * 2], pixels[startRow * 2 + 1]);
float row1 = lerp(t, pixels[(startRow + 1) * 2], pixels[(startRow + 1) * 2 + 1]);
return lerp(t, row0, row1);
}

template <size_t NUMCURVES>
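// Sets up, solves and verifies the equations for NUMCURVES quadratic curves in a 2 x (NUMCURVES + 1) texture
void Test2DQuadratic ()
{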
const size_t c_imageWidth = 2;
const size_t c_imageHeight = NUMCURVES + 1;
const size_t c_numPixels = c_imageWidth * c_imageHeight;
const size_t c_numControlPoints = NUMCURVES * 3;
const size_t c_numEquations = NUMCURVES * 3;

// report values for this test
printf("  %zu curves.  %zu control points.  2x%zu texture = %zu pixels.\n", NUMCURVES, c_numControlPoints, c_imageHeight, c_numPixels);
printf("  %f pixels per curve.  %f pixels per control point.\n", float(c_numPixels) / float(NUMCURVES), float(c_numPixels) / float(c_numControlPoints));

// lambdas to convert between pixel index and texture coordinates
auto TextureCoordinateToPixelIndex = [&](size_t y, size_t x) -> size_t
{
return TextureCoordinateToPixelIndex2d(c_imageWidth, c_imageHeight, y, x);
};
auto pixelIndexToCoordinates = [&](size_t pixelIndex, char pixelCoords[10])
{
size_t y, x;
PixelIndexToTextureCoordinate2d(c_imageWidth, c_imageHeight, pixelIndex, y, x);
sprintf(pixelCoords, "%zu%zu", y, x);
};

// create the equations
TMatrix<c_numEquations, c_numPixels + c_numControlPoints> augmentedMatrix;
for (size_t i = 0; i < c_numEquations; ++i)
{
TVector<c_numPixels + c_numControlPoints>& row = augmentedMatrix[i];

// left side of the equation goes in this yx coordinate pattern:
//   00
//   01 10
//   11
// But, curve index is added to the y index.
// Also, left side coefficients must add up to 1.
size_t curveIndex = i / 3;
switch (i % 3)
{
case 0:
{
row[TextureCoordinateToPixelIndex(curveIndex + 0, 0)] = CRationalNumber(1, 1);
break;
}
case 1:
{
row[TextureCoordinateToPixelIndex(curveIndex + 0, 1)] = CRationalNumber(1, 2);
row[TextureCoordinateToPixelIndex(curveIndex + 1, 0)] = CRationalNumber(1, 2);
break;
}
case 2:
{
row[TextureCoordinateToPixelIndex(curveIndex + 1, 1)] = CRationalNumber(1, 1);
break;
}
}

// right side of the equation is identity
row[c_numPixels + i] = CRationalNumber(1);
}

// solve the matrix if possible and print out the equations
std::unordered_set<size_t> freeVariables;
if (!SolveMatrixAndPrintEquations(augmentedMatrix, c_numPixels, freeVariables, pixelIndexToCoordinates))
return;

// Next we need to show equality between the N-linear interpolation of our pixels and bernstein polynomials with our control points as coefficients

// Fill in random values for our control points and free variable pixels, and fill in the other pixels as the equations dictate
std::array<float, c_numPixels> pixels = { 0 };
std::array<float, c_numControlPoints> controlPoints = { 0 };
FillInPixelsAndControlPoints<c_numPixels, c_numControlPoints, c_numEquations>(pixels, controlPoints, augmentedMatrix, freeVariables);

// do a number of samples of each method at the same time values, and report the largest difference (error)
float largestDifference = 0.0f;
for (size_t i = 0; i < EQUALITY_TEST_SAMPLES; ++i)
{
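float t = float(i) / float(EQUALITY_TEST_SAMPLES - 1);
float value1 = EvaluateBernsteinPolynomial2DQuadratic(t, controlPoints);
float value2 = EvaluateLinearInterpolation2DQuadratic(t, pixels);
largestDifference = std::max(largestDifference, std::abs(value1 - value2));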
}
printf("  %i Samples, Largest Error = %f\n\n", EQUALITY_TEST_SAMPLES, largestDifference);
}

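// Runs the 2D quadratic tests for 1 through 6 curves
void Test2DQuadratics ()
{
printf("Testing 2D Textures / Quadratic Curves\n\n");
Test2DQuadratic<1>();
Test2DQuadratic<2>();
Test2DQuadratic<3>();
Test2DQuadratic<4>();
Test2DQuadratic<5>();
Test2DQuadratic<6>();

system("pause");
}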

//===================================================================================================================================
//                                    2D Textures / Quadratic Curves With C0 Continuity
//===================================================================================================================================
//
// Find the limitations of this pattern and show equivalence to Bernstein Polynomials (Bezier Curve Equations). Pattern details below.
//
//  --- For first curve, do:
//
//  P00 P01
//  P10 P11
//
//  P00 = C0                        0
//  P01 + P10 = 2 * C1              1 2
//  P11 = C2                        3
//
//  --- For second curve, do:
//
//  P00 P01
//  P10 P11
//  P20 P21
//
//  P00 = C0                        0
//  P01 + P10 = 2 * C1              1 2
//  P11 = C2                        3
//
//  P10 + P21 = 2 * C3              2 5
//  P20 = C4                        4
//
//  --- For third curve, do:
//
//  P00 P01
//  P10 P11
//  P20 P21
//  P30 P31
//
//  P00 = C0
//  P01 + P10 = 2 * C1
//  P11 = C2
//
//  P10 + P21 = 2 * C3
//  P20 = C4
//
//  P21 + P30 = 2 * C5
//  P31 = C6
//
//  and so on...
//  each equation is then multiplied by a value so the right side is identity and left side coefficients add up to 1.
//
//  --- Other details:
//
//  * control points: 1 + NumCurves*2.
//  * image width is 2
//  * image height is 1 + NumCurves.
//  * equations: 1 + NumCurves*2.  This many rows in the augmented matrix.
//  * augmented matrix columns = num pixels (left columns) + num control points (right columns)
//

template <size_t N>
float EvaluateBernsteinPolynomial2DQuadraticC0 (float totalTime, const std::array<float, N>& coefficients)
{
const size_t c_numCurves = (N - 1) / 2;

float t;
size_t startCurve;
PiecewiseCurveTime(totalTime, c_numCurves, startCurve, t);

size_t offset = startCurve * 2;

float s = 1.0f - t;
return
coefficients[offset + 0] * s * s +
coefficients[offset + 1] * s * t * 2.0f +
coefficients[offset + 2] * t * t;
}

template <size_t N>
float EvaluateLinearInterpolation2DQuadraticC0 (float totalTime, const std::array<float, N>& pixels)
{
const size_t c_numCurves = (N / 2) - 1;

float t;
size_t startRow;
PiecewiseCurveTime(totalTime, c_numCurves, startRow, t);

// Note we flip x axis direction every odd row to get the zig zag
float horizT = (startRow % 2) == 0 ? t : 1.0f - t;

float row0 = lerp(horizT, pixels[startRow * 2], pixels[startRow * 2 + 1]);
++startRow;
float row1 = lerp(horizT, pixels[startRow * 2], pixels[startRow * 2 + 1]);
return lerp(t, row0, row1);
}

template <size_t NUMCURVES>
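// Sets up, solves and verifies the equations for NUMCURVES C0 continuous quadratic curves in a 2 x (NUMCURVES + 1) texture
void Test2DQuadraticC0 ()
{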
const size_t c_imageWidth = 2;
const size_t c_imageHeight = NUMCURVES + 1;
const size_t c_numPixels = c_imageWidth * c_imageHeight;
const size_t c_numControlPoints = 1 + NUMCURVES * 2;
const size_t c_numEquations = 1 + NUMCURVES * 2;

// report values for this test
printf("  %zu curves.  %zu control points.  2x%zu texture = %zu pixels.\n", NUMCURVES, c_numControlPoints, c_imageHeight, c_numPixels);
printf("  %f pixels per curve.  %f pixels per control point.\n", float(c_numPixels) / float(NUMCURVES), float(c_numPixels) / float(c_numControlPoints));

// lambdas to convert between pixel index and texture coordinates
auto TextureCoordinateToPixelIndex = [&] (size_t y, size_t x) -> size_t
{
return TextureCoordinateToPixelIndex2d(c_imageWidth, c_imageHeight, y, x);
};
auto pixelIndexToCoordinates = [&] (size_t pixelIndex, char pixelCoords[10])
{
size_t y, x;
PixelIndexToTextureCoordinate2d(c_imageWidth, c_imageHeight, pixelIndex, y, x);
sprintf(pixelCoords, "%zu%zu", y, x);
};

// create the equations
TMatrix<c_numEquations, c_numPixels + c_numControlPoints> augmentedMatrix;
for (size_t i = 0; i < c_numEquations; ++i)
{
TVector<c_numPixels + c_numControlPoints>& row = augmentedMatrix[i];

// left side of the equation has a pattern like this:
//   00
//   01 10
//
// But, pattern index is added to the y index.
// Also, the x coordinates flip from 0 to 1 on those after each pattern.
// Also, left side coefficients must add up to 1.

size_t patternIndex = i / 2;
size_t xoff = patternIndex % 2 == 1;
size_t xon = patternIndex % 2 == 0;
switch (i % 2)
{
case 0:
{
row[TextureCoordinateToPixelIndex(patternIndex + 0, xoff)] = CRationalNumber(1, 1);
break;
}
case 1:
{
row[TextureCoordinateToPixelIndex(patternIndex + 0, xon)] = CRationalNumber(1, 2);
row[TextureCoordinateToPixelIndex(patternIndex + 1, xoff)] = CRationalNumber(1, 2);
break;
}
}

// right side of the equation is identity
row[c_numPixels + i] = CRationalNumber(1);
}

// solve the matrix if possible and print out the equations
std::unordered_set<size_t> freeVariables;
if (!SolveMatrixAndPrintEquations(augmentedMatrix, c_numPixels, freeVariables, pixelIndexToCoordinates))
return;

// Next we need to show equality between the N-linear interpolation of our pixels and bernstein polynomials with our control points as coefficients

// Fill in random values for our control points and free variable pixels, and fill in the other pixels as the equations dictate
std::array<float, c_numPixels> pixels = { 0 };
std::array<float, c_numControlPoints> controlPoints = { 0 };
FillInPixelsAndControlPoints<c_numPixels, c_numControlPoints, c_numEquations>(pixels, controlPoints, augmentedMatrix, freeVariables);

// do a number of samples of each method at the same time values, and report the largest difference (error)
float largestDifference = 0.0f;
for (size_t i = 0; i < EQUALITY_TEST_SAMPLES; ++i)
{
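float t = float(i) / float(EQUALITY_TEST_SAMPLES - 1);

float value1 = EvaluateBernsteinPolynomial2DQuadraticC0(t, controlPoints);
float value2 = EvaluateLinearInterpolation2DQuadraticC0(t, pixels);

largestDifference = std::max(largestDifference, std::abs(value1 - value2));
}
printf("  %i Samples, Largest Error = %f\n\n", EQUALITY_TEST_SAMPLES, largestDifference);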
}

void Test2DQuadraticsC0 ()
{
printf("\nTesting 2D Textures / Quadratic Curves with C0 continuity\n\n");

system("pause");
}

//===================================================================================================================================
//                                             3D Textures / Cubic Curves
//===================================================================================================================================
//
// Find the limitations of this pattern and show equivalence to Bernstein Polynomials (Bezier Curve Equations). Pattern details below.
//
//  --- For first curve, do:
//
//  P000 P001    P100 P101
//  P010 P011    P110 P111
//
//  P000 = C0                       0
//  P001 + P010 + P100 = 3 * C1     1 2 4
//  P011 + P101 + P110 = 3 * C2     3 5 6
//  P111 = C3                       7
//
//  --- For second curve, do:
//
//  P000 P001    P100 P101
//  P010 P011    P110 P111
//  P020 P021    P120 P121
//
//  P000 = C0                       0
//  P001 + P010 + P100 = 3 * C1     1 2 6
//  P011 + P101 + P110 = 3 * C2     3 7 8
//  P111 = C3                       9
//
//  P010 = C4                       2
//  P011 + P020 + P110 = 3 * C5     3 4 8
//  P021 + P111 + P120 = 3 * C6     5 9 10
//  P121 = C7                       11
//
//  and so on...
//  each equation is then multiplied by a value so the right side is identity and left side coefficients add up to 1.
//
//  --- Other details:
//
//  * control points: 4 * NumCurves.
//  * image width is 2
//  * image depth is 2
//  * image height is 1 + NumCurves.
//  * equations: 4 * NumCurves.  This many rows in the augmented matrix.
//  * augmented matrix columns = num pixels (left columns) + num control points (right columns)
//

template <size_t N>
float EvaluateBernsteinPolynomial3DCubic (float totalTime, const std::array<float, N>& coefficients)
{
const size_t c_numCurves = N / 4;

float t;
size_t startCurve;
PiecewiseCurveTime(totalTime, c_numCurves, startCurve, t);

size_t offset = startCurve * 4;

float s = 1.0f - t;
return
coefficients[offset + 0] * s * s * s +
coefficients[offset + 1] * s * s * t * 3.0f +
coefficients[offset + 2] * s * t * t * 3.0f +
coefficients[offset + 3] * t * t * t;
}

template <size_t N, typename LAMBDA>
float EvaluateLinearInterpolation3DCubic (float totalTime, const std::array<float, N>& pixels, LAMBDA& TextureCoordinateToPixelIndex)
{
const size_t c_numCurves = (N / 4) - 1;

float t;
size_t startRow;
PiecewiseCurveTime(totalTime, c_numCurves, startRow, t);

//    rowZYX
float row00x = lerp(t, pixels[TextureCoordinateToPixelIndex(0, startRow + 0, 0)], pixels[TextureCoordinateToPixelIndex(0, startRow + 0, 1)]);
float row01x = lerp(t, pixels[TextureCoordinateToPixelIndex(0, startRow + 1, 0)], pixels[TextureCoordinateToPixelIndex(0, startRow + 1, 1)]);
float row0yx = lerp(t, row00x, row01x);

float row10x = lerp(t, pixels[TextureCoordinateToPixelIndex(1, startRow + 0, 0)], pixels[TextureCoordinateToPixelIndex(1, startRow + 0, 1)]);
float row11x = lerp(t, pixels[TextureCoordinateToPixelIndex(1, startRow + 1, 0)], pixels[TextureCoordinateToPixelIndex(1, startRow + 1, 1)]);
float row1yx = lerp(t, row10x, row11x);

return lerp(t, row0yx, row1yx);
}

template <size_t NUMCURVES>
void Test3DCubic ()
{
const size_t c_imageWidth = 2;
const size_t c_imageHeight = NUMCURVES + 1;
const size_t c_imageDepth = 2;
const size_t c_numPixels = c_imageWidth * c_imageHeight * c_imageDepth;
const size_t c_numControlPoints = NUMCURVES * 4;
const size_t c_numEquations = NUMCURVES * 4;

// report values for this test
printf("  %zu curves.  %zu control points.  2x%zux2 texture = %zu pixels.\n", NUMCURVES, c_numControlPoints, c_imageHeight, c_numPixels);
printf("  %f pixels per curve.  %f pixels per control point.\n", float(c_numPixels) / float(NUMCURVES), float(c_numPixels) / float(c_numControlPoints));

// lambdas to convert between pixel index and texture coordinates
auto TextureCoordinateToPixelIndex = [&] (size_t z, size_t y, size_t x) -> size_t
{
return TextureCoordinateToPixelIndex3d(c_imageWidth, c_imageHeight, c_imageDepth, z, y, x);
};
auto pixelIndexToCoordinates = [&] (size_t pixelIndex, char pixelCoords[10])
{
size_t z, y, x;
PixelIndexToTextureCoordinate3d(c_imageWidth, c_imageHeight, c_imageDepth, pixelIndex, z, y, x);
sprintf(pixelCoords, "%zu%zu%zu", z,y,x);
};

// create the equations
TMatrix<c_numEquations, c_numPixels + c_numControlPoints> augmentedMatrix;
for (size_t i = 0; i < c_numEquations; ++i)
{
TVector<c_numPixels + c_numControlPoints>& row = augmentedMatrix[i];

// left side of the equation goes in this zyx coordinate pattern:
//   000
//   001 010 100
//   011 101 110
//   111
// But, curve index is added to the y index.
// Also, left side coefficients must add up to 1.
size_t curveIndex = i / 4;
switch (i % 4)
{
case 0:
{
row[TextureCoordinateToPixelIndex(0, curveIndex + 0, 0)] = CRationalNumber(1, 1);
break;
}
case 1:
{
row[TextureCoordinateToPixelIndex(0, curveIndex + 0, 1)] = CRationalNumber(1, 3);
row[TextureCoordinateToPixelIndex(0, curveIndex + 1, 0)] = CRationalNumber(1, 3);
row[TextureCoordinateToPixelIndex(1, curveIndex + 0, 0)] = CRationalNumber(1, 3);
break;
}
case 2:
{
row[TextureCoordinateToPixelIndex(0, curveIndex + 1, 1)] = CRationalNumber(1, 3);
row[TextureCoordinateToPixelIndex(1, curveIndex + 0, 1)] = CRationalNumber(1, 3);
row[TextureCoordinateToPixelIndex(1, curveIndex + 1, 0)] = CRationalNumber(1, 3);
break;
}
case 3:
{
row[TextureCoordinateToPixelIndex(1, curveIndex + 1, 1)] = CRationalNumber(1, 1);
break;
}
}

// right side of the equation is identity
row[c_numPixels + i] = CRationalNumber(1);
}

// solve the matrix if possible and print out the equations
std::unordered_set<size_t> freeVariables;
if (!SolveMatrixAndPrintEquations(augmentedMatrix, c_numPixels, freeVariables, pixelIndexToCoordinates))
return;

// Next we need to show equality between the N-linear interpolation of our pixels and bernstein polynomials with our control points as coefficients

// Fill in random values for our control points and free variable pixels, and fill in the other pixels as the equations dictate
std::array<float, c_numPixels> pixels = { 0 };
std::array<float, c_numControlPoints> controlPoints = { 0 };
FillInPixelsAndControlPoints<c_numPixels, c_numControlPoints, c_numEquations>(pixels, controlPoints, augmentedMatrix, freeVariables);

// do a number of samples of each method at the same time values, and report the largest difference (error)
float largestDifference = 0.0f;
for (size_t i = 0; i < EQUALITY_TEST_SAMPLES; ++i)
{
float t = float(i) / float(EQUALITY_TEST_SAMPLES - 1);

float value1 = EvaluateBernsteinPolynomial3DCubic(t, controlPoints);
float value2 = EvaluateLinearInterpolation3DCubic(t, pixels, TextureCoordinateToPixelIndex);

largestDifference = std::max(largestDifference, std::abs(value1 - value2));
}
printf("  %i Samples, Largest Error = %f\n\n", EQUALITY_TEST_SAMPLES, largestDifference);
}

void Test3DCubics ()
{
printf("\nTesting 3D Textures / Cubic Curves\n\n");

Test3DCubic<1>();
Test3DCubic<2>();
Test3DCubic<3>();
Test3DCubic<4>();

system("pause");
}

//===================================================================================================================================
//                                         3D Textures / Cubic Curves Multiple Curves
//===================================================================================================================================
//
// Find the limitations of this pattern and show equivalence to Bernstein Polynomials (Bezier Curve Equations). Pattern details below.
//
// This is the same as 3D Textures / Cubic Curves, but there is a second curve stored by flipping x coordinates.
//
//  --- Other details:
//
//  * control points: 4 * NumCurves.
//  * image width is 2
//  * image depth is 2
//  * image height is 1 + (NumCurves/2).
//  * equations: 4 * NumCurves.  This many rows in the augmented matrix.
//  * augmented matrix columns = num pixels (left columns) + num control points (right columns)
//

template <size_t HALFNUMCURVES>
void Test3DCubicMulti ()
{
const size_t NUMCURVES = HALFNUMCURVES * 2;
const size_t c_imageWidth = 2;
const size_t c_imageHeight = HALFNUMCURVES + 1;
const size_t c_imageDepth = 2;
const size_t c_numPixels = c_imageWidth * c_imageHeight * c_imageDepth;
const size_t c_numControlPoints = NUMCURVES * 4;
const size_t c_numEquations = NUMCURVES * 4;

// report values for this test
printf("  %zu curves.  %zu control points.  2x%zux2 texture = %zu pixels.\n", NUMCURVES, c_numControlPoints, c_imageHeight, c_numPixels);
printf("  %f pixels per curve.  %f pixels per control point.\n", float(c_numPixels) / float(NUMCURVES), float(c_numPixels) / float(c_numControlPoints));

// lambdas to convert between pixel index and texture coordinates
auto TextureCoordinateToPixelIndex = [&] (size_t z, size_t y, size_t x) -> size_t
{
return TextureCoordinateToPixelIndex3d(c_imageWidth, c_imageHeight, c_imageDepth, z, y, x);
};
auto pixelIndexToCoordinates = [&] (size_t pixelIndex, char pixelCoords[10])
{
size_t z, y, x;
PixelIndexToTextureCoordinate3d(c_imageWidth, c_imageHeight, c_imageDepth, pixelIndex, z, y, x);
sprintf(pixelCoords, "%zu%zu%zu", z,y,x);
};

// create the first set of equations
TMatrix<c_numEquations, c_numPixels + c_numControlPoints> augmentedMatrix;
for (size_t i = 0; i < c_numEquations / 2; ++i)
{
TVector<c_numPixels + c_numControlPoints>& row = augmentedMatrix[i];

// left side of the equation goes in this zyx coordinate pattern:
//   000
//   001 010 100
//   011 101 110
//   111
// But, curve index is added to the y index.
// Also, left side coefficients must add up to 1.
size_t curveIndex = i / 4;
switch (i % 4)
{
case 0:
{
row[TextureCoordinateToPixelIndex(0, curveIndex + 0, 0)] = CRationalNumber(1, 1);
break;
}
case 1:
{
row[TextureCoordinateToPixelIndex(0, curveIndex + 0, 1)] = CRationalNumber(1, 3);
row[TextureCoordinateToPixelIndex(0, curveIndex + 1, 0)] = CRationalNumber(1, 3);
row[TextureCoordinateToPixelIndex(1, curveIndex + 0, 0)] = CRationalNumber(1, 3);
break;
}
case 2:
{
row[TextureCoordinateToPixelIndex(0, curveIndex + 1, 1)] = CRationalNumber(1, 3);
row[TextureCoordinateToPixelIndex(1, curveIndex + 0, 1)] = CRationalNumber(1, 3);
row[TextureCoordinateToPixelIndex(1, curveIndex + 1, 0)] = CRationalNumber(1, 3);
break;
}
case 3:
{
row[TextureCoordinateToPixelIndex(1, curveIndex + 1, 1)] = CRationalNumber(1, 1);
break;
}
}

// right side of the equation is identity
row[c_numPixels + i] = CRationalNumber(1);
}

// create the second set of equations
for (size_t i = 0; i < c_numEquations / 2; ++i)
{
TVector<c_numPixels + c_numControlPoints>& row = augmentedMatrix[i + c_numEquations / 2];

// left side of the equation goes in this zyx coordinate pattern, which is the same as above but x axis flipped.
//   001
//   000 011 101
//   010 100 111
//   110
// But, curve index is added to the y index.
// Also, left side coefficients must add up to 1.
size_t curveIndex = i / 4;
switch (i % 4)
{
case 0:
{
row[TextureCoordinateToPixelIndex(0, curveIndex + 0, 1)] = CRationalNumber(1, 1);
break;
}
case 1:
{
row[TextureCoordinateToPixelIndex(0, curveIndex + 0, 0)] = CRationalNumber(1, 3);
row[TextureCoordinateToPixelIndex(0, curveIndex + 1, 1)] = CRationalNumber(1, 3);
row[TextureCoordinateToPixelIndex(1, curveIndex + 0, 1)] = CRationalNumber(1, 3);
break;
}
case 2:
{
row[TextureCoordinateToPixelIndex(0, curveIndex + 1, 0)] = CRationalNumber(1, 3);
row[TextureCoordinateToPixelIndex(1, curveIndex + 0, 0)] = CRationalNumber(1, 3);
row[TextureCoordinateToPixelIndex(1, curveIndex + 1, 1)] = CRationalNumber(1, 3);
break;
}
case 3:
{
row[TextureCoordinateToPixelIndex(1, curveIndex + 1, 0)] = CRationalNumber(1, 1);
break;
}
}

// right side of the equation is identity
row[c_numPixels + i + c_numEquations / 2] = CRationalNumber(1);
}

// solve the matrix if possible and print out the equations
std::unordered_set<size_t> freeVariables;
SolveMatrixAndPrintEquations(augmentedMatrix, c_numPixels, freeVariables, pixelIndexToCoordinates);
}

void Test3DCubicsMulti ()
{
printf("\nTesting 3D Textures / Cubic Curves with Multiple Curves\n\n");

Test3DCubicMulti<1>();

system("pause");
}

//===================================================================================================================================
//                                       3D Textures / Cubic Curves With C0 Continuity
//===================================================================================================================================
//
// Find the limitations of this pattern and show equivalence to Bernstein Polynomials (Bezier Curve Equations). Pattern details below.
//
//  --- For first curve, do:
//
//  P000 P001    P100 P101
//  P010 P011    P110 P111
//
//  P000 = C0
//  P001 + P010 + P100 = 3 * C1
//  P011 + P101 + P110 = 3 * C2
//  P111 = C3
//
//  --- For second curve, do:
//
//  P000 P001    P100 P101
//  P010 P011    P110 P111
//  P020 P021    P120 P121
//
//  P000 = C0
//  P001 + P010 + P100 = 3 * C1
//  P011 + P101 + P110 = 3 * C2
//  P111 = C3
//
//  P011 + P110 + P121 = 3 * C4
//  P010 + P021 + P120 = 3 * C5
//  P020 = C6
//
//  --- For third curve, do:
//
//  P000 P001    P100 P101
//  P010 P011    P110 P111
//  P020 P021    P120 P121
//  P030 P031    P130 P131
//
//  P000 = C0
//  P001 + P010 + P100 = 3 * C1
//  P011 + P101 + P110 = 3 * C2
//  P111 = C3
//
//  P011 + P110 + P121 = 3 * C4
//  P010 + P021 + P120 = 3 * C5
//  P020 = C6
//
//  P021 + P030 + P120 = 3 * C7
//  P031 + P121 + P130 = 3 * C8
//  P131 = C9
//
//  and so on...
//  each equation is then multiplied by a value so the right side is identity and left side coefficients add up to 1.
//
//  --- Other details:
//
//  * control points: 1 + 3 * NumCurves.
//  * image width is 2
//  * image depth is 2
//  * image height is 1 + NumCurves.
//  * equations: 1 + 3 * NumCurves.  This many rows in the augmented matrix.
//  * augmented matrix columns = num pixels (left columns) + num control points (right columns)
//

template <size_t N>
float EvaluateBernsteinPolynomial3DCubicC0 (float totalTime, const std::array<float, N>& coefficients)
{
const size_t c_numCurves = (N-1) / 3;

float t;
size_t startCurve;
PiecewiseCurveTime(totalTime, c_numCurves, startCurve, t);

size_t offset = startCurve * 3;

float s = 1.0f - t;
return
coefficients[offset + 0] * s * s * s +
coefficients[offset + 1] * s * s * t * 3.0f +
coefficients[offset + 2] * s * t * t * 3.0f +
coefficients[offset + 3] * t * t * t;
}

template <size_t N, typename LAMBDA>
float EvaluateLinearInterpolation3DCubicC0 (float totalTime, const std::array<float, N>& pixels, LAMBDA& TextureCoordinateToPixelIndex)
{
const size_t c_numCurves = (N / 4) - 1;

float t;
size_t startRow;
PiecewiseCurveTime(totalTime, c_numCurves, startRow, t);

// Note we flip x and z axis direction every odd row to get the zig zag

//    rowZYX
float xzT = (startRow % 2) == 0 ? t : 1.0f - t;
float row00x = lerp(xzT, pixels[TextureCoordinateToPixelIndex(0, startRow + 0, 0)], pixels[TextureCoordinateToPixelIndex(0, startRow + 0, 1)]);
float row01x = lerp(xzT, pixels[TextureCoordinateToPixelIndex(0, startRow + 1, 0)], pixels[TextureCoordinateToPixelIndex(0, startRow + 1, 1)]);
float row0yx = lerp(t, row00x, row01x);

float row10x = lerp(xzT, pixels[TextureCoordinateToPixelIndex(1, startRow + 0, 0)], pixels[TextureCoordinateToPixelIndex(1, startRow + 0, 1)]);
float row11x = lerp(xzT, pixels[TextureCoordinateToPixelIndex(1, startRow + 1, 0)], pixels[TextureCoordinateToPixelIndex(1, startRow + 1, 1)]);
float row1yx = lerp(t, row10x, row11x);

return lerp(xzT, row0yx, row1yx);
}

template <size_t NUMCURVES>
void Test3DCubicC0 ()
{

const size_t c_imageWidth = 2;
const size_t c_imageHeight = NUMCURVES + 1;
const size_t c_imageDepth = 2;
const size_t c_numPixels = c_imageWidth * c_imageHeight * c_imageDepth;
const size_t c_numControlPoints = 1 + NUMCURVES * 3;
const size_t c_numEquations = 1 + NUMCURVES * 3;

// report values for this test
printf("  %zu curves.  %zu control points.  2x%zux2 texture = %zu pixels.\n", NUMCURVES, c_numControlPoints, c_imageHeight, c_numPixels);
printf("  %f pixels per curve.  %f pixels per control point.\n", float(c_numPixels) / float(NUMCURVES), float(c_numPixels) / float(c_numControlPoints));

// lambdas to convert between pixel index and texture coordinates
auto TextureCoordinateToPixelIndex = [&] (size_t z, size_t y, size_t x) -> size_t
{
return TextureCoordinateToPixelIndex3d(c_imageWidth, c_imageHeight, c_imageDepth, z, y, x);
};
auto pixelIndexToCoordinates = [&] (size_t pixelIndex, char pixelCoords[10])
{
size_t z, y, x;
PixelIndexToTextureCoordinate3d(c_imageWidth, c_imageHeight, c_imageDepth, pixelIndex, z, y, x);
sprintf(pixelCoords, "%zu%zu%zu", z,y,x);
};

// create the equations
TMatrix<c_numEquations, c_numPixels + c_numControlPoints> augmentedMatrix;
for (size_t i = 0; i < c_numEquations; ++i)
{
TVector<c_numPixels + c_numControlPoints>& row = augmentedMatrix[i];

// left side of the equation has a pattern like this:
//   000
//   001 010 100
//   011 101 110
//
// But, pattern index is added to the y index.
// Also, the x and z coordinates flip from 0 to 1 on those after each pattern.
// Also, left side coefficients must add up to 1.
size_t patternIndex = i / 3;
size_t xz0 = patternIndex % 2 == 1;
size_t xz1 = patternIndex % 2 == 0;
switch (i % 3)
{
case 0:
{
row[TextureCoordinateToPixelIndex(xz0, patternIndex + 0, xz0)] = CRationalNumber(1, 1);
break;
}
case 1:
{
row[TextureCoordinateToPixelIndex(xz0, patternIndex + 0, xz1)] = CRationalNumber(1, 3);
row[TextureCoordinateToPixelIndex(xz0, patternIndex + 1, xz0)] = CRationalNumber(1, 3);
row[TextureCoordinateToPixelIndex(xz1, patternIndex + 0, xz0)] = CRationalNumber(1, 3);
break;
}
case 2:
{
row[TextureCoordinateToPixelIndex(xz0, patternIndex + 1, xz1)] = CRationalNumber(1, 3);
row[TextureCoordinateToPixelIndex(xz1, patternIndex + 0, xz1)] = CRationalNumber(1, 3);
row[TextureCoordinateToPixelIndex(xz1, patternIndex + 1, xz0)] = CRationalNumber(1, 3);
break;
}
}

// right side of the equation is identity
row[c_numPixels + i] = CRationalNumber(1);
}

// solve the matrix if possible and print out the equations
std::unordered_set<size_t> freeVariables;
if (!SolveMatrixAndPrintEquations(augmentedMatrix, c_numPixels, freeVariables, pixelIndexToCoordinates))
return;

// Next we need to show equality between the N-linear interpolation of our pixels and bernstein polynomials with our control points as coefficients

// Fill in random values for our control points and free variable pixels, and fill in the other pixels as the equations dictate
std::array<float, c_numPixels> pixels = { 0 };
std::array<float, c_numControlPoints> controlPoints = { 0 };
FillInPixelsAndControlPoints<c_numPixels, c_numControlPoints, c_numEquations>(pixels, controlPoints, augmentedMatrix, freeVariables);

// do a number of samples of each method at the same time values, and report the largest difference (error)
float largestDifference = 0.0f;
for (size_t i = 0; i < EQUALITY_TEST_SAMPLES; ++i)
{
float t = float(i) / float(EQUALITY_TEST_SAMPLES - 1);

float value1 = EvaluateBernsteinPolynomial3DCubicC0(t, controlPoints);
float value2 = EvaluateLinearInterpolation3DCubicC0(t, pixels, TextureCoordinateToPixelIndex);

largestDifference = std::max(largestDifference, std::abs(value1 - value2));
}
printf("  %i Samples, Largest Error = %f\n\n", EQUALITY_TEST_SAMPLES, largestDifference);
}

void Test3DCubicsC0 ()
{

printf("\nTesting 3D Textures / Cubic Curves with C0 continuity\n\n");

Test3DCubicC0<1>();
Test3DCubicC0<2>();
Test3DCubicC0<3>();
Test3DCubicC0<4>();
Test3DCubicC0<5>();
Test3DCubicC0<6>();

system("pause");
}

//===================================================================================================================================
//                                                                 main
//===================================================================================================================================

int main (int argc, char **argv)
{
Test3DCubics();
Test3DCubicsMulti();
Test3DCubicsC0();

return 0;
}


# Solving N equations and N unknowns: The Fine Print (Gauss Jordan Elimination)

In basic algebra we were taught that if we have three unknowns (variables), it takes three equations to solve for them.

There’s some fine print though that isn’t talked about until quite a bit later.

Let’s have a look at three unknowns in two equations:

$A + B + C = 2 \\ B = 5$

If we just need a third equation to solve this, why not just modify the second equation to make a third?

$-B = -5$

That obviously doesn’t work, because it doesn’t add any new information! If you try it out, you’ll find that adding that equation doesn’t get you any closer to solving for the variables.

So, it takes three equations to solve for three unknowns, but the three equations have to provide unique, meaningful information. That is the fine print.

How can we know if an equation provides unique, meaningful information though?

It turns out that linear algebra gives us a neat technique for simplifying a system of equations. It actually solves for individual variables if it’s able to, and also gets rid of redundant equations that don’t add any new information.

This simplest form is called the Reduced Row Echelon Form (Wikipedia), which you may also see abbreviated as “rref” (perhaps a bit of a confusing term for programmers). To get it, you put the equations into a matrix and then run an algorithm such as Gauss–Jordan elimination (Wikipedia).

# Equations as a Matrix

Putting a set of equations into a matrix is really straightforward.

Each row of the matrix is a separate equation, and each column holds the coefficients of one variable.

Let’s see how with this set of equations:

$3x + y = 5\\ 2y = 7\\ y + z = 14$

Not every equation has every variable in it, so let’s fix that by putting in zero terms for the missing variables, and let’s make the one terms explicit as well:

$3x + 1y + 0z = 5\\ 0x + 2y + 0z = 7\\ 0x + 1y + 1z = 14$

Putting those equations into a matrix looks like this:

$\left[\begin{array}{rrr} 3 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 1 & 1 \end{array}\right]$

If you also include the constants on the right side of the equation, you get what is called an augmented matrix, which looks like this:

$\left[\begin{array}{rrr|r} 3 & 1 & 0 & 5 \\ 0 & 2 & 0 & 7 \\ 0 & 1 & 1 & 14 \end{array}\right]$
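To make “each row is an equation” concrete, here is a small C++ sketch. It stores an augmented matrix as rows of doubles, with the constant in the last column, and checks whether a set of values satisfies every row. The `Matrix` alias and `SatisfiesAllEquations` are hypothetical names for illustration, not code from this post.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Each row holds one equation: a coefficient per variable, then the
// constant from the right side of the equals sign in the last column.
using Matrix = std::vector<std::vector<double>>;

// Returns true if plugging 'values' into every row makes the left side
// equal the right side, within a small tolerance for floating point.
bool SatisfiesAllEquations(const Matrix& augmented, const std::vector<double>& values)
{
    for (const std::vector<double>& row : augmented)
    {
        // one coefficient per variable, plus the constant
        assert(row.size() == values.size() + 1);

        double leftSide = 0.0;
        for (std::size_t column = 0; column < values.size(); ++column)
            leftSide += row[column] * values[column];

        if (std::abs(leftSide - row.back()) > 1e-9)
            return false;
    }
    return true;
}
```

With the augmented matrix above, the values x = 0.5, y = 3.5, z = 10.5 (the ones solved for later in this post) satisfy all three rows, while x = y = z = 1 does not.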

# Reduced Row Echelon Form

Wikipedia explains the reduced row echelon form this way:

• all nonzero rows (rows with at least one nonzero element) are above any rows of all zeroes (all zero rows, if any, belong at the bottom of the matrix), and
• the leading coefficient (the first nonzero number from the left, also called the pivot) of a nonzero row is always strictly to the right of the leading coefficient of the row above it.
• Every leading coefficient is 1 and is the only nonzero entry in its column.

This is an example of a 3×5 matrix in reduced row echelon form:
$\left[\begin{array}{rrrrr} 1 & 0 & a_1 & 0 & b_1 \\ 0 & 1 & a_2 & 0 & b_2 \\ 0 & 0 & 0 & 1 & b_3 \end{array}\right]$

Basically, everything below the “staircase” of leading coefficients needs to be zero. The first non zero number in each row needs to be one, and that one needs to be the only non zero number in its column.
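The three conditions from the definition can be checked mechanically. The C++ sketch below assumes a matrix stored as rows of doubles and treats values smaller than a tolerance as zero. `IsInReducedRowEchelonForm` is a hypothetical helper name, not from the post.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Checks the three rref conditions, treating |x| < epsilon as zero.
bool IsInReducedRowEchelonForm(const Matrix& m, double epsilon = 1e-9)
{
    const std::size_t numColumns = m.empty() ? 0 : m[0].size();
    bool seenZeroRow = false;
    bool havePreviousPivot = false;
    std::size_t previousPivot = 0;

    for (std::size_t row = 0; row < m.size(); ++row)
    {
        assert(m[row].size() == numColumns);

        // find the leading coefficient (first non zero entry) of this row
        std::size_t pivot = 0;
        while (pivot < numColumns && std::abs(m[row][pivot]) < epsilon)
            ++pivot;

        if (pivot == numColumns)
        {
            seenZeroRow = true; // all zero rows must be at the bottom
            continue;
        }
        if (seenZeroRow)
            return false; // a non zero row sits below an all zero row
        if (havePreviousPivot && pivot <= previousPivot)
            return false; // pivot is not strictly right of the one above
        if (std::abs(m[row][pivot] - 1.0) > epsilon)
            return false; // leading coefficient must be 1

        // the pivot must be the only non zero entry in its column
        for (std::size_t other = 0; other < m.size(); ++other)
        {
            if (other != row && std::abs(m[other][pivot]) > epsilon)
                return false;
        }

        previousPivot = pivot;
        havePreviousPivot = true;
    }
    return true;
}
```

The augmented matrix from the previous section fails the check (its first leading coefficient is 3), while its reduced form below passes.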

Looking back at the augmented matrix we made:

$\left[\begin{array}{rrr|r} 3 & 1 & 0 & 5 \\ 0 & 2 & 0 & 7 \\ 0 & 1 & 1 & 14 \end{array}\right]$

If we put it into reduced row echelon form, we get this:

$\left[\begin{array}{rrr|r} 1 & 0 & 0 & 0.5 \\ 0 & 1 & 0 & 3.5 \\ 0 & 0 & 1 & 10.5 \end{array}\right]$

There’s something really neat about the reduced row echelon form. If we take the above augmented matrix and turn it back into equations, look what we get:

$1x + 0y + 0z = 0.5\\ 0x + 1y + 0z = 3.5\\ 0x + 0y + 1z = 10.5$

Or if we simplify that:

$x = 0.5\\ y = 3.5\\ z = 10.5$

Putting it into reduced row echelon form simplified our set of equations so much that it actually solved for our variables. Neat!

How do we put a matrix into rref? We can use Gauss–Jordan elimination.

# Gauss–Jordan Elimination

Gauss Jordan Elimination is a way of doing operations on rows to be able to manipulate the matrix to get it into the desired form.

It’s often explained that there are three row operations you can do:

• Type 1: Swap the positions of two rows.
• Type 2: Multiply a row by a nonzero scalar.
• Type 3: Add to one row a scalar multiple of another.

You might notice that the first two rules are technically just cases of using the third rule. I find that easier to remember, maybe you will too.
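Here is how the three row operations might look as C++ functions operating on rows of doubles. This is a minimal sketch. The `Matrix` alias and the function names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Type 1: swap the positions of two rows.
void SwapRows(Matrix& m, std::size_t a, std::size_t b)
{
    std::swap(m[a], m[b]);
}

// Type 2: multiply a row by a nonzero scalar.
void ScaleRow(Matrix& m, std::size_t row, double scalar)
{
    assert(scalar != 0.0);
    for (double& value : m[row])
        value *= scalar;
}

// Type 3: add a scalar multiple of row 'source' to row 'target'.
void AddMultipleOfRow(Matrix& m, std::size_t target, std::size_t source, double scalar)
{
    assert(target != source);
    for (std::size_t column = 0; column < m[target].size(); ++column)
        m[target][column] += scalar * m[source][column];
}
```

For example, scaling the second row of the earlier augmented matrix by 1/2 and then adding -1 times it to the third row turns `[0 1 1 | 14]` into `[0 0 1 | 10.5]`.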

The algorithm for getting the rref is actually pretty simple.

1. Starting with the first column of the matrix, find a row which has a non zero in that column, and make that row be the first row by swapping it with the first row.
2. Multiply the first row by a value so that the first column has a 1 in it.
3. Subtract a multiple of the first row from every other row in the matrix so that they have a zero in the first column.

You’ve now handled one column (one variable) so move onto the next.

1. Continuing on, we consider the second column. Find a row which has a non zero in that column and make that row be the second row by swapping it with the second row.
2. Multiply the second row by a value so that the second column has a 1 in it.
3. Subtract a multiple of the second row from every other row in the matrix so that they have a zero in the second column.

You repeat this process until you either run out of rows or columns, at which point you are done.

Note that if you ever find a column that has only zeros in the current row and every row below it, you just skip that column: move on to the next column while staying on the same row.
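The steps above translate fairly directly into code. Below is a C++ sketch that puts a matrix into rref in place and returns the number of pivots it found. It assumes rows of doubles and a small tolerance for “zero”. One deviation from the description: when several rows have a non zero in the current column, it picks the one with the largest magnitude (partial pivoting). This is a common refinement for reducing floating point error, not something this post prescribes. `ReducedRowEchelonForm` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Puts 'm' into reduced row echelon form in place using Gauss-Jordan
// elimination and returns the number of pivots (the rank).
std::size_t ReducedRowEchelonForm(Matrix& m, double epsilon = 1e-9)
{
    const std::size_t numRows = m.size();
    const std::size_t numColumns = numRows > 0 ? m[0].size() : 0;
    for (const std::vector<double>& row : m)
        assert(row.size() == numColumns);

    std::size_t pivotRow = 0;
    for (std::size_t column = 0; column < numColumns && pivotRow < numRows; ++column)
    {
        // Step 1: find a row at or below pivotRow with a non zero in this
        // column, preferring the largest magnitude (partial pivoting).
        std::size_t bestRow = pivotRow;
        for (std::size_t row = pivotRow + 1; row < numRows; ++row)
        {
            if (std::abs(m[row][column]) > std::abs(m[bestRow][column]))
                bestRow = row;
        }

        // No usable non zero: skip this column, stay on the same row.
        if (std::abs(m[bestRow][column]) < epsilon)
            continue;

        std::swap(m[pivotRow], m[bestRow]);

        // Step 2: scale the pivot row so the pivot becomes 1.
        const double pivot = m[pivotRow][column];
        for (double& value : m[pivotRow])
            value /= pivot;

        // Step 3: subtract multiples of the pivot row from every other row
        // so that they all have a zero in this column.
        for (std::size_t row = 0; row < numRows; ++row)
        {
            if (row == pivotRow)
                continue;
            const double factor = m[row][column];
            for (std::size_t c = 0; c < numColumns; ++c)
                m[row][c] -= factor * m[pivotRow][c];
        }

        ++pivotRow;
    }
    return pivotRow;
}
```

Running it on the example augmented matrix from earlier returns 3 and leaves approximately 0.5, 3.5 and 10.5 in the last column, matching the hand-worked result below.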

Let’s work through the example augmented matrix to see how we got it into rref. Starting with this:

$\left[\begin{array}{rrr|r} 3 & 1 & 0 & 5 \\ 0 & 2 & 0 & 7 \\ 0 & 1 & 1 & 14 \end{array}\right]$

We already have a non zero in the first column, so we multiply the top row by 1/3 to get this:

$\left[\begin{array}{rrr|r} 1 & 0.3333 & 0 & 1.6667 \\ 0 & 2 & 0 & 7 \\ 0 & 1 & 1 & 14 \end{array}\right]$

All the other rows have a zero in the first column so we move to the second row and the second column. The second row already has a non zero in the second column, so we multiply the second row by 1/2 to get this:

$\left[\begin{array}{rrr|r} 1 & 0.3333 & 0 & 1.6667 \\ 0 & 1 & 0 & 3.5 \\ 0 & 1 & 1 & 14 \end{array}\right]$

To make sure the second row is the only row that has a non zero in the second column, we subtract the second row times 1/3 from the first row. We also subtract the second row from the third row. That gives us this:

$\left[\begin{array}{rrr|r} 1 & 0 & 0 & 0.5 \\ 0 & 1 & 0 & 3.5 \\ 0 & 0 & 1 & 10.5 \end{array}\right]$

Since the third row has a 1 in the third column, and all other rows have a 0 in that column we are done.

That’s all there is to it! We put the matrix into rref, and we also solved the set of equations. Neat huh?

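You may notice that the left side of the rref augmented matrix (the part before the bar) ended up being the identity matrix. That is the case whenever the equations have exactly one solution. It isn’t the case when the equations can’t be fully solved.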

# Overdetermined, Underdetermined & Inconsistent Equations

Systems of equations are overdetermined when they have more equations than unknowns, like the one below, which has three equations and two unknowns:

$x + y = 3 \\ x = 1 \\ y = 2 \\$

Putting that into (augmented) matrix form gives you this:

$\left[\begin{array}{rr|r} 1 & 1 & 3 \\ 1 & 0 & 1 \\ 0 & 1 & 2 \end{array}\right]$

If you put that into rref, you end up with this:

$\left[\begin{array}{rr|r} 1 & 0 & 1 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{array}\right]$

The last row became zeroes, which shows us that there was redundant info in the system of equations that disappeared. We can easily see that x = 1 and y = 2, and that satisfies all three equations.

Just like we talked about in the opening of this post, equations that don’t add useful information beyond what the other equations already give will disappear when you put the matrix into rref. That turned our overdetermined system into just a determined system.

What happens though if we change the third row in the overdetermined system to be something else? For instance, we can say y=10 instead of y=2:

$x + y = 3 \\ x = 1 \\ y = 10 \\$

The augmented matrix for that is this:

$\left[\begin{array}{rr|r} 1 & 1 & 3 \\ 1 & 0 & 1 \\ 0 & 1 & 10 \end{array}\right]$

If we put that in rref, we get the identity matrix, which makes it seem like everything is ok:

$\left[\begin{array}{rr|r} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array}\right]$

However, if we turn it back into a set of equations, we can see that we have a problem:

$x = 0 \\ y = 0 \\ 0 = 1$

The result says that 0 = 1, which is not true. Having a row of “0 = 1” in rref is how you detect that a system of equations is inconsistent, or in other words, that the equations give contradictory information.
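Detecting this is a simple scan once you have the rref. The C++ sketch below looks for a row whose coefficients are all zero but whose constant (last column) is not. It assumes the augmented matrix is stored as rows of doubles. `IsInconsistent` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Given an augmented matrix in rref, returns true if any row reads
// "0 = (non zero)", meaning the equations contradict each other.
bool IsInconsistent(const Matrix& rref, double epsilon = 1e-9)
{
    for (const std::vector<double>& row : rref)
    {
        assert(!row.empty());

        // are all coefficients (everything except the constant) zero?
        bool allCoefficientsZero = true;
        for (std::size_t column = 0; column + 1 < row.size(); ++column)
        {
            if (std::abs(row[column]) > epsilon)
            {
                allCoefficientsZero = false;
                break;
            }
        }

        if (allCoefficientsZero && std::abs(row.back()) > epsilon)
            return true;
    }
    return false;
}
```

The rref of the y = 10 system above is reported as inconsistent. The rref of the earlier y = 2 system, whose extra row became all zeros, is not.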

A system of equations can also be underdetermined, meaning there isn’t enough information to solve the equations. Let’s use the example from the beginning of the post:

$A + B + C = 2 \\ B = 5 \\$

In an augmented matrix, that looks like this:

$\left[\begin{array}{rrr|r} 1 & 1 & 1 & 2 \\ 0 & 1 & 0 & 5 \\ \end{array}\right]$

Putting that in rref we get this:

$\left[\begin{array}{rrr|r} 1 & 0 & 1 & -3 \\ 0 & 1 & 0 & 5 \\ \end{array}\right]$

Converting the matrix back into equations we get this:

$A + C = -3 \\ B = 5 \\$

This says there isn’t enough information to fully solve the equations, and shows how A and C are related, even though B is completely determined.

Note that another way of looking at this is that “A” and “C” are “free variables”. That means you are free to choose a value for either A or C, and once you choose a value for one, the other becomes defined (A = -3 - C). B is not a free variable because its value is determined.

Let’s finish the example from the beginning of the post, showing what happens when we “make up” an equation by transforming one of the equations we already have:

$A + B + C = 2 \\ B = 5\\ -B = -5$

The augmented matrix looks like this:

$\left[\begin{array}{rrr|r} 1 & 1 & 1 & 2 \\ 0 & 1 & 0 & 5 \\ 0 & -1 & 0 & -5 \\ \end{array}\right]$

Putting it in rref, we get this:

$\left[\begin{array}{rrr|r} 1 & 0 & 1 & -3 \\ 0 & 1 & 0 & 5 \\ 0 & 0 & 0 & 0 \\ \end{array}\right]$

As you can see, apart from the extra row of zeros, our rref matrix is the same as it was without the extra “made up” equation.

The number of non-zero rows in a matrix in rref is known as the rank of the matrix. In these last two examples, the rank of the matrix was two in both cases. That means that you can tell if adding an equation to a system of equations adds any new, meaningful information or not by seeing if it changes the rank of the matrix for the set of equations. If the rank is the same before and after adding the new equation, it doesn’t add anything new. If the rank does change, that means it does add new information.

This concept of “adding new, meaningful information” actually has a formalized term: linear independence. If a new equation is linearly independent from the other equations in the system, it will change the rank of the rref matrix, else it won’t.

The rank of a matrix for a system of equations tells you how many linearly independent equations there actually are, and the non-zero rows of the rref matrix give you those equations in their simplest form.
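To see the rank idea in action, here is a small sketch that counts pivots, which is the number of non-zero rows you’d get in rref. Counting only needs forward elimination (row echelon form), since the back-substitution half of Gauss-Jordan doesn’t change how many non-zero rows there are. `Rank` is a hypothetical helper, and the epsilon used to treat tiny values as zero is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Rank = number of pivots found during elimination = number of non-zero rows.
// Hypothetical helper for illustration; entries smaller than epsilon count as zero.
size_t Rank(std::vector<std::vector<double>> m, double epsilon = 1e-9)
{
    assert(!m.empty());
    const size_t rows = m.size();
    const size_t cols = m[0].size();
    size_t rank = 0;
    for (size_t col = 0; col < cols && rank < rows; ++col)
    {
        // Find the remaining row with the largest value in this column
        size_t best = rank;
        for (size_t r = rank + 1; r < rows; ++r)
            if (std::fabs(m[r][col]) > std::fabs(m[best][col]))
                best = r;
        if (std::fabs(m[best][col]) < epsilon)
            continue; // no pivot in this column
        std::swap(m[rank], m[best]);

        // Eliminate this column from every row below the pivot row
        for (size_t r = rank + 1; r < rows; ++r)
        {
            const double factor = m[r][col] / m[rank][col];
            for (size_t c = col; c < cols; ++c)
                m[r][c] -= factor * m[rank][c];
        }
        ++rank;
    }
    return rank;
}
```

For A + B + C = 2 and B = 5 this returns 2. Adding the “made up” -B = -5 still gives 2, while adding a hypothetical new equation such as A = 1 raises it to 3.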

Lastly I wanted to mention that the idea of a system of equations being inconsistent is completely separate from the idea of a system of equations being underdetermined or overdetermined. A system can be overdetermined and inconsistent, underdetermined and inconsistent, overdetermined and consistent, or underdetermined and consistent. The two ideas are completely separate, unrelated things.

# Inverting a Matrix

Interestingly, Gauss-Jordan elimination is also a common way for efficiently inverting a matrix!

To do that, you make an augmented matrix where on the left side you have the matrix you want to invert, and on the right side you have the identity matrix.

Let’s invert a matrix I made up pretty much at random:

$\left[\begin{array}{rrr|rrr} 1 & 0 & 1 & 1 & 0 & 0 \\ 0 & 3 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1\\ \end{array}\right]$

Putting that matrix in rref, we get this:

$\left[\begin{array}{rrr|rrr} 1 & 0 & 0 & 1 & 0 & -1 \\ 0 & 1 & 0 & 0 & 0.3333 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1\\ \end{array}\right]$

The matrix on the right is the inverse of the original matrix we had on the left!

You can double check by using an online matrix inverse calculator if you want: Inverse Matrix Calculator

Note that not all matrices are invertible though! If the left side of the rref result is not the identity matrix (for instance, it ends up with a row of zeros), the matrix wasn’t invertible.
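Here is a rough C++ sketch of inverting a matrix this way: build [M | I], run Gauss-Jordan elimination, and read the inverse off the right half. It adds partial pivoting (using the largest available value in each column as the pivot), which is a common choice for numerical robustness rather than something required by the method. `Invert` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>

// Hypothetical helper: invert an NxN matrix by putting [M | I] into rref.
// Returns false if M is not invertible (the left half can't become the identity).
template <size_t N>
bool Invert(const std::array<std::array<double, N>, N>& m,
            std::array<std::array<double, N>, N>& inverse, double epsilon = 1e-9)
{
    assert(epsilon > 0.0);

    // Build the augmented matrix [M | I]
    std::array<std::array<double, 2 * N>, N> aug{};
    for (size_t r = 0; r < N; ++r)
    {
        for (size_t c = 0; c < N; ++c)
            aug[r][c] = m[r][c];
        aug[r][N + r] = 1.0;
    }

    for (size_t col = 0; col < N; ++col)
    {
        // Partial pivoting: pick the remaining row with the largest value in this column
        size_t best = col;
        for (size_t r = col + 1; r < N; ++r)
            if (std::fabs(aug[r][col]) > std::fabs(aug[best][col]))
                best = r;
        if (std::fabs(aug[best][col]) < epsilon)
            return false; // no pivot: the left side can't become the identity
        std::swap(aug[col], aug[best]);

        // Scale the pivot row to get a leading one
        const double scale = aug[col][col];
        for (double& value : aug[col])
            value /= scale;

        // Zero out this column in every other row
        for (size_t r = 0; r < N; ++r)
        {
            if (r == col)
                continue;
            const double factor = aug[r][col];
            for (size_t c = 0; c < 2 * N; ++c)
                aug[r][c] -= factor * aug[col][c];
        }
    }

    // The right half now holds the inverse
    for (size_t r = 0; r < N; ++r)
        for (size_t c = 0; c < N; ++c)
            inverse[r][c] = aug[r][N + c];
    return true;
}
```

On the example matrix above it produces the same inverse (with 1/3 in the middle), and it returns false for a matrix like [[1, 2, 0], [2, 4, 0], [0, 0, 1]], whose second row is a multiple of the first.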

# Solving Mx = b

Let’s say that you have two vectors x and b, and a matrix M. Let’s say that we know the matrix M and the vector b, and that we are trying to solve for the vector x.

This comes up more often than you might suspect, including when doing “least squares fitting” of an equation to a set of data points (more info on that: Incremental Least Squares Curve Fitting).

One way to solve this equation would be to calculate the inverse matrix of M and multiply that by vector b to get vector x:

$Mx = b\\ x = M^{-1} * b$

However, Gauss-Jordan elimination can help us here too.

If we make an augmented matrix where on the left we have M, and on the right we have b, we can put the matrix into rref, which will essentially multiply vector b by the inverse of M, leaving us with the vector x.

For instance, on the left is our matrix M that scales x,y,z by 2. On the right is our vector b, which is the matrix M times our unknown vector x:

$\left[\begin{array}{rrr|r} 2 & 0 & 0 & 2 \\ 0 & 2 & 0 & 4 \\ 0 & 0 & 2 & 8 \\ \end{array}\right]$

Putting that into rref form we get this:

$\left[\begin{array}{rrr|r} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 4 \\ \end{array}\right]$

From this, we know that the value of vector x is the right side of the augmented matrix: (1,2,4)

This only works when the matrix is invertible (aka when the left side of the rref becomes an identity matrix).

# Source Code

Here is some C++ source code which does Gauss-Jordan elimination. It’s written mainly to be readable, not performant!

```cpp
#include <stdio.h>
#include <array>
#include <initializer_list>
#include <utility>
#include <assert.h>

// Define a vector as an array of floats
template<size_t N>
using TVector = std::array<float, N>;

// Define a matrix as an array of vectors
template<size_t M, size_t N>
using TMatrix = std::array<TVector<N>, M>;

// Helper function to fill out a matrix
template <size_t M, size_t N>
TMatrix<M, N> MakeMatrix (std::initializer_list<std::initializer_list<float>> matrixData)
{
    TMatrix<M, N> matrix;

    size_t m = 0;
    assert(matrixData.size() == M);
    for (const std::initializer_list<float>& rowData : matrixData)
    {
        assert(rowData.size() == N);
        size_t n = 0;
        for (float value : rowData)
        {
            matrix[m][n] = value;
            ++n;
        }
        ++m;
    }

    return matrix;
}

// Make a specific row have a 1 in the colIndex, and make all other rows have 0 there
template <size_t M, size_t N>
bool MakeRowClaimVariable (TMatrix<M, N>& matrix, size_t rowIndex, size_t colIndex)
{
    // Find a row that has a non zero value in this column and swap it with this row
    {
        // Find a row that has a non zero value.
        // (Exact comparison for readability; a small epsilon would be more robust.)
        size_t nonZeroRowIndex = rowIndex;
        while (nonZeroRowIndex < M && matrix[nonZeroRowIndex][colIndex] == 0.0f)
            ++nonZeroRowIndex;

        // If there isn't one, nothing to do
        if (nonZeroRowIndex == M)
            return false;

        // Otherwise, swap the row
        if (rowIndex != nonZeroRowIndex)
            std::swap(matrix[rowIndex], matrix[nonZeroRowIndex]);
    }

    // Scale this row so that it has a leading one
    float scale = 1.0f / matrix[rowIndex][colIndex];
    for (size_t normalizeColIndex = colIndex; normalizeColIndex < N; ++normalizeColIndex)
        matrix[rowIndex][normalizeColIndex] *= scale;

    // Make sure all rows except this one have a zero in this column.
    // Do this by subtracting this row from other rows, multiplied by a multiple that makes the column disappear.
    for (size_t eliminateRowIndex = 0; eliminateRowIndex < M; ++eliminateRowIndex)
    {
        if (eliminateRowIndex == rowIndex)
            continue;

        float eliminateScale = matrix[eliminateRowIndex][colIndex];
        for (size_t eliminateColIndex = 0; eliminateColIndex < N; ++eliminateColIndex)
            matrix[eliminateRowIndex][eliminateColIndex] -= matrix[rowIndex][eliminateColIndex] * eliminateScale;
    }

    return true;
}

// make matrix into reduced row echelon form
template <size_t M, size_t N>
void GaussJordanElimination (TMatrix<M, N>& matrix)
{
    size_t rowIndex = 0;
    for (size_t colIndex = 0; colIndex < N; ++colIndex)
    {
        if (MakeRowClaimVariable(matrix, rowIndex, colIndex))
        {
            ++rowIndex;
            if (rowIndex == M)
                return;
        }
    }
}

int main (int argc, char **argv)
{
    auto matrix = MakeMatrix<3, 4>(
    {
        { 2.0f, 0.0f, 0.0f, 2.0f },
        { 0.0f, 2.0f, 0.0f, 4.0f },
        { 0.0f, 0.0f, 2.0f, 8.0f },
    });

    GaussJordanElimination(matrix);

    // Print the result. The right column should hold the solution x = (1, 2, 4)
    for (const TVector<4>& row : matrix)
        printf("%0.4f %0.4f %0.4f | %0.4f\n", row[0], row[1], row[2], row[3]);

    return 0;
}
```


I hope you enjoyed this post and/or learned something from it. This is a precursor to an interesting (but maybe obscure) topic for my next blog post, which involves a graphics / gamedev thing.

Any comments, questions or corrections, let me know in the comments below or on twitter at @Atrix256

# Orthogonal Projection Matrix Plainly Explained

“Scratch a Pixel” has a really nice explanation of perspective and orthogonal projection matrices.

It inspired me to make a very simple / plain explanation of orthogonal projection matrices that hopefully will help them be less opaque for folks and more intuitive.

Original article: Scratch A Pixel: The Perspective and Orthographic Projection Matrix

# Let’s Get To It!

The whole purpose of an orthographic projection matrix is to take x, y and z as input and output x, y and z such that valid points on the screen will have x, y, z values between -1 and 1.

If we transform a point and get an x,y or z that is outside of that range, we know the point is outside of the screen either because it’s too far left, right, up or down, or because it’s too close or too far on the z axis.

Let’s think about how we’d do this, thinking only about the x coordinate for now.

To map some range of x values to the range -1 to 1, we’ll need to decide which x value maps to -1 and which x value maps to 1. We’ll call these “left” and “right”.

Given a left and right value, and an x value we want to map to the range, perhaps the most straightforward way to do it would be this:

$XOut = \frac{X-Left}{Right-Left} * 2 - 1$

The division calculates the percentage of how far X is between left and right. Multiplying that by 2 and subtracting 1 changes it so instead of valid points being from 0 to 1 (aka from 0% to 100%), they are instead between -1 and 1.

Let’s change this formula so that there is one term that is multiplied by X and another term that has everything else. (Wondering why? It’s because I’m cheating and know the final form. Don’t feel bad if it isn’t intuitive why we’d do this!)

$\frac{X-Left}{Right-Left} * 2 - 1 =\\ \\ \frac{2(X-Left)}{Right-Left} - 1 =\\ \\ \frac{2X-2*Left}{Right-Left} - 1 =\\ \\ \frac{2X-2*Left}{Right-Left} - \frac{Right-Left}{Right-Left} =\\ \\ \frac{2X-2*Left-(Right-Left)}{Right-Left} =\\ \\ \frac{2X-2*Left-Right+Left}{Right-Left} =\\ \\ \frac{2X-Left-Right}{Right-Left} =\\ \\ \frac{2X}{Right-Left} - \frac{Right+Left}{Right-Left} =\\ \\ \frac{2}{Right-Left}X - \frac{Right+Left}{Right-Left}\\$

Setting up the formula this way allows us to transform the x component of an (x,y,z,1) point using a dot product:

$(x,y,z,1) \cdot (\frac{2}{Right-Left},0,0,-\frac{Right+Left}{Right-Left}) = \frac{2}{Right-Left}X - \frac{Right+Left}{Right-Left}$

A dot product is what happens during matrix multiplication, so if we put this into a 4×4 matrix, we get the same result. Let’s check that out.

We start with an identity matrix. If we use it to transform an (x,y,z,1) point, we get the same point as output aka nothing happens.

$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}$

Now let’s put the x transform we came up with into the matrix:

$\begin{bmatrix} \frac{2}{Right-Left} & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -\frac{Right+Left}{Right-Left} & 0 & 0 & 1 \\ \end{bmatrix}$

If we use that matrix to transform an (x,y,z,1) point, it will transform our x component as we described (valid ranges of x that are between left and right will be between -1 and 1), while leaving the other components of the point alone.

As you might imagine, it’s pretty simple to get our formulas for y and z as well. Starting with the x formula, we can just replace x with y and z, and right/left with top/bottom and far/near.

$XOut = \frac{2}{Right-Left}X - \frac{Right+Left}{Right-Left} \\ \\ YOut = \frac{2}{Top-Bottom}Y - \frac{Top+Bottom}{Top-Bottom} \\ \\ ZOut = \frac{2}{Far-Near}Z - \frac{Far+Near}{Far-Near}$

We can put those into our matrix to get a full orthographic projection matrix.

$\begin{bmatrix} \frac{2}{Right-Left} & 0 & 0 & 0 \\ 0 & \frac{2}{Top-Bottom} & 0 & 0 \\ 0 & 0 & \frac{2}{Far-Near} & 0 \\ -\frac{Right+Left}{Right-Left} & - \frac{Top+Bottom}{Top-Bottom} & - \frac{Far+Near}{Far-Near} & 1 \\ \end{bmatrix}$

There we go, that’s all there is to making an orthographic projection matrix. Its whole purpose is to convert x, y, z values to be between -1 and 1 so that the GPU knows whether points are inside or outside the screen – and thus whether they need to be clipped or not.
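As a rough sketch, assuming the same row-vector convention as the matrix above ((x, y, z, 1) times the matrix, with the offsets in the bottom row), here is how building the matrix and transforming a point might look in C++. `MakeOrtho`, `Transform`, `Vec4` and `Mat4` are hypothetical names; `nearZ`/`farZ` are used because some Windows headers define `near` and `far` as macros.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>; // indexed as m[row][column]

// Orthographic projection matrix as derived above, with the offsets in the bottom row.
Mat4 MakeOrtho(float left, float right, float bottom, float top, float nearZ, float farZ)
{
    assert(right != left && top != bottom && farZ != nearZ);
    Mat4 m{}; // start from all zeros
    m[0][0] = 2.0f / (right - left);
    m[1][1] = 2.0f / (top - bottom);
    m[2][2] = 2.0f / (farZ - nearZ);
    m[3][0] = -(right + left) / (right - left);
    m[3][1] = -(top + bottom) / (top - bottom);
    m[3][2] = -(farZ + nearZ) / (farZ - nearZ);
    m[3][3] = 1.0f;
    return m;
}

// out = p * m: each output component is the dot product of p with one column of m
Vec4 Transform(const Vec4& p, const Mat4& m)
{
    Vec4 out{};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            out[col] += p[row] * m[row][col];
    return out;
}
```

With left = 0, right = 10, bottom = 0, top = 5, near = 1 and far = 101 (arbitrary values), the point (0, 0, 1) comes out at about (-1, -1, -1) and (10, 5, 101) at about (1, 1, 1).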

## Variations

While the projection matrix we made is a valid orthographic projection matrix in OpenGL, it needs a tweak to be valid for DirectX. This is because in OpenGL the clip space for z is between -1 and 1, while in DirectX it’s between 0 and 1!

If you leave off the *2-1 for the z formula, but leave it for x and y, you’ll end up with a matrix like this one:

$\begin{bmatrix} \frac{2}{Right-Left} & 0 & 0 & 0 \\ 0 & \frac{2}{Top-Bottom} & 0 & 0 \\ 0 & 0 & \frac{1}{Far-Near} & 0 \\ -\frac{Right+Left}{Right-Left} & - \frac{Top+Bottom}{Top-Bottom} & - \frac{Near}{Far-Near} & 1 \\ \end{bmatrix}$
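To double check the z part of this variant, here is a small sketch assuming z should go from 0 at near to 1 at far. Leaving the `* 2 - 1` off the z formula gives $ZOut = \frac{Z-Near}{Far-Near} = \frac{1}{Far-Near}Z - \frac{Near}{Far-Near}$. `ZMapping`, `MakeZeroToOneZ` and `MapZ` are hypothetical names.

```cpp
#include <cassert>

// Scale and offset that go in the z column of the matrix
struct ZMapping
{
    float scale;
    float offset;
};

// ZOut = (Z - Near) / (Far - Near) = Z / (Far - Near) - Near / (Far - Near)
ZMapping MakeZeroToOneZ(float nearZ, float farZ)
{
    assert(farZ != nearZ);
    return { 1.0f / (farZ - nearZ), -nearZ / (farZ - nearZ) };
}

// Apply the mapping to a single z value
float MapZ(float z, const ZMapping& mapping)
{
    return z * mapping.scale + mapping.offset;
}
```

With near = 1 and far = 101, z = 1 maps to 0 and z = 101 maps to about 1.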

Another variation you’ll see is a version where the camera is centered on the origin for the x and y axis. In other words, left = -right, and top = -bottom. When that is true, right+left and top+bottom become zero, and writing $Width = Right-Left$ and $Height = Top-Bottom$ simplifies the matrix to this:

$\begin{bmatrix} \frac{2}{Width} & 0 & 0 & 0 \\ 0 & \frac{2}{Height} & 0 & 0 \\ 0 & 0 & \frac{2}{Far-Near} & 0 \\ 0 & 0 & -\frac{Far+Near}{Far-Near} & 1 \\ \end{bmatrix}$

Another variation you’ll see is that the matrix is transposed. You’ll see this when switching between pre and post multiplication, or when switching from column major matrices to row major matrices. Either is valid and it’s basically just a notation and convention thing. Here is the original matrix we made, transposed.

$\begin{bmatrix} \frac{2}{Right-Left} & 0 & 0 & -\frac{Right+Left}{Right-Left} \\ 0 & \frac{2}{Top-Bottom} & 0 & -\frac{Top+Bottom}{Top-Bottom} \\ 0 & 0 & \frac{2}{Far-Near} & -\frac{Far+Near}{Far-Near} \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}$

Lastly, the above matrices were all for a “left handed” system. That means that it assumes the positive x axis goes to the right, the positive y axis goes up, and the positive z axis goes into your screen (aka, the camera is looking down the positive z axis). Positive z values between near and far will map to the valid range, while negative z values will be outside the valid range.

Another variation you’ll see is a “right handed” matrix, which is the same as the left handed matrix except that the positive z axis comes out of your screen (aka the camera is looking down the negative z axis). Negative z values between -near and -far will map to the valid -1 to 1 range, while positive z values will be outside the valid range.

To switch the handedness of the matrix, you just flip the sign of the element at (3,3), the third row and third column, so here is our original orthographic projection matrix, converted to right handed instead of left handed.

$\begin{bmatrix} \frac{2}{Right-Left} & 0 & 0 & 0 \\ 0 & \frac{2}{Top-Bottom} & 0 & 0 \\ 0 & 0 & -\frac{2}{Far-Near} & 0 \\ -\frac{Right+Left}{Right-Left} & - \frac{Top+Bottom}{Top-Bottom} & - \frac{Far+Near}{Far-Near} & 1 \\ \end{bmatrix}$

You may also just see the denominator changed from $Far-Near$ to $Near-Far$ which has the same effect, and would give you something like this:

$\begin{bmatrix} \frac{2}{Right-Left} & 0 & 0 & 0 \\ 0 & \frac{2}{Top-Bottom} & 0 & 0 \\ 0 & 0 & \frac{2}{Near-Far} & 0 \\ -\frac{Right+Left}{Right-Left} & - \frac{Top+Bottom}{Top-Bottom} & - \frac{Far+Near}{Far-Near} & 1 \\ \end{bmatrix}$
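A quick sanity check of the right handed z row, as a sketch with a hypothetical function name: with the scale $\frac{2}{Near-Far}$ (equivalently $-\frac{2}{Far-Near}$) and the offset left unchanged, z = -near should map to -1 and z = -far to 1.

```cpp
#include <cassert>

// z row of the right handed orthographic matrix: flipped scale, same offset
float MapZRightHanded(float z, float nearZ, float farZ)
{
    assert(farZ != nearZ);
    const float scale = 2.0f / (nearZ - farZ);             // same as -2 / (Far - Near)
    const float offset = -(farZ + nearZ) / (farZ - nearZ); // unchanged from the left handed matrix
    return z * scale + offset;
}
```

With near = 1 and far = 101, a positive z such as 1 maps to about -1.04, which is outside the valid range as expected.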

Fun trivia: the term “sinister” comes from Latin, meaning “left handed”. So, when talking to someone about their graphics engine, you can ask them whether or not they use sinister projection 😛

Scratch A Pixel: The Perspective and Orthographic Projection Matrix

D3DXMatrixOrthoRH (DirectX) – shows the resulting matrix. Also links to left handed and off center variants.

glOrtho (OpenGL) – shows resulting matrix.

# How to Train Neural Networks With Backpropagation

This post is an attempt to demystify backpropagation, which is the most common method for training neural networks. This post is broken into a few main sections:

1. Explanation
2. Working through examples
3. Simple sample C++ source code using only standard includes
4. Links to deeper resources to continue learning

Let’s talk about the basics of neural nets to start out, specifically multilayer perceptrons. This is a common type of neural network, and is the type we will be talking about today. There are other types of neural networks though, such as convolutional neural networks, recurrent neural networks, Hopfield networks and more. The good news is that backpropagation applies to most other types of neural networks too, so what you learn here will be applicable to other types of networks.

# Basics of Neural Networks

A neural network is made up of layers.

Each layer has some number of neurons in it.

Every neuron is connected to every neuron in the previous and next layer.

Below is a diagram of a neural network, courtesy of Wikipedia. Every circle is a neuron. This network takes 3 floating point values as input, passes them through 4 neurons in a hidden layer and outputs two floating point values. The hidden layer neurons and the output layer neurons process the values they are given, but the input neurons do not.

To calculate the output value of a single neuron, you multiply every input into that neuron by a weight for that input, sum them up, and add a bias that is set for the neuron. This “weighted input” value is fed into an activation function and the result is the output value of that neuron. Here is a diagram for a single neuron:

The code for calculating the output of a single neuron could look like this:

```cpp
float weightedInput = bias;

for (size_t i = 0; i < inputs.size(); ++i)
    weightedInput += inputs[i] * weights[i];

float output = Activation(weightedInput);
```


To evaluate an entire network of neurons, you just repeat this process for all neurons in the network, going from left to right (from input to output).

Neural networks are basically black boxes. We train them to give specific outputs when we give them specific inputs, but it is often difficult to understand what it is that they’ve learned, or what part of the data they are picking up on.

Training a neural network just means that we adjust the weight and bias values such that when we give specific inputs, we get the desired outputs from the network. Being able to figure out what weights and biases to use can be tricky, especially for networks with lots of layers and lots of neurons per layer. This post talks about how to do just that.

Regarding training, there is a funny story where some people trained a neural network to say whether or not a military tank was in a photograph. It had a very high accuracy rate on the data they trained it with, but when they used it with new data, it had terrible accuracy. It turns out that the training data was a bit flawed. Pictures of tanks were all taken on a sunny day, and the pictures without tanks were taken on a cloudy day. The network learned how to detect whether a picture was of a sunny day or a cloudy day, not whether there was a tank in the photo or not!

This is one type of pitfall to watch out for when dealing with neural networks – having good training data – but there are many other pitfalls to watch out for too. Architecting and training neural networks is quite literally an art form. If it were painting, this post would be teaching you how to hold a brush and what the primary colors are. There are many, many techniques to learn beyond what is written here to use as tools in your toolbox. The information in this post will allow you to succeed in training neural networks, but there is a lot more to learn to get higher levels of accuracy from your nets!

# Neural Networks Learn Using Gradient Descent

Let’s take a look at a simple neural network where we’ve chosen random values for the weights and the bias:

If given two floating point inputs, we’d calculate the output of the network like this:

$Output = Activation(Input0 * Weight0 + Input1 * Weight1 + Bias)$

Plugging in the specific values for the weights and biases, it looks like this:

$Output = Activation(Input0 * 0.23 + Input1 * -0.1 + 0.3)$

Let’s say that we want this network to output a zero when we give an input of 1,0, and that we don’t care what it outputs otherwise. We’ll plug 1 and 0 in for Input0 and Input1 respectively and see what the output of the network is right now:

$Output = Activation(1* 0.23 + 0 * -0.1 + 0.3) \\ Output = Activation(0.53)$

For the activation function, we are going to use a common one called the sigmoid activation function, which is also sometimes called the logistic activation function. It looks like this:

$\sigma(x) = \frac{1}{1+e^{-x}}$

Without going into too much detail, the reason why sigmoid is so commonly used is because it’s a smoother and differentiable version of the step function.

Applying that activation function to our output neuron, we get this:

$Output = Activation(0.53) \\ Output = \sigma(0.53) \\ Output = 0.6295$
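Here is a self-contained C++ version of the single neuron code from earlier, using the sigmoid as the activation function, as a sketch to check the arithmetic. The names `Sigmoid` and `NeuronOutput` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic (sigmoid) activation: 1 / (1 + e^-x)
float Sigmoid(float x)
{
    return 1.0f / (1.0f + std::exp(-x));
}

// Output of one neuron: weighted sum of the inputs plus the bias, fed through the activation
float NeuronOutput(const std::vector<float>& inputs, const std::vector<float>& weights, float bias)
{
    assert(inputs.size() == weights.size());
    float weightedInput = bias;
    for (size_t i = 0; i < inputs.size(); ++i)
        weightedInput += inputs[i] * weights[i];
    return Sigmoid(weightedInput);
}
```

Calling `NeuronOutput({1.0f, 0.0f}, {0.23f, -0.1f}, 0.3f)` returns roughly 0.6295, matching the value computed above.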

So, we plugged in 1 and 0, but instead of getting a 0 out, we got 0.6295. Our weights and biases are wrong, but how do we correct them?

The secret to correcting our weights and biases is whichever of these terms seem least scary to you: slopes, derivatives, gradients.

If “slope” was the least scary term to you, you probably remember the line formula $y=mx+b$ and that the m value was the “rise over run” or the slope of the line. Well believe it or not, that’s all a derivative is. A derivative is just the slope of a function at a specific point on that function. Even if a function is curved, you can pick a point on the graph and get a slope at that point. The notation for a derivative is $\frac{dy}{dx}$, which literally means “change in y divided by change in x”, or “delta y divided by delta x”, which is literally rise over run.

In the case of a linear function (a line), the derivative is the same over the entire thing, so you can take a step of any size on the x axis and multiply that step size by $\frac{dy}{dx}$ to figure out how much to add to or subtract from y to stay on the line.

In the case of a non linear function, the derivative can change from one point to the next, so this slope is only guaranteed to be accurate for an infinitely small step size. In practice, people often just use “small” step sizes and call it good enough, which is what we’ll be doing momentarily.

Now that you realize you already knew what a derivative is, we have to talk about partial derivatives. There really isn’t anything very scary about them and they still mean the exact same thing – they are the slope! They are even calculated the exact same way, but they use a fancier looking d in their notation: $\frac{\partial y}{\partial x}$.

The reason partial derivatives even exist is because if you have a function of multiple variables like $z=f(x,y)=x^2+3y+2$, you have two variables that you can take the derivative of. You can calculate $\frac{\partial z}{\partial x}$ and $\frac{\partial z}{\partial y}$. The first value tells you how much the z value changes for a change in x, the second value tells you how much the z value changes for a change in y.

By the way, if you are curious, the partial derivatives for that function above are below. When calculating partial derivatives, you treat any variable that isn’t the one you care about as a constant and differentiate as normal.

$\frac{\partial z}{\partial x} = 2x\\ \frac{\partial z}{\partial y} = 3\\$

If you put both of those values together into a vector $(\frac{\partial z}{\partial x},\frac{\partial z}{\partial y})$ you have what is called the gradient vector.

The gradient vector has an interesting property, which is that it points in the direction that makes the function output grow the most. Basically, if you think of your function as a surface, it points up the steepest direction of the surface, from the point you evaluated the function at.

We are going to use that property to train our neural network by doing the following:

1. Calculate the gradient of a function that describes the error in our network. This means we will have the partial derivatives of that error with respect to all the weights and biases in the network.
2. Multiply the gradient by a small “learning rate” value, such as 0.05
3. Subtract these scaled derivatives from the weights and biases to decrease the error a small amount.
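To make the three steps concrete, here is a small C++ sketch of gradient descent. It uses a made-up two-parameter cost, $(w-2)^2 + (b+1)^2$, as a stand-in for a network’s cost, because its gradient $(2(w-2), 2(b+1))$ is easy to write by hand. `ToyCost` and `GradientDescent` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Toy cost with a known minimum at (w, b) = (2, -1); a stand-in for a network's cost
float ToyCost(float w, float b)
{
    return (w - 2.0f) * (w - 2.0f) + (b + 1.0f) * (b + 1.0f);
}

// Gradient descent: repeatedly step against the gradient, scaled by a learning rate
void GradientDescent(float& w, float& b, float learningRate, size_t iterations)
{
    assert(learningRate > 0.0f);
    for (size_t i = 0; i < iterations; ++i)
    {
        // 1. Gradient of the cost: (dCost/dw, dCost/db)
        const float dCost_dw = 2.0f * (w - 2.0f);
        const float dCost_db = 2.0f * (b + 1.0f);

        // 2 & 3. Scale by the learning rate and subtract from the parameters
        w -= learningRate * dCost_dw;
        b -= learningRate * dCost_db;
    }
}
```

Starting from w = 0, b = 0 with a learning rate of 0.05, each step shrinks the distance to the minimum by a factor of 0.9, so after 200 iterations w and b are essentially 2 and -1.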

This technique is called gradient descent (also known as steepest descent), and when we do the above, our error will decrease by a small amount. The only exception is that if we use too large of a learning rate, it’s possible that we make the error grow, but usually the error will decrease.

We will do the above over and over, until either the error is small enough, or we’ve decided we’ve tried enough iterations that we think the neural network is never going to learn the things we want to teach it. If the network doesn’t learn, it probably needs to be re-architected with a different structure, different numbers of neurons and layers, different activation functions, etc. This is part of the “art” that I mentioned earlier.

Before moving on, there is one last thing to talk about: global minimums vs local minimums.

Imagine that the function describing the error in our network is visualized as bumpy ground. When we initialize our weights and biases to random numbers we are basically just choosing a random location on the ground to start at. From there, we act like a ball, and just roll downhill from wherever we are. We are definitely going to get to the bottom of SOME bump / hole in the ground, but there is absolutely no reason to expect that we’ll get to the bottom of the DEEPEST bump / hole.

The problem is that gradient descent will find a LOCAL minimum – whatever we are closest to – but it might not find the GLOBAL minimum.

In practice, this doesn’t seem to be too large of a problem, at least for people casually using neural nets like you and me, but it is one of the active areas of research in neural networks: how do we do better at finding more global minimums?

You might notice the strange language I’m using where I say we have a function that describes the error, instead of just saying we use the error itself. The function I’m talking about is called the “cost function” and the reason for this is that different ways of describing the error give us different desirable properties.

For instance, a common cost function is to use mean squared error of the actual output compared to the desired output.

For a single training example, you plug the input into the network and calculate the output. You then plug the actual output and the target output into the function below:

$Cost = ||target-output||^2$

In other words, you take the vector of the neuron outputs, subtract it from the target output that we wanted, calculate the length of the resulting vector and square it. This gives you the squared error.

The reason we use squared error in the cost function is that this way error in either direction is a positive number, so when gradient descent does its work, we’ll find the smallest magnitude of error, regardless of whether it’s positive or negative amounts. We could use absolute value, but absolute value isn’t differentiable at zero, while squaring is differentiable everywhere.

To handle calculating the cost of multiple inputs and outputs, you just take the average of the squared error for each piece of training data. This gives you the mean squared error as the cost function across all inputs. You also average the derivatives to get the combined gradient.
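A minimal sketch of the cost described above, with hypothetical function names: the squared error of one example is the squared length of target minus output, and the mean squared error averages that over all the training examples.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Squared error for one training example: ||target - output||^2
float SquaredError(const std::vector<float>& target, const std::vector<float>& output)
{
    assert(target.size() == output.size());
    float sum = 0.0f;
    for (size_t i = 0; i < target.size(); ++i)
    {
        const float diff = target[i] - output[i];
        sum += diff * diff;
    }
    return sum;
}

// Mean squared error: the average of the squared errors over all training examples
float MeanSquaredError(const std::vector<std::vector<float>>& targets,
                       const std::vector<std::vector<float>>& outputs)
{
    assert(!targets.empty() && targets.size() == outputs.size());
    float total = 0.0f;
    for (size_t i = 0; i < targets.size(); ++i)
        total += SquaredError(targets[i], outputs[i]);
    return total / float(targets.size());
}
```

For example, targets {0} and {1} with outputs {0.5} and {0.5} give a mean squared error of 0.25.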

# More on Training

Before we go into backpropagation, I want to re-iterate this point: Neural Networks Learn Using Gradient Descent.

All you need is the gradient vector of the cost function, aka the partial derivatives of the cost with respect to all the weights and biases.

Backpropagation gets you the gradient vector, but it isn’t the only way to do so!

Another way to do it is to use dual numbers which you can read about on my post about them: Multivariable Dual Numbers & Automatic Differentiation.

Using dual numbers, you would evaluate the output of the network, using dual numbers instead of floating point numbers, and at the end you’d have your gradient vector. It’s not quite as efficient as backpropagation (or so I’ve heard, I haven’t tried it), but if you know how dual numbers work, it’s super easy to implement.

Another way to get the gradient vector is by doing so numerically using finite differences. You can read about numerical derivatives on my post here: Finite Differences

Basically, if you were trying to calculate the partial derivative of a weight, like $\frac{\partial Cost}{\partial Weight0}$, you would first calculate the cost of the network as usual, then you would add a small value to Weight0 and evaluate the cost again. You subtract the old cost from the new cost, and divide by the small value you added to Weight0. This gives you an approximation of the partial derivative for that weight value. You’d repeat this for all your weights and biases.
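A minimal C++ sketch of that finite difference approach for the small two-input network from earlier, assuming the single-example cost $(target - output)^2$. `Params`, `Cost` and `NumericalGradient` are hypothetical names, and epsilon = 1e-6 is an arbitrary choice. Note that it runs the network once more per parameter, which is exactly why this gets slow for big networks.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Parameters of the two-input network from earlier: Weight0, Weight1, Bias
using Params = std::array<double, 3>;

// Cost = (target - output)^2 for one training example, output = sigmoid(weighted input)
double Cost(const Params& p, double input0, double input1, double target)
{
    const double weightedInput = input0 * p[0] + input1 * p[1] + p[2];
    const double output = 1.0 / (1.0 + std::exp(-weightedInput));
    return (target - output) * (target - output);
}

// Forward finite difference: nudge one parameter, then (newCost - oldCost) / epsilon
Params NumericalGradient(const Params& p, double input0, double input1, double target,
                         double epsilon = 1e-6)
{
    assert(epsilon > 0.0);
    const double baseCost = Cost(p, input0, input1, target);
    Params gradient{};
    for (size_t i = 0; i < p.size(); ++i)
    {
        Params nudged = p;
        nudged[i] += epsilon;
        gradient[i] = (Cost(nudged, input0, input1, target) - baseCost) / epsilon;
    }
    return gradient;
}
```

For input (1, 0), target 0, weights 0.23 and -0.1 and bias 0.3, this gives about 0.2936 for both Weight0 and the bias, and exactly 0 for Weight1, since Input1 is 0 and nudging Weight1 can’t change the output.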

Since realistic neural networks often have MANY MANY weights and biases, calculating the gradient numerically is a REALLY REALLY slow process because of how many times you have to run the network to get cost values with adjusted weights. The only upside is that this method is even easier to implement than dual numbers. You can literally stop reading and go do this right now if you want to 😛

Lastly, there is a way to train neural networks which doesn’t use derivatives or the gradient vector, but instead uses the more brute force-ish method of genetic algorithms.

Using genetic algorithms to train neural networks is a huge topic even to summarize, but basically you create a bunch of random networks, see how they do, and try combining features of networks that did well. You also let some of the “losers” reproduce as well, and add in some random mutation to help stay out of local minimums. Repeat this for many many generations, and you can end up with a well trained network!

Here’s a fun video visualizing neural networks being trained by genetic algorithms: Youtube: Learning using a genetic algorithm on a neural network

# Backpropagation is Just the Chain Rule!

Going back to our talk of dual numbers for a second, dual numbers are useful for what is called “forward mode automatic differentiation”.

Backpropagation actually uses “reverse mode automatic differentiation”, so the two techniques are pretty closely tied, but they are both made possible by what is known as the chain rule.

The chain rule basically says that if you can write a derivative like this:

$dy/dx$

That you can also write it like this:

$dy/du*du/dx$

That might look weird or confusing, but since we know that derivatives are actual values, aka actual ratios, aka actual FRACTIONS, let’s think back to fractions for a moment.

$3/2 = 1.5$

So far so good? Now let’s choose some number out of the air – say, 5 – and do the same thing we did with the chain rule:

$3/2 = \\ 3/5 * 5/2 = \\ 15/10 = \\ 3/2 = \\ 1.5$

Due to doing the reverse of cross cancellation, we are able to inject multiplicative terms into fractions (and derivatives!) and come up with the same answer.

Ok, but who cares?

Well, when we are evaluating the output of a neural network for given input, we have lots of equations nested in each other. We have neurons feeding into neurons feeding into neurons etc, with the logistic activation function at each step.

Instead of trying to figure out how to calculate the derivatives of the weights and biases for the entire monster equation (it’s common to have hundreds or thousands of neurons or more!), we can instead calculate derivatives for each step we do when evaluating the network and then compose them together.

Basically, we can break the problem into small bites instead of having to deal with the equation in its entirety.

Instead of calculating the derivative of how a specific weight affects the cost directly, we can instead calculate these:

1. dCost/dOutput: The derivative of how a neuron’s output affects cost
2. dOutput/dWeightedInput: The derivative of how the weighted input of a neuron affects a neuron’s output
3. dWeightedInput/dWeight: The derivative of how a weight affects the weighted input of a neuron

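Then, when we multiply them all together, we get the value that we actually want:

$\frac{\partial Cost}{\partial Output} * \frac{\partial Output}{\partial WeightedInput} * \frac{\partial WeightedInput}{\partial Weight} = \frac{\partial Cost}{\partial Weight}$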
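As a sketch of those three factors for a single sigmoid neuron, assuming the cost 0.5 * (O - target)² used throughout this post, the hypothetical helper below computes each factor separately and multiplies them. A second helper checks the product against a finite-difference estimate.

```cpp
#include <cmath>

// The three chain-rule factors for one sigmoid neuron, assuming cost = 0.5 * (O - target)^2.
struct SNeuronChainRule
{
    float dCost_dOutput;          // O - target
    float dOutput_dWeightedInput; // O * (1 - O) for the logistic (sigmoid) activation
    float dWeightedInput_dWeight; // Z = input * weight + bias, so this is just the input
    float dCost_dWeight;          // the product of the three factors above
};

SNeuronChainRule ChainRuleForWeight(float input, float target, float weight, float bias)
{
    float Z = input * weight + bias;        // weighted input
    float O = 1.0f / (1.0f + std::exp(-Z)); // neuron output

    SNeuronChainRule r;
    r.dCost_dOutput = O - target;
    r.dOutput_dWeightedInput = O * (1.0f - O);
    r.dWeightedInput_dWeight = input;
    r.dCost_dWeight = r.dCost_dOutput * r.dOutput_dWeightedInput * r.dWeightedInput_dWeight;
    return r;
}

// Finite-difference estimate of dCost/dWeight, only used to check the chain rule result.
float NumericalDCostDWeight(float input, float target, float weight, float bias)
{
    auto cost = [&](float w) {
        float O = 1.0f / (1.0f + std::exp(-(input * w + bias)));
        return 0.5f * (O - target) * (O - target);
    };
    const float h = 1e-3f;
    return (cost(weight + h) - cost(weight - h)) / (2.0f * h);
}
```

Any input, target, weight and bias can be plugged in; the two results should agree to within the finite-difference error.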

Now that we understand all the basic parts of backpropagation, I think it’d be best to work through some examples of increasing complexity to see how it all actually fits together!

# Backpropagation Example 1: Single Neuron, One Training Example

This example takes one input and uses a single neuron to make one output. The neuron is only trained to output a 0 when given a 1 as input; all other behavior is undefined. This is implemented as the Example1() function in the sample code.
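The first training step of Example 1 can be sketched as follows, using the starting values from the sample code’s Example1() (input 1, target 0, weight 0.3, bias 0.5, learning rate 0.5). The struct and function names are illustrative and not taken from the sample code.

```cpp
#include <cmath>

// One gradient descent step for the single-neuron network of Example 1.
// Cost = 0.5 * (O - target)^2, with O = sigmoid(input * weight + bias).
struct SSingleNeuronStep
{
    float costBefore;
    float dCost_dWeight;
    float dCost_dBias;
    float newWeight;
    float newBias;
};

float SingleNeuronCost(float input, float target, float weight, float bias)
{
    float O = 1.0f / (1.0f + std::exp(-(input * weight + bias)));
    return 0.5f * (O - target) * (O - target);
}

SSingleNeuronStep TrainSingleNeuronOnce(float input, float target, float weight, float bias, float learningRate)
{
    float Z = input * weight + bias;
    float O = 1.0f / (1.0f + std::exp(-Z));
    float neuronError = (O - target) * O * (1.0f - O); // dCost/dZ, which is also dCost/dBias

    SSingleNeuronStep step;
    step.costBefore = 0.5f * (O - target) * (O - target);
    step.dCost_dBias = neuronError;
    step.dCost_dWeight = neuronError * input;          // dZ/dWeight = input
    step.newWeight = weight - learningRate * step.dCost_dWeight;
    step.newBias = bias - learningRate * step.dCost_dBias;
    return step;
}
```

With those starting values, both derivatives come out to about 0.1476, and evaluating SingleNeuronCost with the new weight and bias shows the cost going down from about 0.2380.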

# Backpropagation Example 2: Single Neuron, Two Training Examples

This time, we are going to teach it not only that it should output 0 when given a 1, but also that it should output 1 when given a 0.

We have two training examples, and we are training the neuron to act like a NOT gate. This is implemented as the Example2() function in the sample code.

The first thing we do is calculate the derivatives (gradient vector) for each of the training examples.

We already calculated the “input 1, output 0” derivatives in the last example:
$\frac{\partial Cost}{\partial Weight} = 0.1476 \\ \frac{\partial Cost}{\partial Bias} = 0.1476$

If we follow the same steps with the “input 0, output 1” training example we get these:
$\frac{\partial Cost}{\partial Weight} = 0.0 \\ \frac{\partial Cost}{\partial Bias} = -0.0887$

To get the actual derivatives to train the network with, we just average them!
$\frac{\partial Cost}{\partial Weight} = 0.0738 \\ \frac{\partial Cost}{\partial Bias} = 0.0294$

From there, we do the same adjustments as before to the weight and bias values to get a weight of 0.2631 and a bias of 0.4853.

If you are wondering how to calculate the cost, again you just take the cost of each training example and average them. Adjusting the weight and bias values causes the cost to drop from 0.1547 to 0.1515, so we have made progress.
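Averaging the per-example gradients and then taking a single step can be sketched like this. It is a minimal, hypothetical version of what one iteration of Example2() does; the names are illustrative.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Batch step: average the gradients of all training examples, then update once.
struct STrainingExample { float input; float target; };

struct SBatchStep
{
    float avgDCost_dWeight;
    float avgDCost_dBias;
    float avgCostBefore;
    float newWeight;
    float newBias;
};

static float Sigmoid(float z) { return 1.0f / (1.0f + std::exp(-z)); }

template <std::size_t N>
float AverageCost(const std::array<STrainingExample, N>& examples, float weight, float bias)
{
    float total = 0.0f;
    for (const STrainingExample& ex : examples)
    {
        float O = Sigmoid(ex.input * weight + bias);
        total += 0.5f * (O - ex.target) * (O - ex.target);
    }
    return total / float(N);
}

template <std::size_t N>
SBatchStep TrainBatchOnce(const std::array<STrainingExample, N>& examples, float weight, float bias, float learningRate)
{
    SBatchStep step{}; // zero-initialized accumulators
    for (const STrainingExample& ex : examples)
    {
        float O = Sigmoid(ex.input * weight + bias);
        float neuronError = (O - ex.target) * O * (1.0f - O); // dCost/dZ = dCost/dBias
        step.avgDCost_dBias += neuronError;
        step.avgDCost_dWeight += neuronError * ex.input;
    }
    step.avgDCost_dWeight /= float(N);
    step.avgDCost_dBias /= float(N);
    step.avgCostBefore = AverageCost(examples, weight, bias);
    step.newWeight = weight - learningRate * step.avgDCost_dWeight;
    step.newBias = bias - learningRate * step.avgDCost_dBias;
    return step;
}
```

Called with the two NOT-gate examples, a starting weight of 0.3, a bias of 0.5 and a learning rate of 0.5, it should reproduce the averaged derivatives, updated parameters and cost values above to about four decimal places.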

It takes 10 times as many iterations with these two training examples to get the same level of error as it did with only one training example though.

As we saw in the last example, after 10,000 iterations, the error was 0.007176.

In this example, after 100,000 iterations, the error is 0.007141. At that point, the weight is -9.879733 and the bias is 4.837278.

# Backpropagation Example 3: Two Neurons in One Layer

Here is the next example, implemented as Example3() in the sample code. Two input neurons feed into two neurons in a single layer, giving two outputs.

Let’s look at how we’d calculate the derivatives needed to train this network on the training example where an input of 01 should produce an output of 10.

First comes the forward pass where we calculate the network’s output when we give it 01 as input.

$Z0=input0*weight0+input1*weight1+bias0 \\ Z0=0*0.2+1*0.8+0.5 \\ Z0=1.3 \\ \\ O0=\sigma(1.3) \\ O0=0.7858\\ \\ Z1=input0*weight2+input1*weight3+bias1\\ Z1=0*0.6+1*0.4+0.1\\ Z1=0.5\\ \\ O1=\sigma(0.5)\\ O1=0.6225$

Next we calculate a cost. We don’t strictly need to do this step since we don’t use this value during backpropagation, but this will be useful to verify that we’ve improved things after an iteration of training.

$Cost=0.5*||target-actual||^2\\ Cost=0.5*||(1,0)-(0.7858,0.6225)||^2\\ Cost=0.5*||(0.2142,-0.6225)||^2\\ Cost=0.5*0.6583^2\\ Cost=0.2167$
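The forward pass and cost above can be sketched in a few lines of C++, mirroring the math in Example3RunNetwork(). The struct and function names here are illustrative.

```cpp
#include <array>
#include <cmath>

// Forward pass for Example 3: two inputs feeding two sigmoid neurons in one layer.
// weights[0], weights[1] feed neuron 0; weights[2], weights[3] feed neuron 1.
struct SLayerOutput
{
    std::array<float, 2> Z; // weighted inputs
    std::array<float, 2> O; // activations (the network's outputs)
};

SLayerOutput ForwardExample3(const std::array<float, 2>& input,
                             const std::array<float, 4>& weights,
                             const std::array<float, 2>& biases)
{
    SLayerOutput out;
    out.Z[0] = input[0] * weights[0] + input[1] * weights[1] + biases[0];
    out.Z[1] = input[0] * weights[2] + input[1] * weights[3] + biases[1];
    for (int i = 0; i < 2; ++i)
        out.O[i] = 1.0f / (1.0f + std::exp(-out.Z[i]));
    return out;
}

// Cost = 0.5 * ||target - actual||^2
float CostExample3(const std::array<float, 2>& target, const std::array<float, 2>& actual)
{
    float d0 = target[0] - actual[0];
    float d1 = target[1] - actual[1];
    return 0.5f * (d0 * d0 + d1 * d1);
}
```

With input (0, 1), weights (0.2, 0.8, 0.6, 0.4) and biases (0.5, 0.1), this gives Z0 = 1.3, Z1 = 0.5, outputs of roughly 0.7858 and 0.6225, and a cost of about 0.2167 against the target (1, 0).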

Now we begin the backwards pass to calculate the derivatives that we’ll need for training.

Let’s calculate dCost/dZ0 aka the error in neuron 0. We’ll do this by calculating dCost/dO0, then dO0/dZ0 and then multiplying them together to get dCost/dZ0. Just like before, this is also the derivative for the bias of the neuron, so this value is also dCost/dBias0.

$\frac{\partial Cost}{\partial O0}=O0-target0\\ \frac{\partial Cost}{\partial O0}=0.7858-1\\ \frac{\partial Cost}{\partial O0}=-0.2142\\ \\ \frac{\partial O0}{\partial Z0} = O0 * (1-O0)\\ \frac{\partial O0}{\partial Z0} = 0.7858 * 0.2142\\ \frac{\partial O0}{\partial Z0} = 0.1683\\ \\ \frac{\partial Cost}{\partial Z0} = \frac{\partial Cost}{\partial O0} * \frac{\partial O0}{\partial Z0}\\ \frac{\partial Cost}{\partial Z0} = -0.2142 * 0.1683\\ \frac{\partial Cost}{\partial Z0} = -0.0360\\ \\ \frac{\partial Cost}{\partial Bias0} = -0.0360$

We can use dCost/dZ0 to calculate dCost/dWeight0 and dCost/dWeight1 by multiplying it by dZ0/dWeight0 and dZ0/dWeight1, which are input0 and input1 respectively.

$\frac{\partial Cost}{\partial Weight0} = \frac{\partial Cost}{\partial Z0} * \frac{\partial Z0}{\partial Weight0} \\ \frac{\partial Cost}{\partial Weight0} = -0.0360 * 0 \\ \frac{\partial Cost}{\partial Weight0} = 0\\ \\ \frac{\partial Cost}{\partial Weight1} = \frac{\partial Cost}{\partial Z0} * \frac{\partial Z0}{\partial Weight1} \\ \frac{\partial Cost}{\partial Weight1} = -0.0360 * 1 \\ \frac{\partial Cost}{\partial Weight1} = -0.0360$

Next we need to calculate dCost/dZ1 aka the error in neuron 1. We’ll do this like before: calculate dCost/dO1, then dO1/dZ1, and then multiply them together to get dCost/dZ1. Again, this is also the derivative for the bias of the neuron, so this value is also dCost/dBias1.

$\frac{\partial Cost}{\partial O1}=O1-target1\\ \frac{\partial Cost}{\partial O1}=0.6225-0\\ \frac{\partial Cost}{\partial O1}=0.6225\\ \\ \frac{\partial O1}{\partial Z1} = O1 * (1-O1)\\ \frac{\partial O1}{\partial Z1} = 0.6225 * 0.3775\\ \frac{\partial O1}{\partial Z1} = 0.235\\ \\ \frac{\partial Cost}{\partial Z1} = \frac{\partial Cost}{\partial O1} * \frac{\partial O1}{\partial Z1}\\ \frac{\partial Cost}{\partial Z1} = 0.6225 * 0.235\\ \frac{\partial Cost}{\partial Z1} = 0.1463\\ \\ \frac{\partial Cost}{\partial Bias1} = 0.1463$

Just like with neuron 0, we can use dCost/dZ1 to calculate dCost/dWeight2 and dCost/dWeight3 by multiplying it by dZ1/dWeight2 and dZ1/dWeight3, which are input0 and input1 respectively.

$\frac{\partial Cost}{\partial Weight2} = \frac{\partial Cost}{\partial Z1} * \frac{\partial Z1}{\partial Weight2} \\ \frac{\partial Cost}{\partial Weight2} = 0.1463 * 0 \\ \frac{\partial Cost}{\partial Weight2} = 0\\ \\ \frac{\partial Cost}{\partial Weight3} = \frac{\partial Cost}{\partial Z1} * \frac{\partial Z1}{\partial Weight3} \\ \frac{\partial Cost}{\partial Weight3} = 0.1463 * 1 \\ \frac{\partial Cost}{\partial Weight3} = 0.1463$

After using these derivatives to update the weights and biases with a learning rate of 0.5, they become:

- Weight0 = 0.2
- Weight1 = 0.818
- Weight2 = 0.6
- Weight3 = 0.3269
- Bias0 = 0.518
- Bias1 = 0.0269

Using these values, the cost becomes 0.1943, which dropped from 0.2167, so we have indeed made progress with our learning!
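Putting the forward pass, the backward pass and the update together, one training step for this single-layer network can be sketched as below. It is a simplified, hypothetical version of one iteration that uses only the single “01 gives 10” training example, rather than the four patterns the sample code’s Example3() trains on.

```cpp
#include <array>
#include <cmath>

// Parameters of the Example 3 network (two inputs, two sigmoid neurons, one layer).
struct SExample3Params
{
    std::array<float, 4> weights; // weights[0..1] -> neuron 0, weights[2..3] -> neuron 1
    std::array<float, 2> biases;
};

static std::array<float, 2> Forward(const std::array<float, 2>& in, const SExample3Params& p)
{
    std::array<float, 2> O;
    for (int n = 0; n < 2; ++n)
    {
        float Z = in[0] * p.weights[2 * n] + in[1] * p.weights[2 * n + 1] + p.biases[n];
        O[n] = 1.0f / (1.0f + std::exp(-Z));
    }
    return O;
}

static float Cost(const std::array<float, 2>& target, const std::array<float, 2>& O)
{
    float d0 = target[0] - O[0];
    float d1 = target[1] - O[1];
    return 0.5f * (d0 * d0 + d1 * d1);
}

// Updates the parameters in place and returns the cost after the update.
float TrainExample3Once(const std::array<float, 2>& in, const std::array<float, 2>& target,
                        SExample3Params& p, float learningRate)
{
    std::array<float, 2> O = Forward(in, p);
    for (int n = 0; n < 2; ++n)
    {
        // dCost/dZn = (On - targetn) * On * (1 - On), which is also dCost/dBiasn
        float neuronError = (O[n] - target[n]) * O[n] * (1.0f - O[n]);
        p.weights[2 * n] -= learningRate * neuronError * in[0];     // dZn/dWeight = input0
        p.weights[2 * n + 1] -= learningRate * neuronError * in[1]; // dZn/dWeight = input1
        p.biases[n] -= learningRate * neuronError;
    }
    return Cost(target, Forward(in, p));
}
```

Starting from weights (0.2, 0.8, 0.6, 0.4), biases (0.5, 0.1) and a learning rate of 0.5, one call should land on the updated values listed above.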

Interestingly, it takes about twice as many training iterations as example 1 to get a similar level of error. In this case, 20,000 iterations of learning result in an error of 0.007142.

If we instead have the network learn the four patterns below (the first output is the OR of the two inputs, and the second output is their AND):

- 00 = 00
- 01 = 10
- 10 = 10
- 11 = 11

It takes 520,000 learning iterations to get to an error of 0.007223.

# Backpropagation Example 4: Two Layers, Two Neurons Each

This is the last example, implemented as Example4() in the sample code. Two input neurons feed into two neurons in a hidden layer, which feed into two neurons in the output layer, giving two outputs. This is the exact same network that is walked through on the page “A Step by Step Backpropagation Example”, which is also linked at the end of this post.

First comes the forward pass where we calculate the network’s output. We’ll give it 0.05 and 0.1 as input, and we’ll say our desired output is 0.01 and 0.99.

$Z0=input0*weight0+input1*weight1+bias0 \\ Z0=0.05*0.15+0.1*0.2+0.35 \\ Z0=0.3775 \\ \\ O0=\sigma(0.3775) \\ O0=0.5933 \\ \\ Z1=input0*weight2+input1*weight3+bias1\\ Z1=0.05*0.25+0.1*0.3+0.35\\ Z1=0.3925\\ \\ O1=\sigma(0.3925)\\ O1=0.5969\\ \\ Z2=O0*weight4+O1*weight5+bias2\\ Z2=0.5933*0.4+0.5969*0.45+0.6\\ Z2=1.106\\ \\ O2=\sigma(1.106)\\ O2=0.7514\\ \\ Z3=O0*weight6+O1*weight7+bias3\\ Z3=0.5933*0.5+0.5969*0.55+0.6\\ Z3=1.225\\ \\ O3=\sigma(1.225)\\ O3=0.7729$

Next we calculate the cost, taking O2 and O3 as our actual output, and 0.01 and 0.99 as our target (desired) output.

$Cost=0.5*||target-actual||^2\\ Cost=0.5*||(0.01,0.99)-(0.7514,0.7729)||^2\\ Cost=0.5*||(-0.7414,0.2171)||^2\\ Cost=0.5*0.7725^2\\ Cost=0.2984$
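The Example 4 forward pass and cost can be sketched the same way, with the weight layout following the post (weight0 and weight1 feed neuron 0, weight2 and weight3 feed neuron 1, and so on). The names are illustrative.

```cpp
#include <array>
#include <cmath>

// Forward pass for Example 4: 2 inputs -> 2 hidden sigmoid neurons -> 2 output sigmoid neurons.
// Layout: w0,w1 -> neuron 0; w2,w3 -> neuron 1; w4,w5 -> neuron 2; w6,w7 -> neuron 3.
struct SExample4Activations
{
    float O0, O1; // hidden layer
    float O2, O3; // output layer
};

static float Sigmoid(float z) { return 1.0f / (1.0f + std::exp(-z)); }

SExample4Activations ForwardExample4(const std::array<float, 2>& in,
                                     const std::array<float, 8>& w,
                                     const std::array<float, 4>& b)
{
    SExample4Activations a;
    a.O0 = Sigmoid(in[0] * w[0] + in[1] * w[1] + b[0]);
    a.O1 = Sigmoid(in[0] * w[2] + in[1] * w[3] + b[1]);
    a.O2 = Sigmoid(a.O0 * w[4] + a.O1 * w[5] + b[2]); // hidden outputs are the next layer's inputs
    a.O3 = Sigmoid(a.O0 * w[6] + a.O1 * w[7] + b[3]);
    return a;
}

// Cost = 0.5 * ||target - (O2, O3)||^2
float CostExample4(const std::array<float, 2>& target, const SExample4Activations& a)
{
    float d0 = target[0] - a.O2;
    float d1 = target[1] - a.O3;
    return 0.5f * (d0 * d0 + d1 * d1);
}
```

With the inputs (0.05, 0.1), the weights and biases from this example and the target (0.01, 0.99), it reproduces the activations and the cost of about 0.2984 computed above.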

Now we start the backward pass to calculate the derivatives for training.

## Neuron 2

First we’ll calculate dCost/dZ2 aka the error in neuron 2, remembering that the value is also dCost/dBias2.

$\frac{\partial Cost}{\partial O2}=O2-target0\\ \frac{\partial Cost}{\partial O2}=0.7514-0.01\\ \frac{\partial Cost}{\partial O2}=0.7414\\ \\ \frac{\partial O2}{\partial Z2} = O2 * (1-O2)\\ \frac{\partial O2}{\partial Z2} = 0.7514 * 0.2486\\ \frac{\partial O2}{\partial Z2} = 0.1868\\ \\ \frac{\partial Cost}{\partial Z2} = \frac{\partial Cost}{\partial O2} * \frac{\partial O2}{\partial Z2}\\ \frac{\partial Cost}{\partial Z2} = 0.7414 * 0.1868\\ \frac{\partial Cost}{\partial Z2} = 0.1385\\ \\ \frac{\partial Cost}{\partial Bias2} = 0.1385$

We can use dCost/dZ2 to calculate dCost/dWeight4 and dCost/dWeight5.

$\frac{\partial Cost}{\partial Weight4} = \frac{\partial Cost}{\partial Z2} * \frac{\partial Z2}{\partial Weight4}\\ \frac{\partial Cost}{\partial Weight4} = \frac{\partial Cost}{\partial Z2} * O0\\ \frac{\partial Cost}{\partial Weight4} = 0.1385 * 0.5933\\ \frac{\partial Cost}{\partial Weight4} = 0.0822\\ \\ \frac{\partial Cost}{\partial Weight5} = \frac{\partial Cost}{\partial Z2} * \frac{\partial Z2}{\partial Weight5}\\ \frac{\partial Cost}{\partial Weight5} = \frac{\partial Cost}{\partial Z2} * O1\\ \frac{\partial Cost}{\partial Weight5} = 0.1385 * 0.5969\\ \frac{\partial Cost}{\partial Weight5} = 0.0827\\$

## Neuron 3

Next we’ll calculate dCost/dZ3 aka the error in neuron 3, which is also dCost/dBias3.

$\frac{\partial Cost}{\partial O3}=O3-target1\\ \frac{\partial Cost}{\partial O3}=0.7729-0.99\\ \frac{\partial Cost}{\partial O3}=-0.2171\\ \\ \frac{\partial O3}{\partial Z3} = O3 * (1-O3)\\ \frac{\partial O3}{\partial Z3} = 0.7729 * 0.2271\\ \frac{\partial O3}{\partial Z3} = 0.1755\\ \\ \frac{\partial Cost}{\partial Z3} = \frac{\partial Cost}{\partial O3} * \frac{\partial O3}{\partial Z3}\\ \frac{\partial Cost}{\partial Z3} = -0.2171 * 0.1755\\ \frac{\partial Cost}{\partial Z3} = -0.0381\\ \\ \frac{\partial Cost}{\partial Bias3} = -0.0381$

We can use dCost/dZ3 to calculate dCost/dWeight6 and dCost/dWeight7.

$\frac{\partial Cost}{\partial Weight6} = \frac{\partial Cost}{\partial Z3} * \frac{\partial Z3}{\partial Weight6}\\ \frac{\partial Cost}{\partial Weight6} = \frac{\partial Cost}{\partial Z3} * O0\\ \frac{\partial Cost}{\partial Weight6} = -0.0381 * 0.5933\\ \frac{\partial Cost}{\partial Weight6} = -0.0226\\ \\ \frac{\partial Cost}{\partial Weight7} = \frac{\partial Cost}{\partial Z3} * \frac{\partial Z3}{\partial Weight7}\\ \frac{\partial Cost}{\partial Weight7} = \frac{\partial Cost}{\partial Z3} * O1\\ \frac{\partial Cost}{\partial Weight7} = -0.0381 * 0.5969\\ \frac{\partial Cost}{\partial Weight7} = -0.0227\\$

## Neuron 0

Next, we want to calculate dCost/dO0, but doing that requires us to do something new. Neuron 0 affects both neuron 2 and neuron 3, which means that it affects the cost through those two neurons as well. That means our calculation for dCost/dO0 is going to be slightly different, where we add the derivatives of both paths together. Let’s work through it:

$\frac{\partial Cost}{\partial O0} = \frac{\partial Cost}{\partial Z2} * \frac{\partial Z2}{\partial O0} + \frac{\partial Cost}{\partial Z3} * \frac{\partial Z3}{\partial O0}\\ \frac{\partial Cost}{\partial O0} = \frac{\partial Cost}{\partial Z2} * Weight4 + \frac{\partial Cost}{\partial Z3} * Weight6\\ \frac{\partial Cost}{\partial O0} = 0.1385 * 0.4 - 0.0381 * 0.5\\ \frac{\partial Cost}{\partial O0} = 0.0364$

We can then continue and calculate dCost/dZ0, which is also dCost/dBias0, and the error in neuron 0.

$\frac{\partial O0}{\partial Z0} = O0 * (1-O0)\\ \frac{\partial O0}{\partial Z0} = 0.5933 * 0.4067\\ \frac{\partial O0}{\partial Z0} = 0.2413\\ \\ \frac{\partial Cost}{\partial Z0} = \frac{\partial Cost}{\partial O0} * \frac{\partial O0}{\partial Z0}\\ \frac{\partial Cost}{\partial Z0} = 0.0364 * 0.2413\\ \frac{\partial Cost}{\partial Z0} = 0.0088\\ \\ \frac{\partial Cost}{\partial Bias0} = 0.0088$

We can use dCost/dZ0 to calculate dCost/dWeight0 and dCost/dWeight1.

$\frac{\partial Cost}{\partial Weight0} = \frac{\partial Cost}{\partial Z0} * \frac{\partial Z0}{\partial Weight0}\\ \frac{\partial Cost}{\partial Weight0} = \frac{\partial Cost}{\partial Z0} * input0\\ \frac{\partial Cost}{\partial Weight0} = 0.0088 * 0.05\\ \frac{\partial Cost}{\partial Weight0} = 0.0004\\ \\ \frac{\partial Cost}{\partial Weight1} = \frac{\partial Cost}{\partial Z0} * \frac{\partial Z0}{\partial Weight1}\\ \frac{\partial Cost}{\partial Weight1} = \frac{\partial Cost}{\partial Z0} * input1\\ \frac{\partial Cost}{\partial Weight1} = 0.0088 * 0.1\\ \frac{\partial Cost}{\partial Weight1} = 0.0009\\$

## Neuron 1

We are almost done, so hang in there. For our home stretch, we need to calculate dCost/dO1 the same way we calculated dCost/dO0, and then use that to calculate the derivatives of bias1, weight2 and weight3.

$\frac{\partial Cost}{\partial O1} = \frac{\partial Cost}{\partial Z2} * \frac{\partial Z2}{\partial O1} + \frac{\partial Cost}{\partial Z3} * \frac{\partial Z3}{\partial O1}\\ \frac{\partial Cost}{\partial O1} = \frac{\partial Cost}{\partial Z2} * Weight5 + \frac{\partial Cost}{\partial Z3} * Weight7\\ \frac{\partial Cost}{\partial O1} = 0.1385 * 0.45 - 0.0381 * 0.55\\ \frac{\partial Cost}{\partial O1} = 0.0414\\ \\ \frac{\partial O1}{\partial Z1} = O1 * (1-O1)\\ \frac{\partial O1}{\partial Z1} = 0.5969 * 0.4031\\ \frac{\partial O1}{\partial Z1} = 0.2406\\ \\ \frac{\partial Cost}{\partial Z1} = \frac{\partial Cost}{\partial O1} * \frac{\partial O1}{\partial Z1}\\ \frac{\partial Cost}{\partial Z1} = 0.0414 * 0.2406\\ \frac{\partial Cost}{\partial Z1} = 0.01\\ \\ \frac{\partial Cost}{\partial Bias1} = 0.01$

Lastly, we will use dCost/dZ1 to calculate dCost/dWeight2 and dCost/dWeight3.

$\frac{\partial Cost}{\partial Weight2} = \frac{\partial Cost}{\partial Z1} * \frac{\partial Z1}{\partial Weight2}\\ \frac{\partial Cost}{\partial Weight2} = \frac{\partial Cost}{\partial Z1} * input0\\ \frac{\partial Cost}{\partial Weight2} = 0.01 * 0.05\\ \frac{\partial Cost}{\partial Weight2} = 0.0005\\ \\ \frac{\partial Cost}{\partial Weight3} = \frac{\partial Cost}{\partial Z1} * \frac{\partial Z1}{\partial Weight3}\\ \frac{\partial Cost}{\partial Weight3} = \frac{\partial Cost}{\partial Z1} * input1\\ \frac{\partial Cost}{\partial Weight3} = 0.01 * 0.1\\ \frac{\partial Cost}{\partial Weight3} = 0.001\\$
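The “sum over both paths” step used for neurons 0 and 1 can be sketched as two small hypothetical helpers: one adds up each downstream neuron’s error times the weight connecting to it, and the other continues the chain rule through the sigmoid.

```cpp
// dCost/dO for a hidden neuron whose output feeds two output neurons (neurons 2 and 3).
// Each path contributes (error of the downstream neuron) * (weight on that connection),
// and the contributions are summed because the neuron affects the cost through both paths.
float HiddenDCostDOutput(float dCost_dZ2, float weightTo2, float dCost_dZ3, float weightTo3)
{
    return dCost_dZ2 * weightTo2 + dCost_dZ3 * weightTo3;
}

// Continue the chain rule to the hidden neuron's weighted input:
// dCost/dZ = dCost/dO * dO/dZ, with dO/dZ = O * (1 - O) for the sigmoid.
// The result is the neuron's error, which is also dCost/dBias for that neuron.
float HiddenNeuronError(float dCost_dO, float O)
{
    return dCost_dO * O * (1.0f - O);
}
```

For neuron 0, HiddenDCostDOutput(0.1385f, 0.4f, -0.0381f, 0.5f) gives about 0.0364, and feeding that into HiddenNeuronError with O0 = 0.5933 gives about 0.0088, matching the values above.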

## Backpropagation Done

Phew, we have all the derivatives we need now.

Here are our new weights and biases using a learning rate of 0.5:

- Weight0 = 0.15 - (0.5 * 0.0004) = 0.1498
- Weight1 = 0.2 - (0.5 * 0.0009) = 0.1996
- Weight2 = 0.25 - (0.5 * 0.0005) = 0.2498
- Weight3 = 0.3 - (0.5 * 0.001) = 0.2995
- Weight4 = 0.4 - (0.5 * 0.0822) = 0.3589
- Weight5 = 0.45 - (0.5 * 0.0827) = 0.4087
- Weight6 = 0.5 - (0.5 * -0.0226) = 0.5113
- Weight7 = 0.55 - (0.5 * -0.0227) = 0.5614
- Bias0 = 0.35 - (0.5 * 0.0088) = 0.3456
- Bias1 = 0.35 - (0.5 * 0.01) = 0.345
- Bias2 = 0.6 - (0.5 * 0.1385) = 0.5308
- Bias3 = 0.6 - (0.5 * -0.0381) = 0.6191

Using these new values, the cost function value drops from 0.2984 to 0.2805, so we have made progress!
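For reference, here is a compact sketch of one complete training step for the Example 4 network, covering the forward pass, backpropagation and the update at once. It follows the equations above rather than the exact structure of Example4RunNetwork(), and the struct and function names are hypothetical.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Layout as in the post: w0,w1 -> n0; w2,w3 -> n1 (hidden); w4,w5 -> n2; w6,w7 -> n3 (output).
struct SExample4Net
{
    std::array<float, 8> w;
    std::array<float, 4> b;
};

static float Sigmoid(float z) { return 1.0f / (1.0f + std::exp(-z)); }

// Cost 0.5 * ||target - output||^2 for the current parameters.
float Example4Cost(const SExample4Net& net, const std::array<float, 2>& in, const std::array<float, 2>& target)
{
    float O0 = Sigmoid(in[0] * net.w[0] + in[1] * net.w[1] + net.b[0]);
    float O1 = Sigmoid(in[0] * net.w[2] + in[1] * net.w[3] + net.b[1]);
    float O2 = Sigmoid(O0 * net.w[4] + O1 * net.w[5] + net.b[2]);
    float O3 = Sigmoid(O0 * net.w[6] + O1 * net.w[7] + net.b[3]);
    return 0.5f * ((target[0] - O2) * (target[0] - O2) + (target[1] - O3) * (target[1] - O3));
}

void Example4TrainOnce(SExample4Net& net, const std::array<float, 2>& in,
                       const std::array<float, 2>& target, float learningRate)
{
    // forward pass
    float O0 = Sigmoid(in[0] * net.w[0] + in[1] * net.w[1] + net.b[0]);
    float O1 = Sigmoid(in[0] * net.w[2] + in[1] * net.w[3] + net.b[1]);
    float O2 = Sigmoid(O0 * net.w[4] + O1 * net.w[5] + net.b[2]);
    float O3 = Sigmoid(O0 * net.w[6] + O1 * net.w[7] + net.b[3]);

    // output layer errors: dCost/dZ = (O - target) * O * (1 - O)
    float e2 = (O2 - target[0]) * O2 * (1.0f - O2);
    float e3 = (O3 - target[1]) * O3 * (1.0f - O3);

    // hidden layer errors: sum over both downstream paths, using the weights from before the update
    float e0 = (e2 * net.w[4] + e3 * net.w[6]) * O0 * (1.0f - O0);
    float e1 = (e2 * net.w[5] + e3 * net.w[7]) * O1 * (1.0f - O1);

    // dCost/dWeight = (error of the receiving neuron) * (value on the incoming connection)
    std::array<float, 8> dW = { e0 * in[0], e0 * in[1], e1 * in[0], e1 * in[1],
                                e2 * O0,    e2 * O1,    e3 * O0,    e3 * O1 };
    std::array<float, 4> dB = { e0, e1, e2, e3 }; // dCost/dBias is the neuron's error

    for (std::size_t i = 0; i < dW.size(); ++i)
        net.w[i] -= learningRate * dW[i];
    for (std::size_t i = 0; i < dB.size(); ++i)
        net.b[i] -= learningRate * dB[i];
}
```

The hidden-layer errors are deliberately computed with weight4 through weight7 from before the update; updating those weights first and then using them for the hidden layer would give slightly different, incorrect gradients.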

Interestingly, it only takes 5,000 iterations of learning for this network to reach an error of 0.007157, whereas it took 10,000 iterations of learning for example 1 to get to 0.007176.

Before moving on, take a look at the weight adjustments above. You might notice that the derivatives for weights 0, 1, 2 and 3 are much smaller than the derivatives for weights 4, 5, 6 and 7. That is because weights 0 through 3 appear earlier in the network, so their derivatives pass through more steps of the chain rule, and each sigmoid step multiplies in a factor of dO/dZ = O * (1 - O), which is at most 0.25 and gets close to 0 whenever the output is near 0 or 1 (the long, flat tails of the sigmoid). In this particular example, the small inputs of 0.05 and 0.1 shrink those derivatives further. The result is that earlier-layer neurons learn more slowly than later-layer neurons, which is called the “vanishing gradient problem”. The opposite effect can also happen, where earlier-layer gradients explode to huge numbers, so the more general term is the “unstable gradient problem”. How to address it is an active area of research, and it becomes more of a problem the more layers your network has.
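A rough illustration of the vanishing gradient effect, under the simplifying assumption of a chain of single sigmoid neurons that all share the same weighted input and connecting weight (a hypothetical setup, not one of the post’s examples): each neuron the error passes back through multiplies it by weight * sigma'(Z).

```cpp
#include <cmath>

// Derivative of the logistic sigmoid: sigma'(z) = sigma(z) * (1 - sigma(z)), at most 0.25 (at z = 0).
float SigmoidDerivative(float z)
{
    float s = 1.0f / (1.0f + std::exp(-z));
    return s * (1.0f - s);
}

// Hypothetical chain of single-neuron layers that all have weighted input z and connecting
// weight `weight`: returns the factor by which the gradient has been scaled after being
// propagated back through `layers` such neurons (one factor of weight * sigma'(z) each).
float GradientScaleThroughChain(int layers, float z, float weight)
{
    float scale = 1.0f;
    for (int i = 0; i < layers; ++i)
        scale *= weight * SigmoidDerivative(z);
    return scale;
}
```

With a weight of 1 and Z = 0, where the sigmoid’s slope is at its maximum of 0.25, ten layers already scale the gradient by 0.25^10, or about 9.5e-7. Weights large enough to make each factor bigger than 1 push things the other way, which is the exploding case.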

You can use other activation functions such as tanh, identity, ReLU and others to try to get around this problem. When trying a different activation function, the forward pass (evaluating the neural network) and the backpropagation pass stay the same, except that the calculation for getting O from Z changes, and so does the derivative deltaO/deltaZ. Everything else remains the same.
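Here is a sketch of that idea: only the function that computes O from Z and its derivative change per activation. The enum and function names are illustrative. The ReLU derivative is undefined at exactly Z = 0, and this sketch simply picks 0 there.

```cpp
#include <cmath>

// Activation functions O = f(Z) and their derivatives dO/dZ.
// Swapping activations only changes these two functions; the rest of backpropagation is unchanged.
enum class EActivation { Sigmoid, Tanh, ReLU, Identity };

float Activate(EActivation a, float Z)
{
    switch (a)
    {
        case EActivation::Sigmoid:  return 1.0f / (1.0f + std::exp(-Z));
        case EActivation::Tanh:     return std::tanh(Z);
        case EActivation::ReLU:     return Z > 0.0f ? Z : 0.0f;
        case EActivation::Identity: return Z;
    }
    return Z; // not reached; keeps compilers from warning about a missing return
}

// Written in terms of O where convenient, like O * (1 - O) for the sigmoid.
float ActivationDerivative(EActivation a, float Z, float O)
{
    switch (a)
    {
        case EActivation::Sigmoid:  return O * (1.0f - O);
        case EActivation::Tanh:     return 1.0f - O * O;
        case EActivation::ReLU:     return Z > 0.0f ? 1.0f : 0.0f; // 0 chosen at Z == 0
        case EActivation::Identity: return 1.0f;
    }
    return 1.0f; // not reached
}
```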

# Sample Code

Below is the sample code which implements all the backpropagation examples we worked through above.

Note that this code is meant to be readable and understandable. The code is not meant to be re-usable or highly efficient.

A more efficient implementation would use SIMD instructions, multithreading, stochastic gradient descent, and other things.

It’s also useful to note that calculating a neuron’s Z value is really a dot product plus an addition, and that the addition can be folded into the dot product by giving each neuron a “fake input” that is a constant 1, with the bias acting as that input’s weight. This lets you calculate a neuron’s Z value with a single dot product, which you can take further and combine into matrix operations to calculate the values of multiple neurons at once. You’ll often see neural networks described in matrix notation because of this, but I have avoided that in this post to try to make things clearer for programmers who may not be as comfortable thinking in strictly matrix notation.
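Before the full listing, here is a small illustrative sketch of the “fake input” idea, separate from the sample code: each neuron’s weights and bias are stored in one row, the input gets a constant 1 appended, and a whole layer’s Z values come from a matrix-vector product. The function names are hypothetical.

```cpp
#include <array>
#include <cstddef>

// Weighted input Z as a plain dot product.
template <std::size_t N>
float DotProduct(const std::array<float, N>& a, const std::array<float, N>& b)
{
    float sum = 0.0f;
    for (std::size_t i = 0; i < N; ++i)
        sum += a[i] * b[i];
    return sum;
}

// A whole layer as a matrix-vector product: each row holds one neuron's weights followed by its bias.
template <std::size_t Neurons, std::size_t Inputs>
std::array<float, Neurons> LayerWeightedInputs(
    const std::array<std::array<float, Inputs + 1>, Neurons>& weightsAndBiases,
    const std::array<float, Inputs>& input)
{
    // Append the constant 1 "fake input" so the last column of each row acts as the bias.
    std::array<float, Inputs + 1> augmented{};
    for (std::size_t i = 0; i < Inputs; ++i)
        augmented[i] = input[i];
    augmented[Inputs] = 1.0f;

    std::array<float, Neurons> Z{};
    for (std::size_t n = 0; n < Neurons; ++n)
        Z[n] = DotProduct(weightsAndBiases[n], augmented);
    return Z;
}
```

Using the hidden layer of Example 4 (rows {0.15, 0.2, 0.35} and {0.25, 0.3, 0.35}) with the input (0.05, 0.1) gives Z0 = 0.3775 and Z1 = 0.3925, the same values as the Example 4 forward pass above.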

#include <stdio.h>
#include <array>
#include <cmath> // std::exp, std::abs, std::sqrt

// Nonzero value enables csv logging.
#define LOG_TO_CSV_NUMSAMPLES() 50

// ===== Example 1 - One Neuron, One training Example =====

void Example1RunNetwork (
float input, float desiredOutput,
float weight, float bias,
float& error, float& cost, float& actualOutput,
float& deltaCost_deltaWeight, float& deltaCost_deltaBias, float& deltaCost_deltaInput
) {
// calculate Z (weighted input) and O (activation function of weighted input) for the neuron
float Z = input * weight + bias;
float O = 1.0f / (1.0f + std::exp(-Z));

// the actual output of the network is the activation of the neuron
actualOutput = O;

// calculate error
error = std::abs(desiredOutput - actualOutput);

// calculate cost
cost = 0.5f * error * error;

// calculate how much a change in neuron activation affects the cost function
// deltaCost/deltaO = O - target
float deltaCost_deltaO = O - desiredOutput;

// calculate how much a change in neuron weighted input affects neuron activation
// deltaO/deltaZ = O * (1 - O)
float deltaO_deltaZ = O * (1 - O);

// calculate how much a change in a neuron's weighted input affects the cost function.
// This is deltaCost/deltaZ, which equals deltaCost/deltaO * deltaO/deltaZ
// This is also deltaCost/deltaBias and is also referred to as the error of the neuron
float neuronError = deltaCost_deltaO * deltaO_deltaZ;
deltaCost_deltaBias = neuronError;

// calculate how much a change in the weight affects the cost function.
// deltaCost/deltaWeight = deltaCost/deltaO * deltaO/deltaZ * deltaZ/deltaWeight
// deltaCost/deltaWeight = neuronError * deltaZ/deltaWeight
// deltaCost/deltaWeight = neuronError * input
deltaCost_deltaWeight = neuronError * input;

// As a bonus, calculate how much a change in the input affects the cost function.
// Follows same logic as deltaCost/deltaWeight, but deltaZ/deltaInput is the weight.
// deltaCost/deltaInput = neuronError * weight
deltaCost_deltaInput = neuronError * weight;
}

void Example1 ()
{
#if LOG_TO_CSV_NUMSAMPLES() > 0
// open the csv file for this example
FILE *file = fopen("Example1.csv","w+t");
if (file != nullptr)
fprintf(file, "\"training index\",\"error\",\"cost\",\"weight\",\"bias\",\"dCost/dWeight\",\"dCost/dBias\",\"dCost/dInput\"\n");
#endif

// learning parameters for the network
const float c_learningRate = 0.5f;
const size_t c_numTrainings = 10000;

// training data
// input: 1, output: 0
const std::array<float, 2> c_trainingData = {1.0f, 0.0f};

// starting weight and bias values
float weight = 0.3f;
float bias = 0.5f;

// iteratively train the network
float error = 0.0f;
for (size_t trainingIndex = 0; trainingIndex < c_numTrainings; ++trainingIndex)
{
// run the network to get error and derivatives
float output = 0.0f;
float cost = 0.0f;
float deltaCost_deltaWeight = 0.0f;
float deltaCost_deltaBias = 0.0f;
float deltaCost_deltaInput = 0.0f;
Example1RunNetwork(c_trainingData[0], c_trainingData[1], weight, bias, error, cost, output, deltaCost_deltaWeight, deltaCost_deltaBias, deltaCost_deltaInput);

#if LOG_TO_CSV_NUMSAMPLES() > 0
const size_t trainingInterval = (c_numTrainings / (LOG_TO_CSV_NUMSAMPLES() - 1));
if (file != nullptr && (trainingIndex % trainingInterval == 0 || trainingIndex == c_numTrainings - 1))
{
// log to the csv
fprintf(file, "\"%zu\",\"%f\",\"%f\",\"%f\",\"%f\",\"%f\",\"%f\",\"%f\"\n", trainingIndex, error, cost, weight, bias, deltaCost_deltaWeight, deltaCost_deltaBias, deltaCost_deltaInput);
}
#endif

weight -= deltaCost_deltaWeight * c_learningRate;
bias -= deltaCost_deltaBias * c_learningRate;
}

printf("Example1 Final Error: %f\n", error);

#if LOG_TO_CSV_NUMSAMPLES() > 0
if (file != nullptr)
fclose(file);
#endif
}

// ===== Example 2 - One Neuron, Two training Examples =====

void Example2 ()
{
#if LOG_TO_CSV_NUMSAMPLES() > 0
// open the csv file for this example
FILE *file = fopen("Example2.csv","w+t");
if (file != nullptr)
fprintf(file, "\"training index\",\"error\",\"cost\",\"weight\",\"bias\",\"dCost/dWeight\",\"dCost/dBias\",\"dCost/dInput\"\n");
#endif

// learning parameters for the network
const float c_learningRate = 0.5f;
const size_t c_numTrainings = 100000;

// training data
// input: 1, output: 0
// input: 0, output: 1
const std::array<std::array<float, 2>, 2> c_trainingData = { {
{1.0f, 0.0f},
{0.0f, 1.0f}
} };

// starting weight and bias values
float weight = 0.3f;
float bias = 0.5f;

// iteratively train the network
float avgError = 0.0f;
for (size_t trainingIndex = 0; trainingIndex < c_numTrainings; ++trainingIndex)
{
avgError = 0.0f;
float avgOutput = 0.0f;
float avgCost = 0.0f;
float avgDeltaCost_deltaWeight = 0.0f;
float avgDeltaCost_deltaBias = 0.0f;
float avgDeltaCost_deltaInput = 0.0f;

// run the network to get error and derivatives for each training example
for (const std::array<float, 2>& trainingData : c_trainingData)
{
float error = 0.0f;
float output = 0.0f;
float cost = 0.0f;
float deltaCost_deltaWeight = 0.0f;
float deltaCost_deltaBias = 0.0f;
float deltaCost_deltaInput = 0.0f;
Example1RunNetwork(trainingData[0], trainingData[1], weight, bias, error, cost, output, deltaCost_deltaWeight, deltaCost_deltaBias, deltaCost_deltaInput);

avgError += error;
avgOutput += output;
avgCost += cost;
avgDeltaCost_deltaWeight += deltaCost_deltaWeight;
avgDeltaCost_deltaBias += deltaCost_deltaBias;
avgDeltaCost_deltaInput += deltaCost_deltaInput;
}

avgError /= (float)c_trainingData.size();
avgOutput /= (float)c_trainingData.size();
avgCost /= (float)c_trainingData.size();
avgDeltaCost_deltaWeight /= (float)c_trainingData.size();
avgDeltaCost_deltaBias /= (float)c_trainingData.size();
avgDeltaCost_deltaInput /= (float)c_trainingData.size();

#if LOG_TO_CSV_NUMSAMPLES() > 0
const size_t trainingInterval = (c_numTrainings / (LOG_TO_CSV_NUMSAMPLES() - 1));
if (file != nullptr && (trainingIndex % trainingInterval == 0 || trainingIndex == c_numTrainings - 1))
{
// log to the csv
fprintf(file, "\"%zu\",\"%f\",\"%f\",\"%f\",\"%f\",\"%f\",\"%f\",\"%f\"\n", trainingIndex, avgError, avgCost, weight, bias, avgDeltaCost_deltaWeight, avgDeltaCost_deltaBias, avgDeltaCost_deltaInput);
}
#endif

weight -= avgDeltaCost_deltaWeight * c_learningRate;
bias -= avgDeltaCost_deltaBias * c_learningRate;
}

printf("Example2 Final Error: %f\n", avgError);

#if LOG_TO_CSV_NUMSAMPLES() > 0
if (file != nullptr)
fclose(file);
#endif
}

// ===== Example 3 - Two inputs, two neurons in one layer =====

struct SExample3Training
{
std::array<float, 2> m_input;
std::array<float, 2> m_output;
};

void Example3RunNetwork (
const std::array<float, 2>& input, const std::array<float, 2>& desiredOutput,
const std::array<float, 4>& weights, const std::array<float, 2>& biases,
float& error, float& cost, std::array<float, 2>& actualOutput,
std::array<float, 4>& deltaCost_deltaWeights, std::array<float, 2>& deltaCost_deltaBiases, std::array<float, 2>& deltaCost_deltaInputs
) {

// calculate Z0 and O0 for neuron0
float Z0 = input[0] * weights[0] + input[1] * weights[1] + biases[0];
float O0 = 1.0f / (1.0f + std::exp(-Z0));

// calculate Z1 and O1 for neuron1
float Z1 = input[0] * weights[2] + input[1] * weights[3] + biases[1];
float O1 = 1.0f / (1.0f + std::exp(-Z1));

// the actual output of the network is the activation of the neurons
actualOutput[0] = O0;
actualOutput[1] = O1;

// calculate error
float diff0 = desiredOutput[0] - actualOutput[0];
float diff1 = desiredOutput[1] - actualOutput[1];
error = std::sqrt(diff0*diff0 + diff1*diff1);

// calculate cost
cost = 0.5f * error * error;

//----- Neuron 0 -----

// calculate how much a change in neuron 0 activation affects the cost function
// deltaCost/deltaO0 = O0 - target0
float deltaCost_deltaO0 = O0 - desiredOutput[0];

// calculate how much a change in neuron 0 weighted input affects neuron 0 activation
// deltaO0/deltaZ0 = O0 * (1 - O0)
float deltaO0_deltaZ0 = O0 * (1 - O0);

// calculate how much a change in neuron 0 weighted input affects the cost function.
// This is deltaCost/deltaZ0, which equals deltaCost/deltaO0 * deltaO0/deltaZ0
// This is also deltaCost/deltaBias0 and is also referred to as the error of neuron 0
float neuron0Error = deltaCost_deltaO0 * deltaO0_deltaZ0;
deltaCost_deltaBiases[0] = neuron0Error;

// calculate how much a change in weight0 affects the cost function.
// deltaCost/deltaWeight0 = deltaCost/deltaO0 * deltaO0/deltaZ0 * deltaZ0/deltaWeight0
// deltaCost/deltaWeight0 = neuron0Error * deltaZ0/deltaWeight0
// deltaCost/deltaWeight0 = neuron0Error * input0
// similar thing for weight1
deltaCost_deltaWeights[0] = neuron0Error * input[0];
deltaCost_deltaWeights[1] = neuron0Error * input[1];

//----- Neuron 1 -----

// calculate how much a change in neuron 1 activation affects the cost function
// deltaCost/deltaO1 = O1 - target1
float deltaCost_deltaO1 = O1 - desiredOutput[1];

// calculate how much a change in neuron 1 weighted input affects neuron 1 activation
// deltaO1/deltaZ1 = O1 * (1 - O1)
float deltaO1_deltaZ1 = O1 * (1 - O1);

// calculate how much a change in neuron 1 weighted input affects the cost function.
// This is deltaCost/deltaZ1, which equals deltaCost/deltaO1 * deltaO1/deltaZ1
// This is also deltaCost/deltaBias1 and is also referred to as the error of neuron 1
float neuron1Error = deltaCost_deltaO1 * deltaO1_deltaZ1;
deltaCost_deltaBiases[1] = neuron1Error;

// calculate how much a change in weight2 affects the cost function.
// deltaCost/deltaWeight2 = deltaCost/deltaO1 * deltaO1/deltaZ1 * deltaZ1/deltaWeight2
// deltaCost/deltaWeight2 = neuron1Error * deltaZ1/deltaWeight2
// deltaCost/deltaWeight2 = neuron1Error * input0
// similar thing for weight3
deltaCost_deltaWeights[2] = neuron1Error * input[0];
deltaCost_deltaWeights[3] = neuron1Error * input[1];

//----- Input -----

// As a bonus, calculate how much a change in the inputs affect the cost function.
// A complication here compared to Example1 and Example2 is that each input affects two neurons instead of only one.
// That means that...
// deltaCost/deltaInput0 = deltaCost/deltaZ0 * deltaZ0/deltaInput0 + deltaCost/deltaZ1 * deltaZ1/deltaInput0
//                       = neuron0Error * weight0 + neuron1Error * weight2
// and
// deltaCost/deltaInput1 = deltaCost/deltaZ0 * deltaZ0/deltaInput1 + deltaCost/deltaZ1 * deltaZ1/deltaInput1
//                       = neuron0Error * weight1 + neuron1Error * weight3
deltaCost_deltaInputs[0] = neuron0Error * weights[0] + neuron1Error * weights[2];
deltaCost_deltaInputs[1] = neuron0Error * weights[1] + neuron1Error * weights[3];
}

void Example3 ()
{
#if LOG_TO_CSV_NUMSAMPLES() > 0
// open the csv file for this example
FILE *file = fopen("Example3.csv","w+t");
if (file != nullptr)
fprintf(file, "\"training index\",\"error\",\"cost\"\n");
#endif

// learning parameters for the network
const float c_learningRate = 0.5f;
const size_t c_numTrainings = 520000;

// training data: OR/AND
// input: 00, output: 00
// input: 01, output: 10
// input: 10, output: 10
// input: 11, output: 11
const std::array<SExample3Training, 4> c_trainingData = { {
{{0.0f, 0.0f}, {0.0f, 0.0f}},
{{0.0f, 1.0f}, {1.0f, 0.0f}},
{{1.0f, 0.0f}, {1.0f, 0.0f}},
{{1.0f, 1.0f}, {1.0f, 1.0f}},
} };

// starting weight and bias values
std::array<float, 4> weights = { 0.2f, 0.8f, 0.6f, 0.4f };
std::array<float, 2> biases = { 0.5f, 0.1f };

// iteratively train the network
float avgError = 0.0f;
for (size_t trainingIndex = 0; trainingIndex < c_numTrainings; ++trainingIndex)
{
std::array<float, 2> avgOutput = { 0.0f, 0.0f };
std::array<float, 4> avgDeltaCost_deltaWeights = { 0.0f, 0.0f, 0.0f, 0.0f };
std::array<float, 2> avgDeltaCost_deltaBiases = { 0.0f, 0.0f };
std::array<float, 2> avgDeltaCost_deltaInputs = { 0.0f, 0.0f };
avgError = 0.0f;
float avgCost = 0.0f;

// run the network to get error and derivatives for each training example
for (const SExample3Training& trainingData : c_trainingData)
{
float error = 0.0f;
std::array<float, 2> output = { 0.0f, 0.0f };
float cost = 0.0f;
std::array<float, 4> deltaCost_deltaWeights = { 0.0f, 0.0f, 0.0f, 0.0f };
std::array<float, 2> deltaCost_deltaBiases = { 0.0f, 0.0f };
std::array<float, 2> deltaCost_deltaInputs = { 0.0f, 0.0f };
Example3RunNetwork(trainingData.m_input, trainingData.m_output, weights, biases, error, cost, output, deltaCost_deltaWeights, deltaCost_deltaBiases, deltaCost_deltaInputs);

avgError += error;
avgCost += cost;
for (size_t i = 0; i < avgOutput.size(); ++i)
avgOutput[i] += output[i];
for (size_t i = 0; i < avgDeltaCost_deltaWeights.size(); ++i)
avgDeltaCost_deltaWeights[i] += deltaCost_deltaWeights[i];
for (size_t i = 0; i < avgDeltaCost_deltaBiases.size(); ++i)
avgDeltaCost_deltaBiases[i] += deltaCost_deltaBiases[i];
for (size_t i = 0; i < avgDeltaCost_deltaInputs.size(); ++i)
avgDeltaCost_deltaInputs[i] += deltaCost_deltaInputs[i];
}

avgError /= (float)c_trainingData.size();
avgCost /= (float)c_trainingData.size();
for (size_t i = 0; i < avgOutput.size(); ++i)
avgOutput[i] /= (float)c_trainingData.size();
for (size_t i = 0; i < avgDeltaCost_deltaWeights.size(); ++i)
avgDeltaCost_deltaWeights[i] /= (float)c_trainingData.size();
for (size_t i = 0; i < avgDeltaCost_deltaBiases.size(); ++i)
avgDeltaCost_deltaBiases[i] /= (float)c_trainingData.size();
for (size_t i = 0; i < avgDeltaCost_deltaInputs.size(); ++i)
avgDeltaCost_deltaInputs[i] /= (float)c_trainingData.size();

#if LOG_TO_CSV_NUMSAMPLES() > 0
const size_t trainingInterval = (c_numTrainings / (LOG_TO_CSV_NUMSAMPLES() - 1));
if (file != nullptr && (trainingIndex % trainingInterval == 0 || trainingIndex == c_numTrainings - 1))
{
// log to the csv
fprintf(file, "\"%zu\",\"%f\",\"%f\"\n", trainingIndex, avgError, avgCost);
}
#endif

for (size_t i = 0; i < weights.size(); ++i)
weights[i] -= avgDeltaCost_deltaWeights[i] * c_learningRate;
for (size_t i = 0; i < biases.size(); ++i)
biases[i] -= avgDeltaCost_deltaBiases[i] * c_learningRate;
}

printf("Example3 Final Error: %f\n", avgError);

#if LOG_TO_CSV_NUMSAMPLES() > 0
if (file != nullptr)
fclose(file);
#endif
}

// ===== Example 4 - Two layers with two neurons in each layer =====

void Example4RunNetwork (
const std::array<float, 2>& input, const std::array<float, 2>& desiredOutput,
const std::array<float, 8>& weights, const std::array<float, 4>& biases,
float& error, float& cost, std::array<float, 2>& actualOutput,
std::array<float, 8>& deltaCost_deltaWeights, std::array<float, 4>& deltaCost_deltaBiases, std::array<float, 2>& deltaCost_deltaInputs
) {
// calculate Z0 and O0 for neuron0
float Z0 = input[0] * weights[0] + input[1] * weights[1] + biases[0];
float O0 = 1.0f / (1.0f + std::exp(-Z0));

// calculate Z1 and O1 for neuron1
float Z1 = input[0] * weights[2] + input[1] * weights[3] + biases[1];
float O1 = 1.0f / (1.0f + std::exp(-Z1));

// calculate Z2 and O2 for neuron2
float Z2 = O0 * weights[4] + O1 * weights[5] + biases[2];
float O2 = 1.0f / (1.0f + std::exp(-Z2));

// calculate Z3 and O3 for neuron3
float Z3 = O0 * weights[6] + O1 * weights[7] + biases[3];
float O3 = 1.0f / (1.0f + std::exp(-Z3));

// the actual output of the network is the activation of the output layer neurons
actualOutput[0] = O2;
actualOutput[1] = O3;

// calculate error
float diff0 = desiredOutput[0] - actualOutput[0];
float diff1 = desiredOutput[1] - actualOutput[1];
error = std::sqrt(diff0*diff0 + diff1*diff1);

// calculate cost
cost = 0.5f * error * error;

//----- Neuron 2 -----

// calculate how much a change in neuron 2 activation affects the cost function
// deltaCost/deltaO2 = O2 - target0
float deltaCost_deltaO2 = O2 - desiredOutput[0];

// calculate how much a change in neuron 2 weighted input affects neuron 2 activation
// deltaO2/deltaZ2 = O2 * (1 - O2)
float deltaO2_deltaZ2 = O2 * (1 - O2);

// calculate how much a change in neuron 2 weighted input affects the cost function.
// This is deltaCost/deltaZ2, which equals deltaCost/deltaO2 * deltaO2/deltaZ2
// This is also deltaCost/deltaBias2 and is also referred to as the error of neuron 2
float neuron2Error = deltaCost_deltaO2 * deltaO2_deltaZ2;
deltaCost_deltaBiases[2] = neuron2Error;

// calculate how much a change in weight4 affects the cost function.
// deltaCost/deltaWeight4 = deltaCost/deltaO2 * deltaO2/deltaZ2 * deltaZ2/deltaWeight4
// deltaCost/deltaWeight4 = neuron2Error * deltaZ2/deltaWeight4
// deltaCost/deltaWeight4 = neuron2Error * O0
// similar thing for weight5
deltaCost_deltaWeights[4] = neuron2Error * O0;
deltaCost_deltaWeights[5] = neuron2Error * O1;

//----- Neuron 3 -----

// calculate how much a change in neuron 3 activation affects the cost function
// deltaCost/deltaO3 = O3 - target1
float deltaCost_deltaO3 = O3 - desiredOutput[1];

// calculate how much a change in neuron 3 weighted input affects neuron 3 activation
// deltaO3/deltaZ3 = O3 * (1 - O3)
float deltaO3_deltaZ3 = O3 * (1 - O3);

// calculate how much a change in neuron 3 weighted input affects the cost function.
// This is deltaCost/deltaZ3, which equals deltaCost/deltaO3 * deltaO3/deltaZ3
// This is also deltaCost/deltaBias3 and is also referred to as the error of neuron 3
float neuron3Error = deltaCost_deltaO3 * deltaO3_deltaZ3;
deltaCost_deltaBiases[3] = neuron3Error;

// calculate how much a change in weight6 affects the cost function.
// deltaCost/deltaWeight6 = deltaCost/deltaO3 * deltaO3/deltaZ3 * deltaZ3/deltaWeight6
// deltaCost/deltaWeight6 = neuron3Error * deltaZ3/deltaWeight6
// deltaCost/deltaWeight6 = neuron3Error * O0
// similar thing for weight7
deltaCost_deltaWeights[6] = neuron3Error * O0;
deltaCost_deltaWeights[7] = neuron3Error * O1;

//----- Neuron 0 -----

// calculate how much a change in neuron 0 activation affects the cost function
// deltaCost/deltaO0 = deltaCost/deltaZ2 * deltaZ2/deltaO0 + deltaCost/deltaZ3 * deltaZ3/deltaO0
// deltaCost/deltaO0 = neuron2Error * weight4 + neuron3Error * weight6
float deltaCost_deltaO0 = neuron2Error * weights[4] + neuron3Error * weights[6];

// calculate how much a change in neuron 0 weighted input affects neuron 0 activation
// deltaO0/deltaZ0 = O0 * (1 - O0)
float deltaO0_deltaZ0 = O0 * (1 - O0);

// calculate how much a change in neuron 0 weighted input affects the cost function.
// This is deltaCost/deltaZ0, which equals deltaCost/deltaO0 * deltaO0/deltaZ0
// This is also deltaCost/deltaBias0 and is also referred to as the error of neuron 0
float neuron0Error = deltaCost_deltaO0 * deltaO0_deltaZ0;
deltaCost_deltaBiases[0] = neuron0Error;

// calculate how much a change in weight0 affects the cost function.
// deltaCost/deltaWeight0 = deltaCost/deltaO0 * deltaO0/deltaZ0 * deltaZ0/deltaWeight0
// deltaCost/deltaWeight0 = neuron0Error * deltaZ0/deltaWeight0
// deltaCost/deltaWeight0 = neuron0Error * input0
// similar thing for weight1
deltaCost_deltaWeights[0] = neuron0Error * input[0];
deltaCost_deltaWeights[1] = neuron0Error * input[1];

//----- Neuron 1 -----

// calculate how much a change in neuron 1 activation affects the cost function
// deltaCost/deltaO1 = deltaCost/deltaZ2 * deltaZ2/deltaO1 + deltaCost/deltaZ3 * deltaZ3/deltaO1
// deltaCost/deltaO1 = neuron2Error * weight5 + neuron3Error * weight7
float deltaCost_deltaO1 = neuron2Error * weights[5] + neuron3Error * weights[7];

// calculate how much a change in neuron 1 weighted input affects neuron 1 activation
// deltaO1/deltaZ1 = O1 * (1 - O1)
float deltaO1_deltaZ1 = O1 * (1 - O1);

// calculate how much a change in neuron 1 weighted input affects the cost function.
// This is deltaCost/deltaZ1, which equals deltaCost/deltaO1 * deltaO1/deltaZ1
// This is also deltaCost/deltaBias1 and is also referred to as the error of neuron 1
float neuron1Error = deltaCost_deltaO1 * deltaO1_deltaZ1;
deltaCost_deltaBiases[1] = neuron1Error;

// calculate how much a change in weight2 affects the cost function.
// deltaCost/deltaWeight2 = deltaCost/deltaO1 * deltaO1/deltaZ1 * deltaZ1/deltaWeight2
// deltaCost/deltaWeight2 = neuron1Error * deltaZ1/deltaWeight2
// deltaCost/deltaWeight2 = neuron1Error * input0
// similar thing for weight3
deltaCost_deltaWeights[2] = neuron1Error * input[0];
deltaCost_deltaWeights[3] = neuron1Error * input[1];

//----- Input -----

// As a bonus, calculate how much a change in the inputs affects the cost function.
// A complication here compared to Example1 and Example2 is that each input affects two neurons instead of only one.
// That means that...
// deltaCost/deltaInput0 = deltaCost/deltaZ0 * deltaZ0/deltaInput0 + deltaCost/deltaZ1 * deltaZ1/deltaInput0
//                       = neuron0Error * weight0 + neuron1Error * weight2
// and
// deltaCost/deltaInput1 = deltaCost/deltaZ0 * deltaZ0/deltaInput1 + deltaCost/deltaZ1 * deltaZ1/deltaInput1
//                       = neuron0Error * weight1 + neuron1Error * weight3
deltaCost_deltaInputs[0] = neuron0Error * weights[0] + neuron1Error * weights[2];
deltaCost_deltaInputs[1] = neuron0Error * weights[1] + neuron1Error * weights[3];
}

void Example4 ()
{
#if LOG_TO_CSV_NUMSAMPLES() > 0
// open the csv file for this example
FILE *file = fopen("Example4.csv","w+t");
if (file != nullptr)
fprintf(file, "\"training index\",\"error\",\"cost\"\n");
#endif

// learning parameters for the network
const float c_learningRate = 0.5f;
const size_t c_numTrainings = 5000;

// training data: 0.05, 0.1 in = 0.01, 0.99 out
const std::array<SExample3Training, 1> c_trainingData = { {
{{0.05f, 0.1f}, {0.01f, 0.99f}},
} };

// starting weight and bias values
std::array<float, 8> weights = { 0.15f, 0.2f, 0.25f, 0.3f, 0.4f, 0.45f, 0.5f, 0.55f};
std::array<float, 4> biases = { 0.35f, 0.35f, 0.6f, 0.6f };

// iteratively train the network
float avgError = 0.0f;
for (size_t trainingIndex = 0; trainingIndex < c_numTrainings; ++trainingIndex)
{
std::array<float, 2> avgOutput = { 0.0f, 0.0f };
std::array<float, 8> avgDeltaCost_deltaWeights = { 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f };
std::array<float, 4> avgDeltaCost_deltaBiases = { 0.0f, 0.0f, 0.0f, 0.0f };
std::array<float, 2> avgDeltaCost_deltaInputs = { 0.0f, 0.0f };
avgError = 0.0f;
float avgCost = 0.0f;

// run the network to get error and derivatives for each training example
for (const SExample3Training& trainingData : c_trainingData)
{
float error = 0.0f;
std::array<float, 2> output = { 0.0f, 0.0f };
float cost = 0.0f;
std::array<float, 8> deltaCost_deltaWeights = { 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f };
std::array<float, 4> deltaCost_deltaBiases = { 0.0f, 0.0f, 0.0f, 0.0f };
std::array<float, 2> deltaCost_deltaInputs = { 0.0f, 0.0f };
Example4RunNetwork(trainingData.m_input, trainingData.m_output, weights, biases, error, cost, output, deltaCost_deltaWeights, deltaCost_deltaBiases, deltaCost_deltaInputs);

avgError += error;
avgCost += cost;
for (size_t i = 0; i < avgOutput.size(); ++i)
avgOutput[i] += output[i];
for (size_t i = 0; i < avgDeltaCost_deltaWeights.size(); ++i)
avgDeltaCost_deltaWeights[i] += deltaCost_deltaWeights[i];
for (size_t i = 0; i < avgDeltaCost_deltaBiases.size(); ++i)
avgDeltaCost_deltaBiases[i] += deltaCost_deltaBiases[i];
for (size_t i = 0; i < avgDeltaCost_deltaInputs.size(); ++i)
avgDeltaCost_deltaInputs[i] += deltaCost_deltaInputs[i];
}

avgError /= (float)c_trainingData.size();
avgCost /= (float)c_trainingData.size();
for (size_t i = 0; i < avgOutput.size(); ++i)
avgOutput[i] /= (float)c_trainingData.size();
for (size_t i = 0; i < avgDeltaCost_deltaWeights.size(); ++i)
avgDeltaCost_deltaWeights[i] /= (float)c_trainingData.size();
for (size_t i = 0; i < avgDeltaCost_deltaBiases.size(); ++i)
avgDeltaCost_deltaBiases[i] /= (float)c_trainingData.size();
for (size_t i = 0; i < avgDeltaCost_deltaInputs.size(); ++i)
avgDeltaCost_deltaInputs[i] /= (float)c_trainingData.size();

#if LOG_TO_CSV_NUMSAMPLES() > 0
const size_t trainingInterval = (c_numTrainings / (LOG_TO_CSV_NUMSAMPLES() - 1));
if (file != nullptr && (trainingIndex % trainingInterval == 0 || trainingIndex == c_numTrainings - 1))
{
// log to the csv
fprintf(file, "\"%zu\",\"%f\",\"%f\"\n", trainingIndex, avgError, avgCost);
}
#endif

for (size_t i = 0; i < weights.size(); ++i)
weights[i] -= avgDeltaCost_deltaWeights[i] * c_learningRate;
for (size_t i = 0; i < biases.size(); ++i)
biases[i] -= avgDeltaCost_deltaBiases[i] * c_learningRate;
}

printf("Example4 Final Error: %f\n", avgError);

#if LOG_TO_CSV_NUMSAMPLES() > 0
if (file != nullptr)
fclose(file);
#endif
}

int main (int argc, char **argv)
{
Example1();
Example2();
Example3();
Example4();
system("pause");
return 0;
}
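A common way to sanity check hand-derived backpropagation code like Example4RunNetwork is a gradient check: compare each backpropagated derivative against a central-difference estimate of the same derivative. The sketch below is a hypothetical standalone helper, not part of the program above. It re-implements the same two-layer sigmoid network with the same weight and bias layout, but in double precision so the numeric estimate stays accurate, and it only checks the weight derivatives. The helper names and the step size `h` are assumptions chosen for illustration.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical gradient-check helpers (not part of the original program).
// Same network and weight/bias layout as Example4RunNetwork, but in double precision.
inline double Sigmoid (double z) { return 1.0 / (1.0 + std::exp(-z)); }

// Cost = 0.5 * error^2 for one training example, as in Example4RunNetwork.
double Example4Cost (const std::array<double, 2>& input, const std::array<double, 2>& target,
                     const std::array<double, 8>& w, const std::array<double, 4>& b)
{
    double O0 = Sigmoid(input[0] * w[0] + input[1] * w[1] + b[0]);
    double O1 = Sigmoid(input[0] * w[2] + input[1] * w[3] + b[1]);
    double O2 = Sigmoid(O0 * w[4] + O1 * w[5] + b[2]);
    double O3 = Sigmoid(O0 * w[6] + O1 * w[7] + b[3]);
    double diff0 = target[0] - O2;
    double diff1 = target[1] - O3;
    return 0.5 * (diff0 * diff0 + diff1 * diff1);
}

// Backpropagated dCost/dWeights, using the same neuron error terms as Example4RunNetwork.
std::array<double, 8> Example4WeightGradient (const std::array<double, 2>& input, const std::array<double, 2>& target,
                                              const std::array<double, 8>& w, const std::array<double, 4>& b)
{
    double O0 = Sigmoid(input[0] * w[0] + input[1] * w[1] + b[0]);
    double O1 = Sigmoid(input[0] * w[2] + input[1] * w[3] + b[1]);
    double O2 = Sigmoid(O0 * w[4] + O1 * w[5] + b[2]);
    double O3 = Sigmoid(O0 * w[6] + O1 * w[7] + b[3]);
    double neuron2Error = (O2 - target[0]) * O2 * (1.0 - O2);
    double neuron3Error = (O3 - target[1]) * O3 * (1.0 - O3);
    double neuron0Error = (neuron2Error * w[4] + neuron3Error * w[6]) * O0 * (1.0 - O0);
    double neuron1Error = (neuron2Error * w[5] + neuron3Error * w[7]) * O1 * (1.0 - O1);
    return { neuron0Error * input[0], neuron0Error * input[1],
             neuron1Error * input[0], neuron1Error * input[1],
             neuron2Error * O0, neuron2Error * O1,
             neuron3Error * O0, neuron3Error * O1 };
}

// Central-difference estimate of dCost/dWeight[index].
double NumericWeightDerivative (const std::array<double, 2>& input, const std::array<double, 2>& target,
                                const std::array<double, 8>& w, const std::array<double, 4>& b,
                                std::size_t index, double h = 1e-5)
{
    assert(index < w.size());
    std::array<double, 8> wPlus = w;
    std::array<double, 8> wMinus = w;
    wPlus[index] += h;
    wMinus[index] -= h;
    return (Example4Cost(input, target, wPlus, b) - Example4Cost(input, target, wMinus, b)) / (2.0 * h);
}
```

With Example 4’s starting weights, biases and training example, each entry of `Example4WeightGradient` should agree closely with `NumericWeightDerivative` for the same index. A large mismatch usually points at a wrong index or a missing term in the chain rule.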


The sample code writes CSV files that record each network’s error and cost as training progresses, mainly so that I can show you how the error changes over time.

Below is example 4’s error over time, across its 5,000 training iterations.

The other examples show a similarly shaped graph: a lot of learning happens at the very beginning, followed by a very long tail of very slow learning.

When you train neural networks as I’ve described them, you will almost always see this, and sometimes you will also see slow learning at the BEGINNING of training.

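This issue is also caused by the activation function, just like the unstable gradient problem: the sigmoid’s derivative, $O(1-O)$, becomes very small whenever a neuron’s output is close to 0 or 1, so the derivatives of the cost (and therefore the weight updates) become very small too. This is also an active area of research.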

To help fix this issue, there is something called a “cross entropy cost function” which you can use instead of the mean squared error cost function I have been using.

That cost function essentially cancels out the nonlinearity of the activation function. For a sigmoid output neuron, the neuron’s error becomes simply $O - \text{target}$ instead of $(O - \text{target}) * O(1-O)$, so learning no longer slows down when an output neuron saturates, and networks can learn more quickly and evenly. However, it only cancels out the nonlinearity for the LAST layer in the network, so slow learning is still a problem for networks that have more layers.
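For a single sigmoid output neuron, the difference between the two cost functions can be sketched as below. The helper names are hypothetical, and the cross entropy used is assumed to be the standard binary form $-(t\ln O + (1-t)\ln(1-O))$. Each function returns dCost/dZ, the "error" of the output neuron in the terminology of the code above.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers for a single sigmoid output neuron with weighted input z.
inline double Sigmoid (double z) { return 1.0 / (1.0 + std::exp(-z)); }

// Cost = 0.5 * (O - target)^2, as used in this post:
// dCost/dZ = dCost/dO * dO/dZ = (O - target) * O * (1 - O)
double MseDeltaCost_DeltaZ (double z, double target)
{
    const double O = Sigmoid(z);
    return (O - target) * O * (1.0 - O);
}

// Binary cross entropy cost = -(target * ln(O) + (1 - target) * ln(1 - O)):
// dCost/dO = (O - target) / (O * (1 - O)), so the O * (1 - O) factor cancels
// and dCost/dZ = O - target.
double CrossEntropyDeltaCost_DeltaZ (double z, double target)
{
    assert(target >= 0.0 && target <= 1.0);
    return Sigmoid(z) - target;
}
```

For a saturated neuron, for example `z = 5` (output about 0.993) with a target of 0, the first function returns roughly 0.0066 while the second returns roughly 0.993, which is the slow-learning difference described above.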

Lastly, there is an entirely different thing you can use backpropagation for. We adjusted the weights and biases to get our desired output for the desired inputs. What if instead we adjusted our inputs to give us the desired outputs?

You can do that by using backpropagation to calculate the dCost/dInput derivatives and using those to adjust the input, in the exact same way we adjusted the weights and biases.
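Adjusting the inputs can be sketched with a single sigmoid neuron whose weights and bias are already trained and held fixed. `SNeuron`, `NeuronOutput`, `AdjustInput` and all numeric values are hypothetical, chosen only for illustration. The derivative used is the same dCost/dInput formula as in Example4RunNetwork (neuron error times weight), just for one neuron.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical "already trained" neuron: two inputs, fixed weights and bias.
struct SNeuron
{
    std::array<double, 2> weights;
    double bias;
};

double NeuronOutput (const SNeuron& neuron, const std::array<double, 2>& input)
{
    const double Z = input[0] * neuron.weights[0] + input[1] * neuron.weights[1] + neuron.bias;
    return 1.0 / (1.0 + std::exp(-Z));
}

// Gradient descent on the INPUT, with the weights and bias held fixed.
// Cost = 0.5 * (O - target)^2, so dCost/dInput_i = neuronError * weight_i,
// where neuronError = (O - target) * O * (1 - O).
std::array<double, 2> AdjustInput (const SNeuron& neuron, std::array<double, 2> input,
                                   double target, double learningRate, int numSteps)
{
    assert(learningRate > 0.0);
    for (int step = 0; step < numSteps; ++step)
    {
        const double O = NeuronOutput(neuron, input);
        const double neuronError = (O - target) * O * (1.0 - O);
        for (std::size_t i = 0; i < input.size(); ++i)
            input[i] -= neuronError * neuron.weights[i] * learningRate;
    }
    return input;
}
```

For example, starting from an input of (0, 0) with weights (0.4, -0.6) and bias 0.1, a few thousand steps at a learning rate of 1 move the input until the output is about 0.9. Flipping the sign of the update (gradient ascent) would instead push the output away from the target.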

You can use this to do some interesting things, including:

1. Finding images that a network will recognize as a familiar object, but that a human wouldn’t. Start with static as input to the network, and adjust the inputs to give the desired output.
2. Modifying images that a network recognizes into images it doesn’t recognize, but a human would. Start with a well recognized image, and adjust the inputs using gradient ASCENT (add the derivatives, don’t subtract them) until the network stops recognizing it.

Believe it or not, this is also how those creepy “deep dream” images that came out of Google were made, like the one below.

Now that you know the basics, you are ready to learn some more if you are interested. If you still have some questions about things I did or didn’t talk about, these resources might help you make sense of it too. I used these resources and they were all very helpful! You can also give me a shout in the comments below, or on Twitter at @Atrix256.

# Multivariable Dual Numbers & Automatic Differentiation

In a previous post I showed how to use dual numbers to be able to get both the value and derivative of a function at the same time:
Dual Numbers & Automatic Differentiation

That post mentions that you can extend it to multivariable functions but doesn’t explain how. This post is that explanation, including simple working C++ code!

Extending this to multivariable functions is useful for ray marching and for calculating analytical surface normals, and is also likely useful for training a neural network if you want an alternative to backpropagation. I’m not sure how efficient this is compared to backpropagation, but I intend to look into it (:

# How Does it Work?

It turns out to be really simple to use dual numbers with multivariable functions. In the end you want a partial derivative with respect to each variable, so you just evaluate the entire equation once per variable: in each pass, the variable you are differentiating with respect to gets a dual part of 1, and every other variable gets a dual part of 0.

We’ll work through an example. Let’s find the partial derivatives of the function $3x^2-2y^3$ with respect to x and y, at input (5,2).

We’ll start by finding the partial derivative with respect to x, and then the one with respect to y.

# Example: df/dx

We start by making a dual number for our x value, remembering that the real part is the actual value of x, and the dual part is the derivative of x with respect to itself, which is 1:

$5+1\epsilon$

or:

$5+\epsilon$

We multiply that value by itself to get the $x^2$ value, keeping in mind that $\epsilon^2$ is zero:
$(5+\epsilon)*(5+\epsilon)= \\ 25+10\epsilon+\epsilon^2= \\ 25+10\epsilon$

Next we need to multiply that by 3 to get the $3x^2$ term:

$3*(25+10\epsilon) = 75+30\epsilon$

Putting that aside for a moment, we need to make the $2y^3$ term. We start by making our y value:

$2+0\epsilon$

or:

$2$

If you are wondering why it has a zero for the epsilon term, it’s because when we are calculating the partial derivative with respect to x, y is treated as a constant, so its derivative is zero.

Next, we multiply this y value by itself twice to get the $y^3$ value:

$2*2*2=8$

We then multiply it by 2 to get the $2y^3$ term:

$8*2=16$

Now that we have our two terms, we subtract the y term from the x term to get our final result:

$75+30\epsilon-16 = \\ 59+30\epsilon$

This result says that $3x^2-2y^3$ has a value of 59 at location (5,2), and that its partial derivative with respect to x at that point is 30.

That checks out: $3*5^2-2*2^3=75-16=59$, and $\frac{\partial f}{\partial x}=6x=30$ at $x=5$. Let’s move on to the derivative with respect to y!

# Example: df/dy

Calculating the derivative with respect to y is very similar to calculating the derivative with respect to x, except that this time it’s the x term that gets an epsilon value (derivative) of 0, and the y term that gets an epsilon value of 1. We’ll work through it to see how it plays out.

First up, we need to make the value for x:

$5+0\epsilon$

or:

$5$

Next we square it and multiply it by 3 to get the $3x^2$ term:

$5*5*3=75$

Next we need to make the value for y, remembering that we use an epsilon value of 1 this time around, since the derivative of y with respect to itself is 1:

$2+\epsilon$

We cube that value and multiply by 2 to get the $2y^3$ term:
$2*(2+\epsilon)*(2+\epsilon)*(2+\epsilon)= \\ 2*(2+\epsilon)*(4+4\epsilon+\epsilon^2)= \\ 2*(2+\epsilon)*(4+4\epsilon)= \\ 2*(8+12\epsilon+4\epsilon^2)= \\ 2*(8+12\epsilon)= \\ 16+24\epsilon$

Now we subtract the y term from the x term to get the final result:

$75 - (16+24\epsilon)= \\ 59-24\epsilon$

This result says that $3x^2-2y^3$ has a value of 59 at location (5,2), and that its partial derivative with respect to y at that point is -24.

That also checks out, since $\frac{\partial f}{\partial y}=-6y^2=-24$ at $y=2$, so we got the correct value and partial derivatives for the equation.
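The two passes above can be reproduced with a minimal single-epsilon dual number. `SDual` and `F` are hypothetical names for this sketch, and only the few operations the example needs are implemented. The function is written once; each pass only changes which input gets a dual part of 1.

```cpp
// A minimal dual number a + b*epsilon, where epsilon^2 = 0 (hypothetical sketch).
struct SDual
{
    double real; // the value
    double dual; // the derivative with respect to whichever variable this pass is for
};

SDual operator + (SDual a, SDual b) { return { a.real + b.real, a.dual + b.dual }; }
SDual operator - (SDual a, SDual b) { return { a.real - b.real, a.dual - b.dual }; }

// (a + b*epsilon) * (c + d*epsilon) = ac + (ad + bc)*epsilon, since epsilon^2 = 0
SDual operator * (SDual a, SDual b) { return { a.real * b.real, a.real * b.dual + a.dual * b.real }; }

// f(x, y) = 3x^2 - 2y^3; the constants get a dual part of 0
SDual F (SDual x, SDual y)
{
    const SDual three = { 3.0, 0.0 };
    const SDual two = { 2.0, 0.0 };
    return three * x * x - two * y * y * y;
}
```

`F({ 5.0, 1.0 }, { 2.0, 0.0 })` gives $59+30\epsilon$ and `F({ 5.0, 0.0 }, { 2.0, 1.0 })` gives $59-24\epsilon$. Note that the real part, 59, is computed again in each call.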

# Reducing Redundancy

There was quite a bit of redundancy when working through the x and y derivatives, wasn’t there? For example, the real part (59) was calculated all over again in each pass. Increasing the number of variables increases the amount of redundancy too, so this approach doesn’t scale up well.

Luckily there is a way to address this. Basically, instead of making two dual numbers that each have two parts, you let them share the real part (since it’s the same for both, as is the work to calculate it) and append the dual parts for x and y to it.

So instead of having two separate dual numbers:

$x'=(a+b\epsilon) \\ y'=(a+c\epsilon)$

You have:

$(a+b\epsilon_x+c\epsilon_y)$

Then, in your math or in your program, you treat it as if it’s two different dual numbers packed into one. This lets you do the work for the real number once instead of twice, but still lets you do your dual number work for each variable independently.

While it’s probably easiest to think of these as two dual numbers packed into one value, there is actually a mathematical basis for it as well, which may or may not surprise you.

Check out what happens when we multiply two of these together, keeping in mind that multiplying ANY two epsilon values together becomes zero, even if they are different epsilons:

$(a+b\epsilon_x+c\epsilon_y) * (d+e\epsilon_x+f\epsilon_y)= \\ ad + ae\epsilon_x + af\epsilon_y + bd\epsilon_x + be\epsilon_x^2 + bf\epsilon_x\epsilon_y + cd\epsilon_y + ce\epsilon_x\epsilon_y + cf\epsilon_y^2= \\ ad + ae\epsilon_x + af\epsilon_y + bd\epsilon_x + cd\epsilon_y= \\ ad + (ae+bd)\epsilon_x + (af+cd)\epsilon_y$

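The interesting thing is that the above result gives you exactly the same values as doing the work with two separate dual numbers: $(a+b\epsilon)*(d+e\epsilon)=ad+(ae+bd)\epsilon$ for x, and $(a+c\epsilon)*(d+f\epsilon)=ad+(af+cd)\epsilon$ for y.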
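The claim that a packed number behaves like several separate dual numbers can be checked directly. `SMultiDual` below is a hypothetical sketch that stores one shared real part and one epsilon coefficient per variable; its multiplication is the expansion above with every product of two epsilons dropped.

```cpp
#include <array>
#include <cstddef>

// Hypothetical packed dual number: a shared real part plus one epsilon coefficient per variable.
template <std::size_t N>
struct SMultiDual
{
    double real;
    std::array<double, N> dual;
};

// (a + sum_i b_i*eps_i) * (d + sum_i e_i*eps_i) = ad + sum_i (a*e_i + b_i*d)*eps_i,
// because any product of two epsilons (eps_i * eps_j) is zero.
template <std::size_t N>
SMultiDual<N> operator * (const SMultiDual<N>& p, const SMultiDual<N>& q)
{
    SMultiDual<N> ret;
    ret.real = p.real * q.real;
    for (std::size_t i = 0; i < N; ++i)
        ret.dual[i] = p.real * q.dual[i] + p.dual[i] * q.real;
    return ret;
}
```

With $a=2, b=3, c=5$ and $d=7, e=11, f=13$, the packed product is $14+43\epsilon_x+61\epsilon_y$, and multiplying the one-variable numbers $(2+3\epsilon)(7+11\epsilon)$ and $(2+5\epsilon)(7+13\epsilon)$ separately gives the same 43 and 61, while the real part 14 is only computed once.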

Let’s see this three component dual number in action by re-doing the example again. Note that this pattern scales up to ANY number of variables!

# Example: Both Derivatives (Gradient Vector)

Our goal is to calculate the value and partial derivatives of the function $3x^2-2y^3$ at location (5,2).

First we make our x value:

$5 + 1\epsilon_x + 0\epsilon_y$

or:

$5 + \epsilon_x$

We square that and multiply it by 3 to get our $3x^2$ term:

$3*(5 + \epsilon_x)*(5 + \epsilon_x)= \\ 3*(25+10\epsilon_x+\epsilon_x^2)= \\ 3*(25+10\epsilon_x)= \\ 75+30\epsilon_x$

Next, we make our y value:

$2 + 0\epsilon_x + 1\epsilon_y$

or:

$2 + \epsilon_y$

We cube it and multiply it by 2 to get our $2y^3$ term:

$16+24\epsilon_y$

Lastly we subtract the y term from the x term to get our final answer:

$(75+30\epsilon_x) - (16+24\epsilon_y)= \\ 59+30\epsilon_x-24\epsilon_y$

The result says that $3x^2-2y^3$ has a value of 59 at location (5,2), that its partial derivative with respect to x at that point is 30, and that its partial derivative with respect to y at that point is -24.

Neat, right?!

# Example Code

Here is the example code output:

Here is the code that generated it:

#include <stdio.h>
#include <cmath>
#include <array>
#include <algorithm>

#define PI 3.14159265359f

#define EPSILON 0.001f  // for numeric derivatives calculation

template <size_t NUMVARIABLES>
class CDualNumber
{
public:

// constructor to make a constant
CDualNumber (float f = 0.0f) {
m_real = f;
std::fill(m_dual.begin(), m_dual.end(), 0.0f);
}

// constructor to make a variable value.  It sets the derivative to 1.0 for whichever variable this is a value for.
CDualNumber (float f, size_t variableIndex) {
m_real = f;
std::fill(m_dual.begin(), m_dual.end(), 0.0f);
m_dual[variableIndex] = 1.0f;
}

// storage for real and dual values
float							m_real;
std::array<float, NUMVARIABLES> m_dual;
};

//----------------------------------------------------------------------
// Math Operations
//----------------------------------------------------------------------
template <size_t NUMVARIABLES>
inline CDualNumber<NUMVARIABLES> operator + (const CDualNumber<NUMVARIABLES> &a, const CDualNumber<NUMVARIABLES> &b)
{
CDualNumber<NUMVARIABLES> ret;
ret.m_real = a.m_real + b.m_real;
for (size_t i = 0; i < NUMVARIABLES; ++i)
ret.m_dual[i] = a.m_dual[i] + b.m_dual[i];
return ret;
}

template <size_t NUMVARIABLES>
inline CDualNumber<NUMVARIABLES> operator - (const CDualNumber<NUMVARIABLES> &a, const CDualNumber<NUMVARIABLES> &b)
{
CDualNumber<NUMVARIABLES> ret;
ret.m_real = a.m_real - b.m_real;
for (size_t i = 0; i < NUMVARIABLES; ++i)
ret.m_dual[i] = a.m_dual[i] - b.m_dual[i];
return ret;
}

template <size_t NUMVARIABLES>
inline CDualNumber<NUMVARIABLES> operator * (const CDualNumber<NUMVARIABLES> &a, const CDualNumber<NUMVARIABLES> &b)
{
CDualNumber<NUMVARIABLES> ret;
ret.m_real = a.m_real * b.m_real;
for (size_t i = 0; i < NUMVARIABLES; ++i)
ret.m_dual[i] = a.m_real * b.m_dual[i] + a.m_dual[i] * b.m_real;
return ret;
}

template <size_t NUMVARIABLES>
inline CDualNumber<NUMVARIABLES> operator / (const CDualNumber<NUMVARIABLES> &a, const CDualNumber<NUMVARIABLES> &b)
{
CDualNumber<NUMVARIABLES> ret;
ret.m_real = a.m_real / b.m_real;
for (size_t i = 0; i < NUMVARIABLES; ++i)
ret.m_dual[i] = (a.m_dual[i] * b.m_real - a.m_real * b.m_dual[i]) / (b.m_real * b.m_real);
return ret;
}

template <size_t NUMVARIABLES>
inline CDualNumber<NUMVARIABLES> sqrt (const CDualNumber<NUMVARIABLES> &a)
{
CDualNumber<NUMVARIABLES> ret;
float sqrtReal = sqrt(a.m_real);
ret.m_real = sqrtReal;
for (size_t i = 0; i < NUMVARIABLES; ++i)
ret.m_dual[i] = 0.5f * a.m_dual[i] / sqrtReal;
return ret;
}

template <size_t NUMVARIABLES>
inline CDualNumber<NUMVARIABLES> pow (const CDualNumber<NUMVARIABLES> &a, float y)
{
CDualNumber<NUMVARIABLES> ret;
ret.m_real = pow(a.m_real, y);
for (size_t i = 0; i < NUMVARIABLES; ++i)
ret.m_dual[i] = y * a.m_dual[i] * pow(a.m_real, y - 1.0f);
return ret;
}

template <size_t NUMVARIABLES>
inline CDualNumber<NUMVARIABLES> sin (const CDualNumber<NUMVARIABLES> &a)
{
CDualNumber<NUMVARIABLES> ret;
ret.m_real = sin(a.m_real);
for (size_t i = 0; i < NUMVARIABLES; ++i)
ret.m_dual[i] = a.m_dual[i] * cos(a.m_real);
return ret;
}

template <size_t NUMVARIABLES>
inline CDualNumber<NUMVARIABLES> cos (const CDualNumber<NUMVARIABLES> &a)
{
CDualNumber<NUMVARIABLES> ret;
ret.m_real = cos(a.m_real);
for (size_t i = 0; i < NUMVARIABLES; ++i)
ret.m_dual[i] = -a.m_dual[i] * sin(a.m_real);
return ret;
}

template <size_t NUMVARIABLES>
inline CDualNumber<NUMVARIABLES> tan (const CDualNumber<NUMVARIABLES> &a)
{
CDualNumber<NUMVARIABLES> ret;
ret.m_real = tan(a.m_real);
for (size_t i = 0; i < NUMVARIABLES; ++i)
ret.m_dual[i] = a.m_dual[i] / (cos(a.m_real) * cos(a.m_real));
return ret;
}

template <size_t NUMVARIABLES>
inline CDualNumber<NUMVARIABLES> atan (const CDualNumber<NUMVARIABLES> &a)
{
CDualNumber<NUMVARIABLES> ret;
ret.m_real = atan(a.m_real);
for (size_t i = 0; i < NUMVARIABLES; ++i)
ret.m_dual[i] = a.m_dual[i] / (1.0f + a.m_real * a.m_real);
return ret;
}

// templated so it can work for both a CDualNumber<1> and a float
template <typename T>
inline T SmoothStep (const T& x)
{
return x * x * (T(3.0f) - T(2.0f) * x);
}

//----------------------------------------------------------------------
// Test Functions
//----------------------------------------------------------------------

void TestSmoothStep (float input)
{
// create a dual number as the value of x
CDualNumber<1> x(input, 0);

// calculate value and derivative using dual numbers
CDualNumber<1> y = SmoothStep(x);

// calculate numeric derivative using central differences
float derivNumeric = (SmoothStep(input + EPSILON) - SmoothStep(input - EPSILON)) / (2.0f * EPSILON);

// calculate actual derivative
float derivActual = 6.0f * input - 6.0f * input * input;

// show value and derivatives
printf("(smoothstep) y=3x^2-2x^3  (x=%0.4f)\n", input);
printf("  y = %0.4f\n", y.m_real);
printf("  dual# dy/dx = %0.4f\n", y.m_dual[0]);
printf("  actual dy/dx = %0.4f\n", derivActual);
printf("  numeric dy/dx = %0.4f\n\n", derivNumeric);
}

void TestTrig (float input)
{
// create a dual number as the value of x
CDualNumber<1> x(input, 0);

// sin
{
// calculate value and derivative using dual numbers
CDualNumber<1> y = sin(x);

// calculate numeric derivative using central differences
float derivNumeric = (sin(input + EPSILON) - sin(input - EPSILON)) / (2.0f * EPSILON);

// calculate actual derivative
float derivActual = cos(input);

// show value and derivatives
printf("sin(%0.4f) = %0.4f\n", input, y.m_real);
printf("  dual# dy/dx = %0.4f\n", y.m_dual[0]);
printf("  actual dy/dx = %0.4f\n", derivActual);
printf("  numeric dy/dx = %0.4f\n\n", derivNumeric);
}

// cos
{
// calculate value and derivative using dual numbers
CDualNumber<1> y = cos(x);

// calculate numeric derivative using central differences
float derivNumeric = (cos(input + EPSILON) - cos(input - EPSILON)) / (2.0f * EPSILON);

// calculate actual derivative
float derivActual = -sin(input);

// show value and derivatives
printf("cos(%0.4f) = %0.4f\n", input, y.m_real);
printf("  dual# dy/dx = %0.4f\n", y.m_dual[0]);
printf("  actual dy/dx = %0.4f\n", derivActual);
printf("  numeric dy/dx = %0.4f\n\n", derivNumeric);
}

// tan
{
// calculate value and derivative using dual numbers
CDualNumber<1> y = tan(x);

// calculate numeric derivative using central differences
float derivNumeric = (tan(input + EPSILON) - tan(input - EPSILON)) / (2.0f * EPSILON);

// calculate actual derivative
float derivActual = 1.0f / (cos(input)*cos(input));

// show value and derivatives
printf("tan(%0.4f) = %0.4f\n", input, y.m_real);
printf("  dual# dy/dx = %0.4f\n", y.m_dual[0]);
printf("  actual dy/dx = %0.4f\n", derivActual);
printf("  numeric dy/dx = %0.4f\n\n", derivNumeric);
}

// atan
{
// calculate value and derivative using dual numbers
CDualNumber<1> y = atan(x);

// calculate numeric derivative using central differences
float derivNumeric = (atan(input + EPSILON) - atan(input - EPSILON)) / (2.0f * EPSILON);

// calculate actual derivative
float derivActual = 1.0f / (1.0f + input * input);

// show value and derivatives
printf("atan(%0.4f) = %0.4f\n", input, y.m_real);
printf("  dual# dy/dx = %0.4f\n", y.m_dual[0]);
printf("  actual dy/dx = %0.4f\n", derivActual);
printf("  numeric dy/dx = %0.4f\n\n", derivNumeric);
}
}

void TestSimple (float input)
{
// create a dual number as the value of x
CDualNumber<1> x(input, 0);

// sqrt
{
// calculate value and derivative using dual numbers
CDualNumber<1> y = CDualNumber<1>(3.0f) / sqrt(x);

// calculate numeric derivative using central differences
float derivNumeric = ((3.0f / sqrt(input + EPSILON)) - (3.0f / sqrt(input - EPSILON))) / (2.0f * EPSILON);

// calculate actual derivative
float derivActual = -3.0f / (2.0f * pow(input, 3.0f / 2.0f));

// show value and derivatives
printf("3/sqrt(%0.4f) = %0.4f\n", input, y.m_real);
printf("  dual# dy/dx = %0.4f\n", y.m_dual[0]);
printf("  actual dy/dx = %0.4f\n", derivActual);
printf("  numeric dy/dx = %0.4f\n\n", derivNumeric);
}

// pow
{
// calculate value and derivative using dual numbers
CDualNumber<1> y = pow(x + CDualNumber<1>(1.0f), 1.337f);

// calculate numeric derivative using central differences
float derivNumeric = ((pow(input + 1.0f + EPSILON, 1.337f)) - (pow(input + 1.0f - EPSILON, 1.337f))) / (2.0f * EPSILON);

// calculate actual derivative
float derivActual = 1.337f * pow(input + 1.0f, 0.337f);

// show value and derivatives
printf("(%0.4f+1)^1.337 = %0.4f\n", input, y.m_real);
printf("  dual# dy/dx = %0.4f\n", y.m_dual[0]);
printf("  actual dy/dx = %0.4f\n", derivActual);
printf("  numeric dy/dx = %0.4f\n\n", derivNumeric);
}
}

void Test2D (float inputx, float inputy)
{
// create dual numbers as the value of x and y
CDualNumber<2> x(inputx, 0);
CDualNumber<2> y(inputy, 1);

// z = 3x^2 - 2y^3
{
// calculate value and partial derivatives using dual numbers
CDualNumber<2> z = CDualNumber<2>(3.0f) * x * x - CDualNumber<2>(2.0f) * y * y * y;

// calculate numeric partial derivatives using central differences
auto f = [] (float x, float y) {
return 3.0f * x * x - 2.0f * y * y * y;
};
float derivNumericX = (f(inputx + EPSILON, inputy) - f(inputx - EPSILON, inputy)) / (2.0f * EPSILON);
float derivNumericY = (f(inputx, inputy + EPSILON) - f(inputx, inputy - EPSILON)) / (2.0f * EPSILON);

// calculate actual partial derivatives
float derivActualX = 6.0f * inputx;
float derivActualY = -6.0f * inputy * inputy;

// show value and derivatives
printf("z=3x^2-2y^3 (x = %0.4f, y = %0.4f)\n", inputx, inputy);
printf("  z = %0.4f\n", z.m_real);
printf("  dual# dz/dx = %0.4f\n", z.m_dual[0]);
printf("  dual# dz/dy = %0.4f\n", z.m_dual[1]);
printf("  actual dz/dx = %0.4f\n", derivActualX);
printf("  actual dz/dy = %0.4f\n", derivActualY);
printf("  numeric dz/dx = %0.4f\n", derivNumericX);
printf("  numeric dz/dy = %0.4f\n\n", derivNumericY);
}
}

void Test3D (float inputx, float inputy, float inputz)
{
// create dual numbers as the value of x, y and z
CDualNumber<3> x(inputx, 0);
CDualNumber<3> y(inputy, 1);
CDualNumber<3> z(inputz, 2);

// w = sin(x*cos(2*y)) / tan(z)
{
// calculate value and partial derivatives using dual numbers
CDualNumber<3> w = sin(x * cos(CDualNumber<3>(2.0f)*y)) / tan(z);

// calculate numeric partial derivatives using central differences
auto f = [] (float x, float y, float z) {
return sin(x*cos(2.0f*y)) / tan(z);
};
float derivNumericX = (f(inputx + EPSILON, inputy, inputz) - f(inputx - EPSILON, inputy, inputz)) / (2.0f * EPSILON);
float derivNumericY = (f(inputx, inputy + EPSILON, inputz) - f(inputx, inputy - EPSILON, inputz)) / (2.0f * EPSILON);
float derivNumericZ = (f(inputx, inputy, inputz + EPSILON) - f(inputx, inputy, inputz - EPSILON)) / (2.0f * EPSILON);

// calculate actual partial derivatives
float derivActualX = cos(inputx*cos(2.0f*inputy))*cos(2.0f * inputy) / tan(inputz);
float derivActualY = cos(inputx*cos(2.0f*inputy)) *-2.0f*inputx*sin(2.0f*inputy) / tan(inputz);
float derivActualZ = sin(inputx * cos(2.0f * inputy)) / -(sin(inputz) * sin(inputz));

// show value and derivatives
printf("w=sin(x*cos(2*y))/tan(z) (x = %0.4f, y = %0.4f, z = %0.4f)\n", inputx, inputy, inputz);
printf("  w = %0.4f\n", w.m_real);
printf("  dual# dw/dx = %0.4f\n", w.m_dual[0]);
printf("  dual# dw/dy = %0.4f\n", w.m_dual[1]);
printf("  dual# dw/dz = %0.4f\n", w.m_dual[2]);
printf("  actual dw/dx = %0.4f\n", derivActualX);
printf("  actual dw/dy = %0.4f\n", derivActualY);
printf("  actual dw/dz = %0.4f\n", derivActualZ);
printf("  numeric dw/dx = %0.4f\n", derivNumericX);
printf("  numeric dw/dy = %0.4f\n", derivNumericY);
printf("  numeric dw/dz = %0.4f\n\n", derivNumericZ);
}
}

int main (int argc, char **argv)
{
TestSmoothStep(0.5f);
TestSmoothStep(0.75f);
TestTrig(PI * 0.25f);
TestSimple(3.0f);
Test2D(1.5f, 3.28f);
Test3D(7.12f, 8.93f, 12.01f);
return 0;
}


# Closing

One of the neatest things about dual numbers is that they give precise results. They are not approximations and they are not numerical methods, unlike the central differences method that I compared them to in the example program (More info on numerical derivatives here: Finite Differences). Using dual numbers gives you exact derivatives, within the limitations of (e.g.) floating point math.
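To make the contrast with central differences concrete, here is a small sketch comparing a dual-number derivative of sin(x) against central differences at two step sizes. `SDual`, the helper names and the step sizes are hypothetical choices for illustration. The dual-number result is cos(x) up to floating point rounding, while the central-difference result carries a truncation error that grows with the step size (and, for very small steps, extra rounding error from subtracting nearly equal values).

```cpp
#include <cassert>
#include <cmath>

// Minimal single-variable dual number (hypothetical sketch).
struct SDual
{
    double real;
    double dual;
};

// sin(a + b*eps) = sin(a) + b*cos(a)*eps
SDual Sin (SDual a) { return { std::sin(a.real), a.dual * std::cos(a.real) }; }

// Derivative of sin at x via dual numbers: seed the dual part with 1.
double DualDerivativeOfSin (double x) { return Sin({ x, 1.0 }).dual; }

// Derivative of sin at x via central differences with step size h.
double CentralDifferenceOfSin (double x, double h)
{
    assert(h > 0.0);
    return (std::sin(x + h) - std::sin(x - h)) / (2.0 * h);
}
```

At x = 1 with h = 0.1, the central difference is off by roughly $h^2/6 * \cos(1) \approx 9 \times 10^{-4}$; with h = 1e-5 the error drops far below 1e-8, but it never becomes exact the way the dual-number result does.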

It turns out that backpropagation (the method commonly used to train neural networks) is just steepest descent, with the gradient calculated using automatic differentiation. You can read about that here: Backpropogation is Just Steepest Descent with Automatic Differentiation

That makes me wonder how dual numbers would compare in runtime speed to backpropagation, as well as to numerical methods, for calculating the gradient used to adjust a neural network during training.

If I had to guess, I’d say that dual numbers may be slightly slower than backpropagation, but not as slow as numerical methods, which are going to be much, much slower. We’ll see though. It may be much easier to implement neural network learning using dual numbers compared to backpropagation, so it may be worth an exploration and write up, even if only to make neural networks a little bit more accessible to people.

Comments, corrections, etc? Let me know in the comments below, or contact me on Twitter at @Atrix256