# When Random Numbers Are Too Random: Low Discrepancy Sequences

Random numbers can be useful in graphics and game development, but they have a pesky and sometimes undesirable habit of clumping together.

This is a problem in path tracing and monte carlo integration when you take N samples, but the samples aren’t well spread across the sampling range.

This can also be a problem for situations like when you are randomly placing objects in the world or generating treasure for a treasure chest. You don’t want your randomly placed trees to only be in one part of the forest, and you don’t want a player to get only trash items or only godly items when they open a treasure chest. Ideally you want to have some randomness, but you don’t want the random number generator to give you all of the same or similar random numbers.

The problem is that random numbers can be TOO random, like in the below where you can see clumps and large gaps between the 100 samples.

For cases like that, when you want random numbers that are a little bit more well distributed, you might find some use in low discrepancy sequences.

The standalone C++ code (one source file, standard headers, no libraries to link to) I used to generate the data and images are at the bottom of this post, as well as some links to more resources.

# What Is Discrepancy?

In this context, discrepancy is a measurement of the highest or lowest density of points in a sequence. High discrepancy means that there is either a large area of empty space, or that there is an area that has a high density of points. Low discrepancy means that there are neither, and that your points are more or less pretty evenly distributed.

The lowest discrepancy possible has no randomness at all, and in the 1 dimensional case means that the points are evenly distributed on a grid. For monte carlo integration and the game dev usage cases I mentioned, we do want some randomness, we just want the random points to be spread out a little more evenly.

If more formal math notation is your thing, discrepancy is defined as:
$D_{N}(P)=\sup _{{B\in J}}\left|{\frac {A(B;P)}{N}}-\lambda _{s}(B)\right|$

Equidistributed sequence

For monte carlo integration specifically, this is the behavior each thing gives you:

• High Discrepancy: Random Numbers / White Noise aka Uniform Distribution – At lower sample counts, convergance is slower (and have higher variance) due to the possibility of not getting good coverage over the area you integrating. At higher sample counts, this problem disappears. (Hint: real time graphics and preview renderings use a smaller number of samples)
• Lowest Discrepancy: Regular Grid – This will cause aliasing, unlike the other “random” based sampling, which trade aliasing for noise. Noise is preferred over aliasing.
• Low Discrepancy: Low Discrepancy Sequences – In lower numbers of samples, this will have faster convergence by having better coverage of the sampling space, but will use randomness to get rid of aliasing by introducing noise.

Also interesting to note, Quasi Monte Carlo has provably better asymptotic convergence than regular monte carlo integration.

# 1 Dimensional Sequences

We’ll first look at 1 dimensional sequences.

## Grid

Here are 100 samples evenly spaced:

## Random Numbers (White Noise)

This is actually a high discrepancy sequence. To generate this, you just use a standard random number generator to pick 100 points between 0 and 1. I used std::mt19937 with a std::uniform_real_distribution from 0 to 1:

## Subrandom Numbers

Subrandom numbers are ways to decrease the discrepancy of white noise.

One way to do this is to break the sampling space in half. You then generate even numbered samples in the first half of the space, and odd numbered samples in the second half of the space.

There’s no reason you can’t generalize this into more divisions of space though.

This splits the space into 4 regions:

8 regions:

16 regions:

32 regions:

There are other ways to generate subrandom numbers though. One way is to generate random numbers between 0 and 0.5, and add them to the last sample, plus 0.5. This gives you a random walk type setup.

Here is that:

## Uniform Sampling + Jitter

If you take the first subrandom idea to the logical maximum, you break your sample space up into N sections and place one point within those N sections to make a low discrepancy sequence made up of N points.

Another way to look at this is that you do uniform sampling, but add some random jitter to the samples, between +/- half a uniform sample size, to keep the samples in their own areas.

This is that:

I have heard that Pixar invented this technique interestingly.

# Irrational Numbers

Rational numbers are numbers which can be described as fractions, such as 0.75 which can be expressed as 3/4. Irrational numbers are numbers which CANNOT be described as fractions, such as pi, or the golden ratio, or the square root of a prime number.

Interestingly you can use irrational numbers to generate low discrepancy sequences. You start with some value (could be 0, or could be a random number), add the irrational number, and modulus against 1.0. To get the next sample you add the irrational value again, and modulus against 1.0 again. Rinse and repeat until you get as many samples as you want.

Some values work better than others though, and apparently the golden ratio is provably the best choice (1.61803398875…), says Wikipedia.

Here is the golden ratio, using 4 different random (white noise) starting values:

Here I’ve used the square root of 2, with 4 different starting random numbers again:

Lastly, here is pi, with 4 random starting values:

## Van der Corput Sequence

The Van der Corput sequence is the 1d equivelant of the Halton sequence which we’ll talk about later.

How you generate values in the Van der Corput sequence is you convert the index of your sample into some base.

For instance if it was base 2, you would convert your index to binary. If it was base 16, you would convert your index to hexadecimal.

Now, instead of treating the digits as if they are $B^0$, $B^1$, $B^2$, etc (where B is the base), you instead treat them as $B^{-1}$, $B^{-2}$, $B^{-3}$ and so on. In other words, you multiply each digit by a fraction and add up the results.

To show a couple quick examples, let’s say we wanted sample 6 in the sequence of base 2.

First we convert 6 to binary which is 110. From right to left, we have 3 digits: a 0 in the 1’s place, a 1 in the 2’s place, and a 1 in the 4’s place. $0*1 + 1*2 + 1*4 = 6$, so we can see that 110 is in fact 6 in binary.

To get the Van der Corput value for this, instead of treating it as the 1’s, 2’s and 4’s digit, we treat it as the 1/2, 1/4 and 1/8’s digit.

$0 * 1/2 + 1 * 1/4 + 1 * 1/8 = 3/8$.

So, sample 6 in the Van der Corput sequence using base 2 is 3/8.

Let’s try sample 21 in base 3.

First we convert 21 to base 3 which is 210. We can verify this is right by seeing that $0 * 1 + 1 * 3 + 2 * 9 = 21$.

Instead of a 1’s, 3’s and 9’s digit, we are going to treat it like a 1/3, 1/9 and 1/27 digit.

$0 * 1/3 + 1 * 1/9 + 2 * 1/27 = 5/27$

So, sample 21 in the Van der Corput sequence using base 3 is 5/27.

Here is the Van der Corput sequence for base 2:

Here it is for base 3:

Base 4:

Base 5:

## Sobol

One dimensional Sobol is actually just the Van der Corput sequence base 2 re-arranged a little bit, but it’s generated differently.

You start with 0 (either using it as sample 0 or sample -1, doesn’t matter which), and for each sample you do this:

1. Calculate the Ruler function value for the current sample’s index(more info in a second)
2. Make the direction vector by shifting 1 left (in binary) 31 – ruler times.
3. XOR the last sample by the direction vector to get the new sample
4. To interpret the sample as a floating point number you divide it by $2^{32}$

That might sound completely different than the Van der Corput sequence but it actually is the same thing – just re-ordered.

In the final step when dividing by $2^{32}$, we are really just interpreting the binary number as a fraction just like before, but it’s the LEFT most digit that is the 1/2 spot, not the RIGHT most digit.

The Ruler Function goes like: 0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0, …

It’s pretty easy to calculate too. Calculating the ruler function for an index (starting at 1) is just the zero based index of the right most 1’s digit after converting the number to binary.

1 in binary is 001 so Ruler(1) is 0.
2 in binary is 010 so Ruler(2) is 1.
3 in binary is 011 so Ruler(3) is 0.
4 in binary is 100 so Ruler(4) is 2.
5 in binary is 101 so Ruler(5) is 0.
and so on.

Here is 1D Sobol:

# Hammersley

In one dimension, the Hammersley sequence is the same as the base 2 Van der Corput sequence, and in the same order. If that sounds strange that it’s the same, it’s a 2d sequence I broke down into a 1d sequence for comparison. The one thing Hammersley has that makes it unique in the 1d case is that you can truncate bits.

It doesn’t seem that useful for 1d Hammersley to truncate bits but knowing that is useful info too I guess. Look at the 2d version of Hammersley to get a fairer look at it, because it’s meant to be a 2d sequence.

Here is Hammersley:

With 1 bit truncated:

With 2 bits truncated:

# Poisson Disc

Poisson disc points are points which are densely packed, but have a minimum distance from each other.

Computer scientists are still working out good algorithms to generate these points efficiently.

I use “Mitchell’s Best-Candidate” which means that when you want to generate a new point in the sequence, you generate N new points, and choose whichever point is farthest away from the other points you’ve generated so far.

Here it is where N is 100:

# 2 Dimensional Sequences

Next up, let’s look at some 2 dimensional sequences.

## Grid

Below is 2d uniform samples on a grid.

Note that uniform grid is not particularly low discrepancy for the 2d case! More info here: Is it expected that uniform points would have non zero discrepancy?

## Random

Here are 100 random points:

## Uniform Grid + Jitter

Here is a uniform grid that has random jitter applied to the points. Jittered grid is a pretty commonly used low discrepancy sampling technique that has good success.

## Subrandom

Just like in 1 dimensions, you can apply the subrandom ideas to 2 dimensions where you divide the X and Y axis into so many sections, and randomly choose points in the sections.

If you divide X and Y into the same number of sections though, you are going to have a problem because some areas are not going to have any points in them.

@Reedbeta pointed out that instead of using i%x and i%y, that you could use i%x and (i/x)%y to make it pick points in all regions.

Picking different numbers for X and Y can be another way to give good results. Here’s dividing X and Y into 2 and 3 sections respectively:

If you choose co-prime numbers for divisions for each axis you can get maximal period of repeats. 2 and 3 are coprime so the last example is a good example of that, but here is 3 and 11:

Here is 3 and 97. 97 is large enough that with only doing 100 samples, we are almost doing jittered grid on the y axis.

Here is the other subrandom number from 1d, where we start with a random value for X and Y, and then add a random number between 0 and 0.5 to each, also adding 0.5, to make a “random walk” type setup again:

## Halton

The Halton sequence is just the Van der Corput sequence, but using a different base on each axis.

Here is the Halton sequence where X and Y use bases 2 and 3:

Here it is using bases 5 and 7:

Here are bases 13 and 9:

# Irrational Numbers

The irrational numbers technique can be used for 2d as well but I wasn’t able to find out how to make it give decent looking output that didn’t have an obvious diagonal pattern in them. Bart Wronski shared a neat paper that explains how to use the golden ratio in 2d with great success: Golden Ratio Sequences For Low-Discrepancy Sampling

This uses the golden ratio for the X axis and the square root of 2 for the Y axis. Below that is the same, with a random starting point, to make it give a different sequence.

Here X axis uses square root of 2 and Y axis uses square root of 3. Below that is a random starting point, which gives the same discrepancy.

## Hammersley

In 2 dimensions, the Hammersley sequence uses the 1d Hammersley sequence for the X axis: Instead of treating the binary version of the index as binary, you treat it as fractions like you do for Van der Corput and sum up the fractions.

For the Y axis, you just reverse the bits and then do the same!

Here is the Hammersley sequence. Note we would have to take 128 samples (not just the 100 we did) if we wanted it to fill the entire square with samples.

Truncating bits in 2d is a bit useful. Here is 1 bit truncated:

2 bits truncated:

## Poisson Disc

Using the same method we did for 1d, we can generate points in 2d space:

## N Rooks

There is a sampling pattern called N-Rooks where you put N rooks onto a chess board and arrange them such that no two are in the same row or column.

A way to generate these samples is to realize that there will be only one rook per row, and that none of them will ever be in the same column. So, you make an array that has numbers 0 to N-1, and then shuffle the array. The index into the array is the row, and the value in the array is the column.

Here are 100 rooks:

## Sobol

Sobol in two dimensions is more complex to explain so I’ll link you to the source I used: Sobol Sequences Made Simple.

The 1D sobol already covered is used for the X axis, and then something more complex was used for the Y axis:

Bart Wronski has a really great series on a related topic: Dithering in Games

Wikipedia: Low Discrepancy Sequence

Wikipedia: Halton Sequence

Wikipedia: Van der Corput Sequence

Using Fibonacci Sequence To Generate Colors

Deeper info and usage cases for low discrepancy sequences

Poisson-Disc Sampling

Low discrepancy sequences are related to blue noise. Where white noise contains all frequencies evenly, blue noise has more high frequencies and fewer low frequencies. Blue noise is essentially the ultimate in low discrepancy, but can be expensive to compute. Here are some pages on blue noise:

Free Blue Noise Textures

The problem with 3D blue noise

Stippling and Blue Noise

Vegetation placement in “The Witness”

Here are some links from @marc_b_reynolds:
Sobol (low-discrepancy) sequence in 1-3D, stratified in 2-4D.
Classic binary-reflected gray code.
Sobol.h

Weyl Sequence

## Code

#define _CRT_SECURE_NO_WARNINGS

#include <windows.h>  // for bitmap headers and performance counter.  Sorry non windows people!
#include <vector>
#include <stdint.h>
#include <random>
#include <array>
#include <algorithm>
#include <stdlib.h>
#include <set>

typedef uint8_t uint8;

#define NUM_SAMPLES 100  // to simplify some 2d code, this must be a square
#define NUM_SAMPLES_FOR_COLORING 100

// Turning this on will slow things down significantly because it's an O(N^5) operation for 2d!
#define CALCULATE_DISCREPANCY 0

#define IMAGE1D_WIDTH 600
#define IMAGE1D_HEIGHT 50
#define IMAGE2D_WIDTH 300
#define IMAGE2D_HEIGHT 300

#define AXIS_HEIGHT 40
#define DATA_HEIGHT 20
#define DATA_WIDTH 2

#define COLOR_FILL SColor(255,255,255)
#define COLOR_AXIS SColor(0, 0, 0)

//======================================================================================
struct SImageData
{
SImageData ()
: m_width(0)
, m_height(0)
{ }

size_t m_width;
size_t m_height;
size_t m_pitch;
std::vector<uint8> m_pixels;
};

struct SColor
{
SColor (uint8 _R = 0, uint8 _G = 0, uint8 _B = 0)
: R(_R), G(_G), B(_B)
{ }

uint8 B, G, R;
};

//======================================================================================
bool SaveImage (const char *fileName, const SImageData &image)
{
// open the file if we can
FILE *file;
file = fopen(fileName, "wb");
if (!file) {
printf("Could not save %s\n", fileName);
return false;
}

// write the data and close the file
fclose(file);

return true;
}

//======================================================================================
void ImageInit (SImageData& image, size_t width, size_t height)
{
image.m_width = width;
image.m_height = height;
image.m_pitch = 4 * ((width * 24 + 31) / 32);
image.m_pixels.resize(image.m_pitch * image.m_width);
std::fill(image.m_pixels.begin(), image.m_pixels.end(), 0);
}

//======================================================================================
void ImageClear (SImageData& image, const SColor& color)
{
uint8* row = &image.m_pixels[0];
for (size_t rowIndex = 0; rowIndex < image.m_height; ++rowIndex)
{
SColor* pixels = (SColor*)row;
std::fill(pixels, pixels + image.m_width, color);

row += image.m_pitch;
}
}

//======================================================================================
void ImageBox (SImageData& image, size_t x1, size_t x2, size_t y1, size_t y2, const SColor& color)
{
for (size_t y = y1; y < y2; ++y)
{
uint8* row = &image.m_pixels[y * image.m_pitch];
SColor* start = &((SColor*)row)[x1];
std::fill(start, start + x2 - x1, color);
}
}

//======================================================================================
float Distance (float x1, float y1, float x2, float y2)
{
float dx = (x2 - x1);
float dy = (y2 - y1);

return std::sqrtf(dx*dx + dy*dy);
}

//======================================================================================
SColor DataPointColor (size_t sampleIndex)
{
SColor ret;
float percent = (float(sampleIndex) / (float(NUM_SAMPLES_FOR_COLORING) - 1.0f));

ret.R = uint8((1.0f - percent) * 255.0f);
ret.G = 0;
ret.B = uint8(percent * 255.0f);

float mag = (float)sqrt(ret.R*ret.R + ret.G*ret.G + ret.B*ret.B);
ret.R = uint8((float(ret.R) / mag)*255.0f);
ret.G = uint8((float(ret.G) / mag)*255.0f);
ret.B = uint8((float(ret.B) / mag)*255.0f);

return ret;
}

//======================================================================================
float RandomFloat (float min, float max)
{
static std::random_device rd;
static std::mt19937 mt(rd());
std::uniform_real_distribution<float> dist(min, max);
return dist(mt);
}

//======================================================================================
size_t Ruler (size_t n)
{
size_t ret = 0;
while (n != 0 && (n & 1) == 0)
{
n /= 2;
++ret;
}
return ret;
}

//======================================================================================
float CalculateDiscrepancy1D (const std::array<float, NUM_SAMPLES>& samples)
{
// some info about calculating discrepancy
// https://math.stackexchange.com/questions/1681562/how-to-calculate-discrepancy-of-a-sequence

// Calculates the discrepancy of this data.
// Assumes the data is [0,1) for valid sample range
std::array<float, NUM_SAMPLES> sortedSamples = samples;
std::sort(sortedSamples.begin(), sortedSamples.end());

float maxDifference = 0.0f;
for (size_t startIndex = 0; startIndex <= NUM_SAMPLES; ++startIndex)
{
// startIndex 0 = 0.0f.  startIndex 1 = sortedSamples[0]. etc

float startValue = 0.0f;
if (startIndex > 0)
startValue = sortedSamples[startIndex - 1];

for (size_t stopIndex = startIndex; stopIndex <= NUM_SAMPLES; ++stopIndex)
{
// stopIndex 0 = sortedSamples[0].  startIndex[N] = 1.0f. etc

float stopValue = 1.0f;
if (stopIndex < NUM_SAMPLES)
stopValue = sortedSamples[stopIndex];

float length = stopValue - startValue;

// open interval (startValue, stopValue)
size_t countInside = 0;
for (float sample : samples)
{
if (sample > startValue &&
sample < stopValue)
{
++countInside;
}
}
float density = float(countInside) / float(NUM_SAMPLES);
float difference = std::abs(density - length);
if (difference > maxDifference)
maxDifference = difference;

// closed interval [startValue, stopValue]
countInside = 0;
for (float sample : samples)
{
if (sample >= startValue &&
sample <= stopValue)
{
++countInside;
}
}
density = float(countInside) / float(NUM_SAMPLES);
difference = std::abs(density - length);
if (difference > maxDifference)
maxDifference = difference;
}
}
return maxDifference;
}

//======================================================================================
float CalculateDiscrepancy2D (const std::array<std::array<float, 2>, NUM_SAMPLES>& samples)
{
// some info about calculating discrepancy
// https://math.stackexchange.com/questions/1681562/how-to-calculate-discrepancy-of-a-sequence

// Calculates the discrepancy of this data.
// Assumes the data is [0,1) for valid sample range.

// Get the sorted list of unique values on each axis
std::set<float> setSamplesX;
std::set<float> setSamplesY;
for (const std::array<float, 2>& sample : samples)
{
setSamplesX.insert(sample[0]);
setSamplesY.insert(sample[1]);
}
std::vector<float> sortedXSamples;
std::vector<float> sortedYSamples;
sortedXSamples.reserve(setSamplesX.size());
sortedYSamples.reserve(setSamplesY.size());
for (float f : setSamplesX)
sortedXSamples.push_back(f);
for (float f : setSamplesY)
sortedYSamples.push_back(f);

// Get the sorted list of samples on the X axis, for faster interval testing
std::array<std::array<float, 2>, NUM_SAMPLES> sortedSamplesX = samples;
std::sort(sortedSamplesX.begin(), sortedSamplesX.end(),
[] (const std::array<float, 2>& itemA, const std::array<float, 2>& itemB)
{
return itemA[0] < itemB[0];
}
);

// calculate discrepancy
float maxDifference = 0.0f;
for (size_t startIndexY = 0; startIndexY <= sortedYSamples.size(); ++startIndexY)
{
float startValueY = 0.0f;
if (startIndexY > 0)
startValueY = *(sortedYSamples.begin() + startIndexY - 1);

for (size_t startIndexX = 0; startIndexX <= sortedXSamples.size(); ++startIndexX)
{
float startValueX = 0.0f;
if (startIndexX > 0)
startValueX = *(sortedXSamples.begin() + startIndexX - 1);

for (size_t stopIndexY = startIndexY; stopIndexY <= sortedYSamples.size(); ++stopIndexY)
{
float stopValueY = 1.0f;
if (stopIndexY < sortedYSamples.size())
stopValueY = sortedYSamples[stopIndexY];

for (size_t stopIndexX = startIndexX; stopIndexX <= sortedXSamples.size(); ++stopIndexX)
{
float stopValueX = 1.0f;
if (stopIndexX < sortedXSamples.size())
stopValueX = sortedXSamples[stopIndexX];

// calculate area
float length = stopValueX - startValueX;
float height = stopValueY - startValueY;
float area = length * height;

// open interval (startValue, stopValue)
size_t countInside = 0;
for (const std::array<float, 2>& sample : samples)
{
if (sample[0] > startValueX &&
sample[1] > startValueY &&
sample[0] < stopValueX &&
sample[1] < stopValueY)
{
++countInside;
}
}
float density = float(countInside) / float(NUM_SAMPLES);
float difference = std::abs(density - area);
if (difference > maxDifference)
maxDifference = difference;

// closed interval [startValue, stopValue]
countInside = 0;
for (const std::array<float, 2>& sample : samples)
{
if (sample[0] >= startValueX &&
sample[1] >= startValueY &&
sample[0] <= stopValueX &&
sample[1] <= stopValueY)
{
++countInside;
}
}
density = float(countInside) / float(NUM_SAMPLES);
difference = std::abs(density - area);
if (difference > maxDifference)
maxDifference = difference;
}
}
}
}

return maxDifference;
}

//======================================================================================
void Test1D (const char* fileName, const std::array<float, NUM_SAMPLES>& samples)
{
// create and clear the image
SImageData image;

// setup the canvas
ImageClear(image, COLOR_FILL);

// calculate the discrepancy
#if CALCULATE_DISCREPANCY
float discrepancy = CalculateDiscrepancy1D(samples);
printf("%s Discrepancy = %0.2f%%\n", fileName, discrepancy*100.0f);
#endif

// draw the sample points
size_t i = 0;
for (float f: samples)
{
size_t pos = size_t(f * float(IMAGE1D_WIDTH)) + IMAGE_PAD;
ImageBox(image, pos, pos + 1, IMAGE1D_CENTERY - DATA_HEIGHT / 2, IMAGE1D_CENTERY + DATA_HEIGHT / 2, DataPointColor(i));
++i;
}

// draw the axes lines. horizontal first then the two vertical
ImageBox(image, IMAGE_PAD, IMAGE_PAD + 1, IMAGE1D_CENTERY - AXIS_HEIGHT / 2, IMAGE1D_CENTERY + AXIS_HEIGHT / 2, COLOR_AXIS);
ImageBox(image, IMAGE1D_WIDTH + IMAGE_PAD, IMAGE1D_WIDTH + IMAGE_PAD + 1, IMAGE1D_CENTERY - AXIS_HEIGHT / 2, IMAGE1D_CENTERY + AXIS_HEIGHT / 2, COLOR_AXIS);

// save the image
SaveImage(fileName, image);
}

//======================================================================================
void Test2D (const char* fileName, const std::array<std::array<float,2>, NUM_SAMPLES>& samples)
{
// create and clear the image
SImageData image;

// setup the canvas
ImageClear(image, COLOR_FILL);

// calculate the discrepancy
#if CALCULATE_DISCREPANCY
float discrepancy = CalculateDiscrepancy2D(samples);
printf("%s Discrepancy = %0.2f%%\n", fileName, discrepancy*100.0f);
#endif

// draw the sample points
size_t i = 0;
for (const std::array<float, 2>& sample : samples)
{
size_t posx = size_t(sample[0] * float(IMAGE2D_WIDTH)) + IMAGE_PAD;
size_t posy = size_t(sample[1] * float(IMAGE2D_WIDTH)) + IMAGE_PAD;
ImageBox(image, posx - 1, posx + 1, posy - 1, posy + 1, DataPointColor(i));
++i;
}

// horizontal lines

// vertical lines

// save the image
SaveImage(fileName, image);
}

//======================================================================================
void TestUniform1D (bool jitter)
{
// calculate the sample points
const float c_cellSize = 1.0f / float(NUM_SAMPLES+1);
std::array<float, NUM_SAMPLES> samples;
for (size_t i = 0; i < NUM_SAMPLES; ++i)
{
samples[i] = float(i+1) / float(NUM_SAMPLES+1);
if (jitter)
samples[i] += RandomFloat(-c_cellSize*0.5f, c_cellSize*0.5f);
}

// save bitmap etc
if (jitter)
Test1D("1DUniformJitter.bmp", samples);
else
Test1D("1DUniform.bmp", samples);
}

//======================================================================================
void TestUniformRandom1D ()
{
// calculate the sample points
const float c_halfJitter = 1.0f / float((NUM_SAMPLES + 1) * 2);
std::array<float, NUM_SAMPLES> samples;
for (size_t i = 0; i < NUM_SAMPLES; ++i)
samples[i] = RandomFloat(0.0f, 1.0f);

// save bitmap etc
Test1D("1DUniformRandom.bmp", samples);
}

//======================================================================================
void TestSubRandomA1D (size_t numRegions)
{
const float c_randomRange = 1.0f / float(numRegions);

// calculate the sample points
const float c_halfJitter = 1.0f / float((NUM_SAMPLES + 1) * 2);
std::array<float, NUM_SAMPLES> samples;
for (size_t i = 0; i < NUM_SAMPLES; ++i)
{
samples[i] = RandomFloat(0.0f, c_randomRange);
samples[i] += float(i % numRegions) / float(numRegions);
}

// save bitmap etc
char fileName[256];
sprintf(fileName, "1DSubRandomA_%zu.bmp", numRegions);
Test1D(fileName, samples);
}

//======================================================================================
void TestSubRandomB1D ()
{
// calculate the sample points
std::array<float, NUM_SAMPLES> samples;
float sample = RandomFloat(0.0f, 0.5f);
for (size_t i = 0; i < NUM_SAMPLES; ++i)
{
sample = std::fmodf(sample + 0.5f + RandomFloat(0.0f, 0.5f), 1.0f);
samples[i] = sample;
}

// save bitmap etc
Test1D("1DSubRandomB.bmp", samples);
}

//======================================================================================
void TestVanDerCorput (size_t base)
{
// calculate the sample points
std::array<float, NUM_SAMPLES> samples;
for (size_t i = 0; i < NUM_SAMPLES; ++i)
{
samples[i] = 0.0f;
float denominator = float(base);
size_t n = i;
while (n > 0)
{
size_t multiplier = n % base;
samples[i] += float(multiplier) / denominator;
n = n / base;
denominator *= base;
}
}

// save bitmap etc
char fileName[256];
sprintf(fileName, "1DVanDerCorput_%zu.bmp", base);
Test1D(fileName, samples);
}

//======================================================================================
void TestIrrational1D (float irrational, float seed)
{
// calculate the sample points
std::array<float, NUM_SAMPLES> samples;
float sample = seed;
for (size_t i = 0; i < NUM_SAMPLES; ++i)
{
sample = std::fmodf(sample + irrational, 1.0f);
samples[i] = sample;
}

// save bitmap etc
char irrationalStr[256];
sprintf(irrationalStr, "%f", irrational);
char seedStr[256];
sprintf(seedStr, "%f", seed);
char fileName[256];
sprintf(fileName, "1DIrrational_%s_%s.bmp", &irrationalStr[2], &seedStr[2]);
Test1D(fileName, samples);
}

//======================================================================================
void TestSobol1D ()
{
// calculate the sample points
std::array<float, NUM_SAMPLES> samples;
size_t sampleInt = 0;
for (size_t i = 0; i < NUM_SAMPLES; ++i)
{
size_t ruler = Ruler(i + 1);
size_t direction = size_t(size_t(1) << size_t(31 - ruler));
sampleInt = sampleInt ^ direction;
samples[i] = float(sampleInt) / std::pow(2.0f, 32.0f);
}

// save bitmap etc
Test1D("1DSobol.bmp", samples);
}

//======================================================================================
void TestHammersley1D (size_t truncateBits)
{
// calculate the sample points
std::array<float, NUM_SAMPLES> samples;
size_t sampleInt = 0;
for (size_t i = 0; i < NUM_SAMPLES; ++i)
{
size_t n = i >> truncateBits;
float base = 1.0f / 2.0f;
samples[i] = 0.0f;
while (n)
{
if (n & 1)
samples[i] += base;
n /= 2;
base /= 2.0f;
}
}

// save bitmap etc
char fileName[256];
sprintf(fileName, "1DHammersley_%zu.bmp", truncateBits);
Test1D(fileName, samples);
}

//======================================================================================
float MinimumDistance1D (const std::array<float, NUM_SAMPLES>& samples, size_t numSamples, float x)
{
// Used by poisson.
// This returns the minimum distance that point (x) is away from the sample points, from [0, numSamples).
float minimumDistance = 0.0f;
for (size_t i = 0; i < numSamples; ++i)
{
float distance = std::abs(samples[i] - x);
if (i == 0 || distance < minimumDistance)
minimumDistance = distance;
}
return minimumDistance;
}

//======================================================================================
void TestPoisson1D ()
{
// every time we want to place a point, we generate this many points and choose the one farthest away from all the other points (largest minimum distance)
const size_t c_bestOfAttempts = 100;

// calculate the sample points
std::array<float, NUM_SAMPLES> samples;
for (size_t sampleIndex = 0; sampleIndex < NUM_SAMPLES; ++sampleIndex)
{
// generate some random points and keep the one that has the largest minimum distance from any of the existing points
float bestX = 0.0f;
float bestMinDistance = 0.0f;
for (size_t attempt = 0; attempt < c_bestOfAttempts; ++attempt)
{
float attemptX = RandomFloat(0.0f, 1.0f);
float minDistance = MinimumDistance1D(samples, sampleIndex, attemptX);

if (minDistance > bestMinDistance)
{
bestX = attemptX;
bestMinDistance = minDistance;
}
}
samples[sampleIndex] = bestX;
}

// save bitmap etc
Test1D("1DPoisson.bmp", samples);
}

//======================================================================================
void TestUniform2D (bool jitter)
{
// calculate the sample points
std::array<std::array<float, 2>, NUM_SAMPLES> samples;
const size_t c_oneSide = size_t(std::sqrt(NUM_SAMPLES));
const float c_cellSize = 1.0f / float(c_oneSide+1);
for (size_t iy = 0; iy < c_oneSide; ++iy)
{
for (size_t ix = 0; ix < c_oneSide; ++ix)
{
size_t sampleIndex = iy * c_oneSide + ix;

samples[sampleIndex][0] = float(ix + 1) / (float(c_oneSide + 1));
if (jitter)
samples[sampleIndex][0] += RandomFloat(-c_cellSize*0.5f, c_cellSize*0.5f);

samples[sampleIndex][1] = float(iy + 1) / (float(c_oneSide) + 1.0f);
if (jitter)
samples[sampleIndex][1] += RandomFloat(-c_cellSize*0.5f, c_cellSize*0.5f);
}
}

// save bitmap etc
if (jitter)
Test2D("2DUniformJitter.bmp", samples);
else
Test2D("2DUniform.bmp", samples);
}

//======================================================================================
void TestUniformRandom2D ()
{
// calculate the sample points
std::array<std::array<float, 2>, NUM_SAMPLES> samples;
const size_t c_oneSide = size_t(std::sqrt(NUM_SAMPLES));
const float c_halfJitter = 1.0f / float((c_oneSide + 1) * 2);
for (size_t i = 0; i < NUM_SAMPLES; ++i)
{
samples[i][0] = RandomFloat(0.0f, 1.0f);
samples[i][1] = RandomFloat(0.0f, 1.0f);
}

// save bitmap etc
Test2D("2DUniformRandom.bmp", samples);
}

//======================================================================================
void TestSubRandomA2D (size_t regionsX, size_t regionsY)
{
const float c_randomRangeX = 1.0f / float(regionsX);
const float c_randomRangeY = 1.0f / float(regionsY);

// calculate the sample points
std::array<std::array<float, 2>, NUM_SAMPLES> samples;
for (size_t i = 0; i < NUM_SAMPLES; ++i)
{
samples[i][0] = RandomFloat(0.0f, c_randomRangeX);
samples[i][0] += float(i % regionsX) / float(regionsX);

samples[i][1] = RandomFloat(0.0f, c_randomRangeY);
samples[i][1] += float(i % regionsY) / float(regionsY);
}

// save bitmap etc
char fileName[256];
sprintf(fileName, "2DSubRandomA_%zu_%zu.bmp", regionsX, regionsY);
Test2D(fileName, samples);
}

//======================================================================================
void TestSubRandomB2D ()
{
// calculate the sample points
float samplex = RandomFloat(0.0f, 0.5f);
float sampley = RandomFloat(0.0f, 0.5f);
std::array<std::array<float, 2>, NUM_SAMPLES> samples;
for (size_t i = 0; i < NUM_SAMPLES; ++i)
{
samplex = std::fmodf(samplex + 0.5f + RandomFloat(0.0f, 0.5f), 1.0f);
sampley = std::fmodf(sampley + 0.5f + RandomFloat(0.0f, 0.5f), 1.0f);
samples[i][0] = samplex;
samples[i][1] = sampley;
}

// save bitmap etc
Test2D("2DSubRandomB.bmp", samples);
}

//======================================================================================
void TestHalton (size_t basex, size_t basey)
{
// calculate the sample points
std::array<std::array<float, 2>, NUM_SAMPLES> samples;
const size_t c_oneSide = size_t(std::sqrt(NUM_SAMPLES));
const float c_halfJitter = 1.0f / float((c_oneSide + 1) * 2);
for (size_t i = 0; i < NUM_SAMPLES; ++i)
{
// x axis
samples[i][0] = 0.0f;
{
float denominator = float(basex);
size_t n = i;
while (n > 0)
{
size_t multiplier = n % basex;
samples[i][0] += float(multiplier) / denominator;
n = n / basex;
denominator *= basex;
}
}

// y axis
samples[i][1] = 0.0f;
{
float denominator = float(basey);
size_t n = i;
while (n > 0)
{
size_t multiplier = n % basey;
samples[i][1] += float(multiplier) / denominator;
n = n / basey;
denominator *= basey;
}
}
}

// save bitmap etc
char fileName[256];
sprintf(fileName, "2DHalton_%zu_%zu.bmp", basex, basey);
Test2D(fileName, samples);
}

//======================================================================================
void TestSobol2D ()
{
// calculate the sample points

// x axis
std::array<std::array<float, 2>, NUM_SAMPLES> samples;
size_t sampleInt = 0;
for (size_t i = 0; i < NUM_SAMPLES; ++i)
{
size_t ruler = Ruler(i + 1);
size_t direction = size_t(size_t(1) << size_t(31 - ruler));
sampleInt = sampleInt ^ direction;
samples[i][0] = float(sampleInt) / std::pow(2.0f, 32.0f);
}

// y axis
// uses numbers: new-joe-kuo-6.21201

// Direction numbers
std::vector<size_t> V;
V.resize((size_t)ceil(log((double)NUM_SAMPLES) / log(2.0)));
V[0] = size_t(1) << size_t(31);
for (size_t i = 1; i < V.size(); ++i)
V[i] = V[i - 1] ^ (V[i - 1] >> 1);

// Samples
sampleInt = 0;
for (size_t i = 0; i < NUM_SAMPLES; ++i) {
size_t ruler = Ruler(i + 1);
sampleInt = sampleInt ^ V[ruler];
samples[i][1] = float(sampleInt) / std::pow(2.0f, 32.0f);
}

// save bitmap etc
Test2D("2DSobol.bmp", samples);
}

//======================================================================================
void TestHammersley2D (size_t truncateBits)
{
// figure out how many bits we are working in.
size_t value = 1;
size_t numBits = 0;
while (value < NUM_SAMPLES)
{
value *= 2;
++numBits;
}

// calculate the sample points
std::array<std::array<float, 2>, NUM_SAMPLES> samples;
size_t sampleInt = 0;
for (size_t i = 0; i < NUM_SAMPLES; ++i)
{
// x axis
samples[i][0] = 0.0f;
{
size_t n = i >> truncateBits;
float base = 1.0f / 2.0f;
while (n)
{
if (n & 1)
samples[i][0] += base;
n /= 2;
base /= 2.0f;
}
}

// y axis
samples[i][1] = 0.0f;
{
size_t n = i >> truncateBits;
size_t mask = size_t(1) << (numBits - 1 - truncateBits);

float base = 1.0f / 2.0f;
{
samples[i][1] += base;
base /= 2.0f;
}
}
}

// save bitmap etc
char fileName[256];
sprintf(fileName, "2DHammersley_%zu.bmp", truncateBits);
Test2D(fileName, samples);
}

//======================================================================================
void TestRooks2D ()
{
// make and shuffle rook positions
std::random_device rd;
std::mt19937 mt(rd());
std::array<size_t, NUM_SAMPLES> rookPositions;
for (size_t i = 0; i < NUM_SAMPLES; ++i)
rookPositions[i] = i;
std::shuffle(rookPositions.begin(), rookPositions.end(), mt);

// calculate the sample points
std::array<std::array<float, 2>, NUM_SAMPLES> samples;
for (size_t i = 0; i < NUM_SAMPLES; ++i)
{
// x axis
samples[i][0] = float(rookPositions[i]) / float(NUM_SAMPLES-1);

// y axis
samples[i][1] = float(i) / float(NUM_SAMPLES - 1);
}

// save bitmap etc
Test2D("2DRooks.bmp", samples);
}

//======================================================================================
void TestIrrational2D (float irrationalx, float irrationaly, float seedx, float seedy)
{
// calculate the sample points
std::array<std::array<float, 2>, NUM_SAMPLES> samples;
float samplex = seedx;
float sampley = seedy;
for (size_t i = 0; i < NUM_SAMPLES; ++i)
{
samplex = std::fmodf(samplex + irrationalx, 1.0f);
sampley = std::fmodf(sampley + irrationaly, 1.0f);

samples[i][0] = samplex;
samples[i][1] = sampley;
}

// save bitmap etc
char irrationalxStr[256];
sprintf(irrationalxStr, "%f", irrationalx);
char irrationalyStr[256];
sprintf(irrationalyStr, "%f", irrationaly);
char seedxStr[256];
sprintf(seedxStr, "%f", seedx);
char seedyStr[256];
sprintf(seedyStr, "%f", seedy);
char fileName[256];
sprintf(fileName, "2DIrrational_%s_%s_%s_%s.bmp", &irrationalxStr[2], &irrationalyStr[2], &seedxStr[2], &seedyStr[2]);
Test2D(fileName, samples);
}

//======================================================================================
float MinimumDistance2D (const std::array<std::array<float, 2>, NUM_SAMPLES>& samples, size_t numSamples, float x, float y)
{
// Used by poisson.
// This returns the minimum distance that point (x,y) is away from the sample points, from [0, numSamples).
float minimumDistance = 0.0f;
for (size_t i = 0; i < numSamples; ++i)
{
float distance = Distance(samples[i][0], samples[i][1], x, y);
if (i == 0 || distance < minimumDistance)
minimumDistance = distance;
}
return minimumDistance;
}

//======================================================================================
void TestPoisson2D ()
{
// every time we want to place a point, we generate this many points and choose the one farthest away from all the other points (largest minimum distance)
const size_t c_bestOfAttempts = 100;

// calculate the sample points
std::array<std::array<float, 2>, NUM_SAMPLES> samples;
for (size_t sampleIndex = 0; sampleIndex < NUM_SAMPLES; ++sampleIndex)
{
// generate some random points and keep the one that has the largest minimum distance from any of the existing points
float bestX = 0.0f;
float bestY = 0.0f;
float bestMinDistance = 0.0f;
for (size_t attempt = 0; attempt < c_bestOfAttempts; ++attempt)
{
float attemptX = RandomFloat(0.0f, 1.0f);
float attemptY = RandomFloat(0.0f, 1.0f);
float minDistance = MinimumDistance2D(samples, sampleIndex, attemptX, attemptY);

if (minDistance > bestMinDistance)
{
bestX = attemptX;
bestY = attemptY;
bestMinDistance = minDistance;
}
}
samples[sampleIndex][0] = bestX;
samples[sampleIndex][1] = bestY;
}

// save bitmap etc
Test2D("2DPoisson.bmp", samples);
}

//======================================================================================
int main (int argc, char **argv)
{
// 1D tests
{
TestUniform1D(false);
TestUniform1D(true);

TestUniformRandom1D();

TestSubRandomA1D(2);
TestSubRandomA1D(4);
TestSubRandomA1D(8);
TestSubRandomA1D(16);
TestSubRandomA1D(32);

TestSubRandomB1D();

TestVanDerCorput(2);
TestVanDerCorput(3);
TestVanDerCorput(4);
TestVanDerCorput(5);

// golden ratio mod 1 aka (sqrt(5) - 1)/2
TestIrrational1D(0.618034f, 0.0f);
TestIrrational1D(0.618034f, 0.385180f);
TestIrrational1D(0.618034f, 0.775719f);
TestIrrational1D(0.618034f, 0.287194f);

// sqrt(2) - 1
TestIrrational1D(0.414214f, 0.0f);
TestIrrational1D(0.414214f, 0.385180f);
TestIrrational1D(0.414214f, 0.775719f);
TestIrrational1D(0.414214f, 0.287194f);

// PI mod 1
TestIrrational1D(0.141593f, 0.0f);
TestIrrational1D(0.141593f, 0.385180f);
TestIrrational1D(0.141593f, 0.775719f);
TestIrrational1D(0.141593f, 0.287194f);

TestSobol1D();

TestHammersley1D(0);
TestHammersley1D(1);
TestHammersley1D(2);

TestPoisson1D();
}

// 2D tests
{
TestUniform2D(false);
TestUniform2D(true);

TestUniformRandom2D();

TestSubRandomA2D(2, 2);
TestSubRandomA2D(2, 3);
TestSubRandomA2D(3, 11);
TestSubRandomA2D(3, 97);

TestSubRandomB2D();

TestHalton(2, 3);
TestHalton(5, 7);
TestHalton(13, 9);

TestSobol2D();

TestHammersley2D(0);
TestHammersley2D(1);
TestHammersley2D(2);

TestRooks2D();

// X axis = golden ratio mod 1 aka (sqrt(5)-1)/2
// Y axis = sqrt(2) mod 1
TestIrrational2D(0.618034f, 0.414214f, 0.0f, 0.0f);
TestIrrational2D(0.618034f, 0.414214f, 0.775719f, 0.264045f);

// X axis = sqrt(2) mod 1
// Y axis = sqrt(3) mod 1
TestIrrational2D(std::fmodf((float)std::sqrt(2.0f), 1.0f), std::fmodf((float)std::sqrt(3.0f), 1.0f), 0.0f, 0.0f);
TestIrrational2D(std::fmodf((float)std::sqrt(2.0f), 1.0f), std::fmodf((float)std::sqrt(3.0f), 1.0f), 0.775719f, 0.264045f);

TestPoisson2D();
}

#if CALCULATE_DISCREPANCY
printf("\n");
system("pause");
#endif
}


# Improved Storage Space Efficiency of GPU Texture Sampler Bezier Curve Evaluation

This is an extension of a paper I wrote which shows how to use the linear texture sampling capabilities of the GPU to calculate points on Bezier curves (also just polynomials in general as well as rational polynomials, and also surfaces and volumes made by tensor products). You store the control points in the texture, then sample along the texture’s diagonal to get points on the curve:
GPU Texture Sampler Bezier Curve Evaluation

This extension improves on the efficiency of the storage space usage, allowing a higher density of curve data per pixel, but the post also talks about some caveats and limitations.

This post is divided into the following sections:

1. Basic Idea of Extension
2. 2D Texture / Quadratic Piecewise Curves
3. 2D Texture / Quadratic Piecewise Curves – C0 Continuity
4. 2D Texture / Quadratic Piecewise Curves – Storage Efficiency
5. Real World Limitations
6. 3D Texture / Cubic Piecewise Curves
7. 3D Texture / Cubic Piecewise Curves – Multiple Curves?
8. 3D Texture / Cubic Piecewise Curves – C0 Continuity
9. 3D Texture / Cubic Piecewise Curves – Storage Efficiency
10. Generalizing The Unit Hyper Cube
11. Closing
12. Code

# 1. Basic Idea of Extension

Let’s talk about the base technique before going into the details of the extension.

The image below shows how bilinear interpolation across the diagonal between pixels can calculate points on curves. Bilinear interpolation is exactly equivalent to the De Casteljau algorithm when the u and v coordinate are the same value.

Linear interpolation between two values A and B at time t is done with this formula:
$A(1-t) + Bt$

I’ve found useful to replace (1-t) with it’s own symbol s. That makes it become this:
$As + Bt$

Now, if you bilinear interpolate between 4 values, you have two rows. One row has A,B in it and the other row has C,D in it. To bilinear interpolate between these four values at time (t,t), the formula is this:
$(As + Bt)s + (Cs+Dt)t$

If you expand that and collect like terms you come up with this equation:
$As^2 + (B+C)st + Dt^2$

At this point, the last step is to make B and C the same value (make them both into B) and then rename D to C since that letter is unused. The resulting formula turns out to be the formula for a quadratic Bezier curve. This shows that mathematically, bilinear interpolation can be made to be mathematically the same as the quadratic Bezier formula. (Note: there are extensions to get higher order curves and surfaces as well)
$As^2 + 2Bst + Ct^2$

However, for this extension we are going to take one step back to the prior equation:
$As^2 + (B+C)st + Dt^2$

What you may notice is that the two values in the corners of the 2×2 bilinear interpolation don’t have to be the exact value of the middle control point of the quadratic Bezier curve – they only have to AVERAGE to that value.

This is interesting because to encode two different piecewise quadratic curves (C0-C2 and C3-C5) into a 2d texture before this extension, I would do it like this:

$A = C_0 \\ B = C_1 \\ C = C_2 \\ D = C_3 \\ E = C_4 \\ F = C_5\\$

That uses 8 pixels to store the 6 control points of the two quadratic curves.

With the ideas of this extension, one way it could look now is this:

$A = C_0 \\ B + C = 2*C_1 : B = 2*C_1 - C_3 \\ D = C_2 \\ C = C_3 \\ D + E = 2*C_4 : E = 2*C_4 - C_2\\ F = C_5\\$

The result is that 6 pixels are used instead of 8, for storing the 6 control points of the two quadratic curves.

That isn’t the only result though, so let’s explore the details (:

# 2. 2D Texture / Quadratic Piecewise Curves

Let’s start by more formally looking at the 2d texture / quadratic curve case.

We are going to number the pixels by their texture coordinate location (in the form of Pyx) instead of using letters. Later on that will help show a pattern of generalization. We are still using the same notation for control points where C0 is the first control point, C1 is the second control point and so on.

Looking at a single quadratic curve we have this texture which has these constraints on it’s pixel values:

$P_{00} = C_0 \\ P_{01}+P_{10} = 2*C_1 \\ P_{11} = C_2 \\$

To analyze this, let’s make an augmented matrix. The left matrix is a 3×4 matrix where each column is a pixel and each row is the left side of the equation for a constraint. The right matrix is a 3×3 matrix where each column is a control point and each row is the right side of the equation for a constraint. The first row of the matrix is column labels to help see what’s going on more easily.

Note that i put my pixel columns and control point columns in ascending order in the matrix, but if you put them in a different order, you’d get the same (or equivalent) results as I did. It’s just my convention they are in this order.

$\left[\begin{array}{rrrr|rrr} P_{00} & P_{01} & P_{10} & P_{11} & C_0 & C_1 & C_2 \\ 1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 2 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 1 \end{array}\right]$

The next step would be to put this matrix into reduced row echelon form to solve the equations to see what the values of the pixels need to be, but the matrix is in fact already in rref form! (For more information on rref, check out my last post: Solving N equations and N unknowns: The Fine Print (Gauss Jordan Elimination))

What we can see by looking at the rref of the matrix is that either P01 or P10 can be a free variable – meaning we can choose whatever value we want for it. After we choose a value for either of those variables (pixels), the rest of the pixels are fully defined.

Deciding that P10 is the free variable (just by convention that it isn’t the leading non zero value), the second equation (constraint) becomes P01 = 2*C1-P10.

If we choose the value C1 for P10, that means that P01 must equal C1 too (this is how the original technique worked). If we choose 0 for P10, that means that P01 must equal 2*C1. This is because P01 must always equal 2*C1-P10. We then are in the new territory of this extension, where the pixels representing the middle control point have some freedom about what values they can take on, so long as they average to the middle control point value.

Let’s add a row of pixels and try encoding a second quadratic curve:

$P_{00} = C_0 \\ P_{01}+P_{10} = 2*C_1 \\ P_{11} = C_2 \\ P_{10} = C_3 \\ P_{11}+P_{20} = 2*C_4 \\ P_{21} = C_5$

Let’s again make an augmented matrix with pixels on the left and control points on the right.

$\left[\begin{array}{rrrrrr|rrrrrr} P_{00} & P_{01} & P_{10} & P_{11} & P_{20} & P_{21} & C_0 & C_1 & C_2 & C_3 & C_4 & C_5\\ 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 1 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \end{array}\right]$

Putting that into rref to solve for the pixel values we get this:

$\left[\begin{array}{rrrrrr|rrrrrr} P_{00} & P_{01} & P_{10} & P_{11} & P_{20} & P_{21} & C_0 & C_1 & C_2 & C_3 & C_4 & C_5\\ 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & -1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & -1 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \end{array}\right]$

We got the identity matrix on the left, so we don’t have any inconsistencies or free variables.

If we turn that matrix back into equations we get this:

$P_{00} = C_0 \\ P_{01} = 2*C_1 - C_3 \\ P_{10} = C_3 \\ P_{11} = C_2 \\ P_{20} = 2*C_4 - C_2 \\ P_{21} = C_5$

We were successful! We can store two piecewise Bezier curves in 6 pixels by setting the pixel values to these specific values.

The last example we’ll show is the next stage, where it falls apart. We’ll add another row of pixels and try to encode 3 Bezier curves (9 control points) into those 8 pixels.

$P_{00} = C_0 \\ P_{01}+P_{10} = 2*C_1 \\ P_{11} = C_2 \\ P_{10} = C_3 \\ P_{11}+P_{20} = 2*C_4 \\ P_{21} = C_5 \\ P_{20} = C_6 \\ P_{21}+P_{30} = 2*C_7 \\ P_{31} = C_8$

This is the augmented matrix with pixels on the left and control points on the right:

$\left[\begin{array}{rrrrrrrr|rrrrrrrrr} P_{00} & P_{01} & P_{10} & P_{11} & P_{20} & P_{21} & P_{30} & P_{31} & C_0 & C_1 & C_2 & C_3 & C_4 & C_5 & C_6 & C_7 & C_8\\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ \end{array}\right]$

The rref form is:

$\left[\begin{array}{rrrrrrrr|rrrrrrrrr} P_{00} & P_{01} & P_{10} & P_{11} & P_{20} & P_{21} & P_{30} & P_{31} & C_0 & C_1 & C_2 & C_3 & C_4 & C_5 & C_6 & C_7 & C_8\\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & -1 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & -1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & -2 & 0 & 1 & 0 & 0 \\ \end{array}\right]$

Let’s turn that back into equations.

$P_{00} = C_0 \\ P_{01} = 2*C_1 - C_3 \\ P_{10} = C_3 \\ P_{11} = 2*C_4 - C_6 \\ P_{20} = C_6 \\ P_{21} = C_5 \\ P_{30} = 2*C_7 - C_5 \\ P_{31} = C_8 \\ 0 = C_2 - 2*C_4 + C_6$

We have a problem unfortunately! The bottom row says this:

$0 = C_2 - 2*C_4 + C_6$

That means that we can only store these curves in this pixel configuration if we limit the values of the control points 2,4,6 to values that make that last equation true.

Since my desire is to be able to store curves in textures without “unusual” restrictions on what the control points can be, I’m going to count this as a failure for a general case solution.

It only gets worse from here for the case of trying to add another row of pixels for each curve you want to add.

It looks like storing two quadratic curves in a 2×6 group of pixels is the most optimal (data dense) storage. If you go any higher, it puts restrictions on the control points. If you go any lower, you have a free variable, which means you aren’t making full use of all of the pixels you have.

This means that if you are storing piecewise quadratic curves in 2d textures, doing it this way will cause you to use 3/4 as many pixels as doing it the other way, and you will be using 1 pixel per control point stored, instead of 1.333 pixels per control point stored.

This isn’t the end of the story though, so let’s continue (:

# 3. 2D Texture / Quadratic Piecewise Curves – C0 Continuity

If we add the requirement that our piecewise curves must be connected (aka that they have C0 continuity), we can actually do something pretty interesting. Take a look at this setup:

$P_{00} = C_0 \\ P_{01}+P_{10} = 2*C_1 \\ P_{11} = C_2 \\ P_{11} = C_3 \\ P_{10}+P_{21} = 2*C_4 \\ P_{20} = C_5$

Putting this into matrix form looks like this:

$\left[\begin{array}{rrrrrr|rrrrrr} P_{00} & P_{01} & P_{10} & P_{11} & P_{20} & P_{21} & C_0 & C_1 & C_2 & C_3 & C_4 & C_5\\ 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 1 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ \end{array}\right]$

In rref it becomes this:

$\left[\begin{array}{rrrrrr|rrrrrr} P_{00} & P_{01} & P_{10} & P_{11} & P_{20} & P_{21} & C_0 & C_1 & C_2 & C_3 & C_4 & C_5\\ 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0 & -1 & 0 & 2 & 0 & 0 & -2 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 & 0 & 0 \\ \end{array}\right]$

Turning the rref back into equations we get:

$P_{00} = C_0 \\ P_{01} - P_{21} = 2*C_1-2*C_4 \\ P_{10} + P_{21} = 2*C_4 \\ P_{11} = C_3 \\ P_{20} = C_5 \\ 0 = C_2 - C_3$

P21 is a free variable, so we can set it to whatever we want. Once we choose a value, the pixel values P01 and P10 will be fully defined.

The bottom equation might have you worried, because it looks like an inconsistency (aka restriction) but it is actually expected.

That last equation says 0 = C2-C3 which can be rearranged into C2 = C3. That just means that the end of our first curve has to equal the beginning of our second curve. That is C0 just the continuity we already said we’d agree to.

So, it worked! Let’s try adding a row of pixels and another curve to see what happens.

$P_{00} = C_0 \\ P_{01}+P_{10} = 2*C_1 \\ P_{11} = C_2 \\ P_{11} = C_3 \\ P_{10}+P_{21} = 2*C_4 \\ P_{20} = C_5\\ P_{20} = C_6\\ P_{21}+P_{30} = 2*C_7\\ P_{31} = C_8$

Putting that into matrix form:

$\left[\begin{array}{rrrrrrrr|rrrrrrrrr} P_{00} & P_{01} & P_{10} & P_{11} & P_{20} & P_{21} & P_{30} & P_{31} & C_0 & C_1 & C_2 & C_3 & C_4 & C_5 & C_6 & C_7 & C_8\\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1\\ \end{array}\right]$

And in rref:

$\left[\begin{array}{rrrrrrrr|rrrrrrrrr} P_{00} & P_{01} & P_{10} & P_{11} & P_{20} & P_{21} & P_{30} & P_{31} & C_0 & C_1 & C_2 & C_3 & C_4 & C_5 & C_6 & C_7 & C_8\\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 2 & 0 & 0 & -2 & 0 & 0 & 2 & 0\\ 0 & 0 & 1 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & -2 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 & 0 & 0\\ \end{array}\right]$

Turning the rref back into equations:

$P_{00} = C_0 \\ P_{01}+P_{30} = 2*C_1 - 2*C_4+2*C_7 \\ P_{10}-P_{30} = 2*C_4-2*C_7 \\ P_{11} = C_3 \\ P_{20} = C_6 \\ P_{21} + P_{30} = 2*C_7\\ P_{31} = C_8\\ 0 = C_2 - C_3\\ 0 = C_5 - C_6\\$

We see that P30 is a free variable, and the last two rows show us we have the C0 continuity requirements: C2 = C3 and C5 = C6.

The last section without C0 continuity reached it’s limit of storage space efficiency after storing two curves (6 control points) in 6 pixels.

When we add the C0 continuity requirement, we were able to take it further and store 3 curves in 8 pixels. Technically those 3 curves have 9 control points, but because the end point of each curve has to be the same as the start point of the next curve it makes it so in reality there is only 3 control points for the first curve and then 2 additional control points for each additional curve. That makes 8 control points for 3 curves, not 9.

Unlike the last section, using this zigzag pattern with C0 continuity, you can encode any number of curves. I am not sure how to prove it, but from observation, there is no sign of any shrinking of capacity as we increase the number of curves, adding two more rows of pixels for each curve. If you know how to prove this more formally, please let me know!

Note that instead of explicitly having 3 control points per curve, where the first control point of a curve has to equal the last control point of the previous curve, you can instead describe the piecewise curves with fewer control points. You need 3 control points for the first curve, and then 2 control points for each curve after that.

Mathematically both ways are equivelant and you’ll get to the same answer. The accompanying source code works that way, but I show this example in this longer way to more explicitly show how things work.

# 4. 2D Texture / Quadratic Piecewise Curves – Storage Efficiency

Let’s compare the storage efficiency of the last two sections to each other, as well as to the original technique.

$\begin{array}{|cccccc|} \hline & & \rlap{\text{2d / Quadratic - Extension}} & & & \\ \hline \text{Curves} & \text{Dimensions} & \text{Pixels} & \text{Control Points} & \text{Pixels Per Control Point} & \text{Pixels Per Curve} \\ \hline 1 & 2x2 & 4 & 3 & 1.33 & 4.00 \\ 2 & 2x3 & 6 & 6 & 1.00 & 3.00 \\ 3 & 2x5 & 10 & 9 & 1.11 & 3.33 \\ 4 & 2x6 & 12 & 12 & 1.00 & 3.00 \\ 5 & 2x8 & 16 & 15 & 1.06 & 3.20 \\ 6 & 2x9 & 18 & 18 & 1.00 & 3.00 \\ \hline \end{array}$

$\begin{array}{|cccccc|} \hline & & \rlap{\text{2d / Quadratic - Extension + C0 Continuity}} & & & \\ \hline \text{Curves} & \text{Dimensions} & \text{Pixels} & \text{Control Points} & \text{Pixels Per Control Point} & \text{Pixels Per Curve} \\ \hline 1 & 2x2 & 4 & 3 & 1.33 & 4.00 \\ 2 & 2x3 & 6 & 5 & 1.20 & 3.00 \\ 3 & 2x4 & 8 & 7 & 1.14 & 2.66 \\ 4 & 2x5 & 10 & 9 & 1.11 & 2.50 \\ 5 & 2x6 & 12 & 11 & 1.09 & 2.40 \\ 6 & 2x7 & 14 & 13 & 1.08 & 2.33 \\ \hline \end{array}$

$\begin{array}{|cccccc|} \hline & & \rlap{\text{2d / Quadratic - Original Technique}} & & & \\ \hline \text{Curves} & \text{Dimensions} & \text{Pixels} & \text{Control Points} & \text{Pixels Per Control Point} & \text{Pixels Per Curve} \\ \hline 1 & 2x2 & 4 & 3 & 1.33 & 4.00 \\ 2 & 2x4 & 8 & 6 & 1.33 & 4.00 \\ 3 & 2x6 & 12 & 9 & 1.33 & 4.00 \\ 4 & 2x8 & 16 & 12 & 1.33 & 4.00 \\ 5 & 2x10 & 20 & 15 & 1.33 & 4.00 \\ 6 & 2x12 & 24 & 18 & 1.33 & 4.00 \\ \hline \end{array}$

The tables show that the first method uses fewer pixels per control point, while the second method uses fewer pixels per curve.

The first method can get you to what I believe to be the maximum density of 1 pixel per control point if you store an even number of curves. It can also give you a curve for every 3 pixels of storage.

The second method approaches the 1 pixel per control point as you store more and more curves and also approaches 2 pixels of storage per curve stored. Note that the second method’s table is using the convention of 3 control points are used for the first curve, and 2 additional control points for each curve after that.

The deciding factor for which method to use is probably going to be whether or not you want to force C0 continuity of your curve data. If so, you’d use the second technique, else you’d use the first.

The original technique uses a constant 1.33 pixels per control point, and 4 pixels to store each curve. Those numbers shows how this extension improves on the storage efficiency of the original technique.

# 5. Real World Limitations

This extension has a problem that the original technique does not have unfortunately.

While the stuff above is correct mathematically, there are limitations on the values we can store in actual textures. For instance, if we have 8 bit uint8 color channels we can only store values 0 to 255.

Looking at one of the equations $P_{01} = 2*C_1 - C_3$, if C1 is 255 and C3 is 0, we are going to need to store 510 in the 8 bits we have for P01, which we can’t. If C1 is 0 and C3 is not zero, we are going to have to store a negative value in the 8 bits we have for P01, which we can’t.

This becomes less of a problem when using 16 bit floats per color channel, and is basically solved when using 32 bit floats per color channels, but that makes the technique hungrier for storage and less efficient again.

While that limits the usefulness of this extension, there are situations where this would still be appropriate – like if you already have your data stored in 16 or 32 bit color channels like some data (eg position data) would require..

The extension goes further, into 3d textures and beyond though, so let’s explore a little bit more.

# 6. 3D Texture / Cubic Piecewise Curves

The original technique talks about how to use a 2x2x2 3d volume texture to store a cubic Bezier curve (per color channel) and to retrieve it by doing a trilinear interpolated texture read.

If you have four control points A,B,C,D then the first slice of the volume texture will be a 2d texture storing the quadratic Bezier curve defined by A,B,C and the second slice will store B,C,D. You still sample along the diagonal of the texture but this time it’s a 3d diagonal instead of 2d. Here is that setup, where the texture is sampled along the diagonal from from A to D:

$A = C_0 \\ B = C_1 \\ C = C_2 \\ D = C_3$

Let’s look at what this extension means for 3d textures / cubic curves.

The equation for a cubic Bezier curve looks like this:

$As^3 + 3Bs^2t + 3Cst^2 + Dt^3$

If we derive that from trilinear interpolation between 8 points A,B,C,D,E,F,G,H, the second to last step would look like this:

$As^3 + (B+C+E)s^2t + (D+F+G)st^2 + Ht^3$

So, similar to our 2d setup, we have some freedom about our values.

In the original technique, B,C,E would have to be equal to the second control point, and D,F,G would have to be equal to the third control point. With the new extension, in both cases, those groups of values only have to AVERAGE to their specific control points. Once again, this gives us some freedoms for the values we can use, and lets us use our pixels more efficiently.

Here is the setup, again using texture coordinates (in the form Pzyx) for the pixels instead of letters.

$P_{000} = C_0\\ P_{001}+P_{010}+P_{100} = 3*C_1\\ P_{011}+P_{101}+P_{110} = 3*C_2\\ P_{111} = C_3$

here’s how the equations look in matrix form, which also happens to already be in rref:

$\left[\begin{array}{rrrrrrrr|rrrr} P_{000} & P_{001} & P_{010} & P_{011} & P_{100} & P_{101} & P_{110} & P_{111} & C_0 & C_1 & C_2 & C_3 \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 1 & 1 & 0 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\ \end{array}\right]$

P010 and P100 are free variables and so are P101 and P110, making a total of four free variables. They can be set to any value desired, which will then define the value that P001 and P011 need to be.

Let’s add another piecewise cubic Bezier curve, and another row of pixels to the texture to see what happens.

$P_{000} = C_{0}\\ P_{001} + P_{010} + P_{100} = 3C_{1}\\ P_{011} + P_{101} + P_{110} = 3C_{2}\\ P_{111} = C_{3}\\ P_{010} = C_{4}\\ P_{011} + P_{020} + P_{110} = 3C_{5}\\ P_{021} + P_{111} + P_{120} = 3C_{6}\\ P_{121} = C_{7}\\$

Here are the equations in matrix form:

$\left[\begin{array}{rrrrrrrrrrrr|rrrrrrrr} P_{000} & P_{001} & P_{010} & P_{011} & P_{020} & P_{021} & P_{100} & P_{101} & P_{110} & P_{111} & P_{120} & P_{121} & C_{0} & C_{1} & C_{2} & C_{3} & C_{4} & C_{5} & C_{6} & C_{7} \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ \end{array}\right]$

Here it is in rref:

$\left[\begin{array}{rrrrrrrrrrrr|rrrrrrrr} P_{000} & P_{001} & P_{010} & P_{011} & P_{020} & P_{021} & P_{100} & P_{101} & P_{110} & P_{111} & P_{120} & P_{121} & C_{0} & C_{1} & C_{2} & C_{3} & C_{4} & C_{5} & C_{6} & C_{7} \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 & -1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & -3 & 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ \end{array}\right]$

Putting that back into equations we have this:

$P_{000} = C_{0}\\ P_{001} + P_{100} = 3C_{1} + -C_{4}\\ P_{010} = C_{4}\\ P_{011} + P_{101} + P_{110} = 3C_{2}\\ P_{020} + -P_{101} = -3C_{2} + 3C_{5}\\ P_{021} + P_{120} = -C_{3} + 3C_{6}\\ P_{111} = C_{3}\\ P_{121} = C_{7}\\$

The result is that we still have four free variables: P100, P101, P110 and P120. When we give values to those pixels, we will then be able to calculate the values for P001, P011, P020 and P021.

There is a limit to this pattern though. Where the maximum number of curves to follow the pattern was 2 with the 2d / quadratic case, the maximum number of curves to follow this pattern with the 3d / cubic case is 3. As soon as you try to put 4 curves in this pattern it fails by having constraints. Interestingly, we still have 4 free variables when putting 3 curves in there, so it doesn’t follow the 2d case where free variables disappeared as we put more curves in, indicating when the failure would happen.

If you know how to more formally analyze when these patterns of equations will fail, please let me know!

# 7. 3D Texture / Cubic Piecewise Curves – Multiple Curves?

Looking at the 3d texture case of 2x2x2 storing a single curve, I saw that there were 4 free variables. Since it takes 4 control points to define a cubic curve, I wondered if we could use those 4 free variables to encode another cubic curve.

Here’s a setup where the x axis is flipped for the second curve. It’s a little bit hard to tell from the diagram, but the blue line does still go through the center of the 3d cube. It goes from P001 to P110, while the first curve still goes from P000 to P111.

Here’s what the equations look like:

$P_{000} = C_{0}\\ P_{001} + P_{010} + P_{100} = 3*C_{1}\\ P_{011} + P_{101} + P_{110} = 3*C_{2}\\ P_{111} = C_{3}\\ P_{001} = C_{4}\\ P_{000} + P_{011} + P_{101} = 3*C_{5}\\ P_{010} + P_{100} + P_{111} = 3*C_{6}\\ P_{110} = C_{7}\\$

And in matrix form:

$\left[\begin{array}{rrrrrrrr|rrrrrrrr} P_{000} & P_{001} & P_{010} & P_{011} & P_{100} & P_{101} & P_{110} & P_{111} & C_{0} & C_{1} & C_{2} & C_{3} & C_{4} & C_{5} & C_{6} & C_{7} \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 3 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 1 & 1 & 0 & 0 & 0 & 3 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ \end{array}\right]$

After putting the matrix in rref to solve the equations, we get this matrix:

$\left[\begin{array}{rrrrrrrr|rrrrrrrr} P_{000} & P_{001} & P_{010} & P_{011} & P_{100} & P_{101} & P_{110} & P_{111} & C_{0} & C_{1} & C_{2} & C_{3} & C_{4} & C_{5} & C_{6} & C_{7} \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -3 & 0 & 0 & 3 & 0 & 1 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 3 & 0 & 0 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 3 & 0 & 0 & -3 & 0 & -1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & \frac{1}{3} & \frac{-1}{3} & 0 & -1 & 0 \\ \end{array}\right]$

Which is this set of equations:

$P_{000} = -3C_{2} + 3C_{5} + C_{7}\\ P_{001} = C_{4}\\ P_{010} + P_{100} = -C_{3} + 3C_{6}\\ P_{011} + P_{101} = 3C_{2} + -C_{7}\\ P_{110} = C_{7}\\ P_{111} = C_{3}\\ 0 = C_{0} + 3C_{2} - 3C_{5} - C_{7}\\ 0 = C_{1} + C_{3}/3 - C_{4}/3 + -C_{6}\\$

In the end there are 2 free variables, but also 2 constraints on the values that the control points can take. The constraints mean it doesn’t work which is unfortunate. That would have been a nice way to bring the 3d / cubic case to using 1 pixel per control point!

I also tried other variations like flipping y or z along with x (flipping all three just makes the first curve in the reverse direction!) but couldn’t find anything that worked. Too bad!

# 8. 3D Texture / Cubic Piecewise Curves – C0 Continuity

Since the regular 3d texture / cubic curve pattern has a limit (3 curves), let’s look at the C0 continuity version like we did for the 2d texture / quadratic case where we sample zig zag style.

Since the sampling has to pass through the center of the cube, we need to flip both x and z each curve.

That gives us a setup like this:

Here are the constraints for the pixel values:

$P_{000} = C_{0}\\ P_{001} + P_{010} + P_{100} = 3C_{1}\\ P_{011} + P_{101} + P_{110} = 3C_{2}\\ P_{111} = C_{3}\\ P_{011} + P_{110} + P_{121} = 3C_{4}\\ P_{010} + P_{021} + P_{120} = 3C_{5}\\ P_{020} = C_{6}\\$

Which looks like this in matrix form:

$\left[\begin{array}{rrrrrrrrrrrr|rrrrrrr} P_{000} & P_{001} & P_{010} & P_{011} & P_{020} & P_{021} & P_{100} & P_{101} & P_{110} & P_{111} & P_{120} & P_{121} & C_{0} & C_{1} & C_{2} & C_{3} & C_{4} & C_{5} & C_{6} \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ \end{array}\right]$

Here is the matrix solved in rref:

$\left[\begin{array}{rrrrrrrrrrrr|rrrrrrr} P_{000} & P_{001} & P_{010} & P_{011} & P_{020} & P_{021} & P_{100} & P_{101} & P_{110} & P_{111} & P_{120} & P_{121} & C_{0} & C_{1} & C_{2} & C_{3} & C_{4} & C_{5} & C_{6} \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & -1 & 1 & 0 & 0 & 0 & -1 & 0 & 0 & 3 & 0 & 0 & 0 & -3 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & -1 & 0 & 0 & 3 & 0 & -3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ \end{array}\right]$

And here is that matrix put back into equations form:

$P_{000} = C_{0}\\ P_{001} - P_{021} + P_{100} - P_{120} = 3C_{1} - 3C_{5}\\ P_{010} + P_{021} + P_{120} = 3C_{5}\\ P_{011} + P_{110} + P_{121} = 3C_{4}\\ P_{020} = C_{6}\\ P_{101} - P_{121} = 3C_{2} - 3C_{4}\\ P_{111} = C_{3}\\$

It worked! It also has 5 free variables.

This pattern works for as many curves as i tried (21 of them), and each time you add another curve / row of this pattern you gain another free variable.

So, storing 2 curves results in 6 free variables, 3 curves has 7 free variables, 4 curves has 8 free variables and so on.

# 9. 3D Texture / Cubic Piecewise Curves – Storage Efficiency

Let’s compare storage efficiency of these 3d texture / cubic curve techniques like we did for the 2d texture / quadratic curve techniques.

$\begin{array}{|cccccc|} \hline & & \rlap{\text{3d / Cubic - Extension}} & & & \\ \hline \text{Curves} & \text{Dimensions} & \text{Pixels} & \text{Control Points} & \text{Pixels Per Control Point} & \text{Pixels Per Curve} \\ \hline 1 & 2x2x2 & 8 & 4 & 2.00 & 8.00 \\ 2 & 2x3x2 & 12 & 8 & 1.50 & 6.00 \\ 3 & 2x4x2 & 16 & 12 & 1.33 & 5.33 \\ 4 & 2x6x2 & 24 & 16 & 1.50 & 6.00 \\ 5 & 2x7x2 & 28 & 20 & 1.40 & 5.60 \\ 6 & 2x8x2 & 32 & 24 & 1.33 & 5.33 \\ \hline \end{array}$

$\begin{array}{|cccccc|} \hline & & \rlap{\text{3d / Quadratic - Extension + C0 Continuity}} & & & \\ \hline \text{Curves} & \text{Dimensions} & \text{Pixels} & \text{Control Points} & \text{Pixels Per Control Point} & \text{Pixels Per Curve} \\ \hline 1 & 2x2x2 & 8 & 4 & 2.00 & 8.00 \\ 2 & 2x3x2 & 12 & 7 & 1.71 & 6.00 \\ 3 & 2x4x2 & 16 & 10 & 1.60 & 5.33 \\ 4 & 2x5x2 & 20 & 13 & 1.54 & 5.00 \\ 5 & 2x6x2 & 24 & 16 & 1.50 & 4.80 \\ 6 & 2x7x2 & 28 & 19 & 1.47 & 4.67 \\ \hline \end{array}$

$\begin{array}{|cccccc|} \hline & & \rlap{\text{3d / Cubic - Original Technique}} & & & \\ \hline \text{Curves} & \text{Dimensions} & \text{Pixels} & \text{Control Points} & \text{Pixels Per Control Point} & \text{Pixels Per Curve} \\ \hline 1 & 2x2x2 & 8 & 4 & 2.00 & 8.00 \\ 2 & 2x4x2 & 16 & 8 & 2.00 & 8.00 \\ 3 & 2x6x2 & 24 & 12 & 2.00 & 8.00 \\ 4 & 2x8x2 & 32 & 16 & 2.00 & 8.00 \\ 5 & 2x10x2 & 40 & 20 & 2.00 & 8.00 \\ 6 & 2x12x2 & 48 & 24 & 2.00 & 8.00 \\ \hline \end{array}$

The original technique had a constant 2 pixels per control point and 8 pixels per cubic curve.

The basic extension lets you bring that down to 1.33 pixels per control point, and 5.33 pixels per curve.

If C0 continuity is desired, as you store more and more curves the extension can bring things down towards 1.33 pixels per control point, and 4 pixels per curve. (Remember that with the C0 extension you have 4 control points for the first curve and then 3 more for each subsequent curve, so that 1.33 pixels per control point isn’t exactly an apples to apples comparison vs the basic extension).

The pattern continues for 4D textures and higher (for higher than cubic curves too!), but working through the 2d and 3d cases for quadratic / cubic curves is the most likely usage case both because 4d textures and higher are kind of excessive (probably you’d need to do multiple texture reads to simulate them), but also when fitting curves to data, quadratic and cubic curves tend to do well in that they don’t usually overfit the data or have as many problems with ringing.

Despite that, I do think it’s useful to look at it from an N dimensional point of view to see the larger picture, so let’s do that next.

# 10. Generalizing The Unit Hyper Cube

Let’s ignore the zig zag sampling pattern and storing multiple curves in a texture and just get back to the basic idea.

Given an N dimensional texture that is 2x2x…x2 that you are going to sample across the diagonal to get a degree N Bezier curve from, how do you know what values to put in which control points to use this technique?

You could derive it from N-linear interpolation, but that is a lot of work.

The good news is it turns out there is a simple pattern, that is also pretty interesting.

Let’s check out the 1d, 2d and 3d cases to see what patterns we might be able to see.

1d Texture / linear Bezier / linear interpolation:

$P_{0} = C_0 \\ P_{1} = C_1 \\$

$\left[\begin{array}{rr|rr} P_{0} & P_{1} & C_0 & C_1\\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ \end{array}\right]$

$P_{00} = C_0 \\ P_{01}+P_{10} = 2*C_1 \\ P_{11} = C_2 \\$

$\left[\begin{array}{rrrr|rrr} P_{00} & P_{01} & P_{10} & P_{11} & C_0 & C_1 & C_2 \\ 1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 2 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 1 \end{array}\right]$

3d Texture / Cubic Bezier:

$P_{000} = C_0\\ P_{001}+P_{010}+P_{100} = 3*C_1\\ P_{011}+P_{101}+P_{110} = 3*C_2\\ P_{111} = C_3$

$\left[\begin{array}{rrrrrrrr|rrrr} P_{000} & P_{001} & P_{010} & P_{011} & P_{100} & P_{101} & P_{110} & P_{111} & C_0 & C_1 & C_2 & C_3 \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 1 & 1 & 0 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\ \end{array}\right]$

The first pattern you might see is that the right side of the equations for an N dimensional hypercube is the identity matrix, but instead of using 1 for the value along the diagonal, it uses values from Pascal’s Triangle (binomial coefficients).

To simplify this a bit though, we could also notice that the number on the right side of the equation equals the sum of the numbers on the left side of the equation. Mathematically it would be the same to say that the numbers on the left side of the equation have to sum up to 1. This would make the matrix on the right just be the identity matrix and we can forget about Pascal’s triangle numbers (they will show up implicitly as divisors of the left side equation coefficients but there’s no need to explicitly calculate them).

But then we are still left with the matrix on the left. How do we know which pixels belong in which rows?

It turns out there is another interesting pattern here. In all the matrices above it follows this pattern:

• Row 0 has a “1” wherever the pixel coordinate has 0 ones set
• Row 1 has a “1” wherever the pixel coordinate has 1 ones set
• Row 2 has a “1” wherever the pixel coordinate has 2 ones set
• Row 3 has a “1” wherever the pixel coordinate has 3 ones set
• ….

That pattern continues indefinitely, but don’t forget that the numbers (coefficients) on the left side of the equation must add up to one.

Here is the matrix form of 1d / linear, 2d / quadratic, and 3d / cubic again with the right matrix being the identity matrix, and the equations below them. Notice the pattern about counts of one bits in each row!

1D:

$\left[\begin{array}{rr|rr} P_{0} & P_{1} & C_0 & C_1\\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ \end{array}\right]$

$P_0 = C_0 \\ P_1 = C_1 \\$

2D:

$\left[\begin{array}{rrrr|rrr} P_{00} & P_{01} & P_{10} & P_{11} & C_0 & C_1 & C_2 \\ 1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & \frac{1}{2} & \frac{1}{2} & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 1 \end{array}\right]$

$P_{00} = C_0 \\ P_{01}/2 + P_{10}/2 = C_1 \\ P_{11} = C_2 \\$

3D:

$\left[\begin{array}{rrrrrrrr|rrrr} P_{000} & P_{001} & P_{010} & P_{011} & P_{100} & P_{101} & P_{110} & P_{111} & C_0 & C_1 & C_2 & C_3 \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & \frac{1}{3} & \frac{1}{3} & 0 & \frac{1}{3} & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & \frac{1}{3} & 0 & \frac{1}{3} & \frac{1}{3} & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\ \end{array}\right]$

$P_{000} = C_0 \\ P_{001}/3 + P_{010}/3 + P_{100}/3 = C_1 \\ P_{011}/3 + P_{101}/3 + P_{110}/3 = C_2 \\ P_{111} = C_3 \\$

Here are the formulas for linear, quadratic and cubic Bezier curves to show a different way of looking at this. Below each is the same formula but with the 1d, 2d and 3d pixels in the formula instead of the control points, using the formulas above which relate pixel values to control point values. Note that I have replaced (1-t) with s for easier reading.

$f(t) = As + Bt \\ \\ f(t) = P_0s + P_1t\\$

$f(t) = As^2 + 2Bst + Ct^2 \\ \\ f(t) = P_{00}s^2 + (P_{01}+P_{10})st + P_{11}t^2 \\$

$f(t) = As^3 + 3Bs^2t + 3Cst^2 + Dt^3 \\ f(t) = P_{000}s^3 + (P_{001}+P_{010}+P_{100})s^2t + (P_{011}+P_{101}+P_{110})st^2 + P_{111}t^3$

I think it’s really interesting how in the last equation as an example, “3B” literally becomes 3 values which could have the value of B. In the plain vanilla technique they did have the value of B. In this extension, the only requirement is that they average to B.

It’s also interesting to notice that if you have an N bit number and you count how many permutations have each possible number of bits turned on, the resulting counts is the Pascal’s triangle row. That is nothing new, but it seems like there might be a fun way to convert a set of random numbers (white noise) into a Gaussian distribution, just by counting how many one bits there were in each number. That isn’t new either, and there are better algorithms, but still I think it’s an interesting idea, and may be useful in a pinch since it seems pretty computationally inexpensive.

# 11. Closing

This extension makes storage efficiency a bit better than the plain vanilla technique, especially if you are interested in C0 continuous curves.

The extension does come at a price though, as you may find yourself in a situation where you need to store a value that is outside of the possible values for common data formats to store (such as needing to store a negative number or a larger than 255 number in a uint8).

Even so, if these three criteria are met:

1. You are already storing data in textures. (Counter point: compute is usually preferred over texture lookup these days)
2. You are relying on the texture interpolator to interpolate values between data points. (Counter point: if you don’t want the interpolation, use a buffer instead so you fit more of the data you actually care about in the cache)
3. You are storing data in 16 or 32 bit real numbers. (Counter point: uint8 is half as large as 16 bit and a quarter as large as 32 bit already)

Then this may be an attractive solution for you, even over the plain vanilla technique.

For future work, I think it would be interesting to see how this line of thinking applies to surfaces.

I also think there is probably some fertile ground looking into what happens when sampling off of the diagonal of the textures. Intuitively it seems you might be able to store some special case higher order curves in lower dimension / storage textures, but I haven’t looked into it yet.

A common usage case when encoding data in a texture would probably include putting curves side by side on the x axis of the texture. It could be interesting to look into whether curves need to be completely separate from each other horizontally (aka 2 pixel of width for each track of curves in the texture), or if you could perhaps fit two curves side by side in a 3 pixel width, or any similar ideas.

Lastly, when looking at these groups of points on these N dimensional hyper-cubes, I can’t help but wonder what kinds of shapes they are. Are they simplices? If so, is there a pattern to the dimensions they are of?

It’s a bit hard to visualize, but taking a look at the first few rows of pascal’s triangle / hyper cubes here’s what I found:

• Dimension 1 (line) : Row 2 = 1,1. Those are both points, so are simplices of 0 dimension.
• Dimension 2 (square) : Row 3 = 1,2,1. The 1’s are points, the 2 is a line, so are simplices of dimension 0, 1, 0.
• Dimension 3 (cube) : Row 4 = 1,3,3,1. The 1’s are still points. The 3’s are in fact triangles, I checked. So, simplices of dimension 0, 2, 2, 0.
• Dimension 4 (hypercube) : Row 5 = 1,4,6,4,1. The 1’s are points. The 4’s are tetrahedrons. The 6 is a 3 dimensional object. I’m not sure it’s shape but that makes it not be a simplex. Possibly it’s two simplices fused together some how. I don’t really know. So, the dimensions anyways are: 0, 3, 3, 3, 0.
• Beyond? That’s as far as I looked. If you look further / deeper and find anything interesting please share!

Update: PBS Infinite Series ended up posting a video on the topic of hypercube slices and pascal’s triangle (seriously!). Give it a watch if you are interested in how these things relate: Dissecting Hypercubes with Pascal’s Triangle | Infinite Series

If you have a usage case for this or any of the related techniques, I’d love to hear about them.

# 12. Code

It’s easy to talk about things and claim that everything is correct, when in fact, the moment you try it, everything falls apart.

I made up some simple standalone c++ code that goes through the cases we talked about, doing the math we did, and also verifying that the texture interpolation is equivalent to actually calculating Bezier curves (using Bernstein polynomials).

You can also use this code as a starting point to explore higher curve counts or other storage patterns. It uses only standard includes and no libraries, so it should be real easy to drop this into a compiler and start experimenting.

Here’s some example output, which shows 6 cubic curves stored in a 3d texture using the zig zag sampling pattern.

Here’s the code:

#define _CRT_SECURE_NO_WARNINGS

#include <stdio.h>
#include <stdlib.h>
#include <array>
#include <algorithm>
#include <unordered_set>
#include <random>
#include <vector>

#define SHOW_MATHJAX_MATRIX() 0
#define SHOW_MATHJAX_EQUATIONS() 0
#define SHOW_EQUATIONS_BEFORE_SOLVE() 0
#define EQUALITY_TEST_SAMPLES 1000

typedef int32_t TINT;

TINT CalculateGCD (TINT smaller, TINT larger);
TINT CalculateLCM (TINT smaller, TINT larger);

// A rational number, to handle fractional numbers without typical floating point issues
struct CRationalNumber
{
CRationalNumber (TINT numerator = 0, TINT denominator = 1)
: m_numerator(numerator)
, m_denominator(denominator)
{ }

TINT m_numerator;
TINT m_denominator;

CRationalNumber Reciprocal () const
{
return CRationalNumber(m_denominator, m_numerator);
}

void Reduce ()
{
if (m_numerator != 0 && m_denominator != 0)
{
TINT div = CalculateGCD(m_numerator, m_denominator);
m_numerator /= div;
m_denominator /= div;
}

if (m_denominator < 0)
{
m_numerator *= -1;
m_denominator *= -1;
}

if (m_numerator == 0)
m_denominator = 1;
}

bool IsZero () const
{
return m_numerator == 0 && m_denominator != 0;
}

// NOTE: the functions below assume Reduce() has happened
bool IsOne () const
{
return m_numerator == 1 && m_denominator == 1;
}

bool IsMinusOne () const
{
return m_numerator == -1 && m_denominator == 1;
}

bool IsWholeNumber () const
{
return m_denominator == 1;
}
};

// Define a vector as an array of rational numbers
template<size_t N>
using TVector = std::array<CRationalNumber, N>;

// Define a matrix as an array of vectors
template<size_t M, size_t N>
using TMatrix = std::array<TVector<N>, M>;

//===================================================================================================================================
//                                              GCD / LCM
//===================================================================================================================================

// from my blog post: https://blog.demofox.org/2015/01/24/programmatically-calculating-gcd-and-lcm/

TINT CalculateGCD (TINT smaller, TINT larger)
{
// make sure A <= B before starting
if (larger < smaller)
std::swap(smaller, larger);

// loop
while (1)
{
// if the remainder of larger / smaller is 0, they are the same
// so return smaller as the GCD
TINT remainder = larger % smaller;
if (remainder == 0)
return smaller;

// otherwise, the new larger number is the old smaller number, and
// the new smaller number is the remainder
larger = smaller;
smaller = remainder;
}
}

TINT CalculateLCM (TINT A, TINT B)
{
// LCM(A,B) = (A/GCD(A,B))*B
return (A / CalculateGCD(A, B))*B;
}

//===================================================================================================================================
//                                              RATIONAL NUMBER MATH
//===================================================================================================================================

void CommonDenominators (CRationalNumber& a, CRationalNumber& b)
{
TINT lcm = CalculateLCM(a.m_denominator, b.m_denominator);

a.m_numerator *= lcm / a.m_denominator;
b.m_numerator *= lcm / b.m_denominator;

a.m_denominator = lcm;
b.m_denominator = lcm;
}

bool operator == (const CRationalNumber& a, const CRationalNumber& b)
{
CRationalNumber _a(a), _b(b);
CommonDenominators(_a, _b);
return _a.m_numerator == _b.m_numerator;
}

void operator *= (CRationalNumber& a, const CRationalNumber& b)
{
a.m_numerator *= b.m_numerator;
a.m_denominator *= b.m_denominator;
}

CRationalNumber operator * (const CRationalNumber& a, const CRationalNumber& b)
{
return CRationalNumber(a.m_numerator * b.m_numerator, a.m_denominator * b.m_denominator);
}

void operator -= (CRationalNumber& a, const CRationalNumber& b)
{
CRationalNumber _b(b);
CommonDenominators(a, _b);
a.m_numerator -= _b.m_numerator;
}

//===================================================================================================================================
//                                              GAUSS-JORDAN ELIMINATION CODE
//===================================================================================================================================

// From my blog post: https://blog.demofox.org/2017/04/10/solving-n-equations-and-n-unknowns-the-fine-print-gauss-jordan-elimination/

// Make a specific row have a 1 in the colIndex, and make all other rows have 0 there
template <size_t M, size_t N>
bool MakeRowClaimVariable (TMatrix<M, N>& matrix, size_t rowIndex, size_t colIndex)
{
// Find a row that has a non zero value in this column and swap it with this row
{
// Find a row that has a non zero value
size_t nonZeroRowIndex = rowIndex;
while (nonZeroRowIndex < M && matrix[nonZeroRowIndex][colIndex].IsZero())
++nonZeroRowIndex;

// If there isn't one, nothing to do
if (nonZeroRowIndex == M)
return false;

// Otherwise, swap the row
if (rowIndex != nonZeroRowIndex)
std::swap(matrix[rowIndex], matrix[nonZeroRowIndex]);
}

// Scale this row so that it has a leading one
CRationalNumber scale = matrix[rowIndex][colIndex].Reciprocal();
for (size_t normalizeColIndex = colIndex; normalizeColIndex < N; ++normalizeColIndex)
{
matrix[rowIndex][normalizeColIndex] *= scale;
matrix[rowIndex][normalizeColIndex].Reduce();
}

// Make sure all rows except this one have a zero in this column.
// Do this by subtracting this row from other rows, multiplied by a multiple that makes the column disappear.
for (size_t eliminateRowIndex = 0; eliminateRowIndex < M; ++eliminateRowIndex)
{
if (eliminateRowIndex == rowIndex)
continue;

CRationalNumber scale = matrix[eliminateRowIndex][colIndex];
for (size_t eliminateColIndex = 0; eliminateColIndex < N; ++eliminateColIndex)
{
matrix[eliminateRowIndex][eliminateColIndex] -= matrix[rowIndex][eliminateColIndex] * scale;
matrix[eliminateRowIndex][eliminateColIndex].Reduce();
}
}

return true;
}

// make matrix into reduced row echelon form
template <size_t M, size_t N>
void GaussJordanElimination (TMatrix<M, N>& matrix)
{
size_t rowIndex = 0;
for (size_t colIndex = 0; colIndex < N; ++colIndex)
{
if (MakeRowClaimVariable(matrix, rowIndex, colIndex))
{
++rowIndex;
if (rowIndex == M)
return;
}
}
}

//===================================================================================================================================
//                                                           Shared Testing Code
//===================================================================================================================================

template <size_t M, size_t N, typename LAMBDA>
void PrintEquations (
TMatrix<M, N>& augmentedMatrix,
size_t numPixels,
LAMBDA& pixelIndexToCoordinates
)
{
char pixelCoords[10];

#if SHOW_MATHJAX_MATRIX()
// print the matrix opening stuff
printf("\left[\begin{array}{");
for (size_t i = 0; i < N; ++i)
{
if (i == numPixels)
printf("|");
printf("r");
}
printf("}n");
for (size_t i = 0; i < numPixels; ++i)
{
pixelIndexToCoordinates(i, pixelCoords);
if (i == 0)
printf("P_{%s}", pixelCoords);
else
printf(" & P_{%s}", pixelCoords);
}
for (size_t i = numPixels; i < N; ++i)
{
printf(" & C_{%zu}", i-numPixels);
}
printf(" \\n");

// Print the matrix
for (const TVector<N>& row : augmentedMatrix)
{
bool first = true;
for (const CRationalNumber& n : row)
{
if (first)
first = false;
else
printf(" & ");

if (n.IsWholeNumber())
printf("%i", n.m_numerator);
else
printf("\frac{%i}{%i}", n.m_numerator, n.m_denominator);
}
printf(" \\n");
}

// print the matrix closing stuff
printf("\end{array}\right]n");
#endif

// print equations
for (const TVector<N>& row : augmentedMatrix)
{
// indent
#if SHOW_MATHJAX_EQUATIONS() == 0
printf("    ");
#endif

// left side of the equation
bool leftHasATerm = false;
for (size_t i = 0; i < numPixels; ++i)
{
if (!row[i].IsZero())
{
if (leftHasATerm)
printf(" + ");
pixelIndexToCoordinates(i, pixelCoords);

#if SHOW_MATHJAX_EQUATIONS()
if (row[i].IsOne())
printf("P_{%s}", pixelCoords);
else if (row[i].IsMinusOne())
printf("-P_{%s}", pixelCoords);
else if (row[i].IsWholeNumber())
printf("%iP_{%s}", row[i].m_numerator, pixelCoords);
else if (row[i].m_numerator == 1)
printf("P_{%s}/%i", pixelCoords, row[i].m_denominator);
else
printf("P_{%s} * %i/%i", pixelCoords, row[i].m_numerator, row[i].m_denominator);
#else
if (row[i].IsOne())
printf("P%s", pixelCoords);
else if (row[i].IsMinusOne())
printf("-P%s", pixelCoords);
else if (row[i].IsWholeNumber())
printf("%iP%s", row[i].m_numerator, pixelCoords);
else if (row[i].m_numerator == 1)
printf("P%s/%i", pixelCoords, row[i].m_denominator);
else
printf("P%s * %i/%i", pixelCoords, row[i].m_numerator, row[i].m_denominator);
#endif
leftHasATerm = true;
}
}
if (!leftHasATerm)
printf("0 = ");
else
printf(" = ");

// right side of the equation
bool rightHasATerm = false;
for (size_t i = numPixels; i < N; ++i)
{
if (!row[i].IsZero())
{
if (rightHasATerm)
printf(" + ");

#if SHOW_MATHJAX_EQUATIONS()
if (row[i].IsOne())
printf("C_{%zu}", i - numPixels);
else if (row[i].IsMinusOne())
printf("-C_{%zu}", i - numPixels);
else if (row[i].IsWholeNumber())
printf("%iC_{%zu}", row[i].m_numerator, i - numPixels);
else if (row[i].m_numerator == 1)
printf("C_{%zu}/%i", i - numPixels, row[i].m_denominator);
else
printf("C_{%zu} * %i/%i", i - numPixels, row[i].m_numerator, row[i].m_denominator);
#else
if (row[i].IsOne())
printf("C%zu", i - numPixels);
else if (row[i].IsMinusOne())
printf("-C%zu", i - numPixels);
else if (row[i].IsWholeNumber())
printf("%iC%zu", row[i].m_numerator, i - numPixels);
else if (row[i].m_numerator == 1)
printf("C%zu/%i", i - numPixels, row[i].m_denominator);
else
printf("C%zu * %i/%i", i - numPixels, row[i].m_numerator, row[i].m_denominator);
#endif
rightHasATerm = true;
}
}

#if SHOW_MATHJAX_EQUATIONS()
printf("\\n");
#else
printf("n");
#endif
}
}

template <size_t M, size_t N, typename LAMBDA>
bool SolveMatrixAndPrintEquations (
TMatrix<M, N>& augmentedMatrix,
size_t numPixels,
std::unordered_set<size_t>& freeVariables,
LAMBDA& pixelIndexToCoordinates
)
{
#if SHOW_EQUATIONS_BEFORE_SOLVE()
printf("   Initial Equations:n");
PrintEquations(augmentedMatrix, numPixels, pixelIndexToCoordinates);
printf("   Solved Equations:n");
#endif

// put augmented matrix into rref
GaussJordanElimination(augmentedMatrix);

// Print equations
PrintEquations(augmentedMatrix, numPixels, pixelIndexToCoordinates);

// Get free variables and check for control point constraint
bool constraintFound = false;
for (const TVector<N>& row : augmentedMatrix)
{
bool leftHasATerm = false;
for (size_t i = 0; i < numPixels; ++i)
{
if (!row[i].IsZero())
{
if (leftHasATerm)
freeVariables.insert(i);
else
leftHasATerm = true;
}
}

bool rightHasATerm = false;
for (size_t i = numPixels; i < N; ++i)
{
if (!row[i].IsZero())
rightHasATerm = true;
}

if (!leftHasATerm && rightHasATerm)
constraintFound = true;
}

printf("  %zu free variables.n", freeVariables.size());

if (constraintFound)
{
printf("  Constraint Found.  This configuration doesn't work for the general case!nn");
return false;
}

return true;
}

float lerp (float t, float a, float b)
{
return a * (1.0f - t) + b * t;
}

template <size_t NUMPIXELS, size_t NUMCONTROLPOINTS, size_t NUMEQUATIONS>
void FillInPixelsAndControlPoints (
std::array<float, NUMPIXELS>& pixels,
std::array<float, NUMCONTROLPOINTS>& controlPoints,
const TMatrix<NUMEQUATIONS, NUMPIXELS+ NUMCONTROLPOINTS>& augmentedMatrix,
const std::unordered_set<size_t>& freeVariables)
{
// come up with random values for the control points and free variable pixels
static std::random_device rd;
static std::mt19937 mt(rd());
static std::uniform_real_distribution<float> dist(-10.0f, 10.0f);
for (float& cp : controlPoints)
cp = dist(mt);
for (size_t var : freeVariables)
pixels[var] = dist(mt);

// fill in the non free variable pixels per the equations
for (const TVector<NUMPIXELS + NUMCONTROLPOINTS>& row : augmentedMatrix)
{
// the first non zero value is the non free pixel we need to set.
// all other non zero values are free variables that we previously calculated values for
bool foundPixel = false;
size_t pixelIndex = 0;
for (size_t i = 0; i < NUMPIXELS; ++i)
{
if (!row[i].IsZero())
{
// we are setting the first pixel we find
if (!foundPixel)
{
pixelIndex = i;
foundPixel = true;
}
// subtract out all free variables which is the same as moving them to the right side of the equation
else
{
pixels[pixelIndex] -= pixels[i] * float(row[i].m_numerator) / float(row[i].m_denominator);
}
}
}

// if there is no pixel value to set on the left side of the equation, ignore this row
if (!foundPixel)
continue;

// add in the values from the right side of the equation
for (size_t i = NUMPIXELS; i < NUMPIXELS + NUMCONTROLPOINTS; ++i)
{
if (!row[i].IsZero())
pixels[pixelIndex] += controlPoints[i - NUMPIXELS] * float(row[i].m_numerator) / float(row[i].m_denominator);
}
}
}

size_t TextureCoordinateToPixelIndex2d (size_t width, size_t height, size_t y, size_t x)
{
return y * width + x;
};

void PixelIndexToTextureCoordinate2d (size_t width, size_t height, size_t pixelIndex, size_t& y, size_t& x)
{
x = pixelIndex % width;
y = pixelIndex / width;
}

size_t TextureCoordinateToPixelIndex3d (size_t width, size_t height, size_t depth, size_t z, size_t y, size_t x)
{
return
z * width * height +
y * width +
x;
};

void PixelIndexToTextureCoordinate3d (size_t width, size_t height, size_t depth, size_t pixelIndex, size_t& z, size_t& y, size_t& x)
{
x = pixelIndex % width;

pixelIndex = pixelIndex / width;

y = pixelIndex % height;

pixelIndex = pixelIndex / height;

z = pixelIndex;
}

void PiecewiseCurveTime (float time, size_t numCurves, size_t& outCurveIndex, float& outTime)
{
time *= float(numCurves);
outCurveIndex = size_t(time);

if (outCurveIndex == numCurves)
{
outCurveIndex = numCurves - 1;
outTime = 1.0f;
}
else
{
outTime = std::fmodf(time, 1.0f);
}
}

//===================================================================================================================================
//                                                       2D Textures / Quadratic Curves
//===================================================================================================================================
//
// Find the limitations of this pattern and show equivalence to Bernstein Polynomials (Bezier Curve Equations). Pattern details below.
//
//  --- For first curve, do:
//
//  P00 P01
//  P10 P11
//
//  P00 = C0                        0
//  P01 + P10 = 2 * C1              1 2
//  P11 = C2                        3
//
//  --- For each additional curve, add two points to the end like this:
//
//  P00 P01
//  P10 P11
//  P20 P21
//
//  P00 = C0                        0
//  P01 + P10 = 2 * C1              1 2
//  P11 = C2                        3
//
//  P10 = C3                        1
//  P11 + P20 = 2 * C4              3 4
//  P21 = C5                        5
//
//  and so on...
//  each equation is then multiplied by a value so the right side is identity and left side coefficients add up to 1.
//
//  --- Other details:
//
//  * 3 control points per curve.
//  * image width it 2
//  * image height is 1 + NumCurves.
//  * there are 3 equations per curve, so 3 rows in the augmented matrix per curve.
//  * augmented matrix columns = num pixels (left columns) + num control points (right columns)
//

template <size_t N>
float EvaluateBernsteinPolynomial2DQuadratic (float totalTime, const std::array<float, N>& coefficients)
{
const size_t c_numCurves = N / 3;

float t;
size_t startCurve;
PiecewiseCurveTime(totalTime, c_numCurves, startCurve, t);

size_t offset = startCurve * 3;

float s = 1.0f - t;
return
coefficients[offset + 0] * s * s +
coefficients[offset + 1] * s * t * 2.0f +
coefficients[offset + 2] * t * t;
}

template <size_t N>
float EvaluateLinearInterpolation2DQuadratic (float totalTime, const std::array<float, N>& pixels)
{
const size_t c_numCurves = (N / 2) - 1;

float t;
size_t startRow;
PiecewiseCurveTime(totalTime, c_numCurves, startRow, t);

float row0 = lerp(t, pixels[startRow * 2], pixels[startRow * 2 + 1]);
float row1 = lerp(t, pixels[(startRow + 1) * 2], pixels[(startRow + 1) * 2 + 1]);
return lerp(t, row0, row1);
}

template <size_t NUMCURVES>
{
const size_t c_imageWidth = 2;
const size_t c_imageHeight = NUMCURVES + 1;
const size_t c_numPixels = c_imageWidth * c_imageHeight;
const size_t c_numControlPoints = NUMCURVES * 3;
const size_t c_numEquations = NUMCURVES * 3;

// report values for this test
printf("  %zu curves.  %zu control points.  2x%zu texture = %zu pixels.n", NUMCURVES, c_numControlPoints, c_imageHeight, c_numPixels);
printf("  %f pixels per curve.  %f pixels per control point.n", float(c_numPixels) / float(NUMCURVES), float(c_numPixels) / float(c_numControlPoints));

// lambdas to convert between pixel index and texture coordinates
auto TextureCoordinateToPixelIndex = [&](size_t y, size_t x) -> size_t
{
return TextureCoordinateToPixelIndex2d(c_imageWidth, c_imageHeight, y, x);
};
auto pixelIndexToCoordinates = [&](size_t pixelIndex, char pixelCoords[10])
{
size_t y, x;
PixelIndexToTextureCoordinate2d(c_imageWidth, c_imageHeight, pixelIndex, y, x);
sprintf(pixelCoords, "%zu%zu", y, x);
};

// create the equations
TMatrix<c_numEquations, c_numPixels + c_numControlPoints> augmentedMatrix;
for (size_t i = 0; i < c_numEquations; ++i)
{
TVector<c_numPixels + c_numControlPoints>& row = augmentedMatrix[i];

// left side of the equation goes in this yx coordinate pattern:
//   00
//   01 10
//   11
// But, curve index is added to the y index.
// Also, left side coefficients must add up to 1.
size_t curveIndex = i / 3;
switch (i % 3)
{
case 0:
{
row[TextureCoordinateToPixelIndex(curveIndex + 0, 0)] = CRationalNumber(1, 1);
break;
}
case 1:
{
row[TextureCoordinateToPixelIndex(curveIndex + 0, 1)] = CRationalNumber(1, 2);
row[TextureCoordinateToPixelIndex(curveIndex + 1, 0)] = CRationalNumber(1, 2);
break;
}
case 2:
{
row[TextureCoordinateToPixelIndex(curveIndex + 1, 1)] = CRationalNumber(1, 1);
break;
}
}

// right side of the equation is identity
row[c_numPixels + i] = CRationalNumber(1);
}

// solve the matrix if possible and print out the equations
std::unordered_set<size_t> freeVariables;
if (!SolveMatrixAndPrintEquations(augmentedMatrix, c_numPixels, freeVariables, pixelIndexToCoordinates))
return;

// Next we need to show equality between the N-linear interpolation of our pixels and bernstein polynomials with our control points as coefficients

// Fill in random values for our control points and free variable pixels, and fill in the other pixels as the equations dictate
std::array<float, c_numPixels> pixels = { 0 };
std::array<float, c_numControlPoints> controlPoints = { 0 };
FillInPixelsAndControlPoints<c_numPixels, c_numControlPoints, c_numEquations>(pixels, controlPoints, augmentedMatrix, freeVariables);

// do a number of samples of each method at the same time values, and report the largest difference (error)
float largestDifference = 0.0f;
for (size_t i = 0; i < EQUALITY_TEST_SAMPLES; ++i)
{
float t = float(i) / float(EQUALITY_TEST_SAMPLES - 1);

largestDifference = std::max(largestDifference, std::abs(value1 - value2));
}
printf("  %i Samples, Largest Error = %fnn", EQUALITY_TEST_SAMPLES, largestDifference);
}

{
printf("Testing 2D Textures / Quadratic Curvesnn");

system("pause");
}

//===================================================================================================================================
//                                    2D Textures / Quadratic Curves With C0 Continuity
//===================================================================================================================================
//
// Find the limitations of this pattern and show equivalence to Bernstein Polynomials (Bezier Curve Equations). Pattern details below.
//
//  --- For first curve, do:
//
//  P00 P01
//  P10 P11
//
//  P00 = C0                        0
//  P01 + P10 = 2 * C1              1 2
//  P11 = C2                        3
//
//  --- For second curve, do:
//
//  P00 P01
//  P10 P11
//  P20 P21
//
//  P00 = C0                        0
//  P01 + P10 = 2 * C1              1 2
//  P11 = C2                        3
//
//  P10 + P21 = 2 * C3              2 5
//  P20 = C4                        4
//
//  --- For third curve, do:
//
//  P00 P01
//  P10 P11
//  P20 P21
//  P30 P31
//
//  P00 = C0
//  P01 + P10 = 2 * C1
//  P11 = C2
//
//  P10 + P21 = 2 * C3
//  P20 = C4
//
//  P21 + P30 = 2 * C5
//  P31 = C6
//
//  and so on...
//  each equation is then multiplied by a value so the right side is identity and left side coefficients add up to 1.
//
//  --- Other details:
//
//  * control points: 1 + NumCurves*2.
//  * image width it 2
//  * image height is 1 + NumCurves.
//  * equations: 1 + NumCurves*2.  This many rows in the augmented matrix.
//  * augmented matrix columns = num pixels (left columns) + num control points (right columns)
//

template <size_t N>
float EvaluateBernsteinPolynomial2DQuadraticC0 (float totalTime, const std::array<float, N>& coefficients)
{
const size_t c_numCurves = (N - 1) / 2;

float t;
size_t startCurve;
PiecewiseCurveTime(totalTime, c_numCurves, startCurve, t);

size_t offset = startCurve * 2;

float s = 1.0f - t;
return
coefficients[offset + 0] * s * s +
coefficients[offset + 1] * s * t * 2.0f +
coefficients[offset + 2] * t * t;
}

template <size_t N>
float EvaluateLinearInterpolation2DQuadraticC0 (float totalTime, const std::array<float, N>& pixels)
{
const size_t c_numCurves = (N / 2) - 1;

float t;
size_t startRow;
PiecewiseCurveTime(totalTime, c_numCurves, startRow, t);

// Note we flip x axis direction every odd row to get the zig zag
float horizT = (startRow % 2) == 0 ? t : 1.0f - t;

float row0 = lerp(horizT, pixels[startRow * 2], pixels[startRow * 2 + 1]);
++startRow;
float row1 = lerp(horizT, pixels[startRow * 2], pixels[startRow * 2 + 1]);
return lerp(t, row0, row1);
}

template <size_t NUMCURVES>
{
const size_t c_imageWidth = 2;
const size_t c_imageHeight = NUMCURVES + 1;
const size_t c_numPixels = c_imageWidth * c_imageHeight;
const size_t c_numControlPoints = 1 + NUMCURVES * 2;
const size_t c_numEquations = 1 + NUMCURVES * 2;

// report values for this test
printf("  %zu curves.  %zu control points.  2x%zu texture = %zu pixels.n", NUMCURVES, c_numControlPoints, c_imageHeight, c_numPixels);
printf("  %f pixels per curve.  %f pixels per control point.n", float(c_numPixels) / float(NUMCURVES), float(c_numPixels) / float(c_numControlPoints));

// lambdas to convert between pixel index and texture coordinates
auto TextureCoordinateToPixelIndex = [&] (size_t y, size_t x) -> size_t
{
return TextureCoordinateToPixelIndex2d(c_imageWidth, c_imageHeight, y, x);
};
auto pixelIndexToCoordinates = [&] (size_t pixelIndex, char pixelCoords[10])
{
size_t y, x;
PixelIndexToTextureCoordinate2d(c_imageWidth, c_imageHeight, pixelIndex, y, x);
sprintf(pixelCoords, "%zu%zu", y, x);
};

// create the equations
TMatrix<c_numEquations, c_numPixels + c_numControlPoints> augmentedMatrix;
for (size_t i = 0; i < c_numEquations; ++i)
{
TVector<c_numPixels + c_numControlPoints>& row = augmentedMatrix[i];

// left side of the equation has a pattern like this:
//   00
//   01 10
//
// But, pattern index is added to the y index.
// Also, the x coordinates flip from 0 to 1 on those after each pattern.
// Also, left side coefficients must add up to 1.

size_t patternIndex = i / 2;
size_t xoff = patternIndex % 2 == 1;
size_t xon = patternIndex % 2 == 0;
switch (i % 2)
{
case 0:
{
row[TextureCoordinateToPixelIndex(patternIndex + 0, xoff)] = CRationalNumber(1, 1);
break;
}
case 1:
{
row[TextureCoordinateToPixelIndex(patternIndex + 0, xon)] = CRationalNumber(1, 2);
row[TextureCoordinateToPixelIndex(patternIndex + 1, xoff)] = CRationalNumber(1, 2);
break;
}
}

// right side of the equation is identity
row[c_numPixels + i] = CRationalNumber(1);
}

// solve the matrix if possible and print out the equations
std::unordered_set<size_t> freeVariables;
if (!SolveMatrixAndPrintEquations(augmentedMatrix, c_numPixels, freeVariables, pixelIndexToCoordinates))
return;

// Next we need to show equality between the N-linear interpolation of our pixels and bernstein polynomials with our control points as coefficients

// Fill in random values for our control points and free variable pixels, and fill in the other pixels as the equations dictate
std::array<float, c_numPixels> pixels = { 0 };
std::array<float, c_numControlPoints> controlPoints = { 0 };
FillInPixelsAndControlPoints<c_numPixels, c_numControlPoints, c_numEquations>(pixels, controlPoints, augmentedMatrix, freeVariables);

// do a number of samples of each method at the same time values, and report the largest difference (error)
float largestDifference = 0.0f;
for (size_t i = 0; i < EQUALITY_TEST_SAMPLES; ++i)
{
float t = float(i) / float(EQUALITY_TEST_SAMPLES - 1);

largestDifference = std::max(largestDifference, std::abs(value1 - value2));
}
printf("  %i Samples, Largest Error = %fnn", EQUALITY_TEST_SAMPLES, largestDifference);
}

{
printf("nTesting 2D Textures / Quadratic Curves with C0 continuitynn");

system("pause");
}

//===================================================================================================================================
//                                             3D Textures / Cubic Curves
//===================================================================================================================================
//
// Find the limitations of this pattern and show equivalence to Bernstein Polynomials (Bezier Curve Equations). Pattern details below.
//
//  --- For first curve, do:
//
//  P000 P001    P100 P101
//  P010 P011    P110 P111
//
//  P000 = C0                       0
//  P001 + P010 + P100 = 3 * C1     1 2 4
//  P011 + P101 + P110 = 3 * C2     3 5 6
//  P111 = C3                       7
//
//  --- For second curve, do:
//
//  P000 P001    P100 P101
//  P010 P011    P110 P111
//  P020 P021    P120 P121
//
//  P000 = C0                       0
//  P001 + P010 + P100 = 3 * C1     1 2 4
//  P011 + P101 + P110 = 3 * C2     3 7 8
//  P111 = C3                       9
//
//  P010 = C4                       2
//  P011 + P020 + P110 = 3 * C5     3 4 8
//  P021 + P111 + P120 = 3 * C6     5 9 10
//  P121 = C7                       11
//
//  and so on...
//  each equation is then multiplied by a value so the right side is identity and left side coefficients add up to 1.
//
//  --- Other details:
//
//  * control points: 4 * NumCurves.
//  * image width it 2
//  * image depth is 2
//  * image height is 1 + NumCurves.
//  * equations: 4 * NumCurves.  This many rows in the augmented matrix.
//  * augmented matrix columns = num pixels (left columns) + num control points (right columns)
//

template <size_t N>
float EvaluateBernsteinPolynomial3DCubic (float totalTime, const std::array<float, N>& coefficients)
{
const size_t c_numCurves = N / 4;

float t;
size_t startCurve;
PiecewiseCurveTime(totalTime, c_numCurves, startCurve, t);

size_t offset = startCurve * 4;

float s = 1.0f - t;
return
coefficients[offset + 0] * s * s * s +
coefficients[offset + 1] * s * s * t * 3.0f +
coefficients[offset + 2] * s * t * t * 3.0f +
coefficients[offset + 3] * t * t * t;
}

template <size_t N, typename LAMBDA>
float EvaluateLinearInterpolation3DCubic (float totalTime, const std::array<float, N>& pixels, LAMBDA& TextureCoordinateToPixelIndex)
{
const size_t c_numCurves = (N / 4) - 1;

float t;
size_t startRow;
PiecewiseCurveTime(totalTime, c_numCurves, startRow, t);

//    rowZYX
float row00x = lerp(t, pixels[TextureCoordinateToPixelIndex(0, startRow + 0, 0)], pixels[TextureCoordinateToPixelIndex(0, startRow + 0, 1)]);
float row01x = lerp(t, pixels[TextureCoordinateToPixelIndex(0, startRow + 1, 0)], pixels[TextureCoordinateToPixelIndex(0, startRow + 1, 1)]);
float row0yx = lerp(t, row00x, row01x);

float row10x = lerp(t, pixels[TextureCoordinateToPixelIndex(1, startRow + 0, 0)], pixels[TextureCoordinateToPixelIndex(1, startRow + 0, 1)]);
float row11x = lerp(t, pixels[TextureCoordinateToPixelIndex(1, startRow + 1, 0)], pixels[TextureCoordinateToPixelIndex(1, startRow + 1, 1)]);
float row1yx = lerp(t, row10x, row11x);

return lerp(t, row0yx, row1yx);
}

template <size_t NUMCURVES>
void Test3DCubic ()
{
const size_t c_imageWidth = 2;
const size_t c_imageHeight = NUMCURVES + 1;
const size_t c_imageDepth = 2;
const size_t c_numPixels = c_imageWidth * c_imageHeight * c_imageDepth;
const size_t c_numControlPoints = NUMCURVES * 4;
const size_t c_numEquations = NUMCURVES * 4;

// report values for this test
printf("  %zu curves.  %zu control points.  2x%zux2 texture = %zu pixels.n", NUMCURVES, c_numControlPoints, c_imageHeight, c_numPixels);
printf("  %f pixels per curve.  %f pixels per control point.n", float(c_numPixels) / float(NUMCURVES), float(c_numPixels) / float(c_numControlPoints));

// lambdas to convert between pixel index and texture coordinates
auto TextureCoordinateToPixelIndex = [&] (size_t z, size_t y, size_t x) -> size_t
{
return TextureCoordinateToPixelIndex3d(c_imageWidth, c_imageHeight, c_imageDepth, z, y, x);
};
auto pixelIndexToCoordinates = [&] (size_t pixelIndex, char pixelCoords[10])
{
size_t z, y, x;
PixelIndexToTextureCoordinate3d(c_imageWidth, c_imageHeight, c_imageDepth, pixelIndex, z, y, x);
sprintf(pixelCoords, "%zu%zu%zu", z,y,x);
};

// create the equations
TMatrix<c_numEquations, c_numPixels + c_numControlPoints> augmentedMatrix;
for (size_t i = 0; i < c_numEquations; ++i)
{
TVector<c_numPixels + c_numControlPoints>& row = augmentedMatrix[i];

// left side of the equation goes in this zyx coordinate pattern:
//   000
//   001 010 100
//   011 101 110
//   111
// But, curve index is added to the y index.
// Also, left side coefficients must add up to 1.
size_t curveIndex = i / 4;
switch (i % 4)
{
case 0:
{
row[TextureCoordinateToPixelIndex(0, curveIndex + 0, 0)] = CRationalNumber(1, 1);
break;
}
case 1:
{
row[TextureCoordinateToPixelIndex(0, curveIndex + 0, 1)] = CRationalNumber(1, 3);
row[TextureCoordinateToPixelIndex(0, curveIndex + 1, 0)] = CRationalNumber(1, 3);
row[TextureCoordinateToPixelIndex(1, curveIndex + 0, 0)] = CRationalNumber(1, 3);
break;
}
case 2:
{
row[TextureCoordinateToPixelIndex(0, curveIndex + 1, 1)] = CRationalNumber(1, 3);
row[TextureCoordinateToPixelIndex(1, curveIndex + 0, 1)] = CRationalNumber(1, 3);
row[TextureCoordinateToPixelIndex(1, curveIndex + 1, 0)] = CRationalNumber(1, 3);
break;
}
case 3:
{
row[TextureCoordinateToPixelIndex(1, curveIndex + 1, 1)] = CRationalNumber(1, 1);
break;
}
}

// right side of the equation is identity
row[c_numPixels + i] = CRationalNumber(1);
}

// solve the matrix if possible and print out the equations
std::unordered_set<size_t> freeVariables;
if (!SolveMatrixAndPrintEquations(augmentedMatrix, c_numPixels, freeVariables, pixelIndexToCoordinates))
return;

// Next we need to show equality between the N-linear interpolation of our pixels and bernstein polynomials with our control points as coefficients

// Fill in random values for our control points and free variable pixels, and fill in the other pixels as the equations dictate
std::array<float, c_numPixels> pixels = { 0 };
std::array<float, c_numControlPoints> controlPoints = { 0 };
FillInPixelsAndControlPoints<c_numPixels, c_numControlPoints, c_numEquations>(pixels, controlPoints, augmentedMatrix, freeVariables);

// do a number of samples of each method at the same time values, and report the largest difference (error)
float largestDifference = 0.0f;
for (size_t i = 0; i < EQUALITY_TEST_SAMPLES; ++i)
{
float t = float(i) / float(EQUALITY_TEST_SAMPLES - 1);

float value1 = EvaluateBernsteinPolynomial3DCubic(t, controlPoints);
float value2 = EvaluateLinearInterpolation3DCubic(t, pixels, TextureCoordinateToPixelIndex);

largestDifference = std::max(largestDifference, std::abs(value1 - value2));
}
printf("  %i Samples, Largest Error = %fnn", EQUALITY_TEST_SAMPLES, largestDifference);
}

void Test3DCubics ()
{
printf("nTesting 3D Textures / Cubic Curvesnn");

Test3DCubic<1>();
Test3DCubic<2>();
Test3DCubic<3>();
Test3DCubic<4>();

system("pause");
}

//===================================================================================================================================
//                                         3D Textures / Cubic Curves Multiple Curves
//===================================================================================================================================
//
// Find the limitations of this pattern and show equivalence to Bernstein Polynomials (Bezier Curve Equations). Pattern details below.
//
// This is the same as 3D Textures / Cubic Curves, but there is a second curve stored by flipping x coordinates.
//
//  --- Other details:
//
//  * control points: 4 * NumCurves.
//  * image width it 2
//  * image depth is 2
//  * image height is 1 + (NumCurves/2).
//  * equations: 4 * NumCurves.  This many rows in the augmented matrix.
//  * augmented matrix columns = num pixels (left columns) + num control points (right columns)
//

template <size_t HALFNUMCURVES>
void Test3DCubicMulti ()
{
const size_t NUMCURVES = HALFNUMCURVES * 2;
const size_t c_imageWidth = 2;
const size_t c_imageHeight = HALFNUMCURVES + 1;
const size_t c_imageDepth = 2;
const size_t c_numPixels = c_imageWidth * c_imageHeight * c_imageDepth;
const size_t c_numControlPoints = NUMCURVES * 4;
const size_t c_numEquations = NUMCURVES * 4;

// report values for this test
printf("  %zu curves.  %zu control points.  2x%zux2 texture = %zu pixels.n", NUMCURVES, c_numControlPoints, c_imageHeight, c_numPixels);
printf("  %f pixels per curve.  %f pixels per control point.n", float(c_numPixels) / float(NUMCURVES), float(c_numPixels) / float(c_numControlPoints));

// lambdas to convert between pixel index and texture coordinates
auto TextureCoordinateToPixelIndex = [&] (size_t z, size_t y, size_t x) -> size_t
{
return TextureCoordinateToPixelIndex3d(c_imageWidth, c_imageHeight, c_imageDepth, z, y, x);
};
auto pixelIndexToCoordinates = [&] (size_t pixelIndex, char pixelCoords[10])
{
size_t z, y, x;
PixelIndexToTextureCoordinate3d(c_imageWidth, c_imageHeight, c_imageDepth, pixelIndex, z, y, x);
sprintf(pixelCoords, "%zu%zu%zu", z,y,x);
};

// create the first set of equations
TMatrix<c_numEquations, c_numPixels + c_numControlPoints> augmentedMatrix;
for (size_t i = 0; i < c_numEquations / 2; ++i)
{
TVector<c_numPixels + c_numControlPoints>& row = augmentedMatrix[i];

// left side of the equation goes in this zyx coordinate pattern:
//   000
//   001 010 100
//   011 101 110
//   111
// But, curve index is added to the y index.
// Also, left side coefficients must add up to 1.
size_t curveIndex = i / 4;
switch (i % 4)
{
case 0:
{
row[TextureCoordinateToPixelIndex(0, curveIndex + 0, 0)] = CRationalNumber(1, 1);
break;
}
case 1:
{
row[TextureCoordinateToPixelIndex(0, curveIndex + 0, 1)] = CRationalNumber(1, 3);
row[TextureCoordinateToPixelIndex(0, curveIndex + 1, 0)] = CRationalNumber(1, 3);
row[TextureCoordinateToPixelIndex(1, curveIndex + 0, 0)] = CRationalNumber(1, 3);
break;
}
case 2:
{
row[TextureCoordinateToPixelIndex(0, curveIndex + 1, 1)] = CRationalNumber(1, 3);
row[TextureCoordinateToPixelIndex(1, curveIndex + 0, 1)] = CRationalNumber(1, 3);
row[TextureCoordinateToPixelIndex(1, curveIndex + 1, 0)] = CRationalNumber(1, 3);
break;
}
case 3:
{
row[TextureCoordinateToPixelIndex(1, curveIndex + 1, 1)] = CRationalNumber(1, 1);
break;
}
}

// right side of the equation is identity
row[c_numPixels + i] = CRationalNumber(1);
}

// create the second set of equations
for (size_t i = 0; i < c_numEquations / 2; ++i)
{
TVector<c_numPixels + c_numControlPoints>& row = augmentedMatrix[i + c_numEquations / 2];

// left side of the equation goes in this zyx coordinate pattern, which is the same as above but x axis flipped.
//   001
//   000 011 101
//   010 100 111
//   110
// But, curve index is added to the y index.
// Also, left side coefficients must add up to 1.
size_t curveIndex = i / 4;
switch (i % 4)
{
case 0:
{
row[TextureCoordinateToPixelIndex(0, curveIndex + 0, 1)] = CRationalNumber(1, 1);
break;
}
case 1:
{
row[TextureCoordinateToPixelIndex(0, curveIndex + 0, 0)] = CRationalNumber(1, 3);
row[TextureCoordinateToPixelIndex(0, curveIndex + 1, 1)] = CRationalNumber(1, 3);
row[TextureCoordinateToPixelIndex(1, curveIndex + 0, 1)] = CRationalNumber(1, 3);
break;
}
case 2:
{
row[TextureCoordinateToPixelIndex(0, curveIndex + 1, 0)] = CRationalNumber(1, 3);
row[TextureCoordinateToPixelIndex(1, curveIndex + 0, 0)] = CRationalNumber(1, 3);
row[TextureCoordinateToPixelIndex(1, curveIndex + 1, 1)] = CRationalNumber(1, 3);
break;
}
case 3:
{
row[TextureCoordinateToPixelIndex(1, curveIndex + 1, 0)] = CRationalNumber(1, 1);
break;
}
}

// right side of the equation is identity
row[c_numPixels + i + c_numEquations / 2] = CRationalNumber(1);
}

// solve the matrix if possible and print out the equations
std::unordered_set<size_t> freeVariables;
SolveMatrixAndPrintEquations(augmentedMatrix, c_numPixels, freeVariables, pixelIndexToCoordinates);
}

void Test3DCubicsMulti ()
{
printf("nTesting 3D Textures / Cubic Curves with Multiple Curvesnn");

Test3DCubicMulti<1>();

system("pause");
}

//===================================================================================================================================
//                                       3D Textures / Cubic Curves With C0 Continuity
//===================================================================================================================================
//
// Find the limitations of this pattern and show equivalence to Bernstein Polynomials (Bezier Curve Equations). Pattern details below.
//
//  --- For first curve, do:
//
//  P000 P001    P100 P101
//  P010 P011    P110 P111
//
//  P000 = C0
//  P001 + P010 + P100 = 3 * C1
//  P011 + P101 + P110 = 3 * C2
//  P111 = C3
//
//  --- For second curve, do:
//
//  P000 P001    P100 P101
//  P010 P011    P110 P111
//  P020 P021    P120 P121
//
//  P000 = C0
//  P001 + P010 + P100 = 3 * C1
//  P011 + P101 + P110 = 3 * C2
//  P111 = C3
//
//  P011 + P110 + P121 = 3 * C4
//  P010 + P021 + P110 = 3 * C5
//  P020 = C6
//
//  --- For third curve, do:
//
//  P000 P001    P100 P101
//  P010 P011    P110 P111
//  P020 P021    P120 P121
//  P030 P031    P130 P131
//
//  P000 = C0
//  P001 + P010 + P100 = 3 * C1
//  P011 + P101 + P110 = 3 * C2
//  P111 = C3
//
//  P011 + P110 + P121 = 3 * C4
//  P010 + P021 + P110 = 3 * C5
//  P020 = C6
//
//  P021 + P030 + P120 = 3 * C7
//  P031 + P121 + P130 = 3 * C8
//  P131 = C9
//
//  and so on...
//  each equation is then multiplied by a value so the right side is identity and left side coefficients add up to 1.
//
//  --- Other details:
//
//  * control points: 1 + 3 * NumCurves.
//  * image width it 2
//  * image depth is 2
//  * image height is 1 + NumCurves.
//  * equations: 1 + 3 * NumCurves.  This many rows in the augmented matrix.
//  * augmented matrix columns = num pixels (left columns) + num control points (right columns)
//

template <size_t N>
float EvaluateBernsteinPolynomial3DCubicC0 (float totalTime, const std::array<float, N>& coefficients)
{
const size_t c_numCurves = (N-1) / 3;

float t;
size_t startCurve;
PiecewiseCurveTime(totalTime, c_numCurves, startCurve, t);

size_t offset = startCurve * 3;

float s = 1.0f - t;
return
coefficients[offset + 0] * s * s * s +
coefficients[offset + 1] * s * s * t * 3.0f +
coefficients[offset + 2] * s * t * t * 3.0f +
coefficients[offset + 3] * t * t * t;
}

template <size_t N, typename LAMBDA>
float EvaluateLinearInterpolation3DCubicC0 (float totalTime, const std::array<float, N>& pixels, LAMBDA& TextureCoordinateToPixelIndex)
{
const size_t c_numCurves = (N / 4) - 1;

float t;
size_t startRow;
PiecewiseCurveTime(totalTime, c_numCurves, startRow, t);

// Note we flip x and z axis direction every odd row to get the zig zag

//    rowZYX
float xzT = (startRow % 2) == 0 ? t : 1.0f - t;
float row00x = lerp(xzT, pixels[TextureCoordinateToPixelIndex(0, startRow + 0, 0)], pixels[TextureCoordinateToPixelIndex(0, startRow + 0, 1)]);
float row01x = lerp(xzT, pixels[TextureCoordinateToPixelIndex(0, startRow + 1, 0)], pixels[TextureCoordinateToPixelIndex(0, startRow + 1, 1)]);
float row0yx = lerp(t, row00x, row01x);

float row10x = lerp(xzT, pixels[TextureCoordinateToPixelIndex(1, startRow + 0, 0)], pixels[TextureCoordinateToPixelIndex(1, startRow + 0, 1)]);
float row11x = lerp(xzT, pixels[TextureCoordinateToPixelIndex(1, startRow + 1, 0)], pixels[TextureCoordinateToPixelIndex(1, startRow + 1, 1)]);
float row1yx = lerp(t, row10x, row11x);

return lerp(xzT, row0yx, row1yx);
}

template <size_t NUMCURVES>
void Test3DCubicC0 ()
{

const size_t c_imageWidth = 2;
const size_t c_imageHeight = NUMCURVES + 1;
const size_t c_imageDepth = 2;
const size_t c_numPixels = c_imageWidth * c_imageHeight * c_imageDepth;
const size_t c_numControlPoints = 1 + NUMCURVES * 3;
const size_t c_numEquations = 1 + NUMCURVES * 3;

// report values for this test
printf("  %zu curves.  %zu control points.  2x%zux2 texture = %zu pixels.n", NUMCURVES, c_numControlPoints, c_imageHeight, c_numPixels);
printf("  %f pixels per curve.  %f pixels per control point.n", float(c_numPixels) / float(NUMCURVES), float(c_numPixels) / float(c_numControlPoints));

// lambdas to convert between pixel index and texture coordinates
auto TextureCoordinateToPixelIndex = [&] (size_t z, size_t y, size_t x) -> size_t
{
return TextureCoordinateToPixelIndex3d(c_imageWidth, c_imageHeight, c_imageDepth, z, y, x);
};
auto pixelIndexToCoordinates = [&] (size_t pixelIndex, char pixelCoords[10])
{
size_t z, y, x;
PixelIndexToTextureCoordinate3d(c_imageWidth, c_imageHeight, c_imageDepth, pixelIndex, z, y, x);
sprintf(pixelCoords, "%zu%zu%zu", z,y,x);
};

// create the equations
TMatrix<c_numEquations, c_numPixels + c_numControlPoints> augmentedMatrix;
for (size_t i = 0; i < c_numEquations; ++i)
{
TVector<c_numPixels + c_numControlPoints>& row = augmentedMatrix[i];

// left side of the equation has a pattern like this:
//   000
//   001 010 100
//   011 101 110
//
// But, pattern index is added to the y index.
// Also, the x and z coordinates flip from 0 to 1 on those after each pattern.
// Also, left side coefficients must add up to 1.
size_t patternIndex = i / 3;
size_t xz0 = patternIndex % 2 == 1;
size_t xz1 = patternIndex % 2 == 0;
switch (i % 3)
{
case 0:
{
row[TextureCoordinateToPixelIndex(xz0, patternIndex + 0, xz0)] = CRationalNumber(1, 1);
break;
}
case 1:
{
row[TextureCoordinateToPixelIndex(xz0, patternIndex + 0, xz1)] = CRationalNumber(1, 3);
row[TextureCoordinateToPixelIndex(xz0, patternIndex + 1, xz0)] = CRationalNumber(1, 3);
row[TextureCoordinateToPixelIndex(xz1, patternIndex + 0, xz0)] = CRationalNumber(1, 3);
break;
}
case 2:
{
row[TextureCoordinateToPixelIndex(xz0, patternIndex + 1, xz1)] = CRationalNumber(1, 3);
row[TextureCoordinateToPixelIndex(xz1, patternIndex + 0, xz1)] = CRationalNumber(1, 3);
row[TextureCoordinateToPixelIndex(xz1, patternIndex + 1, xz0)] = CRationalNumber(1, 3);
break;
}
}

// right side of the equation is identity
row[c_numPixels + i] = CRationalNumber(1);
}

// solve the matrix if possible and print out the equations
std::unordered_set<size_t> freeVariables;
if (!SolveMatrixAndPrintEquations(augmentedMatrix, c_numPixels, freeVariables, pixelIndexToCoordinates))
return;

// Next we need to show equality between the N-linear interpolation of our pixels and bernstein polynomials with our control points as coefficients

// Fill in random values for our control points and free variable pixels, and fill in the other pixels as the equations dictate
std::array<float, c_numPixels> pixels = { 0 };
std::array<float, c_numControlPoints> controlPoints = { 0 };
FillInPixelsAndControlPoints<c_numPixels, c_numControlPoints, c_numEquations>(pixels, controlPoints, augmentedMatrix, freeVariables);

// do a number of samples of each method at the same time values, and report the largest difference (error)
float largestDifference = 0.0f;
for (size_t i = 0; i < EQUALITY_TEST_SAMPLES; ++i)
{
float t = float(i) / float(EQUALITY_TEST_SAMPLES - 1);

float value1 = EvaluateBernsteinPolynomial3DCubicC0(t, controlPoints);
float value2 = EvaluateLinearInterpolation3DCubicC0(t, pixels, TextureCoordinateToPixelIndex);

largestDifference = std::max(largestDifference, std::abs(value1 - value2));
}
printf("  %i Samples, Largest Error = %fnn", EQUALITY_TEST_SAMPLES, largestDifference);
}

void Test3DCubicsC0 ()
{

printf("nTesting 3D Textures / Cubic Curves with C0 continuitynn");

Test3DCubicC0<1>();
Test3DCubicC0<2>();
Test3DCubicC0<3>();
Test3DCubicC0<4>();
Test3DCubicC0<5>();
Test3DCubicC0<6>();

system("pause");
}

//===================================================================================================================================
//                                                                 main
//===================================================================================================================================

int main (int agrc, char **argv)
{
Test3DCubics();
Test3DCubicsMulti();
Test3DCubicsC0();

return 0;
}


# Orthogonal Projection Matrix Plainly Explained

“Scratch a Pixel” has a really nice explanation of perspective and orthogonal projection matrices.

It inspired me to make a very simple / plain explanation of orthogonal projection matrices that hopefully will help them be less opaque for folks and more intuitive.

Original article: Scratch A Pixel: The Perspective and Orthographic Projection Matrix

# Let’s Get To It!

The whole purpose of an orthogonal matrix is to take x,y and z as input and output x,y and z such that valid points on the screen will have x,y,z values between -1 and 1.

If we transform a point and get an x,y or z that is outside of that range, we know the point is outside of the screen either because it’s too far left, right, up or down, or because it’s too close or too far on the z axis.

Let’s think about how we’d do this, thinking only about the x coordinate for now.

To map some range of x values from -1 to 1, we’ll need to decide on what x value maps to -1 and what x value maps to 1. We’ll call these “left” and “right”.

Given a left and right value, and an x value we want to map to the range, perhaps the most straight forward way to do it would be this:

$XOut = \frac{X-Left}{Right-Left} * 2 - 1$

The division calculates the percentage of how far X is between left and right. Multiplying that by 2 and subtracting 1 changes it so instead of valid points being from 0 to 1 (aka from 0% to 100%), they are instead between -1 and 1.

Let’s change this formula so that there is one term that is multiplied by X and another term that has everything else. (Wondering why? It’s because I’m cheating and know the final form. Don’t feel bad if it isn’t intuitive why we’d do this!)

$\frac{X-Left}{Right-Left} * 2 - 1 =\\ \\ \frac{2(X-Left)}{Right-Left} - 1 =\\ \\ \frac{2X-2*Left}{Right-Left} - 1 =\\ \\ \frac{2X-2*Left}{Right-Left} - \frac{Right-Left}{Right-Left} =\\ \\ \frac{2X-2*Left-(Right-Left)}{Right-Left} =\\ \\ \frac{2X-2*Left-Right+Left}{Right-Left} =\\ \\ \frac{2X-Left-Right}{Right-Left} =\\ \\ \frac{2X}{Right-Left} - \frac{Right+Left}{Right-Left} =\\ \\ \frac{2}{Right-Left}X - \frac{Right+Left}{Right-Left}\\$

Setting up the formula this way allows us to transform the x component of an (x,y,z,1) point using a dot product:

$(x,y,z,1) \cdot (\frac{2}{Right-Left},0,0,-\frac{Right+Left}{Right-Left}) = \frac{2}{Right-Left}X - \frac{Right+Left}{Right-Left}$

A dot product is what happens during matrix multiplication, so if we put this into a 4×4 matrix, we get the same result. Let’s check that out.

We start with an identity matrix. If we use it to transform an (x,y,z,1) point, we get the same point as output aka nothing happens.

$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}$

Now let’s put the x transform we came up with into the matrix:

$\begin{bmatrix} \frac{2}{Right-Left} & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -\frac{Right+Left}{Right-Left} & 0 & 0 & 1 \\ \end{bmatrix}$

If we use that matrix to transform an (x,y,z,1) point, it will transform our x component as we described (valid ranges of x that are between left and right will be between -1 and 1), while leaving the other components of the point alone.

As you might imagine, it’s pretty simple to get our formulas for y and z as well. Starting with the x formula, we can just change x with y and z, and right/left with top/bottom and far/near.

$XOut = \frac{2}{Right-Left}X - \frac{Right+Left}{Right-Left} \\ \\ YOut = \frac{2}{Top-Bottom}Y - \frac{Top+Bottom}{Top-Bottom} \\ \\ ZOut = \frac{2}{Far-Near}Z - \frac{Far+Near}{Far-Near}$

We can put those into our matrix to get a full orthographic projection matrix.

$\begin{bmatrix} \frac{2}{Right-Left} & 0 & 0 & 0 \\ 0 & \frac{2}{Top-Bottom} & 0 & 0 \\ 0 & 0 & \frac{2}{Far-Near} & 0 \\ -\frac{Right+Left}{Right-Left} & - \frac{Top+Bottom}{Top-Bottom} & - \frac{Far+Near}{Far-Near} & 1 \\ \end{bmatrix}$

There we go, that’s all there is to making an orthographic projection matrix. It’s whole purpose is to convert x,y,z values to be between -1 and 1 so that the GPU knows whether points are inside our outside the screen – and thus whether they need to be clipped or not.

## Variations

While the projection matrix we made is a valid orthographic projection matrix in OpenGL, we actually need a tweak for it to be valid for DirectX. The reason for this is because while in OpenGL the clip space for z is between -1 and 1, it’s actually between 0 and 1 for DirectX!

If you leave off the *2-1 for the z formula, but leave it for x and y, you’ll end up with a matrix like this one:

$\begin{bmatrix} \frac{2}{Right-Left} & 0 & 0 & 0 \\ 0 & \frac{2}{Top-Bottom} & 0 & 0 \\ 0 & 0 & \frac{1}{Near-Far} & 0 \\ -\frac{Right+Left}{Right-Left} & - \frac{Top+Bottom}{Top-Bottom} & - \frac{Near}{Near-Far} & 1 \\ \end{bmatrix}$

Another variation you’ll see is a version where the camera is centered on the origin for the x and y axis. In other words, left = -right, and top = -bottom. When that is true, right+left and top+bottom become zero which simplifies the matrix to this:

$\begin{bmatrix} \frac{2}{Width} & 0 & 0 & 0 \\ 0 & \frac{2}{Height} & 0 & 0 \\ 0 & 0 & \frac{2}{Far-Near} & 0 \\ 0 & 0 & -\frac{Far+Near}{Far-Near} & 1 \\ \end{bmatrix}$

Another variation you’ll see is that the matrix is transposed. You’ll see this when switching between pre and post multiplication, or when switching from column major matrices to row matrices. Either is valid and it’s basically just a notation and convention thing. Here is the origional matrix we made transposed.

$\begin{bmatrix} \frac{2}{Right-Left} & 0 & 0 & -\frac{Right+Left}{Right-Left} \\ 0 & \frac{2}{Top-Bottom} & 0 & -\frac{Top+Bottom}{Top-Bottom} \\ 0 & 0 & \frac{2}{Far-Near} & -\frac{Far+Near}{Far-Near} \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}$

Lastly, the above matrices were all for a “left handed” system. That means that it assumes the positive x axis goes to the right, the positive y axis goes up, and the positive z goes into your screen (aka, the camera is looking down the positive z axis). Positive Z values will map to the valid -1 to 1 range, while negative z values will be outside the valid range.

A variation on the orthographic projection matrix we made that you’ll see is the matrix being a “right handed” matrix which is the same as the left handed matrix, except that the positive z axis goes out from your screen (aka the camera is looking down the negative z axis). Negative Z values will map to the valid -1 to 1 range, while positive z values will be outside the valid range.

To switch the handedness of the matrix, you just flip the sign of the element at (3,3), so here is our original orthographic projection matrix, but converted to right handed instead of left handed.

$\begin{bmatrix} \frac{2}{Right-Left} & 0 & 0 & 0 \\ 0 & \frac{2}{Top-Bottom} & 0 & 0 \\ 0 & 0 & -\frac{2}{Far-Near} & 0 \\ -\frac{Right+Left}{Right-Left} & - \frac{Top+Bottom}{Top-Bottom} & - \frac{Far+Near}{Far-Near} & 1 \\ \end{bmatrix}$

You may also just see the denominator changed from $Far-Near$ to $Near-Far$ which has the same effect, and would give you something like this:

$\begin{bmatrix} \frac{2}{Right-Left} & 0 & 0 & 0 \\ 0 & \frac{2}{Top-Bottom} & 0 & 0 \\ 0 & 0 & \frac{2}{Near-Far} & 0 \\ -\frac{Right+Left}{Right-Left} & - \frac{Top+Bottom}{Top-Bottom} & - \frac{Far+Near}{Far-Near} & 1 \\ \end{bmatrix}$

Fun trivia: the term “sinister” comes from latin, meaning “left handed”. So, when talking to someone about their graphics engine, you can ask them whether or not they use sinister projection 😛

Scratch A Pixel: The Perspective and Orthographic Projection Matrix

D3DXMatrixOrthoRH (DirectX) – shows the resulting matrix. Also links to left handed and off center variants.

glOrtho (OpenGL) – shows resulting matrix.

# Multivariable Dual Numbers & Automatic Differentiation

In a previous post I showed how to use dual numbers to be able to get both the value and derivative of a function at the same time:
Dual Numbers & Automatic Differentiation

That post mentions that you can extend it to multivariable functions but doesn’t explain how. This post is that explanation, including simple working C++ code!

Extending this to multivariable functions is useful for ray marching, calculating analytical surface normals and also likely useful for training a neural network if you want an alternative to back propagation. I’m not sure about the efficiency comparison of this versus back propagation but I intend on looking into it (:

# How Does it Work?

It turns out to be really simple to use dual numbers with multivariable functions. The end result is that you want a partial derivative for each variable in the equation, so to do that, you just have a dual number per variable, and process the entire equation for each of those dual numbers separately.

We’ll work through an example. Let’s find the partial derivatives of x and y of the function $3x^2-2y^3$, at input (5,2).

We’ll start by finding the derivative of x, and then the derivative of y.

# Example: df/dx

We start by making a dual number for our x value, remembering that the real part is the actual value for x, and the dual part is the derivative of x, which is 1:

$5+1\epsilon$

or:

$5+\epsilon$

We multiply that value by itself to get the $x^2$ value, keeping in mind that $\epsilon^2$ is zero:
$(5+\epsilon)*(5+\epsilon)= \\ 25+10\epsilon+\epsilon^2= \\ 25+10\epsilon \\$

Next we need to multiply that by 3 to get the $3x^2$ term:

$3*(25+10\epsilon) = 75+30\epsilon$

Putting that aside for a moment, we need to make the $2y^3$ term. We start by making our y value:

$2+0\epsilon$

or:

$2$

If you are wondering why it has a zero for the epsilon term, it’s because when we are calculating the partial derivative of x, y is a constant, so has a derivative of zero.

Next, we multiply this y value by itself twice to get the $y^3$ value:

$2*2*2=8$

We then multiply it by 2 to get the $2y^3$ term:

$8*2=16$

Now that we have our two terms, we subtract the y term from the x term to get our final result:

$75+30\epsilon-16 = \\ 59+30\epsilon$

This result says that $3x^2-2y^3$ has a value of 59 at location (5,2), and that the derivative of x at that point is 30.

That checks out, let’s move on to the derivative of y!

# Example: df/dy

Calculating the derivative of y is very similar to calculating the derivative of x, except that it’s the x term that has an epsilon value (derivative) of 0, instead of the y term. The y term has the epsilon value of 1 this time as well. We’ll work through it to see how it plays out.

First up, we need to make the value for x:

$5+0\epsilon$

or:

$5$

Next we square it and multiply it by 3 to get the $3x^2$ term:

$5*5*3=75$

Next we need to make the value for y, remembering that we use an epsilon value of 1 since the derivative of y is 1 this time around.

$2+\epsilon$

We cube that value and multiply by 2 to get the $2y^3$ term:
$2*(2+\epsilon)*(2+\epsilon)*(2+\epsilon)= \\ 2*(2+\epsilon)*(4+4\epsilon+\epsilon^2)= \\ 2*(2+\epsilon)*(4+4\epsilon)= \\ 2*(8+12\epsilon+4\epsilon^2)= \\ 2*(8+12\epsilon)= \\ 16+24\epsilon$

Now we subtract the y term from the x term to get the final result:

$75 - (16+24\epsilon)= \\ 59-24\epsilon$

This result says that $3x^2-2y^3$ has a value of 59 at location (5,2), and that the derivative of y at that point is -24.

That also checks out, so we got the correct value and partial derivatives for the equation.

# Reducing Redundancy

There was quite a bit of redundancy when working through the x and y derivatives wasn’t there? Increasing the number of variables will increase the amount of redundancy too, so it doesn’t scale up well.

Luckily there is a way to address this. Basically, instead of making two dual numbers which have two items, you make them share the real value (since it’s the same for both, as is the work to make it) and append the dual values for x and y to it.

$x'=(a+b\epsilon) \\ y'=(a+b\epsilon)$

You have:

$(a+b\epsilon_x+c\epsilon_y)$

Then, in your math or in your program, you treat it as if it’s two different dual numbers packed into one. This lets you do the work for the real number once instead of twice, but still lets you do your dual number work for each variable independently.

While it’s probably easiest to think of these as two dual numbers packed into one value, there is actually a mathematical basis for it as well, which may or may not surprise you.

Check out what happens when we multiply two of these together, keeping in mind that multiplying ANY two epsilon values together becomes zero, even if they are different epsilons:

$(a+b\epsilon_x+c\epsilon_y) * (d+e\epsilon_x+f\epsilon_y)= \\ ad + ae\epsilon_x + af\epsilon_y + bd\epsilon_x + be\epsilon_x^2 + bf\epsilon_x\epsilon_y + cd\epsilon_y + ce\epsilon_x\epsilon_y + cf\epsilon_y^2= \\ ad + ae\epsilon_x + af\epsilon_y + bd\epsilon_x + cd\epsilon_y= \\ ad + (ae+bd)\epsilon_x + (af+cd)\epsilon_y$

The interesting thing is that the above result gives you the same values as if you did the same work for two dual numbers individually.

Let’s see this three component dual number in action by re-doing the example again. Note that this pattern scales up to ANY number of variables!

# Example: Both Derivatives (Gradient Vector)

Our goal is to calculate the value and partial derivatives of the function $3x^2-2y^3$ at location (5,2).

First we make our x value:

$5 + 1\epsilon_x + 0\epsilon_y$

or:

$5 + \epsilon_x$

We square that and multiply it by 3 to get our $3x^2$ term:

$3*(5 + \epsilon_x)*(5 + \epsilon_x)= \\ 3*(25+10\epsilon_x+\epsilon_x^2)= \\ 3*(25+10\epsilon_x)= \\ 75+30\epsilon_x$

Next, we make our y value:

$2 + 0\epsilon_x + 1\epsilon_y$

or:

$2 + \epsilon_y$

We cube it and multiply it by 2 to get our $2x^3$ term:

$16+24\epsilon_y$

Lastly we subtract the y term from the x term to get our final answer:

$(75+30\epsilon_x) - (16+24\epsilon_y)= \\ 59+30\epsilon_x-24\epsilon_y$

The result says that $3x^2-2y^3$ has a value of 59 at location (5,2), and that the derivative of x at that point is 30, and the derivative of y at that point is -24.

Neat, right?!

# Example Code

Here is the example code output:

Here is the code that generated it:

#include <stdio.h>
#include <cmath>
#include <array>
#include <algorithm>

#define PI 3.14159265359f

#define EPSILON 0.001f  // for numeric derivatives calculation

template <size_t NUMVARIABLES>
class CDualNumber
{
public:

// constructor to make a constant
CDualNumber (float f = 0.0f) {
m_real = f;
std::fill(m_dual.begin(), m_dual.end(), 0.0f);
}

// constructor to make a variable value.  It sets the derivative to 1.0 for whichever variable this is a value for.
CDualNumber (float f, size_t variableIndex) {
m_real = f;
std::fill(m_dual.begin(), m_dual.end(), 0.0f);
m_dual[variableIndex] = 1.0f;
}

// storage for real and dual values
float							m_real;
std::array<float, NUMVARIABLES> m_dual;
};

//----------------------------------------------------------------------
// Math Operations
//----------------------------------------------------------------------
template <size_t NUMVARIABLES>
inline CDualNumber<NUMVARIABLES> operator + (const CDualNumber<NUMVARIABLES> &a, const CDualNumber<NUMVARIABLES> &b)
{
CDualNumber<NUMVARIABLES> ret;
ret.m_real = a.m_real + b.m_real;
for (size_t i = 0; i < NUMVARIABLES; ++i)
ret.m_dual[i] = a.m_dual[i] + b.m_dual[i];
return ret;
}

template <size_t NUMVARIABLES>
inline CDualNumber<NUMVARIABLES> operator - (const CDualNumber<NUMVARIABLES> &a, const CDualNumber<NUMVARIABLES> &b)
{
CDualNumber<NUMVARIABLES> ret;
ret.m_real = a.m_real - b.m_real;
for (size_t i = 0; i < NUMVARIABLES; ++i)
ret.m_dual[i] = a.m_dual[i] - b.m_dual[i];
return ret;
}

template <size_t NUMVARIABLES>
inline CDualNumber<NUMVARIABLES> operator * (const CDualNumber<NUMVARIABLES> &a, const CDualNumber<NUMVARIABLES> &b)
{
CDualNumber<NUMVARIABLES> ret;
ret.m_real = a.m_real * b.m_real;
for (size_t i = 0; i < NUMVARIABLES; ++i)
ret.m_dual[i] = a.m_real * b.m_dual[i] + a.m_dual[i] * b.m_real;
return ret;
}

template <size_t NUMVARIABLES>
inline CDualNumber<NUMVARIABLES> operator / (const CDualNumber<NUMVARIABLES> &a, const CDualNumber<NUMVARIABLES> &b)
{
CDualNumber<NUMVARIABLES> ret;
ret.m_real = a.m_real / b.m_real;
for (size_t i = 0; i < NUMVARIABLES; ++i)
ret.m_dual[i] = (a.m_dual[i] * b.m_real - a.m_real * b.m_dual[i]) / (b.m_real * b.m_real);
return ret;
}

template <size_t NUMVARIABLES>
inline CDualNumber<NUMVARIABLES> sqrt (const CDualNumber<NUMVARIABLES> &a)
{
CDualNumber<NUMVARIABLES> ret;
float sqrtReal = sqrt(a.m_real);
ret.m_real = sqrtReal;
for (size_t i = 0; i < NUMVARIABLES; ++i)
ret.m_dual[i] = 0.5f * a.m_dual[i] / sqrtReal;
return ret;
}

template <size_t NUMVARIABLES>
inline CDualNumber<NUMVARIABLES> pow (const CDualNumber<NUMVARIABLES> &a, float y)
{
CDualNumber<NUMVARIABLES> ret;
ret.m_real = pow(a.m_real, y);
for (size_t i = 0; i < NUMVARIABLES; ++i)
ret.m_dual[i] = y * a.m_dual[i] * pow(a.m_real, y - 1.0f);
return ret;
}

template <size_t NUMVARIABLES>
inline CDualNumber<NUMVARIABLES> sin (const CDualNumber<NUMVARIABLES> &a)
{
CDualNumber<NUMVARIABLES> ret;
ret.m_real = sin(a.m_real);
for (size_t i = 0; i < NUMVARIABLES; ++i)
ret.m_dual[i] = a.m_dual[i] * cos(a.m_real);
return ret;
}

template <size_t NUMVARIABLES>
inline CDualNumber<NUMVARIABLES> cos (const CDualNumber<NUMVARIABLES> &a)
{
CDualNumber<NUMVARIABLES> ret;
ret.m_real = cos(a.m_real);
for (size_t i = 0; i < NUMVARIABLES; ++i)
ret.m_dual[i] = -a.m_dual[i] * sin(a.m_real);
return ret;
}

template <size_t NUMVARIABLES>
inline CDualNumber<NUMVARIABLES> tan (const CDualNumber<NUMVARIABLES> &a)
{
CDualNumber<NUMVARIABLES> ret;
ret.m_real = tan(a.m_real);
for (size_t i = 0; i < NUMVARIABLES; ++i)
ret.m_dual[i] = a.m_dual[i] / (cos(a.m_real) * cos(a.m_real));
return ret;
}

template <size_t NUMVARIABLES>
inline CDualNumber<NUMVARIABLES> atan (const CDualNumber<NUMVARIABLES> &a)
{
CDualNumber<NUMVARIABLES> ret;
ret.m_real = tan(a.m_real);
for (size_t i = 0; i < NUMVARIABLES; ++i)
ret.m_dual[i] = a.m_dual[i] / (1.0f + a.m_real * a.m_real);
return ret;
}

// templated so it can work for both a CDualNumber<1> and a float
template <typename T>
inline T SmoothStep (const T& x)
{
return x * x * (T(3.0f) - T(2.0f) * x);
}

//----------------------------------------------------------------------
// Test Functions
//----------------------------------------------------------------------

void TestSmoothStep (float input)
{
// create a dual number as the value of x
CDualNumber<1> x(input, 0);

// calculate value and derivative using dual numbers
CDualNumber<1> y = SmoothStep(x);

// calculate numeric derivative using central differences
float derivNumeric = (SmoothStep(input + EPSILON) - SmoothStep(input - EPSILON)) / (2.0f * EPSILON);

// calculate actual derivative
float derivActual = 6.0f * input - 6.0f * input * input;

// show value and derivatives
printf("(smoothstep) y=3x^2-2x^3  (x=%0.4f)n", input);
printf("  y = %0.4fn", y.m_real);
printf("  dual# dy/dx = %0.4fn", y.m_dual[0]);
printf("  actual dy/dx = %0.4fn", derivActual);
printf("  numeric dy/dx = %0.4fnn", derivNumeric);
}

void TestTrig (float input)
{
// create a dual number as the value of x
CDualNumber<1> x(input, 0);

// sin
{
// calculate value and derivative using dual numbers
CDualNumber<1> y = sin(x);

// calculate numeric derivative using central differences
float derivNumeric = (sin(input + EPSILON) - sin(input - EPSILON)) / (2.0f * EPSILON);

// calculate actual derivative
float derivActual = cos(input);

// show value and derivatives
printf("sin(%0.4f) = %0.4fn", input, y.m_real);
printf("  dual# dy/dx = %0.4fn", y.m_dual[0]);
printf("  actual dy/dx = %0.4fn", derivActual);
printf("  numeric dy/dx = %0.4fnn", derivNumeric);
}

// cos
{
// calculate value and derivative using dual numbers
CDualNumber<1> y = cos(x);

// calculate numeric derivative using central differences
float derivNumeric = (cos(input + EPSILON) - cos(input - EPSILON)) / (2.0f * EPSILON);

// calculate actual derivative
float derivActual = -sin(input);

// show value and derivatives
printf("cos(%0.4f) = %0.4fn", input, y.m_real);
printf("  dual# dy/dx = %0.4fn", y.m_dual[0]);
printf("  actual dy/dx = %0.4fn", derivActual);
printf("  numeric dy/dx = %0.4fnn", derivNumeric);
}

// tan
{
// calculate value and derivative using dual numbers
CDualNumber<1> y = tan(x);

// calculate numeric derivative using central differences
float derivNumeric = (tan(input + EPSILON) - tan(input - EPSILON)) / (2.0f * EPSILON);

// calculate actual derivative
float derivActual = 1.0f / (cos(input)*cos(input));

// show value and derivatives
printf("tan(%0.4f) = %0.4fn", input, y.m_real);
printf("  dual# dy/dx = %0.4fn", y.m_dual[0]);
printf("  actual dy/dx = %0.4fn", derivActual);
printf("  numeric dy/dx = %0.4fnn", derivNumeric);
}

// atan
{
// calculate value and derivative using dual numbers
CDualNumber<1> y = atan(x);

// calculate numeric derivative using central differences
float derivNumeric = (atan(input + EPSILON) - atan(input - EPSILON)) / (2.0f * EPSILON);

// calculate actual derivative
float derivActual = 1.0f / (1.0f + input * input);

// show value and derivatives
printf("atan(%0.4f) = %0.4fn", input, y.m_real);
printf("  dual# dy/dx = %0.4fn", y.m_dual[0]);
printf("  actual dy/dx = %0.4fn", derivActual);
printf("  numeric dy/dx = %0.4fnn", derivNumeric);
}
}

void TestSimple (float input)
{
// create a dual number as the value of x
CDualNumber<1> x(input, 0);

// sqrt
{
// calculate value and derivative using dual numbers
CDualNumber<1> y = CDualNumber<1>(3.0f) / sqrt(x);

// calculate numeric derivative using central differences
float derivNumeric = ((3.0f / sqrt(input + EPSILON)) - (3.0f / sqrt(input - EPSILON))) / (2.0f * EPSILON);

// calculate actual derivative
float derivActual = -3.0f / (2.0f * pow(input, 3.0f / 2.0f));

// show value and derivatives
printf("3/sqrt(%0.4f) = %0.4fn", input, y.m_real);
printf("  dual# dy/dx = %0.4fn", y.m_dual[0]);
printf("  actual dy/dx = %0.4fn", derivActual);
printf("  numeric dy/dx = %0.4fnn", derivNumeric);
}

// pow
{
// calculate value and derivative using dual numbers
CDualNumber<1> y = pow(x + CDualNumber<1>(1.0f), 1.337f);

// calculate numeric derivative using central differences
float derivNumeric = ((pow(input + 1.0f + EPSILON, 1.337f)) - (pow(input + 1.0f - EPSILON, 1.337f))) / (2.0f * EPSILON);

// calculate actual derivative
float derivActual = 1.337f * pow(input + 1.0f, 0.337f);

// show value and derivatives
printf("(%0.4f+1)^1.337 = %0.4fn", input, y.m_real);
printf("  dual# dy/dx = %0.4fn", y.m_dual[0]);
printf("  actual dy/dx = %0.4fn", derivActual);
printf("  numeric dy/dx = %0.4fnn", derivNumeric);
}
}

void Test2D (float inputx, float inputy)
{
// create dual numbers as the value of x and y
CDualNumber<2> x(inputx, 0);
CDualNumber<2> y(inputy, 1);

// z = 3x^2 - 2y^3
{
// calculate value and partial derivatives using dual numbers
CDualNumber<2> z = CDualNumber<2>(3.0f) * x * x - CDualNumber<2>(2.0f) * y * y * y;

// calculate numeric partial derivatives using central differences
auto f = [] (float x, float y) {
return 3.0f * x * x - 2.0f * y * y * y;
};
float derivNumericX = (f(inputx + EPSILON, inputy) - f(inputx - EPSILON, inputy)) / (2.0f * EPSILON);
float derivNumericY = (f(inputx, inputy + EPSILON) - f(inputx, inputy - EPSILON)) / (2.0f * EPSILON);

// calculate actual partial derivatives
float derivActualX = 6.0f * inputx;
float derivActualY = -6.0f * inputy * inputy;

// show value and derivatives
printf("z=3x^2-2y^3 (x = %0.4f, y = %0.4f)n", inputx, inputy);
printf("  z = %0.4fn", z.m_real);
printf("  dual# dz/dx = %0.4fn", z.m_dual[0]);
printf("  dual# dz/dy = %0.4fn", z.m_dual[1]);
printf("  actual dz/dx = %0.4fn", derivActualX);
printf("  actual dz/dy = %0.4fn", derivActualY);
printf("  numeric dz/dx = %0.4fn", derivNumericX);
printf("  numeric dz/dy = %0.4fnn", derivNumericY);
}
}

void Test3D (float inputx, float inputy, float inputz)
{
// create dual numbers as the value of x and y
CDualNumber<3> x(inputx, 0);
CDualNumber<3> y(inputy, 1);
CDualNumber<3> z(inputz, 2);

// w = sin(x*cos(2*y)) / tan(z)
{
// calculate value and partial derivatives using dual numbers
CDualNumber<3> w = sin(x * cos(CDualNumber<3>(2.0f)*y)) / tan(z);

// calculate numeric partial derivatives using central differences
auto f = [] (float x, float y, float z) {
return sin(x*cos(2.0f*y)) / tan(z);
};
float derivNumericX = (f(inputx + EPSILON, inputy, inputz) - f(inputx - EPSILON, inputy, inputz)) / (2.0f * EPSILON);
float derivNumericY = (f(inputx, inputy + EPSILON, inputz) - f(inputx, inputy - EPSILON, inputz)) / (2.0f * EPSILON);
float derivNumericZ = (f(inputx, inputy, inputz + EPSILON) - f(inputx, inputy, inputz - EPSILON)) / (2.0f * EPSILON);

// calculate actual partial derivatives
float derivActualX = cos(inputx*cos(2.0f*inputy))*cos(2.0f * inputy) / tan(inputz);
float derivActualY = cos(inputx*cos(2.0f*inputy)) *-2.0f*inputx*sin(2.0f*inputy) / tan(inputz);
float derivActualZ = sin(inputx * cos(2.0f * inputy)) / -(sin(inputz) * sin(inputz));

// show value and derivatives
printf("w=sin(x*cos(2*y))/tan(z) (x = %0.4f, y = %0.4f, z = %0.4f)n", inputx, inputy, inputz);
printf("  w = %0.4fn", w.m_real);
printf("  dual# dw/dx = %0.4fn", w.m_dual[0]);
printf("  dual# dw/dy = %0.4fn", w.m_dual[1]);
printf("  dual# dw/dz = %0.4fn", w.m_dual[2]);
printf("  actual dw/dx = %0.4fn", derivActualX);
printf("  actual dw/dy = %0.4fn", derivActualY);
printf("  actual dw/dz = %0.4fn", derivActualZ);
printf("  numeric dw/dx = %0.4fn", derivNumericX);
printf("  numeric dw/dy = %0.4fn", derivNumericY);
printf("  numeric dw/dz = %0.4fnn", derivNumericZ);
}
}

int main (int argc, char **argv)
{
TestSmoothStep(0.5f);
TestSmoothStep(0.75f);
TestTrig(PI * 0.25f);
TestSimple(3.0f);
Test2D(1.5f, 3.28f);
Test3D(7.12f, 8.93f, 12.01f);
return 0;
}


# Closing

One of the neatest things about dual numbers is that they give precise results. They are not approximations and they are not numerical methods, unlike the central differences method that I compared them to in the example program (More info on numerical derivatives here: Finite Differences). Using dual numbers gives you exact derivatives, within the limitations of (eg) floating point math.

It turns out that backpropagation (the method that is commonly used to train neural networks) is just steepest gradient descent. You can read about that here: Backpropogation is Just Steepest Descent with Automatic Differentiation

That makes me wonder how dual numbers would do in run time speed compared to back propagation as well as numerical methods for getting the gradient to adjust a neural network during training.

If I had to guess, I’d say that dual numbers may be slightly slower than backpropagation, but not as slow as numerical methods which are going to be much, much slower. We’ll see though. It may be much easier to implement neural network learning using dual numbers compared to backpropagation, so may be worth an exploration and write up, even if only to make neural networks a little bit more accessible to people.

Comments, corrections, etc? Let me know in the comments below, or contact me on twitter at @Atrix256

# Raytracing Reflection, Refraction, Fresnel, Total Internal Reflection, and Beer’s Law

This post talks about how to render images like the below in real time using ray tracing. Some realism in the images come from reflection and refraction, but the real icing on the cake comes from Fresnel, total internal reflection and Beer’s law. We’ll look at the contributions of these features individually and talk about how to properly combine them for the greatest and most realistic results (:

The renderings come from a shadertoy that accompanies this post: Shadertoy: Reflect Refract TIR Fresnel RayT

My motivation for learning more about this stuff is that I’m starting to make a marble madness inspired game, and I’m planning to do hybrid rendering between rasterization and ray based techniques.

This post will focus more on how to make these features work for you in ray tracing, and less about the reasons for why they work the way they do. This post is more about practical implementations, and less about rigorous explanations.

If you have any questions, comments, corrections or additions, feel free to leave in the comments section at the bottom, or feel free to hit me up on twitter at @Atrix256.

This post is assumes you know how to do basic raytracing with ambient, diffuse and specular lighting, like the image below, which we are going to start with:

# Reflection

The first thing to talk about is reflection. More specifically we are going to be talking about SPECULAR reflection.

Specular is defined by the dictionary to mean “of, relating to, or having the properties of a mirror.”, so what we normally think of as reflection (like from a mirror) is in fact specular reflection.

There is non specular reflection, aka diffuse reflection, which just means that light is reflected off of a surface in a non mirror like way. This is accomplished with diffuse lighting where commonly we dot product the normal of a surface with the direction from the surface to the light to know how much to light that point on the surface.

If specular reflection is how mirrors reflect, and diffuse reflection is how regular diffuse surfaces work, then you might wonder what specular lighting is all about.

A specular highlight is actually just a cheap approximation to doing a mirror like specular reflection of a light source, so it is a cheap kind of specular reflection.

Let’s talk about how to do real mirror like specular reflection in a ray tracer.

When light hits a surface, some amount of the light will be reflected, and some amount of the light will be transmitted into the object. Transmitted light is used for the diffuse shading.

As you might imagine, the amount of light reflected plus the amount of light transmitted must equal the total amount of light that hit the surface. (note that some transmitted energy may be absorbed and given off as heat, or the object may be glowing, so may give off more light than received but let’s ignore that stuff for now.)

A common way to deal with this is to define a reflectivity of a surface as the amount of light it reflects in percent and use 100% minus that amount as the transmitted amount.

You might say that 2% of light that hits a surface reflects. That means that 98% of the light that hits a surface is transmitted and used for the diffuse shading.

When shading a point on the surface, you would calculate both the reflected color at that point on the surface, and the diffuse shaded color at that point, but you multiply the reflected color by 0.02 and the diffuse shaded color by 0.98 and add them together. Doing this gives a result like the below:

The higher the reflectivity percent, the more reflection you get, and the less diffuse shading you get. Turning down reflectivity has the opposite effect.

How do you calculate the reflected color of a point on a surface though?

You simply calculate the ray that reflected off of the surface, and calculate the color of what that ray hit.

To calculate a reflected ray, you need to know the direction that the ray was traveling when it hit the surface (The incident direction), you need to know the location that the ray hit the surface (the surface location), and you need to know the normal of the surface at the intersection point.

If you have those things you can calculate the reflected ray direction as this:

$ReflectRayLocation = SurfaceLocation \\ ReflectRayDirection = IncidentDirection - 2 * SurfaceNormal * DotProduct(IncidentDirection, SurfaceNormal)$

Note that in hlsl and glsl there is a function called “reflect” which takes the incident direction, and the surface normal, and returns you the reflected direction.

Doing the above is mathematically correct, but if you then try to raytrace from that ray, you may hit the same object you are reflecting off. One way to fight that problem is to push the ray positin a small amount away from the surface to make sure it misses it. You can do that for example like this:

$ReflectRayLocation = ReflectRayLocation + ReflectRayDirection * 0.01$

There are other ways to make sure you don’t hit the same object again, but pushing the ray away from the object a small amount really does work pretty nicely in practice.

# Refraction

I mentioned in the last section that whatever light wasn’t reflected when it hit a surface was transmitted. I also said that the transmitted light was used for diffuse shading.

In reality, it’s passed through the “bidirectional scattering distribution function” aka BSDF. You may have heard the term “bidirectional reflection distribution function” aka BRDF. A BRDF only deals with the hemisphere (half a sphere) that surrounds the surface normal. The BSDF on the other hand deals with the full sphere surrounding a surface normal so BRDF is a subset of what is possible with the BSDF.

Because the BSDF deals with an entire sphere, that means that it can handle reflection (specular and non specular) like the BRDF can, but it can also deal with transparency and refraction, where some of the light travels THROUGH an object.

In a path tracer where everything is physically accurate and mathematically precise, we would be interested in dealing with a BSDF and integrating over the sphere, but since we are working with a ray tracer, our physical accuracy needs are a lot lower – we only want a result that looks plausible.

What we are going to do for our transparency is calculate a direction that the ray is going to travel through the object, ray trace that ray to get a color, and use the transmitted light (the portion of light that isn’t reflected) as a multiplier of that color, that we add to the reflected amount of light.

If we have an object with 10% reflectivity, and 90% transmittance, but use that transmitted light for transparency, we’ll have a rendering like below:

Now that we have transparency, let’s talk about refraction. Refraction is a physical phenomenon where light bends (“changes speed” i guess but that sounds a bit suspicious for light), as it hits a boundary between two different surfaces.

Every material has a refractive index, and in fact, may have different refractive indices per light frequency. For our purposes, we’ll assume surfaces have the same refractive index for every frequency of light. There’s a list of refractive indices for a lot of different materials here: Index of refraction

How a ray bends when it refracts depends on the ration of the refractive index that it’s leaving to the refractive index that it’s entering. AKA outside/inside.

HLSL and GLSL have a function called refract which take the incident vector, the surface normal vector, and that ration of refractive indices, and return the refracted ray.

When you do a raytrace down the refracted ray, you will have the same problem as when tracing the reflected ray, that you may hit the same surface you just hit again erroneously. To help that, you once again just move the ray slightly away from the surface.

$RefractRayLocation = RefractRayLocation + RefractRayDirection * 0.01$

Here is a rendering where the sphere has 10% reflectivity, 90% transmittance, an air refractive index of 1.0, and a refractive index of 1.125 for the sphere. You can see how the light bends as it goes through the object and looks pretty neat!

# Fresnel

There is an interesting property in our world: If you look at something straight on, it’s the least reflective it will be. If you turn it nearly 90 degrees where it’s nearly edge on, it will be the most reflective it can be. Many surfaces will become almost perfectly reflective when you view them almost edge on. Go try it out with a wall in your house or a glass or other things.

Weird huh? That’s called Fresnel, and is something we can also make use of in ray tracing to get a more realistic image.

Instead of just always using the reflectivity amount of the surface, we use the Fresnel equation to figure out how much reflectivity an object should have based on the view angle versus the surface normal, so that when it’s more edge on it becomes more reflective. At minimum the reflectivity will be the reflectivity of the surface, and at maximum the reflectivity will be 100%.

The amount we transmit for either refraction or diffuse will be 100% minus however much percentage is reflective.

Here is the image showing normal reflection again:

Here is the image with Fresnel:

It looks quite a bit better with fresnel doesn’t it?!

Here’s a GLSL function of Schlick’s Fresnel approximation function. Notice that it takes the surface normal and incident vector, as well as the refractive index being left (n1) and the refractive index being entered (n2):

float FresnelReflectAmount (float n1, float n2, vec3 normal, vec3 incident)
{
// Schlick aproximation
float r0 = (n1-n2) / (n1+n2);
r0 *= r0;
float cosX = -dot(normal, incident);
if (n1 > n2)
{
float n = n1/n2;
float sinT2 = n*n*(1.0-cosX*cosX);
// Total internal reflection
if (sinT2 > 1.0)
return 1.0;
cosX = sqrt(1.0-sinT2);
}
float x = 1.0-cosX;
float ret = r0+(1.0-r0)*x*x*x*x*x;

// adjust reflect multiplier for object reflectivity
ret = (OBJECT_REFLECTIVITY + (1.0-OBJECT_REFLECTIVITY) * ret);
return ret;
}


Our tale of reflection is complete, so let’s go back to refraction / transparency and finish up the last two items.

# Total Internal Reflection

The way that the Fresnel equation works, it’s possible that when moving from a material with a higher index of refraction to a lower one, that the amount of refraction can actually drop to zero percent. In this case, the light doesn’t exit the higher refractive index object and instead ONLY reflects back into the object, because the reflective amount becomes 100%.

When this happens, it’s called “Total Internal Reflection” for hopefully obvious reasons (:

There isn’t a whole lot to say about this, because you can see that this is even accounted for in the GLSL Fresnel function from the last section.

However, if you are ever under water in a swimming pool and look up to see the water surface looking like a mirror, that is total internal reflection in action.

You can also see it in the render below, where you can see reflections on the inside of the walls of the object, especially on the bottom (floor) of the object:

# Beer’s Law

Beer’s law is the last item to talk about, and relates to transparent surfaces. Beer’s law deals with light being absorbed over distance as it travels through a material.

Beer’s law is why a thin piece of jello is mostly colorless, but a thicker piece of jello has a much richer and deeper color.

Here’s a cube with beer’s law absorbing red and green light over distance. You should notice that where the light travels less distance through the cube that it’s not as blue, because not as much red and green light has been absorbed:

To apply beer’s law, you first figure out how long a ray has traveled through the absorbing material (by tracing the ray inside the object to see where it hits the back side). You then do this calculation:

vec3 color = RayTrace(rayPos, rayDir);
vec3 absorb = exp(-OBJECT_ABSORB * absorbDistance);
color *= absorb;


OBJECT_ABSORB is an RGB value that describes how much of each color channel absorbs over distance. For example, a value of (8.0, 2.0, 0.1) would make red get absorbed the fastest, then green, then blue, so would result in a blueish, slightly green color object.

# Putting it All Together

Now that we have the individual components worked out let’s talk about how to put it together.

Firstly, when a ray hits any surface, you need to use the Fresnel equation to get the multiplier for the reflected and transmitted light.

Next you calculate the reflected and transmitted light by recursively raytracing. The transmitted light is either used for the diffuse shading, the transparency/refracted ray, or some combination of those two (technically, it’s all up to the BSDF we are approximating).

Then, you multiply the reflected light by the reflected amount from the Fresnel equation, and the transmitted amount by 100%-reflectionAmount and add the results together.

(Quick side note, if you have emissive color on the surface aka the object glows, you would also add that in here).

Since raytracing is recursive, you would do this each time a ray intersected with an object. In other words, each intersection causes the ray to split into two rays.

As you can imagine, this can make the rendering quite complex, especially on the GPU where you can’t even do real recursion.

One way to help limit the recursiveness a bit is when you are calculating your reflection and transmittance amounts, you can choose a threshold like say 1% where if the reflection is under 1% it clamps it to 0%, and if it’s greater than 99% it clamps it at 100%. You can then choose not to recurse down a specific ray if the ray’s multiplier is 0. The end result will be that reflections or transmittance rays that don’t contribute much to the end result won’t be followed at all.

If you are willing to sacrifice some visual quality to not have to split your ray into two at each object intersection, you could also figure out if reflection or transmittance has the higher multiplier, and make the ray only follow one of the paths. If you were doing this in a path tracer, you could choose which one to follow randomly, using the multiplier as a weight for the random selection.

The problem in both of these two optimizations is that the multiplier is only half of the information though so may incorrectly choose the less meaningful contribution. The other half of the information is the color of the ray if you followed it. The reason for this is that you might have a low multiplier for a really bright spot (caustics have this problem commonly!), or vice versa you may have a high multiplier for a dull featureless spot. With path tracing, if you take enough samples, it all washes out in the averages, and with ray tracing, maybe you accept that it will do the wrong thing sometimes, to stay a real time algorithm, but it’s important to know how this type of choice can fail for you. (Side note, this sort of stuff is what importance sampling in path tracing is all about – trying to make rays follow more meaningful paths to get better results quicker).

When doing real time raytracing you also often have to decide how many times you want to allow a ray to bounce around. In the shadertoy that goes with this post, that parameter is MAX_RAY_BOUNCES and I have it set to 10.

Interestingly, setting it to 1 has no visible impact on the sphere at all, which is a nice improvement. For the box, a value of 3 seems to be the maximum it needs. 3 also seems to be the magic number for the geometric gem type shape.

So, 10 is overkill, but i left it at that in case people play with parameters and change them to values which would require more bounces.

Lastly I wanted to mention that in the scene that I rendered, I did a small “trick” to make it so I didn’t need to do full recursive splitting of rays at each intersection. I did this by making it so the main object in the center of the scene was the only object that had reflection.

This way, I only need to split the ray into two if i hit the main object. Furthermore, when I’m splitting the ray at the main object, the ray that gets the color for the outside world (versus the inside of the object) is a single non recursive ray cast since it can’t hit anything reflective. The result is that at each intersection of the sphere, i do a simple non recursive ray cast for the ray that is going outside of the main object, then i continue the iterative ray on the inside of the object until i run out of bounces.

Doing this causes a recursive process to become an iterative one, which is much friendlier on the gpu.

Below is the final render again from the shadertoy. The parameters are:

• The refractive index of the air is 1.00029
• The refractive index inside the objects are 1.125
• Reflectivity is 1%
• The absorption for beers law is (8.0, 8.0, 3.0)

# Incremental Least Squares Surface and Hyper-Volume Fitting

The last post showed how to fit a $y=f(x)$ equation to a set of 2d data points, using least squares fitting. It allowed you to do this getting only one data point at a time, and still come up with the same answer as if you had all the data points at once, so it was an incremental, or “online” algorithm.

This post generalizes that process to equations of any dimension such as $z=f(x,y)$, $w=f(x,y,z)$ or greater.

Below is an image of a surface that is degree (2,2). This is a screenshot taken from the interactive webgl2 demo I made for this post: Least Squares Surface Fitting

# How Do You Do It?

The two main players from the last post were the ATA matrix and the ATY vector. These two could be incrementally updated as new data points came in, which would allow you to do an incremental (or “online”) least squares fit of a curve.

When working with surfaces and volumes, you have the same things basically. Both the ATA matrix and the ATY vector still exist, but they contain slightly different information – which I’ll explain lower down. However, the ATY vector is renamed, since in the case of a surface it should be called ATZ, and for a volume it should be called ATW. I call it ATV to generalize it, where v stands for value, and represents the last component in a data point, which is the output value we are trying to fit given the input values. The input values are the rest of the components of the data point.

At the end, you once again need to solve the equation $A^TA*c=A^Tv$ to calculate the coefficients (named c) of the equation.

It’s all pretty similar to fitting a curve, but the details change a bit. Let’s work through an example to see how it differs.

# Example: Bilinear Surface Fitting

Let’s fit 4 data points to a bilinear surface, otherwise known as a surface that is linear on each axis, or a surface of degree(1,1):
(0,0,5)
(0,1,3)
(1,0,8)
(1,1,2)

Since we are fitting those data points with a bilinear surface, we are looking for a function that takes in x,y values and gives as output the z coordinate. We want a surface that gives us the closest answer possible (minimizing the sum of the squared difference for each input data point) for the data points we do have, so that we can give it other data points and get z values as output that approximate what we expect to see for that input.

A linear equation looks like this, with coefficients A and B:
$y=Ax+B$

Since we want a bilinear equation this time around, this is the equation we are going to end up with, after solving for the coefficients A,B,C,D:
$y=Axy+Bx+Cy+D$

The first step is to make the A matrix. In the last post, this matrix was made up of powers of the x coordinates. In this post, they are actually going to be made up of the permutation of powers of the x and y coordinates.

Last time the matrix looked like this:
$A = \begin{bmatrix} x_0^0 & x_0^1 & x_0^2 \\ x_1^0 & x_1^1 & x_1^2 \\ x_2^0 & x_2^1 & x_2^2 \\ x_3^0 & x_3^1 & x_3^2 \\ \end{bmatrix}$

This time, the matrix is going to look like this:
$A = \begin{bmatrix} x_0^0y_0^0 & x_0^0y_0^1 & x_0^1y_0^0 & x_0^1y_0^1 \\ x_1^0y_1^0 & x_1^0y_1^1 & x_1^1y_1^0 & x_1^1y_1^1 \\ x_2^0y_2^0 & x_2^0y_2^1 & x_2^1y_2^0 & x_2^1y_2^1 \\ x_3^0y_3^0 & x_3^0y_3^1 & x_3^1y_3^0 & x_3^1y_3^1 \\ \end{bmatrix}$

Simplifying that matrix a bit, it looks like this:
$A = \begin{bmatrix} 1 & y_0 & x_0 & x_0y_0 \\ 1 & y_1 & x_1 & x_1y_1 \\ 1 & y_2 & x_2 & x_2y_2 \\ 1 & y_3 & x_3 & x_3y_3 \\ \end{bmatrix}$

To simplify it even further, there is one row in the A matrix per data point, where the row looks like this:
$\begin{bmatrix} 1 & y & x & xy \\ \end{bmatrix}$

You can see that every permutation of the powers of x and y for each data point is present in the matrix.

The A matrix for our data points is this:
$A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 \\ \end{bmatrix}$

Next we need to calculate the ATA matrix by multiplying the transpose of that matrix, by that matrix.

$A^TA = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} * \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 \\ \end{bmatrix} = \begin{bmatrix} 4 & 2 & 2 & 1 \\ 2 & 2 & 1 & 1 \\ 2 & 1 & 2 & 1 \\ 1 & 1 & 1 & 1 \\ \end{bmatrix}$

Taking the inverse of that matrix we get this:

$(A^TA)^{-1} = \begin{bmatrix} 1 & -1 & -1 & 1 \\ -1 & 2 & 1 & -2 \\ -1 & 1 & 2 & -2 \\ 1 & -2 & -2 & 4 \\ \end{bmatrix}$

Next we need to calculate the ATV vector (formerly known as ATY). We calculate that by multiplying the transpose of the A matrix by the Z values:

$A^TV = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} * \begin{bmatrix} 5 \\ 3 \\ 8 \\ 2 \\ \end{bmatrix} = \begin{bmatrix} 18 \\ 5 \\ 10 \\ 2 \\ \end{bmatrix}$

Lastly we multiply the inversed ATA matrix by the ATV vector to get our coefficients.

$\begin{bmatrix} 1 & -1 & -1 & 1 \\ -1 & 2 & 1 & -2 \\ -1 & 1 & 2 & -2 \\ 1 & -2 & -2 & 4 \\ \end{bmatrix} * \begin{bmatrix} 18 \\ 5 \\ 10 \\ 2 \\ \end{bmatrix} = \begin{bmatrix} 5 \\ -2 \\ 3 \\ -4 \\ \end{bmatrix}$

In the last post, the coefficients we got out were in x power order, so the first (top) was for the $x^0$ term, the next was for the $x^1$ term etc.

This time around, the coefficients are in the same order as the permutations of the powers of x and y:
$\begin{bmatrix} 1 & y & x & xy \\ \end{bmatrix}$

That makes our final equation this:
$z = -4xy+3x-2y+5$

If you plug in the (x,y) values from the data set we fit, you’ll see that you get the corresponding z values as output. We perfectly fit the data set!

The process isn’t too different from last post and not too difficult either right?

Let’s see if we can generalize and formalize things a bit.

# Some Observations

Firstly you may be wondering how we come up with the correct permutation of powers of our inputs. It actually doesn’t matter so long as you are consistent. You can have your A matrix rows have the powers in any order, so long as all orders are present, and you use the same order in all operations.

Regarding storage sizes needed, the storage of surfaces and (hyper) volumes are a bit different and generally larger than curves.

To see how, let’s look at the powers of the ATA matrix of a bilinear surface, using the ordering of powers that we used in the example:

$\begin{bmatrix} x^0y^0 & x^0y^1 & x^1y^0 & x^1y^1 \\ x^0y^1 & x^0y^2 & x^1y^1 & x^1y^2 \\ x^1y^0 & x^1y^1 & x^2y^0 & x^2y^1 \\ x^1y^1 & x^1y^2 & x^2y^1 & x^2y^2 \\ \end{bmatrix}$

Let’s rewrite it as just the powers:

$\begin{bmatrix} 00 & 01 & 10 & 11 \\ 01 & 02 & 11 & 12 \\ 10 & 11 & 20 & 21 \\ 11 & 12 & 21 & 22 \\ \end{bmatrix}$

And the permutation we used as just powers to help us find the pattern in the powers of x and y in the ATA matrix:
$\begin{bmatrix} 00 & 01 & 10 & 11 \\ \end{bmatrix}$

Can you find the pattern of the powers used at the different spots in the ATA matrix?

I had to stare at it for a while before I figured it out but it’s this: For the i,j location in the ATA matrix, the powers of x and y are the powers of x and y in the i permutation added to the powers of x and y in the j permutation.

For example, $A^TA_{0,2}$ has xy powers of 10. Permutation 0 has powers of 0,0 and permutation 2 has powers of 1,0, so we add those together to get powers 1,0.

Another example, $A^TA_{2,3}$ has xy powers of 21. Permutation 2 has powers of 1,0 and permutation 3 has powers 1,1. Adding those together we get 2,1 which is correct.

That’s a bit more complex than last post, not too much more difficult to construct the ATA matrix directly – and also construct it incrementally as new data points come in!

How many unique values are there in the ATA matrix though? We need to know this to know how much storage we need.

In the last post, we needed (degree+1)*2–1 values to store the unique ATA matrix values. That can also be written as degree*2+1.

It turns out that when generalizing this to surfaces and volumes, that we need to take the product of that for each axis.

For instance, a surface has ((degreeX*2)+1)*((degreeY*2)+1) unique values. A volume has ((degreeX*2)+1)*((degreeY*2)+1)*((degreeZ*2)+1) unique values.

The pattern continues for higher dimensions, as well as lower, since you can see how in the curve case, it’s the same formula as it was before.

For the same ATA matrix size, a surface has more unique values than a curve does.

As far as what those values actually are, they are the full permutations of the powers of a surface (or hyper volume) that is one degree higher on each axis. For a bilinear surface, that means the 9 permutations of x and y for powers 0,1 and 2:
$x^0y^0,x^0y^1,x^0y^2,x^1y^0,x^1y^1,x^1y^2,x^2y^0,x^2y^1,x^2y^2$
Or simplified:
$1,y,y^2,x,xy,xy^2,x^2,x^2y,x^2y^2$

For the bilinear case, The ATV vector is the sums of the permutations of x,y multiplied by z, for every data point. In other words, you add this to ATV for each data point:
$\begin{bmatrix} z & yz & xz & xyz \\ \end{bmatrix}$

How much storage space do we need in general for the ATV vector then? it’s the product of (degree+1) for each axis.

For instance, a surface has (degreeX+1)*(degreeY+1) values in ATV, and a volume has (degreeX+1)*(degreeY+1)*(degreeZ+1).

You may also be wondering how many data points are required minimum to fit a curve, surface or hypervolume to a data set. The answer is that you need as many data points as there are terms in the polynomial. We are trying to solve for the polynomial coefficients, so there are as many unknowns as there are polynomial terms.

How many polynomial terms are there? There are as many terms as there are permutations of the axes powers involved. In other words, the size of ATV is also the minimum number of points you need to fit a curve, surface, or hypervolume to a data set.

# Measuring Quality of a Fit

You are probably wondering if there’s a way to calculate how good of a fit you have for a given data set. It turns out that there are a few ways to calculate a value for this.

The value I use in the code below and in the demos is called $R^2$ or residue squared.

First you calculate the average (mean) output value from the input data set.

Then you calculate SSTot which is the sum of the square of the mean subtracted from each input point’s output value. Pseudo code:

SSTot = 0;
for (point p in points)
SSTot += (p.out - mean)^2;


You then calculate SSRes which is the sum of the square of the fitted function evaluated at a point, subtracted from each input points’ output value. Pseudo code:

SSRes= 0;
for (point p in points)
SSRes += (p.out - f(p.in))^2;


The final value for R^2 is 1-SSRes/SSTot.

The value is nice because it’s unitless, and since SSRes and SSTot is a sum of squares, SSRes/SSTot is basically the value that the fitting algorithm minimizes. The value is subtracted from 1 so that it’s a fit quality metric. A value of 0 is a bad fit, and a value of 1 is a good fit and generally it will be between those values.

# Example Code

Here is a run from the sample code:

And here is the source code:

#include <stdio.h>
#include <array>

#define FILTER_ZERO_COEFFICIENTS true // if false, will show terms which have a coefficient of 0

//====================================================================
template<size_t N>
using TVector = std::array<float, N>;

template<size_t M, size_t N>
using TMatrix = std::array<TVector<N>, M>;

//====================================================================
// Specify a degree per axis.
// 1 = linear, 2 = quadratic, etc
template <size_t... DEGREES>
class COnlineLeastSquaresFitter
{
public:
COnlineLeastSquaresFitter ()
{
// initialize our sums to zero
std::fill(m_SummedPowers.begin(), m_SummedPowers.end(), 0.0f);
std::fill(m_SummedPowersTimesValues.begin(), m_SummedPowersTimesValues.end(), 0.0f);
}

// Calculate how many summed powers we need.
// Product of degree*2+1 for each axis.
template <class T>
constexpr static size_t NumSummedPowers(T degree)
{
return degree * 2 + 1;
}
template <class T, class... DEGREES>
constexpr static size_t NumSummedPowers(T first, DEGREES... degrees)
{
return NumSummedPowers(first) * NumSummedPowers(degrees...);
}

// Calculate how many coefficients we have for our equation.
// Product of degree+1 for each axis.
template <class T>
constexpr static size_t NumCoefficients(T degree)
{
return (degree + 1);
}
template <class T, class... DEGREES>
constexpr static size_t NumCoefficients(T first, DEGREES... degrees)
{
return NumCoefficients(first) * NumCoefficients(degrees...);
}

// Helper function to get degree of specific axis
static size_t Degree (size_t axisIndex)
{
static const std::array<size_t, c_dimension-1> c_degrees = { DEGREES... };
return c_degrees[axisIndex];
}

// static const values
static const size_t c_dimension = sizeof...(DEGREES) + 1;
static const size_t c_numCoefficients = NumCoefficients(DEGREES...);
static const size_t c_numSummedPowers = NumSummedPowers(DEGREES...);

// Typedefs
typedef TVector<c_numCoefficients> TCoefficients;
typedef TVector<c_dimension> TDataPoint;

// Function for converting from an index to a specific power permutation
static void IndexToPowers (size_t index, std::array<size_t, c_dimension-1>& powers, size_t maxDegreeMultiply, size_t maxDegreeAdd)
{
for (int i = c_dimension-2; i >= 0; --i)
{
size_t degree = Degree(i) * maxDegreeMultiply + maxDegreeAdd;
powers[i] = index % degree;
index = index / degree;
}
}

// Function for converting from a specific power permuation back into an index
static size_t PowersToIndex (std::array<size_t, c_dimension - 1>& powers, size_t maxDegreeMultiply, size_t maxDegreeAdd)
{
size_t ret = 0;
for (int i = 0; i < c_dimension - 1; ++i)
{
ret *= Degree(i) * maxDegreeMultiply + maxDegreeAdd;
ret += powers[i];
}
return ret;
}

// Add a datapoint to our fitting
{
// Note: It'd be a good idea to memoize the powers and calculate them through repeated
// multiplication, instead of calculating them on demand each time, using std::pow.

// add the summed powers of the input values
std::array<size_t, c_dimension-1> powers;
for (size_t i = 0; i < m_SummedPowers.size(); ++i)
{
IndexToPowers(i, powers, 2, 1);
for (size_t j = 0; j < c_dimension - 1; ++j)
}

// add the summed powers of the input value, multiplied by the output value
for (size_t i = 0; i < m_SummedPowersTimesValues.size(); ++i)
{
IndexToPowers(i, powers, 1, 1);
float valueAdd = dataPoint[c_dimension - 1];
for (size_t j = 0; j < c_dimension-1; ++j)
}
}

// Get the coefficients of the equation fit to the points
bool CalculateCoefficients (TCoefficients& coefficients) const
{
// make the ATA matrix
std::array<size_t, c_dimension - 1> powersi;
std::array<size_t, c_dimension - 1> powersj;
std::array<size_t, c_dimension - 1> summedPowers;
TMatrix<c_numCoefficients, c_numCoefficients> ATA;
for (size_t j = 0; j < c_numCoefficients; ++j)
{
IndexToPowers(j, powersj, 1, 1);

for (size_t i = 0; i < c_numCoefficients; ++i)
{
IndexToPowers(i, powersi, 1, 1);

for (size_t k = 0; k < c_dimension - 1; ++k)
summedPowers[k] = powersi[k] + powersj[k];

size_t summedPowersIndex = PowersToIndex(summedPowers, 2, 1);
ATA[j][i] = m_SummedPowers[summedPowersIndex];
}
}

// solve: ATA * coefficients = m_SummedPowers
// for the coefficients vector, using Gaussian elimination.
coefficients = m_SummedPowersTimesValues;
for (size_t i = 0; i < c_numCoefficients; ++i)
{
for (size_t j = 0; j < c_numCoefficients; ++j)
{
if (ATA[i][i] == 0.0f)
return false;

float c = ((i == j) - ATA[j][i]) / ATA[i][i];
coefficients[j] += c*coefficients[i];
for (size_t k = 0; k < c_numCoefficients; ++k)
ATA[j][k] += c*ATA[i][k];
}
}

// Note: this is the old, "bad" way to solve the equation using matrix inversion.
// It's a worse choice for larger matrices, and surfaces and volumes use larger matrices than curves in general.
/*
// Inverse the ATA matrix
TMatrix<c_numCoefficients, c_numCoefficients> ATAInverse;
if (!InvertMatrix(ATA, ATAInverse))
return false;

// calculate the coefficients
for (size_t i = 0; i < c_numCoefficients; ++i)
coefficients[i] = DotProduct(ATAInverse[i], m_SummedPowersTimesValues);
*/

return true;
}

private:
//Storage Requirements:
// Summed Powers = Product of degree*2+1 for each axis.
// Summed Powers Times Values = Product of degree+1 for each axis.
TVector<c_numSummedPowers>		m_SummedPowers;
TVector<c_numCoefficients>		m_SummedPowersTimesValues;
};

//====================================================================
char AxisIndexToLetter (size_t axisIndex)
{
// x,y,z,w,v,u,t,....
if (axisIndex < 3)
return 'x' + char(axisIndex);
else
return 'x' + 2 - char(axisIndex);
}

//====================================================================
template <class T, size_t M, size_t N>
float EvaluateFunction (const T& fitter, const TVector<M>& dataPoint, const TVector<N>& coefficients)
{
float ret = 0.0f;
for (size_t i = 0; i < coefficients.size(); ++i)
{
float term = coefficients[i];

// then the powers of the input variables
std::array<size_t, T::c_dimension - 1> powers;
fitter.IndexToPowers(i, powers, 1, 1);
for (size_t j = 0; j < powers.size(); ++j)
term *= (float)std::pow(dataPoint[j], powers[j]);

// add this term to our return value
ret += term;
}
return ret;
}

//====================================================================
template <size_t... DEGREES>
void DoTest (const std::initializer_list<TVector<sizeof...(DEGREES)+1>>& data)
{
// say what we are are going to do
printf("Fitting a function of degree (");
for (size_t i = 0; i < COnlineLeastSquaresFitter<DEGREES...>::c_dimension - 1; ++i)
{
if (i > 0)
printf(",");
printf("%zi", COnlineLeastSquaresFitter<DEGREES...>::Degree(i));
}
printf(") to %zi data points: n", data.size());

// show input data points
for (const COnlineLeastSquaresFitter<DEGREES...>::TDataPoint& dataPoint : data)
{
printf("  (");
for (size_t i = 0; i < dataPoint.size(); ++i)
{
if (i > 0)
printf(", ");
printf("%0.2f", dataPoint[i]);
}
printf(")n");
}

// fit data
COnlineLeastSquaresFitter<DEGREES...> fitter;
for (const COnlineLeastSquaresFitter<DEGREES...>::TDataPoint& dataPoint : data)

// calculate coefficients if we can
COnlineLeastSquaresFitter<DEGREES...>::TCoefficients coefficients;
bool success = fitter.CalculateCoefficients(coefficients);
if (!success)
{
printf("Could not calculate coefficients!nn");
return;
}

// print the polynomial
bool firstTerm = true;
printf("%c = ", AxisIndexToLetter(sizeof...(DEGREES)));
bool showedATerm = false;
for (int i = (int)coefficients.size() - 1; i >= 0; --i)
{
// don't show zero terms
if (FILTER_ZERO_COEFFICIENTS && std::abs(coefficients[i]) < 0.00001f)
continue;

showedATerm = true;

// show an add or subtract between terms
float coefficient = coefficients[i];
if (firstTerm)
firstTerm = false;
else if (coefficient >= 0.0f)
printf(" + ");
else
{
coefficient *= -1.0f;
printf(" - ");
}

printf("%0.2f", coefficient);

std::array<size_t, COnlineLeastSquaresFitter<DEGREES...>::c_dimension - 1> powers;
fitter.IndexToPowers(i, powers, 1, 1);

for (size_t j = 0; j < powers.size(); ++j)
{
if (powers[j] > 0)
printf("%c", AxisIndexToLetter(j));
if (powers[j] > 1)
printf("^%zi", powers[j]);
}
}
if (!showedATerm)
printf("0");
printf("n");

// Calculate and show R^2 value.
float rSquared = 1.0f;
if (data.size() > 0)
{
float mean = 0.0f;
for (const COnlineLeastSquaresFitter<DEGREES...>::TDataPoint& dataPoint : data)
mean += dataPoint[sizeof...(DEGREES)];
mean /= data.size();
float SSTot = 0.0f;
float SSRes = 0.0f;
for (const COnlineLeastSquaresFitter<DEGREES...>::TDataPoint& dataPoint : data)
{
float value = dataPoint[sizeof...(DEGREES)] - mean;
SSTot += value*value;

value = dataPoint[sizeof...(DEGREES)] - EvaluateFunction(fitter, dataPoint, coefficients);
SSRes += value*value;
}
if (SSTot != 0.0f)
rSquared = 1.0f - SSRes / SSTot;
}
printf("R^2 = %0.4fnn", rSquared);
}

//====================================================================
int main (int argc, char **argv)
{
// bilinear - 4 data points
DoTest<1, 1>(
{
TVector<3>{ 0.0f, 0.0f, 5.0f },
TVector<3>{ 0.0f, 1.0f, 3.0f },
TVector<3>{ 1.0f, 0.0f, 8.0f },
TVector<3>{ 1.0f, 1.0f, 2.0f },
}
);

// biquadratic - 9 data points
DoTest<2, 2>(
{
TVector<3>{ 0.0f, 0.0f, 8.0f },
TVector<3>{ 0.0f, 1.0f, 4.0f },
TVector<3>{ 0.0f, 2.0f, 6.0f },
TVector<3>{ 1.0f, 0.0f, 5.0f },
TVector<3>{ 1.0f, 1.0f, 2.0f },
TVector<3>{ 1.0f, 2.0f, 1.0f },
TVector<3>{ 2.0f, 0.0f, 7.0f },
TVector<3>{ 2.0f, 1.0f, 9.0f },
TVector<3>{ 2.0f, 2.5f, 12.0f },
}
);

// trilinear - 8 data points
DoTest<1, 1, 1>(
{
TVector<4>{ 0.0f, 0.0f, 0.0f, 8.0f },
TVector<4>{ 0.0f, 0.0f, 1.0f, 4.0f },
TVector<4>{ 0.0f, 1.0f, 0.0f, 6.0f },
TVector<4>{ 0.0f, 1.0f, 1.0f, 5.0f },
TVector<4>{ 1.0f, 0.0f, 0.0f, 2.0f },
TVector<4>{ 1.0f, 0.0f, 1.0f, 1.0f },
TVector<4>{ 1.0f, 1.0f, 0.0f, 7.0f },
TVector<4>{ 1.0f, 1.0f, 1.0f, 9.0f },
}
);

// trilinear - 9 data points
DoTest<1, 1, 1>(
{
TVector<4>{ 0.0f, 0.0f, 0.0f, 8.0f },
TVector<4>{ 0.0f, 0.0f, 1.0f, 4.0f },
TVector<4>{ 0.0f, 1.0f, 0.0f, 6.0f },
TVector<4>{ 0.0f, 1.0f, 1.0f, 5.0f },
TVector<4>{ 1.0f, 0.0f, 0.0f, 2.0f },
TVector<4>{ 1.0f, 0.0f, 1.0f, 1.0f },
TVector<4>{ 1.0f, 1.0f, 0.0f, 7.0f },
TVector<4>{ 1.0f, 1.0f, 1.0f, 9.0f },
TVector<4>{ 0.5f, 0.5f, 0.5f, 12.0f },
}
);

// Linear - 2 data points
DoTest<1>(
{
TVector<2>{ 1.0f, 2.0f },
TVector<2>{ 2.0f, 4.0f },
}
);

// Quadratic - 4 data points
DoTest<2>(
{
TVector<2>{ 1.0f, 5.0f },
TVector<2>{ 2.0f, 16.0f },
TVector<2>{ 3.0f, 31.0f },
TVector<2>{ 4.0f, 16.0f },
}
);

// Cubic - 4 data points
DoTest<3>(
{
TVector<2>{ 1.0f, 5.0f },
TVector<2>{ 2.0f, 16.0f },
TVector<2>{ 3.0f, 31.0f },
TVector<2>{ 4.0f, 16.0f },
}
);

system("pause");
return 0;
}


# Closing

The next logical step here for me would be to figure out how to break the equation for a surface or hypervolume up into multiple equations, like you’d have with a tensor product surface/hypervolume equation. It would also be interesting to see how to convert from these multidimensional polynomials to multidimensional Bernstein basis functions, which are otherwise known as Bezier rectangles (and Bezier hypercubes i guess).

The last post inverted the ATA matrix and multiplied by ATY to get the coefficients. Thanks to some feedback on reddit, I found out that is NOT how you want to solve this sort of equation. I ended up going with Gaussian elimination for this post which is more numerically robust while also being less computation to calculate. There are other options out there too that may be even better choices. I’ve found out that in general, if you are inverting a matrix in code, or even just using an inverted matrix that has already been given to you, you are probably doing it wrong. You can read more about that here: John D. Cook: Don’t invert that matrix.

I didn’t go over what to do if you don’t have enough data points because if you find yourself in that situation, you can either decrease the degree of one of the axes, or you could remove and axis completely if you wanted to. It’s situational and ambiguous what parameter to decrease when you don’t have enough data points to fit a specific curve or hypervolume, but it’s still possible to decay the fit to a lower degree or dimension if you hit this situation, because you will already have all the values you need in the ATA matrix values and the ATV vector. I leave that to you to decide how to handle it in your own usage cases. Something interesting to note is that ATA[0][0] is STILL the count of how many data points you have, so you can use this value to know how much you need to decay your fit to be able to handle the data set.

In the WebGL2 demo I mention, I use a finite difference method to calculate the normals of the surface, however since the surface is described by a polynomial, it’d be trivial to calculate the coefficients for the equations that described the partial derivatives of the surface for each axis and use those instead.

I also wanted to mention that in the case of surfaces and hypervolumes it’s still possible to get an imperfect fit to your data set, even though you may give the exact minimum required number of control points. The reason for this is that not all axes are necesarily created equal. If you have a surface of degree (1,2) it’s linear on the x axis, but quadratic on the y axis, and requires a minimum of 6 data points to be able to fit a data set. As you can imagine, it’s possible to give data points such that the data points ARE NOT LINEAR on the x axis. When this happens, the surface will not be a perfect fit.

Lastly, you may be wondering how to fit data sets where there is more than one output value, like an equation of the form $(z,w)=f(x,y)$.

I’m not aware of any ways to least square fit that as a whole, but apparently a common practice is to fit one equation to z and another to w and treat them independently. There is a math stack exchange question about that here: Math Stack Exchange: Least square fitting multiple values

Here is the webgl demo that goes with this post again:
Least Squares Surface Fitting

Thanks for reading, and please speak up if you have anything to add or correct, or a comment to make!

# Incremental Least Squares Curve Fitting

This Post In Short:

• Fit a curve of degree N to a data set, getting data points 1 at a time.
• Storage Required: 3*N+2 values.
• Update Complexity: roughly 3*N+2 additions and multiplies.
• Finalize Complexity: Solving Ax=b where A is an (N+1)x(N+1) matrix and b is a known vector. (Sample code inverts A matrix and multiplies by b, Gaussian elimination is better though).
• Simple C++ code and HTML5 demo at bottom!

I was recently reading a post from a buddy on OIT or “Order Independent Transparency” which is an open problem in graphics:
Fourier series based OIT and why it won’t work

In the article he talks about trying to approximate a function per pixel and shows the details of some methods he tried. One of the difficulties with the problem is that during a render you can get any number of triangles affecting a specific pixel, but you need a fixed and bounded size amount of storage per pixel for those variable numbers of data points.

That made me wonder: Is there an algorithm that can approximate a data set with a function, getting only one data point at a time, and end up with a decent approximation?

It turns out that there is one, at least one that I am happy with: Incremental Least Squares Curve Fitting.

While this perhaps doesn’t address all the problems that need addressing for OIT specifically, I think this is a great technique for programming in general, and I’m betting it still has it’s uses in graphics, for other times when you want to approximate a data set per pixel.

We’ll work through a math oriented way to do it, and then we’ll convert it into an equivalent and simpler programmer friendly version.

At the bottom of the post is some simple C++ that implements everything we talk about and the image below is a screenshot of an an interactive HTML5 demo I made: Least Squares Curve Fitting

# Mathy Version

Math Stack Exchange: Creating a function incrementally

I have to admit, I’m not so great with matrices outside of the typical graphics/gamedev usage cases of transormation and related, so it took me a few days to work through it and understand it all. If reading that answer made your eyes go blurry, give my explanation a shot. I’m hoping I gave step by step details enough such that you too can understand what the heck he was talking about. If not, let me know where you got lost and I can explain better and update the post.

The first thing we need to do is figure out what degree of a function we want to approximate our data with. For our example we’ll pick a degree 2 function, also known as a quadratic function. That means that when we are done we will get out a function of the form below:

$y=ax^2+bx+c$

We will give data points to the equation and it will calculate the values of a,b and c that approximate our function by minimizing the sum of the squared distance from each point to the curve.

We’ll deal with regular least squared fitting before moving onto incremental, so here’s the data set we’ll be fitting our quadratic curve to:

$(1,5),(2,16),(3,31),(4,50)$

The x values in my data set start at 1 and count up by 1, but that is not a requirement. You can use whatever x and y values you want to fit a curve to.

Next we need to calculate the matrix $A$, where $A_{jk} = x_j^k$ and the matrix has NumDataPoints rows and Degree+1 columns. It looks like the below for a quadratic curve fitting 4 data points:

$A = \begin{bmatrix} x_0^0 & x_0^1 & x_0^2 \\ x_1^0 & x_1^1 & x_1^2 \\ x_2^0 & x_2^1 & x_2^2 \\ x_3^0 & x_3^1 & x_3^2 \\ \end{bmatrix}$

When we plug in our specific x values we get this:

$A = \begin{bmatrix} 1^0 & 1^1 & 1^2 \\ 2^0 & 2^1 & 2^2 \\ 3^0 & 3^1 & 3^2 \\ 4^0 & 4^1 & 4^2 \\ \end{bmatrix}$

Calculating it out we get this:

$A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \\ 1 & 4 & 16 \\ \end{bmatrix}$

Next we need to calculate the matrix $A^TA$, which we do below by multiplying the transpose of A by A:

$A^TA = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 1 & 4 & 9 & 16 \\ \end{bmatrix} * \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \\ 1 & 4 & 16 \\ \end{bmatrix} = \begin{bmatrix} 4 & 10 & 30 \\ 10 & 30 & 100 \\ 30 & 100 & 354 \\ \end{bmatrix}$

Next we need to find the inverse of that matrix to get $(A^TA)^{-1}$. The inverse is:

$(A^TA)^{-1} = \begin{bmatrix} 31/4 & -27/4 & 5/4 \\ -27/4 & 129/20 & -5/4 \\ 5/4 & -5/4 & 1/4 \\ \end{bmatrix}$

The next thing we need to calculate is $A^TY$, which is the transpose of A multiplied by all of the Y values of our data:

$A^TY = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 1 & 4 & 9 & 16 \\ \end{bmatrix} * \begin{bmatrix} 5 \\ 16 \\ 31 \\ 50 \\ \end{bmatrix} = \begin{bmatrix} 102 & 330 & 1148 \\ \end{bmatrix}$

And finally, to calculate the coefficients of our quadratic function, we need to calculate $(A^TA)^{-1}*A^TY$:

$(A^TA)^{-1}*A^TY = \begin{bmatrix} 31/4 & -27/4 & 5/4 \\ -27/4 & 129/20 & -5/4 \\ 5/4 & -5/4 & 1/4 \\ \end{bmatrix} * \begin{bmatrix} 102 \\ 330 \\ 1148 \\ \end{bmatrix} = \begin{bmatrix} -2 & 5 & 2 \\ \end{bmatrix}$

Those coefficients are listed in power order of x, so the first value -2 is the coefficient for x^0, 5 is the coefficient for x^1 and 2 is the coefficient for x^2. That gives us the equation:

$y=2x^2+5x-2$

If you plug in the x values from our data set, you’ll find that this curve perfectly fits all 4 data points.

It won’t always be (and usually won’t be) that a resulting curve matches the input set for all values. It just so happened that this time it does. The only guarantee you’ll get when fitting a curve to the data points is that the squared distance of the point to the curve (distance on the Y axis only, so vertical distance), is minimized for all data points.

Now that we’ve worked through the math, let’s make some observations and make it more programmer friendly.

# Making it Programmer Friendly

Let’s look at the $A^TA$ matrix again:

$\begin{bmatrix} 4 & 10 & 30 \\ 10 & 30 & 100 \\ 30 & 100 & 354 \\ \end{bmatrix}$

One thing you probably noticed right away is that it’s symmetric across the diagonal. Another thing you may have noticed is that there are only 5 unique values in that matrix.

As it turns out, those 5 values are just the sum of the x values, when those x values are raised to increasing powers.

• If you take all x values of our data set, raise them to the 0th power and sum the results, you get 4.
• If you take all x values of our data set, raise them to the 1st power and sum the results, you get 10.
• If you take all x values of our data set, raise them to the 2nd power and sum the results, you get 30.
• If you take all x values of our data set, raise them to the 3rd power and sum the results, you get 100.
• If you take all x values of our data set, raise them to the 4th power and sum the results, you get 354.

Further more, the power of the x values in each part of the matrix is the zero based x axis index plus the zero based y axis index. Check out what i mean below, which shows which power the x values are taken to before being summed for each location in the matrix:

$\begin{bmatrix} 0 & 1 & 2 \\ 1 & 2 & 3 \\ 2 & 3 & 4 \\ \end{bmatrix}$

That is interesting for two reasons…

1. This tells us that we only really need to store the 5 unique values, and that we can reconstruct the full matrix later when it’s time to calculate the coefficients.
2. It also tells us that if we’ve fit a curve to some data points, but then want to add a new data point, that we can just raise the x value of our new data point to the different powers and add it into these 5 values we already have stored. In other words, the $A^TA$ matrix can be incrementally adjusted as new data comes in.

This generalizes beyond quadratic functions too luckily. If you are fitting your data points with a degree N curve, the $A^TA$ matrix will have N+1 rows, and N+1 columns, but will only have (N+1)*2-1 unique values stored in it. Those values will be the sum of the x values taken from the 0th power up to the (N+1)*2-2th power.

As a concrete example, a cubic fit will have an $A^TA$ array that is 4×4, which will only have 7 unique values stored in it. Those values will be the x values raised to the 0th power and summed, all the way up to the x values raised to the 6th power and summed.

So, the $A^TA$ matrix has a fixed storage amount of (degree+1)*2 – 1 values, and it can be incrementally updated.

That is great, but there is another value we need to look at too, which is the $A^TY$ vector. Let’s see that again:

$\begin{bmatrix} 102 & 330 & 1148 \\ \end{bmatrix}$

There are some patterns to this vector too luckily. You may have noticed that the first entry is the sum of the Y values from our data set. It’s actually the sum of the y values multiplied by the x values raised to the 0th power.

The next number is the sum of the y values multiplied by the x values raised to the 1st power, and so on.

To generalize it, each entry in that vector is the sum of taking the x from each data point, raising it to the power that is the index in the vector, and multiplying it by the y value.

• Taking each data point’s x value, raising it to the 0th power, multiplying by the y value, and summing the results gives you 102.
• Taking each data point’s x value, raising it to the 1st power, multiplying by the y value, and summing the results gives you 330.
• Taking each data point’s x value, raising it to the 2nd power, multiplying by the y value, and summing the results gives you 1148.

So, this vector is incrementally updatable too. When you get a new data point, for each entry in the vector, you take the x value to the specific power, multiply by y, and add that result to the entry in the vector.

This generalizes for other curve types as well. If you are fitting your data points with a degree N curve, the $A^TY$ vector will have N+1 entries, corresponding to the powers: 0,1,…N.

As a concrete example, a cubic fit will have an $A^TY$ vector of size 4, corresponding to the powers: 0,1,2,3.

Combining the storage needs of the values needed for the $A^TA$ matrix, as well as the values needed for the $A^TY$ vector, the amount of storage we need for a degree N curve fit is 3*N+2 values.

# Algorithm Summarized

Here is a summary of the algorithm:

1. First decide on the degree of the fit you want. Call it N.
2. Ensure you have storage space for 3*N+2 values and initialize them all to zero. These represent the (N+1)*2-1 values needed for the $A^TA$ matrix values, as well as the N+1 values needed for the $A^TY$ vector.
3. For each data point you get, you will need to update both the $A^TA$ matrix values, as well as the $A^TY$ vector valuess. (Note that there are not the same number of values in ATA and ATY!)
• for(i in ATA) ATA[i] += x^i
• for(i in ATY) ATY[i] += x^i*y
4. When it’s time to calculate the coefficients of your polynomial, convert the ATA values back into the $A^TA$ matrix, invert it and multiply that by the $A^TY$ value.

Pretty simple right?

# Not Having Enough Points

When working through the mathier version of this algorithm, you may have noticed that if we were trying to fit using a degree N curve, that we needed N+1 data points at minimum for the math to even be able to happen.

So, you might ask, what happens in the real world, such as in a pixel shader, where we decide to do a cubic fit, but end up only getting 1 data point, instead of the 4 minimum that we need?

Well, first off, if you use the programmer friendly method of incrementally updating ATA and ATY, you’ll end up with an uninvertible matrix (0 determinant), but that doesn’t really help us any besides telling us when we don’t have enough data.

There is something pretty awesome hiding here though. Let’s look at the ATA matrix and ATY values from our quadratic example again.

$A^TA = \begin{bmatrix} 4 & 10 & 30 \\ 10 & 30 & 100 \\ 30 & 100 & 354 \\ \end{bmatrix}$

$A^TY = \begin{bmatrix} 102 & 330 & 1148 \\ \end{bmatrix}$

The above values are for a quadratic fit. What if we wanted a linear fit instead? Well… the upper left 2×2 matrix in ATA is the ATA matrix for the linear fit! Also, the first two values in the ATY vector is the ATY vector if we were doing a linear fit.

$A^TA = \begin{bmatrix} 4 & 10 \\ 10 & 30 \\ \end{bmatrix}$

$A^TY = \begin{bmatrix} 102 & 330 \\ \end{bmatrix}$

You can verify that the linear fit above is correct if you want, but let’s take it down another degree, down to approximating the fit with a point. They become scalars instead of matrices and vectors:

$A^TA = 4 \\ A^TY = 102$

If we take the inverse of ATA and multiply it by ATY, we get:

$1/4 * 102 = 25.5$

if you average the Y values of our input data, you’ll find that it is indeed 25.5, so we have verified that it does give us a degree 0 fit.

This is neat and all, but how can we KNOW if we’ve collected enough data or not? Do we just try to invert our ATA matrix, and if it fails, try one degree lower, repeatedly, until we succeed or fail at a degree 0 approximation? Do we maybe instead store a counter to keep track of how many points we have seen?

Luckily no, and maybe you have already put it together. The first value in the ATA array actually TELLS you how many points you have been given. You can use that to decide what degree you are going to have to actually fit the data set to when it’s time to calculate your coefficients, to avoid the uninvertible matrix and still get your data fit.

# Interesting Tid Bits

Something pretty awesome about this algorithm is that it can work in a multithreaded fashion very easily. One way would be to break apart the work into multiple job threads, have them calculate ATA and ATY independently, and then sum them all together on the main thread. Another way to do it would be to let all threads share the same ATA and ATY storage, but to use an atomic add operation to update them.

Going the atomic add route, I think this could be a relatively GPU friendly algorithm. You could use actual atomic operations in your shader code, or you could use alpha blending to add your result in.

Even though we saw it in the last section, I’ll mention it again. If you do a degree 0 curve fit to data points (aka fitting a point to the data), this algorithm is mathematically equivalent to just taking the average y value. The ATA values will have a single value which is the sum of the x values to the 0th degree, so will be the count of how many x items there are. The ATY values will also have only a single value, which will be the sum of the x^0*y values, so will be the sum of the y values. Taking the inverse of our 1×1 ATA matrix will give us one divided by how many items there are, so when we multiply that by the ATA vector which only has one item, it will be the same as if we divided our Y value sum by how many data points we had. So, in a way, this algorithm seems to be some sort of generalization of averaging, which is weird to me.

Another cool thing: if you have the minimum number of data points for your degree (aka degree+1 data points) or fewer, you can actually use the ATA and ATY values to get back your ORIGINAL data points – both the X and the Y values! I’ll leave it as an exercise for you, but if you look at it, you will always have more equations than you do unknowns.

If reconstructing the original data points is important to you, you could also have this algorithm operate in two modes.

Basically, always use the ATA[0] storage space to count the number of data points you’ve been given, since that is it’s entire purpose. You can then use the rest of the storage space as RAW data storage for your 2d input values. As soon as adding another value would cause you to overflow your storage, you could process your data points into the correct format of storing just ATA and ATY values, so that from then on, it was an approximation, instead of explicit point storage.

When decoding those values, you would use the ATA[0] storage space to know whether the rest of the storage contained ATA and ATY values, or if they contained data points. If they contained data points, you would also know how many there were, and where they were in the storage space, using the same logic to read data points out as you used to put them back in – basically like saying that the first data point goes immediately after ATA[0], the second data point after that, etc.

The last neat thing, let’s say that you are in a pixel shader as an exmaple, and that you wanted to approximate 2 different values for each pixel, but let’s say that the X value was always the same for these data sets – maybe you are approximating two different values over depth of the pixel for instance so X of both data points is the depth, but the Y values of the data points are different.

If you find yourself in a situation like this, you don’t actually need to pay the full cost of storage to have a second approximation curve.

Since the ATA values are all based on powers of the x values only, the ATA values would be the same for both of these data sets. You only need to pay the cost of the ATY values for the second curve.

This means that fitting a curve costs an initial 3*degree+2 in storage, but each additional curve only costs degree+1 in storage.

Also, since the ATA storage for a curve of degree N also contains the same values used for a curve of degree N-1, N-2, etc, you don’t have to use the same degree when approximating multiple values using the same storage. Your ATA just has to be large enough to hold the largest degree curve, and then you can have ATY values that are sized to the degree of the curve you want to use to approximate each data set.

This way, if you have limited storage, you could perhaps cubically fit one data set, and then linearly fit another data set where accuracy isn’t as important.

For that example, you would pay 11 values of storage for the cubic fit, and then only 2 more values of storage to have a linear fit of some other data.

# Example Code

There is some example code below that implements the ideas from this post.

The code is meant to be clear and readable firstly, with being a reasonably decent implementation second. If you are using this in a place where you want high precision and/or high speeds, there are likely both macro and micro optimizations and code changes to be made. The biggest of these is probably how the matrix is inverted.

You can read more on the reddit discussion: Reddit: Incremental Least Squares Curve Fitting

Here’s a run of the example code:

Here is the example code:

#include <stdio.h>
#include <array>

//====================================================================
template<size_t N>
using TVector = std::array<float, N>;

template<size_t M, size_t N>
using TMatrix = std::array<TVector<N>, M>;

template<size_t N>
using TSquareMatrix = TMatrix<N,N>;

typedef TVector<2> TDataPoint;

//====================================================================
template <size_t N>
float DotProduct (const TVector<N>& A, const TVector<N>& B)
{
float ret = 0.0f;
for (size_t i = 0; i < N; ++i)
ret += A[i] * B[i];
return ret;
}

//====================================================================
template <size_t M, size_t N>
void TransposeMatrix (const TMatrix<M, N>& in, TMatrix<N, M>& result)
{
for (size_t j = 0; j < M; ++j)
for (size_t k = 0; k < N; ++k)
result[k][j] = in[j][k];
}

//====================================================================
template <size_t M, size_t N>
void MinorMatrix (const TMatrix<M, N>& in, TMatrix<M-1, N-1>& out, size_t excludeI, size_t excludeJ)
{
size_t destI = 0;
for (size_t i = 0; i < N; ++i)
{
if (i != excludeI)
{
size_t destJ = 0;
for (size_t j = 0; j < N; ++j)
{
if (j != excludeJ)
{
out[destI][destJ] = in[i][j];
++destJ;
}
}
++destI;
}
}
}

//====================================================================
template <size_t M, size_t N>
float Determinant (const TMatrix<M,N>& in)
{
float determinant = 0.0f;
TMatrix<M - 1, N - 1> minor;
for (size_t j = 0; j < N; ++j)
{
MinorMatrix(in, minor, 0, j);

float minorDeterminant = Determinant(minor);
if (j % 2 == 1)
minorDeterminant *= -1.0f;

determinant += in[0][j] * minorDeterminant;
}
return determinant;
}

//====================================================================
template <>
float Determinant<1> (const TMatrix<1,1>& in)
{
return in[0][0];
}

//====================================================================
template <size_t N>
bool InvertMatrix (const TSquareMatrix<N>& in, TSquareMatrix<N>& out)
{
// calculate the cofactor matrix and determinant
float determinant = 0.0f;
TSquareMatrix<N> cofactors;
TSquareMatrix<N-1> minor;
for (size_t i = 0; i < N; ++i)
{
for (size_t j = 0; j < N; ++j)
{
MinorMatrix(in, minor, i, j);

cofactors[i][j] = Determinant(minor);
if ((i + j) % 2 == 1)
cofactors[i][j] *= -1.0f;

if (i == 0)
determinant += in[i][j] * cofactors[i][j];
}
}

// matrix cant be inverted if determinant is zero
if (determinant == 0.0f)
return false;

// calculate the adjoint matrix into the out matrix
TransposeMatrix(cofactors, out);

// divide by determinant
float oneOverDeterminant = 1.0f / determinant;
for (size_t i = 0; i < N; ++i)
for (size_t j = 0; j < N; ++j)
out[i][j] *= oneOverDeterminant;
return true;
}

//====================================================================
template <>
bool InvertMatrix<2> (const TSquareMatrix<2>& in, TSquareMatrix<2>& out)
{
float determinant = Determinant(in);
if (determinant == 0.0f)
return false;

float oneOverDeterminant = 1.0f / determinant;
out[0][0] =  in[1][1] * oneOverDeterminant;
out[0][1] = -in[0][1] * oneOverDeterminant;
out[1][0] = -in[1][0] * oneOverDeterminant;
out[1][1] =  in[0][0] * oneOverDeterminant;
return true;
}

//====================================================================
template <size_t DEGREE>  // 1 = linear, 2 = quadratic, etc
class COnlineLeastSquaresFitter
{
public:
COnlineLeastSquaresFitter ()
{
// initialize our sums to zero
std::fill(m_SummedPowersX.begin(), m_SummedPowersX.end(), 0.0f);
std::fill(m_SummedPowersXTimesY.begin(), m_SummedPowersXTimesY.end(), 0.0f);
}

{
// add the summed powers of the x value
float xpow = 1.0f;
for (size_t i = 0; i < m_SummedPowersX.size(); ++i)
{
m_SummedPowersX[i] += xpow;
xpow *= dataPoint[0];
}

// add the summed powers of the x value, multiplied by the y value
xpow = 1.0f;
for (size_t i = 0; i < m_SummedPowersXTimesY.size(); ++i)
{
m_SummedPowersXTimesY[i] += xpow * dataPoint[1];
xpow *= dataPoint[0];
}
}

bool CalculateCoefficients (TVector<DEGREE+1>& coefficients) const
{
// initialize all coefficients to zero
std::fill(coefficients.begin(), coefficients.end(), 0.0f);

// calculate the coefficients
return CalculateCoefficientsInternal<DEGREE>(coefficients);
}

private:

template <size_t EFFECTIVEDEGREE>
bool CalculateCoefficientsInternal (TVector<DEGREE + 1>& coefficients) const
{
// if we don't have enough data points for this degree, try one degree less
if (m_SummedPowersX[0] <= EFFECTIVEDEGREE)
return CalculateCoefficientsInternal<EFFECTIVEDEGREE - 1>(coefficients);

// Make the ATA matrix
TMatrix<EFFECTIVEDEGREE + 1, EFFECTIVEDEGREE + 1> ATA;
for (size_t i = 0; i < EFFECTIVEDEGREE + 1; ++i)
for (size_t j = 0; j < EFFECTIVEDEGREE + 1; ++j)
ATA[i][j] = m_SummedPowersX[i + j];

// calculate inverse of ATA matrix
TMatrix<EFFECTIVEDEGREE + 1, EFFECTIVEDEGREE + 1> ATAInverse;
if (!InvertMatrix(ATA, ATAInverse))
return false;

// calculate the coefficients for this degree. The higher ones are already zeroed out.
TVector<EFFECTIVEDEGREE + 1> summedPowersXTimesY;
std::copy(m_SummedPowersXTimesY.begin(), m_SummedPowersXTimesY.begin() + EFFECTIVEDEGREE + 1, summedPowersXTimesY.begin());
for (size_t i = 0; i < EFFECTIVEDEGREE + 1; ++i)
coefficients[i] = DotProduct(ATAInverse[i], summedPowersXTimesY);
return true;
}

// Base case when no points are given, or if you are fitting a degree 0 curve to the data set.
template <>
bool CalculateCoefficientsInternal<0> (TVector<DEGREE + 1>& coefficients) const
{
if (m_SummedPowersX[0] > 0.0f)
coefficients[0] = m_SummedPowersXTimesY[0] / m_SummedPowersX[0];
return true;
}

// Total storage space (# of floats) needed is 3 * DEGREE + 2
// Where y is number of values that need to be stored and x is the degree of the polynomial
TVector<(DEGREE + 1) * 2 - 1>   m_SummedPowersX;
TVector<DEGREE + 1>             m_SummedPowersXTimesY;
};

//====================================================================
template <size_t DEGREE>
void DoTest(const std::initializer_list<TDataPoint>& data)
{
printf("Fitting a curve of degree %zi to %zi data points:n", DEGREE, data.size());

COnlineLeastSquaresFitter<DEGREE> fitter;

// show data
for (const TDataPoint& dataPoint : data)
printf("  (%0.2f, %0.2f)n", dataPoint[0], dataPoint[1]);

// fit data
for (const TDataPoint& dataPoint : data)

// calculate coefficients if we can
TVector<DEGREE+1> coefficients;
bool success = fitter.CalculateCoefficients(coefficients);
if (!success)
{
printf("ATA Matrix could not be inverted!n");
return;
}

// print the polynomial
bool firstTerm = true;
printf("y = ");
bool showedATerm = false;
for (int i = (int)coefficients.size() - 1; i >= 0; --i)
{
// don't show zero terms
if (std::abs(coefficients[i]) < 0.00001f)
continue;

showedATerm = true;

// show an add or subtract between terms
float coefficient = coefficients[i];
if (firstTerm)
firstTerm = false;
else if (coefficient >= 0.0f)
printf(" + ");
else
{
coefficient *= -1.0f;
printf(" - ");
}

printf("%0.2f", coefficient);

if (i > 0)
printf("x");

if (i > 1)
printf("^%i", i);
}
if (!showedATerm)
printf("0");
printf("nn");
}

//====================================================================
int main (int argc, char **argv)
{
// Point - 1 data points
DoTest<0>(
{
TDataPoint{ 1.0f, 2.0f },
}
);

// Point - 2 data points
DoTest<0>(
{
TDataPoint{ 1.0f, 2.0f },
TDataPoint{ 2.0f, 4.0f },
}
);

// Linear - 2 data points
DoTest<1>(
{
TDataPoint{ 1.0f, 2.0f },
TDataPoint{ 2.0f, 4.0f },
}
);

// Linear - 3 colinear data points
DoTest<1>(
{
TDataPoint{ 1.0f, 2.0f },
TDataPoint{ 2.0f, 4.0f },
TDataPoint{ 3.0f, 6.0f },
}
);

// Linear - 3 non colinear data points
DoTest<1>(
{
TDataPoint{ 1.0f, 2.0f },
TDataPoint{ 2.0f, 4.0f },
TDataPoint{ 3.0f, 5.0f },
}
);

// Quadratic - 3 colinear data points
DoTest<2>(
{
TDataPoint{ 1.0f, 2.0f },
TDataPoint{ 2.0f, 4.0f },
TDataPoint{ 3.0f, 6.0f },
}
);

// Quadratic - 3 data points
DoTest<2>(
{
TDataPoint{ 1.0f, 5.0f },
TDataPoint{ 2.0f, 16.0f },
TDataPoint{ 3.0f, 31.0f },
}
);

// Cubic - 4 data points
DoTest<3>(
{
TDataPoint{ 1.0f, 5.0f },
TDataPoint{ 2.0f, 16.0f },
TDataPoint{ 3.0f, 31.0f },
TDataPoint{ 4.0f, 16.0f },
}
);

// Cubic - 2 data points
DoTest<3>(
{
TDataPoint{ 1.0f, 7.0f },
TDataPoint{ 3.0f, 17.0f },
}
);

// Cubic - 1 data point
DoTest<3>(
{
TDataPoint{ 1.0f, 7.0f },
}
);

// Cubic - 0 data points
DoTest<3>(
{
}
);

system("pause");
return 0;
}


# Feedback

There’s some interesting feedback on twitter.

Here’s an interactive demo to let you get a feel for how least squares curve fitting behaves:
Least Squares Curve Fitting

A good online polynomial curve fitting calculator

By the way, the term for an algorithm which works incrementally by taking only some of the data at a time is called an “online algorithm”. If you are ever in search of an online algorithm to do X (whatever X may be), using this term can be very helpful when searching for such an algorithm, or when asking people if such an algorithm exists (like on stack exchange). Unfortunately, online is a bit overloaded in the modern world, so it can also give false hits (;
Wikipedia: Online algorithm

# Evaluating Points on Analytical Surfaces and in Analytical Volumes Using the GPU Texture Sampler

This is an extension of a paper I wrote which shows how to use the linear texture sampling capabilities of the GPU to calculate points on Bezier curves. You store the control points in the texture, then sample along the texture’s diagonal to get points on the curve:
GPU Texture Sampler Bezier Curve Evaluation

This extension shows how to use the technique to evaluate points on surfaces and inside of volumes, where those surfaces and volumes are defined either by Bezier curves or polynomials (Tensor products of polynomials to be more specific).

As an example of what this post will allow you to do:

• By taking a single sample of a 3d RGBA volume texture, you’ll be able to get a bicubic interpolated value (a bicubic surface).
• Alternately, taking a single sample of a 3d RGBA volume texture will allow you to get a linear interpolation between two biquadratic surfaces (a linear/biquadratic volume).
• This post also covers how to extend this to higher degree surfaces and volumes.

Here are two images generated by the WebGL2 demos I made for this post which utilize this technique for rendering surfaces, fog volumes, and solid volumes. (link to the demos at bottom of post!)

All textures are size 2 on each axis which makes it a cache friendly technique (you can grow the texture sizes for piecewise curves/surfaces/volumes though). It leverages the hardware interpolation which makes it a relatively computationally inexpensive technique, and it supports all polynomials within the limitations of floating point math, so is also very flexible and expressive. You could even extend this to rational polynomial surfaces and volumes which among other things would allow perfect representations of conic sections.

The animated Bezier curve images in this post came from wikipedia. Go have a look and drop them a few bucks if you find wikipedia useful!
Wikipedia: Bézier curve

# Curves

If you’ve read my curve paper and understand the basics you can skip this section and go onto the section “Before Going Into Surfaces”.

Let’s talk about how to store curves of various degrees in textures and evaluate points on them using the GPU Texture sampler. We’ll need this info when we are working with surfaces and volumes because a higher degree curve is dual to a section of lower degree surface or an even lower degree volume.

The three ways we’ll be talking about controlling the order of curves are:

1. Texture Dimensionality – 1d texture vs 2d texture vs 3d texture vs 4d texture.
2. Number of Color Channels – How many color channels are used? R? RG? RGB? RGBA?
3. Multiple Texture Samples – Doing multiple texture reads.

## Texture Dimensionality

By texture dimensionality I mean how many dimensions the texture has. In all cases, the size of the texture is going to be 2 on each axis.

Starting with a 1d texture, we have a single texture coordinate (u) to sample along. As we change the u value from 0 to 1, we are just linearly interpolating between the two values. A 1d texture that has 2 pixels in it can store a degree 1 curve, also known as a linear Bezier curve. With linear texture sampling, the GPU hardware will do this linear interpolation for you.

The equation for linear interpolation between two values A and B which are at t=0 and t=1 respectively is:
$A*(1-t) + B*t$

Here’s the 1d texture:

Here’s a linear curve:

Going to a 2d texture it gets more interesting. We now have two texture coordinates to sample along (u,v). Using linear sampling, the hardware will do bilinear interpolation (linear interpolation across each axis) to get the value at a specific (u,v) texture coordinate.

Here is the equation for bilinear interpolation between 4 values A,B,C,D which are at texture coordinates (0,0), (1,0), (0,1), (1,1) respectively, being sampled at (u,v):

$(A*(1-u) + B*u)*(1-v) + (C*(1-u) + D*u) * v$

That equation interpolates from A to B by u (x axis), and from C to D by u (x axis), and then interpolates from the first result to the second by v (y axis). Note that it doesn’t actually matter which axis is interpolated by first. An equivelant equation would be one that interpolates from A to B by v (y axis) and from B to C by v (y axis) and then between those results by u (x axis).

With that equation, something interesting starts to happen if you use the same value (t) for u and v, expand and simplify, and end up at this equation:

$A*(1-t)^2 + (B+C)*(1-t)t + Dt^2$

That equation is very close to the quadratic Bezier formula, which is below:

$A*(1-t)^2 + B*2(1-t)t + Ct^2$

To get to that equation, we just make B and C the same value (B), and rename D to C since that letter is unused. This tells us how we need to set up our 2d texture such that when we sample along the diagonal, we get the correct point on our quadratic Bezier curve:

Here’s a quadratic Bezier curve in action. You can see how it is a linear interpolation between two linear interpolations, just like taking a bilinearly interpolated sample on our texture is.

Taking this to a 3d texture, we now have three texture coordinates to sample along (u,v,w). Again, with linear sampling turned on, the hardware will do trilinear interpolation to get the value at a specific (u,v,w) texture coordinate.

If we follow the same process as the 2d texture, we will wind up with the equation for a cubic Bezier curve:

$A*(1-t)^3 + B*3(1-t)^2t + C*3(1-t)t^2 + Dt^3$

Here’s how the texture is laid out:

Here’s a cubic Bezier curve in action, where you can see 3 levels of linear interpolations, just like how trilinear interpolation works:

While I have never used a 4d texture it appears that directx supports them and there looks to be an OpenGL extension to support them as well.

If we took this to a 4d texture, we would end up with the equation for a quartic curve. If you have trouble visualizing what a 4d texture even looks like, you aren’t alone. You have four texture coordinates to sample along (u,v,w,t). When you sample it, there are two 3d volume textures that are sampled at (u,v,w), resulting in two values as a result. These values are then interpolated by t to give you the final value. A fourth dimensional texture lookup is just an interpolation between 2 three dimensional texture lookups. That is true of all dimensional texture lookups in fact. An N dimensional texture lookup is just the linear interpolation between two N-1 dimensional texture lookups. For example, a three dimensional texture lookup is just an interpolation between 2 two dimensional texture lookups. This “hierchical interpolation” is the link I noticed between texture interpolation and the De Casteljau algorithm, since that is also a hierchical interpolation algorithm, just with fewer values interpolated between.

Here’s how the 4d texture is laid out:

Here’s the quartic Bezier equation, which is what you get the answer to if you sample a 4d texture at (t,t,t,t):

$A*(1-t)^4 + B*4(1-t)^3t + C*6(1-t)^2t^2 + D*4(1-t)t^3 + Et^4$

Here’s a quartic Bezier curve in action, showing 4 levels of linear interpolation, just like how quadrilinear interpolation works with 4d textures:

So, the bottom line of this section is that if we sample along the diagonal of an N dimensional texture which has one color channel, we will get points on a degree N curve.

## Number of Color Channels

Another way we can control the degree of a curve stored in a texture is by the number of color channels that are stored in the texture.

In the section above we showed a 1d texture that stored a linear curve. it had only one color channel:

Let’s add another color channel. A,B will be stored in the red channel, and B,C will be stored in the green channel:

When we read that texture at location (t), we will get the following values:

1. R: The linear interpolation between A and B at time t.
2. G: The linear interpolation between B and C at time t.

Now, if we just lerp between R and G in our shader, for time t, we will get the point at time t, on the cubic Bezier curve defined by control points A,B,C.

Pretty cool right?

What happens if we add another color channel, blue?

Well, when we sample the texture at time t, we get the following values:

1. R: The linear interpolation between A and B at time t.
2. G: The linear interpolation between B and C at time t.
3. B: The linear interpolation between C and D at time t.

We can combine these values using the quadratic Bezier curve formula, as if these were each a control point:

$R*(1-t)^2 + G*2(1-t)t + Bt^2$

The result we get is a point on the CUBIC curve defined by the four control points A,B,C,D.

In the previous section, it took a 3d volume texture to calculate a cubic curve. In this section we were able to do it with a 1d RGB texture, but it came at the cost of of having to do some calculation in the shader code after sampling the texture to combine the color channels and get the final result.

How exactly does adding a color channel affect the degree though? Each color channel added increases the degree by 1.

You can see this is true by seeing in the last section how a 3 dimensional texture can evaluate a cubic, and a 4 dimensional texture can evalaute a quartic, but the 4th dimensional texture was just two 3 dimensional textures. Adding a second color channel just doubles the size of your data (and adding two tripples, and adding three quadruples), so having a 3d volume texture that has two color channels is the same as having a 4d volume texture with a single color channel. In both cases, you are just interpolating between two 3d texture samples.

## Multiple Texture Samples

Multiple texture samples is the last way to control curve degree that we are going to talk about.

Taking extra texture samples is a lot like adding color channels.

If you have a 1d RGB texture, you get a result of 3 lerps – R,G,B – which you can use to calculate a cubic curve point (order 3). If you take a second sample, you get R0,G0,B0,R1,G1,B1 which is a result of 6 lerps, which gives you a point on a sextic curve (order 6).

If you have a 2d RGBA texture, you get the result of 4 quadratic interpolations – R,G,B,A – which gives you an order 5 curve point. Taking another texture read gives you 8 quadratic interpolation results, which you can put together to make an order 9 curve point. Taking a third texture read would get you up to order 13.

Just like adding color channels, taking extra texture samples requires you to combine the multiple results in your shader, which increases computational cost.

Besides that, you are also doing more texture reads, which can be another source of performance loss. The textures are small (up to 2x2x2x2) so are texture cache friendly, but if you have multiple textures, it could start to add up I’m sure.

IMO this option should be avoided in favor of the others, when possible.

# Before Going Into Surfaces

Before we start on surfaces, I want to mention a few things.

Even though we’ve been talking about Bezier curves specifically, a previous post explained how to convert any polynomial from power basis form into Bernstein basis form (aka you can turn any polynomial into a Bezier curve that is exactly equivelant). So, this generalizes to polynomials, and even rational polynomials if you do division in your shader code, but I’ll point you towards that post for more information on that: Evaluating Polynomials with the GPU Texture Sampler.

You can also extend the above for piecewise curves easily enough. You just set up a different curve (or surface or volume, as we describe below) for different ranges of your parameter space values. From time 0 to 1, you may use one texture, and from time 1 to 2, you may use another. Better yet, you would store both curves in a single texture, and just make the texture be a little larger, instead of having two separate textures.

Also, many other types of curves – B-splines, nurbs – can be broken down exactly into piecewise Bezier curves (rational, if the source curve is rational). Check these links for more info:
Algorithms for B-Spline Curves
Wikipedia: De Boor’s Algorithm.

# Surfaces

Finally onto surfaces!

I’m going to show how to extend the curve calculation technique to calculating points on Bezier rectangles. A Bezier rectangle is a rectangular surface which has one or more bezier curves across the X axis and one or more bezier curves across the Y axis. The degree of the curve on each axis doesn’t need to match so it could be quadratic on one axis and cubic on the other as an example.

To actually evaluate a point on the surface at location (u,v), you evaluate a point on each x axis curve for time u, and then you use those resulting values as control points in another curve that you evaluate at time v.

Just like linear interpolation, it doesn’t matter which axis you evaluate first for a Bezier rectangle surface so you could switch the order of the axis evaluation if you want to.

The image above shows a bicubic surface, the blue lines show the x axis cubic bezier curves, while the yellow lines show the y axis cubic bezier curves. Those lines are called “isolines” or “isocurves”. The 16 control points are shown in magenta.

Another name for a Bezier rectangle is a tensor product surface. This is a more generalized term as it isn’t limited to Bezier curves.

Note: there is another type of Bezier surface called a Bezier Triangle but I haven’t worked much with them so can’t say if any of these techniques work with them or not. It would be interesting to explore how these techniques apply to Bezier triangles, if at all.

Hopefully it should come as no surprise that a 2d texture using regular bilinear interpolation is in fact a Bezier rectangle which is linear on the x axis and linear on the y axis. It has a degree of (1,1) and is stored in a 2d texture (2×2 pixels), where the four control points are just stored in the four pixels. You just sample the texture at (u,v) to get that point on the surface. Pretty simple stuff.

Order (1,1) Bezier Rectangle:

Something interesting to note is that while the isolines (edges) of the rectangle are linear, the surface itself is curved. In fact, we know that the diagonal of this surface is in fact a quadratic Bezier curve because we calculate curves by sampling along the diagonal! (if the middle corners are different, it’s the same as if they were both replaced with the average of their values).

There are other ways to store this Degree (1,1) surface in a texture besides how i described. You could also have a 1 dimensional texture with two color channels, where you sample it along the u axis, and then interpolate your R and G values, using the v axis value. This would come at the cost of doing a lerp in the shader code, instead of having the texture sampler hardware do it for you.

Now that the simplest case is out of the way, how about the next simplest? What if we want a surface where we linearly interpolate between two quadratic curves? That is, what if we want to make a degree (2,1) Bezier rectangle?

Order (2,1) Bezier Rectangle:

Well if you think about it geometrically, we can store a quadratic curve in a 2d texture (2×2) with a single color channel. To linearly interpolate between two of those, we need two of those to interpolate between. So, we need a 3d texture, since that is just an interpolation between two 2d textures.

When we sample that texture, we use the coordinates (u,u,v). That will make it quadratic in u, but linear in v.

Stepping up the complexity again, what if we wanted to make a biquadratic surface – aka degree (2,2)?

Order (2,2) Bezier Rectangle:

Well, to make a quadratic curve we need 3 control points, so for a biquadratic surface we need 3 quadratic curves to quadratically interpolate between.

One way to do this would be with a 4d texture, sampling along (u,u,v,v) to make it quadratic in both u and v.

But, because 4d textures are kind of exotic and may not be supported, we can achieve this by instead having a 3d texture with two color channels: R,G.

When we sample that texture, we sample at (u,u,v) to get two values: R,G. Next we linearly interpolate from R to G using v. This makes us quadratic in both u and v.

There are other ways to encode this surface as well, but i’ll leave that to you to think about if you want to (:

Lastly, what if we wanted a bicubic surface? A cubic curve has 4 control points, so we need 4 cubic curves to cubically interpolate between to make our final surface.

Order (3,3) Bezier Rectangle:

Thinking back to the first section, a 3d texture can evaluate a cubic curve. Since we need four cubic curves, let’s just use all four color channels RGBA. We would sample our texture at (u,u,u) to get four cubic curves in RGBA and then would use the cubic Bezier formula to combine those four values using v into our final result.

# Surfaces Generalized

Generalizing surface calculations a bit, there are basically two steps.

First is you need to figure out what your requirements for the x axis is as far as texture storage for the desired degree you want. From there, you figure out what degree you want on your y axis, and that degree is what you multiply the x axis texture storage requirements for.

It can be a little bit like tetris trying to figure out how to fit various degree surfaces into various texture sizes and layouts, but it gets easier with a little practice.

It’s also important to remember that the x axis being the first axis is by convention only. It could easily be the y axis that defines the texture storage requirements, and is multiplied by the degree of the x axis.

# Volumes

Volumes aren’t a whole lot more complex than surfaces, but they are a lot hungrier for texture space and linear interpolations!

Extending the generalization of surfaces, you once again figure out requirements for the x axis, multiply those by the degree of the y axis, and then multiply that result by the degree of the z axis.

The simplest case for volumes is the trilinear case, aka the Degree (1,1,1) Bezier rectangle.

Order (1,1,1) Bezier Cube:

It’s a bit difficult to understand what’s going on in that picture by seeing the data as just fog density, so the demos let you specify a surface threshold such that if the fog is denser than that amount, it shows it as a surface. Here is the same trilinear Bezier volume with a surface threshold.

Order (1,1,1) Bezier Cube:

You just store your 8 values in the 8 corners of the 2x2x2 texture cube, and sample at (u,v,w) to get your trilinear result.

The next simplest case is that you want to quadratically interpolate between two linear surfaces – a Degree (1,1,2) Bezier rectangle.

Order (1,1,2) Bezier Cube:

To do this, you need 3 bilinear surfaces to interpolate between.

One way to do this would be to have a 2d Texture with R,G,B color channels. Sample the texture at (u,v), then quadratically interpolate R,G,B using w.

Another way to do this would be to have a 3d texture with R and G channels. When sampling, you sample the 3d texture at (u,v,w) to get your R and G results. You then linearly interpolate from R to G by w to get the final value.

Yet another way to do this would be to use a 4d texture if you have support for it, and sample along (u,v,w,w) to get your curve point using only hardware interpolation.

The next simplest volume type is a linear interpolation between two biquadratic surfaces – a Degree (2,2,1) Bezier rectangle.

Order (2,2,1) Bezier Cube:

From the surfaces section, we saw we could store a biquadratic surface in a 3d texture using two color channels R,G. After sampling at (u,u,v) you interpolate from R to G by v.

To make a volume that linearly interpolates between two biquadratic surfaces, we need two biquadratic surfaces, so need to double the storage we had before.

We can use a 3d texture with 4 color channels to make this happen by storing the first biquadratic in R,G and the second in B,A, sampling this texture at (u,u,v). Next, we interpolate between R and G by v, and also interpolate between B and A by v. Lastly, we linearly interpolate between those two results using w.

The next higher surface would be a triquadratic volume, which is degree (2,2,2). Since you can store a biquadratic surface in a 3d 2x2x2 texture with two color channels, and a triquadratic volume needs 3 of those, we need a 3d texture 6 color channels. Since that doesn’t exist, we could do something like store 2 of the quadratic surfaces in a 2x2x2 RGBA texture, and the other quadratic surface in a 2x2x2 RG texture. We would take two texture samples and combine the 6 results into our final value.

Tricubic is actually pretty simple to conceptualize luckily. We know that we can store a bicubic surface in a 3d 2x2x2 RGBA texture. We also know that we would need 4 of those if we want to make a tricubic volume. So, we could do 4 texture reads (one for each of our bicubic surfaces) and then combine those 4 samples across w to get our final volume value.

# Closing

Hopefully you were able to follow along and see that this stuff is potentially pretty powerful.

Some profiling needs to be done to better understand the performance characteristics of using the texture sampler in this way, versus other methods of curve, surface and volume calculation. I have heard that even when your texture samples are in the texture cache, that it can still take like ~100 cycles to get the information back on a texture read. That means that this is probably not going to be as fast as using shader instructions to calculate the points on the curve. However, if you are compute bound and can offload some work to the texture sampler, or if you are already using a texture to store 1d/2d/3d data (or beyond) that you can aproximate with this technique, that you will have a net win.

One thing I really like about this is that it makes use of non programmable hardware to do useful work. It feels like if you were compute bound, that you could offload some work to the texture sampler if you had some polynomials to evaluate (or surfaces/volumes to sample), and get some perf back.

I also think this could possibly be an interesting way to make concise representations (and evaluation) of non polygonal models. I imagine it would have to be piecewise to make things that look like real world objects, but you do have quite a bit of control with Bezier curves, surfaces and volumes, especially if you use rational ones by doing a divide in your shader.

Here’s a few specifica areas I think this technique could help out with:

• Higher order texture interpolation with fewer samples – You’d have to preprocess textures and would spend more memory on them, but it may be worth while in some situations for higher quality results with a single texture read.
• 2D signed distance field rendering – SDF textures are great for making pseudo vector art. They do break down in some cases and at some magnification levels though. It would be interesting to see if using this technique could improve things either with higher order interpolation, or maybe by encoding (signed) distances differently. Possibly also just useful for describing 2d vector art in a polynomial form?
• 3d signed distance field rendering – Ray marching can make use of signed distance fields to render 3d objects. It can also make use of functions which can only give you inside or outside status based on a point. It would be interesting to explore encoding and decoding both of these types of functions within textures using this technique, to sample shapes during ray marching.

If you are interested in the above, or curious to learn more, here are some good links!

If you have any questions, corrections, feedback, ideas for extensions, etc please let me know! You can leave a comment below, or contact me on twitter at @Atrix256.

## Feedback / Ideas

@anders_breakin had some ideas that could possibly pan out:

1. The derivative of a Bezier curve is another Bezier curve (Derivatives of a Bézier Curve). You could encode the derivative curve(s) in a texture and use that to get the normal instead of using the central differences method. That might give higher quality normals, but should also decrease the number of texture reads needed to get the normal.
2. If you want more accuracy, you may subdivide the curve into more numerous piecewise curves. The texture interpolator only has 8 bits of decimal precision (X.8 fixed point) when interpolating, but if you give it less of the curve/surface/volume to interpolate over at a time, it seems like that would result in more effective precision.

@Vector_GL suggested reading the values in the vertex shader and using the results in the pixel shader. I think something like this could work where you read the control points in the VS, and pass them to the PS, which would then be able to ray march the tensor product surface by evaluating it without texture reads. So long as you have fewer VS instances than PS instances (the triangles are not subpixel!) that this could be an interesting thing to try. It doesn’t take advantage of the texture interpolator, but maybe there would be a way to combine the techniques. If not, this still seems very pragmatic.

I was thinking maybe this could be done via “rasterization” by drawing a bunch of unit cubes and having the PS do the ray marching. With some careful planning, you could probably use Z-testing on this too, to quickly cull hidden pixels without having to ray march them.

# Failed Experiment: The GPU Texture Sampler is Turing Complete But That Fact is Pretty Useless

While it’s true that the GPU texture sampler can evaluate digital logic circuits, it turns out there’s a much better and simpler way to evaluate logic with textures. That better and simpler way isn’t even that useful unfortunately!

This post will show the path I took from the initially intriguing possibilities to the more mundane final answer. You may be able to see mistakes in my reasoning along the way, or be able to get to the punch line sooner (:

This was meant to be an extension to a paper I wrote talking about how you can evaluate Bezier curves by storing only the control points in a texture and then sampling along the texture diagonal:
GPU Texture Sampler Bezier Curve Evaluation

The ideas from this post started with a tweet from @marcosalvi:

Because the last post showed how to evaluate arbitrary polynomials using the texture sampler, and digital circuits can be described as as polynomials in Algebraic Normal Form (ANF), that means we can use the texture sampler to evaluate digital logic circuits. Let’s check it out!

First up, we need to be able to convert logic into ANF. Oddly enough, I already have a post about how to do that, with working C++ source code, so go check it out: Turning a Truth Table Into A digital Circuit (ANF).

As an example, let’s work with a circuit that takes 3 input bits, and adds them together to make a 2 bit result. We’ll need one ANF expression per output bit. $O_0$ will be the 1’s place output bit (least significant bit), and $O_1$ will be the 2’s place output bit (most significant bit). Our 3 input bits will be u,v,w.

$O_0 = u \oplus v \oplus w$
$O_1 = uv \oplus vw \oplus uw$

If we want to use our polynomial evaluation technique, we need equations that are univariate (one variable) instead of multivariate (multiple variables). We can try just using a single variable x in place of u,v and w. Remember that in ANF, you work with polynomials mod 2 (aka $\mathbb{Z}_2$), and that XOR ($\oplus$) is addition while AND is multiplication. This gives the formulas below:

$O_0 = x + x + x = 3x$
$O_1 = xx + xx + xx = 3x^2$

The next thing we need to use the technique is to know the Bezier control points that make a Bezier curve that is equivalent to this polynomial. Since we have 3 input variables into our digital circuit, if they were all 3 multiplied together (ANDed together), we would have a cubic equation, so we need to convert those polynomials to cubic Bernstein basis polynomials. We can use the technique from the last post to get the control points of that equivalent curve.

$O_0 \begin{array}{c|c|c|c|c} 0 & 0 / 1 = 0 & 1 & 2 & 3 \\ 3 & 3 / 3 = 1 & 1 & 1 & \\ 0 & 0 / 3 = 0 & 0 & & \\ 0 & 0 / 1 = 0 & & & \\ \end{array}$

$O_1 \begin{array}{c|c|c|c|c} 0 & 0 / 1 = 0 & 0 & 1 & 3 \\ 0 & 0 / 3 = 0 & 1 & 2 & \\ 3 & 3 / 3 = 1 & 1 & & \\ 0 & 0 / 1 = 0 & & & \\ \end{array}$

Now that we have our control points, we can set up our textures to evaluate our two cubic Bezier curves (one for $O_0$, one for $O_1$). We’ll need to use 3d textures and we’ll need to set up the control points like the below, so that when we sample along the diagonal of the texture we get the points on our curves.

The picture below shows where each control point goes, to set up a cubic Bezier texture. The blue dot is the origin (0,0,0) and the red dot is the extreme value of the cube (1,1,1). The grey line represents the diagonal that we sample along.

Coincidentally, our control points for the $O_0$ curve are actually 0,1,2,3 so that cube above is what our 3d texture needs to look like for the $O_0$ curve.

Below is what the $O_1$ curve’s 3d texture looks like. Note that in reality, we could store these both in a single 3d texture, just use say the red color channel for $O_0$ and the green color channel for $O_1$.

Now that we have our textures set up let’s try it out. Let’s make a table where we have our three input bits, and we use those as texture coordinates in our 3d textures (the texture cubes above) to see what values we get. (Quick note – things are slightly simplified here vs reality. The pixel’s actual value is at a half pixel offset from the texture coordinates, so we’d be sampling between (0.5,0.5,0.5) and (1.5,1.5,1.5) instead of from (0,0,0) to (1,1,1), but we can ignore that detail for now to make this stuff clearer.)

$\begin{array}{|c|c|c|c|c|} \hline u & v & w & O_1 & O_0 \\ \hline 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 1 & 2 \\ 1 & 0 & 0 & 0 & 1 \\ 1 & 0 & 1 & 1 & 2 \\ 1 & 1 & 0 & 1 & 2 \\ 1 & 1 & 1 & 3 & 3 \\ \hline \end{array}$

Now, let’s modulus the result by 2 since ANF expects to work mod 2 ($\mathbb{Z}_2$ to be more precise), and put the decimal value of the result next to it.

$\begin{array}{|c|c|c|c|c|c|} \hline u & v & w & O_1 \% 2 & O_0 \% 2 & Result \\ \hline 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 & 1 & 1 \\ 0 & 1 & 1 & 1 & 0 & 2 \\ 1 & 0 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 1 & 0 & 2 \\ 1 & 1 & 0 & 1 & 0 & 2 \\ 1 & 1 & 1 & 1 & 1 & 3 \\ \hline \end{array}$

It worked! The result value is the count of the input bits set to 1.

Unfortunately we have a problem. When we converted the multivariate equation into a univariate equation, we just replaced u,v,w with x. This is only valid if the function is symmetric – if u,v,w can be interchanged with each other and not affect the result of the function. This bit adding digital circuit we made happened to have that property, but most digital circuits do not have that property – most of the time, not all input bits are treated equal. If we made a circuit that added two 2-bit numbers and have a 3-bit result for instance, the high bits of the input numbers have a very different meaning than the low bits and this technique falls apart. (Quick note – we are actually doing the reverse of the polynomial blossoming thing i mentioned in the last post. Blossoming is the act of taking a univariate function and breaking it into a multivariate function that is linear in each variable. The term is called symmetric multiaffine equation if you want to find out more about that.)

This turns out not to be a deal breaker though because it turns out we didn’t have to do a lot of the work that we did to get these volume textures. It turns out we don’t need to calculate the Bezier curve control points, and we don’t even need to make an ANF expression of the digital circuit we want to evaluate.

Let’s recap what we are trying to do. We have 3 input values which are either 0 or 1, we have a 3d texture which is 2x2x2, and we are ultimately using those 3 input values as texture coordinates (u,v,w) to do a lookup into a texture to get a single bit value out.

Here’s a big aha moment. We are just making a binary 3d lookup table, so can take our truth table of whatever it is we are trying to do, and then directly make the final 3d textures described above.

Not only does it work for the example we gave, with a lot less effort and math, it also works for the broken case I mentioned of the function not being symmetric, and not all input bits being equal.

Something else to note is that because we are only sampling at 0 or 1, we don’t need linear texture interpolation at all and can use nearest neighbor (point) sampling on our textures for increased performance. Also because the texture data is just a binary 0 or 1, we could use 1 bit textures.

The second aha moment comes up when you realize that all we are doing is taking some number of binary input bits, using those as texture coordinates, and then looking up a value in a texture.

You can actually use a 1D texture for this!

You take your input bits and form an integer, then look up the value at that pixel location. You build your texture lookup table using this same mapping.

So… it turns out this technique led to a dead end. It was just extra complexity to do nothing special.

Before it all fell apart, I was also thinking this might be a good avenue for doing homomorphic encryption on the GPU, but I don’t believe this aids that at all. (Super Simple Symmetric Leveled Homomorphic Encryption Implementation)

# But Wait – Analog Valued Logic?

One thought I had while all this was unraveling was that maybe this was still useful, because if you put an analog value in (not a 0 or 1, but say 0.3), that maybe this could be used as a sort of “Fuzzy Logic” type logic evaluation.

Unfortunately, it looks like that doesn’t work either!

You can see how it breaks down and some more info here:
Computer Science Stack Exchange: Using analog values with Algebraic Normal Form?

# Oh Well

Sometimes when exploring new frontiers (even if they are just new to us) we hit dead ends, our ideas fail etc. It happens. It’s part of the learning process, and also is useful sometimes to know what doesn’t work and why, instead of just always knowing what DOES work.

Anyways… posts on using the texture sampler for calculating points on data surfaces and data volumes are coming next (:

To give a brief taste of how that is going to play out:

• Doing a single texture read of a 3d RGBA texture can give you a triquadratic interpolated value.
• Alternately, doing a single texture read of a 3d RGBA texture can give you a bicubic interpolated value.

# Evaluating Polynomials with the GPU Texture Sampler

This is an extension of a paper I wrote which shows how to use the linear texture sampling capabilities of the GPU to calculate points on Bezier curves. You store the control points in the texture, then sample along the texture’s diagonal to get points on the curve:
GPU Texture Sampler Bezier Curve Evaluation

I’ve been thinking about the items in the “future work” section and found some interesting things regarding polynomials, logic gates, surfaces and volumes. This is the first post, which deals with evaluating polynomials.

# Evaluating Polynomials

One of the main points of my paper was that N-linear interpolation (linear, bilinear, trilinear, etc) can be used to evaluate the De Casteljau algorithm since both things are just linear interpolations of linear interpolations. (Details on bilinear interpolation here: Bilinear Filtering & Bilinear Interpolation).

This meant that it was also able to calculate Bernstein Polynomials (aka the algebraic form of Bezier curves), since Bernstein polynomials are equivalent to the De Casteljau algorithm.

I started looking around to see what would happen if you messed around with the De Casteljau algorithm a bit, like interpolate at one level by
$t^2$ or $t*0.5+0.5$ or by a constant or by another variable completely. My hope was that I’d be able to make the technique more generic and open it up to a larger family of equations, so people weren’t limited to just Bernstein polynomials.

That opened up a pretty deep rabbit hole on polynomial blossoming and something called Symmetric Multiaffine Functions. There are some great links in the answer here:
Math Stack Exchange: Modifying and Generalizing the De Casteljau Algorithm

In the end, it turned out to be pretty simple though. It turns out that any polynomial can be converted back and forth from “Power Basis” (which looks like $Ax^2+Bx+C$) to “Bernstein Basis” (which looks like $A(1-t)^2+B(1-t)t+Ct^2$) so long as they are the same degree.

This isn’t the result I was expecting but it is a nice result because it’s simple. I think there is more to be explored by sampling off the diagonal, and using different t values at different stages of interpolation, but this result is worth sharing.

By the way, you could also use curve fitting to try and approximate a higher degree function with a lower degree one, but for this post, I’m only going to be talking about exact conversion from Bernstein polynomials to Power polynomials.

Since we can convert power basis polynomials to Bernstein polynomials, and the technique already works for Bernstein polynomials, that means that if we have some random polynomial, say $y=2x^3+4x+2$, that we can make this technique work for that too. The technique got a little closer to arbitrary equation evaluation. Neat!

# Converting Power Basis to Bernstein Basis

I found the details of the conversion process at Polynomial Evaluation and Basis Conversion which was linked to by Math Stack Exchange: Convert polynomial curve to Bezier Curve control points.

This is best explained working through examples, so let’s start by converting a quadratic polynomial from power basis to Bernstein basis.

$y=2x^2+8x+3$

The first thing we do is write the coefficients vertically, starting with the $x^0$ coefficient, then the $x^1$ coefficient and continuing on to the highest value $x^n$:

$\begin{array}{c} 3 \\ 8 \\ 2 \\ \end{array}$

Next, we need to divide by the Binomial Coefficients (aka the row of Pascal’s Triangle which has the same number of items as we have coefficients). In this case we need to divide by: 1,2,1.

$\begin{array}{c|c} 3 & 3 / 1 = 3 \\ 8 & 8 / 2 = 4 \\ 2 & 2 / 1 = 2 \\ \end{array}$

Now we generate a difference table backwards. it’s hard to explain what that is in words, but if you notice, each value is the sum of the value to the left of it, and the one below that.

$\begin{array}{c|c|c|c} 3 & 3 / 1 = 3 & 7 & 13 \\ 8 & 8 / 2 = 4 & 6 & \\ 2 & 2 / 1 = 2 & & \\ \end{array}$

We are all done. The control points for the Bezier curve are on the top row (ignoring the left most column). They are 3,7,13 which makes it so we have the following two equations being equal. The first is in power basis, the second is in Bernstein basis.

$y=2x^2+8x+3$
$y=3(1-x)^2+14(1-x)x+13x^2$

Note: don’t forget that Bezier curves multiply the control points by the appropriate row in Pascal’s triangle. That’s where the 14 comes from in the middle term of the Bernstein polynomial. We are multiplying the control points 3,7,13 by the row in Pascal’s triangle 1,2,1 to get the final coefficients of 3,14,13.

Let’s have Wolfram Alpha help us verify that they are equal.

Yep, they are equal! If you notice the legend of the graph, wolfram actually converted the Bernstein form back to power basis, and you can see that they are exactly equivalent.

You can also write the Bernstein form like the below, which i prefer, using $t$ instead of $x$ and also setting $s=1-t$.

$y=3s^2+14st+13t^2$

Cubic Function

A cubic function is not that much harder than a quadratic function. After this, you should see the pattern and be able to convert any degree easily.

$y=5x^3+9x-4$

Again, the first thing we do is write the coefficients vertically, starting with the constant term. Note that we don’t have an $x^2$ term, so it’s coefficient is 0.

$\begin{array}{c} -4 \\ 9 \\ 0 \\ 5 \\ \end{array}$

We next divide by the Pascal’s triangle row 1,3,3,1.

$\begin{array}{c|c} -4 & -4 / 1 = -4 \\ 9 & 9 / 3 = 3 \\ 0 & 0 / 3 = 0 \\ 5 & 5 / 1 = 5 \\ \end{array}$

Now, make the difference table going backwards again:

$\begin{array}{c|c|c|c|c} -4 & -4 / 1 = -4 & -1 & 2 & 10 \\ 9 & 9 / 3 = 3 & 3 & 8 & \\ 0 & 0 / 3 = 0 & 5 & & \\ 5 & 5 / 1 = 5 & & & \\ \end{array}$

Our Bezier control points are along the top: -4,-1,2,10. Keeping in mind that the coefficients for a cubic bezier curve are multiplied by 1,3,3,1 we can make the Bernstein form and put it next to our original formula:

$y=5x^3+9x-4$
$y=-4(1-x)^3-3(1-x)^2x+6(1-x)x^2+10x^3$

Let’s check in wolfram alpha again:
Wolfram Alpha: graph y=5x^3+9x-4, y=-4(1-x)^3-3x(1-x)^2+6x^2(1-x)+10x^3, from 0 to 1

And here it is in the cleaner form:

$y=-4s^3-3s^2t+6st^2+10t^3$

## Some Notes On Calculating Polynomials with the Texture Sampler

You may notice that in the comparison graphs i only plotted the graphs from 0 to 1 on the x axis (aka the t axis). The equations are actually equivalent outside of that range as well, but the technique from my paper only works from the 0 to 1 range because it relies on built in hardware pixel interpolation. This may sound like a big limitation, but if you know the minimum and maximum value of x that you want to plug into your equation at runtime, you can convert your x into a percent between those values, get the resulting polynomial, convert it to Bernstein form, set up the texture, and then at runtime convert your input parameter into that percent when you do the lookup. In other words, you squeeze the parts of the function you care about into the 0 to 1 range.

Another issue you will probably hit is that standard RGBA8 textures have only 8 bits per channel and can only store values between 0 and 1. Since the texture is supposed to be storing your control points, that is bad news.

One way to get around this is to find the largest coefficient value and divide the others by this value. This will put the coefficients into the 0 to 1 range, which will be able to be stored in your texture. After sampling the texture, you multiply the result by that scaling value to get the correct answer.

Scaling won’t help having both negative and positive coefficients though. To handle negative coefficients, you could map the 0-1 space to be from -1 to 1, similar to how we often do it with normal maps and other signed data stored in textures. After doing the lookup you’d have to unmap it too of course.

You could also solve negative values and scaling problems by squishing the y axis into the 0 to 1 space by subtracting the minimum and dividing by the maximum minus the minimum, similarly to how we squished the x range into 0 to 1.

If you instead move to an RGBAF32 texture, you’ll have a full 32 bit float per color channel and won’t have problems with either large values or negative values. You will still have to deal with x only going from 0 to 1 though.

I also want to mention that the hardware texture interpolation works in a X.8 fixed point format. There are more details in my paper, but that means that you’ll get some jagged looking artifacts on your curve instead of a smoothly varying value. If that is a problem for you in practice, my paper talks about a few ways to mitigate that issue.

Before moving on, I wanted to mention that it’s easy to support rational polynomials using this method as well. A rational polynomial is when you divide one polynomial by another polynomial, and relates to rational Bezier curves, where you divide one curve by another curve (aka you give weights to control points). Rational curves are more powerful and in fact you can perfectly represent sine and cosine with a quadratic rational polynomial. More info on that in my paper.

To calculate rational polynomials, you just encode the numerator polynomial in one color channel, and the denominator polynomial in another color channel. After you sample the texture and get the result of your calculation, you divide the numerator value by the denominator value. It costs one division in your shader code, but that’s pretty cheap for the power it gives you!

Regarding the texture size requirements to store a polynomial of a specific degree…

Every dimension of the texture, and every color channel in that texture, adds a degree.

However, to get the benefit of the degree increase from the color channel, you need to do a little more math in the shader – check my paper for more details!

So, if you wanted to store a quadratic polynomial in a texture, you would need either a 2d texture with 1 color channel, or you could do it with a 1d texture that had 2 color channels.

If you wanted to store a cubic polynomial in a texture, you could use a 3d texture with 1 color channel, or a 2d texture with two color channels (there would be some waste here) or a 1d texture with three color channels.

For a polynomial that had a maximum degree term of 6, you could use a 3d volume texture that had 3 color channels: RGB.

If you need to evaluate a very high degree polynomial, you can actually take multiple texture samples and combine them.

For instance, if you had a 2d texture with a single color channel, you could do a single texture read to get a quadratic.

If you linearly interpolated between those two quadratics, you would end up with a cubic.

That isn’t a very high degree curve but is easier to grasp how they combine.

Taking this up to RGBA 3d volume textures, a single texture read will get you a curve of degree 6. If you do another read, it will take it to degree 7. Another read gets you to 8, another to 9, etc.

With support for 4d textures, an RGBA texture read would give you a degree 7 curve. Another read would boost it to 8, another to 9, another to 10, etc.

Regarding the specific sizes of the textures, in all cases the texture size is “2” on each dimension because we are always just linearly interpolating within a hyper cube of pixel values. You can increase the size of the texture for piecewise curves, check out the paper for more details on that and other options.

## Closing

Hopefully you found this useful or interesting!

There may not have been much new information in here for the more math inclined people, but I still think it’s worth while to explicitly show how the technique works for both Bernstein polynomials as well as the more common power basis polynomials.

I still think it would be interesting to look at what happens when you sample off of the diagonal, and also what happens if you use different values at different stages of the interpolation. As an example, instead of just looking up a texture at (t,t) for the (u,v) value to get a quadratic curve point, what if we look up by (t,t^2)? At first blush, it seems like by doing that we may be able to boost a curve to a higher degree, maybe at the cost of some reduced flexibility for the specific equations we can evaluate?

Next up I’ll be writing up some more extensions to the paper involving logic gates, surfaces, and volumes.

Have any feedback, questions or interesting ideas? Let me know!