Generating Blue Noise Textures With Void And Cluster

Blue noise comes in two main flavors: sample points and masks. I’ll be calling blue noise masks “blue noise textures” in this post.

The code that accompanies this post, implementing the void and cluster algorithm, and running tests on it, can be found at:
https://github.com/Atrix256/VoidAndCluster

Update: Jonathan Dupuy has a GPU implementation that runs much more quickly than this CPU implementation. https://github.com/jdupuy/BlueNoiseDitherMaskTiles

Blue Noise Sample Points

Blue noise sample points can be used when you are taking samples for integration, like when sampling shadow maps, taking AO samples, or when doing path tracing. Blue noise doesn’t converge as fast as other sampling patterns, but it is really good at hiding error by evenly distributing it, as if error diffusion had been done.

Here are some blue noise samples and their frequency magnitudes:

Here’s an example result of blue noise vs white noise sampling, from https://www.arnoldrenderer.com/research/dither_abstract.pdf

If you can afford enough samples to converge to the right result, you should do that, and you should use a sequence that converges quickly, such as Martin Robert’s R2 sequence (a generalization of the golden ratio, which is competitive with sobol!): http://extremelearning.com.au/unreasonable-effectiveness-of-quasirandom-sequences/

If you can’t afford enough samples to converge, that’s when you should reach for blue noise, to minimize the appearance of the error that will be in the resulting image.

To get blue noise samples, my go to method is “Mitchell’s Best Candidate Algorithm”:
https://blog.demofox.org/2017/10/20/generating-blue-noise-sample-points-with-mitchells-best-candidate-algorithm/

That algorithm is easy to understand and quick to implement, and while it makes decent results, it isn’t the best algorithm. The best blue noise sample point algorithms all seem to be based on capacity constrained Voronoi diagrams (CCVD). One benefit of Mitchell’s best candidate algorithm though is that the sample points it makes are progressive. That means if you generate 1024 sample points, using the first N of them for any N will still be blue noise. With CCVD algorithms, the sample points aren’t blue unless you use them all. This notion of progressive samples is something that will come up again in this post later on.

Blue Noise Textures

The second kind of blue noise is blue noise masks or textures.

One thing blue noise textures are good for is dithering data: adding random noise before quantization. Quantization lets you store data in fewer bits, and dithering before quantization gives a result that makes it harder (sometimes virtually impossible) to tell that it is even using fewer bits.

Here is a blue noise texture and it’s DFT

This is an example of white vs blue noise used for dithering, from http://johanneskopf.de/publications/blue_noise/

Using blue noise gives much better results than white noise, for the same amount of error. It does this by more evenly distributing the error, which gives results as if there was error diffusion done, without actually having to do the error diffusion step. The results are especially good when you animate the noise well. For more information on that, check out these posts of mine.
Part 1: https://blog.demofox.org/2017/10/31/animating-noise-for-integration-over-time/
Part 2: https://blog.demofox.org/2017/11/03/animating-noise-for-integration-over-time-2-uniform-over-time/

If that’s an area that you are interested in, you should give this a read too, making sure to check out the triangular distributed noise portion!

Click to access banding_in_games.pdf

Interestingly though, blue noise textures can also be used for sampling as this paper talks about: https://www.arnoldrenderer.com/research/dither_abstract.pdf

Here is another link between the two types of blue noise, where it shows how to turn blue noise textures into blue noise sample points of the desired density (like for making sample density be based on importance / the PDF): http://drivenbynostalgia.com/#frs

For blue noise textures, my go to has been to get them from this website where the author used the void and cluster algorithm to generate blue noise textures: http://momentsingraphics.de/?p=127

It’s been on my todo list for a while to understand and implement the void and cluster algorithm myself so that if that website ever goes down, I wouldn’t lose my only source of “the good stuff”. This post is the result of me finally doing that!

It’s possible to make blue noise (and other colors of noise too) by repeatedly filtering white noise: https://blog.demofox.org/2017/10/25/transmuting-white-noise-to-blue-red-green-purple/

Doing it that way has some problems though depending on what it is you want to use the blue noise for: https://blog.demofox.org/2018/08/12/not-all-blue-noise-is-created-equal/

Using Both Types of Blue Noise Together

I tend to use blue noise a lot in my rendering techniques, because it’s a nice way to get the look of higher sample counts (the appearance of low error), without spending the cost of higher sample counts.

It’s fairly common that I will use both types of blue noise together even!

For example, if I’m taking multiple PCF samples of a shadow map, I’ll make some blue noise sampling points to sample the shadow map per pixel. I’ll also use a 64×64 blue noise texture tiled across the screen as a per pixel random number for rotating those sample points per pixel.

As a bonus, i’ll use the golden ratio (conjugate) to make the noise be low discrepancy over time, as well as being blue over space by doing this after reading the value from the blue noise texture:

blueNoiseValue = fract(blueNoiseValue + 0.61803398875f * (frameNumber % 64));

The end result is really nice, and looks as though many more samples were taken than actually were.

(Beware: doing random sampling in cache friendly ways can be a challenge though)

Void And Cluster

The void and cluster algorithm was introduced in 1993 by Robert Ulichney and was previously under patent, but it seems to have expired into the public domain now.

Here is the paper:

Click to access 1993-void-cluster.pdf

The algorithm has four steps:

  1. Generate Initial Binary Pattern – Make blue noise distributed sample points.
  2. Phase 1 – Make those points progressive.
  3. Phase 2 – Make more progressive sample points, until half of the pixels are sample points.
  4. Phase 3 – Make more progressive sample points, until all pixels are sample points.

Since the points are progressive, it means they have some ordering to them. This is called their rank. The final blue noise texture image shades pixels based on their rank: lower numbers are darker, while higher numbers are brighter. The end goal is that you should have (as much as possible) an even number of each pixel value; you want a flat histogram.

Something interesting is that Phase 1 turns non progressive blue noise sample points into progressive blue noise sample points. I don’t know of any other algorithms to do this, and it might be common knowledge/practice, but it seems like this would be useful for taking CCVD blue noise sample points which are not progressive and making them progressive.

Before we talk about the steps, let’s talk about how to find voids and clusters.

Finding Voids and Clusters

Each step of this algorithm needs to identify the largest void and/or the tightest cluster in the existing sample points.

The tightest cluster is defined as the sample point that has the most sample points around it.

The largest void is defined as the empty pixel location that has the most empty pixel locations around it.

How the paper finds voids and clusters is by calculating a type of “energy” score for each pixel, where the energy goes up as it has more 1’s around it, and it goes up as they are closer to it.

Let’s say we are calculating the total energy for a pixel P.

To do that, we’ll loop through every pixel and calculate how much energy they contribute to pixel P. Add these all up and that is the energy of the pixel P.

If we are looking at how much energy a specific pixel S contributes to pixel P, we use this formula:

Energy= e^(-distanceSquared/(2*sigma*sigma))

The void and cluster paper uses a “sigma” of 1.5 which means the divisor 4.5. The “free blue noise textures” implementation uses a sigma of 1.9 and i have to agree that I think that sigma gives better results. A sigma of 1.9 makes the divisor be 7.22.

Distance squared is the squared distance between S and P, but using a “torroidal” wrap around distance which makes the resulting texture tileable as a side effect of satisfying “frequency space needs” of the pattern needing to be tileable. (I wrote a post about how to calculate torroidal distance if you are curious! https://blog.demofox.org/2017/10/01/calculating-the-distance-between-points-in-wrap-around-toroidal-space/)

Now that you know how to calculate an energy for each pixel, the hard part is done!

The pixel that has the highest total energy is the tightest cluster.

The pixel that has the lowest total energy is the largest void.

Generate Initial Binary Pattern

To start out, you need to make a binary pattern that is the same size as your output image, and has no more than half of the values being ones. In the code that goes with this post, I set 1/10th of the values to ones.

Then, in a loop you:

1) Find the tightest cluster and remove the one.
2) Find the largest void and put the one there.

You repeat this until the tightest cluster found in step 1 is the same largest void found in step 2.

After this, the initial binary pattern will be blue noise distributed sample points but they aren’t progressive because they have no sense of ordering.

This current binary pattern is called the “prototype binary pattern” in the paper.

Phase 1 – Make The Points Progressive

Phase 1 makes the initial binary pattern progressive, by giving a rank (an ordering) to each point.

To give them ranks you do this in a loop:

1) Find the tightest cluster and remove it.
2) The rank for that pixel you just removed is the number of ones left in the binary pattern.

Continue this until you run out of points.

Phase 2 – Make Progressive Points Until Half Are Ones

Phase 2 resets back to the prototype binary pattern and then repeatedly adds new sample points to it, until half of the binary pattern is ones.

It does this in a loop:

1) Find the largest void and insert a one.
2) The rank for the pixel you just added is the number of ones in the binary pattern before you added it.

Continue until half of the binary pattern is ones.

Phase 3 – Make Progressive Points Until All Are Ones

Phase 3 continues adding progressive sample points until the entire binary pattern is made up of ones.

It does this by reversing the meaning of zeros and ones in the energy calculation and does this in a loop:

1) Find the largest CLUSTER OF ZEROS and insert a one.
2) The rank for the pixel you just added is the number of ones in the binary pattern before you added it.

Continue until the entire binary pattern is ones.

You now have a binary pattern that is full of ones, and you should have a rank for every pixel. The last step is to turn this into a blue noise texture.

Finalizing The Texture

To make a blue noise texture, you need to turn ranks into pixel values.

If your output texture is 8 bits, that means you need to turn your ranks that are 0 to N, into values that are 0 to 255. You want to do this in a way where as much as possible, you have an even number of every value from 0 to 255.

I do this by multiplying the rank by 256 and dividing by the number of pixels (which is also the number of ranks there are).

Optimization Attempts

I tried a variety of optimizations on this algorithm. Some worked, some didn’t.

What I ultimately came to realize is that this algorithm is not fast, and that when you are using it, it’s because you want high quality results. Because of this, the optimizations really ought to be limited to things that don’t hurt the quality of the output.

For instance, if the algorithm takes 10 minutes to run, you aren’t going to shave off 30 seconds to get results that are any less good. You are running a ~10 minute algorithm because you want the best results you can get.

Success: Look Up Table

The best optimization I did by far was to use a look up table for pixel energy.

I made a float array the same size as the output, and whenever I put a 1 in the binary pattern, i also added that pixel’s energy into the energy look up table for all pixels. When removing a 1, i subtracted that pixel’s energy from the look up table for all pixels.

Using a look up table, finding the highest number is all it takes to find the tightest cluster, and finding the lowest number is all it takes to find the largest void.

I further sped this up by using threaded work to update the pixels.

I first used std::thread but didn’t get much of a win because there are a lot of calls to updating the LUT, and all the time moved from doing LUT calculation to thread creates and destroys. Switching to open mp (already built into visual studio’s compiler!) my perf got a lot better because open mp uses thread pools instead of creating/destroying threads each invocation.

Failure: Limiting to 3 Sigma

The equation to calculate pixel energy is actually a Gaussian function. Knowing that Gaussians have 99.7% of their value withing +/- 3 sigma, i decided to try only considering pixels within a 3*sigma radius when calculating the energy for a specific pixel.

While this tremendously sped up execution time, it was a failure for quality because pixels were blind to other pixels that were farther than 6 pixels away.

In the early and late stages of this blue noise generation algorithm, the existing sample points are pretty sparse (farther apart) and you are trying to add new points that are most far apart from existing points. If the farthest away you can see is 6 pixels, but your blue noise texture is 512×512, that means samples are not going to be very well distributed (blue noise distributed) at those points in time.

That had the result of making it so low and high threshold values used on the blue noise texture gave very poor results. (Thanks to Mikkel Gjoel for pointing out these edges to test for blue noise textures!)

Didn’t Try: DFT/IDFT for LUT

The “free blue noise textures” website implementation does a neat trick when it wants to turn a binary sample pattern into a look up table as I described it.

It does a DFT on the binary sample pattern, multiplies each pixel by the frequency space representation of a gaussian function of the desired size, and then does an inverse DFT to get the result as if convolution had been done. This uses the convolution theorem (a multiplication in frequency space is convolution in image space) to presumably get the LUT made faster.

I didn’t try this, but it doesn’t affect quality as far as I can tell (comparing my results to his), and i’m pretty sure it could be a perf win, but not sure by how much.

How I’ve written the code, there are two specific times that i want to make a LUT from scratch, but many, many more times i just want an incremental update to the LUT (at 1024×1024 that’s roughly 1 million LUT updates I would want to do) so i didn’t pursue this avenue.

Probable Success: Mitchell’s Best Candidate

Noticing the result of “initial binary pattern” and “phase 1” was to make a progressive blue noise sample, I thought it might be a good idea to replace those with an algorithm to directly make progressive blue noise samples.

I grabbed my goto algorithm and gave it a shot and it is noticeably faster, and seems to give decent results, but it also gives DIFFERENT results. The difference is probably harmless, but out of abundance of caution for correctness, I wouldn’t consider this a for sure success without more analysis.

Results

The code that goes with this post can be found at: https://github.com/Atrix256/VoidAndCluster

Speed

At 256×256, on my machine with my implementation, it took 400.2 seconds to run the standard void and cluster algorithm. Using Mitchell’s best candidate instead of the initial binary pattern and phase 1, it took 348.2 seconds instead. That’s a 13% savings in time.

At 128×128, standard void and cluster took 24.3 seconds, while the Mitchell’s best candidate version took 21.2 seconds. That’s a 13% time savings again.

At 64×64, it was 1.3 seconds vs 1.1 seconds which is about 13% time savings again.

Quality

From top to bottom is void and cluster, void and cluster using mitchell’s algorithm, and the bottom is a blue noise texture taken from the “free blue noise textures” site.



Here the same textures are, going through a “threshold” test to make sure they make good quality blue noise of every possible density. Same order: void and cluster, void and cluster using mitchell’s algorithm, and the bottom one is taken from the other website. You can definitely see a difference in the middle animation vs the top and bottom, so using the best candidate algorithm changes things, but it seems to be better at low sample counts and maybe not as good at low to medium sample counts. Hard to say for sure…



Red Noise

Lately I’ve been having a fascination with red noise as well.

Where blue noise is random values that have neighbors being very different from each other, red noise is random values that have neighbors that are close to each other.

Red noise is basically a “random walk”, but stateless. If you had a red noise texture, you could look some number of pixels forward and would get the position of the random walk that many steps away from your current position.

The simplest Markov Chain Monte Carlo algorithms use a random walk to do their work, but I’m not sure that a higher quality random walk would be very helpful there.

I’m sure there must be some usage case for red noise but I haven’t found it yet.

In any case, I do believe that the void and cluster algorithm could be adapted to generate red noise, but I’m not sure the specifics of how you’d do that.

If i figure it out, i’ll make a follow up post and link from here. If you figure it out, please share details!

Some more cool stuff regarding noise colors is coming to this blog, so be on the lookout 🙂

Taking a Stroll Between The Pixels

This post relates to a paper I wrote which talks about (ab)using linear texture interpolation to calculate points on Bezier curves. Extensions generalize it to Bezier surfaces and (multivariate) polynomials. All that can be found here: https://blog.demofox.org/2016/02/22/gpu-texture-sampler-bezier-curve-evaluation/

The original observation was that if you sample along the diagonal of a 2×2 texture, that as output you get points on a quadratic Bezier curve with the control points of the curve being the values of the pixels like in the image below. When I say you get a quadratic Bezier curve, I mean it literally, and exactly. One way of looking at what’s going on is that the texture interpolation is literally performing the De Casteljau algorithm. (Note: if the “B” values are not equal in the setup below, the 2nd control point will be the average of these two values, which an extension abuses to fit more curves into a smaller number of pixels.)

An item that’s been on my todo list for a while is to look and see what happens when you sample off of the 45 degree diagonal between the pixel values. I was curious about questions like:

  • What if we sampled across a different line?
  • What if we samples across a quadratic curve like by having y=x^2?
  • What if we sampled on a circle or a sine wave?
  • How does the changed sampling patterns work in higher dimensions – like trilinear or quadrilinear interpolation?

After accidentally coming across the answer to the first question, it was time to look into the other ones too!

PS – if wondering “what use can any of this possibly have?” the best answer I have there is data compression for data on the GPU. If you can fit your data with piecewise rational polynomials, the ideas of this technique could be useful for storing that data in a concise way (pixels in a texture) that are also quickly and easily decoded by the GPU. The ideas from this post allows for more curve types when fitting and storing your data, beyond piecewise rational polynomials. It’s also possible to store higher order curves and surfaces into smaller amounts of texture data.

Quick Setup: Bilinear Interpolation Formula

Bilinear interpolation is available on modern GPUs as a way of getting sub-pixel detail. In the olden days, when zooming into a texture, the square pixels just got larger because nearest neighbor filtering was used. In modern times, when looking at the space between pixel values, bilinear interpolation is used to fill in the details better than nearest neighbor does.

You can describe bilinear interpolation as interpolating two values across the x axis and interpolating between the results across the y axis (reversing the order of axes also works). Mathematically, that can look like this:

z = (A(1-x) + Bx)(1-y) + (C(1-x)+Dx)y

Where x and y are values between 0 and 1 describing where the point is between the pixels, and A,B,C,D are the values of the 4 nearest pixels, which form a box around the point we are calculating. A = (0,0), B = (1,0), C = (0,1) and D = (1,1).

With some algebra, you can get that equation into a power series form which is going to be easier to work with in our experiments:

z = (A-B-C+D)xy + (B-A)x + (C-A)y + A

For some deeper info on bilinear interpolation check out these links:
https://blog.demofox.org/2015/04/30/bilinear-filtering-bilinear-interpolation/
http://reedbeta.com/blog/quadrilateral-interpolation-part-1/
http://reedbeta.com/blog/quadrilateral-interpolation-part-2/
https://computergraphics.stackexchange.com/questions/7539/geometric-interpretation-of-this-bilinear-interpolation-equation/7541

Now that we have our formula, we can begin! 🙂

Sampling Along Other Lines

So, if we sample along the diagonal from A to D, we know that we get a quadratic equation out. What happens if we sample along other lines though?

My guess before I knew the answer to this was that since the 45 degree angle line is quadratic (degree 2), and that horizontal and vertical lines were linear (degree 1), that sampling along other lines must be a fractional degree polynomial between 1 and 2. It turns out that isn’t the answer, but I wonder if there’s a way to interpret the “real answer” as a fractional polynomial?

Anyways, wikipedia clued me in: https://en.wikipedia.org/wiki/Bilinear_interpolation#Nonlinear

The interpolant is linear along lines parallel to either the x or the y direction, equivalently if x or y is set constant. Along any other straight line, the interpolant is quadratic

What that means is that if you walk along a horizontal or vertical line, it’s going to be linear. Any other line will be quadratic.

Let’s try it out.

Remembering that the equation for a linear function is y=mx+b let’s literally replace y with mx+b and see what we get out.

So, we start with the power series bilinear interpolation polynomial:

z = (A-B-C+D)xy + (B-A)x + (C-A)y + A

Which becomes this after substitution:

z = (A-B-C+D)x(mx+b) + (B-A)x + (C-A)(mx+b) + A

After some expansion and simplification we get this:

z = (Am-Bm-Cm+Dm)x^2+(Ab-Bb-Cb+Db+Cm-Am+B-A)x+Cb-Ab+A

This formula tells us the value we get if we have a bilinear interpolation of values A,B,C,D (aka a bilinear surface defined by those points), and we sample along the x,y line defined by y=mx+b.

It’s a very generalized function that’s hard to reason about much, but one thing is clear: it is a quadratic function! Whatever constant values you choose for A,B,C,D,m and b, you will get a quadratic polynomial (or lower degree, but never higher).

Here’s a shadertoy that shows curves generated by random sub pixel line segments on a random (white noise) RGB texture: https://www.shadertoy.com/view/XstBz7

(note that the rough edges of the curve are due to the fact that interpolation happens in X.8 fixed point format, so has pretty limited precision. Check the paper for more information and ways to address the issue.)

Let’s explore a bit by plugging in some values for m and b and see what happens for different types of lines.

m=0, b=0

Let’s see what happens when m is 0 and b is 0. In other words, lets see what happens when we sample along the line y=0.

Plugging those values in gives:

z = (B-A)x + A

interestingly, this is just a linear interpolation between A and B, which makes sense when looking at the graph of where we are sampling on the bilinear surface.

This goes along with what wikipedia told us: when one of the axes is constant (it’s a horizontal or vertical line) the result is linear.

m=1, b=0

Let’s try m = 1 and b = 0. That is the line: y=x. This graph shows where that is sampling from on the bilinear surface:

Plugging in the values gives us this equation:

z = (A-B-C+D)x^2+(C+B-2A)x+A

We get a quadratic out! This shouldn’t be too surprising. This is the original insight in the technique. This is also the formula for a quadratic Bezier curve with control points A, (B+C)/2, D.

m=2, b=1

Let’s try the line y=2x+1. Here’s the graph of where we are sampling on the bilinear surface:

Plugging in the values give us the equation:

z = (2A-2B-2C+2D)x^2+(C+D-2A)x+C

Once again we got a quadratic function when sampling along a line.

You might think it’s strange that the equation ends it “+C” instead of “+A”, but if you look at the graph it makes sense. The line literally starts at C when x is zero.

x=2u, y=3u

In the above examples we are only modifying the y variable, to be some function of x. What if we also want to modify the x variable?

One way to do this is to make a 3rd variable u that goes from 0 to 1. Then we can make x and y be based on that variable.

Let’s see what happens when we use these two equations:

y=2u

x=3u

That makes us sample this line on the bilinear surface.

Plugging the functions of u in for x and y we get:

z = (6A-6B-6C+6D)u^2+(2B+3C-5A)u+A

It’s still a quadratic!

What About a Quadratic Path?

So we now know that when moving along a straight line on a bilinear surface, that you will get a quadratic function as output, except in the case of the line being horizontal or vertical. Note: if the bilinear surface is a plane, all lines on that surface will be linear functions, so this is another way to get a linear result. It could also be degenerate and give you a point result. You will never get a cubic result (or higher) when going along a straight line though.

What would happen though if instead of sampling along straight lines, we sampled on other shapes, like quadratic curves?

y=x*x

Let’s start with the function y=x^2. The path that is sampled is:

Going back to the power series form of bilinear interpolation, let’s plug x^2 in for y and see what we get out.

The starting equation:

z = (A-B-C+D)xy + (B-A)x + (C-A)y + A

becomes:

z = (A-B-C+D)x(x^2) + (B-A)x + (C-A)(x^2) + A

Which becomes:

z = (A-B-C+D)x^3 + (C-A)x^2 + (B-A)x + A

It’s a cubic equation!

Here is a shadertoy which follows this sampling path on random pixels: https://www.shadertoy.com/view/4sdBz7

Something neat about this example specifically is that a cubic equation has 4 coefficients, which are basically 4 control points. This example makes use of the values of the 4 pixels involved to come up with the 4 coefficients, so “doesn’t leave anything on the table” so to speak.

This is unlike sampling along line segments where you have 3 control points stored in 4 pixel values. One is a bit redundant in that case.

You can make use of that fact (I have for instance!), but sampling along a quadratic path to get a cubic curve feels like a natural fit.

x=u*u, y=u*u

Let’s see what happens when we move along both x and y quadratically.

Just like in the linear case, we have our 3rd variable u that goes from 0 to 1 and we have x and y be based on that variable. We will use these equations:

x=u^2

y=u^2

The sampling path looks like this:

When we plug those in we get this quartic function:

z = (A-B-C+D)u^4 + (B+C-2A)u^2 + A

You might be surprised to see what looks like a linear path. It’s just because at all times, x is the same value as y, even though they travel down the line non linearly.

Shadertoy: https://www.shadertoy.com/view/Xdtfz7

Higher Order Curves: x=3u^2, y=2u^4

Let’s get a little more wild, using these equations:

x=3u^2

y=u^4

Which makes a sampling path of this:

Plugging in the equations, the bilinear interpolation equation:

z = (A-B-C+D)xy + (B-A)x + (C-A)y + A

becomes a hexic equation:

z = (3A-3B-3C+3D)u^6 + (C-A)u^4 + (3B-3A)u^2 + A

The shadertoy visualizes it on random pixels as per usual, but with u going from 0 to 1, it means that x goes from 0 to 3 (y is still 0-1), which makes some obvious discontinuities at the boundaries of pixels. In our pure math formulation, we wouldn’t have any of those, but since we are sampling a real texture, when we leave the safety of our (0,1) box, we enter a new box with different control points. https://www.shadertoy.com/view/4dtfz7

Trigonometric Function: y = sin(2*pi*x)

Let’s try y=sin(2\pi x), which takes this path on the bilinear surface:

The bilinear interpolation equation becomes a trigonometric polynomial:

z = (A-B-C+D)x*sin(2\pi x) + (B-A)x + (C-A)*sin(2\pi x) + A

That has disconuities in it when texture sampling again, due to leaving the original pixel region, so here’s a better looking shadertoy, which is for y=sin(2\pi x)*0.5+0.5. It scales and shifts the y values to be between 0 and 1. https://www.shadertoy.com/view/4stfz7

Circle

Lastly, here’s sampling on a circle.

x=sin(2 \pi u)*0.5+0.5

y=cos(2 \pi u)*0.5+0.5

It follows this path:

Plugging the functions into the power series bilinear equation gives:

z = (A-B-C+D)*(sin(2 \pi u)*0.5+0.5)*(cos(2 \pi u)*0.5+0.5) + (B-A)*(sin(2 \pi u)*0.5+0.5) + (C-A)*(cos(2 \pi u)*0.5+0.5) + A

Here’s the shadertoy: https://www.shadertoy.com/view/Xddfz7

Something neat about sampling in a circle is that it’s continuous – note how the left side of the curves line up with the right side seamlessly. That seems like a pretty useful property.

Moving On

We went off into the weeds a bit, but hopefully you can see how there are a ton of possibilities for encoding and decoding data in a very small number of pixels by carefully crafting the path you sample along.

Compared to the simple “sample along the diagonal” technique, there is some added complexity and shader instructions though. Namely, any work you do to modify x or y before passing them to the linear texture interpolator needs to happen in shader code. That means this technique takes more ALU, but can mean it takes even less texture memory than the other method.

The last question from the top of the post is “What does this all mean in higher dimensional interpolation, like trilinear or quadrilinear?”

Well, it works pretty much the same was as bilinear but there are more dimensions to work with.

We saw that in 2 dimensional bilinear interpolation that when we made x and y be functions (either of each other, or of a 3rd variable u), that the resulting polynomial had a degree that was the degree of x plus the degree of y.

In 3 dimensions with trilinear interpolation, the resulting polynomial would have a degree that is the degree of x, plus the degree of y, plus the degree of z.

In 4 dimensions with quadrilinear, add to that the degree of w.

Let’s consider the case when we don’t want a single curve though, but want a surface or (hyper) volume.

As we’ve seen in the extension dealing with surfaces and volumes, if you have a degree N polynomial, you can break it apart into a multivariate polynomial (aka a surface or hyper volume) so long as the sum of the degrees of each axis adds up to N.

It’s basically what we were just talking about but in reverse.

One thing I think would be interesting to explore further would be to see what the limitations are when you take this “too far”.

For instance, a 2×2 texture can give you a quadratic if you sample along any straight line in the uv coordinates. If you first put the u coordinate through a cubic function, and put the v coordinate through a different cubic function, I think you should be able to make a bicubic surface.

The surface will be constrained to a subset of what a general bicubic surface is able to be shaped like, but you will get a bicubic surface. (basically there will be implicit control points that you don’t have control over unless you add more pixels, and do more sampling, or higher dimensional linear interpolation)

I’d like to see what the constraints there are and see if there’s any chance of getting any real use out of something like that.

Anyhow, thanks for reading! Any ideas, corrections, usage cases you have, whatever, hit me up!

@Atrix256

Improved Storage Space Efficiency of GPU Texture Sampler Bezier Curve Evaluation

This is an extension of a paper I wrote which shows how to use the linear texture sampling capabilities of the GPU to calculate points on Bezier curves (also just polynomials in general as well as rational polynomials, and also surfaces and volumes made by tensor products). You store the control points in the texture, then sample along the texture’s diagonal to get points on the curve:
GPU Texture Sampler Bezier Curve Evaluation

This extension improves on the efficiency of the storage space usage, allowing a higher density of curve data per pixel, but the post also talks about some caveats and limitations.

This post is divided into the following sections:

  1. Basic Idea of Extension
  2. 2D Texture / Quadratic Piecewise Curves
  3. 2D Texture / Quadratic Piecewise Curves – C0 Continuity
  4. 2D Texture / Quadratic Piecewise Curves – Storage Efficiency
  5. Real World Limitations
  6. 3D Texture / Cubic Piecewise Curves
  7. 3D Texture / Cubic Piecewise Curves – Multiple Curves?
  8. 3D Texture / Cubic Piecewise Curves – C0 Continuity
  9. 3D Texture / Cubic Piecewise Curves – Storage Efficiency
  10. Generalizing The Unit Hyper Cube
  11. Closing
  12. Code

1. Basic Idea of Extension

Let’s talk about the base technique before going into the details of the extension.

The image below shows how bilinear interpolation across the diagonal between pixels can calculate points on curves. Bilinear interpolation is exactly equivalent to the De Casteljau algorithm when the u and v coordinate are the same value.

Linear interpolation between two values A and B at time t is done with this formula:
A(1-t) + Bt

I’ve found useful to replace (1-t) with it’s own symbol s. That makes it become this:
As + Bt

Now, if you bilinear interpolate between 4 values, you have two rows. One row has A,B in it and the other row has C,D in it. To bilinear interpolate between these four values at time (t,t), the formula is this:
(As + Bt)s + (Cs+Dt)t

If you expand that and collect like terms you come up with this equation:
As^2 + (B+C)st + Dt^2

At this point, the last step is to make B and C the same value (make them both into B) and then rename D to C since that letter is unused. The resulting formula turns out to be the formula for a quadratic Bezier curve. This shows that mathematically, bilinear interpolation can be made to be mathematically the same as the quadratic Bezier formula. (Note: there are extensions to get higher order curves and surfaces as well)
As^2 + 2Bst + Ct^2

However, for this extension we are going to take one step back to the prior equation:
As^2 + (B+C)st + Dt^2

What you may notice is that the two values in the corners of the 2×2 bilinear interpolation don’t have to be the exact value of the middle control point of the quadratic Bezier curve – they only have to AVERAGE to that value.

This is interesting because to encode two different piecewise quadratic curves (C0-C2 and C3-C5) into a 2d texture before this extension, I would do it like this:

A = C_0 \\ B = C_1 \\ C = C_2 \\ D = C_3 \\ E = C_4 \\ F = C_5\\

That uses 8 pixels to store the 6 control points of the two quadratic curves.

With the ideas of this extension, one way it could look now is this:

A = C_0 \\ B + C = 2*C_1 : B = 2*C_1 - C_3 \\ D = C_2 \\ C = C_3 \\ D + E = 2*C_4 : E = 2*C_4 - C_2\\ F = C_5\\

The result is that 6 pixels are used instead of 8, for storing the 6 control points of the two quadratic curves.

That isn’t the only result though, so let’s explore the details (:

2. 2D Texture / Quadratic Piecewise Curves

Let’s start by more formally looking at the 2d texture / quadratic curve case.

We are going to number the pixels by their texture coordinate location (in the form of Pyx) instead of using letters. Later on that will help show a pattern of generalization. We are still using the same notation for control points where C0 is the first control point, C1 is the second control point and so on.

Looking at a single quadratic curve we have this texture which has these constraints on it’s pixel values:

P_{00} = C_0 \\ P_{01}+P_{10} = 2*C_1 \\ P_{11} = C_2 \\

To analyze this, let’s make an augmented matrix. The left matrix is a 3×4 matrix where each column is a pixel and each row is the left side of the equation for a constraint. The right matrix is a 3×3 matrix where each column is a control point and each row is the right side of the equation for a constraint. The first row of the matrix is column labels to help see what’s going on more easily.

Note that i put my pixel columns and control point columns in ascending order in the matrix, but if you put them in a different order, you’d get the same (or equivalent) results as I did. It’s just my convention they are in this order.

\left[\begin{array}{rrrr|rrr} P_{00} & P_{01} & P_{10} & P_{11} & C_0 & C_1 & C_2 \\ 1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 2 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 1 \end{array}\right]

The next step would be to put this matrix into reduced row echelon form to solve the equations to see what the values of the pixels need to be, but the matrix is in fact already in rref form! (For more information on rref, check out my last post: Solving N equations and N unknowns: The Fine Print (Gauss Jordan Elimination))

What we can see by looking at the rref of the matrix is that either P01 or P10 can be a free variable – meaning we can choose whatever value we want for it. After we choose a value for either of those variables (pixels), the rest of the pixels are fully defined.

Deciding that P10 is the free variable (just by convention that it isn’t the leading non zero value), the second equation (constraint) becomes P01 = 2*C1-P10.

If we choose the value C1 for P10, that means that P01 must equal C1 too (this is how the original technique worked). If we choose 0 for P10, that means that P01 must equal 2*C1. This is because P01 must always equal 2*C1-P10. We then are in the new territory of this extension, where the pixels representing the middle control point have some freedom about what values they can take on, so long as they average to the middle control point value.

Let’s add a row of pixels and try encoding a second quadratic curve:

P_{00} = C_0 \\ P_{01}+P_{10} = 2*C_1 \\ P_{11} = C_2 \\ P_{10} = C_3 \\ P_{11}+P_{20} = 2*C_4 \\ P_{21} = C_5

Let’s again make an augmented matrix with pixels on the left and control points on the right.

\left[\begin{array}{rrrrrr|rrrrrr} P_{00} & P_{01} & P_{10} & P_{11} & P_{20} & P_{21} & C_0 & C_1 & C_2 & C_3 & C_4 & C_5\\ 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 1 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1  \end{array}\right]

Putting that into rref to solve for the pixel values we get this:

\left[\begin{array}{rrrrrr|rrrrrr} P_{00} & P_{01} & P_{10} & P_{11} & P_{20} & P_{21} & C_0 & C_1 & C_2 & C_3 & C_4 & C_5\\ 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & -1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & -1 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1  \end{array}\right]

We got the identity matrix on the left, so we don’t have any inconsistencies or free variables.

If we turn that matrix back into equations we get this:

P_{00} = C_0 \\ P_{01} = 2*C_1 - C_3 \\ P_{10} = C_3 \\ P_{11} = C_2 \\ P_{20} = 2*C_4 - C_2 \\ P_{21} = C_5

We were successful! We can store two piecewise Bezier curves in 6 pixels by setting the pixel values to these specific values.

The last example we’ll show is the next stage, where it falls apart. We’ll add another row of pixels and try to encode 3 Bezier curves (9 control points) into those 8 pixels.

P_{00} = C_0 \\ P_{01}+P_{10} = 2*C_1 \\ P_{11} = C_2 \\ P_{10} = C_3 \\ P_{11}+P_{20} = 2*C_4 \\ P_{21} = C_5 \\ P_{20} = C_6 \\ P_{21}+P_{30} = 2*C_7 \\ P_{31} = C_8

This is the augmented matrix with pixels on the left and control points on the right:

\left[\begin{array}{rrrrrrrr|rrrrrrrrr} P_{00} & P_{01} & P_{10} & P_{11} & P_{20} & P_{21} & P_{30} & P_{31} & C_0 & C_1 & C_2 & C_3 & C_4 & C_5 & C_6 & C_7 & C_8\\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ \end{array}\right]

The rref form is:

\left[\begin{array}{rrrrrrrr|rrrrrrrrr} P_{00} & P_{01} & P_{10} & P_{11} & P_{20} & P_{21} & P_{30} & P_{31} & C_0 & C_1 & C_2 & C_3 & C_4 & C_5 & C_6 & C_7 & C_8\\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & -1 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & -1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & -2 & 0 & 1 & 0 & 0 \\ \end{array}\right]

Let’s turn that back into equations.

P_{00} = C_0 \\ P_{01} = 2*C_1 - C_3 \\ P_{10} = C_3 \\ P_{11} = 2*C_4 - C_6 \\ P_{20} = C_6 \\ P_{21} = C_5 \\ P_{30} = 2*C_7 - C_5 \\ P_{31} = C_8 \\ 0 = C_2 - 2*C_4 + C_6

We have a problem unfortunately! The bottom row says this:

0 = C_2 - 2*C_4 + C_6

That means that we can only store these curves in this pixel configuration if we limit the values of the control points 2,4,6 to values that make that last equation true.

Since my desire is to be able to store curves in textures without “unusual” restrictions on what the control points can be, I’m going to count this as a failure for a general case solution.

It only gets worse from here for the case of trying to add another row of pixels for each curve you want to add.

It looks like storing two quadratic curves in a 2×6 group of pixels is the most optimal (data dense) storage. If you go any higher, it puts restrictions on the control points. If you go any lower, you have a free variable, which means you aren’t making full use of all of the pixels you have.

This means that if you are storing piecewise quadratic curves in 2d textures, doing it this way will cause you to use 3/4 as many pixels as doing it the other way, and you will be using 1 pixel per control point stored, instead of 1.333 pixels per control point stored.

This isn’t the end of the story though, so let’s continue (:

3. 2D Texture / Quadratic Piecewise Curves – C0 Continuity

If we add the requirement that our piecewise curves must be connected (aka that they have C0 continuity), we can actually do something pretty interesting. Take a look at this setup:

P_{00} = C_0 \\ P_{01}+P_{10} = 2*C_1 \\ P_{11} = C_2 \\ P_{11} = C_3 \\ P_{10}+P_{21} = 2*C_4 \\ P_{20} = C_5

Putting this into matrix form looks like this:

\left[\begin{array}{rrrrrr|rrrrrr} P_{00} & P_{01} & P_{10} & P_{11} & P_{20} & P_{21} & C_0 & C_1 & C_2 & C_3 & C_4 & C_5\\ 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 1 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ \end{array}\right]

In rref it becomes this:

\left[\begin{array}{rrrrrr|rrrrrr} P_{00} & P_{01} & P_{10} & P_{11} & P_{20} & P_{21} & C_0 & C_1 & C_2 & C_3 & C_4 & C_5\\ 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0 & -1 & 0 & 2 & 0 & 0 & -2 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 & 0 & 0 \\ \end{array}\right]

Turning the rref back into equations we get:

P_{00} = C_0 \\ P_{01} - P_{21} = 2*C_1-2*C_4 \\ P_{10} + P_{21} = 2*C_4 \\ P_{11} = C_3 \\ P_{20} = C_5 \\ 0 = C_2 - C_3

P21 is a free variable, so we can set it to whatever we want. Once we choose a value, the pixel values P01 and P10 will be fully defined.

The bottom equation might have you worried, because it looks like an inconsistency (aka restriction) but it is actually expected.

That last equation says 0 = C2-C3 which can be rearranged into C2 = C3. That just means that the end of our first curve has to equal the beginning of our second curve. That is C0 just the continuity we already said we’d agree to.

So, it worked! Let’s try adding a row of pixels and another curve to see what happens.

P_{00} = C_0 \\ P_{01}+P_{10} = 2*C_1 \\ P_{11} = C_2 \\ P_{11} = C_3 \\ P_{10}+P_{21} = 2*C_4 \\ P_{20} = C_5\\ P_{20} = C_6\\ P_{21}+P_{30} = 2*C_7\\ P_{31} = C_8

Putting that into matrix form:

\left[\begin{array}{rrrrrrrr|rrrrrrrrr} P_{00} & P_{01} & P_{10} & P_{11} & P_{20} & P_{21} & P_{30} & P_{31} & C_0 & C_1 & C_2 & C_3 & C_4 & C_5 & C_6 & C_7 & C_8\\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1\\ \end{array}\right]

And in rref:

\left[\begin{array}{rrrrrrrr|rrrrrrrrr} P_{00} & P_{01} & P_{10} & P_{11} & P_{20} & P_{21} & P_{30} & P_{31} & C_0 & C_1 & C_2 & C_3 & C_4 & C_5 & C_6 & C_7 & C_8\\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 2 & 0 & 0 & -2 & 0 & 0 & 2 & 0\\ 0 & 0 & 1 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & -2 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 & 0 & 0\\ \end{array}\right]

Turning the rref back into equations:

P_{00} = C_0 \\ P_{01}+P_{30} = 2*C_1 - 2*C_4+2*C_7 \\ P_{10}-P_{30} = 2*C_4-2*C_7 \\ P_{11} = C_3 \\ P_{20} = C_6 \\ P_{21} + P_{30} = 2*C_7\\ P_{31} = C_8\\ 0 = C_2 - C_3\\ 0 = C_5 - C_6\\

We see that P30 is a free variable, and the last two rows show us we have the C0 continuity requirements: C2 = C3 and C5 = C6.

The last section without C0 continuity reached it’s limit of storage space efficiency after storing two curves (6 control points) in 6 pixels.

When we add the C0 continuity requirement, we were able to take it further and store 3 curves in 8 pixels. Technically those 3 curves have 9 control points, but because the end point of each curve has to be the same as the start point of the next curve it makes it so in reality there is only 3 control points for the first curve and then 2 additional control points for each additional curve. That makes 8 control points for 3 curves, not 9.

Unlike the last section, using this zigzag pattern with C0 continuity, you can encode any number of curves. I am not sure how to prove it, but from observation, there is no sign of any shrinking of capacity as we increase the number of curves, adding two more rows of pixels for each curve. If you know how to prove this more formally, please let me know!

Note that instead of explicitly having 3 control points per curve, where the first control point of a curve has to equal the last control point of the previous curve, you can instead describe the piecewise curves with fewer control points. You need 3 control points for the first curve, and then 2 control points for each curve after that.

Mathematically both ways are equivelant and you’ll get to the same answer. The accompanying source code works that way, but I show this example in this longer way to more explicitly show how things work.

4. 2D Texture / Quadratic Piecewise Curves – Storage Efficiency

Let’s compare the storage efficiency of the last two sections to each other, as well as to the original technique.

\begin{array}{|cccccc|} \hline & & \rlap{\text{2d / Quadratic - Extension}} & & & \\ \hline \text{Curves} & \text{Dimensions} & \text{Pixels} & \text{Control Points} & \text{Pixels Per Control Point} & \text{Pixels Per Curve} \\ \hline 1 & 2x2 & 4 & 3 & 1.33 & 4.00 \\ 2 & 2x3 & 6 & 6 & 1.00 & 3.00 \\ 3 & 2x5 & 10 & 9 & 1.11 & 3.33 \\ 4 & 2x6 & 12 & 12 & 1.00 & 3.00 \\ 5 & 2x8 & 16 & 15 & 1.06 & 3.20 \\ 6 & 2x9 & 18 & 18 & 1.00 & 3.00 \\ \hline \end{array}

\begin{array}{|cccccc|} \hline & & \rlap{\text{2d / Quadratic - Extension + C0 Continuity}} & & & \\ \hline \text{Curves} & \text{Dimensions} & \text{Pixels} & \text{Control Points} & \text{Pixels Per Control Point} & \text{Pixels Per Curve} \\ \hline 1 & 2x2 & 4 & 3 & 1.33 & 4.00 \\ 2 & 2x3 & 6 & 5 & 1.20 & 3.00 \\ 3 & 2x4 & 8 & 7 & 1.14 & 2.66 \\ 4 & 2x5 & 10 & 9 & 1.11 & 2.50 \\ 5 & 2x6 & 12 & 11 & 1.09 & 2.40 \\ 6 & 2x7 & 14 & 13 & 1.08 & 2.33 \\ \hline \end{array}

\begin{array}{|cccccc|} \hline & & \rlap{\text{2d / Quadratic - Original Technique}} & & & \\ \hline \text{Curves} & \text{Dimensions} & \text{Pixels} & \text{Control Points} & \text{Pixels Per Control Point} & \text{Pixels Per Curve} \\ \hline 1 & 2x2 & 4 & 3 & 1.33 & 4.00 \\ 2 & 2x4 & 8 & 6 & 1.33 & 4.00 \\ 3 & 2x6 & 12 & 9 & 1.33 & 4.00 \\ 4 & 2x8 & 16 & 12 & 1.33 & 4.00 \\ 5 & 2x10 & 20 & 15 & 1.33 & 4.00 \\ 6 & 2x12 & 24 & 18 & 1.33 & 4.00 \\ \hline \end{array}

The tables show that the first method uses fewer pixels per control point, while the second method uses fewer pixels per curve.

The first method can get you to what I believe to be the maximum density of 1 pixel per control point if you store an even number of curves. It can also give you a curve for every 3 pixels of storage.

The second method approaches the 1 pixel per control point as you store more and more curves and also approaches 2 pixels of storage per curve stored. Note that the second method’s table is using the convention of 3 control points are used for the first curve, and 2 additional control points for each curve after that.

The deciding factor for which method to use is probably going to be whether or not you want to force C0 continuity of your curve data. If so, you’d use the second technique, else you’d use the first.

The original technique uses a constant 1.33 pixels per control point, and 4 pixels to store each curve. Those numbers shows how this extension improves on the storage efficiency of the original technique.

5. Real World Limitations

This extension has a problem that the original technique does not have unfortunately.

While the stuff above is correct mathematically, there are limitations on the values we can store in actual textures. For instance, if we have 8 bit uint8 color channels we can only store values 0 to 255.

Looking at one of the equations P_{01} = 2*C_1 - C_3 , if C1 is 255 and C3 is 0, we are going to need to store 510 in the 8 bits we have for P01, which we can’t. If C1 is 0 and C3 is not zero, we are going to have to store a negative value in the 8 bits we have for P01, which we can’t.

This becomes less of a problem when using 16 bit floats per color channel, and is basically solved when using 32 bit floats per color channels, but that makes the technique hungrier for storage and less efficient again.

While that limits the usefulness of this extension, there are situations where this would still be appropriate – like if you already have your data stored in 16 or 32 bit color channels like some data (eg position data) would require..

The extension goes further, into 3d textures and beyond though, so let’s explore a little bit more.

6. 3D Texture / Cubic Piecewise Curves

The original technique talks about how to use a 2x2x2 3d volume texture to store a cubic Bezier curve (per color channel) and to retrieve it by doing a trilinear interpolated texture read.

If you have four control points A,B,C,D then the first slice of the volume texture will be a 2d texture storing the quadratic Bezier curve defined by A,B,C and the second slice will store B,C,D. You still sample along the diagonal of the texture but this time it’s a 3d diagonal instead of 2d. Here is that setup, where the texture is sampled along the diagonal from from A to D:

A = C_0 \\ B = C_1 \\ C = C_2 \\ D = C_3

Let’s look at what this extension means for 3d textures / cubic curves.

The equation for a cubic Bezier curve looks like this:

As^3 + 3Bs^2t + 3Cst^2 + Dt^3

If we derive that from trilinear interpolation between 8 points A,B,C,D,E,F,G,H, the second to last step would look like this:

As^3 + (B+C+E)s^2t + (D+F+G)st^2 + Ht^3

So, similar to our 2d setup, we have some freedom about our values.

In the original technique, B,C,E would have to be equal to the second control point, and D,F,G would have to be equal to the third control point. With the new extension, in both cases, those groups of values only have to AVERAGE to their specific control points. Once again, this gives us some freedoms for the values we can use, and lets us use our pixels more efficiently.

Here is the setup, again using texture coordinates (in the form Pzyx) for the pixels instead of letters.

P_{000} = C_0\\ P_{001}+P_{010}+P_{100} = 3*C_1\\ P_{011}+P_{101}+P_{110} = 3*C_2\\ P_{111} = C_3

here’s how the equations look in matrix form, which also happens to already be in rref:

\left[\begin{array}{rrrrrrrr|rrrr} P_{000} & P_{001} & P_{010} & P_{011} & P_{100} & P_{101} & P_{110} & P_{111} &    C_0 & C_1 & C_2 & C_3 \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 &    1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 & 0 & 0 &    0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 1 & 1 & 0 &    0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 &    0 & 0 & 0 & 1 \\ \end{array}\right]

P010 and P100 are free variables and so are P101 and P110, making a total of four free variables. They can be set to any value desired, which will then define the value that P001 and P011 need to be.

Let’s add another piecewise cubic Bezier curve, and another row of pixels to the texture to see what happens.

P_{000} = C_{0}\\ P_{001} + P_{010} + P_{100} = 3C_{1}\\ P_{011} + P_{101} + P_{110} = 3C_{2}\\ P_{111} = C_{3}\\ P_{010} = C_{4}\\ P_{011} + P_{020} + P_{110} = 3C_{5}\\ P_{021} + P_{111} + P_{120} = 3C_{6}\\ P_{121} = C_{7}\\

Here are the equations in matrix form:

\left[\begin{array}{rrrrrrrrrrrr|rrrrrrrr} P_{000} & P_{001} & P_{010} & P_{011} & P_{020} & P_{021} & P_{100} & P_{101} & P_{110} & P_{111} & P_{120} & P_{121} & C_{0} & C_{1} & C_{2} & C_{3} & C_{4} & C_{5} & C_{6} & C_{7} \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ \end{array}\right]

Here it is in rref:

\left[\begin{array}{rrrrrrrrrrrr|rrrrrrrr} P_{000} & P_{001} & P_{010} & P_{011} & P_{020} & P_{021} & P_{100} & P_{101} & P_{110} & P_{111} & P_{120} & P_{121} & C_{0} & C_{1} & C_{2} & C_{3} & C_{4} & C_{5} & C_{6} & C_{7} \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 & -1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & -3 & 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ \end{array}\right]

Putting that back into equations we have this:

P_{000} = C_{0}\\ P_{001} + P_{100} = 3C_{1} + -C_{4}\\ P_{010} = C_{4}\\ P_{011} + P_{101} + P_{110} = 3C_{2}\\ P_{020} + -P_{101} = -3C_{2} + 3C_{5}\\ P_{021} + P_{120} = -C_{3} + 3C_{6}\\ P_{111} = C_{3}\\ P_{121} = C_{7}\\

The result is that we still have four free variables: P100, P101, P110 and P120. When we give values to those pixels, we will then be able to calculate the values for P001, P011, P020 and P021.

There is a limit to this pattern though. Where the maximum number of curves to follow the pattern was 2 with the 2d / quadratic case, the maximum number of curves to follow this pattern with the 3d / cubic case is 3. As soon as you try to put 4 curves in this pattern it fails by having constraints. Interestingly, we still have 4 free variables when putting 3 curves in there, so it doesn’t follow the 2d case where free variables disappeared as we put more curves in, indicating when the failure would happen.

If you know how to more formally analyze when these patterns of equations will fail, please let me know!

7. 3D Texture / Cubic Piecewise Curves – Multiple Curves?

Looking at the 3d texture case of 2x2x2 storing a single curve, I saw that there were 4 free variables. Since it takes 4 control points to define a cubic curve, I wondered if we could use those 4 free variables to encode another cubic curve.

Here’s a setup where the x axis is flipped for the second curve. It’s a little bit hard to tell from the diagram, but the blue line does still go through the center of the 3d cube. It goes from P001 to P110, while the first curve still goes from P000 to P111.

Here’s what the equations look like:

P_{000} = C_{0}\\ P_{001} + P_{010} + P_{100} = 3*C_{1}\\ P_{011} + P_{101} + P_{110} = 3*C_{2}\\ P_{111} = C_{3}\\ P_{001} = C_{4}\\ P_{000} + P_{011} + P_{101} = 3*C_{5}\\ P_{010} + P_{100} + P_{111} = 3*C_{6}\\ P_{110} = C_{7}\\

And in matrix form:

\left[\begin{array}{rrrrrrrr|rrrrrrrr} P_{000} & P_{001} & P_{010} & P_{011} & P_{100} & P_{101} & P_{110} & P_{111} & C_{0} & C_{1} & C_{2} & C_{3} & C_{4} & C_{5} & C_{6} & C_{7} \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 3 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 1 & 1 & 0 & 0 & 0 & 3 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ \end{array}\right]

After putting the matrix in rref to solve the equations, we get this matrix:

\left[\begin{array}{rrrrrrrr|rrrrrrrr} P_{000} & P_{001} & P_{010} & P_{011} & P_{100} & P_{101} & P_{110} & P_{111} & C_{0} & C_{1} & C_{2} & C_{3} & C_{4} & C_{5} & C_{6} & C_{7} \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -3 & 0 & 0 & 3 & 0 & 1 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 3 & 0 & 0 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 3 & 0 & 0 & -3 & 0 & -1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & \frac{1}{3} & \frac{-1}{3} & 0 & -1 & 0 \\ \end{array}\right]

Which is this set of equations:

P_{000} = -3C_{2} + 3C_{5} + C_{7}\\ P_{001} = C_{4}\\ P_{010} + P_{100} = -C_{3} + 3C_{6}\\ P_{011} + P_{101} = 3C_{2} + -C_{7}\\ P_{110} = C_{7}\\ P_{111} = C_{3}\\ 0 = C_{0} + 3C_{2} - 3C_{5} - C_{7}\\ 0 = C_{1} + C_{3}/3 - C_{4}/3 + -C_{6}\\

In the end there are 2 free variables, but also 2 constraints on the values that the control points can take. The constraints mean it doesn’t work which is unfortunate. That would have been a nice way to bring the 3d / cubic case to using 1 pixel per control point!

I also tried other variations like flipping y or z along with x (flipping all three just makes the first curve in the reverse direction!) but couldn’t find anything that worked. Too bad!

8. 3D Texture / Cubic Piecewise Curves – C0 Continuity

Since the regular 3d texture / cubic curve pattern has a limit (3 curves), let’s look at the C0 continuity version like we did for the 2d texture / quadratic case where we sample zig zag style.

Since the sampling has to pass through the center of the cube, we need to flip both x and z each curve.

That gives us a setup like this:

Here are the constraints for the pixel values:

P_{000} = C_{0}\\ P_{001} + P_{010} + P_{100} = 3C_{1}\\ P_{011} + P_{101} + P_{110} = 3C_{2}\\ P_{111} = C_{3}\\ P_{011} + P_{110} + P_{121} = 3C_{4}\\ P_{010} + P_{021} + P_{120} = 3C_{5}\\ P_{020} = C_{6}\\

Which looks like this in matrix form:

\left[\begin{array}{rrrrrrrrrrrr|rrrrrrr} P_{000} & P_{001} & P_{010} & P_{011} & P_{020} & P_{021} & P_{100} & P_{101} & P_{110} & P_{111} & P_{120} & P_{121} & C_{0} & C_{1} & C_{2} & C_{3} & C_{4} & C_{5} & C_{6} \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ \end{array}\right]

Here is the matrix solved in rref:

\left[\begin{array}{rrrrrrrrrrrr|rrrrrrr} P_{000} & P_{001} & P_{010} & P_{011} & P_{020} & P_{021} & P_{100} & P_{101} & P_{110} & P_{111} & P_{120} & P_{121} & C_{0} & C_{1} & C_{2} & C_{3} & C_{4} & C_{5} & C_{6} \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & -1 & 1 & 0 & 0 & 0 & -1 & 0 & 0 & 3 & 0 & 0 & 0 & -3 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & -1 & 0 & 0 & 3 & 0 & -3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ \end{array}\right]

And here is that matrix put back into equations form:

P_{000} = C_{0}\\ P_{001} - P_{021} + P_{100} - P_{120} = 3C_{1} - 3C_{5}\\ P_{010} + P_{021} + P_{120} = 3C_{5}\\ P_{011} + P_{110} + P_{121} = 3C_{4}\\ P_{020} = C_{6}\\ P_{101} - P_{121} = 3C_{2} - 3C_{4}\\ P_{111} = C_{3}\\

It worked! It also has 5 free variables.

This pattern works for as many curves as i tried (21 of them), and each time you add another curve / row of this pattern you gain another free variable.

So, storing 2 curves results in 6 free variables, 3 curves has 7 free variables, 4 curves has 8 free variables and so on.

9. 3D Texture / Cubic Piecewise Curves – Storage Efficiency

Let’s compare storage efficiency of these 3d texture / cubic curve techniques like we did for the 2d texture / quadratic curve techniques.

\begin{array}{|cccccc|} \hline & & \rlap{\text{3d / Cubic - Extension}} & & & \\ \hline \text{Curves} & \text{Dimensions} & \text{Pixels} & \text{Control Points} & \text{Pixels Per Control Point} & \text{Pixels Per Curve} \\ \hline 1 & 2x2x2 & 8 & 4 & 2.00 & 8.00 \\ 2 & 2x3x2 & 12 & 8 & 1.50 & 6.00 \\ 3 & 2x4x2 & 16 & 12 & 1.33 & 5.33 \\ 4 & 2x6x2 & 24 & 16 & 1.50 & 6.00 \\ 5 & 2x7x2 & 28 & 20 & 1.40 & 5.60 \\ 6 & 2x8x2 & 32 & 24 & 1.33 & 5.33 \\ \hline \end{array}

\begin{array}{|cccccc|} \hline & & \rlap{\text{3d / Quadratic - Extension + C0 Continuity}} & & & \\ \hline \text{Curves} & \text{Dimensions} & \text{Pixels} & \text{Control Points} & \text{Pixels Per Control Point} & \text{Pixels Per Curve} \\ \hline 1 & 2x2x2 & 8 & 4 & 2.00 & 8.00 \\ 2 & 2x3x2 & 12 & 7 & 1.71 & 6.00 \\ 3 & 2x4x2 & 16 & 10 & 1.60 & 5.33 \\ 4 & 2x5x2 & 20 & 13 & 1.54 & 5.00 \\ 5 & 2x6x2 & 24 & 16 & 1.50 & 4.80 \\ 6 & 2x7x2 & 28 & 19 & 1.47 & 4.67 \\ \hline \end{array}

\begin{array}{|cccccc|} \hline & & \rlap{\text{3d / Cubic - Original Technique}} & & & \\ \hline \text{Curves} & \text{Dimensions} & \text{Pixels} & \text{Control Points} & \text{Pixels Per Control Point} & \text{Pixels Per Curve} \\ \hline 1 & 2x2x2 & 8 & 4 & 2.00 & 8.00 \\ 2 & 2x4x2 & 16 & 8 & 2.00 & 8.00 \\ 3 & 2x6x2 & 24 & 12 & 2.00 & 8.00 \\ 4 & 2x8x2 & 32 & 16 & 2.00 & 8.00 \\ 5 & 2x10x2 & 40 & 20 & 2.00 & 8.00 \\ 6 & 2x12x2 & 48 & 24 & 2.00 & 8.00 \\ \hline \end{array}

The original technique had a constant 2 pixels per control point and 8 pixels per cubic curve.

The basic extension lets you bring that down to 1.33 pixels per control point, and 5.33 pixels per curve.

If C0 continuity is desired, as you store more and more curves the extension can bring things down towards 1.33 pixels per control point, and 4 pixels per curve. (Remember that with the C0 extension you have 4 control points for the first curve and then 3 more for each subsequent curve, so that 1.33 pixels per control point isn’t exactly an apples to apples comparison vs the basic extension).

The pattern continues for 4D textures and higher (for higher than cubic curves too!), but working through the 2d and 3d cases for quadratic / cubic curves is the most likely usage case both because 4d textures and higher are kind of excessive (probably you’d need to do multiple texture reads to simulate them), but also when fitting curves to data, quadratic and cubic curves tend to do well in that they don’t usually overfit the data or have as many problems with ringing.

Despite that, I do think it’s useful to look at it from an N dimensional point of view to see the larger picture, so let’s do that next.

10. Generalizing The Unit Hyper Cube

Let’s ignore the zig zag sampling pattern and storing multiple curves in a texture and just get back to the basic idea.

Given an N dimensional texture that is 2x2x…x2 that you are going to sample across the diagonal to get a degree N Bezier curve from, how do you know what values to put in which control points to use this technique?

You could derive it from N-linear interpolation, but that is a lot of work.

The good news is it turns out there is a simple pattern, that is also pretty interesting.

Let’s check out the 1d, 2d and 3d cases to see what patterns we might be able to see.

1d Texture / linear Bezier / linear interpolation:

P_{0} = C_0 \\ P_{1} = C_1 \\

\left[\begin{array}{rr|rr} P_{0} & P_{1} & C_0 & C_1\\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ \end{array}\right]

2d Texture / Quadratic Bezier:

P_{00} = C_0 \\ P_{01}+P_{10} = 2*C_1 \\ P_{11} = C_2 \\

\left[\begin{array}{rrrr|rrr} P_{00} & P_{01} & P_{10} & P_{11} & C_0 & C_1 & C_2 \\ 1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 2 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 1 \end{array}\right]

3d Texture / Cubic Bezier:

P_{000} = C_0\\ P_{001}+P_{010}+P_{100} = 3*C_1\\ P_{011}+P_{101}+P_{110} = 3*C_2\\ P_{111} = C_3

\left[\begin{array}{rrrrrrrr|rrrr} P_{000} & P_{001} & P_{010} & P_{011} & P_{100} & P_{101} & P_{110} & P_{111} &    C_0 & C_1 & C_2 & C_3 \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 &    1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 & 0 & 0 &    0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 1 & 1 & 0 &    0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 &    0 & 0 & 0 & 1 \\ \end{array}\right]

The first pattern you might see is that the right side of the equations for an N dimensional hypercube is the identity matrix, but instead of using 1 for the value along the diagonal, it uses values from Pascal’s Triangle (binomial coefficients).

To simplify this a bit though, we could also notice that the number on the right side of the equation equals the sum of the numbers on the left side of the equation. Mathematically it would be the same to say that the numbers on the left side of the equation have to sum up to 1. This would make the matrix on the right just be the identity matrix and we can forget about Pascal’s triangle numbers (they will show up implicitly as divisors of the left side equation coefficients but there’s no need to explicitly calculate them).

But then we are still left with the matrix on the left. How do we know which pixels belong in which rows?

It turns out there is another interesting pattern here. In all the matrices above it follows this pattern:

  • Row 0 has a “1” wherever the pixel coordinate has 0 ones set
  • Row 1 has a “1” wherever the pixel coordinate has 1 ones set
  • Row 2 has a “1” wherever the pixel coordinate has 2 ones set
  • Row 3 has a “1” wherever the pixel coordinate has 3 ones set
  • ….

That pattern continues indefinitely, but don’t forget that the numbers (coefficients) on the left side of the equation must add up to one.

Here is the matrix form of 1d / linear, 2d / quadratic, and 3d / cubic again with the right matrix being the identity matrix, and the equations below them. Notice the pattern about counts of one bits in each row!

1D:

\left[\begin{array}{rr|rr} P_{0} & P_{1} & C_0 & C_1\\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ \end{array}\right]

P_0 = C_0 \\ P_1 = C_1 \\

2D:

\left[\begin{array}{rrrr|rrr} P_{00} & P_{01} & P_{10} & P_{11} & C_0 & C_1 & C_2 \\ 1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & \frac{1}{2} & \frac{1}{2} & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 1 \end{array}\right]

P_{00} = C_0 \\ P_{01}/2 + P_{10}/2 = C_1 \\ P_{11} = C_2 \\

3D:

\left[\begin{array}{rrrrrrrr|rrrr} P_{000} & P_{001} & P_{010} & P_{011} & P_{100} & P_{101} & P_{110} & P_{111} &    C_0 & C_1 & C_2 & C_3 \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 &    1 & 0 & 0 & 0 \\ 0 & \frac{1}{3} & \frac{1}{3} & 0 & \frac{1}{3} & 0 & 0 & 0 &    0 & 1 & 0 & 0 \\ 0 & 0 & 0 & \frac{1}{3} & 0 & \frac{1}{3} & \frac{1}{3} & 0 &    0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 &    0 & 0 & 0 & 1 \\ \end{array}\right]

P_{000} = C_0 \\ P_{001}/3 + P_{010}/3 + P_{100}/3 = C_1 \\ P_{011}/3 + P_{101}/3 + P_{110}/3 = C_2 \\ P_{111} = C_3 \\

Here are the formulas for linear, quadratic and cubic Bezier curves to show a different way of looking at this. Below each is the same formula but with the 1d, 2d and 3d pixels in the formula instead of the control points, using the formulas above which relate pixel values to control point values. Note that I have replaced (1-t) with s for easier reading.

f(t) = As + Bt \\ \\ f(t) = P_0s + P_1t\\

f(t) = As^2 + 2Bst + Ct^2 \\ \\ f(t) = P_{00}s^2 + (P_{01}+P_{10})st + P_{11}t^2 \\

f(t) = As^3 + 3Bs^2t + 3Cst^2 + Dt^3 \\ f(t) = P_{000}s^3 + (P_{001}+P_{010}+P_{100})s^2t + (P_{011}+P_{101}+P_{110})st^2 + P_{111}t^3

I think it’s really interesting how in the last equation as an example, “3B” literally becomes 3 values which could have the value of B. In the plain vanilla technique they did have the value of B. In this extension, the only requirement is that they average to B.

It’s also interesting to notice that if you have an N bit number and you count how many permutations have each possible number of bits turned on, the resulting counts is the Pascal’s triangle row. That is nothing new, but it seems like there might be a fun way to convert a set of random numbers (white noise) into a Gaussian distribution, just by counting how many one bits there were in each number. That isn’t new either, and there are better algorithms, but still I think it’s an interesting idea, and may be useful in a pinch since it seems pretty computationally inexpensive.

11. Closing

This extension makes storage efficiency a bit better than the plain vanilla technique, especially if you are interested in C0 continuous curves.

The extension does come at a price though, as you may find yourself in a situation where you need to store a value that is outside of the possible values for common data formats to store (such as needing to store a negative number or a larger than 255 number in a uint8).

Even so, if these three criteria are met:

  1. You are already storing data in textures. (Counter point: compute is usually preferred over texture lookup these days)
  2. You are relying on the texture interpolator to interpolate values between data points. (Counter point: if you don’t want the interpolation, use a buffer instead so you fit more of the data you actually care about in the cache)
  3. You are storing data in 16 or 32 bit real numbers. (Counter point: uint8 is half as large as 16 bit and a quarter as large as 32 bit already)

Then this may be an attractive solution for you, even over the plain vanilla technique.

For future work, I think it would be interesting to see how this line of thinking applies to surfaces.

I also think there is probably some fertile ground looking into what happens when sampling off of the diagonal of the textures. Intuitively it seems you might be able to store some special case higher order curves in lower dimension / storage textures, but I haven’t looked into it yet.

A common usage case when encoding data in a texture would probably include putting curves side by side on the x axis of the texture. It could be interesting to look into whether curves need to be completely separate from each other horizontally (aka 2 pixel of width for each track of curves in the texture), or if you could perhaps fit two curves side by side in a 3 pixel width, or any similar ideas.

Lastly, when looking at these groups of points on these N dimensional hyper-cubes, I can’t help but wonder what kinds of shapes they are. Are they simplices? If so, is there a pattern to the dimensions they are of?

It’s a bit hard to visualize, but taking a look at the first few rows of pascal’s triangle / hyper cubes here’s what I found:

  • Dimension 1 (line) : Row 2 = 1,1. Those are both points, so are simplices of 0 dimension.
  • Dimension 2 (square) : Row 3 = 1,2,1. The 1’s are points, the 2 is a line, so are simplices of dimension 0, 1, 0.
  • Dimension 3 (cube) : Row 4 = 1,3,3,1. The 1’s are still points. The 3’s are in fact triangles, I checked. So, simplices of dimension 0, 2, 2, 0.
  • Dimension 4 (hypercube) : Row 5 = 1,4,6,4,1. The 1’s are points. The 4’s are tetrahedrons. The 6 is a 3 dimensional object. I’m not sure it’s shape but that makes it not be a simplex. Possibly it’s two simplices fused together some how. I don’t really know. So, the dimensions anyways are: 0, 3, 3, 3, 0.
  • Beyond? That’s as far as I looked. If you look further / deeper and find anything interesting please share!

Update: PBS Infinite Series ended up posting a video on the topic of hypercube slices and pascal’s triangle (seriously!). Give it a watch if you are interested in how these things relate: Dissecting Hypercubes with Pascal’s Triangle | Infinite Series

Questions, comments, corrections, etc? Post a comment below or hit me up on twitter at @Atrix256.

If you have a usage case for this or any of the related techniques, I’d love to hear about them.

Thanks for reading!

12. Code

It’s easy to talk about things and claim that everything is correct, when in fact, the moment you try it, everything falls apart.

I made up some simple standalone c++ code that goes through the cases we talked about, doing the math we did, and also verifying that the texture interpolation is equivalent to actually calculating Bezier curves (using Bernstein polynomials).

You can also use this code as a starting point to explore higher curve counts or other storage patterns. It uses only standard includes and no libraries, so it should be real easy to drop this into a compiler and start experimenting.

Here’s some example output, which shows 6 cubic curves stored in a 3d texture using the zig zag sampling pattern.

Here’s the code:

#define _CRT_SECURE_NO_WARNINGS

#include <stdio.h>
#include <stdlib.h>
#include <array>
#include <algorithm>
#include <unordered_set>
#include <random>
#include <vector>

#define SHOW_MATHJAX_MATRIX() 0
#define SHOW_MATHJAX_EQUATIONS() 0
#define SHOW_EQUATIONS_BEFORE_SOLVE() 0
#define EQUALITY_TEST_SAMPLES 1000

typedef int32_t TINT;

TINT CalculateGCD (TINT smaller, TINT larger);
TINT CalculateLCM (TINT smaller, TINT larger);

// A rational number, to handle fractional numbers without typical floating point issues
struct CRationalNumber
{
	CRationalNumber (TINT numerator = 0, TINT denominator = 1)
		: m_numerator(numerator)
		, m_denominator(denominator)
	{ }

	TINT m_numerator;
	TINT m_denominator;

	CRationalNumber Reciprocal () const
	{
		return CRationalNumber(m_denominator, m_numerator);
	}

	void Reduce ()
	{
		if (m_numerator != 0 && m_denominator != 0)
		{
			TINT div = CalculateGCD(m_numerator, m_denominator);
			m_numerator /= div;
			m_denominator /= div;
		}

		if (m_denominator < 0)
		{
			m_numerator *= -1;
			m_denominator *= -1;
		}
		
		if (m_numerator == 0)
			m_denominator = 1;
	}

	bool IsZero () const
	{
		return m_numerator == 0 && m_denominator != 0;
	}

	// NOTE: the functions below assume Reduce() has happened
	bool IsOne () const
	{
		return m_numerator == 1 && m_denominator == 1;
	}

	bool IsMinusOne () const
	{
		return m_numerator == -1 && m_denominator == 1;
	}

	bool IsWholeNumber () const
	{
		return m_denominator == 1;
	}
};

// Define a vector as an array of rational numbers
template<size_t N>
using TVector = std::array<CRationalNumber, N>;

// Define a matrix as an array of vectors
template<size_t M, size_t N>
using TMatrix = std::array<TVector<N>, M>;

//===================================================================================================================================
//                                              GCD / LCM
//===================================================================================================================================

// from my blog post: https://blog.demofox.org/2015/01/24/programmatically-calculating-gcd-and-lcm/

TINT CalculateGCD (TINT smaller, TINT larger)
{
	// make sure A <= B before starting
	if (larger < smaller)
		std::swap(smaller, larger);

	// loop
	while (1)
	{
		// if the remainder of larger / smaller is 0, they are the same
		// so return smaller as the GCD
		TINT remainder = larger % smaller;
		if (remainder == 0)
			return smaller;

		// otherwise, the new larger number is the old smaller number, and
		// the new smaller number is the remainder
		larger = smaller;
		smaller = remainder;
	}
}

TINT CalculateLCM (TINT A, TINT B)
{
	// LCM(A,B) = (A/GCD(A,B))*B
	return (A / CalculateGCD(A, B))*B;
}

//===================================================================================================================================
//                                              RATIONAL NUMBER MATH
//===================================================================================================================================

void CommonDenominators (CRationalNumber& a, CRationalNumber& b)
{
	TINT lcm = CalculateLCM(a.m_denominator, b.m_denominator);

	a.m_numerator *= lcm / a.m_denominator;
	b.m_numerator *= lcm / b.m_denominator;

	a.m_denominator = lcm;
	b.m_denominator = lcm;
}

bool operator == (const CRationalNumber& a, const CRationalNumber& b)
{
	CRationalNumber _a(a), _b(b);
	CommonDenominators(_a, _b);
	return _a.m_numerator == _b.m_numerator;
}

void operator *= (CRationalNumber& a, const CRationalNumber& b)
{
	a.m_numerator *= b.m_numerator;
	a.m_denominator *= b.m_denominator;
}

CRationalNumber operator * (const CRationalNumber& a, const CRationalNumber& b)
{
	return CRationalNumber(a.m_numerator * b.m_numerator, a.m_denominator * b.m_denominator);
}

void operator -= (CRationalNumber& a, const CRationalNumber& b)
{
	CRationalNumber _b(b);
	CommonDenominators(a, _b);
	a.m_numerator -= _b.m_numerator;
}

//===================================================================================================================================
//                                              GAUSS-JORDAN ELIMINATION CODE
//===================================================================================================================================

// From my blog post: https://blog.demofox.org/2017/04/10/solving-n-equations-and-n-unknowns-the-fine-print-gauss-jordan-elimination/

// Make a specific row have a 1 in the colIndex, and make all other rows have 0 there
template <size_t M, size_t N>
bool MakeRowClaimVariable (TMatrix<M, N>& matrix, size_t rowIndex, size_t colIndex)
{
    // Find a row that has a non zero value in this column and swap it with this row
    {
        // Find a row that has a non zero value
        size_t nonZeroRowIndex = rowIndex;
        while (nonZeroRowIndex < M && matrix[nonZeroRowIndex][colIndex].IsZero())
            ++nonZeroRowIndex;
 
        // If there isn't one, nothing to do
        if (nonZeroRowIndex == M)
            return false;
 
        // Otherwise, swap the row
        if (rowIndex != nonZeroRowIndex)
            std::swap(matrix[rowIndex], matrix[nonZeroRowIndex]);
    }
 
    // Scale this row so that it has a leading one
    CRationalNumber scale = matrix[rowIndex][colIndex].Reciprocal();
	for (size_t normalizeColIndex = colIndex; normalizeColIndex < N; ++normalizeColIndex)
	{
		matrix[rowIndex][normalizeColIndex] *= scale;
		matrix[rowIndex][normalizeColIndex].Reduce();
	}
 
    // Make sure all rows except this one have a zero in this column.
    // Do this by subtracting this row from other rows, multiplied by a multiple that makes the column disappear.
    for (size_t eliminateRowIndex = 0; eliminateRowIndex < M; ++eliminateRowIndex)
    {
        if (eliminateRowIndex == rowIndex)
            continue;
 
        CRationalNumber scale = matrix[eliminateRowIndex][colIndex];
		for (size_t eliminateColIndex = 0; eliminateColIndex < N; ++eliminateColIndex)
		{
			matrix[eliminateRowIndex][eliminateColIndex] -= matrix[rowIndex][eliminateColIndex] * scale;
			matrix[eliminateRowIndex][eliminateColIndex].Reduce();
		}
    }
 
    return true;
}
 
// make matrix into reduced row echelon form
template <size_t M, size_t N>
void GaussJordanElimination (TMatrix<M, N>& matrix)
{
    size_t rowIndex = 0;
    for (size_t colIndex = 0; colIndex < N; ++colIndex)
    {
        if (MakeRowClaimVariable(matrix, rowIndex, colIndex))
        {
            ++rowIndex;
            if (rowIndex == M)
                return;
        }
    }
}

//===================================================================================================================================
//                                                           Shared Testing Code
//===================================================================================================================================

template <size_t M, size_t N, typename LAMBDA>
void PrintEquations (
	TMatrix<M, N>& augmentedMatrix,
	size_t numPixels,
	LAMBDA& pixelIndexToCoordinates
)
{
	char pixelCoords[10];

#if SHOW_MATHJAX_MATRIX()
	// print the matrix opening stuff
	printf("\left[\begin{array}{");
	for (size_t i = 0; i < N; ++i)
	{
		if (i == numPixels)
			printf("|");
		printf("r");
	}
	printf("}n");
	// print the header row
	for (size_t i = 0; i < numPixels; ++i)
	{
		pixelIndexToCoordinates(i, pixelCoords);
		if (i == 0)
			printf("P_{%s}", pixelCoords);
		else
			printf(" & P_{%s}", pixelCoords);
	}
	for (size_t i = numPixels; i < N; ++i)
	{
		printf(" & C_{%zu}", i-numPixels);
	}
	printf(" \\n");

	// Print the matrix
	for (const TVector<N>& row : augmentedMatrix)
	{
		bool first = true;
		for (const CRationalNumber& n : row)
		{
			if (first)
				first = false;
			else
				printf(" & ");

			if (n.IsWholeNumber())
				printf("%i", n.m_numerator);
			else
				printf("\frac{%i}{%i}", n.m_numerator, n.m_denominator);
		}
		printf(" \\n");
	}

	// print the matrix closing stuff
	printf("\end{array}\right]n");
#endif

	// print equations
	for (const TVector<N>& row : augmentedMatrix)
	{
		// indent
		#if SHOW_MATHJAX_EQUATIONS() == 0
			printf("    ");
		#endif

		// left side of the equation
		bool leftHasATerm = false;
		for (size_t i = 0; i < numPixels; ++i)
		{
			if (!row[i].IsZero())
			{
				if (leftHasATerm)
					printf(" + ");
				pixelIndexToCoordinates(i, pixelCoords);

				#if SHOW_MATHJAX_EQUATIONS()
					if (row[i].IsOne())
						printf("P_{%s}", pixelCoords);
					else if (row[i].IsMinusOne())
						printf("-P_{%s}", pixelCoords);
					else if (row[i].IsWholeNumber())
						printf("%iP_{%s}", row[i].m_numerator, pixelCoords);
					else if (row[i].m_numerator == 1)
						printf("P_{%s}/%i", pixelCoords, row[i].m_denominator);
					else
						printf("P_{%s} * %i/%i", pixelCoords, row[i].m_numerator, row[i].m_denominator);
				#else
					if (row[i].IsOne())
						printf("P%s", pixelCoords);
					else if (row[i].IsMinusOne())
						printf("-P%s", pixelCoords);
					else if (row[i].IsWholeNumber())
						printf("%iP%s", row[i].m_numerator, pixelCoords);
					else if (row[i].m_numerator == 1)
						printf("P%s/%i", pixelCoords, row[i].m_denominator);
					else
						printf("P%s * %i/%i", pixelCoords, row[i].m_numerator, row[i].m_denominator);
				#endif
				leftHasATerm = true;
			}
		}
		if (!leftHasATerm)
			printf("0 = ");
		else
			printf(" = ");

		// right side of the equation
		bool rightHasATerm = false;
		for (size_t i = numPixels; i < N; ++i)
		{
			if (!row[i].IsZero())
			{
				if (rightHasATerm)
					printf(" + ");

				#if SHOW_MATHJAX_EQUATIONS()
					if (row[i].IsOne())
						printf("C_{%zu}", i - numPixels);
					else if (row[i].IsMinusOne())
						printf("-C_{%zu}", i - numPixels);
					else if (row[i].IsWholeNumber())
						printf("%iC_{%zu}", row[i].m_numerator, i - numPixels);
					else if (row[i].m_numerator == 1)
						printf("C_{%zu}/%i", i - numPixels, row[i].m_denominator);
					else
						printf("C_{%zu} * %i/%i", i - numPixels, row[i].m_numerator, row[i].m_denominator);
				#else
					if (row[i].IsOne())
						printf("C%zu", i - numPixels);
					else if (row[i].IsMinusOne())
						printf("-C%zu", i - numPixels);
					else if (row[i].IsWholeNumber())
						printf("%iC%zu", row[i].m_numerator, i - numPixels);
					else if (row[i].m_numerator == 1)
						printf("C%zu/%i", i - numPixels, row[i].m_denominator);
					else
						printf("C%zu * %i/%i", i - numPixels, row[i].m_numerator, row[i].m_denominator);
				#endif
				rightHasATerm = true;
			}
		}

		#if SHOW_MATHJAX_EQUATIONS()
			printf("\\n");
		#else
			printf("n");
		#endif
	}
}

template <size_t M, size_t N, typename LAMBDA>
bool SolveMatrixAndPrintEquations (
	TMatrix<M, N>& augmentedMatrix,
	size_t numPixels,
	std::unordered_set<size_t>& freeVariables,
	LAMBDA& pixelIndexToCoordinates
)
{
	#if SHOW_EQUATIONS_BEFORE_SOLVE()
	printf("   Initial Equations:n");
	PrintEquations(augmentedMatrix, numPixels, pixelIndexToCoordinates);
	printf("   Solved Equations:n");
	#endif

	// put augmented matrix into rref
	GaussJordanElimination(augmentedMatrix);

	// Print equations
	PrintEquations(augmentedMatrix, numPixels, pixelIndexToCoordinates);

	// Get free variables and check for control point constraint
	bool constraintFound = false;
	for (const TVector<N>& row : augmentedMatrix)
	{
		bool leftHasATerm = false;
		for (size_t i = 0; i < numPixels; ++i)
		{
			if (!row[i].IsZero())
			{
				if (leftHasATerm)
					freeVariables.insert(i);
				else
					leftHasATerm = true;
			}
		}

		bool rightHasATerm = false;
		for (size_t i = numPixels; i < N; ++i)
		{
			if (!row[i].IsZero())
				rightHasATerm = true;
		}

		if (!leftHasATerm && rightHasATerm)
			constraintFound = true;
	}

	printf("  %zu free variables.n", freeVariables.size());

	if (constraintFound)
	{
		printf("  Constraint Found.  This configuration doesn't work for the general case!nn");
		return false;
	}

	return true;
}

float lerp (float t, float a, float b)
{
	return a * (1.0f - t) + b * t;
}

template <size_t NUMPIXELS, size_t NUMCONTROLPOINTS, size_t NUMEQUATIONS>
void FillInPixelsAndControlPoints (
	std::array<float, NUMPIXELS>& pixels,
	std::array<float, NUMCONTROLPOINTS>& controlPoints,
	const TMatrix<NUMEQUATIONS, NUMPIXELS+ NUMCONTROLPOINTS>& augmentedMatrix,
	const std::unordered_set<size_t>& freeVariables)
{
	// come up with random values for the control points and free variable pixels
	static std::random_device rd;
	static std::mt19937 mt(rd());
	static std::uniform_real_distribution<float> dist(-10.0f, 10.0f);
	for (float& cp : controlPoints)
		cp = dist(mt);
	for (size_t var : freeVariables)
		pixels[var] = dist(mt);

	// fill in the non free variable pixels per the equations
	for (const TVector<NUMPIXELS + NUMCONTROLPOINTS>& row : augmentedMatrix)
	{
		// the first non zero value is the non free pixel we need to set.
		// all other non zero values are free variables that we previously calculated values for
		bool foundPixel = false;
		size_t pixelIndex = 0;
		for (size_t i = 0; i < NUMPIXELS; ++i)
		{
			if (!row[i].IsZero())
			{
				// we are setting the first pixel we find
				if (!foundPixel)
				{
					pixelIndex = i;
					foundPixel = true;
				}
				// subtract out all free variables which is the same as moving them to the right side of the equation
				else
				{
					pixels[pixelIndex] -= pixels[i] * float(row[i].m_numerator) / float(row[i].m_denominator);
				}
			}
		}

		// if there is no pixel value to set on the left side of the equation, ignore this row
		if (!foundPixel)
			continue;

		// add in the values from the right side of the equation
		for (size_t i = NUMPIXELS; i < NUMPIXELS + NUMCONTROLPOINTS; ++i)
		{
			if (!row[i].IsZero())
				pixels[pixelIndex] += controlPoints[i - NUMPIXELS] * float(row[i].m_numerator) / float(row[i].m_denominator);
		}
	}
}

size_t TextureCoordinateToPixelIndex2d (size_t width, size_t height, size_t y, size_t x)
{
	return y * width + x;
};

void PixelIndexToTextureCoordinate2d (size_t width, size_t height, size_t pixelIndex, size_t& y, size_t& x)
{
	x = pixelIndex % width;
	y = pixelIndex / width;
}

size_t TextureCoordinateToPixelIndex3d (size_t width, size_t height, size_t depth, size_t z, size_t y, size_t x)
{
	return 
		z * width * height + 
		y * width +
		x;
};

void PixelIndexToTextureCoordinate3d (size_t width, size_t height, size_t depth, size_t pixelIndex, size_t& z, size_t& y, size_t& x)
{
	x = pixelIndex % width;

	pixelIndex = pixelIndex / width;

	y = pixelIndex % height;

	pixelIndex = pixelIndex / height;

	z = pixelIndex;
}

void PiecewiseCurveTime (float time, size_t numCurves, size_t& outCurveIndex, float& outTime)
{
	time *= float(numCurves);
	outCurveIndex = size_t(time);

	if (outCurveIndex == numCurves)
	{
		outCurveIndex = numCurves - 1;
		outTime = 1.0f;
	}
	else
	{
		outTime = std::fmodf(time, 1.0f);
	}
}



//===================================================================================================================================
//                                                       2D Textures / Quadratic Curves
//===================================================================================================================================
//
// Find the limitations of this pattern and show equivalence to Bernstein Polynomials (Bezier Curve Equations). Pattern details below.
//
//  --- For first curve, do:
//
//  P00 P01
//  P10 P11
//
//  P00 = C0                        0
//  P01 + P10 = 2 * C1              1 2
//  P11 = C2                        3
//
//  --- For each additional curve, add two points to the end like this:
//
//  P00 P01
//  P10 P11
//  P20 P21
//
//  P00 = C0                        0
//  P01 + P10 = 2 * C1              1 2
//  P11 = C2                        3
//
//  P10 = C3                        1
//  P11 + P20 = 2 * C4              3 4
//  P21 = C5                        5
//
//  and so on...
//  each equation is then multiplied by a value so the right side is identity and left side coefficients add up to 1.
//
//  --- Other details:
//  
//  * 3 control points per curve.
//  * image width it 2
//  * image height is 1 + NumCurves.
//  * there are 3 equations per curve, so 3 rows in the augmented matrix per curve.
//  * augmented matrix columns = num pixels (left columns) + num control points (right columns)
//

template <size_t N>
float EvaluateBernsteinPolynomial2DQuadratic (float totalTime, const std::array<float, N>& coefficients)
{
	const size_t c_numCurves = N / 3;

	float t;
	size_t startCurve;
	PiecewiseCurveTime(totalTime, c_numCurves, startCurve, t);

	size_t offset = startCurve * 3;

	float s = 1.0f - t;
	return
		coefficients[offset + 0] * s * s +
		coefficients[offset + 1] * s * t * 2.0f +
		coefficients[offset + 2] * t * t;
}

template <size_t N>
float EvaluateLinearInterpolation2DQuadratic (float totalTime, const std::array<float, N>& pixels)
{
	const size_t c_numCurves = (N / 2) - 1;

	float t;
	size_t startRow;
	PiecewiseCurveTime(totalTime, c_numCurves, startRow, t);

	float row0 = lerp(t, pixels[startRow * 2], pixels[startRow * 2 + 1]);
	float row1 = lerp(t, pixels[(startRow + 1) * 2], pixels[(startRow + 1) * 2 + 1]);
	return lerp(t, row0, row1);
}

template <size_t NUMCURVES>
void Test2DQuadratic ()
{
	const size_t c_imageWidth = 2;
	const size_t c_imageHeight = NUMCURVES + 1;
	const size_t c_numPixels = c_imageWidth * c_imageHeight;
	const size_t c_numControlPoints = NUMCURVES * 3;
	const size_t c_numEquations = NUMCURVES * 3;

	// report values for this test
	printf("  %zu curves.  %zu control points.  2x%zu texture = %zu pixels.n", NUMCURVES, c_numControlPoints, c_imageHeight, c_numPixels);
	printf("  %f pixels per curve.  %f pixels per control point.n", float(c_numPixels) / float(NUMCURVES), float(c_numPixels) / float(c_numControlPoints));

	// lambdas to convert between pixel index and texture coordinates
	auto TextureCoordinateToPixelIndex = [&](size_t y, size_t x) -> size_t
	{
		return TextureCoordinateToPixelIndex2d(c_imageWidth, c_imageHeight, y, x);
	};
	auto pixelIndexToCoordinates = [&](size_t pixelIndex, char pixelCoords[10])
	{
		size_t y, x;
		PixelIndexToTextureCoordinate2d(c_imageWidth, c_imageHeight, pixelIndex, y, x);
		sprintf(pixelCoords, "%zu%zu", y, x);
	};

	// create the equations
	TMatrix<c_numEquations, c_numPixels + c_numControlPoints> augmentedMatrix;
	for (size_t i = 0; i < c_numEquations; ++i)
	{
		TVector<c_numPixels + c_numControlPoints>& row = augmentedMatrix[i];

		// left side of the equation goes in this yx coordinate pattern:
		//   00 
		//   01 10
		//   11
		// But, curve index is added to the y index.
		// Also, left side coefficients must add up to 1.
		size_t curveIndex = i / 3;
		switch (i % 3)
		{
			case 0:
			{
				row[TextureCoordinateToPixelIndex(curveIndex + 0, 0)] = CRationalNumber(1, 1);
				break;
			}
			case 1:
			{
				row[TextureCoordinateToPixelIndex(curveIndex + 0, 1)] = CRationalNumber(1, 2);
				row[TextureCoordinateToPixelIndex(curveIndex + 1, 0)] = CRationalNumber(1, 2);
				break;
			}
			case 2:
			{
				row[TextureCoordinateToPixelIndex(curveIndex + 1, 1)] = CRationalNumber(1, 1);
				break;
			}
		}

		// right side of the equation is identity
		row[c_numPixels + i] = CRationalNumber(1);
	}

	// solve the matrix if possible and print out the equations
	std::unordered_set<size_t> freeVariables;
	if (!SolveMatrixAndPrintEquations(augmentedMatrix, c_numPixels, freeVariables, pixelIndexToCoordinates))
		return;

	// Next we need to show equality between the N-linear interpolation of our pixels and bernstein polynomials with our control points as coefficients

	// Fill in random values for our control points and free variable pixels, and fill in the other pixels as the equations dictate 
	std::array<float, c_numPixels> pixels = { 0 };
	std::array<float, c_numControlPoints> controlPoints = { 0 };
	FillInPixelsAndControlPoints<c_numPixels, c_numControlPoints, c_numEquations>(pixels, controlPoints, augmentedMatrix, freeVariables);

	// do a number of samples of each method at the same time values, and report the largest difference (error)
	float largestDifference = 0.0f;
	for (size_t i = 0; i < EQUALITY_TEST_SAMPLES; ++i)
	{
		float t = float(i) / float(EQUALITY_TEST_SAMPLES - 1);

		float value1 = EvaluateBernsteinPolynomial2DQuadratic(t, controlPoints);
		float value2 = EvaluateLinearInterpolation2DQuadratic(t, pixels);

		largestDifference = std::max(largestDifference, std::abs(value1 - value2));
	}
	printf("  %i Samples, Largest Error = %fnn", EQUALITY_TEST_SAMPLES, largestDifference);
}

void Test2DQuadratics ()
{
	printf("Testing 2D Textures / Quadratic Curvesnn");

	Test2DQuadratic<1>();
	Test2DQuadratic<2>();
	Test2DQuadratic<3>();

	system("pause");
}

//===================================================================================================================================
//                                    2D Textures / Quadratic Curves With C0 Continuity
//===================================================================================================================================
//
// Find the limitations of this pattern and show equivalence to Bernstein Polynomials (Bezier Curve Equations). Pattern details below.
//
//  --- For first curve, do:
//
//  P00 P01
//  P10 P11
//
//  P00 = C0                        0
//  P01 + P10 = 2 * C1              1 2
//  P11 = C2                        3
//
//  --- For second curve, do:
//
//  P00 P01
//  P10 P11
//  P20 P21
//
//  P00 = C0                        0
//  P01 + P10 = 2 * C1              1 2
//  P11 = C2                        3
//
//  P10 + P21 = 2 * C3              2 5
//  P20 = C4                        4
//
//  --- For third curve, do:
//
//  P00 P01
//  P10 P11
//  P20 P21
//  P30 P31
//
//  P00 = C0
//  P01 + P10 = 2 * C1
//  P11 = C2
//
//  P10 + P21 = 2 * C3
//  P20 = C4
//
//  P21 + P30 = 2 * C5
//  P31 = C6
//
//  and so on...
//  each equation is then multiplied by a value so the right side is identity and left side coefficients add up to 1.
//
//  --- Other details:
//  
//  * control points: 1 + NumCurves*2.
//  * image width it 2
//  * image height is 1 + NumCurves.
//  * equations: 1 + NumCurves*2.  This many rows in the augmented matrix.
//  * augmented matrix columns = num pixels (left columns) + num control points (right columns)
//

template <size_t N>
float EvaluateBernsteinPolynomial2DQuadraticC0 (float totalTime, const std::array<float, N>& coefficients)
{
	const size_t c_numCurves = (N - 1) / 2;

	float t;
	size_t startCurve;
	PiecewiseCurveTime(totalTime, c_numCurves, startCurve, t);

	size_t offset = startCurve * 2;

	float s = 1.0f - t;
	return
		coefficients[offset + 0] * s * s +
		coefficients[offset + 1] * s * t * 2.0f +
		coefficients[offset + 2] * t * t;
}

template <size_t N>
float EvaluateLinearInterpolation2DQuadraticC0 (float totalTime, const std::array<float, N>& pixels)
{
	const size_t c_numCurves = (N / 2) - 1;

	float t;
	size_t startRow;
	PiecewiseCurveTime(totalTime, c_numCurves, startRow, t);

	// Note we flip x axis direction every odd row to get the zig zag
	float horizT = (startRow % 2) == 0 ? t : 1.0f - t;

	float row0 = lerp(horizT, pixels[startRow * 2], pixels[startRow * 2 + 1]);
	++startRow;
	float row1 = lerp(horizT, pixels[startRow * 2], pixels[startRow * 2 + 1]);
	return lerp(t, row0, row1);
}

template <size_t NUMCURVES>
void Test2DQuadraticC0 ()
{
	const size_t c_imageWidth = 2;
	const size_t c_imageHeight = NUMCURVES + 1;
	const size_t c_numPixels = c_imageWidth * c_imageHeight;
	const size_t c_numControlPoints = 1 + NUMCURVES * 2;
	const size_t c_numEquations = 1 + NUMCURVES * 2;

	// report values for this test
	printf("  %zu curves.  %zu control points.  2x%zu texture = %zu pixels.n", NUMCURVES, c_numControlPoints, c_imageHeight, c_numPixels);
	printf("  %f pixels per curve.  %f pixels per control point.n", float(c_numPixels) / float(NUMCURVES), float(c_numPixels) / float(c_numControlPoints));

	// lambdas to convert between pixel index and texture coordinates
	auto TextureCoordinateToPixelIndex = [&] (size_t y, size_t x) -> size_t
	{
		return TextureCoordinateToPixelIndex2d(c_imageWidth, c_imageHeight, y, x);
	};
	auto pixelIndexToCoordinates = [&] (size_t pixelIndex, char pixelCoords[10])
	{
		size_t y, x;
		PixelIndexToTextureCoordinate2d(c_imageWidth, c_imageHeight, pixelIndex, y, x);
		sprintf(pixelCoords, "%zu%zu", y, x);
	};

	// create the equations
	TMatrix<c_numEquations, c_numPixels + c_numControlPoints> augmentedMatrix;
	for (size_t i = 0; i < c_numEquations; ++i)
	{
		TVector<c_numPixels + c_numControlPoints>& row = augmentedMatrix[i];

		// left side of the equation has a pattern like this:
		//   00
		//   01 10
		//
		// But, pattern index is added to the y index.
		// Also, the x coordinates flip from 0 to 1 on those after each pattern.
		// Also, left side coefficients must add up to 1.

		size_t patternIndex = i / 2;
		size_t xoff = patternIndex % 2 == 1;
		size_t xon = patternIndex % 2 == 0;
		switch (i % 2)
		{
			case 0:
			{
				row[TextureCoordinateToPixelIndex(patternIndex + 0, xoff)] = CRationalNumber(1, 1);
				break;
			}
			case 1:
			{
				row[TextureCoordinateToPixelIndex(patternIndex + 0, xon)] = CRationalNumber(1, 2);
				row[TextureCoordinateToPixelIndex(patternIndex + 1, xoff)] = CRationalNumber(1, 2);
				break;
			}
		}

		// right side of the equation is identity
		row[c_numPixels + i] = CRationalNumber(1);
	}
	
	// solve the matrix if possible and print out the equations
	std::unordered_set<size_t> freeVariables;
	if (!SolveMatrixAndPrintEquations(augmentedMatrix, c_numPixels, freeVariables, pixelIndexToCoordinates))
		return;

	// Next we need to show equality between the N-linear interpolation of our pixels and bernstein polynomials with our control points as coefficients

	// Fill in random values for our control points and free variable pixels, and fill in the other pixels as the equations dictate 
	std::array<float, c_numPixels> pixels = { 0 };
	std::array<float, c_numControlPoints> controlPoints = { 0 };
	FillInPixelsAndControlPoints<c_numPixels, c_numControlPoints, c_numEquations>(pixels, controlPoints, augmentedMatrix, freeVariables);

	// do a number of samples of each method at the same time values, and report the largest difference (error)
	float largestDifference = 0.0f;
	for (size_t i = 0; i < EQUALITY_TEST_SAMPLES; ++i)
	{
		float t = float(i) / float(EQUALITY_TEST_SAMPLES - 1);

		float value1 = EvaluateBernsteinPolynomial2DQuadraticC0(t, controlPoints);
		float value2 = EvaluateLinearInterpolation2DQuadraticC0(t, pixels);

		largestDifference = std::max(largestDifference, std::abs(value1 - value2));
	}
	printf("  %i Samples, Largest Error = %fnn", EQUALITY_TEST_SAMPLES, largestDifference);
}

void Test2DQuadraticsC0 ()
{
	printf("nTesting 2D Textures / Quadratic Curves with C0 continuitynn");

	Test2DQuadraticC0<1>();
	Test2DQuadraticC0<2>();
	Test2DQuadraticC0<3>();
	Test2DQuadraticC0<4>();

	system("pause");
}

//===================================================================================================================================
//                                             3D Textures / Cubic Curves
//===================================================================================================================================
//
// Find the limitations of this pattern and show equivalence to Bernstein Polynomials (Bezier Curve Equations). Pattern details below.
//
//  --- For first curve, do:
//
//  P000 P001    P100 P101
//  P010 P011    P110 P111
//
//  P000 = C0                       0
//  P001 + P010 + P100 = 3 * C1     1 2 4
//  P011 + P101 + P110 = 3 * C2     3 5 6
//  P111 = C3                       7
//
//  --- For second curve, do:
//
//  P000 P001    P100 P101
//  P010 P011    P110 P111
//  P020 P021    P120 P121
//
//  P000 = C0                       0
//  P001 + P010 + P100 = 3 * C1     1 2 4
//  P011 + P101 + P110 = 3 * C2     3 7 8
//  P111 = C3                       9
//
//  P010 = C4                       2
//  P011 + P020 + P110 = 3 * C5     3 4 8
//  P021 + P111 + P120 = 3 * C6     5 9 10
//  P121 = C7                       11
//
//  and so on...
//  each equation is then multiplied by a value so the right side is identity and left side coefficients add up to 1.
//
//  --- Other details:
//  
//  * control points: 4 * NumCurves.
//  * image width it 2
//  * image depth is 2
//  * image height is 1 + NumCurves.
//  * equations: 4 * NumCurves.  This many rows in the augmented matrix.
//  * augmented matrix columns = num pixels (left columns) + num control points (right columns)
//

template <size_t N>
float EvaluateBernsteinPolynomial3DCubic (float totalTime, const std::array<float, N>& coefficients)
{
	const size_t c_numCurves = N / 4;

	float t;
	size_t startCurve;
	PiecewiseCurveTime(totalTime, c_numCurves, startCurve, t);

	size_t offset = startCurve * 4;

	float s = 1.0f - t;
	return
		coefficients[offset + 0] * s * s * s +
		coefficients[offset + 1] * s * s * t * 3.0f +
		coefficients[offset + 2] * s * t * t * 3.0f +
		coefficients[offset + 3] * t * t * t;
}

template <size_t N, typename LAMBDA>
float EvaluateLinearInterpolation3DCubic (float totalTime, const std::array<float, N>& pixels, LAMBDA& TextureCoordinateToPixelIndex)
{
	const size_t c_numCurves = (N / 4) - 1;

	float t;
	size_t startRow;
	PiecewiseCurveTime(totalTime, c_numCurves, startRow, t);

	//    rowZYX
	float row00x = lerp(t, pixels[TextureCoordinateToPixelIndex(0, startRow + 0, 0)], pixels[TextureCoordinateToPixelIndex(0, startRow + 0, 1)]);
	float row01x = lerp(t, pixels[TextureCoordinateToPixelIndex(0, startRow + 1, 0)], pixels[TextureCoordinateToPixelIndex(0, startRow + 1, 1)]);
	float row0yx = lerp(t, row00x, row01x);

	float row10x = lerp(t, pixels[TextureCoordinateToPixelIndex(1, startRow + 0, 0)], pixels[TextureCoordinateToPixelIndex(1, startRow + 0, 1)]);
	float row11x = lerp(t, pixels[TextureCoordinateToPixelIndex(1, startRow + 1, 0)], pixels[TextureCoordinateToPixelIndex(1, startRow + 1, 1)]);
	float row1yx = lerp(t, row10x, row11x);

	return lerp(t, row0yx, row1yx);
}

template <size_t NUMCURVES>
void Test3DCubic ()
{
	const size_t c_imageWidth = 2;
	const size_t c_imageHeight = NUMCURVES + 1;
	const size_t c_imageDepth = 2;
	const size_t c_numPixels = c_imageWidth * c_imageHeight * c_imageDepth;
	const size_t c_numControlPoints = NUMCURVES * 4;
	const size_t c_numEquations = NUMCURVES * 4;

	// report values for this test
	printf("  %zu curves.  %zu control points.  2x%zux2 texture = %zu pixels.n", NUMCURVES, c_numControlPoints, c_imageHeight, c_numPixels);
	printf("  %f pixels per curve.  %f pixels per control point.n", float(c_numPixels) / float(NUMCURVES), float(c_numPixels) / float(c_numControlPoints));

	// lambdas to convert between pixel index and texture coordinates
	auto TextureCoordinateToPixelIndex = [&] (size_t z, size_t y, size_t x) -> size_t
	{
		return TextureCoordinateToPixelIndex3d(c_imageWidth, c_imageHeight, c_imageDepth, z, y, x);
	};
	auto pixelIndexToCoordinates = [&] (size_t pixelIndex, char pixelCoords[10])
	{
		size_t z, y, x;
		PixelIndexToTextureCoordinate3d(c_imageWidth, c_imageHeight, c_imageDepth, pixelIndex, z, y, x);
		sprintf(pixelCoords, "%zu%zu%zu", z,y,x);
	};

	// create the equations
	TMatrix<c_numEquations, c_numPixels + c_numControlPoints> augmentedMatrix;
	for (size_t i = 0; i < c_numEquations; ++i)
	{
		TVector<c_numPixels + c_numControlPoints>& row = augmentedMatrix[i];

		// left side of the equation goes in this zyx coordinate pattern:
		//   000 
		//   001 010 100 
		//   011 101 110
		//   111
		// But, curve index is added to the y index.
		// Also, left side coefficients must add up to 1.
		size_t curveIndex = i / 4;
		switch (i % 4)
		{
			case 0:
			{
				row[TextureCoordinateToPixelIndex(0, curveIndex + 0, 0)] = CRationalNumber(1, 1);
				break;
			}
			case 1:
			{
				row[TextureCoordinateToPixelIndex(0, curveIndex + 0, 1)] = CRationalNumber(1, 3);
				row[TextureCoordinateToPixelIndex(0, curveIndex + 1, 0)] = CRationalNumber(1, 3);
				row[TextureCoordinateToPixelIndex(1, curveIndex + 0, 0)] = CRationalNumber(1, 3);
				break;
			}
			case 2:
			{
				row[TextureCoordinateToPixelIndex(0, curveIndex + 1, 1)] = CRationalNumber(1, 3);
				row[TextureCoordinateToPixelIndex(1, curveIndex + 0, 1)] = CRationalNumber(1, 3);
				row[TextureCoordinateToPixelIndex(1, curveIndex + 1, 0)] = CRationalNumber(1, 3);
				break;
			}
			case 3:
			{
				row[TextureCoordinateToPixelIndex(1, curveIndex + 1, 1)] = CRationalNumber(1, 1);
				break;
			}
		}

		// right side of the equation is identity
		row[c_numPixels + i] = CRationalNumber(1);
	}

	// solve the matrix if possible and print out the equations
	std::unordered_set<size_t> freeVariables;
	if (!SolveMatrixAndPrintEquations(augmentedMatrix, c_numPixels, freeVariables, pixelIndexToCoordinates))
		return;

	// Next we need to show equality between the N-linear interpolation of our pixels and bernstein polynomials with our control points as coefficients

	// Fill in random values for our control points and free variable pixels, and fill in the other pixels as the equations dictate 
	std::array<float, c_numPixels> pixels = { 0 };
	std::array<float, c_numControlPoints> controlPoints = { 0 };
	FillInPixelsAndControlPoints<c_numPixels, c_numControlPoints, c_numEquations>(pixels, controlPoints, augmentedMatrix, freeVariables);

	// do a number of samples of each method at the same time values, and report the largest difference (error)
	float largestDifference = 0.0f;
	for (size_t i = 0; i < EQUALITY_TEST_SAMPLES; ++i)
	{
		float t = float(i) / float(EQUALITY_TEST_SAMPLES - 1);

		float value1 = EvaluateBernsteinPolynomial3DCubic(t, controlPoints);
		float value2 = EvaluateLinearInterpolation3DCubic(t, pixels, TextureCoordinateToPixelIndex);

		largestDifference = std::max(largestDifference, std::abs(value1 - value2));
	}
	printf("  %i Samples, Largest Error = %fnn", EQUALITY_TEST_SAMPLES, largestDifference);
}

void Test3DCubics ()
{
	printf("nTesting 3D Textures / Cubic Curvesnn");

	Test3DCubic<1>();
	Test3DCubic<2>();
	Test3DCubic<3>();
	Test3DCubic<4>();

	system("pause");
}

//===================================================================================================================================
//                                         3D Textures / Cubic Curves Multiple Curves
//===================================================================================================================================
//
// Find the limitations of this pattern and show equivalence to Bernstein Polynomials (Bezier Curve Equations). Pattern details below.
//
// This is the same as 3D Textures / Cubic Curves, but there is a second curve stored by flipping x coordinates.
//
//  --- Other details:
//  
//  * control points: 4 * NumCurves.
//  * image width it 2
//  * image depth is 2
//  * image height is 1 + (NumCurves/2).
//  * equations: 4 * NumCurves.  This many rows in the augmented matrix.
//  * augmented matrix columns = num pixels (left columns) + num control points (right columns)
//

template <size_t HALFNUMCURVES>
void Test3DCubicMulti ()
{
	const size_t NUMCURVES = HALFNUMCURVES * 2;
	const size_t c_imageWidth = 2;
	const size_t c_imageHeight = HALFNUMCURVES + 1;
	const size_t c_imageDepth = 2;
	const size_t c_numPixels = c_imageWidth * c_imageHeight * c_imageDepth;
	const size_t c_numControlPoints = NUMCURVES * 4;
	const size_t c_numEquations = NUMCURVES * 4;

	// report values for this test
	printf("  %zu curves.  %zu control points.  2x%zux2 texture = %zu pixels.n", NUMCURVES, c_numControlPoints, c_imageHeight, c_numPixels);
	printf("  %f pixels per curve.  %f pixels per control point.n", float(c_numPixels) / float(NUMCURVES), float(c_numPixels) / float(c_numControlPoints));

	// lambdas to convert between pixel index and texture coordinates
	auto TextureCoordinateToPixelIndex = [&] (size_t z, size_t y, size_t x) -> size_t
	{
		return TextureCoordinateToPixelIndex3d(c_imageWidth, c_imageHeight, c_imageDepth, z, y, x);
	};
	auto pixelIndexToCoordinates = [&] (size_t pixelIndex, char pixelCoords[10])
	{
		size_t z, y, x;
		PixelIndexToTextureCoordinate3d(c_imageWidth, c_imageHeight, c_imageDepth, pixelIndex, z, y, x);
		sprintf(pixelCoords, "%zu%zu%zu", z,y,x);
	};

	// create the first set of equations
	TMatrix<c_numEquations, c_numPixels + c_numControlPoints> augmentedMatrix;
	for (size_t i = 0; i < c_numEquations / 2; ++i)
	{
		TVector<c_numPixels + c_numControlPoints>& row = augmentedMatrix[i];

		// left side of the equation goes in this zyx coordinate pattern:
		//   000 
		//   001 010 100 
		//   011 101 110
		//   111
		// But, curve index is added to the y index.
		// Also, left side coefficients must add up to 1.
		size_t curveIndex = i / 4;
		switch (i % 4)
		{
			case 0:
			{
				row[TextureCoordinateToPixelIndex(0, curveIndex + 0, 0)] = CRationalNumber(1, 1);
				break;
			}
			case 1:
			{
				row[TextureCoordinateToPixelIndex(0, curveIndex + 0, 1)] = CRationalNumber(1, 3);
				row[TextureCoordinateToPixelIndex(0, curveIndex + 1, 0)] = CRationalNumber(1, 3);
				row[TextureCoordinateToPixelIndex(1, curveIndex + 0, 0)] = CRationalNumber(1, 3);
				break;
			}
			case 2:
			{
				row[TextureCoordinateToPixelIndex(0, curveIndex + 1, 1)] = CRationalNumber(1, 3);
				row[TextureCoordinateToPixelIndex(1, curveIndex + 0, 1)] = CRationalNumber(1, 3);
				row[TextureCoordinateToPixelIndex(1, curveIndex + 1, 0)] = CRationalNumber(1, 3);
				break;
			}
			case 3:
			{
				row[TextureCoordinateToPixelIndex(1, curveIndex + 1, 1)] = CRationalNumber(1, 1);
				break;
			}
		}

		// right side of the equation is identity
		row[c_numPixels + i] = CRationalNumber(1);
	}

	// create the second set of equations
	for (size_t i = 0; i < c_numEquations / 2; ++i)
	{
		TVector<c_numPixels + c_numControlPoints>& row = augmentedMatrix[i + c_numEquations / 2];

		// left side of the equation goes in this zyx coordinate pattern, which is the same as above but x axis flipped.
		//   001
		//   000 011 101 
		//   010 100 111
		//   110
		// But, curve index is added to the y index.
		// Also, left side coefficients must add up to 1.
		size_t curveIndex = i / 4;
		switch (i % 4)
		{
			case 0:
			{
				row[TextureCoordinateToPixelIndex(0, curveIndex + 0, 1)] = CRationalNumber(1, 1);
				break;
			}
			case 1:
			{
				row[TextureCoordinateToPixelIndex(0, curveIndex + 0, 0)] = CRationalNumber(1, 3);
				row[TextureCoordinateToPixelIndex(0, curveIndex + 1, 1)] = CRationalNumber(1, 3);
				row[TextureCoordinateToPixelIndex(1, curveIndex + 0, 1)] = CRationalNumber(1, 3);
				break;
			}
			case 2:
			{
				row[TextureCoordinateToPixelIndex(0, curveIndex + 1, 0)] = CRationalNumber(1, 3);
				row[TextureCoordinateToPixelIndex(1, curveIndex + 0, 0)] = CRationalNumber(1, 3);
				row[TextureCoordinateToPixelIndex(1, curveIndex + 1, 1)] = CRationalNumber(1, 3);
				break;
			}
			case 3:
			{
				row[TextureCoordinateToPixelIndex(1, curveIndex + 1, 0)] = CRationalNumber(1, 1);
				break;
			}
		}

		// right side of the equation is identity
		row[c_numPixels + i + c_numEquations / 2] = CRationalNumber(1);
	}

	// solve the matrix if possible and print out the equations
	std::unordered_set<size_t> freeVariables;
	SolveMatrixAndPrintEquations(augmentedMatrix, c_numPixels, freeVariables, pixelIndexToCoordinates);
}

void Test3DCubicsMulti ()
{
	printf("nTesting 3D Textures / Cubic Curves with Multiple Curvesnn");

	Test3DCubicMulti<1>();

	system("pause");
}

//===================================================================================================================================
//                                       3D Textures / Cubic Curves With C0 Continuity
//===================================================================================================================================
//
// Find the limitations of this pattern and show equivalence to Bernstein Polynomials (Bezier Curve Equations). Pattern details below.
//
//  --- For first curve, do:
//
//  P000 P001    P100 P101
//  P010 P011    P110 P111
//
//  P000 = C0                       
//  P001 + P010 + P100 = 3 * C1     
//  P011 + P101 + P110 = 3 * C2     
//  P111 = C3                       
//
//  --- For second curve, do:
//
//  P000 P001    P100 P101
//  P010 P011    P110 P111
//  P020 P021    P120 P121
//
//  P000 = C0                       
//  P001 + P010 + P100 = 3 * C1     
//  P011 + P101 + P110 = 3 * C2     
//  P111 = C3                       
//                       
//  P011 + P110 + P121 = 3 * C4     
//  P010 + P021 + P110 = 3 * C5     
//  P020 = C6     
//
//  --- For third curve, do:
//
//  P000 P001    P100 P101
//  P010 P011    P110 P111
//  P020 P021    P120 P121
//  P030 P031    P130 P131
//
//  P000 = C0                       
//  P001 + P010 + P100 = 3 * C1     
//  P011 + P101 + P110 = 3 * C2     
//  P111 = C3                       
//                       
//  P011 + P110 + P121 = 3 * C4     
//  P010 + P021 + P110 = 3 * C5     
//  P020 = C6     
//
//  P021 + P030 + P120 = 3 * C7     
//  P031 + P121 + P130 = 3 * C8     
//  P131 = C9   
//
//  and so on...
//  each equation is then multiplied by a value so the right side is identity and left side coefficients add up to 1.
//
//  --- Other details:
//  
//  * control points: 1 + 3 * NumCurves.
//  * image width it 2
//  * image depth is 2
//  * image height is 1 + NumCurves.
//  * equations: 1 + 3 * NumCurves.  This many rows in the augmented matrix.
//  * augmented matrix columns = num pixels (left columns) + num control points (right columns)
//

template <size_t N>
float EvaluateBernsteinPolynomial3DCubicC0 (float totalTime, const std::array<float, N>& coefficients)
{
	const size_t c_numCurves = (N-1) / 3;

	float t;
	size_t startCurve;
	PiecewiseCurveTime(totalTime, c_numCurves, startCurve, t);

	size_t offset = startCurve * 3;

	float s = 1.0f - t;
	return
		coefficients[offset + 0] * s * s * s +
		coefficients[offset + 1] * s * s * t * 3.0f +
		coefficients[offset + 2] * s * t * t * 3.0f +
		coefficients[offset + 3] * t * t * t;
}

template <size_t N, typename LAMBDA>
float EvaluateLinearInterpolation3DCubicC0 (float totalTime, const std::array<float, N>& pixels, LAMBDA& TextureCoordinateToPixelIndex)
{
	const size_t c_numCurves = (N / 4) - 1;

	float t;
	size_t startRow;
	PiecewiseCurveTime(totalTime, c_numCurves, startRow, t);

	// Note we flip x and z axis direction every odd row to get the zig zag

	//    rowZYX
	float xzT = (startRow % 2) == 0 ? t : 1.0f - t;
	float row00x = lerp(xzT, pixels[TextureCoordinateToPixelIndex(0, startRow + 0, 0)], pixels[TextureCoordinateToPixelIndex(0, startRow + 0, 1)]);
	float row01x = lerp(xzT, pixels[TextureCoordinateToPixelIndex(0, startRow + 1, 0)], pixels[TextureCoordinateToPixelIndex(0, startRow + 1, 1)]);
	float row0yx = lerp(t, row00x, row01x);

	float row10x = lerp(xzT, pixels[TextureCoordinateToPixelIndex(1, startRow + 0, 0)], pixels[TextureCoordinateToPixelIndex(1, startRow + 0, 1)]);
	float row11x = lerp(xzT, pixels[TextureCoordinateToPixelIndex(1, startRow + 1, 0)], pixels[TextureCoordinateToPixelIndex(1, startRow + 1, 1)]);
	float row1yx = lerp(t, row10x, row11x);

	return lerp(xzT, row0yx, row1yx);
}

template <size_t NUMCURVES>
void Test3DCubicC0 ()
{

	const size_t c_imageWidth = 2;
	const size_t c_imageHeight = NUMCURVES + 1;
	const size_t c_imageDepth = 2;
	const size_t c_numPixels = c_imageWidth * c_imageHeight * c_imageDepth;
	const size_t c_numControlPoints = 1 + NUMCURVES * 3;
	const size_t c_numEquations = 1 + NUMCURVES * 3;

	// report values for this test
	printf("  %zu curves.  %zu control points.  2x%zux2 texture = %zu pixels.n", NUMCURVES, c_numControlPoints, c_imageHeight, c_numPixels);
	printf("  %f pixels per curve.  %f pixels per control point.n", float(c_numPixels) / float(NUMCURVES), float(c_numPixels) / float(c_numControlPoints));

	// lambdas to convert between pixel index and texture coordinates
	auto TextureCoordinateToPixelIndex = [&] (size_t z, size_t y, size_t x) -> size_t
	{
		return TextureCoordinateToPixelIndex3d(c_imageWidth, c_imageHeight, c_imageDepth, z, y, x);
	};
	auto pixelIndexToCoordinates = [&] (size_t pixelIndex, char pixelCoords[10])
	{
		size_t z, y, x;
		PixelIndexToTextureCoordinate3d(c_imageWidth, c_imageHeight, c_imageDepth, pixelIndex, z, y, x);
		sprintf(pixelCoords, "%zu%zu%zu", z,y,x);
	};

	// create the equations
	TMatrix<c_numEquations, c_numPixels + c_numControlPoints> augmentedMatrix;
	for (size_t i = 0; i < c_numEquations; ++i)
	{
		TVector<c_numPixels + c_numControlPoints>& row = augmentedMatrix[i];

		// left side of the equation has a pattern like this:
		//   000
		//   001 010 100
		//   011 101 110
		//
		// But, pattern index is added to the y index.
		// Also, the x and z coordinates flip from 0 to 1 on those after each pattern.
		// Also, left side coefficients must add up to 1.
		size_t patternIndex = i / 3;
		size_t xz0 = patternIndex % 2 == 1;
		size_t xz1 = patternIndex % 2 == 0;
		switch (i % 3)
		{
			case 0:
			{
				row[TextureCoordinateToPixelIndex(xz0, patternIndex + 0, xz0)] = CRationalNumber(1, 1);
				break;
			}
			case 1:
			{
				row[TextureCoordinateToPixelIndex(xz0, patternIndex + 0, xz1)] = CRationalNumber(1, 3);
				row[TextureCoordinateToPixelIndex(xz0, patternIndex + 1, xz0)] = CRationalNumber(1, 3);
				row[TextureCoordinateToPixelIndex(xz1, patternIndex + 0, xz0)] = CRationalNumber(1, 3);
				break;
			}
			case 2:
			{
				row[TextureCoordinateToPixelIndex(xz0, patternIndex + 1, xz1)] = CRationalNumber(1, 3);
				row[TextureCoordinateToPixelIndex(xz1, patternIndex + 0, xz1)] = CRationalNumber(1, 3);
				row[TextureCoordinateToPixelIndex(xz1, patternIndex + 1, xz0)] = CRationalNumber(1, 3);
				break;
			}
		}

		// right side of the equation is identity
		row[c_numPixels + i] = CRationalNumber(1);
	}

	// solve the matrix if possible and print out the equations
	std::unordered_set<size_t> freeVariables;
	if (!SolveMatrixAndPrintEquations(augmentedMatrix, c_numPixels, freeVariables, pixelIndexToCoordinates))
		return;

	// Next we need to show equality between the N-linear interpolation of our pixels and bernstein polynomials with our control points as coefficients

	// Fill in random values for our control points and free variable pixels, and fill in the other pixels as the equations dictate 
	std::array<float, c_numPixels> pixels = { 0 };
	std::array<float, c_numControlPoints> controlPoints = { 0 };
	FillInPixelsAndControlPoints<c_numPixels, c_numControlPoints, c_numEquations>(pixels, controlPoints, augmentedMatrix, freeVariables);

	// do a number of samples of each method at the same time values, and report the largest difference (error)
	float largestDifference = 0.0f;
	for (size_t i = 0; i < EQUALITY_TEST_SAMPLES; ++i)
	{
		float t = float(i) / float(EQUALITY_TEST_SAMPLES - 1);

		float value1 = EvaluateBernsteinPolynomial3DCubicC0(t, controlPoints);
		float value2 = EvaluateLinearInterpolation3DCubicC0(t, pixels, TextureCoordinateToPixelIndex);

		largestDifference = std::max(largestDifference, std::abs(value1 - value2));
	}
	printf("  %i Samples, Largest Error = %fnn", EQUALITY_TEST_SAMPLES, largestDifference);
}

void Test3DCubicsC0 ()
{

	printf("nTesting 3D Textures / Cubic Curves with C0 continuitynn");

	Test3DCubicC0<1>();
	Test3DCubicC0<2>();
	Test3DCubicC0<3>();
	Test3DCubicC0<4>();
	Test3DCubicC0<5>();
	Test3DCubicC0<6>();

	system("pause");
}

//===================================================================================================================================
//                                                                 main
//===================================================================================================================================

int main (int agrc, char **argv)
{
	Test2DQuadratics();
	Test2DQuadraticsC0();
	Test3DCubics();
	Test3DCubicsMulti();
	Test3DCubicsC0();

	return 0;
}

A Geometric Interpretation of Neural Networks

In the 90s before I was a professional programmer / game developer I looked at neural networks and found them interesting but got scared off by things like back propagation, which I wasn’t yet ready to understand.

With all the interesting machine learning things going on in modern times, I decided to have a look again and have been pleasantly surprised at how simple they are to understand. If you have knowledge of partial derivatives and gradients (like, if you’ve done any ray marching), you have the knowledge it takes to understand it.

Here are some really great resources I recomend for learning the nuts and bolts of how modern neural networks actually work:
Learn TensorFlow and deep learning, without a Ph.D.
Neural Networks and Deep Learning
A Neural Network Playground (Web Based NN sand box from google)

This post doesn’t require any understanding of neural networks or partial derivatives, so don’t worry if you don’t have that knowledge. The harder math comes up when training a neural network, but we are only going to be dealing with evaluating neural networks, which is much simpler.

A Geometric Interpretation of a Neuron

A neural network is made up layers.

Each layer has some number of neurons in it.

Every neuron is connected to every neuron in the previous and next layer.

Below is a diagram of a neural network, courtesy of wikipedia. Every circle is a neuron.

To calculate the output value of a neuron, you multiply every input into that neuron by a weight for that input, and add a bias. This value is fed into an activation function (more on activation functions shortly) and the result is the output value of that neuron. Here is a diagram for a single neuron:

A more formal definition of a neuron’s output is below. b is the bias, w_j is the j’th weight and i_j is the j’th input.
Output = b+\sum_{j=0}^nw_ji_j

You might notice that the above is really just the dot product of the weight vector and the input vector, with a value added on the end. We could re-write it like that:
Output = Dot(w,i)+b

Interestingly, that is the same equation that you use to find the distance of a point to a plane. Let’s say that we have a plane defined by a unit length normal N and a distance to the origin d, and we want to calculate the distance of a point P to that plane. We’d use this formula:
Distance = Dot(N,P)+d

This would give us a signed distance, where the value will be negative if we are in the negative half space defined by the plane, and positive otherwise.

This equation works if you are working in 3 dimensional space, but also works in general for any N dimensional point and plane definition.

What does this mean? Well, this tells us that every neuron in a neural network is essentially deciding what side of a hyperplane a point is on. Each neuron is doing a linear classification, saying if something is on side A or side B, and giving a distance of how far it is into A or B.

This also means that when you combine multiple neurons into a network, that an output neuron of that neural network tells you whether the input point is inside or outside of some shape, and by how much.

I find this interesting for two reason.

Firstly, it means that a neural network can be interpreted as encoding SHAPES, which means it could be used for modeling shapes. I’m interested in seeing what sort of shapes it’s capable of, and any sorts of behaviors this representation might have. I don’t expect it to be useful for, say, main stream game development (bonus if it is useful!) but at minimum it ought to be an interesting investigation to help understand neural networks a bit better.

Secondly, there is another machine learning algorithm called Support Vector Machines which are also based on being able to tell you which side of a separation a data point is on. However, unlike the above, SVM separations are not limited to plane tests and can use arbitrary shapes for separation. Does this mean that we are leaving something on the table for neural networks? Could we do better than we are to make networks with fewer layers and fewer neurons that do better classification by doing non linear separation tests?

Quick side note: besides classification, neural nets can help us with something called regression, which is where you fit a network to some analog data, instead of the more discrete problem of classification, which tells you what group something belongs to.

It turns out that the activation function of a neuron can have a pretty big influence on what sort of shapes are possible, which makes it so we aren’t strictly limited to shapes made up of planes and lines, and also means we aren’t necessarily leaving things on the table compared to SVM’s.

This all sort of gives me the feeling though that modern neural networks may not be the best possible algorithm for the types of things we use them for. I feel like we may need to generalize them beyond biological limitations to allow things like multiplications between weighted inputs instead of only sums. I feel like that sort of setup will be closer to whatever the real ideal “neural computation” model might be. However, the modern main stream neural models do have the benefit that they are evaluated very efficiently via dot products. They are particularly well suited for execution on GPUs which excel at performing homogenous operations on lots and lots of data. So, a more powerful and more general “neuron” may come at the cost of increased computational costs, which may make them less desirable in the end.

As a quick tangent, here is a paper from 2011 which shows a neural network model which does in fact allow for multiplication between neuron inputs. You then will likely be wanting exponents and other things, so while it’s a step in the right direction perhaps, it doesn’t yet seem to be the end all be all!
Sum-Product Networks: A New Deep Architecture

It’s also worth while to note that there are other flavors of neural networks, such as convolutional neural networks, which work quite a bit differently.

Let’s investigate this geometric interpretation of neurons as binary classifiers a bit, focusing on some different activation functions!

Step Activation Function

The Heaviside step function is very simple. If you give it a value greater than zero, it returns a 1, else it returns a 0. That makes our neuron just spit out binary: either a 0 or a 1. The output of a neuron using the step activation function is just the below:

Output = Dot(w,i)+b > 0

The output of a neuron using the step activation function is true if the input point is in the positive half space of the plane that this neuron describes, else it returns false.

Let’s think about this in 2d. Let’s make a neural network that takes x and y as input and spits out a value. We can make an image that visualizes the range from (-1,-1) to (1,1). Negative values can be shown in blue, zero in white, and positive values in orange.

To start out, we’ll make a 2d plane (aka a line) that runs vertically and passes through the origin. That means it is a 2d plane with a normal of (1,0) and a distance from the origin of 0. In other words, our network will have a single neuron that has weights of (1,0) and a bias of 0. This is what the visualization looks like:

You can actually play around with the experiments of this post and do your own using an interactive visualization I made for this post. Click here to see this first experiment: Experiment: Vertical Seperation

We can change the normal (weights) to change the angle of the line, and we can also change the bias to move the line to it’s relative left or right. Here is the same network that has it’s weights adjusted to (1,1) with a bias of 0.1.

Experiment: Diagonal Separation

The normal (1,1) isn’t normalized though, which makes it so the distance from origin (aka the bias) isn’t really 0.1 units. The distance from origin is actually divided by the length of the normal to get the REAL distance to origin, so in the above image, where the normal is a bit more than 1.0, the line is actually less than 0.1 units from the origin.

Below is the visualization if we normalize the weights to (0.707,0.707) but leave the bias at 0.1 units. The result is that the line is actually 0.1 units away from the origin.

Experiment: Normalized Diagonal Separation

Recalling the description of our visualization, white is where the network outputs zero, while orange is where the network outputs a positive number (which is 1 in this case).

If we define three lines such that their negative half spaces don’t completely overlap, we can get a triangle where the network outputs a zero, while everywhere else it outputs a positive value. We do this by having three sibling neurons in the first layer which define three separate lines, and then in the output neuron we give them all a weight of 1. This makes it so the area outside the triangle is always a positive value, which step turns into 1, but inside the triangle, the value remains at 0.


Experiment: Negative Space Triangle

We can turn this negative space triangle into a positive space triangle however by making the output neuron have a weight on the inputs of -1, and adding a bias of 0.1. This makes it so that pixels in the positive space of any of the lines will become a negative value. The negative space of those three lines get a small bias to make it be a positive value, resulting in the step function making the values be 0 outside of the triangle, and 1 inside the triangle. This gives us a positive space triangle:


Experiment: Positive Space Triangle

Taking this a bit further, we can make the first layer define 6 lines, which make up two different triangles – a bigger one and a smaller one. We can then have a second layer which has two neurons – one which makes a positive space larger triangle, and one which makes a positive space smaller triangle. Then, in the output neuron we can give the larger triangle neuron a weight of 1, while giving the smaller triangle neuron a weight of -1. The end result is that we have subtracted the smaller triangle from the larger one:


Experiment: Triangle Cutout

Using the step function we have so far been limited to line based shapes. This has been due to the fact that we can only test our inputs against lines. There is a way around this though: Pass non linear input into the network!

Below is a circle with radius 0.5. The neural network has only a single input which is sqrt(x*x+y*y). The output neuron has a bias of -0.5 and a weight of 1. It’s the bias value which controls the radius of the circle.

You could pass other non linear inputs into the network to get a whole host of other shapes. You could pass sin(x) as an input for example, or x squared.


Experiment: Circle

While the step function is inherently very much limited to linear tests, you can still do quite a lot of interesting non linear shapes (and data separations) by passing non linear input into the network.

Unfortunately though, you as a human would have to know the right non linear inputs to provide. The network can’t learn how to make optimal non linear separations when using the step function. That’s quite a limitation, but as I understand it, that’s how it works with support vector machines as well: a human can define non linear separations, but the human must decide the details of that separation.

BTW it seems like there could be a fun puzzle game here. Something like you have a fixed number of neurons that you can arrange in however many layers you want, and your goal is to separate blue data points from orange ones, or something like that. If you think that’d be a fun game, go make it with my blessing! I don’t have time to pursue it, so have at it (:

Identity and Relu Activation Functions

The identity activation function doesn’t do anything. It’s the same as if no activation function is used. I’ve heard that it can be useful in regression, but it can also be useful for our geometric interpretation.

Below is the same circle experiment as before, but using the identity activation function instead of the step activation function:


Experiment: Identity Circle

Remembering that orange is positive, blue is negative, and white is zero, you can see that outside the circle is orange (positive) and inside the circle is blue (negative), while the outline of the circle itself is white. We are looking at a signed distance field of the circle! Every point on this image is a scalar value that says how far inside or outside that point is from the surface of the shape.

Signed distance fields are a popular way of rendering vector graphics on the GPU. They are often approximated by storing the distance field in a texture and sampling that texture at runtime. Storing them in a texture only requires a single color channel for storage, and as you zoom in to the shape, they preserve their shape a lot better than regular images. You can read more about SDF textures on my post: Distance Field Textures.

Considering the machine learning perspective, having a signed distance field is also an interesting proposition. It would allow you to do classification of input, but also let you know how deeply that input point is classified within it’s group. This could be a confidence level maybe, or could be interpreted in some other way, but it gives a sort of analog value to classification, which definitely seems like it could come in handy sometimes.

If we take our negative space triangle example from the last section and change it from using step activation to identity activation, we find that our technique doesn’t generalize naively though, as we see below. (It doesn’t generalize for the positive space triangle either)


Experiment: Negative Space Triangle Identity

The reason it doesn’t generalize is that the negatives and positives of pixel distances to each of the lines cancel out. Consider a pixel on the edge of the triangle: you are going to have a near zero value for the edge it’s on, and two larger magnitude negative values from the other edges it is in the negative half spaces of. Adding those all together is going to be a negative value.

To help sort this out we can use an activation function called “relu”, which returns 0 if the value it’s given is less than zero, otherwise it returns the value. This means that all our negative values become 0 and don’t affect the distance summation. If we switch all the neurons to using relu activation, we get this:


Experiment: Negative Space Triangle Relu

If you squint a bit, you can see a triangle in the white. If you open the experiment and turn on “discrete output” to force 0 to blue you get a nice image that shows you that the triangle is in fact still there.

Our result with relu is better than identity, but there are two problems with our resulting distance field.

Firstly it isn’t a signed distance field – there is no blue as you might notice. It only gives positive distances, for pixels that are outside the shape. This isn’t actually that big of an issue from a rendering perspective, as unsigned distance fields are still useful. It also doesn’t seem that big of an issue from a machine learning perspective, as it still gives some information about how deeply something is classified, even though it is only from one direction.

I think with some clever operations, you could probably create the internal negative signed distance using different operations, and then could compose it with the external positive distance in the output neuron by adding them together.

The second problem is a bigger deal though: The distance field is no longer accurate!

By summing the distance values, the distance is incorrect for points where the closest feature of the triangle is a vertex, because multiple lines are contributing their distance to the final value.

I can’t think of any good ways to correct that problem in the context of a neural network, but the distance is an approximation, and is correct for the edges, and also gets more correct the closer you get to the object, so this is still useful for rendering, and probably still useful for machine learning despite it being an imperfect measurement.

Sigmoid and Hyperbolic Tangent Activation Function

The sigmoid function is basically a softer version of the step function and gives output between 0 and 1. The hyperbolic tangent activation function is also a softer version of the step function but gives output between -1 and 1.

Sigmoid:

Hyperbolic Tangent:

(images from Wolfram Mathworld)

They have different uses in machine learning, but I’ve found them to be visibly indistinguishable in my experiments after compensating for the different range of output values. It makes me think that smoothstep could probably be a decent activation function too, so long as your input was in the 0 to 1 range (maybe you could clamp input to 0 and 1?).

These activation functions let you get some non linearity in your neural network in a way that the learning algorithms can adjust, which is pretty neat. That puts us more into the realm where a neural net can find a good non linear separation for learned data. For the geometric perspective, this also lets us make more interesting non linear shapes.

Unfortunately, I haven’t been able to get a good understanding of how to use these functions to craft desired results.

It seems like if you add even numbers of hyperbolic tangents together in a neural network that you end up getting a lot of white, like below:


Experiment: tanh1

However, if you add an odd number of them together, it starts to look a bit more interesting, like this:


Experiment: tanh2

Other than that, it’s been difficult seeing a pattern that I can use to craft things. The two examples above were made by modifying the negative space triangle to use tanh instead of step.

Closing

We’ve wandered a bit in the idea of interpreting neural networks geometrically but I feel like we’ve only barely scratched the surface. This also hasn’t been a very rigorous exploration, but more has just been about exploring the problem space to get a feeling for what might be possible.

It would be interesting to look more deeply into some of these areas, particularly for the case of distance field generation of shapes, or combining activation functions to get more diverse results.

While stumbling around, it seems like we may have gained some intuition about how neural networks work as well.

It feels like whenever you add a layer, you are adding the ability for a “logical operation” to happen within the network.

For instance, in the triangle cutout experiment, the first layer after the inputs defines the 6 individual lines of the two triangles and classifies input accordingly. The next layer combines those values into the two different triangle shapes. The layer after that converts them from negative space triangles to positive space triangles. Lastly, the output layer subtracts the smaller triangle’s values from the larger triangle’s values to make the final triangle outline shape.

Each layer has a logical operation it performs, which is based on the steps previous to it.

Another piece of intuition I’ve found is that it seems like adding more neurons to a layer allows it to do more work in parallel.

For instance, in the triangle cutout experiment, we created those two triangles in parallel, reserving some neurons in each layer for each triangle. It was only in the final step that we combined the values into a single output.

Since neurons can only access data from the previous network layer, it seems as though adding more neurons to layers can help push data forward from previous layers, to later layers that might need the data. But, it seems like it is most efficient to process input data as early as possible, so that you don’t have to shuttle it forward and waste layers / neurons / memory and computing power.

Here is some info on other activation functions:
Wikipedia:Activation Function

Here’s a link that talks about how perceptrons (step activated neural networks) relate to SVMs:
Hyperplane based Classification: Perceptron and (Intro to) Support Vector Machines

By the way, did I mention you can visualize neural networks in three dimensions as well?

Experiment: 3d Visualization

Here are the two visualizers of neural networks I made for this post using WebGL2:
Neural Network Visualization 2D
Neural Network Visualization 3D

If you play around with this stuff and find anything interesting, please share!

Evaluating Points on Analytical Surfaces and in Analytical Volumes Using the GPU Texture Sampler

This is an extension of a paper I wrote which shows how to use the linear texture sampling capabilities of the GPU to calculate points on Bezier curves. You store the control points in the texture, then sample along the texture’s diagonal to get points on the curve:
GPU Texture Sampler Bezier Curve Evaluation

This extension shows how to use the technique to evaluate points on surfaces and inside of volumes, where those surfaces and volumes are defined either by Bezier curves or polynomials (Tensor products of polynomials to be more specific).

As an example of what this post will allow you to do:

  • By taking a single sample of a 3d RGBA volume texture, you’ll be able to get a bicubic interpolated value (a bicubic surface).
  • Alternately, taking a single sample of a 3d RGBA volume texture will allow you to get a linear interpolation between two biquadratic surfaces (a linear/biquadratic volume).
  • This post also covers how to extend this to higher degree surfaces and volumes.

Here are two images generated by the WebGL2 demos I made for this post which utilize this technique for rendering surfaces, fog volumes, and solid volumes. (link to the demos at bottom of post!)

All textures are size 2 on each axis which makes it a cache friendly technique (you can grow the texture sizes for piecewise curves/surfaces/volumes though). It leverages the hardware interpolation which makes it a relatively computationally inexpensive technique, and it supports all polynomials within the limitations of floating point math, so is also very flexible and expressive. You could even extend this to rational polynomial surfaces and volumes which among other things would allow perfect representations of conic sections.

The animated Bezier curve images in this post came from wikipedia. Go have a look and drop them a few bucks if you find wikipedia useful!
Wikipedia: Bézier curve

Curves

If you’ve read my curve paper and understand the basics you can skip this section and go onto the section “Before Going Into Surfaces”.

Let’s talk about how to store curves of various degrees in textures and evaluate points on them using the GPU Texture sampler. We’ll need this info when we are working with surfaces and volumes because a higher degree curve is dual to a section of lower degree surface or an even lower degree volume.

The three ways we’ll be talking about controlling the order of curves are:

  1. Texture Dimensionality – 1d texture vs 2d texture vs 3d texture vs 4d texture.
  2. Number of Color Channels – How many color channels are used? R? RG? RGB? RGBA?
  3. Multiple Texture Samples – Doing multiple texture reads.

Texture Dimensionality

By texture dimensionality I mean how many dimensions the texture has. In all cases, the size of the texture is going to be 2 on each axis.

Starting with a 1d texture, we have a single texture coordinate (u) to sample along. As we change the u value from 0 to 1, we are just linearly interpolating between the two values. A 1d texture that has 2 pixels in it can store a degree 1 curve, also known as a linear Bezier curve. With linear texture sampling, the GPU hardware will do this linear interpolation for you.

The equation for linear interpolation between two values A and B which are at t=0 and t=1 respectively is:
A*(1-t) + B*t

Here’s the 1d texture:

Here’s a linear curve:

Going to a 2d texture it gets more interesting. We now have two texture coordinates to sample along (u,v). Using linear sampling, the hardware will do bilinear interpolation (linear interpolation across each axis) to get the value at a specific (u,v) texture coordinate.

Here is the equation for bilinear interpolation between 4 values A,B,C,D which are at texture coordinates (0,0), (1,0), (0,1), (1,1) respectively, being sampled at (u,v):

(A*(1-u) + B*u)*(1-v) + (C*(1-u) + D*u) * v

That equation interpolates from A to B by u (x axis), and from C to D by u (x axis), and then interpolates from the first result to the second by v (y axis). Note that it doesn’t actually matter which axis is interpolated by first. An equivelant equation would be one that interpolates from A to B by v (y axis) and from B to C by v (y axis) and then between those results by u (x axis).

With that equation, something interesting starts to happen if you use the same value (t) for u and v, expand and simplify, and end up at this equation:

A*(1-t)^2 + (B+C)*(1-t)t + Dt^2

That equation is very close to the quadratic Bezier formula, which is below:

A*(1-t)^2 + B*2(1-t)t + Ct^2

To get to that equation, we just make B and C the same value (B), and rename D to C since that letter is unused. This tells us how we need to set up our 2d texture such that when we sample along the diagonal, we get the correct point on our quadratic Bezier curve:

Here’s a quadratic Bezier curve in action. You can see how it is a linear interpolation between two linear interpolations, just like taking a bilinearly interpolated sample on our texture is.

Taking this to a 3d texture, we now have three texture coordinates to sample along (u,v,w). Again, with linear sampling turned on, the hardware will do trilinear interpolation to get the value at a specific (u,v,w) texture coordinate.

If we follow the same process as the 2d texture, we will wind up with the equation for a cubic Bezier curve:

A*(1-t)^3 + B*3(1-t)^2t + C*3(1-t)t^2 + Dt^3

Here’s how the texture is laid out:

Here’s a cubic Bezier curve in action, where you can see 3 levels of linear interpolations, just like how trilinear interpolation works:

While I have never used a 4d texture it appears that directx supports them and there looks to be an OpenGL extension to support them as well.

If we took this to a 4d texture, we would end up with the equation for a quartic curve. If you have trouble visualizing what a 4d texture even looks like, you aren’t alone. You have four texture coordinates to sample along (u,v,w,t). When you sample it, there are two 3d volume textures that are sampled at (u,v,w), resulting in two values as a result. These values are then interpolated by t to give you the final value. A fourth dimensional texture lookup is just an interpolation between 2 three dimensional texture lookups. That is true of all dimensional texture lookups in fact. An N dimensional texture lookup is just the linear interpolation between two N-1 dimensional texture lookups. For example, a three dimensional texture lookup is just an interpolation between 2 two dimensional texture lookups. This “hierchical interpolation” is the link I noticed between texture interpolation and the De Casteljau algorithm, since that is also a hierchical interpolation algorithm, just with fewer values interpolated between.

Here’s how the 4d texture is laid out:

Here’s the quartic Bezier equation, which is what you get the answer to if you sample a 4d texture at (t,t,t,t):

A*(1-t)^4 + B*4(1-t)^3t + C*6(1-t)^2t^2 + D*4(1-t)t^3 + Et^4

Here’s a quartic Bezier curve in action, showing 4 levels of linear interpolation, just like how quadrilinear interpolation works with 4d textures:

So, the bottom line of this section is that if we sample along the diagonal of an N dimensional texture which has one color channel, we will get points on a degree N curve.

Number of Color Channels

Another way we can control the degree of a curve stored in a texture is by the number of color channels that are stored in the texture.

In the section above we showed a 1d texture that stored a linear curve. it had only one color channel:

Let’s add another color channel. A,B will be stored in the red channel, and B,C will be stored in the green channel:

When we read that texture at location (t), we will get the following values:

  1. R: The linear interpolation between A and B at time t.
  2. G: The linear interpolation between B and C at time t.

Now, if we just lerp between R and G in our shader, for time t, we will get the point at time t, on the cubic Bezier curve defined by control points A,B,C.

Pretty cool right?

What happens if we add another color channel, blue?

Well, when we sample the texture at time t, we get the following values:

  1. R: The linear interpolation between A and B at time t.
  2. G: The linear interpolation between B and C at time t.
  3. B: The linear interpolation between C and D at time t.

We can combine these values using the quadratic Bezier curve formula, as if these were each a control point:

R*(1-t)^2 + G*2(1-t)t + Bt^2

The result we get is a point on the CUBIC curve defined by the four control points A,B,C,D.

In the previous section, it took a 3d volume texture to calculate a cubic curve. In this section we were able to do it with a 1d RGB texture, but it came at the cost of of having to do some calculation in the shader code after sampling the texture to combine the color channels and get the final result.

How exactly does adding a color channel affect the degree though? Each color channel added increases the degree by 1.

You can see this is true by seeing in the last section how a 3 dimensional texture can evaluate a cubic, and a 4 dimensional texture can evalaute a quartic, but the 4th dimensional texture was just two 3 dimensional textures. Adding a second color channel just doubles the size of your data (and adding two tripples, and adding three quadruples), so having a 3d volume texture that has two color channels is the same as having a 4d volume texture with a single color channel. In both cases, you are just interpolating between two 3d texture samples.

So, for every color channel we add, we add a degree.

Multiple Texture Samples

Multiple texture samples is the last way to control curve degree that we are going to talk about.

Taking extra texture samples is a lot like adding color channels.

If you have a 1d RGB texture, you get a result of 3 lerps – R,G,B – which you can use to calculate a cubic curve point (order 3). If you take a second sample, you get R0,G0,B0,R1,G1,B1 which is a result of 6 lerps, which gives you a point on a sextic curve (order 6).

If you have a 2d RGBA texture, you get the result of 4 quadratic interpolations – R,G,B,A – which gives you an order 5 curve point. Taking another texture read gives you 8 quadratic interpolation results, which you can put together to make an order 9 curve point. Taking a third texture read would get you up to order 13.

Just like adding color channels, taking extra texture samples requires you to combine the multiple results in your shader, which increases computational cost.

Besides that, you are also doing more texture reads, which can be another source of performance loss. The textures are small (up to 2x2x2x2) so are texture cache friendly, but if you have multiple textures, it could start to add up I’m sure.

IMO this option should be avoided in favor of the others, when possible.

Before Going Into Surfaces

Before we start on surfaces, I want to mention a few things.

Even though we’ve been talking about Bezier curves specifically, a previous post explained how to convert any polynomial from power basis form into Bernstein basis form (aka you can turn any polynomial into a Bezier curve that is exactly equivelant). So, this generalizes to polynomials, and even rational polynomials if you do division in your shader code, but I’ll point you towards that post for more information on that: Evaluating Polynomials with the GPU Texture Sampler.

You can also extend the above for piecewise curves easily enough. You just set up a different curve (or surface or volume, as we describe below) for different ranges of your parameter space values. From time 0 to 1, you may use one texture, and from time 1 to 2, you may use another. Better yet, you would store both curves in a single texture, and just make the texture be a little larger, instead of having two separate textures.

Also, many other types of curves – B-splines, nurbs – can be broken down exactly into piecewise Bezier curves (rational, if the source curve is rational). Check these links for more info:
Algorithms for B-Spline Curves
Wikipedia: De Boor’s Algorithm.

Surfaces

Finally onto surfaces!

I’m going to show how to extend the curve calculation technique to calculating points on Bezier rectangles. A Bezier rectangle is a rectangular surface which has one or more bezier curves across the X axis and one or more bezier curves across the Y axis. The degree of the curve on each axis doesn’t need to match so it could be quadratic on one axis and cubic on the other as an example.

To actually evaluate a point on the surface at location (u,v), you evaluate a point on each x axis curve for time u, and then you use those resulting values as control points in another curve that you evaluate at time v.

Just like linear interpolation, it doesn’t matter which axis you evaluate first for a Bezier rectangle surface so you could switch the order of the axis evaluation if you want to.

The image above shows a bicubic surface, the blue lines show the x axis cubic bezier curves, while the yellow lines show the y axis cubic bezier curves. Those lines are called “isolines” or “isocurves”. The 16 control points are shown in magenta.

Another name for a Bezier rectangle is a tensor product surface. This is a more generalized term as it isn’t limited to Bezier curves.

Note: there is another type of Bezier surface called a Bezier Triangle but I haven’t worked much with them so can’t say if any of these techniques work with them or not. It would be interesting to explore how these techniques apply to Bezier triangles, if at all.

Hopefully it should come as no surprise that a 2d texture using regular bilinear interpolation is in fact a Bezier rectangle which is linear on the x axis and linear on the y axis. It has a degree of (1,1) and is stored in a 2d texture (2×2 pixels), where the four control points are just stored in the four pixels. You just sample the texture at (u,v) to get that point on the surface. Pretty simple stuff.

Order (1,1) Bezier Rectangle:

Something interesting to note is that while the isolines (edges) of the rectangle are linear, the surface itself is curved. In fact, we know that the diagonal of this surface is in fact a quadratic Bezier curve because we calculate curves by sampling along the diagonal! (if the middle corners are different, it’s the same as if they were both replaced with the average of their values).

There are other ways to store this Degree (1,1) surface in a texture besides how i described. You could also have a 1 dimensional texture with two color channels, where you sample it along the u axis, and then interpolate your R and G values, using the v axis value. This would come at the cost of doing a lerp in the shader code, instead of having the texture sampler hardware do it for you.

Now that the simplest case is out of the way, how about the next simplest? What if we want a surface where we linearly interpolate between two quadratic curves? That is, what if we want to make a degree (2,1) Bezier rectangle?

Order (2,1) Bezier Rectangle:

Well if you think about it geometrically, we can store a quadratic curve in a 2d texture (2×2) with a single color channel. To linearly interpolate between two of those, we need two of those to interpolate between. So, we need a 3d texture, since that is just an interpolation between two 2d textures.

When we sample that texture, we use the coordinates (u,u,v). That will make it quadratic in u, but linear in v.

Stepping up the complexity again, what if we wanted to make a biquadratic surface – aka degree (2,2)?

Order (2,2) Bezier Rectangle:

Well, to make a quadratic curve we need 3 control points, so for a biquadratic surface we need 3 quadratic curves to quadratically interpolate between.

One way to do this would be with a 4d texture, sampling along (u,u,v,v) to make it quadratic in both u and v.

But, because 4d textures are kind of exotic and may not be supported, we can achieve this by instead having a 3d texture with two color channels: R,G.

When we sample that texture, we sample at (u,u,v) to get two values: R,G. Next we linearly interpolate from R to G using v. This makes us quadratic in both u and v.

There are other ways to encode this surface as well, but i’ll leave that to you to think about if you want to (:

Lastly, what if we wanted a bicubic surface? A cubic curve has 4 control points, so we need 4 cubic curves to cubically interpolate between to make our final surface.

Order (3,3) Bezier Rectangle:

Thinking back to the first section, a 3d texture can evaluate a cubic curve. Since we need four cubic curves, let’s just use all four color channels RGBA. We would sample our texture at (u,u,u) to get four cubic curves in RGBA and then would use the cubic Bezier formula to combine those four values using v into our final result.

Surfaces Generalized

Generalizing surface calculations a bit, there are basically two steps.

First is you need to figure out what your requirements for the x axis is as far as texture storage for the desired degree you want. From there, you figure out what degree you want on your y axis, and that degree is what you multiply the x axis texture storage requirements for.

It can be a little bit like tetris trying to figure out how to fit various degree surfaces into various texture sizes and layouts, but it gets easier with a little practice.

It’s also important to remember that the x axis being the first axis is by convention only. It could easily be the y axis that defines the texture storage requirements, and is multiplied by the degree of the x axis.

Volumes

Volumes aren’t a whole lot more complex than surfaces, but they are a lot hungrier for texture space and linear interpolations!

Extending the generalization of surfaces, you once again figure out requirements for the x axis, multiply those by the degree of the y axis, and then multiply that result by the degree of the z axis.

The simplest case for volumes is the trilinear case, aka the Degree (1,1,1) Bezier rectangle.

Order (1,1,1) Bezier Cube:

It’s a bit difficult to understand what’s going on in that picture by seeing the data as just fog density, so the demos let you specify a surface threshold such that if the fog is denser than that amount, it shows it as a surface. Here is the same trilinear Bezier volume with a surface threshold.

Order (1,1,1) Bezier Cube:

You just store your 8 values in the 8 corners of the 2x2x2 texture cube, and sample at (u,v,w) to get your trilinear result.

The next simplest case is that you want to quadratically interpolate between two linear surfaces – a Degree (1,1,2) Bezier rectangle.

Order (1,1,2) Bezier Cube:

To do this, you need 3 bilinear surfaces to interpolate between.

One way to do this would be to have a 2d Texture with R,G,B color channels. Sample the texture at (u,v), then quadratically interpolate R,G,B using w.

Another way to do this would be to have a 3d texture with R and G channels. When sampling, you sample the 3d texture at (u,v,w) to get your R and G results. You then linearly interpolate from R to G by w to get the final value.

Yet another way to do this would be to use a 4d texture if you have support for it, and sample along (u,v,w,w) to get your curve point using only hardware interpolation.

The next simplest volume type is a linear interpolation between two biquadratic surfaces – a Degree (2,2,1) Bezier rectangle.

Order (2,2,1) Bezier Cube:

From the surfaces section, we saw we could store a biquadratic surface in a 3d texture using two color channels R,G. After sampling at (u,u,v) you interpolate from R to G by v.

To make a volume that linearly interpolates between two biquadratic surfaces, we need two biquadratic surfaces, so need to double the storage we had before.

We can use a 3d texture with 4 color channels to make this happen by storing the first biquadratic in R,G and the second in B,A, sampling this texture at (u,u,v). Next, we interpolate between R and G by v, and also interpolate between B and A by v. Lastly, we linearly interpolate between those two results using w.

The next higher surface would be a triquadratic volume, which is degree (2,2,2). Since you can store a biquadratic surface in a 3d 2x2x2 texture with two color channels, and a triquadratic volume needs 3 of those, we need a 3d texture 6 color channels. Since that doesn’t exist, we could do something like store 2 of the quadratic surfaces in a 2x2x2 RGBA texture, and the other quadratic surface in a 2x2x2 RG texture. We would take two texture samples and combine the 6 results into our final value.

Tricubic is actually pretty simple to conceptualize luckily. We know that we can store a bicubic surface in a 3d 2x2x2 RGBA texture. We also know that we would need 4 of those if we want to make a tricubic volume. So, we could do 4 texture reads (one for each of our bicubic surfaces) and then combine those 4 samples across w to get our final volume value.

Closing

Hopefully you were able to follow along and see that this stuff is potentially pretty powerful.

Some profiling needs to be done to better understand the performance characteristics of using the texture sampler in this way, versus other methods of curve, surface and volume calculation. I have heard that even when your texture samples are in the texture cache, that it can still take like ~100 cycles to get the information back on a texture read. That means that this is probably not going to be as fast as using shader instructions to calculate the points on the curve. However, if you are compute bound and can offload some work to the texture sampler, or if you are already using a texture to store 1d/2d/3d data (or beyond) that you can aproximate with this technique, that you will have a net win.

One thing I really like about this is that it makes use of non programmable hardware to do useful work. It feels like if you were compute bound, that you could offload some work to the texture sampler if you had some polynomials to evaluate (or surfaces/volumes to sample), and get some perf back.

I also think this could possibly be an interesting way to make concise representations (and evaluation) of non polygonal models. I imagine it would have to be piecewise to make things that look like real world objects, but you do have quite a bit of control with Bezier curves, surfaces and volumes, especially if you use rational ones by doing a divide in your shader.

Here’s a few specifica areas I think this technique could help out with:

  • Higher order texture interpolation with fewer samples – You’d have to preprocess textures and would spend more memory on them, but it may be worth while in some situations for higher quality results with a single texture read.
  • 2D signed distance field rendering – SDF textures are great for making pseudo vector art. They do break down in some cases and at some magnification levels though. It would be interesting to see if using this technique could improve things either with higher order interpolation, or maybe by encoding (signed) distances differently. Possibly also just useful for describing 2d vector art in a polynomial form?
  • 3d signed distance field rendering – Ray marching can make use of signed distance fields to render 3d objects. It can also make use of functions which can only give you inside or outside status based on a point. It would be interesting to explore encoding and decoding both of these types of functions within textures using this technique, to sample shapes during ray marching.

If you are interested in the above, or curious to learn more, here are some good links!

2D Catmull-Rom in 4 samples
Distance Field Textures
Inigo Quilez: raymarching distance fields

If you have any questions, corrections, feedback, ideas for extensions, etc please let me know! You can leave a comment below, or contact me on twitter at @Atrix256.

Feedback / Ideas

@anders_breakin had some ideas that could possibly pan out:

  1. The derivative of a Bezier curve is another Bezier curve (Derivatives of a Bézier Curve). You could encode the derivative curve(s) in a texture and use that to get the normal instead of using the central differences method. That might give higher quality normals, but should also decrease the number of texture reads needed to get the normal.
  2. If you want more accuracy, you may subdivide the curve into more numerous piecewise curves. The texture interpolator only has 8 bits of decimal precision (X.8 fixed point) when interpolating, but if you give it less of the curve/surface/volume to interpolate over at a time, it seems like that would result in more effective precision.

@Vector_GL suggested reading the values in the vertex shader and using the results in the pixel shader. I think something like this could work where you read the control points in the VS, and pass them to the PS, which would then be able to ray march the tensor product surface by evaluating it without texture reads. So long as you have fewer VS instances than PS instances (the triangles are not subpixel!) that this could be an interesting thing to try. It doesn’t take advantage of the texture interpolator, but maybe there would be a way to combine the techniques. If not, this still seems very pragmatic.

I was thinking maybe this could be done via “rasterization” by drawing a bunch of unit cubes and having the PS do the ray marching. With some careful planning, you could probably use Z-testing on this too, to quickly cull hidden pixels without having to ray march them.

Demos

Here are the WebGL2 demos:
Analytical Surfaces Evaluated by the GPU Texture Sampler
Analytical Volume Evaluated by the GPU Texture Sampler

Failed Experiment: The GPU Texture Sampler is Turing Complete But That Fact is Pretty Useless

While it’s true that the GPU texture sampler can evaluate digital logic circuits, it turns out there’s a much better and simpler way to evaluate logic with textures. That better and simpler way isn’t even that useful unfortunately!

This post will show the path I took from the initially intriguing possibilities to the more mundane final answer. You may be able to see mistakes in my reasoning along the way, or be able to get to the punch line sooner (:

This was meant to be an extension to a paper I wrote talking about how you can evaluate Bezier curves by storing only the control points in a texture and then sampling along the texture diagonal:
GPU Texture Sampler Bezier Curve Evaluation

The ideas from this post started with a tweet from @marcosalvi:

Because the last post showed how to evaluate arbitrary polynomials using the texture sampler, and digital circuits can be described as as polynomials in Algebraic Normal Form (ANF), that means we can use the texture sampler to evaluate digital logic circuits. Let’s check it out!

First up, we need to be able to convert logic into ANF. Oddly enough, I already have a post about how to do that, with working C++ source code, so go check it out: Turning a Truth Table Into A digital Circuit (ANF).

As an example, let’s work with a circuit that takes 3 input bits, and adds them together to make a 2 bit result. We’ll need one ANF expression per output bit. O_0 will be the 1’s place output bit (least significant bit), and O_1 will be the 2’s place output bit (most significant bit). Our 3 input bits will be u,v,w.

O_0 = u \oplus v \oplus w
O_1 = uv \oplus vw \oplus uw

If we want to use our polynomial evaluation technique, we need equations that are univariate (one variable) instead of multivariate (multiple variables). We can try just using a single variable x in place of u,v and w. Remember that in ANF, you work with polynomials mod 2 (aka \mathbb{Z}_2), and that XOR (\oplus) is addition while AND is multiplication. This gives the formulas below:

O_0 = x + x + x = 3x
O_1 = xx + xx + xx = 3x^2

The next thing we need to use the technique is to know the Bezier control points that make a Bezier curve that is equivalent to this polynomial. Since we have 3 input variables into our digital circuit, if they were all 3 multiplied together (ANDed together), we would have a cubic equation, so we need to convert those polynomials to cubic Bernstein basis polynomials. We can use the technique from the last post to get the control points of that equivalent curve.

O_0  \begin{array}{c|c|c|c|c} 0 & 0 / 1 = 0 & 1 & 2 & 3 \\ 3 & 3 / 3 = 1 & 1 & 1 &   \\ 0 & 0 / 3 = 0 & 0 &   &   \\ 0 & 0 / 1 = 0 &   &   &   \\ \end{array}

O_1  \begin{array}{c|c|c|c|c} 0 & 0 / 1 = 0 & 0 & 1 & 3 \\ 0 & 0 / 3 = 0 & 1 & 2 &   \\ 3 & 3 / 3 = 1 & 1 &   &   \\ 0 & 0 / 1 = 0 &   &   &   \\ \end{array}

Now that we have our control points, we can set up our textures to evaluate our two cubic Bezier curves (one for O_0, one for O_1). We’ll need to use 3d textures and we’ll need to set up the control points like the below, so that when we sample along the diagonal of the texture we get the points on our curves.

The picture below shows where each control point goes, to set up a cubic Bezier texture. The blue dot is the origin (0,0,0) and the red dot is the extreme value of the cube (1,1,1). The grey line represents the diagonal that we sample along.

Coincidentally, our control points for the O_0 curve are actually 0,1,2,3 so that cube above is what our 3d texture needs to look like for the O_0 curve.

Below is what the O_1 curve’s 3d texture looks like. Note that in reality, we could store these both in a single 3d texture, just use say the red color channel for O_0 and the green color channel for O_1.

Now that we have our textures set up let’s try it out. Let’s make a table where we have our three input bits, and we use those as texture coordinates in our 3d textures (the texture cubes above) to see what values we get. (Quick note – things are slightly simplified here vs reality. The pixel’s actual value is at a half pixel offset from the texture coordinates, so we’d be sampling between (0.5,0.5,0.5) and (1.5,1.5,1.5) instead of from (0,0,0) to (1,1,1), but we can ignore that detail for now to make this stuff clearer.)

\begin{array}{|c|c|c|c|c|} \hline u & v & w & O_1 & O_0 \\ \hline 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 1 & 2 \\ 1 & 0 & 0 & 0 & 1 \\ 1 & 0 & 1 & 1 & 2 \\ 1 & 1 & 0 & 1 & 2 \\ 1 & 1 & 1 & 3 & 3 \\ \hline \end{array}

Now, let’s modulus the result by 2 since ANF expects to work mod 2 (\mathbb{Z}_2 to be more precise), and put the decimal value of the result next to it.

\begin{array}{|c|c|c|c|c|c|} \hline u & v & w & O_1 \% 2 & O_0 \% 2 & Result \\ \hline 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 & 1 & 1 \\ 0 & 1 & 1 & 1 & 0 & 2 \\ 1 & 0 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 1 & 0 & 2 \\ 1 & 1 & 0 & 1 & 0 & 2 \\ 1 & 1 & 1 & 1 & 1 & 3 \\ \hline \end{array}

It worked! The result value is the count of the input bits set to 1.

Unfortunately we have a problem. When we converted the multivariate equation into a univariate equation, we just replaced u,v,w with x. This is only valid if the function is symmetric – if u,v,w can be interchanged with each other and not affect the result of the function. This bit adding digital circuit we made happened to have that property, but most digital circuits do not have that property – most of the time, not all input bits are treated equal. If we made a circuit that added two 2-bit numbers and have a 3-bit result for instance, the high bits of the input numbers have a very different meaning than the low bits and this technique falls apart. (Quick note – we are actually doing the reverse of the polynomial blossoming thing i mentioned in the last post. Blossoming is the act of taking a univariate function and breaking it into a multivariate function that is linear in each variable. The term is called symmetric multiaffine equation if you want to find out more about that.)

This turns out not to be a deal breaker though because it turns out we didn’t have to do a lot of the work that we did to get these volume textures. It turns out we don’t need to calculate the Bezier curve control points, and we don’t even need to make an ANF expression of the digital circuit we want to evaluate.

Let’s recap what we are trying to do. We have 3 input values which are either 0 or 1, we have a 3d texture which is 2x2x2, and we are ultimately using those 3 input values as texture coordinates (u,v,w) to do a lookup into a texture to get a single bit value out.

Here’s a big aha moment. We are just making a binary 3d lookup table, so can take our truth table of whatever it is we are trying to do, and then directly make the final 3d textures described above.

Not only does it work for the example we gave, with a lot less effort and math, it also works for the broken case I mentioned of the function not being symmetric, and not all input bits being equal.

Something else to note is that because we are only sampling at 0 or 1, we don’t need linear texture interpolation at all and can use nearest neighbor (point) sampling on our textures for increased performance. Also because the texture data is just a binary 0 or 1, we could use 1 bit textures.

The second aha moment comes up when you realize that all we are doing is taking some number of binary input bits, using those as texture coordinates, and then looking up a value in a texture.

You can actually use a 1D texture for this!

You take your input bits and form an integer, then look up the value at that pixel location. You build your texture lookup table using this same mapping.

So… it turns out this technique led to a dead end. It was just extra complexity to do nothing special.

Before it all fell apart, I was also thinking this might be a good avenue for doing homomorphic encryption on the GPU, but I don’t believe this aids that at all. (Super Simple Symmetric Leveled Homomorphic Encryption Implementation)

But Wait – Analog Valued Logic?

One thought I had while all this was unraveling was that maybe this was still useful, because if you put an analog value in (not a 0 or 1, but say 0.3), that maybe this could be used as a sort of “Fuzzy Logic” type logic evaluation.

Unfortunately, it looks like that doesn’t work either!

You can see how it breaks down and some more info here:
Computer Science Stack Exchange: Using analog values with Algebraic Normal Form?

Oh Well

Sometimes when exploring new frontiers (even if they are just new to us) we hit dead ends, our ideas fail etc. It happens. It’s part of the learning process, and also is useful sometimes to know what doesn’t work and why, instead of just always knowing what DOES work.

Anyways… posts on using the texture sampler for calculating points on data surfaces and data volumes are coming next (:

To give a brief taste of how that is going to play out:

  • Doing a single texture read of a 3d RGBA texture can give you a triquadratic interpolated value.
  • Alternately, doing a single texture read of a 3d RGBA texture can give you a bicubic interpolated value.

Thanks for reading!

Evaluating Polynomials with the GPU Texture Sampler

This is an extension of a paper I wrote which shows how to use the linear texture sampling capabilities of the GPU to calculate points on Bezier curves. You store the control points in the texture, then sample along the texture’s diagonal to get points on the curve:
GPU Texture Sampler Bezier Curve Evaluation

I’ve been thinking about the items in the “future work” section and found some interesting things regarding polynomials, logic gates, surfaces and volumes. This is the first post, which deals with evaluating polynomials.

Evaluating Polynomials

One of the main points of my paper was that N-linear interpolation (linear, bilinear, trilinear, etc) can be used to evaluate the De Casteljau algorithm since both things are just linear interpolations of linear interpolations. (Details on bilinear interpolation here: Bilinear Filtering & Bilinear Interpolation).

This meant that it was also able to calculate Bernstein Polynomials (aka the algebraic form of Bezier curves), since Bernstein polynomials are equivalent to the De Casteljau algorithm.

I started looking around to see what would happen if you messed around with the De Casteljau algorithm a bit, like interpolate at one level by
t^2 or t*0.5+0.5 or by a constant or by another variable completely. My hope was that I’d be able to make the technique more generic and open it up to a larger family of equations, so people weren’t limited to just Bernstein polynomials.

That opened up a pretty deep rabbit hole on polynomial blossoming and something called Symmetric Multiaffine Functions. There are some great links in the answer here:
Math Stack Exchange: Modifying and Generalizing the De Casteljau Algorithm

In the end, it turned out to be pretty simple though. It turns out that any polynomial can be converted back and forth from “Power Basis” (which looks like Ax^2+Bx+C) to “Bernstein Basis” (which looks like A(1-t)^2+B(1-t)t+Ct^2) so long as they are the same degree.

This isn’t the result I was expecting but it is a nice result because it’s simple. I think there is more to be explored by sampling off the diagonal, and using different t values at different stages of interpolation, but this result is worth sharing.

By the way, you could also use curve fitting to try and approximate a higher degree function with a lower degree one, but for this post, I’m only going to be talking about exact conversion from Bernstein polynomials to Power polynomials.

Since we can convert power basis polynomials to Bernstein polynomials, and the technique already works for Bernstein polynomials, that means that if we have some random polynomial, say y=2x^3+4x+2, that we can make this technique work for that too. The technique got a little closer to arbitrary equation evaluation. Neat!

Converting Power Basis to Bernstein Basis

I found the details of the conversion process at Polynomial Evaluation and Basis Conversion which was linked to by Math Stack Exchange: Convert polynomial curve to Bezier Curve control points.

This is best explained working through examples, so let’s start by converting a quadratic polynomial from power basis to Bernstein basis.

Quadratic Function

y=2x^2+8x+3

The first thing we do is write the coefficients vertically, starting with the x^0 coefficient, then the x^1 coefficient and continuing on to the highest value x^n:

\begin{array}{c} 3 \\ 8 \\ 2 \\ \end{array}

Next, we need to divide by the Binomial Coefficients (aka the row of Pascal’s Triangle which has the same number of items as we have coefficients). In this case we need to divide by: 1,2,1.

\begin{array}{c|c} 3 & 3 / 1 = 3 \\  8 & 8 / 2 = 4 \\ 2 & 2 / 1 = 2 \\ \end{array}

Now we generate a difference table backwards. it’s hard to explain what that is in words, but if you notice, each value is the sum of the value to the left of it, and the one below that.

\begin{array}{c|c|c|c} 3 & 3 / 1 = 3 & 7 & 13 \\  8 & 8 / 2 = 4 & 6 & \\ 2 & 2 / 1 = 2 &   & \\ \end{array}

We are all done. The control points for the Bezier curve are on the top row (ignoring the left most column). They are 3,7,13 which makes it so we have the following two equations being equal. The first is in power basis, the second is in Bernstein basis.

y=2x^2+8x+3
y=3(1-x)^2+14(1-x)x+13x^2

Note: don’t forget that Bezier curves multiply the control points by the appropriate row in Pascal’s triangle. That’s where the 14 comes from in the middle term of the Bernstein polynomial. We are multiplying the control points 3,7,13 by the row in Pascal’s triangle 1,2,1 to get the final coefficients of 3,14,13.

Let’s have Wolfram Alpha help us verify that they are equal.

Wolfram Alpha: graph y=2x^2+8x+3, y=3*(1-x)^2+14x*(1-x)+13x^2, from 0 to 1

Yep, they are equal! If you notice the legend of the graph, wolfram actually converted the Bernstein form back to power basis, and you can see that they are exactly equivalent.

You can also write the Bernstein form like the below, which i prefer, using t instead of x and also setting s=1-t.

y=3s^2+14st+13t^2

Cubic Function

A cubic function is not that much harder than a quadratic function. After this, you should see the pattern and be able to convert any degree easily.

y=5x^3+9x-4

Again, the first thing we do is write the coefficients vertically, starting with the constant term. Note that we don’t have an x^2 term, so it’s coefficient is 0.

\begin{array}{c} -4 \\  9 \\  0 \\  5 \\ \end{array}

We next divide by the Pascal’s triangle row 1,3,3,1.

\begin{array}{c|c} -4 & -4 / 1 = -4 \\  9 &  9 / 3 =  3 \\  0 &  0 / 3 =  0 \\  5 &  5 / 1 =  5 \\ \end{array}

Now, make the difference table going backwards again:

\begin{array}{c|c|c|c|c} -4 & -4 / 1 = -4 & -1 & 2 & 10 \\  9 &  9 / 3 =  3 &  3 & 8 & \\  0 &  0 / 3 =  0 &  5 &   & \\  5 &  5 / 1 =  5 &    &   & \\ \end{array}

Our Bezier control points are along the top: -4,-1,2,10. Keeping in mind that the coefficients for a cubic bezier curve are multiplied by 1,3,3,1 we can make the Bernstein form and put it next to our original formula:

y=5x^3+9x-4
y=-4(1-x)^3-3(1-x)^2x+6(1-x)x^2+10x^3

Let’s check in wolfram alpha again:
Wolfram Alpha: graph y=5x^3+9x-4, y=-4(1-x)^3-3x(1-x)^2+6x^2(1-x)+10x^3, from 0 to 1

And here it is in the cleaner form:

y=-4s^3-3s^2t+6st^2+10t^3

Some Notes On Calculating Polynomials with the Texture Sampler

You may notice that in the comparison graphs i only plotted the graphs from 0 to 1 on the x axis (aka the t axis). The equations are actually equivalent outside of that range as well, but the technique from my paper only works from the 0 to 1 range because it relies on built in hardware pixel interpolation. This may sound like a big limitation, but if you know the minimum and maximum value of x that you want to plug into your equation at runtime, you can convert your x into a percent between those values, get the resulting polynomial, convert it to Bernstein form, set up the texture, and then at runtime convert your input parameter into that percent when you do the lookup. In other words, you squeeze the parts of the function you care about into the 0 to 1 range.

Another issue you will probably hit is that standard RGBA8 textures have only 8 bits per channel and can only store values between 0 and 1. Since the texture is supposed to be storing your control points, that is bad news.

One way to get around this is to find the largest coefficient value and divide the others by this value. This will put the coefficients into the 0 to 1 range, which will be able to be stored in your texture. After sampling the texture, you multiply the result by that scaling value to get the correct answer.

Scaling won’t help having both negative and positive coefficients though. To handle negative coefficients, you could map the 0-1 space to be from -1 to 1, similar to how we often do it with normal maps and other signed data stored in textures. After doing the lookup you’d have to unmap it too of course.

You could also solve negative values and scaling problems by squishing the y axis into the 0 to 1 space by subtracting the minimum and dividing by the maximum minus the minimum, similarly to how we squished the x range into 0 to 1.

If you instead move to an RGBAF32 texture, you’ll have a full 32 bit float per color channel and won’t have problems with either large values or negative values. You will still have to deal with x only going from 0 to 1 though.

I also want to mention that the hardware texture interpolation works in a X.8 fixed point format. There are more details in my paper, but that means that you’ll get some jagged looking artifacts on your curve instead of a smoothly varying value. If that is a problem for you in practice, my paper talks about a few ways to mitigate that issue.

Before moving on, I wanted to mention that it’s easy to support rational polynomials using this method as well. A rational polynomial is when you divide one polynomial by another polynomial, and relates to rational Bezier curves, where you divide one curve by another curve (aka you give weights to control points). Rational curves are more powerful and in fact you can perfectly represent sine and cosine with a quadratic rational polynomial. More info on that in my paper.

To calculate rational polynomials, you just encode the numerator polynomial in one color channel, and the denominator polynomial in another color channel. After you sample the texture and get the result of your calculation, you divide the numerator value by the denominator value. It costs one division in your shader code, but that’s pretty cheap for the power it gives you!

Regarding the texture size requirements to store a polynomial of a specific degree…

Every dimension of the texture, and every color channel in that texture, adds a degree.

However, to get the benefit of the degree increase from the color channel, you need to do a little more math in the shader – check my paper for more details!

So, if you wanted to store a quadratic polynomial in a texture, you would need either a 2d texture with 1 color channel, or you could do it with a 1d texture that had 2 color channels.

If you wanted to store a cubic polynomial in a texture, you could use a 3d texture with 1 color channel, or a 2d texture with two color channels (there would be some waste here) or a 1d texture with three color channels.

For a polynomial that had a maximum degree term of 6, you could use a 3d volume texture that had 3 color channels: RGB.

If you need to evaluate a very high degree polynomial, you can actually take multiple texture samples and combine them.

For instance, if you had a 2d texture with a single color channel, you could do a single texture read to get a quadratic.

If you did two texture reads, you would have two quadratics.

If you linearly interpolated between those two quadratics, you would end up with a cubic.

That isn’t a very high degree curve but is easier to grasp how they combine.

Taking this up to RGBA 3d volume textures, a single texture read will get you a curve of degree 6. If you do another read, it will take it to degree 7. Another read gets you to 8, another to 9, etc.

With support for 4d textures, an RGBA texture read would give you a degree 7 curve. Another read would boost it to 8, another to 9, another to 10, etc.

Regarding the specific sizes of the textures, in all cases the texture size is “2” on each dimension because we are always just linearly interpolating within a hyper cube of pixel values. You can increase the size of the texture for piecewise curves, check out the paper for more details on that and other options.

Closing

Hopefully you found this useful or interesting!

There may not have been much new information in here for the more math inclined people, but I still think it’s worth while to explicitly show how the technique works for both Bernstein polynomials as well as the more common power basis polynomials.

I still think it would be interesting to look at what happens when you sample off of the diagonal, and also what happens if you use different values at different stages of the interpolation. As an example, instead of just looking up a texture at (t,t) for the (u,v) value to get a quadratic curve point, what if we look up by (t,t^2)? At first blush, it seems like by doing that we may be able to boost a curve to a higher degree, maybe at the cost of some reduced flexibility for the specific equations we can evaluate?

Next up I’ll be writing up some more extensions to the paper involving logic gates, surfaces, and volumes.

Have any feedback, questions or interesting ideas? Let me know!

GPU Texture Sampler Bezier Curve Evaluation

Below is a paper I submitted to jcgt.org that unfortunately did not get accepted. Maybe next time!

The main idea of this paper is that bilinear interpolation can be equivalent to the De Casteljau algorithm, which means that if you set up a texture in a specific way, and sample from it at specific texture coordinates, that it will in fact give you Bezier curve points as output! It scales up for higher dimensional textures, as well as higher order curves.

The image below shows this in action for a cubic Bezier curve (3 control points) being stored and recalled from a 2×2 texture (there is actually a curve stored in each color channel).

This image is from an extension linked to lower down which applies the technique to surfaces and volumes:

The primary feedback from the reviewers and editor was that:

  • It was an interesting technique and they thought it was a paper worth reading.
  • The usage case was fairly limited though – basically only when your are compute bound in your shader program, and have some curve calculations to offload to the texture sampler. Or if you are already using a lookup texture and would benefit from fewer instructions and smaller lookup textures.
  • It could have been shorter due to the writing being shorter, but also it could have been less thorough. For instance, it didn’t need to show equivalence to both the De Casteljau’s algorithm as well as Bernstein polynomials, since it’s already known that those are equivalent.
  • They wanted some more performance details

I agree with the feedback, and don’t feel like taking the time to change and resubmit or submit else where, so I’m sharing it here on my blog. I hope you enjoy it and find it interesting (:

Here is the paper:
GPUBezier2016.pdf

Here is the supplemental materials (opengl and webgl source code):
SupplementalMaterials.zip

Here is the webgl demo from the supplemental materials, hosted on my site:
GPU Efficient Texture Based Bezier Curve Evaluation

Here are some working shadertoy demos of the technique:
Shadertoy: Mystery Curves – Quadratic
Shadertoy: Mystery Curves – Cubic
Shadertoy: Mystery Curves – Quartic
Shadertoy: Mystery Curves – Quintic

Extensions

Continuations of this work:

Failed Experiments

Continuations that didn’t work out:

What are your thoughts? Is this cool? Is it lame? Got some ideas to improve it? Leave a comment! (:

G-Buffer Upsizing

The other day I had a thought:

Rendering smaller than full screen images is super helpful for performance, but upsizing an image results in pretty bad quality vs a full resolution render.

What if instead of upsizing the final rendered image, we upsized the values that were used to shade each pixel?

In other words, what if we rendered a scene from a less than full resolution g-buffer?

I was thinking that could be useful in doing ray based graphics, not having to trace or march quite so many rays, but also perhaps it could be useful for things like reflections where a user isn’t likely to notice a drop in resolution.

I’m sure I’m not the first to think of this, but I figured I’d explore it anyways and see what I could see.

I made an interactive shadertoy demo to explore this if you want to see it first hand:
Shadertoy: G-Buffer Upsizing

Result

In short, it does look better in a lot of ways because the normals, uv coordinates and similar parameters interpolate really well, but the edges of shapes are aliased really bad (jaggies).

Check out the images below to see what i mean. The first image is a full sized render. The second image is a 1/4 sized render (half x and half y resolution). The third image is a 1/16th sized render (quarter x and quarter y resolution)



For comparison, here’s a 1/4 sized and 1/16 sized render upsized using bicubic IMAGE interpolation instead of g-buffer data interpolation:


Details & More Information

Despite the aliased results at 1/16th render size, this seems like it may be a reasonable technique at larger render sizes, depending on the level of quality you need. Doing half vertical or half horizontal resolution looks very close to the full sized image for instance. The edges are a tiny bit more aliased along one direction, but otherwise things seem decent:

Since the g-buffer has only limited space, you will probably want to bit pack multiple fields together in the same color channels. When you do that, you throw out the possibility of doing hardware interpolation unfortunately, because it interpolates the resulting bit packed value, not the individual fields that you packed in.

Even when doing the interpolation yourself in the pixel shader, for the most part you can really only store information that interpolates well. For instance, you could store a diffuse R,G,B color, but you wouldn’t want to store and then interpolate a material index. This is because you might have material index 10 (say it’s blue) next to material index 0 (say it’s green), and then when you interpolate you could end up with material index 5 which may be red. You’d get red between your blue and green which is very obviously wrong.

In my demo I did have a material index per pixel, but i just used nearest neighbor for that particular value always. To help the appearance of aliasing, I also stored an RGB diffuse color that i interpolated.

I stored the uvs in the g-buffer and sampled the textures themselves in the final shader though, to make sure and get the best texture information I could. This makes textures look great at virtually any resolution and is a lot of the reason why the result looks as good as it does IMO.

The fact that normals interpolate is a good thing, except when it comes to hard edges like the edge of the cube, or at the edge of any object really. In the case of the cube edge, it smooths the edge a little bit, making a surface that catches specular lighting and so highlights itself as being incorrect (!!). In the case of the edge of regular objects, a similar thing happens because it will interpolate between the normal at the edge of the object and the background, making a halo around the object which again catches specular lighting and highlights itself as being incorrect.

I think it could be interesting or fruitful to explore using edge detection to decide when to blend or not, to help the problem with normals, or maybe even just some edge detection based anti aliasing could be nice to make the resulting images better. The depth (z buffer value) could also maybe be used to help decide when to interpolate or not, to help the problem of halos around every object.

Interestingly, bicubic interpolation actually seems to enhance the problem areas compared to bilinear. It actually seems to highlight areas of change, where you would actually want it to sort of not point out the problems hehe. I think this is due to Runge’s phenomenon. Check out the depth information below to see what i mean. The first is bilinear, the second is bicubic:


One final side benefit of this I wanted to mention, is that if you are doing ray based rendering, where finding the geometry information per pixel can be time consuming, you could actually create your g-buffer once and re-shade it with different animated texture or lighting parameters, to give you a constant time (and very quick) render of any scene of any complexity, so long as the camera wasn’t moving, and there were no geometry changes happening. This is kind of along the same lines as the very first post I made to this blog about 4 years ago, which caches geometry in screen space tiles, allowing dirty rectangles to be used (MoriRT: Pixel and Geometry Caching to Aid Real Time Raytracing).

Anyone else go down this path and have some advice, or have any ideas on other things not mentioned? (:

Next up I think I want to look at temporal interpolation of g-buffers, to see what sort of characteristics that might have. (Quick update, the naive implementation of that is basically useless as far as i can tell: G-Buffer Temporal Interpolation).

Related Stuff

On shadertoy, casty mentioned that if you have some full res information, and some less than full res information, you can actually do something called “Joint Bilateral Upsampling” to get a better result.

Give this paper a read to learn more!
Joint Bilateral Upsampling

It turns out someone has already solved this challenge with great success. They use “the MSAA trick” to get more samples at the edges. Check out ~page 38:
GPU-Driven Rendering Pipelines

Hiding a Lookup Table in a Modulus Operation

Lookup tables are a tool found in every programmer’s tool belt.

Lookup tables let you pre-calculate a complex calculation in advance, store the results in a table (an array), and then during performance critical parts of your program, you access that table to get quick answers to the calculations, without having to do the complex calculation on the fly.

In this post I’ll show a way to embed a lookup table inside of a single (large) number, where you extract values from that lookup table by taking a modulus of that number with different, specific values.

This technique is slower and takes more memory than an actual lookup table, but it’s conceptually interesting, so I wanted to share.

Also, I stumbled on this known technique while working on my current paper. The paper will make this technique a bit more practical, and I’ll share more info as soon as I am able, but for now you can regard this as a curiosity 😛

Onto the details!

1 Bit Input, 1 Bit Output: Pass Through

Let’s learn by example and start with a calculation that takes in an input bit, and gives that same value for an output bit. It’s just a 1 bit pass through lookup table.

\begin{array}{c|c} \text{Input} & \text{Output} \\ \hline 0 & 0 \\ 1 & 1 \\ \end{array}

To be able to convert that to something we can decode with modulus we have to solve the following equations:

x \% k_0 = 0 \\ x \% k_1 = 1

x is the number that represents our lookup table. k_0 and k_1 are the values that we modulus x against to get our desired outputs out.

It looks as if we have two equations and three unknowns – which would be unsolvable – but in reality, x is the only unknown. The k values can be whatever values it takes to make the equations true.

I wrote a blog post on how to solve equations like these in a previous post: Solving Simultaneous Congruences (Chinese Remainder Theorem).

You can also use this chinese remainder theorem calculator, which is handy: Chinese Remainder Theorem Calculator

The short answer here is that the k values can be ANY numbers, so long as they are pairwise co-prime to each other – AKA they have a greatest common divisor of 1.

If we pick 3 and 4 for k0 and k1, then using the chinese remainder theorem we find that x can equal 9 and the equations are true. Technically the answer is 9 mod 12, so 21, 33, 45 and many other numbers are also valid values of x, but we are going to use the smallest answer to keep things smaller, and more manageable.

So, in this case, the value representing the lookup table would be 9. If you wanted to know what value it gave as output when you plugged in the value 0, you would modulus the lookup table (9) against k0 (3) to get the output. If you wanted to know what value it gave as output when you plugged in the value 1, you would modulus the lookup table (9) against k1 (4) to get the output. The table below shows that it passes through the value in both cases like it should:

\begin{array}{c|c|c|c} \text{Input} & \text{Symbolic} & \text{Numeric} & \text{Output} \\ \hline 0 & x \% k_0 & 9 \% 3 & 0\\ 1 & x \% k_1 & 9 \% 4 & 1\\ \end{array}

1 Bit Input, 1 Bit Output: Not Gate

Let’s do something a little more interesting. Let’s make the output bit be the reverse of the input bit. The equations we’ll want to solve are this:

x \% k_0 = 1 \\ x \% k_1 = 0

We can use 3 and 4 for k0 and k1 again if we want to. Using the Chinese remainder theorem to solve the equations gives us a value of 4 for x. Check the truth table below to see how this works:

\begin{array}{c|c|c|c} \text{Input} & \text{Symbolic} & \text{Numeric} & \text{Output} \\ \hline 0 & x \% k_0 & 4 \% 3 & 1\\ 1 & x \% k_1 & 4 \% 4 & 0\\ \end{array}

1 Bit Input, 1 Bit Output: Output Always 1

What if we wanted the output bit to always be 1 regardless of input bit?

x \% k_0 = 1 \\ x \% k_1 = 1

Using 3 and 4 for our k values again, we solve and get a value of 1 for x. Check the truth table to see it working below:

\begin{array}{c|c|c|c} \text{Input} & \text{Symbolic} & \text{Numeric} & \text{Output} \\ \hline 0 & x \% k_0 & 1 \% 3 & 1\\ 1 & x \% k_1 & 1 \% 4 & 1\\ \end{array}

Hopefully one bit input to one bit output makes sense now. Let’s move on (:

2 Bit Input, 1 Bit Output: XOR Gate

Things get a little more interesting when we bump the number of input bits up to 2. If we want to make a number which represents XOR, we now have 4 equations to solve.

x \% k_{00} = 0 \\ x \% k_{01} = 1 \\ x \% k_{10} = 1 \\ x \% k_{11} = 0

In general we will have 2^N equations, where N is the number of input bits.

You might have noticed that I use subscripts for k corresponding to the input bits that the key represents. This is a convention I’ve found useful when working with this stuff. Makes it much easier to see what’s going on.

Now with four equations, we need 4 pairwise coprime numbers – no number has a common factor with another number besides 1.

Let’s pull them out of the air. Umm… 3, 4, 5, 7

Not too hard with only two bits of input, but you can see how adding input bits makes things a bit more complex. If you wanted to make something that took in two 16 bit numbers as input for example, you would need 2^32 co-prime numbers, since there was a total of 32 bits of input!

When we solve those four equations, we get a value of 21 for x.

Notice how x is larger now that we have more input bits? That is another added complexity as you add more input bits. The number representing your program can get very, very large, and require you to use “multi precision integer” math libraries to store and decode the programs, when the numbers get larger than what can be held in a 64 bit int.

Boost has a decent library for this, check out boost::multiprecision::cpp_int, it’s what I use. You can download boost from here: http://www.boost.org/doc/libs/1_59_0/more/getting_started/windows.html

Anyhow, let’s check the truth table to see if our values work:

\begin{array}{c|c|c|c} \text{Input} & \text{Symbolic} & \text{Numeric} & \text{Output} \\ \hline 00 & x \% k_{00} & 21 \% 3 & 0 \\ 01 & x \% k_{01} & 21 \% 4 & 1 \\ 10 & x \% k_{10} & 21 \% 5 & 1 \\ 11 & x \% k_{11} & 21 \% 7 & 0 \end{array}

Woot, it worked.

2 Bit Input, 2 Bit Output: OR, AND

What happens when we add another bit of output? Basically we just treat each output bit as it’s own lookup table. This means that if we have two output bits, we will have two numbers representing our program (one for each bit), and that this is true regardless of how many input bits we have.

Let’s make the left output bit (x_0 ) be the OR of the input bits and the right output bit (x_1 ) be the AND of the input bits.

That give us these two sets of equations to solve:

x_0 \% k_{00} = 0 \\ x_0 \% k_{01} = 1 \\ x_0 \% k_{10} = 1 \\ x_0 \% k_{11} = 1 \\ \\ x_1 \% k_{00} = 0 \\ x_1 \% k_{01} = 0 \\ x_1 \% k_{10} = 0 \\ x_1 \% k_{11} = 1 \\

We can use the same coprime numbers for our k values as we used in the last section (3,4,5,7). Note that we use the same k values in each set of equations. This is intentional and required for things to work out!

If we solve each set of equations we get 141 for x0, and 120 for x1.

Let’s see if that worked:

\begin{array}{c|c|c|c} \text{Input} & \text{Symbolic} & \text{Numeric} & \text{Output} \\ \hline 00 & x_0 \% k_{00}, x_1 \% k_{00} & 141 \% 3, 120 \% 3 & 00 \\ 01 & x_0 \% k_{01}, x_1 \% k_{01} & 141 \% 4, 120 \% 4 & 10 \\ 10 & x_0 \% k_{10}, x_1 \% k_{10} & 141 \% 5, 120 \% 5 & 10 \\ 11 & x_0 \% k_{11}, x_1 \% k_{11} & 141 \% 7, 120 \% 7 & 11 \end{array}

Hey, it worked again. Neat!

Example Code

Now that we have the basics worked out, here is some sample code.

The lookup table takes in 8 bits as input, mapping 0..255 to 0…2pi and gives the sine of that value as output in a float. So it has 8 bits of input and 32 bits of output.

#include <vector>
#include <boost/multiprecision/cpp_int.hpp>
#include <stdint.h>
#include <string.h>
#include <memory>

typedef boost::multiprecision::cpp_int TINT;
typedef std::vector<TINT> TINTVec;

const float c_pi = 3.14159265359f;

//=================================================================================
void WaitForEnter ()
{
    printf("nPress Enter to quit");
    fflush(stdin);
    getchar();
}

//=================================================================================
static TINT ExtendedEuclidianAlgorithm (TINT smaller, TINT larger, TINT &s, TINT &t)
{
    // make sure A <= B before starting
    bool swapped = false;
    if (larger < smaller)
    {
        swapped = true;
        std::swap(smaller, larger);
    }

    // set up our storage for the loop.  We only need the last two values so will
    // just use a 2 entry circular buffer for each data item
    std::array<TINT, 2> remainders = { larger, smaller };
    std::array<TINT, 2> ss = { 1, 0 };
    std::array<TINT, 2> ts = { 0, 1 };
    size_t indexNeg2 = 0;
    size_t indexNeg1 = 1;

    // loop
    while (1)
    {
        // calculate our new quotient and remainder
        TINT newQuotient = remainders[indexNeg2] / remainders[indexNeg1];
        TINT newRemainder = remainders[indexNeg2] - newQuotient * remainders[indexNeg1];

        // if our remainder is zero we are done.
        if (newRemainder == 0)
        {
            // return our s and t values as well as the quotient as the GCD
            s = ss[indexNeg1];
            t = ts[indexNeg1];
            if (swapped)
                std::swap(s, t);

            // if t < 0, add the modulus divisor to it, to make it positive
            if (t < 0)
                t += smaller;
            return remainders[indexNeg1];
        }

        // calculate this round's s and t
        TINT newS = ss[indexNeg2] - newQuotient * ss[indexNeg1];
        TINT newT = ts[indexNeg2] - newQuotient * ts[indexNeg1];

        // store our values for the next iteration
        remainders[indexNeg2] = newRemainder;
        ss[indexNeg2] = newS;
        ts[indexNeg2] = newT;

        // move to the next iteration
        std::swap(indexNeg1, indexNeg2);
    }
}

//=================================================================================
void MakeKey (TINTVec &keys, TINT &keysLCM, size_t index)
{
    // if this is the first key, use 3
    if (index == 0)
    {
        keys[index] = 3;
        keysLCM = keys[index];
        return;
    }

    // Else start at the last number and keep checking odd numbers beyond that
    // until you find one that is co-prime.
    TINT nextNumber = keys[index - 1];
    while (1)
    {
        nextNumber += 2;
        if (std::all_of(
            keys.begin(),
            keys.begin() + index,
            [&nextNumber] (const TINT& v) -> bool
            {
                TINT s, t;
                return ExtendedEuclidianAlgorithm(v, nextNumber, s, t) == 1;
            }))
        {
            keys[index] = nextNumber;
            keysLCM *= nextNumber;
            return;
        }
    }
}

//=================================================================================
void CalculateLookupTable (
    TINT &lut,
    const std::vector<uint64_t> &output,
    const TINTVec &keys,
    const TINT &keysLCM,
    const TINTVec &coefficients,
    size_t bitMask
)
{
    // figure out how much to multiply each coefficient by to make it have the specified modulus residue (remainder)
    lut = 0;
    for (size_t i = 0, c = keys.size(); i < c; ++i)
    {
        // we either want this term to be 0 or 1 mod the key.  if zero, we can multiply by zero, and
        // not add anything into the bit value!
        if ((output[i] & bitMask) == 0)
            continue;

        // if 1, use chinese remainder theorem
        TINT s, t;
        ExtendedEuclidianAlgorithm(coefficients[i], keys[i], s, t);
        lut = (lut + ((coefficients[i] * t) % keysLCM)) % keysLCM;
    }
}

//=================================================================================
template <typename TINPUT, typename TOUTPUT, typename LAMBDA>
void MakeModulus (TINTVec &luts, TINTVec &keys, LAMBDA &lambda)
{
    // to keep things simple, input sizes are being constrained.
    // Do this in x64 instead of win32 to make size_t 8 bytes instead of 4
    static_assert(sizeof(TINPUT) < sizeof(size_t), "Input too large");
    static_assert(sizeof(TOUTPUT) < sizeof(uint64_t), "Output too large");

    // calculate some constants
    const size_t c_numInputBits = sizeof(TINPUT) * 8;
    const size_t c_numInputValues = 1 << c_numInputBits;
    const size_t c_numOutputBits = sizeof(TOUTPUT) * 8;

    // Generate the keys (coprimes)
    TINT keysLCM;
    keys.resize(c_numInputValues);
    for (size_t index = 0; index < c_numInputValues; ++index)
        MakeKey(keys, keysLCM, index);

    // calculate co-efficients for use in the chinese remainder theorem
    TINTVec coefficients;
    coefficients.resize(c_numInputValues);
    fill(coefficients.begin(), coefficients.end(), 1);
    for (size_t i = 0; i < c_numInputValues; ++i)
    {
        for (size_t j = 0; j < c_numInputValues; ++j)
        {
            if (i != j)
                coefficients[i] *= keys[j];
        }
    }

    // gather all the input to output mappings by permuting the input space
    // and storing the output for each input index
    std::vector<uint64_t> output;
    output.resize(c_numInputValues);
    union
    {
        TINPUT value;
        size_t index;
    } input;
    union
    {
        TOUTPUT value;
        size_t index;
    } outputConverter;

    for (input.index = 0; input.index < c_numInputValues; ++input.index)
    {
        outputConverter.value = lambda(input.value);
        output[input.index] = outputConverter.index;
    }

    // iterate through each possible output bit, since each bit is it's own lut
    luts.resize(c_numOutputBits);
    for (size_t i = 0; i < c_numOutputBits; ++i)
    {
        const size_t bitMask = 1 << i;
        CalculateLookupTable(
            luts[i],
            output,
            keys,
            keysLCM,
            coefficients,
            bitMask
        );
    }
}

//=================================================================================
int main (int argc, char **argv)
{
    // Look up tables encodes each bit, keys is used to decode each bit for specific
    // input values.
    TINTVec luts;
    TINTVec keys;

    // this is the function that it turns into modulus work
    typedef uint8_t TINPUT;
    typedef float TOUTPUT;
    auto lambda = [] (TINPUT input) -> TOUTPUT
    {
        return sin(((TOUTPUT)input) / 255.0f * 2.0f * c_pi);
    };

    MakeModulus<TINPUT, TOUTPUT>(luts, keys, lambda);

    // show last lut and key to show what kind of numbers they are
    std::cout << "Last Lut: " << *luts.rbegin() << "n";
    std::cout << "Last Key: " << *keys.rbegin() << "n";

    // Decode all input values
    std::cout << "n" << sizeof(TINPUT) << " bytes input, " << sizeof(TOUTPUT) << " bytes outputn";
    for (size_t keyIndex = 0, keyCount = keys.size(); keyIndex < keyCount; ++keyIndex)
    {
        union
        {
            TOUTPUT value;
            size_t index;
        } result;

        result.index = 0;

        for (size_t lutIndex = 0, lutCount = luts.size(); lutIndex < lutCount; ++lutIndex)
        {
            TINT remainder = luts[lutIndex] % keys[keyIndex];
            size_t remainderSizeT = size_t(remainder);
            result.index += (remainderSizeT << lutIndex);
        }

        TINT remainder = luts[0] % keys[keyIndex];
        std::cout << "i:" << keyIndex << " o:" << result.value << "n";
    }

    WaitForEnter();
    return 0;
}

Here is some output from the program. The first is to show what the last (largest) look up table and key look like. Notice how large the look up table number is!

Here it shows some sine values output from the program, using modulus against the large numbers calculated, to get the bits of the result out:

How to Get Lots of Pairwise Co-Prime Numbers?

You can generate a list of pairwise coprimes using brute force. Have an integer that you increment, and check if it’s pairwise co-prime to the existing items in the list. If it is, add it to the list! Rinse and repeat until you have as many as you want.

That is the most practical way to do it, but there are two other interesting ways I wanted to mention.

The first way is using Fermat numbers. The Fermat numbers are an infinite list of pairwise co-prime numbers and are calculated as 2^{2^n}+1 where n is an integer. Fermat numbers also have the benefit that you can get the nth item in the list without calculating the numbers that came before it. The only problem is that the numbers grow super huge very fast. The first 7 values are: 3, 5, 17, 257, 65537, 4294967297, 18446744073709551617. If Fermat numbers didn’t grow so quickly, they sure would be useful for things like this technique.

The second way is using something called Sylvester’s sequence. It too is an infinite list of pairwise co-prime numbers, and it too grows very large very quickly unfortunately. I also don’t believe there is a way to calculate the Nth item in the list directly. Every number is based on previous numbers, so you have to calculate them all from the beginning. No random access!

Beyond Binary

In this post I showed how to work in binary digits, but there is no reason why you have to encode single bits in the lookup tables.

Instead of encoding 0 or 1 in each modulus “lookup table”, you could also perhaps store base 10 numbers in the tables and have 0-9. Or, maybe you encode a byte per lookup table.

Encoding more than one bit effectively makes both your input and your output smaller, which helps the algorithm do more with less.

Your keys will need to be larger though, since the keys have to be larger than the value you plan to store, and your resulting lookup table will be a larger number as well. It might make the technique more worth while though.

I’ll leave that as an exercise for you. If try it and find neat stuff, post a comment and let us know, or drop me an email or something. It’d be neat to hear if people find any practical usage cases of this technique 😛

The End, For Now!

I want to point out that increasing the number of input bits in this technique is a pretty expensive thing to do, but increasing the number of output bits is a lot cheaper. It kind of makes sense in a way if you think about it. Input bits add information from the outside world that must be dealt with, while output bits are just fluff that can easily be diluted or concentrated by adding or removing bits that are associated with, and calculated from, the input bits.

Another problem you may have noticed with this technique is that if you have a really expensive calculation that you are trying to “flatten” into modulus math like this, that you have to run that calculation many, many times to know what values a lookup table would give you. You have to run it once per possible input to get every possible output. That is expected when making a lookup table, since you are paying a cost up front to make things faster later.

The paper I’m working on changes things a bit though. One of the things it does is it makes it so doing this technique only requires that you evaluate the function once, and it calculates all values simultaneously to give the end result that you can then do modulus against. It’s pretty cool IMO and I will share more details here as soon as I am able – and yes, i have actual working code that does that, believe it or not! I’m looking forward to being able to share it later on. Maybe someone will find some really cool usage case for it.