## Raytracing Reflection, Refraction, Fresnel, Total Internal Reflection, and Beer’s Law

This post talks about how to render images like the below in real time using ray tracing. Some realism in the images come from reflection and refraction, but the real icing on the cake comes from Fresnel, total internal reflection and Beer’s law. We’ll look at the contributions of these features individually and talk about how to properly combine them for the greatest and most realistic results (:

The renderings come from a shadertoy that accompanies this post: Shadertoy: Reflect Refract TIR Fresnel RayT

My motivation for learning more about this stuff is that I’m starting to make a marble madness inspired game, and I’m planning to do hybrid rendering between rasterization and ray based techniques.

This post will focus more on how to make these features work for you in ray tracing, and less about the reasons for why they work the way they do. This post is more about practical implementations, and less about rigorous explanations.

If you have any questions, comments, corrections or additions, feel free to leave in the comments section at the bottom, or feel free to hit me up on twitter at @Atrix256.

This post is assumes you know how to do basic raytracing with ambient, diffuse and specular lighting, like the image below, which we are going to start with:

# Reflection

The first thing to talk about is reflection. More specifically we are going to be talking about SPECULAR reflection.

Specular is defined by the dictionary to mean “of, relating to, or having the properties of a mirror.”, so what we normally think of as reflection (like from a mirror) is in fact specular reflection.

There is non specular reflection, aka diffuse reflection, which just means that light is reflected off of a surface in a non mirror like way. This is accomplished with diffuse lighting where commonly we dot product the normal of a surface with the direction from the surface to the light to know how much to light that point on the surface.

If specular reflection is how mirrors reflect, and diffuse reflection is how regular diffuse surfaces work, then you might wonder what specular lighting is all about.

A specular highlight is actually just a cheap approximation to doing a mirror like specular reflection of a light source, so it is a cheap kind of specular reflection.

Let’s talk about how to do real mirror like specular reflection in a ray tracer.

When light hits a surface, some amount of the light will be reflected, and some amount of the light will be transmitted into the object. Transmitted light is used for the diffuse shading.

As you might imagine, the amount of light reflected plus the amount of light transmitted must equal the total amount of light that hit the surface. (note that some transmitted energy may be absorbed and given off as heat, or the object may be glowing, so may give off more light than received but let’s ignore that stuff for now.)

A common way to deal with this is to define a reflectivity of a surface as the amount of light it reflects in percent and use 100% minus that amount as the transmitted amount.

You might say that 2% of light that hits a surface reflects. That means that 98% of the light that hits a surface is transmitted and used for the diffuse shading.

When shading a point on the surface, you would calculate both the reflected color at that point on the surface, and the diffuse shaded color at that point, but you multiply the reflected color by 0.02 and the diffuse shaded color by 0.98 and add them together. Doing this gives a result like the below:

The higher the reflectivity percent, the more reflection you get, and the less diffuse shading you get. Turning down reflectivity has the opposite effect.

How do you calculate the reflected color of a point on a surface though?

You simply calculate the ray that reflected off of the surface, and calculate the color of what that ray hit.

To calculate a reflected ray, you need to know the direction that the ray was traveling when it hit the surface (The incident direction), you need to know the location that the ray hit the surface (the surface location), and you need to know the normal of the surface at the intersection point.

If you have those things you can calculate the reflected ray direction as this:

$ReflectRayLocation = SurfaceLocation \\ ReflectRayDirection = IncidentDirection - 2 * SurfaceNormal * DotProduct(IncidentDirection, SurfaceNormal)$

Note that in hlsl and glsl there is a function called “reflect” which takes the incident direction, and the surface normal, and returns you the reflected direction.

Doing the above is mathematically correct, but if you then try to raytrace from that ray, you may hit the same object you are reflecting off. One way to fight that problem is to push the ray positin a small amount away from the surface to make sure it misses it. You can do that for example like this:

$ReflectRayLocation = ReflectRayLocation + ReflectRayDirection * 0.01$

There are other ways to make sure you don’t hit the same object again, but pushing the ray away from the object a small amount really does work pretty nicely in practice.

# Refraction

I mentioned in the last section that whatever light wasn’t reflected when it hit a surface was transmitted. I also said that the transmitted light was used for diffuse shading.

In reality, it’s passed through the “bidirectional scattering distribution function” aka BSDF. You may have heard the term “bidirectional reflection distribution function” aka BRDF. A BRDF only deals with the hemisphere (half a sphere) that surrounds the surface normal. The BSDF on the other hand deals with the full sphere surrounding a surface normal so BRDF is a subset of what is possible with the BSDF.

Because the BSDF deals with an entire sphere, that means that it can handle reflection (specular and non specular) like the BRDF can, but it can also deal with transparency and refraction, where some of the light travels THROUGH an object.

In a path tracer where everything is physically accurate and mathematically precise, we would be interested in dealing with a BSDF and integrating over the sphere, but since we are working with a ray tracer, our physical accuracy needs are a lot lower – we only want a result that looks plausible.

What we are going to do for our transparency is calculate a direction that the ray is going to travel through the object, ray trace that ray to get a color, and use the transmitted light (the portion of light that isn’t reflected) as a multiplier of that color, that we add to the reflected amount of light.

If we have an object with 10% reflectivity, and 90% transmittance, but use that transmitted light for transparency, we’ll have a rendering like below:

Now that we have transparency, let’s talk about refraction. Refraction is a physical phenomenon where light bends (“changes speed” i guess but that sounds a bit suspicious for light), as it hits a boundary between two different surfaces.

Every material has a refractive index, and in fact, may have different refractive indices per light frequency. For our purposes, we’ll assume surfaces have the same refractive index for every frequency of light. There’s a list of refractive indices for a lot of different materials here: Index of refraction

How a ray bends when it refracts depends on the ration of the refractive index that it’s leaving to the refractive index that it’s entering. AKA outside/inside.

HLSL and GLSL have a function called refract which take the incident vector, the surface normal vector, and that ration of refractive indices, and return the refracted ray.

When you do a raytrace down the refracted ray, you will have the same problem as when tracing the reflected ray, that you may hit the same surface you just hit again erroneously. To help that, you once again just move the ray slightly away from the surface.

$RefractRayLocation = RefractRayLocation + RefractRayDirection * 0.01$

Here is a rendering where the sphere has 10% reflectivity, 90% transmittance, an air refractive index of 1.0, and a refractive index of 1.125 for the sphere. You can see how the light bends as it goes through the object and looks pretty neat!

# Fresnel

There is an interesting property in our world: If you look at something straight on, it’s the least reflective it will be. If you turn it nearly 90 degrees where it’s nearly edge on, it will be the most reflective it can be. Many surfaces will become almost perfectly reflective when you view them almost edge on. Go try it out with a wall in your house or a glass or other things.

Weird huh? That’s called Fresnel, and is something we can also make use of in ray tracing to get a more realistic image.

Instead of just always using the reflectivity amount of the surface, we use the Fresnel equation to figure out how much reflectivity an object should have based on the view angle versus the surface normal, so that when it’s more edge on it becomes more reflective. At minimum the reflectivity will be the reflectivity of the surface, and at maximum the reflectivity will be 100%.

The amount we transmit for either refraction or diffuse will be 100% minus however much percentage is reflective.

Here is the image showing normal reflection again:

Here is the image with Fresnel:

It looks quite a bit better with fresnel doesn’t it?!

Here’s a GLSL function of Schlick’s Fresnel approximation function. Notice that it takes the surface normal and incident vector, as well as the refractive index being left (n1) and the refractive index being entered (n2):

float FresnelReflectAmount (float n1, float n2, vec3 normal, vec3 incident)
{
// Schlick aproximation
float r0 = (n1-n2) / (n1+n2);
r0 *= r0;
float cosX = -dot(normal, incident);
if (n1 > n2)
{
float n = n1/n2;
float sinT2 = n*n*(1.0-cosX*cosX);
// Total internal reflection
if (sinT2 > 1.0)
return 1.0;
cosX = sqrt(1.0-sinT2);
}
float x = 1.0-cosX;
float ret = r0+(1.0-r0)*x*x*x*x*x;

// adjust reflect multiplier for object reflectivity
ret = (OBJECT_REFLECTIVITY + (1.0-OBJECT_REFLECTIVITY) * ret);
return ret;
}


Our tale of reflection is complete, so let’s go back to refraction / transparency and finish up the last two items.

# Total Internal Reflection

The way that the Fresnel equation works, it’s possible that when moving from a material with a higher index of refraction to a lower one, that the amount of refraction can actually drop to zero percent. In this case, the light doesn’t exit the higher refractive index object and instead ONLY reflects back into the object, because the reflective amount becomes 100%.

When this happens, it’s called “Total Internal Reflection” for hopefully obvious reasons (:

There isn’t a whole lot to say about this, because you can see that this is even accounted for in the GLSL Fresnel function from the last section.

However, if you are ever under water in a swimming pool and look up to see the water surface looking like a mirror, that is total internal reflection in action.

You can also see it in the render below, where you can see reflections on the inside of the walls of the object, especially on the bottom (floor) of the object:

# Beer’s Law

Beer’s law is the last item to talk about, and relates to transparent surfaces. Beer’s law deals with light being absorbed over distance as it travels through a material.

Beer’s law is why a thin piece of jello is mostly colorless, but a thicker piece of jello has a much richer and deeper color.

Here’s a cube with beer’s law absorbing red and green light over distance. You should notice that where the light travels less distance through the cube that it’s not as blue, because not as much red and green light has been absorbed:

To apply beer’s law, you first figure out how long a ray has traveled through the absorbing material (by tracing the ray inside the object to see where it hits the back side). You then do this calculation:

vec3 color = RayTrace(rayPos, rayDir);
vec3 absorb = exp(-OBJECT_ABSORB * absorbDistance);
color *= absorb;


OBJECT_ABSORB is an RGB value that describes how much of each color channel absorbs over distance. For example, a value of (8.0, 2.0, 0.1) would make red get absorbed the fastest, then green, then blue, so would result in a blueish, slightly green color object.

# Putting it All Together

Now that we have the individual components worked out let’s talk about how to put it together.

Firstly, when a ray hits any surface, you need to use the Fresnel equation to get the multiplier for the reflected and transmitted light.

Next you calculate the reflected and transmitted light by recursively raytracing. The transmitted light is either used for the diffuse shading, the transparency/refracted ray, or some combination of those two (technically, it’s all up to the BSDF we are approximating).

Then, you multiply the reflected light by the reflected amount from the Fresnel equation, and the transmitted amount by 100%-reflectionAmount and add the results together.

(Quick side note, if you have emissive color on the surface aka the object glows, you would also add that in here).

Since raytracing is recursive, you would do this each time a ray intersected with an object. In other words, each intersection causes the ray to split into two rays.

As you can imagine, this can make the rendering quite complex, especially on the GPU where you can’t even do real recursion.

One way to help limit the recursiveness a bit is when you are calculating your reflection and transmittance amounts, you can choose a threshold like say 1% where if the reflection is under 1% it clamps it to 0%, and if it’s greater than 99% it clamps it at 100%. You can then choose not to recurse down a specific ray if the ray’s multiplier is 0. The end result will be that reflections or transmittance rays that don’t contribute much to the end result won’t be followed at all.

If you are willing to sacrifice some visual quality to not have to split your ray into two at each object intersection, you could also figure out if reflection or transmittance has the higher multiplier, and make the ray only follow one of the paths. If you were doing this in a path tracer, you could choose which one to follow randomly, using the multiplier as a weight for the random selection.

The problem in both of these two optimizations is that the multiplier is only half of the information though so may incorrectly choose the less meaningful contribution. The other half of the information is the color of the ray if you followed it. The reason for this is that you might have a low multiplier for a really bright spot (caustics have this problem commonly!), or vice versa you may have a high multiplier for a dull featureless spot. With path tracing, if you take enough samples, it all washes out in the averages, and with ray tracing, maybe you accept that it will do the wrong thing sometimes, to stay a real time algorithm, but it’s important to know how this type of choice can fail for you. (Side note, this sort of stuff is what importance sampling in path tracing is all about – trying to make rays follow more meaningful paths to get better results quicker).

When doing real time raytracing you also often have to decide how many times you want to allow a ray to bounce around. In the shadertoy that goes with this post, that parameter is MAX_RAY_BOUNCES and I have it set to 10.

Interestingly, setting it to 1 has no visible impact on the sphere at all, which is a nice improvement. For the box, a value of 3 seems to be the maximum it needs. 3 also seems to be the magic number for the geometric gem type shape.

So, 10 is overkill, but i left it at that in case people play with parameters and change them to values which would require more bounces.

Lastly I wanted to mention that in the scene that I rendered, I did a small “trick” to make it so I didn’t need to do full recursive splitting of rays at each intersection. I did this by making it so the main object in the center of the scene was the only object that had reflection.

This way, I only need to split the ray into two if i hit the main object. Furthermore, when I’m splitting the ray at the main object, the ray that gets the color for the outside world (versus the inside of the object) is a single non recursive ray cast since it can’t hit anything reflective. The result is that at each intersection of the sphere, i do a simple non recursive ray cast for the ray that is going outside of the main object, then i continue the iterative ray on the inside of the object until i run out of bounces.

Doing this causes a recursive process to become an iterative one, which is much friendlier on the gpu.

Below is the final render again from the shadertoy. The parameters are:

• The refractive index of the air is 1.00029
• The refractive index inside the objects are 1.125
• Reflectivity is 1%
• The absorption for beers law is (8.0, 8.0, 3.0)

## Incremental Least Squares Surface and Hyper-Volume Fitting

The last post showed how to fit a $y=f(x)$ equation to a set of 2d data points, using least squares fitting. It allowed you to do this getting only one data point at a time, and still come up with the same answer as if you had all the data points at once, so it was an incremental, or “online” algorithm.

This post generalizes that process to equations of any dimension such as $z=f(x,y)$, $w=f(x,y,z)$ or greater.

Below is an image of a surface that is degree (2,2). This is a screenshot taken from the interactive webgl2 demo I made for this post: Least Squares Surface Fitting

# How Do You Do It?

The two main players from the last post were the ATA matrix and the ATY vector. These two could be incrementally updated as new data points came in, which would allow you to do an incremental (or “online”) least squares fit of a curve.

When working with surfaces and volumes, you have the same things basically. Both the ATA matrix and the ATY vector still exist, but they contain slightly different information – which I’ll explain lower down. However, the ATY vector is renamed, since in the case of a surface it should be called ATZ, and for a volume it should be called ATW. I call it ATV to generalize it, where v stands for value, and represents the last component in a data point, which is the output value we are trying to fit given the input values. The input values are the rest of the components of the data point.

At the end, you once again need to solve the equation $A^TA*c=A^Tv$ to calculate the coefficients (named c) of the equation.

It’s all pretty similar to fitting a curve, but the details change a bit. Let’s work through an example to see how it differs.

# Example: Bilinear Surface Fitting

Let’s fit 4 data points to a bilinear surface, otherwise known as a surface that is linear on each axis, or a surface of degree(1,1):
(0,0,5)
(0,1,3)
(1,0,8)
(1,1,2)

Since we are fitting those data points with a bilinear surface, we are looking for a function that takes in x,y values and gives as output the z coordinate. We want a surface that gives us the closest answer possible (minimizing the sum of the squared difference for each input data point) for the data points we do have, so that we can give it other data points and get z values as output that approximate what we expect to see for that input.

A linear equation looks like this, with coefficients A and B:
$y=Ax+B$

Since we want a bilinear equation this time around, this is the equation we are going to end up with, after solving for the coefficients A,B,C,D:
$y=Axy+Bx+Cy+D$

The first step is to make the A matrix. In the last post, this matrix was made up of powers of the x coordinates. In this post, they are actually going to be made up of the permutation of powers of the x and y coordinates.

Last time the matrix looked like this:
$A = \begin{bmatrix} x_0^0 & x_0^1 & x_0^2 \\ x_1^0 & x_1^1 & x_1^2 \\ x_2^0 & x_2^1 & x_2^2 \\ x_3^0 & x_3^1 & x_3^2 \\ \end{bmatrix}$

This time, the matrix is going to look like this:
$A = \begin{bmatrix} x_0^0y_0^0 & x_0^0y_0^1 & x_0^1y_0^0 & x_0^1y_0^1 \\ x_1^0y_1^0 & x_1^0y_1^1 & x_1^1y_1^0 & x_1^1y_1^1 \\ x_2^0y_2^0 & x_2^0y_2^1 & x_2^1y_2^0 & x_2^1y_2^1 \\ x_3^0y_3^0 & x_3^0y_3^1 & x_3^1y_3^0 & x_3^1y_3^1 \\ \end{bmatrix}$

Simplifying that matrix a bit, it looks like this:
$A = \begin{bmatrix} 1 & y_0 & x_0 & x_0y_0 \\ 1 & y_1 & x_1 & x_1y_1 \\ 1 & y_2 & x_2 & x_2y_2 \\ 1 & y_3 & x_3 & x_3y_3 \\ \end{bmatrix}$

To simplify it even further, there is one row in the A matrix per data point, where the row looks like this:
$\begin{bmatrix} 1 & y & x & xy \\ \end{bmatrix}$

You can see that every permutation of the powers of x and y for each data point is present in the matrix.

The A matrix for our data points is this:
$A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 \\ \end{bmatrix}$

Next we need to calculate the ATA matrix by multiplying the transpose of that matrix, by that matrix.

$A^TA = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} * \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 \\ \end{bmatrix} = \begin{bmatrix} 4 & 2 & 2 & 1 \\ 2 & 2 & 1 & 1 \\ 2 & 1 & 2 & 1 \\ 1 & 1 & 1 & 1 \\ \end{bmatrix}$

Taking the inverse of that matrix we get this:

$(A^TA)^{-1} = \begin{bmatrix} 1 & -1 & -1 & 1 \\ -1 & 2 & 1 & -2 \\ -1 & 1 & 2 & -2 \\ 1 & -2 & -2 & 4 \\ \end{bmatrix}$

Next we need to calculate the ATV vector (formerly known as ATY). We calculate that by multiplying the transpose of the A matrix by the Z values:

$A^TV = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} * \begin{bmatrix} 5 \\ 3 \\ 8 \\ 2 \\ \end{bmatrix} = \begin{bmatrix} 18 \\ 5 \\ 10 \\ 2 \\ \end{bmatrix}$

Lastly we multiply the inversed ATA matrix by the ATV vector to get our coefficients.

$\begin{bmatrix} 1 & -1 & -1 & 1 \\ -1 & 2 & 1 & -2 \\ -1 & 1 & 2 & -2 \\ 1 & -2 & -2 & 4 \\ \end{bmatrix} * \begin{bmatrix} 18 \\ 5 \\ 10 \\ 2 \\ \end{bmatrix} = \begin{bmatrix} 5 \\ -2 \\ 3 \\ -4 \\ \end{bmatrix}$

In the last post, the coefficients we got out were in x power order, so the first (top) was for the $x^0$ term, the next was for the $x^1$ term etc.

This time around, the coefficients are in the same order as the permutations of the powers of x and y:
$\begin{bmatrix} 1 & y & x & xy \\ \end{bmatrix}$

That makes our final equation this:
$z = -4xy+3x-2y+5$

If you plug in the (x,y) values from the data set we fit, you’ll see that you get the corresponding z values as output. We perfectly fit the data set!

The process isn’t too different from last post and not too difficult either right?

Let’s see if we can generalize and formalize things a bit.

# Some Observations

Firstly you may be wondering how we come up with the correct permutation of powers of our inputs. It actually doesn’t matter so long as you are consistent. You can have your A matrix rows have the powers in any order, so long as all orders are present, and you use the same order in all operations.

Regarding storage sizes needed, the storage of surfaces and (hyper) volumes are a bit different and generally larger than curves.

To see how, let’s look at the powers of the ATA matrix of a bilinear surface, using the ordering of powers that we used in the example:

$\begin{bmatrix} x^0y^0 & x^0y^1 & x^1y^0 & x^1y^1 \\ x^0y^1 & x^0y^2 & x^1y^1 & x^1y^2 \\ x^1y^0 & x^1y^1 & x^2y^0 & x^2y^1 \\ x^1y^1 & x^1y^2 & x^2y^1 & x^2y^2 \\ \end{bmatrix}$

Let’s rewrite it as just the powers:

$\begin{bmatrix} 00 & 01 & 10 & 11 \\ 01 & 02 & 11 & 12 \\ 10 & 11 & 20 & 21 \\ 11 & 12 & 21 & 22 \\ \end{bmatrix}$

And the permutation we used as just powers to help us find the pattern in the powers of x and y in the ATA matrix:
$\begin{bmatrix} 00 & 01 & 10 & 11 \\ \end{bmatrix}$

Can you find the pattern of the powers used at the different spots in the ATA matrix?

I had to stare at it for a while before I figured it out but it’s this: For the i,j location in the ATA matrix, the powers of x and y are the powers of x and y in the i permutation added to the powers of x and y in the j permutation.

For example, $A^TA_{0,2}$ has xy powers of 10. Permutation 0 has powers of 0,0 and permutation 2 has powers of 1,0, so we add those together to get powers 1,0.

Another example, $A^TA_{2,3}$ has xy powers of 21. Permutation 2 has powers of 1,0 and permutation 3 has powers 1,1. Adding those together we get 2,1 which is correct.

That’s a bit more complex than last post, not too much more difficult to construct the ATA matrix directly – and also construct it incrementally as new data points come in!

How many unique values are there in the ATA matrix though? We need to know this to know how much storage we need.

In the last post, we needed (degree+1)*2–1 values to store the unique ATA matrix values. That can also be written as degree*2+1.

It turns out that when generalizing this to surfaces and volumes, that we need to take the product of that for each axis.

For instance, a surface has ((degreeX*2)+1)*((degreeY*2)+1) unique values. A volume has ((degreeX*2)+1)*((degreeY*2)+1)*((degreeZ*2)+1) unique values.

The pattern continues for higher dimensions, as well as lower, since you can see how in the curve case, it’s the same formula as it was before.

For the same ATA matrix size, a surface has more unique values than a curve does.

As far as what those values actually are, they are the full permutations of the powers of a surface (or hyper volume) that is one degree higher on each axis. For a bilinear surface, that means the 9 permutations of x and y for powers 0,1 and 2:
$x^0y^0,x^0y^1,x^0y^2,x^1y^0,x^1y^1,x^1y^2,x^2y^0,x^2y^1,x^2y^2$
Or simplified:
$1,y,y^2,x,xy,xy^2,x^2,x^2y,x^2y^2$

For the bilinear case, The ATV vector is the sums of the permutations of x,y multiplied by z, for every data point. In other words, you add this to ATV for each data point:
$\begin{bmatrix} z & yz & xz & xyz \\ \end{bmatrix}$

How much storage space do we need in general for the ATV vector then? it’s the product of (degree+1) for each axis.

For instance, a surface has (degreeX+1)*(degreeY+1) values in ATV, and a volume has (degreeX+1)*(degreeY+1)*(degreeZ+1).

You may also be wondering how many data points are required minimum to fit a curve, surface or hypervolume to a data set. The answer is that you need as many data points as there are terms in the polynomial. We are trying to solve for the polynomial coefficients, so there are as many unknowns as there are polynomial terms.

How many polynomial terms are there? There are as many terms as there are permutations of the axes powers involved. In other words, the size of ATV is also the minimum number of points you need to fit a curve, surface, or hypervolume to a data set.

# Measuring Quality of a Fit

You are probably wondering if there’s a way to calculate how good of a fit you have for a given data set. It turns out that there are a few ways to calculate a value for this.

The value I use in the code below and in the demos is called $R^2$ or residue squared.

First you calculate the average (mean) output value from the input data set.

Then you calculate SSTot which is the sum of the square of the mean subtracted from each input point’s output value. Pseudo code:

SSTot = 0;
for (point p in points)
SSTot += (p.out - mean)^2;


You then calculate SSRes which is the sum of the square of the fitted function evaluated at a point, subtracted from each input points’ output value. Pseudo code:

SSRes= 0;
for (point p in points)
SSRes += (p.out - f(p.in))^2;


The final value for R^2 is 1-SSRes/SSTot.

The value is nice because it’s unitless, and since SSRes and SSTot is a sum of squares, SSRes/SSTot is basically the value that the fitting algorithm minimizes. The value is subtracted from 1 so that it’s a fit quality metric. A value of 0 is a bad fit, and a value of 1 is a good fit and generally it will be between those values.

# Example Code

Here is a run from the sample code:

And here is the source code:

#include <stdio.h>
#include <array>

#define FILTER_ZERO_COEFFICIENTS true // if false, will show terms which have a coefficient of 0

//====================================================================
template<size_t N>
using TVector = std::array<float, N>;

template<size_t M, size_t N>
using TMatrix = std::array<TVector<N>, M>;

//====================================================================
// Specify a degree per axis.
// 1 = linear, 2 = quadratic, etc
template <size_t... DEGREES>
class COnlineLeastSquaresFitter
{
public:
COnlineLeastSquaresFitter ()
{
// initialize our sums to zero
std::fill(m_SummedPowers.begin(), m_SummedPowers.end(), 0.0f);
std::fill(m_SummedPowersTimesValues.begin(), m_SummedPowersTimesValues.end(), 0.0f);
}

// Calculate how many summed powers we need.
// Product of degree*2+1 for each axis.
template <class T>
constexpr static size_t NumSummedPowers(T degree)
{
return degree * 2 + 1;
}
template <class T, class... DEGREES>
constexpr static size_t NumSummedPowers(T first, DEGREES... degrees)
{
return NumSummedPowers(first) * NumSummedPowers(degrees...);
}

// Calculate how many coefficients we have for our equation.
// Product of degree+1 for each axis.
template <class T>
constexpr static size_t NumCoefficients(T degree)
{
return (degree + 1);
}
template <class T, class... DEGREES>
constexpr static size_t NumCoefficients(T first, DEGREES... degrees)
{
return NumCoefficients(first) * NumCoefficients(degrees...);
}

// Helper function to get degree of specific axis
static size_t Degree (size_t axisIndex)
{
static const std::array<size_t, c_dimension-1> c_degrees = { DEGREES... };
return c_degrees[axisIndex];
}

// static const values
static const size_t c_dimension = sizeof...(DEGREES) + 1;
static const size_t c_numCoefficients = NumCoefficients(DEGREES...);
static const size_t c_numSummedPowers = NumSummedPowers(DEGREES...);

// Typedefs
typedef TVector<c_numCoefficients> TCoefficients;
typedef TVector<c_dimension> TDataPoint;

// Function for converting from an index to a specific power permutation
static void IndexToPowers (size_t index, std::array<size_t, c_dimension-1>& powers, size_t maxDegreeMultiply, size_t maxDegreeAdd)
{
for (int i = c_dimension-2; i >= 0; --i)
{
size_t degree = Degree(i) * maxDegreeMultiply + maxDegreeAdd;
powers[i] = index % degree;
index = index / degree;
}
}

// Function for converting from a specific power permuation back into an index
static size_t PowersToIndex (std::array<size_t, c_dimension - 1>& powers, size_t maxDegreeMultiply, size_t maxDegreeAdd)
{
size_t ret = 0;
for (int i = 0; i < c_dimension - 1; ++i)
{
ret *= Degree(i) * maxDegreeMultiply + maxDegreeAdd;
ret += powers[i];
}
return ret;
}

// Add a datapoint to our fitting
{
// Note: It'd be a good idea to memoize the powers and calculate them through repeated
// multiplication, instead of calculating them on demand each time, using std::pow.

// add the summed powers of the input values
std::array<size_t, c_dimension-1> powers;
for (size_t i = 0; i < m_SummedPowers.size(); ++i)
{
IndexToPowers(i, powers, 2, 1);
for (size_t j = 0; j < c_dimension - 1; ++j)
}

// add the summed powers of the input value, multiplied by the output value
for (size_t i = 0; i < m_SummedPowersTimesValues.size(); ++i)
{
IndexToPowers(i, powers, 1, 1);
float valueAdd = dataPoint[c_dimension - 1];
for (size_t j = 0; j < c_dimension-1; ++j)
}
}

// Get the coefficients of the equation fit to the points
bool CalculateCoefficients (TCoefficients& coefficients) const
{
// make the ATA matrix
std::array<size_t, c_dimension - 1> powersi;
std::array<size_t, c_dimension - 1> powersj;
std::array<size_t, c_dimension - 1> summedPowers;
TMatrix<c_numCoefficients, c_numCoefficients> ATA;
for (size_t j = 0; j < c_numCoefficients; ++j)
{
IndexToPowers(j, powersj, 1, 1);

for (size_t i = 0; i < c_numCoefficients; ++i)
{
IndexToPowers(i, powersi, 1, 1);

for (size_t k = 0; k < c_dimension - 1; ++k)
summedPowers[k] = powersi[k] + powersj[k];

size_t summedPowersIndex = PowersToIndex(summedPowers, 2, 1);
ATA[j][i] = m_SummedPowers[summedPowersIndex];
}
}

// solve: ATA * coefficients = m_SummedPowers
// for the coefficients vector, using Gaussian elimination.
coefficients = m_SummedPowersTimesValues;
for (size_t i = 0; i < c_numCoefficients; ++i)
{
for (size_t j = 0; j < c_numCoefficients; ++j)
{
if (ATA[i][i] == 0.0f)
return false;

float c = ((i == j) - ATA[j][i]) / ATA[i][i];
coefficients[j] += c*coefficients[i];
for (size_t k = 0; k < c_numCoefficients; ++k)
ATA[j][k] += c*ATA[i][k];
}
}

// Note: this is the old, "bad" way to solve the equation using matrix inversion.
// It's a worse choice for larger matrices, and surfaces and volumes use larger matrices than curves in general.
/*
// Inverse the ATA matrix
TMatrix<c_numCoefficients, c_numCoefficients> ATAInverse;
if (!InvertMatrix(ATA, ATAInverse))
return false;

// calculate the coefficients
for (size_t i = 0; i < c_numCoefficients; ++i)
coefficients[i] = DotProduct(ATAInverse[i], m_SummedPowersTimesValues);
*/

return true;
}

private:
//Storage Requirements:
// Summed Powers = Product of degree*2+1 for each axis.
// Summed Powers Times Values = Product of degree+1 for each axis.
TVector<c_numSummedPowers>		m_SummedPowers;
TVector<c_numCoefficients>		m_SummedPowersTimesValues;
};

//====================================================================
char AxisIndexToLetter (size_t axisIndex)
{
// x,y,z,w,v,u,t,....
if (axisIndex < 3)
return 'x' + char(axisIndex);
else
return 'x' + 2 - char(axisIndex);
}

//====================================================================
template <class T, size_t M, size_t N>
float EvaluateFunction (const T& fitter, const TVector<M>& dataPoint, const TVector<N>& coefficients)
{
float ret = 0.0f;
for (size_t i = 0; i < coefficients.size(); ++i)
{
float term = coefficients[i];

// then the powers of the input variables
std::array<size_t, T::c_dimension - 1> powers;
fitter.IndexToPowers(i, powers, 1, 1);
for (size_t j = 0; j < powers.size(); ++j)
term *= (float)std::pow(dataPoint[j], powers[j]);

// add this term to our return value
ret += term;
}
return ret;
}

//====================================================================
template <size_t... DEGREES>
void DoTest (const std::initializer_list<TVector<sizeof...(DEGREES)+1>>& data)
{
// say what we are are going to do
printf("Fitting a function of degree (");
for (size_t i = 0; i < COnlineLeastSquaresFitter<DEGREES...>::c_dimension - 1; ++i)
{
if (i > 0)
printf(",");
printf("%zi", COnlineLeastSquaresFitter<DEGREES...>::Degree(i));
}
printf(") to %zi data points: \n", data.size());

// show input data points
for (const COnlineLeastSquaresFitter<DEGREES...>::TDataPoint& dataPoint : data)
{
printf("  (");
for (size_t i = 0; i < dataPoint.size(); ++i)
{
if (i > 0)
printf(", ");
printf("%0.2f", dataPoint[i]);
}
printf(")\n");
}

// fit data
COnlineLeastSquaresFitter<DEGREES...> fitter;
for (const COnlineLeastSquaresFitter<DEGREES...>::TDataPoint& dataPoint : data)

// calculate coefficients if we can
COnlineLeastSquaresFitter<DEGREES...>::TCoefficients coefficients;
bool success = fitter.CalculateCoefficients(coefficients);
if (!success)
{
printf("Could not calculate coefficients!\n\n");
return;
}

// print the polynomial
bool firstTerm = true;
printf("%c = ", AxisIndexToLetter(sizeof...(DEGREES)));
bool showedATerm = false;
for (int i = (int)coefficients.size() - 1; i >= 0; --i)
{
// don't show zero terms
if (FILTER_ZERO_COEFFICIENTS && std::abs(coefficients[i]) < 0.00001f)
continue;

showedATerm = true;

// show an add or subtract between terms
float coefficient = coefficients[i];
if (firstTerm)
firstTerm = false;
else if (coefficient >= 0.0f)
printf(" + ");
else
{
coefficient *= -1.0f;
printf(" - ");
}

printf("%0.2f", coefficient);

std::array<size_t, COnlineLeastSquaresFitter<DEGREES...>::c_dimension - 1> powers;
fitter.IndexToPowers(i, powers, 1, 1);

for (size_t j = 0; j < powers.size(); ++j)
{
if (powers[j] > 0)
printf("%c", AxisIndexToLetter(j));
if (powers[j] > 1)
printf("^%zi", powers[j]);
}
}
if (!showedATerm)
printf("0");
printf("\n");

// Calculate and show R^2 value.
float rSquared = 1.0f;
if (data.size() > 0)
{
float mean = 0.0f;
for (const COnlineLeastSquaresFitter<DEGREES...>::TDataPoint& dataPoint : data)
mean += dataPoint[sizeof...(DEGREES)];
mean /= data.size();
float SSTot = 0.0f;
float SSRes = 0.0f;
for (const COnlineLeastSquaresFitter<DEGREES...>::TDataPoint& dataPoint : data)
{
float value = dataPoint[sizeof...(DEGREES)] - mean;
SSTot += value*value;

value = dataPoint[sizeof...(DEGREES)] - EvaluateFunction(fitter, dataPoint, coefficients);
SSRes += value*value;
}
if (SSTot != 0.0f)
rSquared = 1.0f - SSRes / SSTot;
}
printf("R^2 = %0.4f\n\n", rSquared);
}

//====================================================================
int main (int argc, char **argv)
{
// bilinear - 4 data points
DoTest<1, 1>(
{
TVector<3>{ 0.0f, 0.0f, 5.0f },
TVector<3>{ 0.0f, 1.0f, 3.0f },
TVector<3>{ 1.0f, 0.0f, 8.0f },
TVector<3>{ 1.0f, 1.0f, 2.0f },
}
);

// biquadratic - 9 data points
DoTest<2, 2>(
{
TVector<3>{ 0.0f, 0.0f, 8.0f },
TVector<3>{ 0.0f, 1.0f, 4.0f },
TVector<3>{ 0.0f, 2.0f, 6.0f },
TVector<3>{ 1.0f, 0.0f, 5.0f },
TVector<3>{ 1.0f, 1.0f, 2.0f },
TVector<3>{ 1.0f, 2.0f, 1.0f },
TVector<3>{ 2.0f, 0.0f, 7.0f },
TVector<3>{ 2.0f, 1.0f, 9.0f },
TVector<3>{ 2.0f, 2.5f, 12.0f },
}
);

// trilinear - 8 data points
DoTest<1, 1, 1>(
{
TVector<4>{ 0.0f, 0.0f, 0.0f, 8.0f },
TVector<4>{ 0.0f, 0.0f, 1.0f, 4.0f },
TVector<4>{ 0.0f, 1.0f, 0.0f, 6.0f },
TVector<4>{ 0.0f, 1.0f, 1.0f, 5.0f },
TVector<4>{ 1.0f, 0.0f, 0.0f, 2.0f },
TVector<4>{ 1.0f, 0.0f, 1.0f, 1.0f },
TVector<4>{ 1.0f, 1.0f, 0.0f, 7.0f },
TVector<4>{ 1.0f, 1.0f, 1.0f, 9.0f },
}
);

// trilinear - 9 data points
DoTest<1, 1, 1>(
{
TVector<4>{ 0.0f, 0.0f, 0.0f, 8.0f },
TVector<4>{ 0.0f, 0.0f, 1.0f, 4.0f },
TVector<4>{ 0.0f, 1.0f, 0.0f, 6.0f },
TVector<4>{ 0.0f, 1.0f, 1.0f, 5.0f },
TVector<4>{ 1.0f, 0.0f, 0.0f, 2.0f },
TVector<4>{ 1.0f, 0.0f, 1.0f, 1.0f },
TVector<4>{ 1.0f, 1.0f, 0.0f, 7.0f },
TVector<4>{ 1.0f, 1.0f, 1.0f, 9.0f },
TVector<4>{ 0.5f, 0.5f, 0.5f, 12.0f },
}
);

// Linear - 2 data points
DoTest<1>(
{
TVector<2>{ 1.0f, 2.0f },
TVector<2>{ 2.0f, 4.0f },
}
);

// Quadratic - 4 data points
DoTest<2>(
{
TVector<2>{ 1.0f, 5.0f },
TVector<2>{ 2.0f, 16.0f },
TVector<2>{ 3.0f, 31.0f },
TVector<2>{ 4.0f, 16.0f },
}
);

// Cubic - 4 data points
DoTest<3>(
{
TVector<2>{ 1.0f, 5.0f },
TVector<2>{ 2.0f, 16.0f },
TVector<2>{ 3.0f, 31.0f },
TVector<2>{ 4.0f, 16.0f },
}
);

system("pause");
return 0;
}


# Closing

The next logical step here for me would be to figure out how to break the equation for a surface or hypervolume up into multiple equations, like you’d have with a tensor product surface/hypervolume equation. It would also be interesting to see how to convert from these multidimensional polynomials to multidimensional Bernstein basis functions, which are otherwise known as Bezier rectangles (and Bezier hypercubes i guess).

The last post inverted the ATA matrix and multiplied by ATY to get the coefficients. Thanks to some feedback on reddit, I found out that is NOT how you want to solve this sort of equation. I ended up going with Gaussian elimination for this post which is more numerically robust while also being less computation to calculate. There are other options out there too that may be even better choices. I’ve found out that in general, if you are inverting a matrix in code, or even just using an inverted matrix that has already been given to you, you are probably doing it wrong. You can read more about that here: John D. Cook: Don’t invert that matrix.

I didn’t go over what to do if you don’t have enough data points because if you find yourself in that situation, you can either decrease the degree of one of the axes, or you could remove and axis completely if you wanted to. It’s situational and ambiguous what parameter to decrease when you don’t have enough data points to fit a specific curve or hypervolume, but it’s still possible to decay the fit to a lower degree or dimension if you hit this situation, because you will already have all the values you need in the ATA matrix values and the ATV vector. I leave that to you to decide how to handle it in your own usage cases. Something interesting to note is that ATA[0][0] is STILL the count of how many data points you have, so you can use this value to know how much you need to decay your fit to be able to handle the data set.

In the WebGL2 demo I mention, I use a finite difference method to calculate the normals of the surface, however since the surface is described by a polynomial, it’d be trivial to calculate the coefficients for the equations that described the partial derivatives of the surface for each axis and use those instead.

I also wanted to mention that in the case of surfaces and hypervolumes it’s still possible to get an imperfect fit to your data set, even though you may give the exact minimum required number of control points. The reason for this is that not all axes are necesarily created equal. If you have a surface of degree (1,2) it’s linear on the x axis, but quadratic on the y axis, and requires a minimum of 6 data points to be able to fit a data set. As you can imagine, it’s possible to give data points such that the data points ARE NOT LINEAR on the x axis. When this happens, the surface will not be a perfect fit.

Lastly, you may be wondering how to fit data sets where there is more than one output value, like an equation of the form $(z,w)=f(x,y)$.

I’m not aware of any ways to least square fit that as a whole, but apparently a common practice is to fit one equation to z and another to w and treat them independently. There is a math stack exchange question about that here: Math Stack Exchange: Least square fitting multiple values

Here is the webgl demo that goes with this post again:
Least Squares Surface Fitting

Thanks for reading, and please speak up if you have anything to add or correct, or a comment to make!

## Incremental Least Squares Curve Fitting

This Post In Short:

• Fit a curve of degree N to a data set, getting data points 1 at a time.
• Storage Required: 3*N+2 values.
• Update Complexity: roughly 3*N+2 additions and multiplies.
• Finalize Complexity: Solving Ax=b where A is an (N+1)x(N+1) matrix and b is a known vector. (Sample code inverts A matrix and multiplies by b, Gaussian elimination is better though).
• Simple C++ code and HTML5 demo at bottom!

I was recently reading a post from a buddy on OIT or “Order Independent Transparency” which is an open problem in graphics:
Fourier series based OIT and why it won’t work

In the article he talks about trying to approximate a function per pixel and shows the details of some methods he tried. One of the difficulties with the problem is that during a render you can get any number of triangles affecting a specific pixel, but you need a fixed and bounded size amount of storage per pixel for those variable numbers of data points.

That made me wonder: Is there an algorithm that can approximate a data set with a function, getting only one data point at a time, and end up with a decent approximation?

It turns out that there is one, at least one that I am happy with: Incremental Least Squares Curve Fitting.

While this perhaps doesn’t address all the problems that need addressing for OIT specifically, I think this is a great technique for programming in general, and I’m betting it still has it’s uses in graphics, for other times when you want to approximate a data set per pixel.

We’ll work through a math oriented way to do it, and then we’ll convert it into an equivalent and simpler programmer friendly version.

At the bottom of the post is some simple C++ that implements everything we talk about and the image below is a screenshot of an an interactive HTML5 demo I made: Least Squares Curve Fitting

# Mathy Version

Math Stack Exchange: Creating a function incrementally

I have to admit, I’m not so great with matrices outside of the typical graphics/gamedev usage cases of transormation and related, so it took me a few days to work through it and understand it all. If reading that answer made your eyes go blurry, give my explanation a shot. I’m hoping I gave step by step details enough such that you too can understand what the heck he was talking about. If not, let me know where you got lost and I can explain better and update the post.

The first thing we need to do is figure out what degree of a function we want to approximate our data with. For our example we’ll pick a degree 2 function, also known as a quadratic function. That means that when we are done we will get out a function of the form below:

$y=ax^2+bx+c$

We will give data points to the equation and it will calculate the values of a,b and c that approximate our function by minimizing the sum of the squared distance from each point to the curve.

We’ll deal with regular least squared fitting before moving onto incremental, so here’s the data set we’ll be fitting our quadratic curve to:

$(1,5),(2,16),(3,31),(4,50)$

The x values in my data set start at 1 and count up by 1, but that is not a requirement. You can use whatever x and y values you want to fit a curve to.

Next we need to calculate the matrix $A$, where $A_{jk} = x_j^k$ and the matrix has NumDataPoints rows and Degree+1 columns. It looks like the below for a quadratic curve fitting 4 data points:

$A = \begin{bmatrix} x_0^0 & x_0^1 & x_0^2 \\ x_1^0 & x_1^1 & x_1^2 \\ x_2^0 & x_2^1 & x_2^2 \\ x_3^0 & x_3^1 & x_3^2 \\ \end{bmatrix}$

When we plug in our specific x values we get this:

$A = \begin{bmatrix} 1^0 & 1^1 & 1^2 \\ 2^0 & 2^1 & 2^2 \\ 3^0 & 3^1 & 3^2 \\ 4^0 & 4^1 & 4^2 \\ \end{bmatrix}$

Calculating it out we get this:

$A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \\ 1 & 4 & 16 \\ \end{bmatrix}$

Next we need to calculate the matrix $A^TA$, which we do below by multiplying the transpose of A by A:

$A^TA = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 1 & 4 & 9 & 16 \\ \end{bmatrix} * \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \\ 1 & 4 & 16 \\ \end{bmatrix} = \begin{bmatrix} 4 & 10 & 30 \\ 10 & 30 & 100 \\ 30 & 100 & 354 \\ \end{bmatrix}$

Next we need to find the inverse of that matrix to get $(A^TA)^{-1}$. The inverse is:

$(A^TA)^{-1} = \begin{bmatrix} 31/4 & -27/4 & 5/4 \\ -27/4 & 129/20 & -5/4 \\ 5/4 & -5/4 & 1/4 \\ \end{bmatrix}$

The next thing we need to calculate is $A^TY$, which is the transpose of A multiplied by all of the Y values of our data:

$A^TY = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 1 & 4 & 9 & 16 \\ \end{bmatrix} * \begin{bmatrix} 5 \\ 16 \\ 31 \\ 50 \\ \end{bmatrix} = \begin{bmatrix} 102 & 330 & 1148 \\ \end{bmatrix}$

And finally, to calculate the coefficients of our quadratic function, we need to calculate $(A^TA)^{-1}*A^TY$:

$(A^TA)^{-1}*A^TY = \begin{bmatrix} 31/4 & -27/4 & 5/4 \\ -27/4 & 129/20 & -5/4 \\ 5/4 & -5/4 & 1/4 \\ \end{bmatrix} * \begin{bmatrix} 102 \\ 330 \\ 1148 \\ \end{bmatrix} = \begin{bmatrix} -2 & 5 & 2 \\ \end{bmatrix}$

Those coefficients are listed in power order of x, so the first value -2 is the coefficient for x^0, 5 is the coefficient for x^1 and 2 is the coefficient for x^2. That gives us the equation:

$y=2x^2+5x-2$

If you plug in the x values from our data set, you’ll find that this curve perfectly fits all 4 data points.

It won’t always be (and usually won’t be) that a resulting curve matches the input set for all values. It just so happened that this time it does. The only guarantee you’ll get when fitting a curve to the data points is that the squared distance of the point to the curve (distance on the Y axis only, so vertical distance), is minimized for all data points.

Now that we’ve worked through the math, let’s make some observations and make it more programmer friendly.

# Making it Programmer Friendly

Let’s look at the $A^TA$ matrix again:

$\begin{bmatrix} 4 & 10 & 30 \\ 10 & 30 & 100 \\ 30 & 100 & 354 \\ \end{bmatrix}$

One thing you probably noticed right away is that it’s symmetric across the diagonal. Another thing you may have noticed is that there are only 5 unique values in that matrix.

As it turns out, those 5 values are just the sum of the x values, when those x values are raised to increasing powers.

• If you take all x values of our data set, raise them to the 0th power and sum the results, you get 4.
• If you take all x values of our data set, raise them to the 1st power and sum the results, you get 10.
• If you take all x values of our data set, raise them to the 2nd power and sum the results, you get 30.
• If you take all x values of our data set, raise them to the 3rd power and sum the results, you get 100.
• If you take all x values of our data set, raise them to the 4th power and sum the results, you get 354.

Further more, the power of the x values in each part of the matrix is the zero based x axis index plus the zero based y axis index. Check out what i mean below, which shows which power the x values are taken to before being summed for each location in the matrix:

$\begin{bmatrix} 0 & 1 & 2 \\ 1 & 2 & 3 \\ 2 & 3 & 4 \\ \end{bmatrix}$

That is interesting for two reasons…

1. This tells us that we only really need to store the 5 unique values, and that we can reconstruct the full matrix later when it’s time to calculate the coefficients.
2. It also tells us that if we’ve fit a curve to some data points, but then want to add a new data point, that we can just raise the x value of our new data point to the different powers and add it into these 5 values we already have stored. In other words, the $A^TA$ matrix can be incrementally adjusted as new data comes in.

This generalizes beyond quadratic functions too luckily. If you are fitting your data points with a degree N curve, the $A^TA$ matrix will have N+1 rows, and N+1 columns, but will only have (N+1)*2-1 unique values stored in it. Those values will be the sum of the x values taken from the 0th power up to the (N+1)*2-2th power.

As a concrete example, a cubic fit will have an $A^TA$ array that is 4×4, which will only have 7 unique values stored in it. Those values will be the x values raised to the 0th power and summed, all the way up to the x values raised to the 6th power and summed.

So, the $A^TA$ matrix has a fixed storage amount of (degree+1)*2 – 1 values, and it can be incrementally updated.

That is great, but there is another value we need to look at too, which is the $A^TY$ vector. Let’s see that again:

$\begin{bmatrix} 102 & 330 & 1148 \\ \end{bmatrix}$

There are some patterns to this vector too luckily. You may have noticed that the first entry is the sum of the Y values from our data set. It’s actually the sum of the y values multiplied by the x values raised to the 0th power.

The next number is the sum of the y values multiplied by the x values raised to the 1st power, and so on.

To generalize it, each entry in that vector is the sum of taking the x from each data point, raising it to the power that is the index in the vector, and multiplying it by the y value.

• Taking each data point’s x value, raising it to the 0th power, multiplying by the y value, and summing the results gives you 102.
• Taking each data point’s x value, raising it to the 1st power, multiplying by the y value, and summing the results gives you 330.
• Taking each data point’s x value, raising it to the 2nd power, multiplying by the y value, and summing the results gives you 1148.

So, this vector is incrementally updatable too. When you get a new data point, for each entry in the vector, you take the x value to the specific power, multiply by y, and add that result to the entry in the vector.

This generalizes for other curve types as well. If you are fitting your data points with a degree N curve, the $A^TY$ vector will have N+1 entries, corresponding to the powers: 0,1,…N.

As a concrete example, a cubic fit will have an $A^TY$ vector of size 4, corresponding to the powers: 0,1,2,3.

Combining the storage needs of the values needed for the $A^TA$ matrix, as well as the values needed for the $A^TY$ vector, the amount of storage we need for a degree N curve fit is 3*N+2 values.

# Algorithm Summarized

Here is a summary of the algorithm:

1. First decide on the degree of the fit you want. Call it N.
2. Ensure you have storage space for 3*N+2 values and initialize them all to zero. These represent the (N+1)*2-1 values needed for the $A^TA$ matrix values, as well as the N+1 values needed for the $A^TY$ vector.
3. For each data point you get, you will need to update both the $A^TA$ matrix values, as well as the $A^TY$ vector valuess. (Note that there are not the same number of values in ATA and ATY!)
• for(i in ATA) ATA[i] += x^i
• for(i in ATY) ATY[i] += x^i*y
4. When it’s time to calculate the coefficients of your polynomial, convert the ATA values back into the $A^TA$ matrix, invert it and multiply that by the $A^TY$ value.

Pretty simple right?

# Not Having Enough Points

When working through the mathier version of this algorithm, you may have noticed that if we were trying to fit using a degree N curve, that we needed N+1 data points at minimum for the math to even be able to happen.

So, you might ask, what happens in the real world, such as in a pixel shader, where we decide to do a cubic fit, but end up only getting 1 data point, instead of the 4 minimum that we need?

Well, first off, if you use the programmer friendly method of incrementally updating ATA and ATY, you’ll end up with an uninvertible matrix (0 determinant), but that doesn’t really help us any besides telling us when we don’t have enough data.

There is something pretty awesome hiding here though. Let’s look at the ATA matrix and ATY values from our quadratic example again.

$A^TA = \begin{bmatrix} 4 & 10 & 30 \\ 10 & 30 & 100 \\ 30 & 100 & 354 \\ \end{bmatrix}$

$A^TY = \begin{bmatrix} 102 & 330 & 1148 \\ \end{bmatrix}$

The above values are for a quadratic fit. What if we wanted a linear fit instead? Well… the upper left 2×2 matrix in ATA is the ATA matrix for the linear fit! Also, the first two values in the ATY vector is the ATY vector if we were doing a linear fit.

$A^TA = \begin{bmatrix} 4 & 10 \\ 10 & 30 \\ \end{bmatrix}$

$A^TY = \begin{bmatrix} 102 & 330 \\ \end{bmatrix}$

You can verify that the linear fit above is correct if you want, but let’s take it down another degree, down to approximating the fit with a point. They become scalars instead of matrices and vectors:

$A^TA = 4 \\ A^TY = 102$

If we take the inverse of ATA and multiply it by ATY, we get:

$1/4 * 102 = 25.5$

if you average the Y values of our input data, you’ll find that it is indeed 25.5, so we have verified that it does give us a degree 0 fit.

This is neat and all, but how can we KNOW if we’ve collected enough data or not? Do we just try to invert our ATA matrix, and if it fails, try one degree lower, repeatedly, until we succeed or fail at a degree 0 approximation? Do we maybe instead store a counter to keep track of how many points we have seen?

Luckily no, and maybe you have already put it together. The first value in the ATA array actually TELLS you how many points you have been given. You can use that to decide what degree you are going to have to actually fit the data set to when it’s time to calculate your coefficients, to avoid the uninvertible matrix and still get your data fit.

# Interesting Tid Bits

Something pretty awesome about this algorithm is that it can work in a multithreaded fashion very easily. One way would be to break apart the work into multiple job threads, have them calculate ATA and ATY independently, and then sum them all together on the main thread. Another way to do it would be to let all threads share the same ATA and ATY storage, but to use an atomic add operation to update them.

Going the atomic add route, I think this could be a relatively GPU friendly algorithm. You could use actual atomic operations in your shader code, or you could use alpha blending to add your result in.

Even though we saw it in the last section, I’ll mention it again. If you do a degree 0 curve fit to data points (aka fitting a point to the data), this algorithm is mathematically equivalent to just taking the average y value. The ATA values will have a single value which is the sum of the x values to the 0th degree, so will be the count of how many x items there are. The ATY values will also have only a single value, which will be the sum of the x^0*y values, so will be the sum of the y values. Taking the inverse of our 1×1 ATA matrix will give us one divided by how many items there are, so when we multiply that by the ATA vector which only has one item, it will be the same as if we divided our Y value sum by how many data points we had. So, in a way, this algorithm seems to be some sort of generalization of averaging, which is weird to me.

Another cool thing: if you have the minimum number of data points for your degree (aka degree+1 data points) or fewer, you can actually use the ATA and ATY values to get back your ORIGINAL data points – both the X and the Y values! I’ll leave it as an exercise for you, but if you look at it, you will always have more equations than you do unknowns.

If reconstructing the original data points is important to you, you could also have this algorithm operate in two modes.

Basically, always use the ATA[0] storage space to count the number of data points you’ve been given, since that is it’s entire purpose. You can then use the rest of the storage space as RAW data storage for your 2d input values. As soon as adding another value would cause you to overflow your storage, you could process your data points into the correct format of storing just ATA and ATY values, so that from then on, it was an approximation, instead of explicit point storage.

When decoding those values, you would use the ATA[0] storage space to know whether the rest of the storage contained ATA and ATY values, or if they contained data points. If they contained data points, you would also know how many there were, and where they were in the storage space, using the same logic to read data points out as you used to put them back in – basically like saying that the first data point goes immediately after ATA[0], the second data point after that, etc.

The last neat thing, let’s say that you are in a pixel shader as an exmaple, and that you wanted to approximate 2 different values for each pixel, but let’s say that the X value was always the same for these data sets – maybe you are approximating two different values over depth of the pixel for instance so X of both data points is the depth, but the Y values of the data points are different.

If you find yourself in a situation like this, you don’t actually need to pay the full cost of storage to have a second approximation curve.

Since the ATA values are all based on powers of the x values only, the ATA values would be the same for both of these data sets. You only need to pay the cost of the ATY values for the second curve.

This means that fitting a curve costs an initial 3*degree+2 in storage, but each additional curve only costs degree+1 in storage.

Also, since the ATA storage for a curve of degree N also contains the same values used for a curve of degree N-1, N-2, etc, you don’t have to use the same degree when approximating multiple values using the same storage. Your ATA just has to be large enough to hold the largest degree curve, and then you can have ATY values that are sized to the degree of the curve you want to use to approximate each data set.

This way, if you have limited storage, you could perhaps cubically fit one data set, and then linearly fit another data set where accuracy isn’t as important.

For that example, you would pay 11 values of storage for the cubic fit, and then only 2 more values of storage to have a linear fit of some other data.

# Example Code

There is some example code below that implements the ideas from this post.

The code is meant to be clear and readable firstly, with being a reasonably decent implementation second. If you are using this in a place where you want high precision and/or high speeds, there are likely both macro and micro optimizations and code changes to be made. The biggest of these is probably how the matrix is inverted.

You can read more on the reddit discussion: Reddit: Incremental Least Squares Curve Fitting

Here’s a run of the example code:

Here is the example code:

#include <stdio.h>
#include <array>

//====================================================================
template<size_t N>
using TVector = std::array<float, N>;

template<size_t M, size_t N>
using TMatrix = std::array<TVector<N>, M>;

template<size_t N>
using TSquareMatrix = TMatrix<N,N>;

typedef TVector<2> TDataPoint;

//====================================================================
template <size_t N>
float DotProduct (const TVector<N>& A, const TVector<N>& B)
{
float ret = 0.0f;
for (size_t i = 0; i < N; ++i)
ret += A[i] * B[i];
return ret;
}

//====================================================================
template <size_t M, size_t N>
void TransposeMatrix (const TMatrix<M, N>& in, TMatrix<N, M>& result)
{
for (size_t j = 0; j < M; ++j)
for (size_t k = 0; k < N; ++k)
result[k][j] = in[j][k];
}

//====================================================================
template <size_t M, size_t N>
void MinorMatrix (const TMatrix<M, N>& in, TMatrix<M-1, N-1>& out, size_t excludeI, size_t excludeJ)
{
size_t destI = 0;
for (size_t i = 0; i < N; ++i)
{
if (i != excludeI)
{
size_t destJ = 0;
for (size_t j = 0; j < N; ++j)
{
if (j != excludeJ)
{
out[destI][destJ] = in[i][j];
++destJ;
}
}
++destI;
}
}
}

//====================================================================
template <size_t M, size_t N>
float Determinant (const TMatrix<M,N>& in)
{
float determinant = 0.0f;
TMatrix<M - 1, N - 1> minor;
for (size_t j = 0; j < N; ++j)
{
MinorMatrix(in, minor, 0, j);

float minorDeterminant = Determinant(minor);
if (j % 2 == 1)
minorDeterminant *= -1.0f;

determinant += in[0][j] * minorDeterminant;
}
return determinant;
}

//====================================================================
template <>
float Determinant<1> (const TMatrix<1,1>& in)
{
return in[0][0];
}

//====================================================================
template <size_t N>
bool InvertMatrix (const TSquareMatrix<N>& in, TSquareMatrix<N>& out)
{
// calculate the cofactor matrix and determinant
float determinant = 0.0f;
TSquareMatrix<N> cofactors;
TSquareMatrix<N-1> minor;
for (size_t i = 0; i < N; ++i)
{
for (size_t j = 0; j < N; ++j)
{
MinorMatrix(in, minor, i, j);

cofactors[i][j] = Determinant(minor);
if ((i + j) % 2 == 1)
cofactors[i][j] *= -1.0f;

if (i == 0)
determinant += in[i][j] * cofactors[i][j];
}
}

// matrix cant be inverted if determinant is zero
if (determinant == 0.0f)
return false;

// calculate the adjoint matrix into the out matrix
TransposeMatrix(cofactors, out);

// divide by determinant
float oneOverDeterminant = 1.0f / determinant;
for (size_t i = 0; i < N; ++i)
for (size_t j = 0; j < N; ++j)
out[i][j] *= oneOverDeterminant;
return true;
}

//====================================================================
template <>
bool InvertMatrix<2> (const TSquareMatrix<2>& in, TSquareMatrix<2>& out)
{
float determinant = Determinant(in);
if (determinant == 0.0f)
return false;

float oneOverDeterminant = 1.0f / determinant;
out[0][0] =  in[1][1] * oneOverDeterminant;
out[0][1] = -in[0][1] * oneOverDeterminant;
out[1][0] = -in[1][0] * oneOverDeterminant;
out[1][1] =  in[0][0] * oneOverDeterminant;
return true;
}

//====================================================================
template <size_t DEGREE>  // 1 = linear, 2 = quadratic, etc
class COnlineLeastSquaresFitter
{
public:
COnlineLeastSquaresFitter ()
{
// initialize our sums to zero
std::fill(m_SummedPowersX.begin(), m_SummedPowersX.end(), 0.0f);
std::fill(m_SummedPowersXTimesY.begin(), m_SummedPowersXTimesY.end(), 0.0f);
}

{
// add the summed powers of the x value
float xpow = 1.0f;
for (size_t i = 0; i < m_SummedPowersX.size(); ++i)
{
m_SummedPowersX[i] += xpow;
xpow *= dataPoint[0];
}

// add the summed powers of the x value, multiplied by the y value
xpow = 1.0f;
for (size_t i = 0; i < m_SummedPowersXTimesY.size(); ++i)
{
m_SummedPowersXTimesY[i] += xpow * dataPoint[1];
xpow *= dataPoint[0];
}
}

bool CalculateCoefficients (TVector<DEGREE+1>& coefficients) const
{
// initialize all coefficients to zero
std::fill(coefficients.begin(), coefficients.end(), 0.0f);

// calculate the coefficients
return CalculateCoefficientsInternal<DEGREE>(coefficients);
}

private:

template <size_t EFFECTIVEDEGREE>
bool CalculateCoefficientsInternal (TVector<DEGREE + 1>& coefficients) const
{
// if we don't have enough data points for this degree, try one degree less
if (m_SummedPowersX[0] <= EFFECTIVEDEGREE)
return CalculateCoefficientsInternal<EFFECTIVEDEGREE - 1>(coefficients);

// Make the ATA matrix
TMatrix<EFFECTIVEDEGREE + 1, EFFECTIVEDEGREE + 1> ATA;
for (size_t i = 0; i < EFFECTIVEDEGREE + 1; ++i)
for (size_t j = 0; j < EFFECTIVEDEGREE + 1; ++j)
ATA[i][j] = m_SummedPowersX[i + j];

// calculate inverse of ATA matrix
TMatrix<EFFECTIVEDEGREE + 1, EFFECTIVEDEGREE + 1> ATAInverse;
if (!InvertMatrix(ATA, ATAInverse))
return false;

// calculate the coefficients for this degree. The higher ones are already zeroed out.
TVector<EFFECTIVEDEGREE + 1> summedPowersXTimesY;
std::copy(m_SummedPowersXTimesY.begin(), m_SummedPowersXTimesY.begin() + EFFECTIVEDEGREE + 1, summedPowersXTimesY.begin());
for (size_t i = 0; i < EFFECTIVEDEGREE + 1; ++i)
coefficients[i] = DotProduct(ATAInverse[i], summedPowersXTimesY);
return true;
}

// Base case when no points are given, or if you are fitting a degree 0 curve to the data set.
template <>
bool CalculateCoefficientsInternal<0> (TVector<DEGREE + 1>& coefficients) const
{
if (m_SummedPowersX[0] > 0.0f)
coefficients[0] = m_SummedPowersXTimesY[0] / m_SummedPowersX[0];
return true;
}

// Total storage space (# of floats) needed is 3 * DEGREE + 2
// Where y is number of values that need to be stored and x is the degree of the polynomial
TVector<(DEGREE + 1) * 2 - 1>   m_SummedPowersX;
TVector<DEGREE + 1>             m_SummedPowersXTimesY;
};

//====================================================================
template <size_t DEGREE>
void DoTest(const std::initializer_list<TDataPoint>& data)
{
printf("Fitting a curve of degree %zi to %zi data points:\n", DEGREE, data.size());

COnlineLeastSquaresFitter<DEGREE> fitter;

// show data
for (const TDataPoint& dataPoint : data)
printf("  (%0.2f, %0.2f)\n", dataPoint[0], dataPoint[1]);

// fit data
for (const TDataPoint& dataPoint : data)

// calculate coefficients if we can
TVector<DEGREE+1> coefficients;
bool success = fitter.CalculateCoefficients(coefficients);
if (!success)
{
printf("ATA Matrix could not be inverted!\n");
return;
}

// print the polynomial
bool firstTerm = true;
printf("y = ");
bool showedATerm = false;
for (int i = (int)coefficients.size() - 1; i >= 0; --i)
{
// don't show zero terms
if (std::abs(coefficients[i]) < 0.00001f)
continue;

showedATerm = true;

// show an add or subtract between terms
float coefficient = coefficients[i];
if (firstTerm)
firstTerm = false;
else if (coefficient >= 0.0f)
printf(" + ");
else
{
coefficient *= -1.0f;
printf(" - ");
}

printf("%0.2f", coefficient);

if (i > 0)
printf("x");

if (i > 1)
printf("^%i", i);
}
if (!showedATerm)
printf("0");
printf("\n\n");
}

//====================================================================
int main (int argc, char **argv)
{
// Point - 1 data points
DoTest<0>(
{
TDataPoint{ 1.0f, 2.0f },
}
);

// Point - 2 data points
DoTest<0>(
{
TDataPoint{ 1.0f, 2.0f },
TDataPoint{ 2.0f, 4.0f },
}
);

// Linear - 2 data points
DoTest<1>(
{
TDataPoint{ 1.0f, 2.0f },
TDataPoint{ 2.0f, 4.0f },
}
);

// Linear - 3 colinear data points
DoTest<1>(
{
TDataPoint{ 1.0f, 2.0f },
TDataPoint{ 2.0f, 4.0f },
TDataPoint{ 3.0f, 6.0f },
}
);

// Linear - 3 non colinear data points
DoTest<1>(
{
TDataPoint{ 1.0f, 2.0f },
TDataPoint{ 2.0f, 4.0f },
TDataPoint{ 3.0f, 5.0f },
}
);

// Quadratic - 3 colinear data points
DoTest<2>(
{
TDataPoint{ 1.0f, 2.0f },
TDataPoint{ 2.0f, 4.0f },
TDataPoint{ 3.0f, 6.0f },
}
);

// Quadratic - 3 data points
DoTest<2>(
{
TDataPoint{ 1.0f, 5.0f },
TDataPoint{ 2.0f, 16.0f },
TDataPoint{ 3.0f, 31.0f },
}
);

// Cubic - 4 data points
DoTest<3>(
{
TDataPoint{ 1.0f, 5.0f },
TDataPoint{ 2.0f, 16.0f },
TDataPoint{ 3.0f, 31.0f },
TDataPoint{ 4.0f, 16.0f },
}
);

// Cubic - 2 data points
DoTest<3>(
{
TDataPoint{ 1.0f, 7.0f },
TDataPoint{ 3.0f, 17.0f },
}
);

// Cubic - 1 data point
DoTest<3>(
{
TDataPoint{ 1.0f, 7.0f },
}
);

// Cubic - 0 data points
DoTest<3>(
{
}
);

system("pause");
return 0;
}


# Feedback

There’s some interesting feedback on twitter.

Here’s an interactive demo to let you get a feel for how least squares curve fitting behaves:
Least Squares Curve Fitting

A good online polynomial curve fitting calculator

By the way, the term for an algorithm which works incrementally by taking only some of the data at a time is called an “online algorithm”. If you are ever in search of an online algorithm to do X (whatever X may be), using this term can be very helpful when searching for such an algorithm, or when asking people if such an algorithm exists (like on stack exchange). Unfortunately, online is a bit overloaded in the modern world, so it can also give false hits (;
Wikipedia: Online algorithm

## Evaluating Points on Analytical Surfaces and in Analytical Volumes Using the GPU Texture Sampler

This is an extension of a paper I wrote which shows how to use the linear texture sampling capabilities of the GPU to calculate points on Bezier curves. You store the control points in the texture, then sample along the texture’s diagonal to get points on the curve:
GPU Texture Sampler Bezier Curve Evaluation

This extension shows how to use the technique to evaluate points on surfaces and inside of volumes, where those surfaces and volumes are defined either by Bezier curves or polynomials (Tensor products of polynomials to be more specific).

As an example of what this post will allow you to do:

• By taking a single sample of a 3d RGBA volume texture, you’ll be able to get a bicubic interpolated value (a bicubic surface).
• Alternately, taking a single sample of a 3d RGBA volume texture will allow you to get a linear interpolation between two biquadratic surfaces (a linear/biquadratic volume).
• This post also covers how to extend this to higher degree surfaces and volumes.

Here are two images generated by the WebGL2 demos I made for this post which utilize this technique for rendering surfaces, fog volumes, and solid volumes. (link to the demos at bottom of post!)

All textures are size 2 on each axis which makes it a cache friendly technique (you can grow the texture sizes for piecewise curves/surfaces/volumes though). It leverages the hardware interpolation which makes it a relatively computationally inexpensive technique, and it supports all polynomials within the limitations of floating point math, so is also very flexible and expressive. You could even extend this to rational polynomial surfaces and volumes which among other things would allow perfect representations of conic sections.

The animated Bezier curve images in this post came from wikipedia. Go have a look and drop them a few bucks if you find wikipedia useful!
Wikipedia: Bézier curve

# Curves

If you’ve read my curve paper and understand the basics you can skip this section and go onto the section “Before Going Into Surfaces”.

Let’s talk about how to store curves of various degrees in textures and evaluate points on them using the GPU Texture sampler. We’ll need this info when we are working with surfaces and volumes because a higher degree curve is dual to a section of lower degree surface or an even lower degree volume.

The three ways we’ll be talking about controlling the order of curves are:

1. Texture Dimensionality – 1d texture vs 2d texture vs 3d texture vs 4d texture.
2. Number of Color Channels – How many color channels are used? R? RG? RGB? RGBA?
3. Multiple Texture Samples – Doing multiple texture reads.

## Texture Dimensionality

By texture dimensionality I mean how many dimensions the texture has. In all cases, the size of the texture is going to be 2 on each axis.

Starting with a 1d texture, we have a single texture coordinate (u) to sample along. As we change the u value from 0 to 1, we are just linearly interpolating between the two values. A 1d texture that has 2 pixels in it can store a degree 1 curve, also known as a linear Bezier curve. With linear texture sampling, the GPU hardware will do this linear interpolation for you.

The equation for linear interpolation between two values A and B which are at t=0 and t=1 respectively is:
$A*(1-t) + B*t$

Here’s the 1d texture:

Here’s a linear curve:

Going to a 2d texture it gets more interesting. We now have two texture coordinates to sample along (u,v). Using linear sampling, the hardware will do bilinear interpolation (linear interpolation across each axis) to get the value at a specific (u,v) texture coordinate.

Here is the equation for bilinear interpolation between 4 values A,B,C,D which are at texture coordinates (0,0), (1,0), (0,1), (1,1) respectively, being sampled at (u,v):

$(A*(1-u) + B*u)*(1-v) + (C*(1-u) + D*u) * v$

That equation interpolates from A to B by u (x axis), and from C to D by u (x axis), and then interpolates from the first result to the second by v (y axis). Note that it doesn’t actually matter which axis is interpolated by first. An equivelant equation would be one that interpolates from A to B by v (y axis) and from B to C by v (y axis) and then between those results by u (x axis).

With that equation, something interesting starts to happen if you use the same value (t) for u and v, expand and simplify, and end up at this equation:

$A*(1-t)^2 + (B+C)*(1-t)t + Dt^2$

That equation is very close to the quadratic Bezier formula, which is below:

$A*(1-t)^2 + B*2(1-t)t + Ct^2$

To get to that equation, we just make B and C the same value (B), and rename D to C since that letter is unused. This tells us how we need to set up our 2d texture such that when we sample along the diagonal, we get the correct point on our quadratic Bezier curve:

Here’s a quadratic Bezier curve in action. You can see how it is a linear interpolation between two linear interpolations, just like taking a bilinearly interpolated sample on our texture is.

Taking this to a 3d texture, we now have three texture coordinates to sample along (u,v,w). Again, with linear sampling turned on, the hardware will do trilinear interpolation to get the value at a specific (u,v,w) texture coordinate.

If we follow the same process as the 2d texture, we will wind up with the equation for a cubic Bezier curve:

$A*(1-t)^3 + B*3(1-t)^2t + C*3(1-t)t^2 + Dt^3$

Here’s how the texture is laid out:

Here’s a cubic Bezier curve in action, where you can see 3 levels of linear interpolations, just like how trilinear interpolation works:

While I have never used a 4d texture it appears that directx supports them and there looks to be an OpenGL extension to support them as well.

If we took this to a 4d texture, we would end up with the equation for a quartic curve. If you have trouble visualizing what a 4d texture even looks like, you aren’t alone. You have four texture coordinates to sample along (u,v,w,t). When you sample it, there are two 3d volume textures that are sampled at (u,v,w), resulting in two values as a result. These values are then interpolated by t to give you the final value. A fourth dimensional texture lookup is just an interpolation between 2 three dimensional texture lookups. That is true of all dimensional texture lookups in fact. An N dimensional texture lookup is just the linear interpolation between two N-1 dimensional texture lookups. For example, a three dimensional texture lookup is just an interpolation between 2 two dimensional texture lookups. This “hierchical interpolation” is the link I noticed between texture interpolation and the De Casteljau algorithm, since that is also a hierchical interpolation algorithm, just with fewer values interpolated between.

Here’s how the 4d texture is laid out:

Here’s the quartic Bezier equation, which is what you get the answer to if you sample a 4d texture at (t,t,t,t):

$A*(1-t)^4 + B*4(1-t)^3t + C*6(1-t)^2t^2 + D*4(1-t)t^3 + Et^4$

Here’s a quartic Bezier curve in action, showing 4 levels of linear interpolation, just like how quadrilinear interpolation works with 4d textures:

So, the bottom line of this section is that if we sample along the diagonal of an N dimensional texture which has one color channel, we will get points on a degree N curve.

## Number of Color Channels

Another way we can control the degree of a curve stored in a texture is by the number of color channels that are stored in the texture.

In the section above we showed a 1d texture that stored a linear curve. it had only one color channel:

Let’s add another color channel. A,B will be stored in the red channel, and B,C will be stored in the green channel:

When we read that texture at location (t), we will get the following values:

1. R: The linear interpolation between A and B at time t.
2. G: The linear interpolation between B and C at time t.

Now, if we just lerp between R and G in our shader, for time t, we will get the point at time t, on the cubic Bezier curve defined by control points A,B,C.

Pretty cool right?

What happens if we add another color channel, blue?

Well, when we sample the texture at time t, we get the following values:

1. R: The linear interpolation between A and B at time t.
2. G: The linear interpolation between B and C at time t.
3. B: The linear interpolation between C and D at time t.

We can combine these values using the quadratic Bezier curve formula, as if these were each a control point:

$R*(1-t)^2 + G*2(1-t)t + Bt^2$

The result we get is a point on the CUBIC curve defined by the four control points A,B,C,D.

In the previous section, it took a 3d volume texture to calculate a cubic curve. In this section we were able to do it with a 1d RGB texture, but it came at the cost of of having to do some calculation in the shader code after sampling the texture to combine the color channels and get the final result.

How exactly does adding a color channel affect the degree though? Each color channel added increases the degree by 1.

You can see this is true by seeing in the last section how a 3 dimensional texture can evaluate a cubic, and a 4 dimensional texture can evalaute a quartic, but the 4th dimensional texture was just two 3 dimensional textures. Adding a second color channel just doubles the size of your data (and adding two tripples, and adding three quadruples), so having a 3d volume texture that has two color channels is the same as having a 4d volume texture with a single color channel. In both cases, you are just interpolating between two 3d texture samples.

## Multiple Texture Samples

Multiple texture samples is the last way to control curve degree that we are going to talk about.

Taking extra texture samples is a lot like adding color channels.

If you have a 1d RGB texture, you get a result of 3 lerps – R,G,B – which you can use to calculate a cubic curve point (order 3). If you take a second sample, you get R0,G0,B0,R1,G1,B1 which is a result of 6 lerps, which gives you a point on a sextic curve (order 6).

If you have a 2d RGBA texture, you get the result of 4 quadratic interpolations – R,G,B,A – which gives you an order 5 curve point. Taking another texture read gives you 8 quadratic interpolation results, which you can put together to make an order 9 curve point. Taking a third texture read would get you up to order 13.

Just like adding color channels, taking extra texture samples requires you to combine the multiple results in your shader, which increases computational cost.

Besides that, you are also doing more texture reads, which can be another source of performance loss. The textures are small (up to 2x2x2x2) so are texture cache friendly, but if you have multiple textures, it could start to add up I’m sure.

IMO this option should be avoided in favor of the others, when possible.

# Before Going Into Surfaces

Before we start on surfaces, I want to mention a few things.

Even though we’ve been talking about Bezier curves specifically, a previous post explained how to convert any polynomial from power basis form into Bernstein basis form (aka you can turn any polynomial into a Bezier curve that is exactly equivelant). So, this generalizes to polynomials, and even rational polynomials if you do division in your shader code, but I’ll point you towards that post for more information on that: Evaluating Polynomials with the GPU Texture Sampler.

You can also extend the above for piecewise curves easily enough. You just set up a different curve (or surface or volume, as we describe below) for different ranges of your parameter space values. From time 0 to 1, you may use one texture, and from time 1 to 2, you may use another. Better yet, you would store both curves in a single texture, and just make the texture be a little larger, instead of having two separate textures.

Also, many other types of curves – B-splines, nurbs – can be broken down exactly into piecewise Bezier curves (rational, if the source curve is rational). Check these links for more info:
Algorithms for B-Spline Curves
Wikipedia: De Boor’s Algorithm.

# Surfaces

Finally onto surfaces!

I’m going to show how to extend the curve calculation technique to calculating points on Bezier rectangles. A Bezier rectangle is a rectangular surface which has one or more bezier curves across the X axis and one or more bezier curves across the Y axis. The degree of the curve on each axis doesn’t need to match so it could be quadratic on one axis and cubic on the other as an example.

To actually evaluate a point on the surface at location (u,v), you evaluate a point on each x axis curve for time u, and then you use those resulting values as control points in another curve that you evaluate at time v.

Just like linear interpolation, it doesn’t matter which axis you evaluate first for a Bezier rectangle surface so you could switch the order of the axis evaluation if you want to.

The image above shows a bicubic surface, the blue lines show the x axis cubic bezier curves, while the yellow lines show the y axis cubic bezier curves. Those lines are called “isolines” or “isocurves”. The 16 control points are shown in magenta.

Another name for a Bezier rectangle is a tensor product surface. This is a more generalized term as it isn’t limited to Bezier curves.

Note: there is another type of Bezier surface called a Bezier Triangle but I haven’t worked much with them so can’t say if any of these techniques work with them or not. It would be interesting to explore how these techniques apply to Bezier triangles, if at all.

Hopefully it should come as no surprise that a 2d texture using regular bilinear interpolation is in fact a Bezier rectangle which is linear on the x axis and linear on the y axis. It has a degree of (1,1) and is stored in a 2d texture (2×2 pixels), where the four control points are just stored in the four pixels. You just sample the texture at (u,v) to get that point on the surface. Pretty simple stuff.

Order (1,1) Bezier Rectangle:

Something interesting to note is that while the isolines (edges) of the rectangle are linear, the surface itself is curved. In fact, we know that the diagonal of this surface is in fact a quadratic Bezier curve because we calculate curves by sampling along the diagonal! (if the middle corners are different, it’s the same as if they were both replaced with the average of their values).

There are other ways to store this Degree (1,1) surface in a texture besides how i described. You could also have a 1 dimensional texture with two color channels, where you sample it along the u axis, and then interpolate your R and G values, using the v axis value. This would come at the cost of doing a lerp in the shader code, instead of having the texture sampler hardware do it for you.

Now that the simplest case is out of the way, how about the next simplest? What if we want a surface where we linearly interpolate between two quadratic curves? That is, what if we want to make a degree (2,1) Bezier rectangle?

Order (2,1) Bezier Rectangle:

Well if you think about it geometrically, we can store a quadratic curve in a 2d texture (2×2) with a single color channel. To linearly interpolate between two of those, we need two of those to interpolate between. So, we need a 3d texture, since that is just an interpolation between two 2d textures.

When we sample that texture, we use the coordinates (u,u,v). That will make it quadratic in u, but linear in v.

Stepping up the complexity again, what if we wanted to make a biquadratic surface – aka degree (2,2)?

Order (2,2) Bezier Rectangle:

Well, to make a quadratic curve we need 3 control points, so for a biquadratic surface we need 3 quadratic curves to quadratically interpolate between.

One way to do this would be with a 4d texture, sampling along (u,u,v,v) to make it quadratic in both u and v.

But, because 4d textures are kind of exotic and may not be supported, we can achieve this by instead having a 3d texture with two color channels: R,G.

When we sample that texture, we sample at (u,u,v) to get two values: R,G. Next we linearly interpolate from R to G using v. This makes us quadratic in both u and v.

There are other ways to encode this surface as well, but i’ll leave that to you to think about if you want to (:

Lastly, what if we wanted a bicubic surface? A cubic curve has 4 control points, so we need 4 cubic curves to cubically interpolate between to make our final surface.

Order (3,3) Bezier Rectangle:

Thinking back to the first section, a 3d texture can evaluate a cubic curve. Since we need four cubic curves, let’s just use all four color channels RGBA. We would sample our texture at (u,u,u) to get four cubic curves in RGBA and then would use the cubic Bezier formula to combine those four values using v into our final result.

# Surfaces Generalized

Generalizing surface calculations a bit, there are basically two steps.

First is you need to figure out what your requirements for the x axis is as far as texture storage for the desired degree you want. From there, you figure out what degree you want on your y axis, and that degree is what you multiply the x axis texture storage requirements for.

It can be a little bit like tetris trying to figure out how to fit various degree surfaces into various texture sizes and layouts, but it gets easier with a little practice.

It’s also important to remember that the x axis being the first axis is by convention only. It could easily be the y axis that defines the texture storage requirements, and is multiplied by the degree of the x axis.

# Volumes

Volumes aren’t a whole lot more complex than surfaces, but they are a lot hungrier for texture space and linear interpolations!

Extending the generalization of surfaces, you once again figure out requirements for the x axis, multiply those by the degree of the y axis, and then multiply that result by the degree of the z axis.

The simplest case for volumes is the trilinear case, aka the Degree (1,1,1) Bezier rectangle.

Order (1,1,1) Bezier Cube:

It’s a bit difficult to understand what’s going on in that picture by seeing the data as just fog density, so the demos let you specify a surface threshold such that if the fog is denser than that amount, it shows it as a surface. Here is the same trilinear Bezier volume with a surface threshold.

Order (1,1,1) Bezier Cube:

You just store your 8 values in the 8 corners of the 2x2x2 texture cube, and sample at (u,v,w) to get your trilinear result.

The next simplest case is that you want to quadratically interpolate between two linear surfaces – a Degree (1,1,2) Bezier rectangle.

Order (1,1,2) Bezier Cube:

To do this, you need 3 bilinear surfaces to interpolate between.

One way to do this would be to have a 2d Texture with R,G,B color channels. Sample the texture at (u,v), then quadratically interpolate R,G,B using w.

Another way to do this would be to have a 3d texture with R and G channels. When sampling, you sample the 3d texture at (u,v,w) to get your R and G results. You then linearly interpolate from R to G by w to get the final value.

Yet another way to do this would be to use a 4d texture if you have support for it, and sample along (u,v,w,w) to get your curve point using only hardware interpolation.

The next simplest volume type is a linear interpolation between two biquadratic surfaces – a Degree (2,2,1) Bezier rectangle.

Order (2,2,1) Bezier Cube:

From the surfaces section, we saw we could store a biquadratic surface in a 3d texture using two color channels R,G. After sampling at (u,u,v) you interpolate from R to G by v.

To make a volume that linearly interpolates between two biquadratic surfaces, we need two biquadratic surfaces, so need to double the storage we had before.

We can use a 3d texture with 4 color channels to make this happen by storing the first biquadratic in R,G and the second in B,A, sampling this texture at (u,u,v). Next, we interpolate between R and G by v, and also interpolate between B and A by v. Lastly, we linearly interpolate between those two results using w.

The next higher surface would be a triquadratic volume, which is degree (2,2,2). Since you can store a biquadratic surface in a 3d 2x2x2 texture with two color channels, and a triquadratic volume needs 3 of those, we need a 3d texture 6 color channels. Since that doesn’t exist, we could do something like store 2 of the quadratic surfaces in a 2x2x2 RGBA texture, and the other quadratic surface in a 2x2x2 RG texture. We would take two texture samples and combine the 6 results into our final value.

Tricubic is actually pretty simple to conceptualize luckily. We know that we can store a bicubic surface in a 3d 2x2x2 RGBA texture. We also know that we would need 4 of those if we want to make a tricubic volume. So, we could do 4 texture reads (one for each of our bicubic surfaces) and then combine those 4 samples across w to get our final volume value.

# Closing

Hopefully you were able to follow along and see that this stuff is potentially pretty powerful.

Some profiling needs to be done to better understand the performance characteristics of using the texture sampler in this way, versus other methods of curve, surface and volume calculation. I have heard that even when your texture samples are in the texture cache, that it can still take like ~100 cycles to get the information back on a texture read. That means that this is probably not going to be as fast as using shader instructions to calculate the points on the curve. However, if you are compute bound and can offload some work to the texture sampler, or if you are already using a texture to store 1d/2d/3d data (or beyond) that you can aproximate with this technique, that you will have a net win.

One thing I really like about this is that it makes use of non programmable hardware to do useful work. It feels like if you were compute bound, that you could offload some work to the texture sampler if you had some polynomials to evaluate (or surfaces/volumes to sample), and get some perf back.

I also think this could possibly be an interesting way to make concise representations (and evaluation) of non polygonal models. I imagine it would have to be piecewise to make things that look like real world objects, but you do have quite a bit of control with Bezier curves, surfaces and volumes, especially if you use rational ones by doing a divide in your shader.

Here’s a few specifica areas I think this technique could help out with:

• Higher order texture interpolation with fewer samples – You’d have to preprocess textures and would spend more memory on them, but it may be worth while in some situations for higher quality results with a single texture read.
• 2D signed distance field rendering – SDF textures are great for making pseudo vector art. They do break down in some cases and at some magnification levels though. It would be interesting to see if using this technique could improve things either with higher order interpolation, or maybe by encoding (signed) distances differently. Possibly also just useful for describing 2d vector art in a polynomial form?
• 3d signed distance field rendering – Ray marching can make use of signed distance fields to render 3d objects. It can also make use of functions which can only give you inside or outside status based on a point. It would be interesting to explore encoding and decoding both of these types of functions within textures using this technique, to sample shapes during ray marching.

If you are interested in the above, or curious to learn more, here are some good links!

If you have any questions, corrections, feedback, ideas for extensions, etc please let me know! You can leave a comment below, or contact me on twitter at @Atrix256.

## Feedback / Ideas

@anders_breakin had some ideas that could possibly pan out:

1. The derivative of a Bezier curve is another Bezier curve (Derivatives of a Bézier Curve). You could encode the derivative curve(s) in a texture and use that to get the normal instead of using the central differences method. That might give higher quality normals, but should also decrease the number of texture reads needed to get the normal.
2. If you want more accuracy, you may subdivide the curve into more numerous piecewise curves. The texture interpolator only has 8 bits of decimal precision (X.8 fixed point) when interpolating, but if you give it less of the curve/surface/volume to interpolate over at a time, it seems like that would result in more effective precision.

@Vector_GL suggested reading the values in the vertex shader and using the results in the pixel shader. I think something like this could work where you read the control points in the VS, and pass them to the PS, which would then be able to ray march the tensor product surface by evaluating it without texture reads. So long as you have fewer VS instances than PS instances (the triangles are not subpixel!) that this could be an interesting thing to try. It doesn’t take advantage of the texture interpolator, but maybe there would be a way to combine the techniques. If not, this still seems very pragmatic.

I was thinking maybe this could be done via “rasterization” by drawing a bunch of unit cubes and having the PS do the ray marching. With some careful planning, you could probably use Z-testing on this too, to quickly cull hidden pixels without having to ray march them.

## Failed Experiment: The GPU Texture Sampler is Turing Complete But That Fact is Pretty Useless

While it’s true that the GPU texture sampler can evaluate digital logic circuits, it turns out there’s a much better and simpler way to evaluate logic with textures. That better and simpler way isn’t even that useful unfortunately!

This post will show the path I took from the initially intriguing possibilities to the more mundane final answer. You may be able to see mistakes in my reasoning along the way, or be able to get to the punch line sooner (:

This was meant to be an extension to a paper I wrote talking about how you can evaluate Bezier curves by storing only the control points in a texture and then sampling along the texture diagonal:
GPU Texture Sampler Bezier Curve Evaluation

The ideas from this post started with a tweet from @marcosalvi:

Because the last post showed how to evaluate arbitrary polynomials using the texture sampler, and digital circuits can be described as as polynomials in Algebraic Normal Form (ANF), that means we can use the texture sampler to evaluate digital logic circuits. Let’s check it out!

First up, we need to be able to convert logic into ANF. Oddly enough, I already have a post about how to do that, with working C++ source code, so go check it out: Turning a Truth Table Into A digital Circuit (ANF).

As an example, let’s work with a circuit that takes 3 input bits, and adds them together to make a 2 bit result. We’ll need one ANF expression per output bit. $O_0$ will be the 1’s place output bit (least significant bit), and $O_1$ will be the 2’s place output bit (most significant bit). Our 3 input bits will be u,v,w.

$O_0 = u \oplus v \oplus w$
$O_1 = uv \oplus vw \oplus uw$

If we want to use our polynomial evaluation technique, we need equations that are univariate (one variable) instead of multivariate (multiple variables). We can try just using a single variable x in place of u,v and w. Remember that in ANF, you work with polynomials mod 2 (aka $\mathbb{Z}_2$), and that XOR ($\oplus$) is addition while AND is multiplication. This gives the formulas below:

$O_0 = x + x + x = 3x$
$O_1 = xx + xx + xx = 3x^2$

The next thing we need to use the technique is to know the Bezier control points that make a Bezier curve that is equivalent to this polynomial. Since we have 3 input variables into our digital circuit, if they were all 3 multiplied together (ANDed together), we would have a cubic equation, so we need to convert those polynomials to cubic Bernstein basis polynomials. We can use the technique from the last post to get the control points of that equivalent curve.

$O_0 \begin{array}{c|c|c|c|c} 0 & 0 / 1 = 0 & 1 & 2 & 3 \\ 3 & 3 / 3 = 1 & 1 & 1 & \\ 0 & 0 / 3 = 0 & 0 & & \\ 0 & 0 / 1 = 0 & & & \\ \end{array}$

$O_1 \begin{array}{c|c|c|c|c} 0 & 0 / 1 = 0 & 0 & 1 & 3 \\ 0 & 0 / 3 = 0 & 1 & 2 & \\ 3 & 3 / 3 = 1 & 1 & & \\ 0 & 0 / 1 = 0 & & & \\ \end{array}$

Now that we have our control points, we can set up our textures to evaluate our two cubic Bezier curves (one for $O_0$, one for $O_1$). We’ll need to use 3d textures and we’ll need to set up the control points like the below, so that when we sample along the diagonal of the texture we get the points on our curves.

The picture below shows where each control point goes, to set up a cubic Bezier texture. The blue dot is the origin (0,0,0) and the red dot is the extreme value of the cube (1,1,1). The grey line represents the diagonal that we sample along.

Coincidentally, our control points for the $O_0$ curve are actually 0,1,2,3 so that cube above is what our 3d texture needs to look like for the $O_0$ curve.

Below is what the $O_1$ curve’s 3d texture looks like. Note that in reality, we could store these both in a single 3d texture, just use say the red color channel for $O_0$ and the green color channel for $O_1$.

Now that we have our textures set up let’s try it out. Let’s make a table where we have our three input bits, and we use those as texture coordinates in our 3d textures (the texture cubes above) to see what values we get. (Quick note – things are slightly simplified here vs reality. The pixel’s actual value is at a half pixel offset from the texture coordinates, so we’d be sampling between (0.5,0.5,0.5) and (1.5,1.5,1.5) instead of from (0,0,0) to (1,1,1), but we can ignore that detail for now to make this stuff clearer.)

$\begin{array}{|c|c|c|c|c|} \hline u & v & w & O_1 & O_0 \\ \hline 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 1 & 2 \\ 1 & 0 & 0 & 0 & 1 \\ 1 & 0 & 1 & 1 & 2 \\ 1 & 1 & 0 & 1 & 2 \\ 1 & 1 & 1 & 3 & 3 \\ \hline \end{array}$

Now, let’s modulus the result by 2 since ANF expects to work mod 2 ($\mathbb{Z}_2$ to be more precise), and put the decimal value of the result next to it.

$\begin{array}{|c|c|c|c|c|c|} \hline u & v & w & O_1 \% 2 & O_0 \% 2 & Result \\ \hline 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 & 1 & 1 \\ 0 & 1 & 1 & 1 & 0 & 2 \\ 1 & 0 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 1 & 0 & 2 \\ 1 & 1 & 0 & 1 & 0 & 2 \\ 1 & 1 & 1 & 1 & 1 & 3 \\ \hline \end{array}$

It worked! The result value is the count of the input bits set to 1.

Unfortunately we have a problem. When we converted the multivariate equation into a univariate equation, we just replaced u,v,w with x. This is only valid if the function is symmetric – if u,v,w can be interchanged with each other and not affect the result of the function. This bit adding digital circuit we made happened to have that property, but most digital circuits do not have that property – most of the time, not all input bits are treated equal. If we made a circuit that added two 2-bit numbers and have a 3-bit result for instance, the high bits of the input numbers have a very different meaning than the low bits and this technique falls apart. (Quick note – we are actually doing the reverse of the polynomial blossoming thing i mentioned in the last post. Blossoming is the act of taking a univariate function and breaking it into a multivariate function that is linear in each variable. The term is called symmetric multiaffine equation if you want to find out more about that.)

This turns out not to be a deal breaker though because it turns out we didn’t have to do a lot of the work that we did to get these volume textures. It turns out we don’t need to calculate the Bezier curve control points, and we don’t even need to make an ANF expression of the digital circuit we want to evaluate.

Let’s recap what we are trying to do. We have 3 input values which are either 0 or 1, we have a 3d texture which is 2x2x2, and we are ultimately using those 3 input values as texture coordinates (u,v,w) to do a lookup into a texture to get a single bit value out.

Here’s a big aha moment. We are just making a binary 3d lookup table, so can take our truth table of whatever it is we are trying to do, and then directly make the final 3d textures described above.

Not only does it work for the example we gave, with a lot less effort and math, it also works for the broken case I mentioned of the function not being symmetric, and not all input bits being equal.

Something else to note is that because we are only sampling at 0 or 1, we don’t need linear texture interpolation at all and can use nearest neighbor (point) sampling on our textures for increased performance. Also because the texture data is just a binary 0 or 1, we could use 1 bit textures.

The second aha moment comes up when you realize that all we are doing is taking some number of binary input bits, using those as texture coordinates, and then looking up a value in a texture.

You can actually use a 1D texture for this!

You take your input bits and form an integer, then look up the value at that pixel location. You build your texture lookup table using this same mapping.

So… it turns out this technique led to a dead end. It was just extra complexity to do nothing special.

Before it all fell apart, I was also thinking this might be a good avenue for doing homomorphic encryption on the GPU, but I don’t believe this aids that at all. (Super Simple Symmetric Leveled Homomorphic Encryption Implementation)

# But Wait – Analog Valued Logic?

One thought I had while all this was unraveling was that maybe this was still useful, because if you put an analog value in (not a 0 or 1, but say 0.3), that maybe this could be used as a sort of “Fuzzy Logic” type logic evaluation.

Unfortunately, it looks like that doesn’t work either!

You can see how it breaks down and some more info here:
Computer Science Stack Exchange: Using analog values with Algebraic Normal Form?

# Oh Well

Sometimes when exploring new frontiers (even if they are just new to us) we hit dead ends, our ideas fail etc. It happens. It’s part of the learning process, and also is useful sometimes to know what doesn’t work and why, instead of just always knowing what DOES work.

Anyways… posts on using the texture sampler for calculating points on data surfaces and data volumes are coming next (:

To give a brief taste of how that is going to play out:

• Doing a single texture read of a 3d RGBA texture can give you a triquadratic interpolated value.
• Alternately, doing a single texture read of a 3d RGBA texture can give you a bicubic interpolated value.

## Evaluating Polynomials with the GPU Texture Sampler

This is an extension of a paper I wrote which shows how to use the linear texture sampling capabilities of the GPU to calculate points on Bezier curves. You store the control points in the texture, then sample along the texture’s diagonal to get points on the curve:
GPU Texture Sampler Bezier Curve Evaluation

I’ve been thinking about the items in the “future work” section and found some interesting things regarding polynomials, logic gates, surfaces and volumes. This is the first post, which deals with evaluating polynomials.

# Evaluating Polynomials

One of the main points of my paper was that N-linear interpolation (linear, bilinear, trilinear, etc) can be used to evaluate the De Casteljau algorithm since both things are just linear interpolations of linear interpolations. (Details on bilinear interpolation here: Bilinear Filtering & Bilinear Interpolation).

This meant that it was also able to calculate Bernstein Polynomials (aka the algebraic form of Bezier curves), since Bernstein polynomials are equivalent to the De Casteljau algorithm.

I started looking around to see what would happen if you messed around with the De Casteljau algorithm a bit, like interpolate at one level by
$t^2$ or $t*0.5+0.5$ or by a constant or by another variable completely. My hope was that I’d be able to make the technique more generic and open it up to a larger family of equations, so people weren’t limited to just Bernstein polynomials.

That opened up a pretty deep rabbit hole on polynomial blossoming and something called Symmetric Multiaffine Functions. There are some great links in the answer here:
Math Stack Exchange: Modifying and Generalizing the De Casteljau Algorithm

In the end, it turned out to be pretty simple though. It turns out that any polynomial can be converted back and forth from “Power Basis” (which looks like $Ax^2+Bx+C$) to “Bernstein Basis” (which looks like $A(1-t)^2+B(1-t)t+Ct^2$) so long as they are the same degree.

This isn’t the result I was expecting but it is a nice result because it’s simple. I think there is more to be explored by sampling off the diagonal, and using different t values at different stages of interpolation, but this result is worth sharing.

By the way, you could also use curve fitting to try and approximate a higher degree function with a lower degree one, but for this post, I’m only going to be talking about exact conversion from Bernstein polynomials to Power polynomials.

Since we can convert power basis polynomials to Bernstein polynomials, and the technique already works for Bernstein polynomials, that means that if we have some random polynomial, say $y=2x^3+4x+2$, that we can make this technique work for that too. The technique got a little closer to arbitrary equation evaluation. Neat!

# Converting Power Basis to Bernstein Basis

I found the details of the conversion process at Polynomial Evaluation and Basis Conversion which was linked to by Math Stack Exchange: Convert polynomial curve to Bezier Curve control points.

This is best explained working through examples, so let’s start by converting a quadratic polynomial from power basis to Bernstein basis.

$y=2x^2+8x+3$

The first thing we do is write the coefficients vertically, starting with the $x^0$ coefficient, then the $x^1$ coefficient and continuing on to the highest value $x^n$:

$\begin{array}{c} 3 \\ 8 \\ 2 \\ \end{array}$

Next, we need to divide by the Binomial Coefficients (aka the row of Pascal’s Triangle which has the same number of items as we have coefficients). In this case we need to divide by: 1,2,1.

$\begin{array}{c|c} 3 & 3 / 1 = 3 \\ 8 & 8 / 2 = 4 \\ 2 & 2 / 1 = 2 \\ \end{array}$

Now we generate a difference table backwards. it’s hard to explain what that is in words, but if you notice, each value is the sum of the value to the left of it, and the one below that.

$\begin{array}{c|c|c|c} 3 & 3 / 1 = 3 & 7 & 13 \\ 8 & 8 / 2 = 4 & 6 & \\ 2 & 2 / 1 = 2 & & \\ \end{array}$

We are all done. The control points for the Bezier curve are on the top row (ignoring the left most column). They are 3,7,13 which makes it so we have the following two equations being equal. The first is in power basis, the second is in Bernstein basis.

$y=2x^2+8x+3$
$y=3(1-x)^2+14(1-x)x+13x^2$

Note: don’t forget that Bezier curves multiply the control points by the appropriate row in Pascal’s triangle. That’s where the 14 comes from in the middle term of the Bernstein polynomial. We are multiplying the control points 3,7,13 by the row in Pascal’s triangle 1,2,1 to get the final coefficients of 3,14,13.

Let’s have Wolfram Alpha help us verify that they are equal.

Yep, they are equal! If you notice the legend of the graph, wolfram actually converted the Bernstein form back to power basis, and you can see that they are exactly equivalent.

You can also write the Bernstein form like the below, which i prefer, using $t$ instead of $x$ and also setting $s=1-t$.

$y=3s^2+14st+13t^2$

Cubic Function

A cubic function is not that much harder than a quadratic function. After this, you should see the pattern and be able to convert any degree easily.

$y=5x^3+9x-4$

Again, the first thing we do is write the coefficients vertically, starting with the constant term. Note that we don’t have an $x^2$ term, so it’s coefficient is 0.

$\begin{array}{c} -4 \\ 9 \\ 0 \\ 5 \\ \end{array}$

We next divide by the Pascal’s triangle row 1,3,3,1.

$\begin{array}{c|c} -4 & -4 / 1 = -4 \\ 9 & 9 / 3 = 3 \\ 0 & 0 / 3 = 0 \\ 5 & 5 / 1 = 5 \\ \end{array}$

Now, make the difference table going backwards again:

$\begin{array}{c|c|c|c|c} -4 & -4 / 1 = -4 & -1 & 2 & 10 \\ 9 & 9 / 3 = 3 & 3 & 8 & \\ 0 & 0 / 3 = 0 & 5 & & \\ 5 & 5 / 1 = 5 & & & \\ \end{array}$

Our Bezier control points are along the top: -4,-1,2,10. Keeping in mind that the coefficients for a cubic bezier curve are multiplied by 1,3,3,1 we can make the Bernstein form and put it next to our original formula:

$y=5x^3+9x-4$
$y=-4(1-x)^3-3(1-x)^2x+6(1-x)x^2+10x^3$

Let’s check in wolfram alpha again:
Wolfram Alpha: graph y=5x^3+9x-4, y=-4(1-x)^3-3x(1-x)^2+6x^2(1-x)+10x^3, from 0 to 1

And here it is in the cleaner form:

$y=-4s^3-3s^2t+6st^2+10t^3$

## Some Notes On Calculating Polynomials with the Texture Sampler

You may notice that in the comparison graphs i only plotted the graphs from 0 to 1 on the x axis (aka the t axis). The equations are actually equivalent outside of that range as well, but the technique from my paper only works from the 0 to 1 range because it relies on built in hardware pixel interpolation. This may sound like a big limitation, but if you know the minimum and maximum value of x that you want to plug into your equation at runtime, you can convert your x into a percent between those values, get the resulting polynomial, convert it to Bernstein form, set up the texture, and then at runtime convert your input parameter into that percent when you do the lookup. In other words, you squeeze the parts of the function you care about into the 0 to 1 range.

Another issue you will probably hit is that standard RGBA8 textures have only 8 bits per channel and can only store values between 0 and 1. Since the texture is supposed to be storing your control points, that is bad news.

One way to get around this is to find the largest coefficient value and divide the others by this value. This will put the coefficients into the 0 to 1 range, which will be able to be stored in your texture. After sampling the texture, you multiply the result by that scaling value to get the correct answer.

Scaling won’t help having both negative and positive coefficients though. To handle negative coefficients, you could map the 0-1 space to be from -1 to 1, similar to how we often do it with normal maps and other signed data stored in textures. After doing the lookup you’d have to unmap it too of course.

You could also solve negative values and scaling problems by squishing the y axis into the 0 to 1 space by subtracting the minimum and dividing by the maximum minus the minimum, similarly to how we squished the x range into 0 to 1.

If you instead move to an RGBAF32 texture, you’ll have a full 32 bit float per color channel and won’t have problems with either large values or negative values. You will still have to deal with x only going from 0 to 1 though.

I also want to mention that the hardware texture interpolation works in a X.8 fixed point format. There are more details in my paper, but that means that you’ll get some jagged looking artifacts on your curve instead of a smoothly varying value. If that is a problem for you in practice, my paper talks about a few ways to mitigate that issue.

Before moving on, I wanted to mention that it’s easy to support rational polynomials using this method as well. A rational polynomial is when you divide one polynomial by another polynomial, and relates to rational Bezier curves, where you divide one curve by another curve (aka you give weights to control points). Rational curves are more powerful and in fact you can perfectly represent sine and cosine with a quadratic rational polynomial. More info on that in my paper.

To calculate rational polynomials, you just encode the numerator polynomial in one color channel, and the denominator polynomial in another color channel. After you sample the texture and get the result of your calculation, you divide the numerator value by the denominator value. It costs one division in your shader code, but that’s pretty cheap for the power it gives you!

Regarding the texture size requirements to store a polynomial of a specific degree…

Every dimension of the texture, and every color channel in that texture, adds a degree.

However, to get the benefit of the degree increase from the color channel, you need to do a little more math in the shader – check my paper for more details!

So, if you wanted to store a quadratic polynomial in a texture, you would need either a 2d texture with 1 color channel, or you could do it with a 1d texture that had 2 color channels.

If you wanted to store a cubic polynomial in a texture, you could use a 3d texture with 1 color channel, or a 2d texture with two color channels (there would be some waste here) or a 1d texture with three color channels.

For a polynomial that had a maximum degree term of 6, you could use a 3d volume texture that had 3 color channels: RGB.

If you need to evaluate a very high degree polynomial, you can actually take multiple texture samples and combine them.

For instance, if you had a 2d texture with a single color channel, you could do a single texture read to get a quadratic.

If you linearly interpolated between those two quadratics, you would end up with a cubic.

That isn’t a very high degree curve but is easier to grasp how they combine.

Taking this up to RGBA 3d volume textures, a single texture read will get you a curve of degree 6. If you do another read, it will take it to degree 7. Another read gets you to 8, another to 9, etc.

With support for 4d textures, an RGBA texture read would give you a degree 7 curve. Another read would boost it to 8, another to 9, another to 10, etc.

Regarding the specific sizes of the textures, in all cases the texture size is “2” on each dimension because we are always just linearly interpolating within a hyper cube of pixel values. You can increase the size of the texture for piecewise curves, check out the paper for more details on that and other options.

## Closing

Hopefully you found this useful or interesting!

There may not have been much new information in here for the more math inclined people, but I still think it’s worth while to explicitly show how the technique works for both Bernstein polynomials as well as the more common power basis polynomials.

I still think it would be interesting to look at what happens when you sample off of the diagonal, and also what happens if you use different values at different stages of the interpolation. As an example, instead of just looking up a texture at (t,t) for the (u,v) value to get a quadratic curve point, what if we look up by (t,t^2)? At first blush, it seems like by doing that we may be able to boost a curve to a higher degree, maybe at the cost of some reduced flexibility for the specific equations we can evaluate?

Next up I’ll be writing up some more extensions to the paper involving logic gates, surfaces, and volumes.

Have any feedback, questions or interesting ideas? Let me know!

## The Secret to Writing Fast Code / How Fast Code Gets Slow

This is a “soft tech” post. If that isn’t your thing, don’t worry, I’ll be returning to some cool “hard tech” and interesting algorithms after this. I’ve been abusing the heck out of the GPU texture sampler lately, so be on the lookout for some posts on that soon (;

I’m about to show you some of the fastest code there is. It’s faster than the fastest real time raytracer, it’s faster than Duff’s Device.

Heck, despite the fact that it runs on a classical computer, it runs faster than Shor’s Algorithm which uses quantum computing to factor integers so quickly that it breaks modern cryptographic algorithms.

This code also runs faster than Grover’s Algorithm which is another quantum algorithm that can search an unsorted list in O(sqrt(N)).

Even when compiled in debug it runs faster than all of those things.

Are you ready? here it is…

// Some of the fastest code the world has ever seen
int main (int argc, char **argc)
{
return 0;
}


Yes, the code does nothing and that is precisely why it runs so fast.

# The Secret to Writing Fast Code

The secret to writing fast code, no matter what you are writing is simple: Don’t do anything that is too slow.

Let’s say you started with a main() function like i showed above and you decided you want to make a real time raytracer that runs on the CPU.

First thing you do is figure out what frame rate you want it to run at, at the desired resolution. From there, you know how many milliseconds you have to render each frame, and now you have a defined budget you need to stay inside of. If you stay in that budget, you’ll consider it a real time raytracer. If you go outside of that budget, it will no longer be real time, and will be a failed program.

You may get camera control working and primary rays intersecting a plane, and find you’ve used 10% of your budget and 90% of the budget remains. So far so good.

Next up you add some spheres and boxes, diffuse and specular shade them with a directional light and a couple point lights. You find that you’ve used 40% of your budget, and 60% remains. We are still looking good.

Next you decide you want to add reflection and refraction, allowing up to 3 ray bounces. You find you are at 80% of your budget and are still looking good. We are still running fast enough to be considered real time.

Now you say to yourself “You know what? I’m going to do 4x super sampling for anti aliasing!”, so you shoot 4 rays out per pixel instead of 1 and average them.

You profile and uh oh! You are at 320% of your budget! Your ray tracer is no longer real time!

What do you do now? Well, hopefully it’s obvious: DON’T DO THAT, IT’S TOO SLOW!

So you revert it and maybe drop in some FXAA as a post processing pass on your render each frame. Now you are at 95% of your budget let’s say.

Now you may want to add another feature, but with only 5% of your budget left you probably don’t have much performance to spare to do it.

So, you implement whatever it is, find that you are at 105% of your budget.

Unlike the 4x super sampling, which was 220% overbudget, this new feature being only 5% over budget isn’t THAT much. At this point you could profile something that already exists (maybe even your new feature) and see if you can improve it’s performance, or if you can find some clever solution that gives you a performance boost, at the cost of things you don’t care about, you can do that to get some performance back. This is a big part of the job as a successful programmer / software engineer – make trade offs where you gain benefits you care about, at the cost of things you do not care about.

At this point, you can also decide if this new feature is more desired than any of the existing features. If it is, and you can cut an old feature you don’t care about anymore, go for it and make the trade.

Rinse and repeat this process with new features and functionality until you have the features you want, that fit within the performance budget you have set.

Follow this recipe and you too will have your very own real time raytracer (BTW related:Making a Ray Traced Snake Game in Shadertoy).

Maintaining a performance budget isn’t magic. It’s basically subtractive synthesis. Carve time away from your performance budget by adding a feature, then optimize or remove features if you are over budget. Rinse and repeat until the sun burns out.

Ok, so if it’s so easy, why do we EVER have performance problems?

## How Fast Code Gets Slow

Performance problems come up when we are not paying attention. Sometimes we cause them for ourselves, and sometimes things outside of our control cause them.

The biggest way we cause performance problems for ourselves is by NOT MEASURING.

If you don’t know how your changes affect performance, and performance is something you care about, you are going to have a bad time.

If you care about performance, measure performance regularly! Profile before and after your changes and compare the differences. Have automated tests that profile your software and report the results. Understand how your code behaves in the best and worst case. Watch out for algorithms that sometimes take a lot longer than their average case. Stable algorithms make for stable experiences (and stable frame rates in games). This is because algorithms that have “perf spikes” sometimes line up on the same frame, and you’ll have more erratic frame rate, which makes your game seem much worse than having a stable but lower frame rate.

But, again, performance problems aren’t always the programmers fault. Sometimes things outside of our control change and cause us perf problems.

Well, let’s say that you are tasked with writing some very light database software which keeps track of all employee’s birthdays.

Maybe you use a hash map to store birthdays. The key is the string of the person’s name, and the value is a unix epoch timestamp.

Simple and to the point. Not over-engineered.

Now, someone else has a great idea – we have this database software you wrote, what if we use it to keep track of all of our customers and end user birthdays as well?

So, while you are out on vacation, they make this happen. You come back and the “database” software you made is running super slow. There are hundreds of thousands of people stored in the database, and it takes several seconds to look up a single birthday. OUCH!

So hotshot, looks like your code isn’t so fast huh? Actually no, it’s just that your code was used for something other than the original intended usage case. If this was included in the original specs, you would have done something different (and more complex) to handle this need.

This was an exaggerated example, but this sort of thing happens ALL THE TIME.

If you are working on a piece of software, and the software requirements change, it could turn any of your previous good decisions into poor decisions in light of the new realities.

However, you likely don’t have time to go back and re-think and possibly re-work every single thing you had written up to that point. You move onward and upward, a little more heavy hearted.

The target moved, causing your code to rot a bit, and now things are likely in a less than ideal situation. You wouldn’t have planned for the code you have with the info you have now, but it’s the code you do have, and the code you have to stick with for the time being.

Every time that happens, you incur a little more tech debt / code complexity and likely performance problems as well.

You’ll find that things run a little slower than they should, and that you spend more time fighting symptoms with small changes and somewhat arbitrary rules – like telling people not to use name lengths more than 32 characters for maximum performance of your birthday database.

Unfortunately change is part of life, and very much part of software development, and it’s impossible for anyone to fully predict what sort of changes might be coming.

Those changes are often due to business decisions (feedback on product, jockying for a new position in the marketplace, etc), so are ultimately what give us our paychecks and are ultimately good things. Take it from me, who has worked at ~7 companies in 15 years. Companies that don’t change/adapt die.

So, change sucks for our code, but it’s good for our wallets and keeps us employed 😛

Eventually the less than ideal choices of the past affecting the present will reach some threshold where something will have to be done about it. This will likely happen at the point that it’s easier to refactor some code, than to keep fighting the problems it’s creating by being less than ideal, or when something that really NEEDS to happen CAN’T happen without more effort than the refactor would take.

When that happens, the refactor comes in, where you DO get to go back and rethink your decisions, with knowledge of the current realities.

The great thing about the refactor is that you probably have a lot of stuff that your code is doing which it doesn’t really even NEED to be doing.

Culling that dead functionality feels great, and it’s awesome watching your code become simple again. It’s also nice not having to explain why that section of code behaves the way it does (poorly) and the history of it coming to be. “No really, I do know better, but…!!!”

One of the best feelings as a programmer is looking at a complex chunk of code that has been a total pain, pressing the delete key, and getting a little bit closer back to the fastest code in the world:

// Some of the fastest code the world has ever seen
int main (int argc, char **argc)
{
return 0;
}


PS: Another quality of a successful engineer is being able to constantly improve software as it’s touched. If you are working in an area of code, and you see something ugly that can be fixed quickly and easily, do it while you are there. Since the only constant in software development is change, and change causes code quality to continually degrade, make yourself a force of continual code improvement and help reverse the flow of the code flowing into the trash can.

## Engines

In closing, I want to talk about game engines – 3rd party game engines, and re-using an engine from a different project. This also applies to using middleware.

Existing engines are great in that when you and your team know how to use them, you can get things set up very quickly. It lets you hit the ground running.

However, no engine is completely generic. No engine is completely flexible.

That means that when you use an existing engine, there will be some amount of features and functionality which were made without your specific usage case in mind.

You will be stuck in the world where from day 1 you are incurring the tech debt type problems I describe above, but you will likely be unable or unwilling to refactor everything to suit your needs specifically.

I don’t mention this to say that engines are bad. Lots of successful games have used engines made by other people, or re-used engines from previous projects.

However, it’s a different kind of beast using an existing engine.

Instead of making things that suit your needs, and then using them, you’ll be spending your time figuring out how to use the existing puzzle pieces to do what you want. You’ll also be spending time backtracking as you hit dead ends, or where your first cobbled together solution didn’t hold up to the new realities, and you need to find a new path to success that is more robust.

Just something to be aware of when you are looking at licensing or re-using an engine, and thinking that it’ll solve all your problems and be wonderful. Like all things, it comes at a cost!

Using an existing engine does put you ahead of the curve: At day 1 you already have several months of backlogged technical debt!

Unfortunately business realities mean we can’t all just always write brand new engines all the time. It’s unsustainable :/

Agree / Disagree / Have something to say?

## Minimizing Code Complexity by Programming Declaratively

Writing good code is something all programmers aspire to, but the definition of what actually makes good code can be a bit tricky to pin down. The idea of good code varies from person to person, from language to language, and also varies between problem domains. Web services, embedded devices and game programming are few software domains that all have very different needs and so also have very different software development styles, methods and best practices.

I truly believe that we are in the stone ages of software development (ok, maybe the bronze age?), and that 100 years from now, people will be doing things radically differently than we do today because they (or we) will have figured out better best practices, and the languages of the day will usher people towards increased success with decreased effort.

This post is on something called declarative programming. The idea is nothing new, as prolog from 1972 is a declarative language, but the idea of declarative programming is something I don’t think is talked about enough in the context of code quality.

By the end of this read, I hope you will agree that programming declaratively by default is a good best practice that pertains to all languages and problem domains. If not, leave a comment and let me know why!

## Declarative vs Imperative Programming

Declarative programming is when you write code that says what to do. Imperative programming is when you write code that says how to do it.

Below is some C++ code written imperatively. How long does it take you to figure out what the code is doing?

	int values[4] = { 8, 23, 2, 4 };
int sum = 0;
for (int i = 0; i < 4; ++i)
sum += values[i];
int temp = values[0];
for (int i = 0; i < 3; ++i)
values[i] = values[i + 1];
values[3] = temp;


Hopefully it didn’t take you very long to understand the code, but you had to read it line by line and reason about what each piece was doing. It may not be difficult, but it wasn’t trivial.

Here is the same code with some comments, which helps it be understandable more quickly, assuming the comments haven’t become out of date (:

	// Declare array
int values[4] = { 8, 23, 2, 4 };

// Calculate sum
int sum = 0;
for (int i = 0; i < 4; ++i)
sum += values[i];

// Rotate array items one slot left.
int temp = values[0];
for (int i = 0; i < 3; ++i)
values[i] = values[i + 1];
values[3] = temp;


Here is some declarative code that does the same thing:

	int values[4] = { 8, 23, 2, 4 };
int sum = SumArray(values);
RotateArrayIndices(values, -1);


The code is a lot quicker and easier to understand. In fact the comments aren’t even needed anymore because the code is basically what the comments were.

Comments are often declarative, saying what to do right next to the imperative code that says how to do it. If your code is also declarative though, there is no need for the declarative comments because they are redundant! In fact, if you decide to start trying to write code more declaratively, one way to do so is if you ever find yourself writing a declarative comment to explain what some code is doing, wrap it in a new function, or see if there is an existing function you ought to be using instead.

As a quick tangent, you can use the newer C++ features to make code more declarative, like the below. You arguably should be doing that when possible, if your code base uses STL, a custom STL implementation, or an in house STL type replacement, but I want to stress that this is a completely separate topic than whether or not we should be using new C++ features. Some folks not used to STL will find the below hard to read compared to the last example, which takes away from the main point. So, if you aren’t a fan of STL due to it’s readability (I agree!), or it’s performance characteristics (I also agree!), don’t worry about it. For people on the other side of the fence, you can take this as a pro STL argument though, as it does make code more declarative, if the readability and perf things aren’t impacting you.

	std::array<int,4> values = { 8, 23, 2, 4 };
int sum = std::accumulate(values.begin(), values.end(), 0);
std::rotate(values.begin(), values.begin() + 1, values.end());


## We Already Live in a Semi-Declarative World

When reading the tip about using (declarative) comments as a hint for when to break some functionality out into another function, you may be thinking to yourself: “Wait, isn’t that just the rule about keeping functions small, like to a few lines per function?”

Yeah, that is true. There is overlap between that rule and writing declarative code. IMO declarative code is a more general version of that rule. That rule is part of making code declarative, and gives some of the benefits, but isn’t the whole story.

The concept of D.R.Y. “Don’t Repeat Yourself” also ends up causing your code to become more declarative. When you are repeating yourself, it’s often because you are either duplicating code, or because there is boiler plate code that must be added in multiple places to make something work. By applying D.R.Y. and making a single authoritative source of your information or work, you end up taking imperative details out of the code, thus making what remains more declarative. For more information on that check out this post of mine: Macro Lists For The Win

## TDD

If your particular engineering culture uses TDD (test driven development), you may also say “Hey, this isn’t actually anything special, this is also what you get when you use TDD.”

Yes, that is also true. Test driven development forces you to write code such that each individual unit of work is broken up into it’s own contained, commonly stateless, function or object.

It’s suggested that the biggest value of TDD comes not from the actual testing, but from how TDD forces you to organize your code into small logical units of work, that are isolatable from the rest of the code.

In other words, TDD forces you to make smaller functions that do exactly what they say by their name and nothing else. Sound familiar? Yep, that is declarative programming.

## Compilers, Optimizers and Interpreters

The whole goal of compilers, optimizers and interpreters is to make it so you the coder can be more declarative and less imperative.

Compilers make it so you don’t have to write assembly (assembly being just about as imperative as you can get!). You can instead write higher level concepts about what you want done – like loop over an array and sum up the values – instead of having to write the assembly (or machine code!) to load values into memory or registers, do work, and write them back out to memory or registers.

Similarly, the whole goal of optimizers are to take code where you describe what you want to happen, and find a way to do the equivalent functionality in a faster way. In other words, you give the WHAT and it figures out the HOW. That is declarative programming.

Interestingly, switch statements are declarative as well. You tell the compiler what items to test for at run time but leave it up to the compiler to figure out how to test for them. It turns out that switch statements can decide at compile time whether they want to use binary searching, if/else if statements, or other tricks to try and make an efficient lookup for the values you’ve provided.

Surprised to hear that? Give this a read: Something You May Not Know About the Switch Statement in C/C++

Similarly, profile guided optimization (PGO) is a way for the optimizer to know how your code actually behaves at runtime, to get a better idea at what machine code it ought to generate to be more optimal. Again, it’s taking your more declarative high level instructions, and creating imperative low level instructions that actually handle the HOW of doing what your code wants to do in a better way.

## C#

If you’ve spent any time using C#, I’ll bet you’ve come to the same conclusion I have: If it takes you more than one line of code to do a single logical unit of work (read a file into a string, sort a list, etc), then you are probably doing it wrong, and there is probably some built in functionality already there to do it for you.

When used “correctly”, C# really tends to be declarative.

## C++ Advancements Towards Being Declarative

In the old days of C, there were no constructors or destructors. This meant that you had to code carefully and initialize, deinitialize things at just the right time.

These were imperative details that if you got wrong, would cause bugs and make a bad day for you and the users of your software.

C++ improved on this problem by adding constructors and destructors. You could now put these imperative details off in another section and then not worry about it in the bulk of the code. C++ made C code more declarative by letting you focus more on the WHAT to do, and less on HOW to do it, in every line of code.

In more recent years, we’ve seen C++ get a lot of upgrades, many of which make C++ more declarative. In other words, common things are now getting language and/or STL library support.

For one, there are many operations built in which people used to do by hand that are now built in – such as std::sort or std::qsort. You no longer have to write out a sorting algorithm imperatively, you just use std::sort and move on.

Another really good example of C++ getting more declarative is lambdas. Lambdas look fancy and new, but they are really just a syntactic shortcut to doing something we could do all along. When you make a lambda, the compiler makes a class for you that overloads the parentheses operator, has storage for your captures and captures those captures. A struct that looks like this is called a functor and has existed for a long time before lambdas ever entered C++. The only difference is that if you want to use a functor now, you don’t have to go through a bunch of nitty gritty imperative details for making your functor class. Now, you just defined a lambda and move on.

## Domain Specific Languages

Domain specific languages – aka DSLs – exist to let people write code meant for specific purposes. Some examples of DSLs are:

• HTML – what we use to make static webpages
• XSLT – a language to transform XML data into other data
• SQL – a language to query information from databases
• Regex – a language to search text

Believe it or not, DSL is a synonym of declarative programming languages.

HTML for instance completely cuts out things like loops, memory allocation and image loading, and lets you just specify how a web page should look. HTML is declarative because you deal only with the issues in the problem space, not with the imperative details of how to make it all happen.

It’s similar for the others in the list, and other DSLs not on the list. They all try to remove complexity you don’t care about to try and distill a language that deals only with the things in the problem space.

## Our Job As Programmers

As programmers, it’s only one part of our job to make “good code” that is easy to read and easy to maintain, but many non programmers would laugh to hear that we spend so much time thinking about that.

The other part of our job is the end result of what our program does. This is what non programmers focus more heavily on of course, and is ultimately what makes software successful or not – at least in the short term. Software needs to do good stuff well to be successful, but if you don’t make good code, you are going to sink your business in bugs, inflexibility, maintenance costs, etc.

Programmers mainly either write code for use by other programmers (such as libraries and APIs), or they make software for use by other people.

In both cases, the job is really that we are trying to hide away imperative details (implementation complexity) and give our customers a domain specific language to do what they want to do in the easiest and best way possible. It’s very important in both cases that the users of your API or the users of your software don’t have to deal with things outside the problem space. They want to work declaratively, saying only what to do, and have our software handle the imperative details of how to do it. They paid for the software so they didn’t have to deal with those details.

As an example, when you work in an excel spreadsheet and it does an average of a row of columns, it doesn’t make you decide whether it should use SIMD instructions to do the math or scalar instructions. You don’t really care, and it almost certainly doesn’t matter enough to anyone using excel which to do, so excel just does whatever it does internally to give you what you asked for.

It can be a challenge knowing what needs to be hidden away when making an API or software for users, but that comes from understanding what it is that your customers actually need and what they are trying to do, which is already a super important step.

The good news is that you don’t have to perfectly match the customers needs to improve their lives. Any imperative details that you can remove is a win. I’m not talking about taking away abilities that people want and should have, I’m talking about removing “chores”, especially ones that if done wrong can cause problems – like nulling out a pointer after deleting it, or initializing all members of a class when creating an object, or the details of loading an image into memory.

None of this should really be that surprising to anyone, but hopefully thinking of these things in a declarative vs imperative way formalizes and generalizes some ideas.

## Why Wouldn’t You Program Declaratively?

Purely declarative programming means that you only describe the things you care about and nothing else. If this is true, why wouldn’t you ALWAYS program declaratively? In fact, why do imperative languages even exist? Why would you ever want to waste time dealing with what you by definition did not care about?

Well, for one, it’s impossible to nail down what it is exactly that people do and do not care about, especially in something like a programming language which is likely to be used for lots of different purposes by lots of different people. It’s been real eye opening seeing the diverse needs of the C++ community in recent years for instance. As a C++ game programmer, surrounded by primarily C++ game programmers, I thought I knew what the language needed, but there are lots of things I never considered because I don’t have a need for, unlike some other C++ programmers out there.

Another big point is that declarative languages by definition are a sort of black box. You tell it what to do but not how. It has to figure out the details of how to do it in a good way. The problem is that the compiler (or similar process) has limited abilities to make these choices, and also has limited information about the problem space.

For instance, a declarative language may let you work with a set and say “put item X into the set” and “does item Y exist in this set?”. You can imagine it could perhaps use a hash table, where each hash bucket was a linked list of values. This way, when you queried if the item Y was in the set, it could hash it, then do comparisons against whatever items were in that bucket.

That implementation is fairly reasonable for many programs.

What if instead, you want to keep a set of unique visitors to a popular website, like say google.com? That set is going to use a ton of memory and/or be very slow because it’s going to be HUGE.

In that case, someone is likely to go with a probabilistic algorithm perhaps (maybe a bloom filter), where it’s ok that the answer isn’t exactly right, because the memory usage and computation time drops off significantly going probabilistic, and actually makes the feature possible.

The declarative language is likely not going to be smart enough to figure out that it should use a probabilistic algorithm, and nobody ever told it that it could.

Sure, you could add probabilistic set support to the declarative language, and people could specifically ask for it when they need it (they care about it, so it should be part of the DSL), but we could make this argument about many other things. The point is just that without super smart AI and lots more information (and freedom to make decisions independently of humans), a declarative language is going to be pretty limited in how well it can do in all situations.

Because of this, it’s important for the programmer to be able to profile processing time and other resource usage, and be able to “go imperative” where needed to address any problems that come up.

This is similar to how when writing C++, when we REALLY NEED TO, we can write some inline assembly. The C++ is the more declarative language, that allows you to write more imperative assembly when you need to.

It’s important to note that I’m not saying that declarative programming is inherently slower than imperative programming though. Declarative languages can actually be much faster and more efficient with resources believe it or not. In the example at the beginning of the post where i used std::rotate to replace a loop that moved items in an array, it’s possible that std::rotate uses a memmove to move the bulk of the items, instead of an item by item copy like what I coded. That would be a much better solution, especially for large array sizes.

So, declarative programming isn’t necessarily slower than imperative programming, but, for the times it isn’t doing well enough, we need a way to turn off “auto pilot” mode and give imperative instructions for how to do something better.

In more human terms: If you asked someone to go get the mail, you likely will say “can you get my mail? here’s the key and it’s in box 62.”. You wouldn’t tell the person how to walk to the door, open it, walk out, close it, etc. However, if there were special instructions such as “please check the package locker too”, you would give those details.

Basically, you give only the details that are needed, as simply as possible, but reserve the right to give as fine grained details as needed, when they are needed.

So, i propose this:

• We as programmers ought to be programming declaratively by default, only resorting to imperative programming when we need to.
• Our job is to empower our customers to work declaratively by making them good DSLs (aka interfaces), but we should remember that it might be important to let them also go more imperative when needed.

Here are some interesting links about managing code complexity and writing high quality code:
Functions Should Be Short And Sweet, But Why?
Bitsquid: Managing Coupling
Thoughts on Declarative and Imperative Languages
Declarative vs. Imperative Programming for the Web

## Low Tech Homomorphic Encryption

Homomorphic encryption is a special type of encryption that lets you do calculations on encrypted values as if they weren’t encrypted. One reason it’s desired is that secure computing could be done in the cloud, if practical homomorphic encryption were available.

Homomorphic encryption has been a hot research topic since 2009, when Craig Gentry figured out a way to do it while working on his PhD. Since then, people have been working on making it better, faster and more efficient.

You can read more about a basic incarnation of his ideas in my blog posts:
Super Simple Symmetric Leveled Homomorphic Encryption Implementation
Improving the Security of the Super Simple Symmetric Leveled Homomorphic Encryption Implementation

This post is about a low tech type of homomorphic encryption that anyone can easily do and understand. There is also some very simple C++ that implements it.

This idea may very well be known about publically, but I’m not aware of any places that talk about it. I may just be ignorant of them though so ::shrug::

## Quick Update

I’ve gotten some feedback on this article, the most often feedback being that this is obfuscation not encryption. I think that’s a fair assessment as the secret value you are trying to protect is in no way transformed, but is just hidden. This post could easily be titled Homomorphic Obfuscation, and perhaps should be.

To see other feedback and responses to this post, check out the reddit links at the bottom!

## The Idea

The idea is actually super simple:

1. Take the value you want to encrypt.
2. Hide it in a list of a bunch of other random values, and remember where it is in the list. The position in the list is your key.
3. Send this list to an untrusted party.
4. They do the same calculation on every item in the list and send it back.
5. Since you know which value was your secret value, you know which answer is the one you care about.

At the end of that process, you have the resulting value, and they have no idea what value was your secret value. You have done, by definition, homomorphic encryption!

There is a caveat of course… they know that your secret value was ONE of the values on the list.

## Security Details

The thing here is that security is a sliding scale between resource usage (computation time, RAM, network bandwidth, etc) and security.

The list size is your security parameter in this case.

A larger list of random values means that it takes longer to transfer the data, more memory to store it, it takes longer to do the homomorphic computations, but the untrusted party is less sure about what your secret value is.

On the other hand, a shorter list is faster to transmit, easier to store, quicker to compute with, but the untrusted party has a better idea what your secret value is.

For maximal security you can just take this to the extreme – if your secret value is a 32 bit floating point number, you could make a list with all possible 2^32 floating point numbers in it, have them do the calculation and send it back. You can even do an optimization here and not even generate or send the list, but rather just have the person doing the calculations generate the full 2^32 float list, do the calculations, and send you the results.

That gets pretty big pretty fast though. That list would actually be 16 gigabytes, but the untrusted party would know almost nothing about your value, other than it can be represented by a 32 bit floating point number.

Depending on your security needs, you might be ok with shortening your list a bit to bring that number down. Making your list only be one million numbers long (999,999 random numbers and one value you actually care about), your list is only 3.8 megabytes.

## Some Interesting Abilities

Using this homomorphic encryption, like other homomorphic encryption, you can do computation involving multiple encrypted values. AKA you could multiply two encrypted values together. To do this, you are going to need to encrypt all values involved using the same key. In other words, they are going to have to be at the same index in each of their respective lists of random numbers.

Something else that is interesting is that you can also encode MULTIPLE secret values in your encrypted value list. You could have 1 secret value at index 50 and another at index 100 for instance. Doing this, you get a sort of homomorphic SIMD setup.

Homomorphic SIMD is actually a real thing in other homomorphic encryption methods as well. Check out this paper for instance:
Fully Homomorphic SIMD Operations

The only problem with homomorphic SIMD is that adding more secret values to the same encrypted list decreases the security, since there are more values in the list that you don’t want other people to know about.

You can of course also modify encrypted values by unencrypted values. You could multiply an encrypted value by 3, by multiplying every value in the list by 3.

## Extending to Public Key Cryptography

If you wanted to use asymmetric key cryptography (public/private keys) instead of symmetric key cryptography, that is doable here too.

What you would do is have the public key public as per usual, and that key would be used in a public key algorithm to encrypt the index of the secret value in the random list.

Doing this, the person who has the private key would be able to receive the list and encrypted index, decrypt the index, and then get the secret value out using that index.

## Sample Code Tests

The sample code only does Symmetric key encryption, and does these 3 tests:

1. Encrypts two floating point numbers into a single list, SIMD style, does an operation on the encrypted values, then unencrypts and verifies the results.
2. Does the same with two sets of floats (three floats in each set), to show how you can make encrypted values interact with each other. Does the operation, then unencrypts and verifies the results.
3. Encrypts three values of a 3 byte structure, does an operation on the encrypted values, then unencrypts and verifies the results.

All secret data was hidden in lists of 10,000,000 random values. That made the first two tests (the ones done with 4 byte floats) have encrypted files of 38.1MB (40,000,000 bytes), and the last test (the one done with a 3 byte struct) had a file size of 28.6 MB (30,000,000 bytes).

Here are the timing of the above tests:

## Sample Code

/*

Written by Alan Wolfe
http://blog.demofox.org

*/

#pragma once
#include <vector>
#include <random>

// A static class with template functions in it.
// A namespace would be nice, except I want to hide some things as private.
class LTHE
{
public:

//=================================================================================
template <typename T>
static bool Encrypt (std::vector<T> values, size_t listSize, const char* fileName, std::vector<size_t>& keys, bool generateKeys = true)
{
// Make sure we have a list that is at least as long as the values we want to encrypt
if (values.size() > listSize)
{
fprintf(stderr, "ERROR in " __FUNCTION__ "(): values.size() > listSize.\n");
return false;
}

// Generate a list of keys if we are told to
// Ideally you want to take the first M items of a cryptographically secure shuffle
// of N items.
// This could be done with format preserving encryption or some other method
// to make it not roll and check, and also more secure random.
if (generateKeys)
{
keys.clear();
for (size_t i = 0, c = values.size(); i < c; ++i)
{
size_t newKey;
do
{
newKey = RandomInt<size_t>(0, listSize - 1);
}
while (std::find(keys.begin(), keys.end(), newKey) != keys.end());
keys.push_back(newKey);
}
}

// make a file of random values, size of T, count of <listSize>
FILE *file = fopen(fileName, "w+b");
if (!file)
{
fprintf(stderr, "ERROR in " __FUNCTION__ "(): Could not open %s for writing.\n", fileName);
return false;
}

// Note: this may not be the most efficient way to generate this much random data or
// write it all to the file.
// In a real crypto usage case, you'd want a crypto secure random number generator.
// You'd also want to make sure the random numbers had the same properties as your
// input values to help anonymize them better.
// Like if your numbers are not whole numbers, you don't want to generate only whole numbers.
// Or if your numbers are salaries, you may not want purely random values, but more "salaryish"
// looking numbers.
// You could alternately just do all 2^N possible values which would definitely anonymize
// the values you wanted to encrypt.  This is maximum security, but also takes most
// memory and most processing time.
size_t numUint32s = (listSize * sizeof(T)) / sizeof(uint32_t);
size_t numExtraBytes = (listSize * sizeof(T)) % sizeof(uint32_t);
for (size_t i = 0; i < numUint32s; ++i)
{
uint32_t value = RandomInt<uint32_t>();
if (fwrite(&value, sizeof(value), 1, file) != 1)
{
fprintf(stderr, "ERROR in " __FUNCTION__ "(): Could not write random numbers (uint32s).\n");
fclose(file);
return false;
}
}
for (size_t i = 0; i < numExtraBytes; ++i)
{
uint8_t value = RandomInt<uint8_t>();
if (fwrite(&value, sizeof(value), 1, file) != 1)
{
fprintf(stderr, "ERROR in " __FUNCTION__ "(): Could not write random numbers (extra bytes).\n");
fclose(file);
return false;
}
}

// Now put the values in the file where they go, based on their key
for (size_t i = 0, c = values.size(); i < c; ++i)
{
long pos = (long)(keys[i] * sizeof(T));
if (fseek(file, pos, SEEK_SET) != 0)
{
fprintf(stderr, "ERROR in " __FUNCTION__ "(): Could not fseek.\n");
fclose(file);
return false;
}
if (fwrite(&values[i], sizeof(values[i]), 1, file) != 1)
{
fprintf(stderr, "ERROR in " __FUNCTION__ "(): Could not write secret value.\n");
fclose(file);
return false;
}
}

// close file and return success
fclose(file);
return true;
}

//=================================================================================
template <typename T, typename LAMBDA>
static bool TransformHomomorphically (const char* srcFileName, const char* destFileName, const LAMBDA& function)
{
// open the source and dest file if we can
FILE *srcFile = fopen(srcFileName, "rb");
if (!srcFile)
{
fprintf(stderr, "ERROR in " __FUNCTION__ "(): Could not open %s for reading.\n", srcFileName);
return false;
}
FILE *destFile = fopen(destFileName, "w+b");
if (!destFile)
{
fprintf(stderr, "ERROR in " __FUNCTION__ "(): Could not open %s for writing.\n", destFileName);
fclose(srcFile);
return false;
}

// Process the data in the file and write it back out.
// This could be done much better.
// We could read more from the file at once.
// We could use SIMD.
// We could do this on the GPU for large data sets and longer transformations! Assuming data transfer time isn't too prohibitive.
// We could decouple the disk access from processing, so it was reading and writing while it was processing.
const size_t c_bufferSize = 1024;
std::vector<T> dataBuffer;
dataBuffer.resize(c_bufferSize);
do
{
// read data from the source file

// transform the data
for (size_t i = 0; i < elementsRead; ++i)
dataBuffer[i] = function(dataBuffer[i]);

// write the transformed data to the dest file
{
fprintf(stderr, "ERROR in " __FUNCTION__ "(): Could not write transformed elements.\n");
fclose(srcFile);
fclose(destFile);
return false;
}
}
while (!feof(srcFile));

// close files and return success
fclose(srcFile);
fclose(destFile);
return true;
}

//=================================================================================
template <typename T, typename LAMBDA>
static bool TransformHomomorphically (const char* src1FileName, const char* src2FileName, const char* destFileName, const LAMBDA& function)
{
// open the source and dest file if we can
FILE *srcFile1 = fopen(src1FileName, "rb");
if (!srcFile1)
{
fprintf(stderr, "ERROR in " __FUNCTION__ "(): Could not open %s for reading.\n", src1FileName);
return false;
}
FILE *srcFile2 = fopen(src2FileName, "rb");
if (!srcFile2)
{
fprintf(stderr, "ERROR in " __FUNCTION__ "(): Could not open %s for reading.\n", src2FileName);
fclose(srcFile1);
return false;
}
FILE *destFile = fopen(destFileName, "w+b");
if (!destFile)
{
fprintf(stderr, "ERROR in " __FUNCTION__ "(): Could not open %s for writing.\n", destFileName);
fclose(srcFile1);
fclose(srcFile2);
return false;
}

// Process the data in the file and write it back out.
// This could be done much better.
// We could read more from the file at once.
// We could use SIMD.
// We could do this on the GPU for large data sets and longer transformations! Assuming data transfer time isn't too prohibitive.
// We could decouple the disk access from processing, so it was reading and writing while it was processing.
const size_t c_bufferSize = 1024;
std::vector<T> dataBuffer1, dataBuffer2;
dataBuffer1.resize(c_bufferSize);
dataBuffer2.resize(c_bufferSize);
do
{
// read data from the source files

{
fprintf(stderr, "ERROR in " __FUNCTION__ "(): Different numbers of elements in each file!\n");
fclose(srcFile1);
fclose(srcFile2);
fclose(destFile);
return false;
}

// transform the data
for (size_t i = 0; i < elementsRead1; ++i)
dataBuffer1[i] = function(dataBuffer1[i], dataBuffer2[i]);

// write the transformed data to the dest file
{
fprintf(stderr, "ERROR in " __FUNCTION__ "(): Could not write transformed elements.\n");
fclose(srcFile1);
fclose(srcFile2);
fclose(destFile);
return false;
}
}
while (!feof(srcFile1));

// close files and return success
fclose(srcFile1);
fclose(srcFile2);
fclose(destFile);
return true;
}

//=================================================================================
template <typename T>
static bool Decrypt (const char* fileName, std::vector<T>& values, std::vector<size_t>& keys)
{
// Open the file if we can
FILE *file = fopen(fileName, "rb");
if (!file)
{
fprintf(stderr, "ERROR in " __FUNCTION__ "(): Could not open %s for reading.\n", fileName);
return false;
}

// Read the values from the file.  The key is their location in the file.
values.clear();
for (size_t i = 0, c = keys.size(); i < c; ++i)
{
long pos = (long)(keys[i] * sizeof(T));
if (fseek(file, pos, SEEK_SET) != 0)
{
fprintf(stderr, "ERROR in " __FUNCTION__ "(): Could not fseek.\n");
fclose(file);
return false;
}
T value;
{
fprintf(stderr, "ERROR in " __FUNCTION__ "(): Could not decrypt value for key.\n");
fclose(file);
return false;
}
values.push_back(value);
}

// Close file and return success
fclose(file);
return true;
}

private:
template <typename T>
static T RandomInt (T min = std::numeric_limits<T>::min(), T max = std::numeric_limits<T>::max())
{
static std::random_device rd;
static std::mt19937 mt(rd());
static std::uniform_int<T> dist(min, max);
return dist(mt);
}
};


And here is the test program, main.cpp:

#include <stdio.h>
#include "LTHE.h"
#include <chrono>

//=================================================================================
// times a block of code
struct SBlockTimer
{
SBlockTimer()
{
m_start = std::chrono::high_resolution_clock::now();
}

~SBlockTimer()
{
std::chrono::duration<float> seconds = std::chrono::high_resolution_clock::now() - m_start;
printf("    %0.2f seconds\n", seconds.count());
}

std::chrono::high_resolution_clock::time_point m_start;
};

//=================================================================================
float TransformDataUnitary (float& value)
{
return (float)sqrt(value * 2.17f + 0.132);
}

//=================================================================================
float TransformDataBinary (float& value1, float value2)
{
return (float)sqrt(value1 * value1 + value2 * value2);
}

//=================================================================================
struct SStruct
{
uint8_t x, y, z;

static SStruct Transform (const SStruct& b)
{
SStruct ret;
ret.x = b.x * 2;
ret.y = b.y * 3;
ret.z = b.z * 4;
return ret;
}

bool operator != (const SStruct& b) const
{
return b.x != x || b.y != y || b.z != z;
}
};

//=================================================================================
int Test_FloatUnitaryOperation ()
{
printf("\n----- " __FUNCTION__ " -----\n");

// Encrypt the data
printf("Encrypting data:  ");
std::vector<float> secretValues = { 3.14159265359f, 435.0f };
std::vector<size_t> keys;
{
SBlockTimer timer;
if (!LTHE::Encrypt(secretValues, 10000000, "Encrypted.dat", keys))
{
fprintf(stderr, "Could not encrypt data.\n");
return -1;
}
}

// Transform the data
printf("Transforming data:");
{
SBlockTimer timer;
if (!LTHE::TransformHomomorphically<float>("Encrypted.dat", "Transformed.dat", TransformDataUnitary))
{
fprintf(stderr, "Could not transform encrypt data.\n");
return -2;
}
}

// Decrypt the data
printf("Decrypting data:  ");
std::vector<float> decryptedValues;
{
SBlockTimer timer;
if (!LTHE::Decrypt("Transformed.dat", decryptedValues, keys))
{
fprintf(stderr, "Could not decrypt data.\n");
return -3;
}
}

// Verify the data
printf("Verifying data:   ");
{
SBlockTimer timer;
for (size_t i = 0, c = secretValues.size(); i < c; ++i)
{
if (TransformDataUnitary(secretValues[i]) != decryptedValues[i])
{
fprintf(stderr, "decrypted value mismatch!\n");
return -4;
}
}
}

return 0;
}

//=================================================================================
int Test_FloatBinaryOperation ()
{
printf("\n----- " __FUNCTION__ " -----\n");

// Encrypt the data
printf("Encrypting data:  ");
std::vector<float> secretValues1 = { 3.14159265359f, 435.0f, 1.0f };
std::vector<float> secretValues2 = { 1.0f, 5.0f, 9.0f };
std::vector<size_t> keys;
{
SBlockTimer timer;
if (!LTHE::Encrypt(secretValues1, 10000000, "Encrypted1.dat", keys))
{
fprintf(stderr, "Could not encrypt data.\n");
return -1;
}
if (!LTHE::Encrypt(secretValues2, 10000000, "Encrypted2.dat", keys, false)) // reuse the keys made for secretValues1
{
fprintf(stderr, "Could not encrypt data.\n");
return -1;
}
}

// Transform the data
printf("Transforming data:");
{
SBlockTimer timer;
if (!LTHE::TransformHomomorphically<float>("Encrypted1.dat", "Encrypted2.dat", "Transformed.dat", TransformDataBinary))
{
fprintf(stderr, "Could not transform encrypt data.\n");
return -2;
}
}

// Decrypt the data
printf("Decrypting data:  ");
std::vector<float> decryptedValues;
{
SBlockTimer timer;
if (!LTHE::Decrypt("Transformed.dat", decryptedValues, keys))
{
fprintf(stderr, "Could not decrypt data.\n");
return -3;
}
}

// Verify the data
printf("Verifying data:   ");
{
SBlockTimer timer;
for (size_t i = 0, c = secretValues1.size(); i < c; ++i)
{
if (TransformDataBinary(secretValues1[i], secretValues2[i]) != decryptedValues[i])
{
fprintf(stderr, "decrypted value mismatch!\n");
return -4;
}
}
}

return 0;
}

//=================================================================================
int Test_StructUnitaryOperation ()
{
printf("\n----- " __FUNCTION__ " -----\n");

// Encrypt the data
printf("Encrypting data:  ");
std::vector<SStruct> secretValues = { {0,1,2},{ 3,4,5 },{ 6,7,8 } };
std::vector<size_t> keys;
{
SBlockTimer timer;
if (!LTHE::Encrypt(secretValues, 10000000, "Encrypted.dat", keys))
{
fprintf(stderr, "Could not encrypt data.\n");
return -1;
}
}

// Transform the data
printf("Transforming data:");
{
SBlockTimer timer;
if (!LTHE::TransformHomomorphically<SStruct>("Encrypted.dat", "Transformed.dat", SStruct::Transform))
{
fprintf(stderr, "Could not transform encrypt data.\n");
return -2;
}
}

// Decrypt the data
printf("Decrypting data:  ");
std::vector<SStruct> decryptedValues;
{
SBlockTimer timer;
if (!LTHE::Decrypt("Transformed.dat", decryptedValues, keys))
{
fprintf(stderr, "Could not decrypt data.\n");
return -3;
}
}

// Verify the data
printf("Verifying data:   ");
{
SBlockTimer timer;
for (size_t i = 0, c = secretValues.size(); i < c; ++i)
{
if (SStruct::Transform(secretValues[i]) != decryptedValues[i])
{
fprintf(stderr, "decrypted value mismatch!\n");
return -4;
}
}
}

return 0;
}

//=================================================================================
int main (int argc, char **argv)
{
// test doing an operation on a single encrypted float
int ret = Test_FloatUnitaryOperation();
if (ret != 0)
{
system("pause");
return ret;
}

// test doing an operation on two encrypted floats
ret = Test_FloatBinaryOperation();
if (ret != 0)
{
system("pause");
return ret;
}

// test doing an operation on a single 3 byte struct
ret = Test_StructUnitaryOperation();
if (ret != 0)
{
system("pause");
return ret;
}

printf("\nAll Tests Passed!\n\n");
system("pause");
return 0;
}


If you found this post interesting or useful, or you have anything to add or talk about, let me know!

Reddit discussion:
r/programming
r/cryptography

## A Data Point for MSVC vs Clang Code Generation

I’m a windows PC game developer using MSVC 2015 update 1, working in C++.

More and more, I hear people talk about how great clang is, and that it generates much better code than MSVC, among other niceties.

I have been chalking it up to maybe just being a fad, but keeping my feelers out there to see if I can get some concrete comparitive info.

Well, I’ve stumbled on some of that concrete comparitive info and want to share it with anyone else who is wondering if clang really is better than MSVC. This is just one data point, but it feels like it’s just the tip of the iceberg.

I wasn’t really sure what he was getting at, so clicked the link to check it out (http://godbolt.org/g/WDpPYq).

It turned out to be very relevant to my interest, because his particular example is comparing a value to a bunch of compile time constants. That is basically the core of what I’ve been looking into with my last few posts asking whether code or data was faster!

This first particular example is comparing a compile time constant to other compile time constants, so the code completely melts away and at runtime just returns the compile time calculated result. That isn’t very interesting, but it is nice to see that clang did so much at compile time. FWIW MSVC was able to do this all at compile time as well, so they are even so far.

What is more interesting is what happens when you test a compile time unknown against compile time constants. Let’s check it out… (https://godbolt.org/g/dKBDSK)

What the assembly does is subtract 1 from the input (it’s unsigned so if 0, wraps around to max int value), and then compares it against 5 to know if it’s in the group or not. Clang realized the numbers were continuous and so made a nice optimization.

In this case, MSVC did similar in release x64:

#include <initializer_list>

template<typename U, typename ... T>
bool one_of(U&& u, T && ... t)
{
bool match = false;
(void)std::initializer_list<bool>{ (match = match || u == t)... };
return match;
}

int main(int argc, char** argv)
{
00007FF62DB61000  lea         eax,[rcx-1]
00007FF62DB61003  cmp         eax,4
00007FF62DB61006  setbe       al
return one_of(argc, 1, 2, 3, 4, 5);
00007FF62DB61009  movzx       eax,al
}
00007FF62DB6100C  ret


But in x86 release it did a bunch of if/else if/else if’s!

#include <initializer_list>

template<typename U, typename ... T>
bool one_of(U&& u, T && ... t)
{
bool match = false;
(void)std::initializer_list<bool>{ (match = match || u == t)... };
return match;
}

int main(int argc, char** argv)
{
00331002  in          al,dx
return one_of(argc, 1, 2, 3, 4, 5);
00331003  mov         eax,dword ptr [argc]
00331006  cmp         eax,1
00331009  je          main+26h (0331026h)
0033100B  cmp         eax,2
0033100E  je          main+26h (0331026h)
00331010  cmp         eax,3
00331013  je          main+26h (0331026h)
00331015  cmp         eax,4
00331018  je          main+26h (0331026h)
0033101A  cmp         eax,5
0033101D  je          main+26h (0331026h)
0033101F  xor         al,al
00331021  movzx       eax,al
}
00331024  pop         ebp
00331025  ret
return one_of(argc, 1, 2, 3, 4, 5);
00331026  mov         al,1
00331028  movzx       eax,al
}
0033102B  pop         ebp
0033102C  ret


You are probably asking “what does clang do in x86?” well it turns out it does the same thing as in x64, it doesn’t fall back to if/else if/else if like MVSC does (proof: add -m32 in goldbolt. https://godbolt.org/g/khnrtO). One point to clang!

What if the numbers are not so continuous though? It turns out it can actually switch to using a binary search! (https://godbolt.org/g/iBkqja)

MSVC on the other hand just does a bunch of if/else if/else if tests, in both x86 release and x64 release.

#include <initializer_list>

template<typename U, typename ... T>
bool one_of(U&& u, T && ... t)
{
bool match = false;
(void)std::initializer_list<bool>{ (match = match || u == t)... };
return match;
}

int main(const int argc, const char *[])
{
00007FF6C05A1000  cmp         ecx,1AB42h
00007FF6C05A1006  je          main+63h (07FF6C05A1063h)
00007FF6C05A1008  cmp         ecx,40Fh
00007FF6C05A100E  je          main+63h (07FF6C05A1063h)
00007FF6C05A1010  cmp         ecx,0B131h
00007FF6C05A1016  je          main+63h (07FF6C05A1063h)
00007FF6C05A1018  cmp         ecx,93BBh
00007FF6C05A101E  je          main+63h (07FF6C05A1063h)
00007FF6C05A1020  cmp         ecx,121Bh
00007FF6C05A1026  je          main+63h (07FF6C05A1063h)
00007FF6C05A1028  cmp         ecx,0EE9h
00007FF6C05A102E  je          main+63h (07FF6C05A1063h)
00007FF6C05A1030  cmp         ecx,0E1Fh
00007FF6C05A1036  je          main+63h (07FF6C05A1063h)
00007FF6C05A1038  cmp         ecx,995h
00007FF6C05A103E  je          main+63h (07FF6C05A1063h)
00007FF6C05A1040  cmp         ecx,5FEh
00007FF6C05A1046  je          main+63h (07FF6C05A1063h)
00007FF6C05A1048  cmp         ecx,5BFh
00007FF6C05A104E  je          main+63h (07FF6C05A1063h)
00007FF6C05A1050  cmp         ecx,5
00007FF6C05A1053  je          main+63h (07FF6C05A1063h)
00007FF6C05A1055  cmp         ecx,0FFFEh
00007FF6C05A105B  je          main+63h (07FF6C05A1063h)
return one_of(argc, 1471, 2453, 3817, 45361, 37819, 109378, 1534, 4635, 1039, 3615, 5, 65534);
00007FF6C05A105D  xor         al,al
00007FF6C05A105F  movzx       eax,al
}
00007FF6C05A1062  ret
return one_of(argc, 1471, 2453, 3817, 45361, 37819, 109378, 1534, 4635, 1039, 3615, 5, 65534);
00007FF6C05A1063  mov         al,1
00007FF6C05A1065  movzx       eax,al
}
00007FF6C05A1068  ret


## Closing

This is just one data point about how clang is better than MSVC, but it is a data point. I’m betting there are more if we looked for them.

This makes me wonder how switch statements do in clang vs msvc, and also makes me wonder if clang ever uses jump tables or more advanced data structures in either switch statements, or other code that does comparison against a potentially large number of compile time constants. Those thoughts are driven by this things seen in this article: Something You May Not Know About the Switch Statement in C/C++

The examples above used C++14 level C++ to implement “one_of”. If you can use C++17 level C++, you can also implement it this way, which also does the binary search (Also written by Jason Turner):
https://godbolt.org/g/RZgjRQ

PS wouldn’t it be nice if godbolt supported MSVC so we could do this sort of analysis on MSVC code? It’s in the works, but unsure when it’ll be available. Apparently licensing isn’t the issue, so lets hope it comes sooner rather than later! If you want it, maybe ping @mattgodbolt and let him know how much you want that functionality (:

Have any other clang vs MSVC info? If so, I’d love to hear about it!