Taking a Stroll Between The Pixels

This post relates to a paper I wrote which talks about (ab)using linear texture interpolation to calculate points on Bezier curves. Extensions generalize it to Bezier surfaces and (multivariate) polynomials. All that can be found here: https://blog.demofox.org/2016/02/22/gpu-texture-sampler-bezier-curve-evaluation/

The original observation was that if you sample along the diagonal of a 2×2 texture, that as output you get points on a quadratic Bezier curve with the control points of the curve being the values of the pixels like in the image below. When I say you get a quadratic Bezier curve, I mean it literally, and exactly. One way of looking at what’s going on is that the texture interpolation is literally performing the De Casteljau algorithm. (Note: if the “B” values are not equal in the setup below, the 2nd control point will be the average of these two values, which an extension abuses to fit more curves into a smaller number of pixels.)

An item that’s been on my todo list for a while is to look and see what happens when you sample off of the 45 degree diagonal between the pixel values. I was curious about questions like:

• What if we sampled across a different line?
• What if we samples across a quadratic curve like by having $y=x^2$?
• What if we sampled on a circle or a sine wave?
• How does the changed sampling patterns work in higher dimensions – like trilinear or quadrilinear interpolation?

After accidentally coming across the answer to the first question, it was time to look into the other ones too!

PS – if wondering “what use can any of this possibly have?” the best answer I have there is data compression for data on the GPU. If you can fit your data with piecewise rational polynomials, the ideas of this technique could be useful for storing that data in a concise way (pixels in a texture) that are also quickly and easily decoded by the GPU. The ideas from this post allows for more curve types when fitting and storing your data, beyond piecewise rational polynomials. It’s also possible to store higher order curves and surfaces into smaller amounts of texture data.

Quick Setup: Bilinear Interpolation Formula

Bilinear interpolation is available on modern GPUs as a way of getting sub-pixel detail. In the olden days, when zooming into a texture, the square pixels just got larger because nearest neighbor filtering was used. In modern times, when looking at the space between pixel values, bilinear interpolation is used to fill in the details better than nearest neighbor does.

You can describe bilinear interpolation as interpolating two values across the x axis and interpolating between the results across the y axis (reversing the order of axes also works). Mathematically, that can look like this:

$z = (A(1-x) + Bx)(1-y) + (C(1-x)+Dx)y$

Where x and y are values between 0 and 1 describing where the point is between the pixels, and A,B,C,D are the values of the 4 nearest pixels, which form a box around the point we are calculating. A = (0,0), B = (1,0), C = (0,1) and D = (1,1).

With some algebra, you can get that equation into a power series form which is going to be easier to work with in our experiments:

$z = (A-B-C+D)xy + (B-A)x + (C-A)y + A$

Now that we have our formula, we can begin! 🙂

Sampling Along Other Lines

So, if we sample along the diagonal from A to D, we know that we get a quadratic equation out. What happens if we sample along other lines though?

My guess before I knew the answer to this was that since the 45 degree angle line is quadratic (degree 2), and that horizontal and vertical lines were linear (degree 1), that sampling along other lines must be a fractional degree polynomial between 1 and 2. It turns out that isn’t the answer, but I wonder if there’s a way to interpret the “real answer” as a fractional polynomial?

Anyways, wikipedia clued me in: https://en.wikipedia.org/wiki/Bilinear_interpolation#Nonlinear

The interpolant is linear along lines parallel to either the x or the y direction, equivalently if x or y is set constant. Along any other straight line, the interpolant is quadratic

What that means is that if you walk along a horizontal or vertical line, it’s going to be linear. Any other line will be quadratic.

Let’s try it out.

Remembering that the equation for a linear function is $y=mx+b$ let’s literally replace $y$ with $mx+b$ and see what we get out.

So, we start with the power series bilinear interpolation polynomial:

$z = (A-B-C+D)xy + (B-A)x + (C-A)y + A$

Which becomes this after substitution:

$z = (A-B-C+D)x(mx+b) + (B-A)x + (C-A)(mx+b) + A$

After some expansion and simplification we get this:

$z = (Am-Bm-Cm+Dm)x^2+(Ab-Bb-Cb+Db+Cm-Am+B-A)x+Cb-Ab+A$

This formula tells us the value we get if we have a bilinear interpolation of values A,B,C,D (aka a bilinear surface defined by those points), and we sample along the x,y line defined by $y=mx+b$.

It’s a very generalized function that’s hard to reason about much, but one thing is clear: it is a quadratic function! Whatever constant values you choose for A,B,C,D,m and b, you will get a quadratic polynomial (or lower degree, but never higher).

Here’s a shadertoy that shows curves generated by random sub pixel line segments on a random (white noise) RGB texture: https://www.shadertoy.com/view/XstBz7

(note that the rough edges of the curve are due to the fact that interpolation happens in X.8 fixed point format, so has pretty limited precision. Check the paper for more information and ways to address the issue.)

Let’s explore a bit by plugging in some values for $m$ and $b$ and see what happens for different types of lines.

m=0, b=0

Let’s see what happens when m is 0 and b is 0. In other words, lets see what happens when we sample along the line $y=0$.

Plugging those values in gives:

$z = (B-A)x + A$

interestingly, this is just a linear interpolation between A and B, which makes sense when looking at the graph of where we are sampling on the bilinear surface.

This goes along with what wikipedia told us: when one of the axes is constant (it’s a horizontal or vertical line) the result is linear.

m=1, b=0

Let’s try m = 1 and b = 0. That is the line: $y=x$. This graph shows where that is sampling from on the bilinear surface:

Plugging in the values gives us this equation:

$z = (A-B-C+D)x^2+(C+B-2A)x+A$

We get a quadratic out! This shouldn’t be too surprising. This is the original insight in the technique. This is also the formula for a quadratic Bezier curve with control points $A$, $(B+C)/2$, $D$.

m=2, b=1

Let’s try the line $y=2x+1$. Here’s the graph of where we are sampling on the bilinear surface:

Plugging in the values give us the equation:

$z = (2A-2B-2C+2D)x^2+(C+D-2A)x+C$

Once again we got a quadratic function when sampling along a line.

You might think it’s strange that the equation ends it “+C” instead of “+A”, but if you look at the graph it makes sense. The line literally starts at C when x is zero.

x=2u, y=3u

In the above examples we are only modifying the y variable, to be some function of x. What if we also want to modify the x variable?

One way to do this is to make a 3rd variable $u$ that goes from 0 to 1. Then we can make $x$ and $y$ be based on that variable.

Let’s see what happens when we use these two equations:

$y=2u$

$x=3u$

That makes us sample this line on the bilinear surface.

Plugging the functions of u in for x and y we get:

$z = (6A-6B-6C+6D)u^2+(2B+3C-5A)u+A$

It’s still a quadratic!

So we now know that when moving along a straight line on a bilinear surface, that you will get a quadratic function as output, except in the case of the line being horizontal or vertical. Note: if the bilinear surface is a plane, all lines on that surface will be linear functions, so this is another way to get a linear result. It could also be degenerate and give you a point result. You will never get a cubic result (or higher) when going along a straight line though.

What would happen though if instead of sampling along straight lines, we sampled on other shapes, like quadratic curves?

y=x*x

Let’s start with the function $y=x^2$. The path that is sampled is:

Going back to the power series form of bilinear interpolation, let’s plug $x^2$ in for y and see what we get out.

The starting equation:

$z = (A-B-C+D)xy + (B-A)x + (C-A)y + A$

becomes:

$z = (A-B-C+D)x(x^2) + (B-A)x + (C-A)(x^2) + A$

Which becomes:

$z = (A-B-C+D)x^3 + (C-A)x^2 + (B-A)x + A$

It’s a cubic equation!

Here is a shadertoy which follows this sampling path on random pixels: https://www.shadertoy.com/view/4sdBz7

Something neat about this example specifically is that a cubic equation has 4 coefficients, which are basically 4 control points. This example makes use of the values of the 4 pixels involved to come up with the 4 coefficients, so “doesn’t leave anything on the table” so to speak.

This is unlike sampling along line segments where you have 3 control points stored in 4 pixel values. One is a bit redundant in that case.

You can make use of that fact (I have for instance!), but sampling along a quadratic path to get a cubic curve feels like a natural fit.

x=u*u, y=u*u

Let’s see what happens when we move along both x and y quadratically.

Just like in the linear case, we have our 3rd variable u that goes from 0 to 1 and we have x and y be based on that variable. We will use these equations:

$x=u^2$

$y=u^2$

The sampling path looks like this:

When we plug those in we get this quartic function:

$z = (A-B-C+D)u^4 + (B+C-2A)u^2 + A$

You might be surprised to see what looks like a linear path. It’s just because at all times, x is the same value as y, even though they travel down the line non linearly.

Higher Order Curves: x=3u^2, y=2u^4

Let’s get a little more wild, using these equations:

$x=3u^2$

$y=u^4$

Which makes a sampling path of this:

Plugging in the equations, the bilinear interpolation equation:

$z = (A-B-C+D)xy + (B-A)x + (C-A)y + A$

becomes a hexic equation:

$z = (3A-3B-3C+3D)u^6 + (C-A)u^4 + (3B-3A)u^2 + A$

The shadertoy visualizes it on random pixels as per usual, but with u going from 0 to 1, it means that x goes from 0 to 3 (y is still 0-1), which makes some obvious discontinuities at the boundaries of pixels. In our pure math formulation, we wouldn’t have any of those, but since we are sampling a real texture, when we leave the safety of our (0,1) box, we enter a new box with different control points. https://www.shadertoy.com/view/4dtfz7

Trigonometric Function: y = sin(2*pi*x)

Let’s try $y=sin(2\pi x)$, which takes this path on the bilinear surface:

The bilinear interpolation equation becomes a trigonometric polynomial:

$z = (A-B-C+D)x*sin(2\pi x) + (B-A)x + (C-A)*sin(2\pi x) + A$

That has disconuities in it when texture sampling again, due to leaving the original pixel region, so here’s a better looking shadertoy, which is for $y=sin(2\pi x)*0.5+0.5$. It scales and shifts the y values to be between 0 and 1. https://www.shadertoy.com/view/4stfz7

Circle

Lastly, here’s sampling on a circle.

$x=sin(2 \pi u)*0.5+0.5$

$y=cos(2 \pi u)*0.5+0.5$

It follows this path:

Plugging the functions into the power series bilinear equation gives:

$z = (A-B-C+D)*(sin(2 \pi u)*0.5+0.5)*(cos(2 \pi u)*0.5+0.5) + (B-A)*(sin(2 \pi u)*0.5+0.5) + (C-A)*(cos(2 \pi u)*0.5+0.5) + A$

Something neat about sampling in a circle is that it’s continuous – note how the left side of the curves line up with the right side seamlessly. That seems like a pretty useful property.

Moving On

We went off into the weeds a bit, but hopefully you can see how there are a ton of possibilities for encoding and decoding data in a very small number of pixels by carefully crafting the path you sample along.

Compared to the simple “sample along the diagonal” technique, there is some added complexity and shader instructions though. Namely, any work you do to modify x or y before passing them to the linear texture interpolator needs to happen in shader code. That means this technique takes more ALU, but can mean it takes even less texture memory than the other method.

The last question from the top of the post is “What does this all mean in higher dimensional interpolation, like trilinear or quadrilinear?”

Well, it works pretty much the same was as bilinear but there are more dimensions to work with.

We saw that in 2 dimensional bilinear interpolation that when we made x and y be functions (either of each other, or of a 3rd variable u), that the resulting polynomial had a degree that was the degree of x plus the degree of y.

In 3 dimensions with trilinear interpolation, the resulting polynomial would have a degree that is the degree of x, plus the degree of y, plus the degree of z.

In 4 dimensions with quadrilinear, add to that the degree of w.

Let’s consider the case when we don’t want a single curve though, but want a surface or (hyper) volume.

As we’ve seen in the extension dealing with surfaces and volumes, if you have a degree N polynomial, you can break it apart into a multivariate polynomial (aka a surface or hyper volume) so long as the sum of the degrees of each axis adds up to N.

It’s basically what we were just talking about but in reverse.

One thing I think would be interesting to explore further would be to see what the limitations are when you take this “too far”.

For instance, a 2×2 texture can give you a quadratic if you sample along any straight line in the uv coordinates. If you first put the u coordinate through a cubic function, and put the v coordinate through a different cubic function, I think you should be able to make a bicubic surface.

The surface will be constrained to a subset of what a general bicubic surface is able to be shaped like, but you will get a bicubic surface. (basically there will be implicit control points that you don’t have control over unless you add more pixels, and do more sampling, or higher dimensional linear interpolation)

I’d like to see what the constraints there are and see if there’s any chance of getting any real use out of something like that.

Anyhow, thanks for reading! Any ideas, corrections, usage cases you have, whatever, hit me up!

@Atrix256

Prefix Sums and Summed Area Tables

Prefix sums and summed area tables let you sum up regions of arrays or grids in constant time.

If that sounds like it might not have many uses, that is another way of saying that it does discrete integration in constant time, and can also be made to do some kinds of convolution.

These things come up quite a bit in game development and graphics so is pretty interesting for things like depth of field, glossy reflections, and maybe image based lighting. Check the links at the end of the post to see these things in action in some pretty interesting ways.

One Dimension – Prefix Sums

Say that you have 10 numbers:

$\begin{array}{|l|c|c|c|c|c|c|c|c|c|c|} \hline \textbf{index} & \textbf{0} & \textbf{1} & \textbf{2} & \textbf{3} & \textbf{4} & \textbf{5} & \textbf{6} & \textbf{7} & \textbf{8} & \textbf{9} \\ \hline \textbf{value} & 8 & 3 & 7 & 4 & 12 & 6 & 4 & 10 & 1 & 2 \\ \hline \end{array}$

To sum up numbers in a given range you have to manually add up the numbers in that range.

Summing the numbers at index 2 through 5 inclusively takes 3 adds and gives you the answer 29. (index 2 + index 3 + index 4 + index 5)

Summing the numbers at index 0 through index 9 inclusively (the whole table) takes 9 adds to get the answer 57.

Interestingly there is a way to preprocess this data such that summing any range takes only a single subtraction. The technique is called a prefix sum table and you make the table by having the number at each index be the sum from index 0 to that index inclusively.

Here is the prefix sum table for the numbers above:

$\begin{array}{|l|c|c|c|c|c|c|c|c|c|c|} \hline \textbf{index} & \textbf{0} & \textbf{1} & \textbf{2} & \textbf{3} & \textbf{4} & \textbf{5} & \textbf{6} & \textbf{7} & \textbf{8} & \textbf{9} \\ \hline \textbf{value} & 8 & 11 & 18 & 22 & 34 & 40 & 44 & 54 & 55 & 57 \\ \hline \end{array}$

Now, to find the sum of range a to b inclusively, you start with the value at index b, and subtract the value at index (a-1).

So, to sum the numbers at index 2 through 5 like we did before, we’d start with the value at index 5 which is 40, and we subtract the value at index (2-1) aka index 1, which is 11. That gives us a result of 29 like our manual summing did before.

To sum the numbers at index 0 through index 9, we’d start with the value at index 9, which is 57, and subtract the value at index -1. Since we don’t have anything before index 0, the sum for anything before index 0 is 0. That makes our result be 57-0 or 57, which we calculated before.

Let’s move on to 2D!

Two Dimensions – Making a Summed Area Table

In two dimensions, the same technique is called a summed area table, and things get only a little more complicated.

You start with a 2d grid of values like this:

$i = \begin{array}{|c|c|c|c|} \hline 3 & 2 & 1 & 8 \\ \hline 9 & 11 & 15 & 0 \\ \hline 8 & 4 & 7 & 6 \\ \hline 12 & 7 & 8 & 3 \\ \hline \end{array}$

Then you make a grid of the same size, where the value at a location is the sum of all the values in the rectangle going from (0,0) to (x,y) inclusive. Assuming that (0,0) is in the top left, that would give us this summed area table:

$I = \begin{array}{|c|c|c|c|} \hline 3 & 5 & 6 & 14 \\ \hline 12 & 25 & 41 & 49 \\ \hline 20 & 37 & 60 & 74 \\ \hline 32 & 56 & 87 & 104 \\ \hline \end{array}$

You can literally sum up all the values for each index to make the table if you want to, but you can also use this formula which lets you iteratively create the table by starting at (0,0) and expand outwards from there. As before, when reading out of bounds values, just use zero.

$I(x,y)=i(x,y)+I(x,y-1)+I(x-1,y)-I(x-1,y-1)$

Two Dimensions – Using a Summed Area Table

So we know that $I(x,y)$ is the sum of all the values in the rectangle from $(0,0)$ to $(x,y)$ inclusively, but what if we want to find the sum of a different rectangle? What if we have 4 points A,B,C,D and we want to know the sum of the numbers within that sub-rectangle?

With some cleverness we can calculate the sum inside this exact region.

First we get the value at point D, which gives us the sum of this rectangle:

Next, we subtract the value at point B, which gives us the sum of this rectangle:

The next step is to subtract the value at point C. The red area is a problem though as it has been subtracted out twice.

This is a problem that’s easily solved by adding the value at point A in, to give us our final result:

So, to summarize, using a summed area table to get the sum of all values in the rectangle defined by the points A,B,C,D is done by reading the values at points A,B,C,D and calculating: A+D-B-C

Storage Costs

When you want to store numbers added together, you are going to need storage larger than what you are storing the numbers in.

For instance, if you have the table below using 3 bits per value:
$I = \begin{array}{|c|c|} \hline 7 & 7 \\ \hline 7 & 7 \\ \hline \end{array}$

Turning that into a summed area table, you are going to hit overflow problems:
$I = \begin{array}{|c|c|} \hline 7 & 6 \\ \hline 6 & 4 \\ \hline \end{array}$

For summing up N items, you need $log_2{(N)}$ more bits of storage which means we would need 2 more bits of storage in this case for the 2×2 grid (4 samples), making it be 5 bits total per value (3 bits of storage + 2 extra bits to hold the sum of 4 values). That would let us store the proper table:

$I = \begin{array}{|c|c|} \hline 7 & 14 \\ \hline 14 & 28 \\ \hline \end{array}$

On twitter, Sam Littlewood (https://twitter.com/samlittlewood) shared some interesting info with me:

Using the previously shown 2×2 table of 3bit 7’s as an example, what this means is that if you are only ever going to want to ask about 1×1 ranges (which is pointless to use summed area tables for, but makes a nice simple example), you don’t need 2 extra bits, and in fact don’t need any extra bits in this case since a 1×1 range is just 1 sample, and $log_2{(1)}$ is 0.

Looking back at the summed area table that had roll over problems:
$I = \begin{array}{|c|c|} \hline 7 & 6 \\ \hline 6 & 4 \\ \hline \end{array}$

Let’s ask about the range (1,1) to (1,1). So we start with the value at index (1,1) which is 4. Next we add in the value at index (0,0) which is 7 and get 11. Keeping that in 3 bits (eg mod 8), that gives us a value of 3. Next we subtract the value at index (0,1) aka 6, which keeping it in 3 bits gives us 5. Subtracting index (1,0) from that (6 again) and keeping it in 3 bits gives us 7.

So, the sum of the numbers from (1,1) to (1,1) – aka the VALUE in the original table at (1,1) – is 7. Since we made the table, we know this is correct.

It works interestingly!

If we did a 2×2 lookup instead, it would fall apart. we’d need those 2 extra bits since we’d be summing 4 samples, and $log_2{(4)}$ is 2.

So, just to re-iterate… summed area tables do need increased storage per data item to store the sums. However, while most descriptions base that increased storage on the size of the image being made into a summed area table, it is actually based on the largest range you want to sum from that table, which may be smaller than the total size.

I have an idea I’d like to try (next blog post?) where instead of storing the sum of the rectangle at each position, you store the sum divided by the area. In other words, you store the average value for the rectangle.

Calculating the sum for a specific rectangle then becomes getting the 4 values, multiplying by their area, and then doing the usual math.

Apparently this is similar to an idea of using floating point numbers in SAT, which also sounds interesting! Thread from Bart Wronski (https://twitter.com/BartWronsk):

While my idea is similar to using floating point, a handful of people (especially Tom Forsyth! https://twitter.com/tom_forsyth) have made sure I know that using floating point with large textures (~screen sized and above) is not a good idea.

Tom says:
“The entries in the bottom-right of the table start having very similar magnitudes, so the difference between them is very noisy. This is super obvious with float16s where you only have 10 bits of precision, which is less than most current screen widths.”

Other Stuff

Bilinear Interpolation
If you are wondering whether you should use bilinear interpolation when using this technique (sample between pixels) or not, the answer is that you should. Bilinear interpolation is compatible with this technique and gives you the correct values for sub pixel sample points.

Higher Dimensions
This technique extends to 3 dimensions and beyond. The table still contains the sum of the numbers for the (hyper)rectangle from the origin to that specific index. The way you calculate the sum of a specific range is different in each dimension, but it’s similar, and you should be able to figure it out using the logic described in the 2d case!

Integrating / Summing Over Other Shapes

I had a thought on this that might not be so bad.

My thought was that if you had some shape you wanted to sum values over (aka integrate values over), that you could sum over the bounding box of the shape, divide by the area of the bounding box to get an average sum per unit for that area, and then multiply by the area of the shape you want to sum over.

This makes the assumption that the bounding box is representative of the data inside of the shape, so that makes this an approximation, but it might be good enough depending on your needs.

You might even try having a couple different summed area tables made from rotated versions of the image. That would allow you to get a tighter fitting bounding box in some situations.

I’m definitely not the first to think about how to do this though, and this is not the only way to do it. There is a link in the next section that talks about a different way to do it “Fast and Exact Convolution With Polygonal Filters” that also references a few other ways to do it.

Uses in Graphics / Other Links

Here is the paper from Franklin Crow in 1984 that introduces summed area tables as a way to get box filtered mipmapping on the fly without having to generate mipmaps in advance:

Here is a neat paper that talks about how to generate summed area tables efficiently on the GPU, and some interesting ways to use them for things like depth of field, glossy reflections, and refraction through frosted glass:

Here are some great reads from Fabien Giesen (https://twitter.com/rygorous) on doing fast blurs when the radius is very large. The second post also shows you how to do repeated box blurs to get tent filters, quadratic filters, cubic, etc and how they tend towards Gaussian. I’m sure there is some way to mix this concept with summed area tables to get higher order filters, but I haven’t found or worked out the details yet.
https://fgiesen.wordpress.com/2012/07/30/fast-blurs-1/
https://fgiesen.wordpress.com/2012/08/01/fast-blurs-2/

Here are some blog posts I made up explaining and demonstrating box blurs and Gaussian blurs:
https://blog.demofox.org/2015/08/18/box-blur/
https://blog.demofox.org/2015/08/19/gaussian-blur/

Bart also shared these really interesting links

“Fast and Exact Convolution With Polygonal Filters”
https://www.researchgate.net/publication/269699690_Fast_and_Exact_Convolution_with_Polygonal_Filters

“Fast Filter Spreading and its Applications”

“Filtering by repeated integration”
https://www.researchgate.net/publication/220721661_Filtering_by_repeated_integration

“Cinematic Depth Of Field: How to make big filters cheap”