Posts TODO

A place to keep track of blog posts I’d like to do:

Chebyshev curve fitting / interpolation – with simple working c++ code. Possibly rational chebyshev too. Chebyshev apparently is optimal for polynomial interpolation.
https://www.embeddedrelated.com/showarticle/152.php

Optimistic concurrency (in databases). Select data and version id per row. Update versionid = versionid+1, blah where blah and version id = version id. If rows affected is 0, that means something else beat you to the punch and you can deal with it however.

Cordic math. Every iteration in a loop gives you another bit of precision, since it’s basically a binary search.

2D SDFs for vector graphics – using modulus for “free repeating”. anti aliasing. use your shadertoy verlet physics game as an example?

Verlet Physics Keep last and current position to get implicit velocity. Simulate. Iterative constraint solving. Things “just work” pretty well.

Minkowski Portal Refinement A nice & simple algorithm for collision detection. Maybe talk about algorithm to get depth. Mention JGK, possibly do that after.

Deterministic Simulations using deterministic sim to eg decrease network traffic.

Quick Math: phi / goden ratio show how golden ratio conjugate is the same number past the decimal point. show how this is the only number that can do that. The main point being “i remember this fact, but don’t remember the number”. This fact lets you calculate the number.

Quick math: eulers constant show how e^x being the derivative (and integral) can only work for e. The main point being “i remember this fact, but don’t remember the number”. This fact lets you calculate the number.

Ear clipping – for turning polygons into triangles. extendable to 3d with tetrahedron clipping, and to higher dimensions as well.

Storageless Shuffle With Weights – this is like if you have 3 red marbles and 5 blue marbles, how would you use FPE to storagelessly shuffle them.

Recurrent neural networks (etc) for “time series” learninghttps://twitter.com/Peter_shirley/status/1066832031043149824?s=03

Markov Chain Monte Carlo – Eg. for decryption maybe try 2nd order or higher chains.
Maybe also try with rendering / numerical integration http://statweb.stanford.edu/~cgates/PERSI/papers/MCMCRev.pdf

Blue Noise AO – It’s common to use white noise for sampling and also sample rotation. Start from there and show how to use blue for sampling and also rotation!
https://learnopengl.com/Advanced-Lighting/SSAO
http://john-chapman-graphics.blogspot.com/2013/01/ssao-tutorial.html

Other blue noise usage cases – specific usage cases with easy to follow implementations
* fog shafts
* shadows (pcf)
* reflections
* dithering

Audio Stuff

Biquad – a better frequency filter

Compressor & Limiter – automatic volume adjustment to eg avoid clipping. Include “side chain” stuff.

How To: Data Lookups Via Raytracing

Real time raytracing is seemingly finally here. We have a real time raytracing API in directX, another API in vulkan, hardware support in nvidia cards, and this is only the beginning of this new phase of graphics.

Having hardware and API support for raytracing means we can use that to help with the usual raytraced graphics things – reflection, refraction, shadows and more – but it is also new hardware / API to abuse.

For instance, Inigo Quilez talks about how ray / triangle intersection could be used to do 3×3 linear equation solving: http://www.iquilezles.org/blog/?p=4666

In a similar style of hardware/API abuse (but not related to raytracing), I have shown how to make the linear texture interpolator calculate points on curves and surfaces when storing the control points in texels, and also showed how it can evaluate generic polynomials: https://blog.demofox.org/2016/02/22/gpu-texture-sampler-bezier-curve-evaluation/

In the not too distant future, I think it will be common to use raytracing for data lookups in real time graphics, much like we use textures currently. In fact, from a couple conversations I’ve had with folks on twitter, it seems as though some people are already doing this.

As strange as it sounds, raytracing has a significant advantage over texture lookups. A texture lookup is limited to data stored in a regular grid in pixels. Raytracing data lookups get their data from a mesh, with data stored in vertices (in a vertex buffer).

Storing data in the vertices of a mesh means that you can store data points wherever you want. If you need more detail in one area and less in another, you can put more vertices in the higher detail area, and fewer vertices in the low detail area. You could also store data in a blue noise sampling pattern to help fight aliasing problems you might have with the regular grid of a texture. Furthermore, you could actually have holes in your data for invalid data regions, by having holes in the mesh.

Essentially, a mesh is just a generalization of a texture. You are no longer locked to a grid!

How the data lookups are actually done is not too complex either.

For the 2d case where you have a function f(x,y), you would make a triangle mesh where the (x,y) position of each vertex was the location of a data point, and you would make the z value some constant such as 0.5.

To look up the value of the data for some (x,y) input, you could make a ray that started at (x,y,0) and went in direction (0,0,1). When you did your raytrace, you’d get as a result the triangle index and the barycentric coordinates of that triangle. From there you could look up the data from the 3 vertices of the triangle and use the barycentric coordinates to interpolate between the values. The interpolated value would be your result.

You can see how this process goes in the image below. The purple dot is the query location.

Just as there are volume textures to store 3d data in 3d textures, raytraced data lookups can also be extended to 3d.

For the 3d case where you have a function f(x,y,z), you would make a tetrahedral mesh where the (x,y,z) position of each vertex was the location of a data point.

To look up the value of the data for some (x,y,z) input, you need to be able to find what tetrahedron the point is in. To do this, you just shoot a ray starting at that (x,y,z) position in any arbitrary direction. That ray will hit one triangle in the tetrahedron. You then shoot a ray from the (x,y,z) position in the opposite direction to get a different triangle in the tetrahedron.

From these two triangles, you’ll have 6 vertices but only 4 will be unique. Those 4 vertices are the vertices of the tetrahedron. You can read the data from the vertices, calculate the barycentric coordinates of the point inside the tetrahedron, and then use those to interpolate the vertex data to get the result.

You can see how this process goes in the image below. The purple dot is the query location again.

The most immediate usage case I can think of for this technique would be for diffuse light probe grids. Whether you had a 2d or 3d light probe grid, you’d be able to make probes as dense as needed, or as sparse as you can get away with, in different sections of the geometry. You could also make holes in the mesh to make sure data didn’t interpolate through walls, leading to light leaking. You would use the techniques described above to interpolate the simplex data and get the result.

As this data is likely going to be relatively simple geometry compared to something like an actual game asset, it seems like it ought to be able to be pretty performant too.

Nathan Reed shared a really good idea on twitter, for doing the 3d lookup with only a single raytrace. The idea is that when you knew what triangle you hit, you could look in a table to get the fourth vertex based on whether you hit the triangle from the front or the back. One way to do this would be to have a buffer that had two vertex indices per triangle. The first index would be if you hit from positive, the second would be if you hit from negative.

That way, the index you’d look at would be [triangleIndex*2 + hitBackSide ? 1 : 0]. That data lookup ought to be a lot cheaper than a second raytrace!

Thread:

Can using raytracing to do data lookups extend to 4d and beyond? Probably, but I’m not sure how. Do you know how, or have any interesting usage cases? Share if so, it’d be interesting to hear πŸ™‚

PS – Apparently some folks are using raytracing for GPU physics. I haven’t heard any details of how other people are doing it, but I am looking forward to getting a chance to try it myself. I’m thinking Verlet physics of particles with constraints. That amounts to only needing the current and previous particle positions to get an implicit velocity, and then doing small incremental constraint solving steps to try and make things keep their shape etc. The end result is something like screen space particles / screen space physics, except it would have knowledge of the entire scene, whereas screenspace techniques only have knowledge of the gbuffer. I’ve heard that short ray trace queries run a lot faster (20x?) by not needing to traverse the acceleration structure (BVH) as widely. With luck I’ll give it a try and write a post up about it before too long.

Not All Blue Noise is Created Equal

Some ways I know of creating blue noise distributed samples are:

To be honest, the void and cluster algorithm is a “top tier” algorithm while filtering white noise is just kind of a hack, and Mitchell’s best candidate algorithm is decent, simple but a bit out dated too.

Let’s look at some 128×128 blue noise textures created via void and cluster (top) and white noise filtering (bottom). The images on the right are the frequencies of the images (DFT magnitude). Since blue noise is high frequency noise, having a darker middle means a higher quality result.

Note: the white noise filtering used 25 iterations and a sigma of 1.5 for the blurring.

They look pretty similar don’t they? It turns out they are actually pretty different which I found really surprising when I was told. I had to see this for myself.

Below we threshold both images to 10%. What I mean by that is that if we consider black to be 0 and white to be 1, we make an image where there is a black dot if the color is < 0.1, else we make a white dot.

Void and cluster is top, and white noise filtering is middle. On the bottom is blue noise sample points generated with a discrete version of Mitchell's best candidate algorithm.



As you can see, the filtered white noise has already fallen apart for our purposes. It's basically unusable for this usage case. Mitchell is doing fairly ok though.

Let's move on to 20%:


Mitchell is gaining some low frequencies (it isn't as dark in the middle) but the filtered white noise is starting to look a tiny bit better.

Here are the rest up to 90%:

30%:


40%:


50%:


60%:


70%:


80%:


90%:


So, void and cluster beat the pants off the other two methods.

Filtered white noise used for this purpose is no good and basically fell completely apart.

Mitchell was decent until the sample density got too high and then it failed. There are some parameters to tune with this algorithm so it's possible that it could do better, but in general the algorithm does poorly for high point densities. As an alternative, above 50% density, you could perhaps invert the colors and treat it as 100-density so that it was always working against < 50% density. Even at 50% density, it isn't that great though, at least with my setup.

shadertoy.com recently got a blue noise texture and luckily the blue noise texture was made with the void and cluster algorithm, so it's "the good stuff".

Another family of algorithms to generate blue noise are based on constrained Voronoi diagrams and relaxation to evolve starting sample points to be more blue. Those are good for generating specific point sets for a specific density, but differ from void and cluster which are designed to make a texture that works well for any density.

There are other algorithms out there as well with different properties, and new ones coming out all the time. SIGGRAPH is starting right now and I bet at least one or two new blue noise algorithms are shown πŸ˜›

Have any interesting blue noise info to share? I'd love to hear it! It feels like the rabbit hole here is a lot deeper than it seems.

Tiled Blue Noise

Blue noise textures tile well, but how well exactly? This page is a quick reference to answer that question.

I started with some blue noise textures made with the void and cluster algorithm that you can download here: http://momentsingraphics.de/?p=127

I took a 16×16, 32×32, 64×64, 256×256 and tiled each of those to make 512×512 images.

Here are the source RGBA images:




Here are the output images, where I used only the R channel of each image.

16×16:

32×32:

64×64:

128×128:

256×256:

Something interesting to note is that blue noise is supposed to tile well by nature. It is made up of high frequencies only, which means there aren’t low frequency patterns that show up and are visible to the eye.

Here’s an interesting read showing how that can be used to make textures (art) that tile better:
https://www.gamasutra.com/view/feature/131482/the_power_of_the_high_pass_filter.php

Blending an HDR color into a U8 Buffer

I stumbled on something that I found interesting, so wanted to share in case it was useful for other people too.

The c++ code that generated this images can be found on github at https://github.com/Atrix256/U8HDRPMA

I was implementing Inigo Quilez’ “Better Fog” which is REALLY REALLY cool. It looks way better than even the screenshots he has on his page, especially if you have multiple types of fog (distance fog, height fog, fog volumes):
http://www.iquilezles.org/www/articles/fog/fog.htm

I first had it implemented as a forward render, so was doing the fogging in the regular mesh rendering shader, with all calculations being done in 32 bit floats, writing out the final result to a RGBAU8 buffer. Things looked great and it was good.

I then decided I wanted to ray march the fog and get some light shafts in, so it now became a case where I had a RGBAU8 color render target, and I had the depth buffer that I could read to know pixel world position and apply fog etc.

The result was that I had a fog color that has an HDR fog color (it had color components greater than 1 from being “fake lit”) and I knew how opaque the fog was, so I just needed to lerp the existing pixel color to the HDR fog color by the opacity. The usual alpha blending equation (The “over” operator) is actually a lerp so I tried to use it as one.

Source Blend: Source Alpha
Dest Blend: 1 – Source Alpha
Operation: Add

That becomes this, which is the same as a lerp from DestColor to SrcColor using a lerp amount of SourceAlpha.

\text{DestColor} = \text{DestColor} * (1 - \text{SourceAlpha}) + \text{SrcColor} * \text{SrcAlpha}

BAM, that’s when the problem hit. My image looked very wrong, but only where the fog was thickest and brightest. I was thinking maybe it how i was integrating my fog but it wasn’t. So maybe it was an sRGB thing, but it wasn’t. Maybe it was how i was reconstructing my world position or pixel ray direction due to numerical issues? It wasn’t.

This went on and on until i realized: You can’t say “alpha blend (1.4, 0.3, 2.4) against the color in the U8 buffer using an alpha value of 0.5”. The HDR color is clamped before the alpha blend and you get the wrong result.

You can’t alpha blend an HDR color into a U8 buffer!

… or can you?!

Doing It

As it turns out, premultiplied alpha came to the rescue here, but let’s look at why. As we go, we are going to be modifying this image:

Mathematically speaking, alpha blending works like this:

\text{DestColor} = \text{DestColor} * (1 - \text{SourceAlpha}) + \text{SrcColor} * \text{SrcAlpha}

Using the X axis as alpha, and an overlaid solid color of (1.6, 1.4, 0.8), that gives us this:

However, if you output a float4 from your shader that is \text{float4}(\text{SrcColor}, \text{SrcAlpha}), alpha works like the below, where \text{sat}() clamps values to be between 0 and 1:

\text{DestColor} = \text{DestColor} * (1 - \text{SourceAlpha}) + \text{sat}(\text{SrcColor}) * \text{SrcAlpha}

So what happens, is that SrcColor gets clamped to be between 0 or 1 before the lerp happens, which makes the result much different:

However, using pre-multiplied alpha changes things. The float4 we return from the shader is now \text{float4}(\text{SrcColor*SrcAlpha}, \text{SrcAlpha}).

Our blend operations are now:

Source Blend: One
Dest Blend: 1 – Source Alpha
Operation: Add

That makes the blending equation become this:

\text{DestColor} = \text{DestColor} * (1 - \text{SourceAlpha}) + \text{sat}(\text{SrcColor} * \text{SrcAlpha}) * 1

The \text{sat()} function changed to encompass the whole second term, instead of just SrcColor! That gives this result that matches the one we got when we did the lerp in shader code:

Quick Math

So visually things look fine, but let’s look real quick at the math involved.

If you lerp from 0.5 to 10.0 with a lerp factor of 0.2, you’d get 2.4. The equation for that looks like this:

0.5 * 0.8 + 10.0 * 0.2 = 0.4 + 2.0 = 2.4

This is what happens when doing the math in the forward rendered shader. You then write it out to a U8 buffer, which clips it and writes out a 1.0.

If you use alpha blending, it clamps the 10 to 1.0 before doing the lerp, which means that it lerps from 0.5 to 1.0 with a lerp factor of 0.2. That gives you a result of 0.6 which is VERY incorrect. This is why the HDR color blending to the U8 buffer didn’t work.

If you use premultiplied alpha blending instead, it clamps the 10.0*0.2 to 1, which means that it was 2 but becomes 1, and the result becomes 1.4. That gets clipped to 1.0 so gives you the same result as when doing it during the forward rendering, but allowing you to do it during a second pass.

0.5 * 0.8 + \text{sat}(10.0 * 0.2) = 0.4 + \text{sat}(2.0) = 0.4 + 1.0 = 1.4

This doesn’t just work for these examples or some of the time, it actually works for all inputs, all of the time. The reason for that is, the second term of the lerp is clipped to 0 to 1 and is added to the first term which is always correct. Both terms are always positive. That means that the second term can add the full range of available values (0 to 1) to the first term, and it is correct within that range. That means this technique will either give you the right answer or clip, but will only clip when it is supposed to anyways.

Closing

While I found this useful in a pinch, it’s worth noting that you may just want to use an HDR format buffer for doing this work instead of working in a U8 buffer. The reason why is even though this gives the same answer as doing the work in the shader code, BOTH implementations clip. That is… both implementations SHOULD be writing out values larger than 1.0 but the colors are clamped to being <= 1.0. This is important because if you are doing HDR lit fog (and similar), you probably want to do some sort of tone mapping to remap HDR colors to SDR colors, and once your colors clip, you've lost information that you need to do that remapping.

The red pixels below show where clipping happens:

Pathtraced Depth of Field & Bokeh

Let’s say you have a path tracer that can generate an image like this:

Adding depth of field (and bokeh) can make an image that looks like this:

The first image is rendered using an impossibly perfect pinhole camera (which is what we usually do in roughly real time graphics, in both rasterization and ray based rendering), and the second image is rendered using a simulated lens camera. This post is meant to explain everything you need to know to go from image 1 to image 2.

There is also a link to the code at the bottom of the post.

We are going to start off by looking at pinhole cameras – which can in fact have Bokeh too! – and then look at lens cameras.

If you don’t yet know path tracing basics enough to generate something like the first image, here are some great introductions:

Pinhole Camera

A pinhole camera is a box with a small hole – called an aperture – that lets light in. The light goes through the hole and hits a place on the back of the box called the “sensor plane” where you would have film or digital light sensors.

The idea is that the aperture is so small that each sensor has light hitting it from only one direction. When this is true, you have a perfectly sharp image of what’s in front of the camera. The image is flipped horizontally and vertically and is also significantly dimmer, but it’s perfectly sharp and in focus.

As you might imagine, a perfect pinhole camera as described can’t actually exist. The size of the hole is larger than a single photon, the thickness of the material is greater than infinitesimally small, and there are also diffraction effects that bend light as it goes through.

These real world imperfections make it so an individual sensor will get light from more than one direction through the aperture, making it blurrier and out of focus.

Reality is pretty forgiving though. Pinhole cameras that give decent results can be made easily, even with simple materials laying around the house (http://www.instructables.com/id/How-To-Make-A-Pinhole-Camera/).

You can even go deeper and make your own fairly high quality pinhole camera if you want: https://www.diyphotography.net/the-comprehensive-tech-guide-to-pinhole-photography/

As far as aperture size goes, the smaller the aperture, the sharper the image. The larger the aperture, the blurrier the image. However, smaller apertures also let in less light so are dimmer.

This is why if you’ve ever seen a pinhole camera exhibit at a museum, they are always in very dark rooms. That lets a smaller aperture hole be used, giving a sharper and more impressive result.

When using a pinhole camera with film, if you wanted a sharp image that was also bright, you could make this happen by exposing the film to light for a longer period of time. This longer exposure time lets more light hit the film, resulting in a brighter image. You can also decrease the exposure time to make a less bright image.

Real film has non linear reaction to different wavelengths of light, but in the context of rendered images, we can just multiply the resulting colors by a value as a post effect process (so, you can adjust it without needing to re-render the image with different exposure values!). A multiplier between 0 and 1 makes the image darker, while a multiplier greater than 1 makes the image brighter.

It’s important to note that with a real camera, longer exposure times will also result in more motion blur. To counter act this effect, you can get film that reacts more quickly or more slowly to light. This lets you have the aperture size you want for desired sharpness level, while having the exposure time you want for desired motion blur, while still having the desired brightness, due to the films ISO (film speed).

For a much deeper dive on these concepts, here is a really good read:
https://www.cambridgeincolour.com/tutorials/camera-exposure.htm

While aperture size matters, so does shape. When things are out of focus, they end up taking the shape of the aperture. Usually the aperture is shaped like something simple, such as a circle or a hexagon, but you can exploit this property to make for some really exotic bokeh effects. The image at the top of this post used a star of David shaped aperture for instance and this image below uses a heart shape.

Here’s two articles that talk about how to make your own bokeh mask for custom bokeh shapes for physical cameras:
https://photorec.tv/2017/02/diy-heart-shaped-bokeh-valentines-day/ (The image above is from this article!)
https://www.diyphotography.net/diy_create_your_own_bokeh/

Ultimately what is happening is convolution between the aperture and the light coming in. When something is in focus, the area of convolution is very small (and not noticeable). As it gets out of focus, it gets larger.

The last property I wanted to talk about is focal length. Adjusting focal length is just moving the sensor plane to be closer or farther away from the aperture. Adjusting the focal length gives counter intuitive results. The smaller the focal length (the closer the sensor plane is to the aperture), the smaller the objects appear. Conversely, the larger the focal length (the farther the sensor plane is from the aperture), the larger the objects appear.

The reason for this is because as the sensor plane gets closer, the field of view increases (the sensor can see a wider angle of stuff), and as it gets farther, the field of view decreases. It makes sense if you think about it a bit!

Pinhole Camera Settings Visualized

In the below, focal length and aperture radius are in “World Units”. For reference, the red sphere is 3 world units in radius. The path traced image is multiplied by an exposure multiplier before being shown on the screen and is only a post effect, meaning you can change the exposure without having to re-render the scene, since it’s just a color multiplier.

Here is a video showing how changing focal length affects the image. It ranges from 0.5 to 5.0. Wayne’s world, party time, excellent!

These next three images show how changing the aperture size affects brightness. This first image has an aperture size of 0.01 and an exposure of 3000.

This second image has an aperture size of 0.001 and the same exposure amount, making it a lot sharper, but also much darker.

This third image also has an aperture size of 0.001, but an exposure of 300,000. That makes it have the same brightness as the first image, but the same sharpness as the second image.

If you are wondering how to calculate how much exposure you need to get the same brightness with one aperture radius as another, it’s not too difficult. The amount of light coming through the aperture (aka the brightness) is multiplied by the area of the aperture.

When using a circular aperture, we can remember that the area of a circle is \pi * \text{radius}^2.

So, let’s say you were changing from a radius 10 aperture to a radius 5 aperture. The radius 10 circle has area of 100\pi, and the radius 5 circle has an area of 25\pi. That means that the radius 5 circle has 1/4 the area that the radius 10 circle does, which means you need to multiply your exposure by 4 to get the same brightness.

In the case of moving from radius 0.01 to 0.001, we are making the brightness be 1/100 of what it was, so we multiply the 3,000 by 100 to get the exposure value of 300,000.

Here is a video showing how aperture radius affects the sharpness of the image. The exposure is automatically adjusted to preserve brightness. Aperture radius ranges from 0.001 to 0.2.

In the next section we’ll talk about how to make different aperture shapes actually function, but as far as brightness and exposure goes, it’s the same story. You just need to be able to calculate the area of whatever shape (at whatever size it is) that you are using for your aperture shape. With that info you can calculate how to adjust the exposure when adjusting the aperture size.

Here are some different aperture shapes with roughly the same brightness (I eyeballed it instead of doing the exact math)

Circle:

Gaussian distributed circle:

Star of David:

Triangle:

Square:

Ring:

Even though it’s possible to do bokeh with a pinhole camera as you can see, there is something not so desirable. We get the nice out of focus shapes, but we don’t get any in focus part of the image to contrast it. The reason for this is that pinhole cameras have constant focus over distance. Pinhole camera image sharpness is not affected by an object being closer or farther away.

To get different focus amounts over different distances, we need to use a lens! Before we talk about lenses though, lets talk about how you’d actually program a pinhole camera as we’ve described it.

Programming A Pinhole Camera With Bokeh

With the concepts explained let’s talk about how we’d actually program this.

First you calculate a ray as you normally would for path tracing, where the origin is the camera position, and the direction is the direction of the ray into the world. Adding subpixel jittering for anti aliasing (to integrate over the whole pixel) is fine.

At this point, you have a pinhole camera that has a infinitesimally small aperture. To make a more realistic pinhole camera, we’ll need to calculate a new ray which starts on the sensor plane, and heads towards a random point on the aperture.

Important note: the position of the aperture is the same as the camera position. They are the same point!

Calculating the Point on the Sensor Plane

We first find where the ray would hit the sensor plane if it were 1 unit behind the aperture (which will be a negative amount of time). We put that point into camera space, multiply the z of the camera space by the focal length (this moves the sensor plane), and then put it back into world space to get the actual world space origin of the ray, starting at the sensor plane.

To calculate the plane equation for the sensor plane, the normal for that plane is the camera’s forward direction, and a point on that plane is the camera position minus the camera’s forward direction. Calculating the equation for that plane is just:

sensorPlane.xyz = cameraForward;
sensorPlane.w = -dot(cameraForward, (cameraPos - cameraForward));

Note that xyzw are ABCD in the plane equation Ax+By+Cz+d=0.

You can then do this to find the point where the ray hits the sensor plane:

float t = -(dot(cameraPos, sensorPlane.xyz) + sensorPlane.w) / dot(rayDirection sensorPlane.xyz);
sensorPos= cameraPos + rayDirection  * t;

From there, you do this to adjust the focal length and to get the world space starting position of the ray:

// convert the sensorPos from world space to camera space
float3 cameraSpaceSensorPos = mul(float4(sensorPos, 1.0f), viewMtx).xyz;

// elongate z by the focal length
cameraSpaceSensorPos.z *= DOFFocalLength;

// convert back into world space
sensorPos = mul(float4(cameraSpaceSensorPos, 1.0f), invViewMtx).xyz;

Now we know where the ray starts, but we need to know what direction it’s heading in still.

Calculating the Random Point on the Aperture

Now that we have the point on the sensor, we need to find a random point on the aperture to shoot the ray at.

To do that, we first calculate a uniform random point in a circle with radius “ApertureRadius”, since the aperture is a circle. Here is some code that does that (RandomFloat01() returns a random floating point number between 0 and 1):

float angle = RandomFloat01(state) * 2.0f * c_pi;
float radius = sqrt(RandomFloat01(state));
float2 offset = float2(cos(angle), sin(angle)) * radius * ApertureRadius;

If you wanted different shaped apertures for different shaped bokeh, you are only limited to whatever shapes you can generate uniformly random points on.

If we add that random offset to the camera position in camera space (multiply offset.x by the camera’s x axis, and offset.y by the camera’s y axis and add those to the camera position), that gives us a random point on the aperture. This is where we want to shoot the ray towards.

rayOrigin = sensorPlanePosition;
rayDirection = normalize(randomAperturePosition - sensorPlanePosition);

You can now use this ray to have a more realistic pinhole camera!

Brightness

If you want to be more physically correct, you would also multiply the result of your raytrace into the scene by the area of the aperture. This is the correct way to do monte carlo integration over the aperture (more info on monte carlo basics: https://blog.demofox.org/2018/06/12/monte-carlo-integration-explanation-in-1d/), but the intuitive explanation here is that a bigger hole lets in more light.

After you do that, you may find that you want to be able to adjust the aperture without affecting brightness, so then you’d go through the math I talk about before, and you’d auto calculate exposure based on aperture size.

When looking at the bigger picture of that setup, you’d be multiplying a number to account for aperture size, then you’d basically be dividing by that number to make it have the desired brightness – with a little extra to make it a little bit darker or brighter as the baseline brightness.

A more efficient way to do this would be to just not multiply by the aperture area, and apply an exposure to that result. That way, instead of doing something like dividing by 300,000 and then multiplying by 450,000, you would just multiply by 1.5, and it’d be easier for a human to work with.

Lens Cameras

Finally, onto lenses!

The simplest lens camera that you can make (and what I used) is to just put a convex lens inside the aperture.

Funny tangent: lens comes from the greek word for lentil. (https://jakubmarian.com/are-lens-and-lentil-related/)

A motivation for using lenses is that unlike pinhole cameras, you can increase the aperture size to let more light in, but still get a focused shot.

This comes at a cost though: there is a specific range of depth that is in focus. Other things that are too close or too far will appear blurry. Also, the larger the aperture, the smaller the “in focus range” will be.

From that perspective, it feels a bit silly simulating lenses in computer graphics, because there is no technical reason to simulate a lens. In computer graphics, it’s easier to make a sharper image than a blurry one, and if we want to adjust the image brightness, we just multiply the pixels by a constant.

Simulating a lens for depth of field and bokeh is purely a stylistic choice, and has nothing to do with a rendering being more correct!

How Convex Lenses Work

Convex lenses are also called converging lenses because they bend incoming light inwards to cross paths. Below is a diagram showing how the light travels from objects on the left side, through the lens, to the right side. The light meets on the other side of the lens at a focus point for each object. The orange “F” labels shows the focal distance of the lens.

If two points are the same distance from the lens on the axis perpendicular to the lens, their focal points will also be the same distance from the lens on that axis, on the other side of the lens.

This means that if we had a camera with a sensor plane looking through a lens, that there would be a focal PLANE on the other side of the lens, made up of the focus points of each point for each sensor on the sensor plane. Things closer than the focus plane would be blurry, and things farther than the focus plane would be blurry, but things near the focus plane would be sharper.

The distance from the camera (aperture) to the focal plane is based on the focal distance of the lens, and also how far back the sensor plane is. Once you have those two values, you could calculate where the focal plane is.

There is a simpler way though for us. We can skip the middle man and just define the distance from the camera to the focal plane, pretending like we calculated it from the other values.

This is also a more intuitive setting because it literally tells you where an object has to be to be in focus. It has to be that many units from the camera to be perfectly in focus.

Going this route doesn’t make our renderer any less accurate, it just makes it easier to work with.

Nathan Reed (@Reedbeta) has this information to add, to clarify how focus works on lens cameras (Thanks!):

The thing you change when you adjust focus on your camera is the “image distance”, how far the aperture is from the film, which should be greater than or equal to the lens focal length.

The farther the aperture from the sensor, the nearer the focal plane, and vice versa. 1/i + 1/o = 1/f.

And this good info too:

“focal length” of a lens is the distance from film plane at which infinite depth is in sharp focus, and is a property of the lens, eg “18mm lens”, “55mm lens” etc. The focal length to sensor size ratio controls the FOV: longer lens = narrower FOV

Programming A Lens Camera With Bokeh

Programming a lens camera is pretty simple:

  1. Calculate a ray like you normally would for a path tracer: the origin is the camera position, and the direction is pointed out into the world. Subpixel jitter is again just fine to mix with this.
  2. Find where this ray hits the focal plane. This is the focal point for this ray
  3. Pick a uniform random spot on the aperture
  4. Shoot the ray from the random aperture position to the focal point.

That’s all there is to it!

You could go through a more complex simulation where you shoot a ray from the sensor position to a random spot on the aperture, calculate the refraction ray, and shoot that ray into the world, but you’d come up with the same result.

Doing it the way I described makes it no less accurate(*)(**), but is simpler and computationally less expensive.

* You’ll notice that changing the distance to the focal plane doesn’t affect FOV like changing the focal distance did for the pinhole camera. If you did the “full simulation” it would.
** Ok technically this is a “thin lens approximation”, so isn’t quite as accurate but it is pretty close for most uses. A more realistic lens would also have chromatic aberration and other things so ::shrug::

You can optionally multiply the result of the ray trace by the aperture size like we mentioned in the pinhole camera to make the brightness be properly affected by aperture size. If you’d rather not fight with exposure multiplier calculations as you change aperture size though, feel free to leave it out.

Here are some links for more information on lenses:
http://www.physicsclassroom.com/class/refrn/Lesson-5/Converging-Lenses-Ray-Diagrams
https://computergraphics.stackexchange.com/questions/4344/depth-of-field-in-path-tracing-what-do-i-do-with-the-secondary-ray
https://en.wikipedia.org/wiki/Thin_lens
http://www.passmyexams.co.uk/GCSE/physics/concave-lenses-convex-lenses.html

Lens Camera Settings Visualized

This video shows the effect of the aperture size changing. Notice that the area in focus is smaller with a larger aperture radius.

This video shows the effect of the focal distance changing. Nothing too surprising here, it just changes what depth is in focus.

Photography is a Skill

Even after I had things implemented correctly, I was having trouble understanding how to set the parameters to get good Bokeh shots, as you can see from my early images below:


Luckily, @romainguy clued me in: “Longer focals, wider apertures, larger distance separation between subjects”

So what I was missing is that the bokeh is the stuff in the background, which you make out of focus, and you put the focal plane at the foreground objects you want in focus.

It’s a bit strange when you’ve implemented something and then need to go ask folks skilled in another skill set how to use what you’ve made hehe.

Other Links

Here’s some other links I found useful while implementing the code and writing this post:

https://en.wikipedia.org/wiki/Pinhole_camera_model#The_geometry_and_mathematics_of_the_pinhole_camera
https://en.wikipedia.org/wiki/Camera_lens#Theory_of_operation
https://www.scratchapixel.com/lessons/3d-basic-rendering/3d-viewing-pinhole-camera/virtual-pinhole-camera-model
https://en.m.wikipedia.org/wiki/Circle_of_confusion

My Code

My GPU path tracer that generated this images is up on github.

It’s a work in progress so is missing some things, has some todo notes, and maybe has some things that are incorrect in it, be warned! πŸ™‚

The code is here: https://github.com/Atrix256/FalcorPathTracer/releases/tag/v1.0

The path tracer uses nvidia’s “Falcor” api abstraction layer. As best as I can tell, just pulling down falcor to your machine and compiling it registers it *somewhere* such that projects that depend on falcor can find it. I’m not really sure how that works, but that worked for me on a couple machines I tried it on strangely.

This is the version / commit of Falcor I used:
https://github.com/NVIDIAGameWorks/Falcor/commit/0b561caae19e8325853166cc4c93d4763570774a

I wish I had a more fool proof way to share the code – like if it were to download the right version of falcor when you try to build it. AFAIK there isn’t a better way, but if there is I would love to hear about it.

Anyhow, happy rendering!! πŸ™‚

Monte Carlo Integration Explanation in 1D

Let’s say that you have a function y=\sin(x)^2 and you want to know what the area is under the curve between 0 and pi.

We could solve this specific problem by doing some algebra and calculus to get the exact answer analytically (which is \frac{\pi}{2}), but let’s pretend like we can’t, or don’t want to solve it that way.

Another way to solve this problem is to use Monte Carlo integration, which lets you solve it numerically and get an approximated answer.

How you would do that is like this:

  1. Pick a random number between 0 and pi.
  2. Plug that value into the function y=\sin(x)^2 as x to get a y value.
  3. Do this multiple times and take the average to get the average y value of the function.
  4. Pretending that the function is a rectangle, you can use the average y as the height of the rectangle, and use pi as the width because we are looking between 0 and pi.
  5. Multiply that width and height to get the area of a rectangle, which is the estimated area under the curve.

That’s all you need to do!

Monte Carlo integration is pretty powerful in how simple it is, and how it works really well even in extremely high dimensions.

As you might imagine, the more samples you take to get your average y value, the better your estimate is going to be. Unfortunately though, you have to quadruple the number of samples you have to cut the error in half, so it can take a while to get the correct answer (converge) if you need a high level of accuracy. (https://en.wikipedia.org/wiki/Monte_Carlo_method#Integration)

Here’s a C++ code snippet doing this process with 10,000 samples. Each time you run the program you’ll get a different estimate. If you take more samples, you’ll more reliably get a better answer.

double SimpleMonteCarlo()
{
    double rangeMin = 0;
    double rangeMax = 3.14159265359;

    size_t numSamples = 10000;

    std::random_device rd;
    std::mt19937 mt(rd());
    std::uniform_real_distribution<double> dist(rangeMin, rangeMax);

    double ySum = 0.0;
    for (size_t i = 1; i <= numSamples; ++i)
    {
        double x = dist(mt);
        double y = sin(x)*sin(x);
        ySum += y;
    }
    double yAverage = ySum / double(numSamples);

    double width = rangeMax - rangeMin;
    double height = yAverage;

    return width * height;
}

Below is the output of the code ran 5 times. Note that the real answer is \frac{\pi}{2} which is 1.57079632679.

  1. 1.548451
  2. 1.554312
  3. 1.576727
  4. 1.578759
  5. 1.598686

(I’m actually a bit disturbed that the 5 runs are actually sorted from low to high but whatever …)

A problem with this being based on regular old random numbers (white noise) is that sometimes the numbers will clump, giving too much weighting to one area of the function, and leave empty space where another part of the function wasn’t sampled at all.

There are many different ways to deal with this situation but two of my favorites are…

  1. Blue Noise: https://blog.demofox.org/2018/01/30/what-the-heck-is-blue-noise/
  2. Low discrepancy sequences: https://blog.demofox.org/2017/05/29/when-random-numbers-are-too-random-low-discrepancy-sequences/

Both of those things give more even coverage over the sampling space which means that you won’t have as large gaps of missing information from your samples.

Another way to help this is stratified sampling, where you break the sampling space up into some number of sections, and choose random numbers within each section, making sure to have samples in each of the sections. That keeps the randomness, but gives more even coverage over the sampling space.

You might be tempted to just say “If I’m taking 100 samples, i’ll just sample every 1/100th of the space evenly”. That uniform / regular sampling has some problems including aliasing, but also loses some of the positive mathematical properties that random numbers can give you (like, being able to sample from non rational numbered locations!).

A variation on stratified sampling is a technique invented by Pixar called “jittered grid” where you do even sampling, but add a small random value to each sample.

There are lots and lots of other techniques which could make up a long list of blog posts, so we’ll stop there! πŸ™‚

More General Monte Carlo Integration

The last section was actually a simplified version of a Monte Carlo integration which was able to be simplified because it was using uniform random numbers.

Monte Carlo integration works with random numbers that have arbitrary distributions as well, not just uniform random numbers.

The process works mostly the same but there are a couple differences.

In the previous section, we got an average height and then multiplied by the width to get an estimate of the area under the curve, pretending that it was a rectangle.

The first change is to move the multiplication by the width into the loop. Instead of calculating an average height, we are instead calculating average rectangle areas.

Mathematically you get the same answer, so there’s nothing crazy there.

The second change is that instead of multiplying by the width, you divide by the probability of the number being chosen, that you plugged into the equation.

In the case of our function that we are taking samples of between 0 and pi, the probability of any single number being chosen in that range is \frac{1}{\pi}. When we divide by that, it means we end up just multiplying by pi, so it’s mathematically equivalent to what were were doing before!

Here’s the steps for the more generalized monte carlo integration:

  1. Pick a random number between 0 and pi using any random number distribution you’d like to.
  2. Plug that value into the function y=\sin(x)^2 as x to get a y value.
  3. Divide that y value by the probability of having chosen that number (otherwise known as PDF(x)) to get an estimated area of the function.
  4. Do this multiple times and take the average to get your result.

Here is some code to do the more general Monte Carlo integration, still using uniformly distributed random numbers.

double GeneralMonteCarlo()
{
    size_t numSamples = 10000;

    std::random_device rd;
    std::mt19937 mt(rd());
    std::uniform_real_distribution<double> dist(0.0f, 1.0f);

    auto InverseCDF = [](double x) -> double
    {
        return x * c_pi;
    };

    auto PDF = [](double x) -> double
    {
        return 1.0f / c_pi;
    };

    double estimateSum = 0.0;
    for (size_t i = 1; i <= numSamples; ++i)
    {
        double rnd = dist(mt);
        double x = InverseCDF(rnd);
        double y = sin(x)*sin(x);
        double pdf = PDF(x);
        double estimate = y / pdf;

        estimateSum += estimate;
    }
    double estimateAverage = estimateSum / double(numSamples);

    return estimateAverage;
}

Interestingly, dividing by the PDF is the same mathematically as multiplying by width in the last section – it literally ends up being a multiplication by pi (the width). The only difference is that we pulled the multiply into the loop, instead of leaving it until the end.

As an optimization, you could definitely move the divide out again (and turn it into a multiply), but I wanted to present the code as close to the core concepts as possible.

Non Uniform Random Number Distributions

Let’s try sampling from a different random number distribution. Let’s generate random numbers which have a distribution of y=\sin(x). You can see it compared to the function we are integrating y=\sin(x)^2 below. They are fairly similarly shaped!

To use y=\sin(x) as a random number distribution for monte carlo integration, we’ll need to calculate the normalized PDF and we’ll also need to calculate the inverse CDF.

If you want to know more about PDFs and “whatever an inverse CDF may be”, give this a read: Generating Random Numbers From a Specific Distribution By Inverting the CDF

  • The function y=\sin(x) is normalized to this PDF: \mathit{PDF}(x) = \frac{\sin(x)}{2}
  • To generate numbers from that PDF, you take a random number x that is between 0 and 1 and plug it into this function, which is the inverse CDF: \mathit{CDF}^{-1}(x) = 2 \cdot \sin^{-1}(\sqrt{x})

Here is a code snippet doing monte carlo integration with this PDF and inverse CDF:

double ImportanceSampledMonteCarlo()
{
    size_t numSamples = 10000;

    std::random_device rd;
    std::mt19937 mt(rd());
    std::uniform_real_distribution<double> dist(0.0, 1.0);

    auto InverseCDF = [](double x) -> double
    {
        return 2.0 * asin(sqrt(x));
    };

    auto PDF = [](double x) -> double
    {
        return sin(x) / 2.0f;
    };

    double estimateSum = 0.0;
    for (size_t i = 1; i <= numSamples; ++i)
    {
        double rng = dist(mt);
        double x = InverseCDF(rng);
        double y = sin(x)*sin(x);
        double pdf = PDF(x);
        double estimate = y / pdf;

        estimateSum += estimate;
    }
    double estimateAverage = estimateSum / double(numSamples);

    return estimateAverage;
}

To compare this versus uniform random sampling, I'll show the progress it makes over 50,000,000 samples first using uniform random numbers, then using the y=\sin(x) shaped PDF.

Uniform aka 1/pi:

sin(x):

You may notice that every 4x samples, the standard deviation (which is the square root of variance) drops in half, like we talked about before. This is why path tracing takes so long. If you don’t know what path tracing is, this is why modern animated movies take so long to render.

In the results, you can see that the variance of the estimates is a lot lower using this PDF that is shaped more like the function we are trying to integrate. We got a better, more reliable answer with fewer samples. Is that pretty cool? You bet it is! When you use a PDF shaped like the function you are integrating, to get better results faster, that is called importance sampling.

Bad Random Number Distributions

If you use a PDF which is shaped very differently from the function you are trying to integrate, you will get more variance and it will take longer to converge, which is a total bummer.

Let’s try y=(\frac{x}{\pi})^5, which doesn’t look much like the function we are trying to integrate at all:

Here is the PDF and inverse CDF:

  • \mathit{PDF}(x)=(\frac{x}{\pi})^5 \cdot \frac{6}{\pi}
  • \mathit{CDF}^{-1}(x)= (x*\pi^6)^{\frac{1}{6}}

Here it is with 50,000,000 samples:

And here is the uniform sampling again as a comparison:

As you can see, it is approaching the right answer, but is taking about 10 times as long to get the same results (amount of variance) compared to uniform sampling. Ouch!

Perfect Random Number Distributions

Let’s say that we got really lucky and somehow got the PDF and inverse CDF for a function that perfectly matched the function we were trying to integrate. What would happen then?

Let’s check it out by integrating the function y=\sin(x) by using a random number distribution which has the form y=\sin(x).

We already calculated the PDF and inverse CDF of that function earlier:

  • \mathit{PDF}(x) = \frac{\sin(x)}{2}
  • \mathit{CDF}^{-1}(x) = 2 \cdot sin^{-1}(\sqrt{x})

Here we do that with 50,000,000 samples:

WOW! As you can see, it had the right answer from the first sample, with zero variance (randomness) and it kept steady at that answer for all 50,000,000 samples.

This is a pretty neat concept, and if you know about “cosine weighted hemisphere sampling”, that does this exact thing.

Cosine weighted hemisphere samples are weighted such that you can remove the \cos(\theta) from the lighting calculations, because the random number distribution handles it for you.

It basically removes that part of randomness from the equations.

Unfortunately there are more variables and randomness in path tracing than just that term, but it helps.

Beyond this, you’d start look at other variance reduction techniques if you were interested, including multiple importance sampling.

Closing

Going into this blog post I thought “hey no sweat, i’ll make a few simple functions, calculate their PDFs, inverse CDFs and be on my way”.

I can’t believe how almost all the simple functions I tried ended up being impossible to take through the process.

for instance, you can take x=\sin(y) and solve for y to get y=\sin^{-1}(x), but if you try to solve x=\sin(y)+y for y, you are going to have a bad day!

I think in the future if I need to do something like this, I’d like to try fitting a curve to the (x,y) data points reordered as (y,x) data points, but there are many other methods for doing this sort of thing as well.

BTW if wondering how I was calculating std dev (aka square root of variance) while integrating, variance is “The average of the squared differences from the mean”. That means that if you know the correct answer of what you are trying to integrate, you can calculate the std dev like this:

        // Variance is "The average of the squared differences from the mean"
        double difference = integration - actualAnswer;
        double differenceSquared = difference * difference;
        averageDifferenceSquared = Lerp(averageDifferenceSquared, differenceSquared, 1.0 / double(i));
        double stdDev = sqrt(averageDifferenceSquared);
  • integration is the current average estimate (if you have taken 100 samples, it’s the average of the 100 samples)
  • actualAnswer is the known right answer
  • averageDifferenceSquared is also the variance
  • i is the number of samples you have taken, including the current one (aka start at 1, not 0)
  • If you are confused about me doing a lerp to calculate an average, give this a read: Incremental Averaging

Hope you enjoyed this write up!

Anders Lindqvist (@anders_breakin) is writing up a blog post explaining monte carlo, importance sampling, and multiple importance sampling that you might be interested in if you enjoyed this. Give him a follow, and it’ll be coming out soon πŸ™‚

Also, here is a really nice twitter thread talking about why importance sampling actually works:
https://twitter.com/Atrix256/status/1003487338633105409

Test

This is a test

double SimpleMonteCarlo()
{
    double rangeMin = 0;
    double rangeMax = 3.14159265359;

    size_t numSamples = 10000;

    std::random_device rd;
    std::mt19937 mt(rd());
    std::uniform_real_distribution dist(rangeMin, rangeMax);

    double ySum = 0.0;
    for (size_t i = 1; i <= numSamples; ++i)
    {
        double x = dist(mt);
        double y = sin(x)*sin(x);
        ySum += y;
    }
    double yAverage = ySum / double(numSamples);

    double width = rangeMax - rangeMin;
    double height = yAverage;

    return width * height;
}

That was a test

A Neat Trick For Compressing Networked State Data

I’m not a network programmer, but I do like cool algorithms.

Today at lunch a co-worker mentioned a really cool trick for when you need to continually send state across a network.

For instance, if you had a large struct of data containing information about all the players in the world, projectiles, enemies, and other game objects – and you wanted to send this information to a specific player so their game client could render the world appropriately / do collision detection / etc.

It goes like this:

  1. Get the initial state and send it (compressed).
  2. When it’s time to send an update to the state, XOR it against the previous state, compress that result and send it.
  3. Rinse and repeat.

The magic here is in the assumption that the state as a whole isn’t going to change much from update to update. If this assumption is true, when you do the XOR against the previous state, you are going to end up with a lot of zeroes, which compress very nicely, making for a small data payload.

On the other side, when you receive an update, you would just decompress it, XOR it against the previous state, and use that as the new state.

I thought that was pretty clever.

BTW when sending compressed data across the network, the smaller the message sent, the more of the network data is taken up by the “compression header”. To get an intuition for this, go compress an empty text file and notice how it got bigger. The header is there to give information about how to decompress the data (such as a huffman table, or whatever else).

One way to get around this is to have the server and client pre-agree on a compression header so that when they talk to each other, they can omit it.

Another way to get around this is to use an ADAPTIVE compression header.

What you do is on the client, when you decompress data, you process it using logic to improve an implicit compression header for use with the next communication. If the server does the same (keeping a context for each player individually), and if this code is deterministic, client and server will adapt their compression in the exact same ways. This allows each side to talk to each other without sending compression headers, but still changing compression on demand to match the needs of the specific data being sent.

Anyways, like I said, I’m not a network programmer, so back to graphics for me πŸ™‚

If you like network programming, Glenn Fiedler has written a lot of really interesting things on the subject.
https://gafferongames.com/
@gafferongames

Taking a Stroll Between The Pixels

This post relates to a paper I wrote which talks about (ab)using linear texture interpolation to calculate points on Bezier curves. Extensions generalize it to Bezier surfaces and (multivariate) polynomials. All that can be found here: https://blog.demofox.org/2016/02/22/gpu-texture-sampler-bezier-curve-evaluation/

The original observation was that if you sample along the diagonal of a 2×2 texture, that as output you get points on a quadratic Bezier curve with the control points of the curve being the values of the pixels like in the image below. When I say you get a quadratic Bezier curve, I mean it literally, and exactly. One way of looking at what’s going on is that the texture interpolation is literally performing the De Casteljau algorithm. (Note: if the “B” values are not equal in the setup below, the 2nd control point will be the average of these two values, which an extension abuses to fit more curves into a smaller number of pixels.)

An item that’s been on my todo list for a while is to look and see what happens when you sample off of the 45 degree diagonal between the pixel values. I was curious about questions like:

  • What if we sampled across a different line?
  • What if we samples across a quadratic curve like by having y=x^2?
  • What if we sampled on a circle or a sine wave?
  • How does the changed sampling patterns work in higher dimensions – like trilinear or quadrilinear interpolation?

After accidentally coming across the answer to the first question, it was time to look into the other ones too!

PS – if wondering “what use can any of this possibly have?” the best answer I have there is data compression for data on the GPU. If you can fit your data with piecewise rational polynomials, the ideas of this technique could be useful for storing that data in a concise way (pixels in a texture) that are also quickly and easily decoded by the GPU. The ideas from this post allows for more curve types when fitting and storing your data, beyond piecewise rational polynomials. It’s also possible to store higher order curves and surfaces into smaller amounts of texture data.

Quick Setup: Bilinear Interpolation Formula

Bilinear interpolation is available on modern GPUs as a way of getting sub-pixel detail. In the olden days, when zooming into a texture, the square pixels just got larger because nearest neighbor filtering was used. In modern times, when looking at the space between pixel values, bilinear interpolation is used to fill in the details better than nearest neighbor does.

You can describe bilinear interpolation as interpolating two values across the x axis and interpolating between the results across the y axis (reversing the order of axes also works). Mathematically, that can look like this:

z = (A(1-x) + Bx)(1-y) + (C(1-x)+Dx)y

Where x and y are values between 0 and 1 describing where the point is between the pixels, and A,B,C,D are the values of the 4 nearest pixels, which form a box around the point we are calculating. A = (0,0), B = (1,0), C = (0,1) and D = (1,1).

With some algebra, you can get that equation into a power series form which is going to be easier to work with in our experiments:

z = (A-B-C+D)xy + (B-A)x + (C-A)y + A

For some deeper info on bilinear interpolation check out these links:
https://blog.demofox.org/2015/04/30/bilinear-filtering-bilinear-interpolation/
http://reedbeta.com/blog/quadrilateral-interpolation-part-1/
http://reedbeta.com/blog/quadrilateral-interpolation-part-2/
https://computergraphics.stackexchange.com/questions/7539/geometric-interpretation-of-this-bilinear-interpolation-equation/7541

Now that we have our formula, we can begin! πŸ™‚

Sampling Along Other Lines

So, if we sample along the diagonal from A to D, we know that we get a quadratic equation out. What happens if we sample along other lines though?

My guess before I knew the answer to this was that since the 45 degree angle line is quadratic (degree 2), and that horizontal and vertical lines were linear (degree 1), that sampling along other lines must be a fractional degree polynomial between 1 and 2. It turns out that isn’t the answer, but I wonder if there’s a way to interpret the “real answer” as a fractional polynomial?

Anyways, wikipedia clued me in: https://en.wikipedia.org/wiki/Bilinear_interpolation#Nonlinear

The interpolant is linear along lines parallel to either the x or the y direction, equivalently if x or y is set constant. Along any other straight line, the interpolant is quadratic

What that means is that if you walk along a horizontal or vertical line, it’s going to be linear. Any other line will be quadratic.

Let’s try it out.

Remembering that the equation for a linear function is y=mx+b let’s literally replace y with mx+b and see what we get out.

So, we start with the power series bilinear interpolation polynomial:

z = (A-B-C+D)xy + (B-A)x + (C-A)y + A

Which becomes this after substitution:

z = (A-B-C+D)x(mx+b) + (B-A)x + (C-A)(mx+b) + A

After some expansion and simplification we get this:

z = (Am-Bm-Cm+Dm)x^2+(Ab-Bb-Cb+Db+Cm-Am+B-A)x+Cb-Ab+A

This formula tells us the value we get if we have a bilinear interpolation of values A,B,C,D (aka a bilinear surface defined by those points), and we sample along the x,y line defined by y=mx+b.

It’s a very generalized function that’s hard to reason about much, but one thing is clear: it is a quadratic function! Whatever constant values you choose for A,B,C,D,m and b, you will get a quadratic polynomial (or lower degree, but never higher).

Here’s a shadertoy that shows curves generated by random sub pixel line segments on a random (white noise) RGB texture: https://www.shadertoy.com/view/XstBz7

(note that the rough edges of the curve are due to the fact that interpolation happens in X.8 fixed point format, so has pretty limited precision. Check the paper for more information and ways to address the issue.)

Let’s explore a bit by plugging in some values for m and b and see what happens for different types of lines.

m=0, b=0

Let’s see what happens when m is 0 and b is 0. In other words, lets see what happens when we sample along the line y=0.

Plugging those values in gives:

z = (B-A)x + A

interestingly, this is just a linear interpolation between A and B, which makes sense when looking at the graph of where we are sampling on the bilinear surface.

This goes along with what wikipedia told us: when one of the axes is constant (it’s a horizontal or vertical line) the result is linear.

m=1, b=0

Let’s try m = 1 and b = 0. That is the line: y=x. This graph shows where that is sampling from on the bilinear surface:

Plugging in the values gives us this equation:

z = (A-B-C+D)x^2+(C+B-2A)x+A

We get a quadratic out! This shouldn’t be too surprising. This is the original insight in the technique. This is also the formula for a quadratic Bezier curve with control points A, (B+C)/2, D.

m=2, b=1

Let’s try the line y=2x+1. Here’s the graph of where we are sampling on the bilinear surface:

Plugging in the values give us the equation:

z = (2A-2B-2C+2D)x^2+(C+D-2A)x+C

Once again we got a quadratic function when sampling along a line.

You might think it’s strange that the equation ends it “+C” instead of “+A”, but if you look at the graph it makes sense. The line literally starts at C when x is zero.

x=2u, y=3u

In the above examples we are only modifying the y variable, to be some function of x. What if we also want to modify the x variable?

One way to do this is to make a 3rd variable u that goes from 0 to 1. Then we can make x and y be based on that variable.

Let’s see what happens when we use these two equations:

y=2u

x=3u

That makes us sample this line on the bilinear surface.

Plugging the functions of u in for x and y we get:

z = (6A-6B-6C+6D)u^2+(2B+3C-5A)u+A

It’s still a quadratic!

What About a Quadratic Path?

So we now know that when moving along a straight line on a bilinear surface, that you will get a quadratic function as output, except in the case of the line being horizontal or vertical. Note: if the bilinear surface is a plane, all lines on that surface will be linear functions, so this is another way to get a linear result. It could also be degenerate and give you a point result. You will never get a cubic result (or higher) when going along a straight line though.

What would happen though if instead of sampling along straight lines, we sampled on other shapes, like quadratic curves?

y=x*x

Let’s start with the function y=x^2. The path that is sampled is:

Going back to the power series form of bilinear interpolation, let’s plug x^2 in for y and see what we get out.

The starting equation:

z = (A-B-C+D)xy + (B-A)x + (C-A)y + A

becomes:

z = (A-B-C+D)x(x^2) + (B-A)x + (C-A)(x^2) + A

Which becomes:

z = (A-B-C+D)x^3 + (C-A)x^2 + (B-A)x + A

It’s a cubic equation!

Here is a shadertoy which follows this sampling path on random pixels: https://www.shadertoy.com/view/4sdBz7

Something neat about this example specifically is that a cubic equation has 4 coefficients, which are basically 4 control points. This example makes use of the values of the 4 pixels involved to come up with the 4 coefficients, so “doesn’t leave anything on the table” so to speak.

This is unlike sampling along line segments where you have 3 control points stored in 4 pixel values. One is a bit redundant in that case.

You can make use of that fact (I have for instance!), but sampling along a quadratic path to get a cubic curve feels like a natural fit.

x=u*u, y=u*u

Let’s see what happens when we move along both x and y quadratically.

Just like in the linear case, we have our 3rd variable u that goes from 0 to 1 and we have x and y be based on that variable. We will use these equations:

x=u^2

y=u^2

The sampling path looks like this:

When we plug those in we get this quartic function:

z = (A-B-C+D)u^4 + (B+C-2A)u^2 + A

You might be surprised to see what looks like a linear path. It’s just because at all times, x is the same value as y, even though they travel down the line non linearly.

Shadertoy: https://www.shadertoy.com/view/Xdtfz7

Higher Order Curves: x=3u^2, y=2u^4

Let’s get a little more wild, using these equations:

x=3u^2

y=u^4

Which makes a sampling path of this:

Plugging in the equations, the bilinear interpolation equation:

z = (A-B-C+D)xy + (B-A)x + (C-A)y + A

becomes a hexic equation:

z = (3A-3B-3C+3D)u^6 + (C-A)u^4 + (3B-3A)u^2 + A

The shadertoy visualizes it on random pixels as per usual, but with u going from 0 to 1, it means that x goes from 0 to 3 (y is still 0-1), which makes some obvious discontinuities at the boundaries of pixels. In our pure math formulation, we wouldn’t have any of those, but since we are sampling a real texture, when we leave the safety of our (0,1) box, we enter a new box with different control points. https://www.shadertoy.com/view/4dtfz7

Trigonometric Function: y = sin(2*pi*x)

Let’s try y=sin(2\pi x), which takes this path on the bilinear surface:

The bilinear interpolation equation becomes a trigonometric polynomial:

z = (A-B-C+D)x*sin(2\pi x) + (B-A)x + (C-A)*sin(2\pi x) + A

That has disconuities in it when texture sampling again, due to leaving the original pixel region, so here’s a better looking shadertoy, which is for y=sin(2\pi x)*0.5+0.5. It scales and shifts the y values to be between 0 and 1. https://www.shadertoy.com/view/4stfz7

Circle

Lastly, here’s sampling on a circle.

x=sin(2 \pi u)*0.5+0.5

y=cos(2 \pi u)*0.5+0.5

It follows this path:

Plugging the functions into the power series bilinear equation gives:

z = (A-B-C+D)*(sin(2 \pi u)*0.5+0.5)*(cos(2 \pi u)*0.5+0.5) + (B-A)*(sin(2 \pi u)*0.5+0.5) + (C-A)*(cos(2 \pi u)*0.5+0.5) + A

Here’s the shadertoy: https://www.shadertoy.com/view/Xddfz7

Something neat about sampling in a circle is that it’s continuous – note how the left side of the curves line up with the right side seamlessly. That seems like a pretty useful property.

Moving On

We went off into the weeds a bit, but hopefully you can see how there are a ton of possibilities for encoding and decoding data in a very small number of pixels by carefully crafting the path you sample along.

Compared to the simple “sample along the diagonal” technique, there is some added complexity and shader instructions though. Namely, any work you do to modify x or y before passing them to the linear texture interpolator needs to happen in shader code. That means this technique takes more ALU, but can mean it takes even less texture memory than the other method.

The last question from the top of the post is “What does this all mean in higher dimensional interpolation, like trilinear or quadrilinear?”

Well, it works pretty much the same was as bilinear but there are more dimensions to work with.

We saw that in 2 dimensional bilinear interpolation that when we made x and y be functions (either of each other, or of a 3rd variable u), that the resulting polynomial had a degree that was the degree of x plus the degree of y.

In 3 dimensions with trilinear interpolation, the resulting polynomial would have a degree that is the degree of x, plus the degree of y, plus the degree of z.

In 4 dimensions with quadrilinear, add to that the degree of w.

Let’s consider the case when we don’t want a single curve though, but want a surface or (hyper) volume.

As we’ve seen in the extension dealing with surfaces and volumes, if you have a degree N polynomial, you can break it apart into a multivariate polynomial (aka a surface or hyper volume) so long as the sum of the degrees of each axis adds up to N.

It’s basically what we were just talking about but in reverse.

One thing I think would be interesting to explore further would be to see what the limitations are when you take this “too far”.

For instance, a 2×2 texture can give you a quadratic if you sample along any straight line in the uv coordinates. If you first put the u coordinate through a cubic function, and put the v coordinate through a different cubic function, I think you should be able to make a bicubic surface.

The surface will be constrained to a subset of what a general bicubic surface is able to be shaped like, but you will get a bicubic surface. (basically there will be implicit control points that you don’t have control over unless you add more pixels, and do more sampling, or higher dimensional linear interpolation)

I’d like to see what the constraints there are and see if there’s any chance of getting any real use out of something like that.

Anyhow, thanks for reading! Any ideas, corrections, usage cases you have, whatever, hit me up!

@Atrix256