How To: Data Lookups Via Raytracing

Real time raytracing is seemingly finally here. We have a real time raytracing API in directX, another API in vulkan, hardware support in nvidia cards, and this is only the beginning of this new phase of graphics.

Having hardware and API support for raytracing means we can use that to help with the usual raytraced graphics things – reflection, refraction, shadows and more – but it is also new hardware / API to abuse.

For instance, Inigo Quilez talks about how ray / triangle intersection could be used to do 3×3 linear equation solving: http://www.iquilezles.org/blog/?p=4666

In a similar style of hardware/API abuse (but not related to raytracing), I have shown how to make the linear texture interpolator calculate points on curves and surfaces when storing the control points in texels, and also showed how it can evaluate generic polynomials: https://blog.demofox.org/2016/02/22/gpu-texture-sampler-bezier-curve-evaluation/

In the not too distant future, I think it will be common to use raytracing for data lookups in real time graphics, much like we use textures currently. In fact, from a couple conversations I’ve had with folks on twitter, it seems as though some people are already doing this.

As strange as it sounds, raytracing has a significant advantage over texture lookups. A texture lookup is limited to data stored in a regular grid in pixels. Raytracing data lookups get their data from a mesh, with data stored in vertices (in a vertex buffer).

Storing data in the vertices of a mesh means that you can store data points wherever you want. If you need more detail in one area and less in another, you can put more vertices in the higher detail area, and fewer vertices in the low detail area. You could also store data in a blue noise sampling pattern to help fight aliasing problems you might have with the regular grid of a texture. Furthermore, you could actually have holes in your data for invalid data regions, by having holes in the mesh.

Essentially, a mesh is just a generalization of a texture. You are no longer locked to a grid!

How the data lookups are actually done is not too complex either.

For the 2d case where you have a function f(x,y), you would make a triangle mesh where the (x,y) position of each vertex was the location of a data point, and you would make the z value some constant such as 0.5.

To look up the value of the data for some (x,y) input, you could make a ray that started at (x,y,0) and went in direction (0,0,1). When you did your raytrace, you’d get as a result the triangle index and the barycentric coordinates of that triangle. From there you could look up the data from the 3 vertices of the triangle and use the barycentric coordinates to interpolate between the values. The interpolated value would be your result.

You can see how this process goes in the image below. The purple dot is the query location.

Just as there are volume textures to store 3d data in 3d textures, raytraced data lookups can also be extended to 3d.

For the 3d case where you have a function f(x,y,z), you would make a tetrahedral mesh where the (x,y,z) position of each vertex was the location of a data point.

To look up the value of the data for some (x,y,z) input, you need to be able to find what tetrahedron the point is in. To do this, you just shoot a ray starting at that (x,y,z) position in any arbitrary direction. That ray will hit one triangle in the tetrahedron. You then shoot a ray from the (x,y,z) position in the opposite direction to get a different triangle in the tetrahedron.

From these two triangles, you’ll have 6 vertices but only 4 will be unique. Those 4 vertices are the vertices of the tetrahedron. You can read the data from the vertices, calculate the barycentric coordinates of the point inside the tetrahedron, and then use those to interpolate the vertex data to get the result.

You can see how this process goes in the image below. The purple dot is the query location again.

The most immediate usage case I can think of for this technique would be for diffuse light probe grids. Whether you had a 2d or 3d light probe grid, you’d be able to make probes as dense as needed, or as sparse as you can get away with, in different sections of the geometry. You could also make holes in the mesh to make sure data didn’t interpolate through walls, leading to light leaking. You would use the techniques described above to interpolate the simplex data and get the result.

As this data is likely going to be relatively simple geometry compared to something like an actual game asset, it seems like it ought to be able to be pretty performant too.

Nathan Reed shared a really good idea on twitter, for doing the 3d lookup with only a single raytrace. The idea is that when you knew what triangle you hit, you could look in a table to get the fourth vertex based on whether you hit the triangle from the front or the back. One way to do this would be to have a buffer that had two vertex indices per triangle. The first index would be if you hit from positive, the second would be if you hit from negative.

That way, the index you’d look at would be [triangleIndex*2 + hitBackSide ? 1 : 0]. That data lookup ought to be a lot cheaper than a second raytrace!

Thread:

Can using raytracing to do data lookups extend to 4d and beyond? Probably, but I’m not sure how. Do you know how, or have any interesting usage cases? Share if so, it’d be interesting to hear 🙂

PS – Apparently some folks are using raytracing for GPU physics. I haven’t heard any details of how other people are doing it, but I am looking forward to getting a chance to try it myself. I’m thinking Verlet physics of particles with constraints. That amounts to only needing the current and previous particle positions to get an implicit velocity, and then doing small incremental constraint solving steps to try and make things keep their shape etc. The end result is something like screen space particles / screen space physics, except it would have knowledge of the entire scene, whereas screenspace techniques only have knowledge of the gbuffer. I’ve heard that short ray trace queries run a lot faster (20x?) by not needing to traverse the acceleration structure (BVH) as widely. With luck I’ll give it a try and write a post up about it before too long.

Blending an HDR color into a U8 Buffer

I stumbled on something that I found interesting, so wanted to share in case it was useful for other people too.

The c++ code that generated this images can be found on github at https://github.com/Atrix256/U8HDRPMA

I was implementing Inigo Quilez’ “Better Fog” which is REALLY REALLY cool. It looks way better than even the screenshots he has on his page, especially if you have multiple types of fog (distance fog, height fog, fog volumes):
http://www.iquilezles.org/www/articles/fog/fog.htm

I first had it implemented as a forward render, so was doing the fogging in the regular mesh rendering shader, with all calculations being done in 32 bit floats, writing out the final result to a RGBAU8 buffer. Things looked great and it was good.

I then decided I wanted to ray march the fog and get some light shafts in, so it now became a case where I had a RGBAU8 color render target, and I had the depth buffer that I could read to know pixel world position and apply fog etc.

The result was that I had a fog color that has an HDR fog color (it had color components greater than 1 from being “fake lit”) and I knew how opaque the fog was, so I just needed to lerp the existing pixel color to the HDR fog color by the opacity. The usual alpha blending equation (The “over” operator) is actually a lerp so I tried to use it as one.

Source Blend: Source Alpha
Dest Blend: 1 – Source Alpha
Operation: Add

That becomes this, which is the same as a lerp from DestColor to SrcColor using a lerp amount of SourceAlpha.

\text{DestColor} = \text{DestColor} * (1 - \text{SourceAlpha}) + \text{SrcColor} * \text{SrcAlpha}

BAM, that’s when the problem hit. My image looked very wrong, but only where the fog was thickest and brightest. I was thinking maybe it how i was integrating my fog but it wasn’t. So maybe it was an sRGB thing, but it wasn’t. Maybe it was how i was reconstructing my world position or pixel ray direction due to numerical issues? It wasn’t.

This went on and on until i realized: You can’t say “alpha blend (1.4, 0.3, 2.4) against the color in the U8 buffer using an alpha value of 0.5”. The HDR color is clamped before the alpha blend and you get the wrong result.

You can’t alpha blend an HDR color into a U8 buffer!

… or can you?!

Doing It

As it turns out, premultiplied alpha came to the rescue here, but let’s look at why. As we go, we are going to be modifying this image:

Mathematically speaking, alpha blending works like this:

\text{DestColor} = \text{DestColor} * (1 - \text{SourceAlpha}) + \text{SrcColor} * \text{SrcAlpha}

Using the X axis as alpha, and an overlaid solid color of (1.6, 1.4, 0.8), that gives us this:

However, if you output a float4 from your shader that is \text{float4}(\text{SrcColor}, \text{SrcAlpha}), alpha works like the below, where \text{sat}() clamps values to be between 0 and 1:

\text{DestColor} = \text{DestColor} * (1 - \text{SourceAlpha}) + \text{sat}(\text{SrcColor}) * \text{SrcAlpha}

So what happens, is that SrcColor gets clamped to be between 0 or 1 before the lerp happens, which makes the result much different:

However, using pre-multiplied alpha changes things. The float4 we return from the shader is now \text{float4}(\text{SrcColor*SrcAlpha}, \text{SrcAlpha}).

Our blend operations are now:

Source Blend: One
Dest Blend: 1 – Source Alpha
Operation: Add

That makes the blending equation become this:

\text{DestColor} = \text{DestColor} * (1 - \text{SourceAlpha}) + \text{sat}(\text{SrcColor} * \text{SrcAlpha}) * 1

The \text{sat()} function changed to encompass the whole second term, instead of just SrcColor! That gives this result that matches the one we got when we did the lerp in shader code:

Quick Math

So visually things look fine, but let’s look real quick at the math involved.

If you lerp from 0.5 to 10.0 with a lerp factor of 0.2, you’d get 2.4. The equation for that looks like this:

0.5 * 0.8 + 10.0 * 0.2 = 0.4 + 2.0 = 2.4

This is what happens when doing the math in the forward rendered shader. You then write it out to a U8 buffer, which clips it and writes out a 1.0.

If you use alpha blending, it clamps the 10 to 1.0 before doing the lerp, which means that it lerps from 0.5 to 1.0 with a lerp factor of 0.2. That gives you a result of 0.6 which is VERY incorrect. This is why the HDR color blending to the U8 buffer didn’t work.

If you use premultiplied alpha blending instead, it clamps the 10.0*0.2 to 1, which means that it was 2 but becomes 1, and the result becomes 1.4. That gets clipped to 1.0 so gives you the same result as when doing it during the forward rendering, but allowing you to do it during a second pass.

0.5 * 0.8 + \text{sat}(10.0 * 0.2) = 0.4 + \text{sat}(2.0) = 0.4 + 1.0 = 1.4

This doesn’t just work for these examples or some of the time, it actually works for all inputs, all of the time. The reason for that is, the second term of the lerp is clipped to 0 to 1 and is added to the first term which is always correct. Both terms are always positive. That means that the second term can add the full range of available values (0 to 1) to the first term, and it is correct within that range. That means this technique will either give you the right answer or clip, but will only clip when it is supposed to anyways.

Closing

While I found this useful in a pinch, it’s worth noting that you may just want to use an HDR format buffer for doing this work instead of working in a U8 buffer. The reason why is even though this gives the same answer as doing the work in the shader code, BOTH implementations clip. That is… both implementations SHOULD be writing out values larger than 1.0 but the colors are clamped to being <= 1.0. This is important because if you are doing HDR lit fog (and similar), you probably want to do some sort of tone mapping to remap HDR colors to SDR colors, and once your colors clip, you've lost information that you need to do that remapping.

The red pixels below show where clipping happens:

A Neat Trick For Compressing Networked State Data

I’m not a network programmer, but I do like cool algorithms.

Today at lunch a co-worker mentioned a really cool trick for when you need to continually send state across a network.

For instance, if you had a large struct of data containing information about all the players in the world, projectiles, enemies, and other game objects – and you wanted to send this information to a specific player so their game client could render the world appropriately / do collision detection / etc.

It goes like this:

  1. Get the initial state and send it (compressed).
  2. When it’s time to send an update to the state, XOR it against the previous state, compress that result and send it.
  3. Rinse and repeat.

The magic here is in the assumption that the state as a whole isn’t going to change much from update to update. If this assumption is true, when you do the XOR against the previous state, you are going to end up with a lot of zeroes, which compress very nicely, making for a small data payload.

On the other side, when you receive an update, you would just decompress it, XOR it against the previous state, and use that as the new state.

I thought that was pretty clever.

BTW when sending compressed data across the network, the smaller the message sent, the more of the network data is taken up by the “compression header”. To get an intuition for this, go compress an empty text file and notice how it got bigger. The header is there to give information about how to decompress the data (such as a huffman table, or whatever else).

One way to get around this is to have the server and client pre-agree on a compression header so that when they talk to each other, they can omit it.

Another way to get around this is to use an ADAPTIVE compression header.

What you do is on the client, when you decompress data, you process it using logic to improve an implicit compression header for use with the next communication. If the server does the same (keeping a context for each player individually), and if this code is deterministic, client and server will adapt their compression in the exact same ways. This allows each side to talk to each other without sending compression headers, but still changing compression on demand to match the needs of the specific data being sent.

Anyways, like I said, I’m not a network programmer, so back to graphics for me 🙂

If you like network programming, Glenn Fiedler has written a lot of really interesting things on the subject.
https://gafferongames.com/
@gafferongames

Don’t Convert sRGB U8 to Linear U8!

In this post I’m going to explain something that I have been doing wrong for a while in my at home graphics programming projects, and show you the noticeable loss in image quality it causes.

The C++ code that generated the data and images for this post is on github. https://github.com/Atrix256/RandomCode/tree/master/sRGBPrecision

sRGB vs Linear

Every image that is meant to be displayed on your screen is an sRGB image. That’s what it means to be an sRGB image.

When doing things like applying lighting, generating mip maps, or blurring an image, we need to be in linear space (not sRGB space) so that the operations give results that appear correct on the monitor.

This means an sRGB image needs to be converted to linear space, the operations can then be done in linear space, and then the result needs to be converted back to sRGB space to be displayed on a monitor.

If this is news to you, or you are unsure of the details, this is a good read on the topic: Linear-Space Lighting (i.e. Gamma)

A small example of why this matters is really driven home when you try to interpolate between colors. The image below interpolates from green (0, 1, 0) to red (1, 0, 0)

out_gradients_labeled.

The top row interpolates in sRGB space, meaning it interpolates between those colors and writes out the result without doing any other steps. As you can see, there is a dip in brightness in the middle. That comes from not doing the operation in linear space.

The second row uses gamma 1.8. What is meant by that is that the color components are raised to the power of 1.8 to convert from sRGB to linear, the interpolation happens in linear space, and then they are raised to the power of 1.0/1.8 to convert from linear to sRGB. As you can hopefully see, the result is much better and there is no obvious drop in brightness in the middle.

Getting into and out of linear space isn’t so simple though, as it depends on your display. Most displays use a gamma of 2.2, but some use 1.8. Furthermore, some people do a cheaper approximation of gamma operations using a value of 2.0 which translates into squaring the value to make it linear, and square rooting the value to take it back to sRGB. You can see the difference between those options on the image.

The last row is “sRGB”, which means it uses a standard formula to convert from sRGB to linear, do the interpolation, and then use another standard formula to convert back to sRGB.

You can read more about those formulas here: A close look at the sRGB formula

The Mistake!

The mistake I was making seemed innocent enough to me…

Whenever loading an image that was color information (I’m not talking about normals or roughness maps here, just things that are colors), as part of the loading process I’d take the u8 image (aka 8 bits per channel), and convert it from sRGB to linear, giving a result still in u8.

From there, I’d do my rendering as normal, come up with the results, convert back to linear and go on my way.

Doing this you can look at your results and think “Wow, doing lighting in linear space sure does make it look better!” and you’d be right. But, confirmation bias bites us a bit here. We are missing the fact that by converting to linear, and storing the result in 8 bits means that we lost quite a bit of precision in the dark colors.

Here are some graphs to show the problem. Blue is the input color, red is the color after converting to linear u8, and then back to sRGB u8. Yellow is the difference between the two. (If wondering why I’m showing round trip instead of just 1 way, think about what you are going to do with the linear u8 image. You are going to use it for something, then convert it back to sRGB u8 for the display!)

Gamma 1.8

Gamma 2.0

Gamma 2.2

sRGB

As you can see, there is quite a bit of error in the lower numbers! This translates to error in the darker colors, or just any color which has a lower numbered color component.

The largest amount of error comes up in gamma 2.2. sRGB has lower initial error, but has more error after that. I would bet that was a motivation for the sRGB formulas, to spread the error out a bit better at the front.

Even though gamma 2.2 and sRGB looked really similar in the green to red color interpolation, this shows a reason you may prefer to use the sRGB formulas instead.

Another way of thinking about these graphs is that there are a quite a few input numbers that get clamped to zero. At gamma 1.8, an input u8 value of 12 (aka 0.047) has to be reached before the output is non zero. At gamma 2.0, that value is 16. At gamma 2.2 it’s 21. At sRGB it’s 13.

Showing graphs and talking about numbers is one thing, but looking at images is another, so let’s check it out!

Below are images put through the round trip process, along with the error shown. I multiplied the error by 8 to make it easier to see.

Gamma 1.8

Gamma 1.8 isn’t the most dramatic of the tests but you should be able to see a difference.

Error*8:

Gamma 2.0

Gamma 2.0 is a bit more noticeable.

Error*8:

Gamma 2.2

Gamma 2.2 is a lot more noticeable, and even has some noticeable sections of the images turning from dark colors into complete blackness.

Error*8:

sRGB

sRGB seems basically as bad as Gamma 2.2 to me, despite what the graphs showed earlier.

Error*8:

Since this dark image was basically a “worst case scenario”, you might wonder how the round trip operation treats a more typical image.

It actually has very little effect, except in the areas of shadows. (these animated gifs do show more color banding than they should, and some other compression artifacts. Check out the source images in github to get a clean view of the differences!)

Gamma 1.8

Error*8:

Gamma 2.0

Error*8:

Gamma 2.0

Error*8:

sRGB

Error*8:

So What Do We Do?

So, while we must work in linear space, converting our sRGB u8 source images into linear u8 source images causes problems with dark colors. What do we do?

Well there are two solutions, depending on what you are trying to do…

If you are going to be using the image in a realtime rendering context, your API will have texture format types that let you specify that a texture is sRGB and needs to be converted to linear before being used. In directx, you would use DXGI_FORMAT_R8G8B8A8_UNORM_SRGB instead of DXGI_FORMAT_R8G8B8A8_UNORM for instance.

If you are going to be doing a blur or generating mip maps, one solution is that you convert from sRGB u8 to linear f32, do your operation, and then convert from linear f32 back to sRGB u8 and write out the results. In other words, you do your linear operations with floating point numbers so that you never have the precision loss from converting linear values to u8.

You can also do your operations in u16 instead of u8 apparently, and also f16 which is a half float.

The takeaway is that you should “never” (there are always exceptions) store linear color data as uint8 – whether in memory, on disk, or anywhere else.

I’ve heard that u12 is enough for storage though, for what that’s worth.

Links Etc

Thanks @romainguy for suggesting a color interpolation for the opening image of this post. It’s a great, simple example for seeing why sRGB vs linear operations matter.

Here is some more info on sRGB and related things from Bart Wronski (@BartWronsk):

Part 1 – https://bartwronski.com/2016/08/29/localized-tonemapping/

Part 2 – https://bartwronski.com/2016/09/01/dynamic-range-and-evs/

And this great presentation from Timothy Lottes (@TimothyLottes)

Advanced Techniques and Optimization of HDR Color Pipelines

This from Matt Pettineo (@MyNameIsMJP) is also very much on topic for this post:

https://www.gamedev.net/forums/topic/692667-tone-mapping/?page=3&tab=comments#comment-5360306

What the Heck is Blue Noise?

This is a gentle explanation of blue noise and how it can be useful.

We’ll start with something simple that we can all get behind – not getting eaten by a cheetah!

Let’s talk about our eyes for a minute.

Our eyes have about 126 million photo receptors in them – about 6 million cones, 120 million rods (source). These photo receptors give your brain an image of the world around you. They are a bit like pixels because they are just small points of data that your brain combines into an image.

How those photo receptors are arranged in your eye can make a big difference. Imagine for a second that we only had 10 photo receptors. If they were laid out like these blue dots, we wouldn’t be able to see the cheetah and we’d become a tasty cat snack.

In the image above, white noise random numbers were used to place the points. White noise is what most people are talking about when they talk about random numbers. Using white noise to generate numbers, the numbers can clump up in some spots and leave empty holes in other spots. When using white noise to lay out photo receptors, that makes it so some photo receptors give redundant information when they are too close together, and leave big open spaces in your vision where you are not getting any information at all. Not good!

What if the dots were laid out like this instead?

The points are still randomly placed, but they are roughly evenly spaced. This makes it so we get the most bang for our buck from the photo receptors. We basically have the maximum amount of information we can get for the number of photo receptors we have to work with.

In this case, two of the photo receptors are on the cat, so we have some information about that predator, and we have a better chance at reacting before we become lunch!

Blue noise random numbers were used to place the points on this image, and this example shows exactly why blue noise can be better than white noise – you get maximal information with fewer samples.

Interestingly, our photo receptors (as well as other animals) are in fact laid out this way. Here is an image of a primate (macaque) retina (source)

You might also find this an interesting read about chicken eyes which also have blue noise properties:
https://www.princeton.edu/news/2014/02/24/eye-chicken-new-state-matter-comes-view

That’s blue noise in a nut shell, but continue on if you’d like to go just a tad bit deeper.

A Little More Technical

If maximizing information is the goal, you might wonder why blue noise is better than putting the sample points in a grid, or in a honeycomb structure or some other regular pattern. The short answer is that regular patterns have a problem called “aliasing”. Random numbers in general trade the problem of aliasing for the problem of noise, but blue noise random numbers in particular still get the benefits of “roughly even coverage”, so blue noise is the best of both worlds.

Blue noise is difficult / computationally intensive to generate though, compared to white noise or regular sampling. Generating better blue noise more efficient is in fact is an ongoing area of research!

For a deeper comparison of white noise, blue noise, and regular sampling, and also how to generate blue noise sample points, give this a read: https://blog.demofox.org/2017/10/20/generating-blue-noise-sample-points-with-mitchells-best-candidate-algorithm/

If you want at least some of the benefits of blue noise, but don’t want to spend the resources to compute it, a nice alternative might be low discrepancy sequences. You can read about them (and how to generate them) here: https://blog.demofox.org/2017/05/29/when-random-numbers-are-too-random-low-discrepancy-sequences/

You often hear about blue noise and low discrepancy sequences in graphics / in numerical integration. For low sample counts, the blue noise / LDS’s give you more even spaces for your samples in the sampling domain, but I’ve heard that white noise gives you better results for larger sample counts.

There is a whole rainbow of noises possible, each with their own unique usage cases. If you want to know a way to transmute white noise to other colors of noises, give this a read: https://blog.demofox.org/2017/10/25/transmuting-white-noise-to-blue-red-green-purple/

Lastly, the other day I found out that Tempurpedic beds are the best, because they have some secret formula/process they bought from NASA. This recipe allows them to make memory foam such that the bubbles are all roughly the same size. The foam is not arranged into any regular structure such as a grid or a honeycomb, so in essence, the memory foam is blue noise. More specifically, it’s basically the Voronoi diagram of blue noise distributed sample points in 3d.

So, Tempurpedic is the best because they have blue noise foam.

Weird, right?!

Demystifying Floating Point Precision

Floating point numbers have limited precision. If you are a game programmer, you have likely encountered bugs where things start breaking after too much time has elapsed, or after something has moved too far from the origin.

This post aims to show you how to answer the questions:

  1. What precision do I have at a number?
  2. When will I hit precision issues?

First, a very quick look at the floating point format.

Floating Point Format

Floating point numbers (Wikipedia: IEEE 754) have three components:

  1. Sign bit – whether the number is positive or negative
  2. Exponent bits – the magnitude of the number
  3. Mantissa bits – the fractional bits

32 bit floats use 1 bit for sign, 8 bits for exponent and 23 bits for mantissa. Whatever number is encoded in the exponent bits, you subtract 127 to get the actual exponent, meaning the exponent can be from -126 to +127.

64 bit doubles use 1 bit for sign, 11 bits for exponent and 52 bits for mantissa. Whatever number is encoded in the exponent bits, you subtract 1023 to get the actual exponent, meaning the exponent can be from -1022 to +1023.

16 bit half floats use 1 bit for sign, 5 bits for exponent and 10 bits for mantissa. Whatever number is encoded in the exponent bits, you subtract 15 to get the actual exponent, meaning the exponent can be from -14 to +15.

For all of the above, an exponent of all zeros has the special meaning “exponent 0” (and this is where the denormals / subnormals come into play) and all ones has the special meaning “infinity”

The exponent bits tell you which power of two numbers you are between – [2^{exponent}, 2^{exponent+1}) – and the mantissa tells you where you are in that range.

What precision do I have at a number?

Let’s look at the number 3.5.

To figure out the precision we have at that number, we figure out what power of two range it’s between and then subdivide that range using the mantissa bits.

3.5 is between 2 and 4. That means we are diving the range of numbers 2 to 4 using the mantissa bits. A float has 23 bits of mantissa, so the precision we have at 3.5 is:

\frac{4-2}{2^{23}} = \frac{2}{8388608} \approx 0.000000238418579

3.5 itself is actually exactly representable by a float, double or half, but the amount of precision numbers have at that scale is that value. The smallest number you can add or subtract to a value between 2 and 4 is that value. That is the resolution of the values you are working with when working between 2 and 4 using a float.

Using a double instead of a float gives us 52 bits of mantissa, making the precision:

\frac{4-2}{2^{52}} = \frac{2}{4503599627370496} \approx 0.00000000000000044408921

Using a half float with 10 bits of mantissa it becomes:

\frac{4-2}{2^{10}} = \frac{2}{1024} = 0.001953125

Here’s a table showing the amount of precision you get with each data type at various exponent values. N/A is used when an exponent is out of range for the specific data type.

\begin{array}{c|c|c|c|c} exponent & range & half & float & double \\ \hline 0 & [1,2) & 0.0009765625 & 0.00000011920929 & 0.0000000000000002220446 \\ 1 & [2,4) & 0.001953125 & 0.000000238418579 & 0.00000000000000044408921 \\ 2 & [4,8) & 0.00390625 & 0.000000476837158 & 0.00000000000000088817842 \\ 9 & [512, 1024) & 0.5 & 0.00006103515 & 0.00000000000011368684 \\ 10 & [1024,2048) & 1 & 0.00012207031 & 0.00000000000022737368 \\ 11 & [2048,4096) & 2 & 0.00024414062 & 0.00000000000045474735 \\ 12 & [4096,8192) & 4 & 0.00048828125 & 0.0000000000009094947 \\ 15 & [32768, 65536) & 32 & 0.00390625 & 0.0000000000072759576 \\ 16 & [65536, 131072) & N/A & 0.0078125 & 0.0000000000014551915 \\ 17 & [131072, 262144) & N/A & 0.015625 & 0.00000000002910383 \\ 18 & [262144, 524288) & N/A & 0.03125 & 0.000000000058207661 \\ 19 & [524288, 1048576) & N/A & 0.0625 & 0.00000000011641532 \\ 23 & [8388608,16777216) & N/A & 1 & 0.00000000186264515 \\ 52 & [4503599627370496, 9007199254740992) & N/A & 536870912 & 1 \\ \end{array}

A quick note on the maximum number you can store in floating point numbers, by looking at the half float specifically:

A half float has a maximum exponent of 15, which you can see above puts the number range between 32768 and 65536. The precision is 32 which is the smallest step that can be made in a half float at that scale. That range includes the smaller number but not the larger number. That means that the largest number a half float can store is one step away (32) from the high side of that range. So, the largest number that can be stored is 65536 – 32 = 65504.

How Many Digits Can I Rely On?

Another helpful way of looking at floating point precision is how many digits of precision you can rely on.

A float has 23 bits of mantissa, and 2^23 is 8,388,608. 23 bits let you store all 6 digit numbers or lower, and most of the 7 digit numbers. This means that floating point numbers have between 6 and 7 digits of precision, regardless of exponent.

That means that from 0 to 1, you have quite a few decimal places to work with. If you go into the hundreds or thousands, you’ve lost a few. When you get up into the tens of millions, you’ve run out of digits for anything beyond the decimal place.

You can actually see that this is true in the table in the last section. With floating point numbers, it’s at exponent 23 (8,388,608 to 16,777,216) that the precision is at 1. The smallest value that you can add to a floating point value in that range is in fact 1. It’s at this point that you have lost all precision to the right of the decimal place. Interestingly, you still have perfect precision of the integers though.

Half floats have 10 mantissa bits and 2^10 = 1024, so they just barely have 3 digits of precision.

Doubles have 52 mantissa bits and 2^52 = 4,503,599,627,370,496. That means doubles have between 15 and 16 digits of precision.

This can help you understand how precision will break down for you when using a specific data type for a specific magnitude of numbers.

When will I hit precision issues?

Besides the loose rules above about how many digits of precision you can count on, you can also solve to see when precision will break down for you.

Let’s say that you are tracking how long your game has been running (in seconds), and you do so by adding your frame delta (in seconds) to a variable every frame.

If you have a 30fps game, your frame delta is going to be 0.0333.

Adding that each frame to a float will eventually cause the float to reach a value where that number is smaller than the smallest difference representable (smaller than the precision), at which point things will start breaking. At first your accuracy will drop and your time will be wrong, but eventually adding your frame delta to the current time won’t even change the value of the current time. Time will effectively stop!

When will this happen though?

We’ll start with the formula we saw earlier and do one step of simple algebra to get us an equation which can give us this answer.

\frac{range}{mantissa} = precision \\ \\ range = mantissa * precision

How we use this formula is we put the precision we want into “precision” and we put the size of the mantissa (2^{MantissaBits}) into “mantissa”. The result tells us the range that we’ll get the precision at.

Let’s plug in our numbers:

range = 8388608 * 0.0333 = 279340.6464

This tells us the range of the floating point numbers where we’ll have our problems, but this isn’t the value that we’ll have our problems at, so we have another step to do. We need to find what exponent has this range.

Looking at the table earlier in the post you might notice that the range at an exponent also happens to be just 2^{exponent}.

That’s helpful because that just means we take log2 of the answer we got:

log2(279340.6464) = 18.0916659875

Looking at the table again, we can see that floating point numbers have a precision of 0.03125 at exponent value 18. So, exponent 18 is close, but it’s precision is smaller than what we want – aka the precision is still ok.

That means we need to ceil() the number we got from the log2.

Doing that, we see that things break down at exponent 19, which has precision of 0.0625. This actual value it has this problem at is 528,288 (which is 2^{19}).

So, our final formula for “where does precision become this value?” becomes:

value = pow(2, ceil(log2(mantissa * precision)))

Note that at exponent 18, there is still imprecision happening. When adding 1/30 to 264144, It goes from 264144 to 264144.031 to 264144.063, instead of 264144, 264144.033, 264144.066. There is error, but it’s fairly small.

At exponent 19 though, things fall apart a lot more noticeably. When adding 1/30 to 528288, it goes from 528288 to 528288.063 to 528288.125. Time is actually moving almost twice as fast in this case!

At exponent 20, we start at 1056576.00 and adding 1/30 doesn’t even change the value. Time is now stopped.

It does take 6.1 days (528,288 seconds) to reach exponent 19 though, so that’s quite a long time.

If we use half floats, it falls apart at value 64. That’s right, it only takes 64 seconds for this to fall apart when using 16 bit half floats, compared to 6.1 days when using 32 bit floats!

With doubles, it falls apart at value 281,474,976,710,656. That is 8,925,512 years!

Let’s check out that equation again:

value = pow(2, ceil(log2(mantissa * precision)))

A possibly more programmer friendly way to do the above would be to calculate mantissa * precision and then round up to the next power of 2. That’s exactly what the formula is doing above, but in math terms, not programming terms.

Storing Integers

I recently learned that floating point numbers can store integers surprisingly well. It blows my mind that I never knew this. Maybe you are in the same boat 😛

Here’s the setup:

  1. For any exponent, the range of numbers it represents is a power of 2.
  2. The mantissa will always divide that range into a power of 2 different values.

It might take some time and/or brain power to soak that up (it did for me!) but what that ends up ultimately meaning is that floating point numbers can exactly represent a large number of integers.

In fact, a floating point number can EXACTLY store all integers from -2^{MantissaBits+1} to +2^{MantissaBits+1}.

For half floats that means you can store all integers between (and including) -2048 to +2048. (\pm 2^{11})

For floats, it’s -16,777,216 to +16,777,216. (\pm 2^{24})

For doubles it’s -9,007,199,254,740,992 to +9,007,199,254,740,992. (\pm 2^{53})

Doubles can in fact exactly represent any 32 bit unsigned integer, since 2^32 = 4,294,967,296.

Links

Here are some links you might find interesting!

Floating point visually explained:
http://fabiensanglard.net/floating_point_visually_explained/

What Every Computer Scientist Should Know About Floating-Point Arithmetic:
https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

A matter of precision:
http://tomforsyth1000.github.io/blog.wiki.html#[[A%20matter%20of%20precision]]

Denormal numbers – aka very small numbers that make computations slow when you use them:
https://en.m.wikipedia.org/wiki/Denormal_number

Catastrophic Cancellation – a problem you can run into when doing floating point math:
https://en.wikipedia.org/wiki/Loss_of_significance

A handy web page that lets you play with the binary representation of a float and what number it comes out as:
https://www.h-schmidt.net/FloatConverter/IEEE754.html

Half precision floating point format:
https://en.wikipedia.org/wiki/Half-precision_floating-point_format

What is the first integer that a float is incapable of representing?
https://stackoverflow.com/questions/3793838/which-is-the-first-integer-that-an-ieee-754-float-is-incapable-of-representing-e

Ready to go deeper? Bruce Dawson has some amazing write ups on deeper floating point issues:
https://randomascii.wordpress.com/category/floating-point/

This talks about how to use floating point precision limits as an activation function in a neural network (?!)
https://blog.openai.com/nonlinear-computation-in-linear-networks/

Animating Noise For Integration Over Time 2: Uniform Over Time

After I put out the last post, Mikkel Gjoel (@pixelmager), made an interesting observation that you can see summarized in his image below. (tweet / thread here)

BTW Mikkel has an amazing presentation about rendering the beautiful game “Inside” that you should check out. Lots of interesting techniques used, including some enlightening uses of noise.
YouTube –
Low Complexity, High Fidelity: The Rendering of INSIDE

The images left to right are:

  • One frame of white noise
  • N frames of white noise averaged.
  • N frames averaged where the first frame is white noise, and a per frame random number is added to all pixels every frame.
  • N frames averaged where the first frame is white noise, and 1/N is added to all pixels every frame.
  • N frames averaged where the first frame is white noise, and the golden ratio is added to all pixels every frame.

In the above, the smoother and closer to middle grey that an image is, the better it is – that means it converged to the true result of the integral better.

Surprisingly it looks like adding 1/N outperforms the golden ratio, which means that regular spaced samples are outperforming a low discrepancy sequence!

To compare apples to apples, we’ll do the “golden ratio” tests we did last post, but instead do them with adding this uniform value instead.

To be explicit, there are 8 frames and they are:

  • Frame 0: The noise
  • Frame 1: The noise + 1/8
  • Frame 2: The noise + 2/8
  • Frame 7: the noise + 7/8

Modulus is used to keep the values between 0 and 1.

Below is how white noise looks animated with golden ratio (top) vs uniform values (bottom). There are 8 frames and it’s played at 8fps so it loops every second.

Interleaved Gradient Noise. Top is golden ratio, bottom is uniform.

Blue Noise. Top is golden ratio, bottom is uniform.

The uniform ones look pretty similar. Maybe a little smoother, but it’s hard to tell by just looking at it. Interestingly, the frequency content of the blue noise seems more stable using these uniform values instead of golden ratio.

The histogram data of the noise was the same for all frames of animation, just like in last post, which is a good thing. The important bit is that adding a uniform value doesn’t modify the histogram shape, other than changing which counts go to which specific buckets. Ideally the histogram would start out perfectly even like the blue noise does, but since this post is about the “adding uniform values” process, and not about the starting noise, this shows that the process does the right thing with the histogram.

  • White Noise – min 213, max 306, average 256, std dev 16.51
  • Interleaved Gradient Noise – min 245, max 266, average 256, std dev 2.87
  • Blue Noise – min, max, average are 256, std dev 0.

Let’s look at the integrated animations.

White noise. Top is golden ratio, bottom is uniform.

Interleaved gradient noise. Top is golden ratio, bottom is uniform.

Blue noise. Top is golden ratio, bottom is uniform.

The differences between these animations are subtle unless you know what you are looking for specifically so let’s check out the final frames and the error graphs.

Each noise comparison below has three images. The first image is the “naive” way to animate the noise. The second uses golden ratio instead. The third one uses 1/N. The first two images (and techniques) are from (and explained in) the last post, and the third image is the technique from this post.

White noise. Naive (top), golden ratio (mid), uniform (bottom).


Interleaved gradient noise. Naive (top), golden ratio (mid), uniform (bottom).


Blue noise. Naive (top), golden ratio (mid), uniform (bottom).


So, what’s interesting is that the uniform sampling over time has lower error and standard deviation (variance) than golden ratio, which has less than the naive method. However, it’s only at the end that the uniform sampling over time has the best results, and it’s actually quite terrible until then.

The reason for this is that uniform has good coverage over the sample space, but it takes until the last frame to get that good coverage because each frame takes a small step over the remaining sample space.

What might work out better would be if our first frame was the normal noise, but then the second frame was the normal noise plus a half, so we get the most information we possibly can from that sample by splitting the sample space in half. We would then want to cut the two halves of the space space in half, and so the next two frames would be the noise plus 1/4 and the noise plus 3/4. We would then continue with 1/8, 5/8, 3/8 and 7/8 (note we didn’t do these 1/8 steps in order. We did it in the order that gives us the most information the most quickly!). At the end of all this, we would have our 8 uniformly spaced samples over time, but we would have taken the samples in an order that makes our intermediate frames look better hopefully.

Now, interestingly, that number sequence I just described has a name. It’s the base 2 Van Der Corput sequence, which is a type of low discrepancy sequence. It’s also the 1D version of the Halton sequence, and is related to other sequences as well. More info here: When Random Numbers Are Too Random: Low Discrepancy Sequences

Mikkel mentioned he thought this would be helpful, and I was thinking the same thing too. Let’s see how it does!

White noise. Uniform (top), Van Der Corput (bottom).

Interleaved gradient noise. Uniform (top), Van Der Corput (bottom).

Blue noise. Uniform (top), Van Der Corput (bottom).

The final frames look the same as before (and the same as each other), so I won’t show those again but here are the updated graphs.



Interestingly, using the Van Der Corput sequence has put intermediate frames more in line with golden ratio, while of course still being superior at the final frame.

I’ve been trying to understand why uniform sampling over time out performs the golden ratio which acts more like blue noise over time. I still don’t grasp why it works as well as it does, but the proof is in the pudding.

Theoretically, this uniform sampling over time should lead to the possibility of aliasing on the time axis, since blue noise / white noise (and other randomness) get rid of the aliasing in exchange for noise.

Noise over the time dimension would mean missing details that were smaller than the sample spacing size. in our case, we are using the time sampled values (noise + uniform value) to threshold a source image to make a sample. It may be that since we are thresholding, that aliasing isn’t possible since our sample represents everything below or equal to the value?

I’m not really sure, but will be thinking about it for a while. If you have any insights please let me know!

It would be interesting to try an actual 1d blue noise sequence and see how it compares. If it does better, it sounds like it would be worth while to try jittering the uniform sampled values on the time axis to try and approximate blue noise a bit. Mikkel tried the jittering and said it gave significantly worse results, so that seems like a no go.

Lastly, some other logical experiments from here seem to be…

  • See how other forms of noise and ordered dithers do, including perhaps a Bayer Matrix. IG noise seems to naturally do better on the time axis for some reason I don’t fully understand yet. There may be some interesting properties of other noise waiting to be found.
  • Do we get any benefits in this context by arranging the interleaved gradient noise in a spiral like Jorge mentions in his presentation? (Next Generation Post Processing In Call Of Duty: Advanced Warfare
  • It would be interesting to see how this works in a more open ended case – such as if you had temporal AA which was averaging a variable number of pixels each frame. Would doing a van Der Corput sequence give good results there? Would you keep track of sample counts per pixel and keep marching the Van Der Corput forward for each pixel individually? Or would you just pick something like an 8 Van Der Corput sequence, adding the current sequence to all pixels and looping that sequence every 8 frames? It really would be interesting to see what is best in that sort of a setup.

I’m sure there are all sorts of other things to try to. This is a deep, interesting and important topic for graphics and beyond (:

Code

Source code below, but it’s also available on github, along with the source images used: Github:
Atrix256/RandomCode/AnimatedNoise

#define _CRT_SECURE_NO_WARNINGS

#include <windows.h>  // for bitmap headers.  Sorry non windows people!
#include <stdint.h>
#include <vector>
#include <random>
#include <atomic>
#include <thread>
#include <complex>
#include <array>

typedef uint8_t uint8;

const float c_pi = 3.14159265359f;

// settings
const bool c_doDFT = true;

// globals 
FILE* g_logFile = nullptr;

//======================================================================================
inline float Lerp (float A, float B, float t)
{
    return A * (1.0f - t) + B * t;
}

//======================================================================================
struct SImageData
{
    SImageData ()
        : m_width(0)
        , m_height(0)
    { }
   
    size_t m_width;
    size_t m_height;
    size_t m_pitch;
    std::vector<uint8> m_pixels;
};
 
//======================================================================================
struct SColor
{
    SColor (uint8 _R = 0, uint8 _G = 0, uint8 _B = 0)
        : R(_R), G(_G), B(_B)
    { }

    inline void Set (uint8 _R, uint8 _G, uint8 _B)
    {
        R = _R;
        G = _G;
        B = _B;
    }
 
    uint8 B, G, R;
};

//======================================================================================
struct SImageDataComplex
{
    SImageDataComplex ()
        : m_width(0)
        , m_height(0)
    { }
  
    size_t m_width;
    size_t m_height;
    std::vector<std::complex<float>> m_pixels;
};
 
//======================================================================================
std::complex<float> DFTPixel (const SImageData &srcImage, size_t K, size_t L)
{
    std::complex<float> ret(0.0f, 0.0f);
  
    for (size_t x = 0; x < srcImage.m_width; ++x)
    {
        for (size_t y = 0; y < srcImage.m_height; ++y)
        {
            // Get the pixel value (assuming greyscale) and convert it to [0,1] space
            const uint8 *src = &srcImage.m_pixels[(y * srcImage.m_pitch) + x * 3];
            float grey = float(src[0]) / 255.0f;
  
            // Add to the sum of the return value
            float v = float(K * x) / float(srcImage.m_width);
            v += float(L * y) / float(srcImage.m_height);
            ret += std::complex<float>(grey, 0.0f) * std::polar<float>(1.0f, -2.0f * c_pi * v);
        }
    }
  
    return ret;
}
  
//======================================================================================
void ImageDFT (const SImageData &srcImage, SImageDataComplex &destImage)
{
    // NOTE: this function assumes srcImage is greyscale, so works on only the red component of srcImage.
    // ImageToGrey() will convert an image to greyscale.
 
    // size the output dft data
    destImage.m_width = srcImage.m_width;
    destImage.m_height = srcImage.m_height;
    destImage.m_pixels.resize(destImage.m_width*destImage.m_height);
 
    size_t numThreads = std::thread::hardware_concurrency();
    //if (numThreads > 0)
        //numThreads = numThreads - 1;
 
    std::vector<std::thread> threads;
    threads.resize(numThreads);
 
    printf("Doing DFT with %zu threads...\n", numThreads);
 
    // calculate 2d dft (brute force, not using fast fourier transform) multithreadedly
    std::atomic<size_t> nextRow(0);
    for (std::thread& t : threads)
    {
        t = std::thread(
            [&] ()
            {
                size_t row = nextRow.fetch_add(1);
                bool reportProgress = (row == 0);
                int lastPercent = -1;
 
                while (row < srcImage.m_height)
                {
                    // calculate the DFT for every pixel / frequency in this row
                    for (size_t x = 0; x < srcImage.m_width; ++x)
                    {
                        destImage.m_pixels[row * destImage.m_width + x] = DFTPixel(srcImage, x, row);
                    }
 
                    // report progress if we should
                    if (reportProgress)
                    {
                        int percent = int(100.0f * float(row) / float(srcImage.m_height));
                        if (lastPercent != percent)
                        {
                            lastPercent = percent;
                            printf("            \rDFT: %i%%", lastPercent);
                        }
                    }
 
                    // go to the next row
                    row = nextRow.fetch_add(1);
                }
            }
        );
    }
 
    for (std::thread& t : threads)
        t.join();
 
    printf("\n");
}
 
//======================================================================================
void GetMagnitudeData (const SImageDataComplex& srcImage, SImageData& destImage)
{
    // size the output image
    destImage.m_width = srcImage.m_width;
    destImage.m_height = srcImage.m_height;
    destImage.m_pitch = 4 * ((srcImage.m_width * 24 + 31) / 32);
    destImage.m_pixels.resize(destImage.m_pitch*destImage.m_height);
  
    // get floating point magnitude data
    std::vector<float> magArray;
    magArray.resize(srcImage.m_width*srcImage.m_height);
    float maxmag = 0.0f;
    for (size_t x = 0; x < srcImage.m_width; ++x)
    {
        for (size_t y = 0; y < srcImage.m_height; ++y)
        {
            // Offset the information by half width & height in the positive direction.
            // This makes frequency 0 (DC) be at the image origin, like most diagrams show it.
            int k = (x + (int)srcImage.m_width / 2) % (int)srcImage.m_width;
            int l = (y + (int)srcImage.m_height / 2) % (int)srcImage.m_height;
            const std::complex<float> &src = srcImage.m_pixels[l*srcImage.m_width + k];
  
            float mag = std::abs(src);
            if (mag > maxmag)
                maxmag = mag;
  
            magArray[y*srcImage.m_width + x] = mag;
        }
    }
    if (maxmag == 0.0f)
        maxmag = 1.0f;
  
    const float c = 255.0f / log(1.0f+maxmag);
  
    // normalize the magnitude data and send it back in [0, 255]
    for (size_t x = 0; x < srcImage.m_width; ++x)
    {
        for (size_t y = 0; y < srcImage.m_height; ++y)
        {
            float src = c * log(1.0f + magArray[y*srcImage.m_width + x]);
  
            uint8 magu8 = uint8(src);
  
            uint8* dest = &destImage.m_pixels[y*destImage.m_pitch + x * 3];
            dest[0] = magu8;
            dest[1] = magu8;
            dest[2] = magu8;
        }
    }
}

//======================================================================================
bool ImageSave (const SImageData &image, const char *fileName)
{
    // open the file if we can
    FILE *file;
    file = fopen(fileName, "wb");
    if (!file) {
        printf("Could not save %s\n", fileName);
        return false;
    }
   
    // make the header info
    BITMAPFILEHEADER header;
    BITMAPINFOHEADER infoHeader;
   
    header.bfType = 0x4D42;
    header.bfReserved1 = 0;
    header.bfReserved2 = 0;
    header.bfOffBits = 54;
   
    infoHeader.biSize = 40;
    infoHeader.biWidth = (LONG)image.m_width;
    infoHeader.biHeight = (LONG)image.m_height;
    infoHeader.biPlanes = 1;
    infoHeader.biBitCount = 24;
    infoHeader.biCompression = 0;
    infoHeader.biSizeImage = (DWORD) image.m_pixels.size();
    infoHeader.biXPelsPerMeter = 0;
    infoHeader.biYPelsPerMeter = 0;
    infoHeader.biClrUsed = 0;
    infoHeader.biClrImportant = 0;
   
    header.bfSize = infoHeader.biSizeImage + header.bfOffBits;
   
    // write the data and close the file
    fwrite(&header, sizeof(header), 1, file);
    fwrite(&infoHeader, sizeof(infoHeader), 1, file);
    fwrite(&image.m_pixels[0], infoHeader.biSizeImage, 1, file);
    fclose(file);
  
    return true;
}

//======================================================================================
bool ImageLoad (const char *fileName, SImageData& imageData)
{
    // open the file if we can
    FILE *file;
    file = fopen(fileName, "rb");
    if (!file)
        return false;
 
    // read the headers if we can
    BITMAPFILEHEADER header;
    BITMAPINFOHEADER infoHeader;
    if (fread(&header, sizeof(header), 1, file) != 1 ||
        fread(&infoHeader, sizeof(infoHeader), 1, file) != 1 ||
        header.bfType != 0x4D42 || infoHeader.biBitCount != 24)
    {
        fclose(file);
        return false;
    }
 
    // read in our pixel data if we can. Note that it's in BGR order, and width is padded to the next power of 4
    imageData.m_pixels.resize(infoHeader.biSizeImage);
    fseek(file, header.bfOffBits, SEEK_SET);
    if (fread(&imageData.m_pixels[0], imageData.m_pixels.size(), 1, file) != 1)
    {
        fclose(file);
        return false;
    }
 
    imageData.m_width = infoHeader.biWidth;
    imageData.m_height = infoHeader.biHeight;
    imageData.m_pitch = 4 * ((imageData.m_width * 24 + 31) / 32);
 
    fclose(file);
    return true;
}

//======================================================================================
void ImageInit (SImageData& image, size_t width, size_t height)
{
    image.m_width = width;
    image.m_height = height;
    image.m_pitch = 4 * ((width * 24 + 31) / 32);
    image.m_pixels.resize(image.m_pitch * image.m_height);
    std::fill(image.m_pixels.begin(), image.m_pixels.end(), 0);
}

//======================================================================================
template <typename LAMBDA>
void ImageForEachPixel (SImageData& image, const LAMBDA& lambda)
{
    size_t pixelIndex = 0;
    for (size_t y = 0; y < image.m_height; ++y)
    {
        SColor* pixel = (SColor*)&image.m_pixels[y * image.m_pitch];
        for (size_t x = 0; x < image.m_width; ++x)
        {
            lambda(*pixel, pixelIndex);
            ++pixel;
            ++pixelIndex;
        }
    }
}

//======================================================================================
template <typename LAMBDA>
void ImageForEachPixel (const SImageData& image, const LAMBDA& lambda)
{
    size_t pixelIndex = 0;
    for (size_t y = 0; y < image.m_height; ++y)
    {
        SColor* pixel = (SColor*)&image.m_pixels[y * image.m_pitch];
        for (size_t x = 0; x < image.m_width; ++x)
        {
            lambda(*pixel, pixelIndex);
            ++pixel;
            ++pixelIndex;
        }
    }
}

//======================================================================================
void ImageConvertToLuma (SImageData& image)
{
    ImageForEachPixel(
        image,
        [] (SColor& pixel, size_t pixelIndex)
        {
            float luma = float(pixel.R) * 0.3f + float(pixel.G) * 0.59f + float(pixel.B) * 0.11f;
            uint8 lumau8 = uint8(luma + 0.5f);
            pixel.R = lumau8;
            pixel.G = lumau8;
            pixel.B = lumau8;
        }
    );
}

//======================================================================================
void ImageCombine2 (const SImageData& imageA, const SImageData& imageB, SImageData& result)
{
    // put the images side by side. A on left, B on right
    ImageInit(result, imageA.m_width + imageB.m_width, max(imageA.m_height, imageB.m_height));
    std::fill(result.m_pixels.begin(), result.m_pixels.end(), 0);

    // image A on left
    for (size_t y = 0; y < imageA.m_height; ++y)
    {
        SColor* destPixel = (SColor*)&result.m_pixels[y * result.m_pitch];
        SColor* srcPixel = (SColor*)&imageA.m_pixels[y * imageA.m_pitch];
        for (size_t x = 0; x < imageA.m_width; ++x)
        {
            destPixel[0] = srcPixel[0];
            ++destPixel;
            ++srcPixel;
        }
    }

    // image B on right
    for (size_t y = 0; y < imageB.m_height; ++y)
    {
        SColor* destPixel = (SColor*)&result.m_pixels[y * result.m_pitch + imageA.m_width * 3];
        SColor* srcPixel = (SColor*)&imageB.m_pixels[y * imageB.m_pitch];
        for (size_t x = 0; x < imageB.m_width; ++x)
        {
            destPixel[0] = srcPixel[0];
            ++destPixel;
            ++srcPixel;
        }
    }
}

//======================================================================================
void ImageCombine3 (const SImageData& imageA, const SImageData& imageB, const SImageData& imageC, SImageData& result)
{
    // put the images side by side. A on left, B in middle, C on right
    ImageInit(result, imageA.m_width + imageB.m_width + imageC.m_width, max(max(imageA.m_height, imageB.m_height), imageC.m_height));
    std::fill(result.m_pixels.begin(), result.m_pixels.end(), 0);

    // image A on left
    for (size_t y = 0; y < imageA.m_height; ++y)
    {
        SColor* destPixel = (SColor*)&result.m_pixels[y * result.m_pitch];
        SColor* srcPixel = (SColor*)&imageA.m_pixels[y * imageA.m_pitch];
        for (size_t x = 0; x < imageA.m_width; ++x)
        {
            destPixel[0] = srcPixel[0];
            ++destPixel;
            ++srcPixel;
        }
    }

    // image B in middle
    for (size_t y = 0; y < imageB.m_height; ++y)
    {
        SColor* destPixel = (SColor*)&result.m_pixels[y * result.m_pitch + imageA.m_width * 3];
        SColor* srcPixel = (SColor*)&imageB.m_pixels[y * imageB.m_pitch];
        for (size_t x = 0; x < imageB.m_width; ++x)
        {
            destPixel[0] = srcPixel[0];
            ++destPixel;
            ++srcPixel;
        }
    }

    // image C on right
    for (size_t y = 0; y < imageC.m_height; ++y)
    {
        SColor* destPixel = (SColor*)&result.m_pixels[y * result.m_pitch + imageA.m_width * 3 + imageC.m_width * 3];
        SColor* srcPixel = (SColor*)&imageC.m_pixels[y * imageC.m_pitch];
        for (size_t x = 0; x < imageC.m_width; ++x)
        {
            destPixel[0] = srcPixel[0];
            ++destPixel;
            ++srcPixel;
        }
    }
}

//======================================================================================
float GoldenRatioMultiple (size_t multiple)
{
    return float(multiple) * (1.0f + std::sqrtf(5.0f)) / 2.0f;
}

//======================================================================================
void IntegrationTest (const SImageData& dither, const SImageData& groundTruth, size_t frameIndex, const char* label)
{
    // calculate min, max, total and average error
    size_t minError = 0;
    size_t maxError = 0;
    size_t totalError = 0;
    size_t pixelCount = 0;
    for (size_t y = 0; y < dither.m_height; ++y)
    {
        SColor* ditherPixel = (SColor*)&dither.m_pixels[y * dither.m_pitch];
        SColor* truthPixel = (SColor*)&groundTruth.m_pixels[y * groundTruth.m_pitch];
        for (size_t x = 0; x < dither.m_width; ++x)
        {
            size_t error = 0;
            if (ditherPixel->R > truthPixel->R)
                error = ditherPixel->R - truthPixel->R;
            else
                error = truthPixel->R - ditherPixel->R;

            totalError += error;

            if ((x == 0 && y == 0) || error < minError)
                minError = error;

            if ((x == 0 && y == 0) || error > maxError)
                maxError = error;

            ++ditherPixel;
            ++truthPixel;
            ++pixelCount;
        }
    }
    float averageError = float(totalError) / float(pixelCount);

    // calculate standard deviation
    float sumSquaredDiff = 0.0f;
    for (size_t y = 0; y < dither.m_height; ++y)
    {
        SColor* ditherPixel = (SColor*)&dither.m_pixels[y * dither.m_pitch];
        SColor* truthPixel = (SColor*)&groundTruth.m_pixels[y * groundTruth.m_pitch];
        for (size_t x = 0; x < dither.m_width; ++x)
        {
            size_t error = 0;
            if (ditherPixel->R > truthPixel->R)
                error = ditherPixel->R - truthPixel->R;
            else
                error = truthPixel->R - ditherPixel->R;

            float diff = float(error) - averageError;

            sumSquaredDiff += diff*diff;
        }
    }
    float stdDev = std::sqrtf(sumSquaredDiff / float(pixelCount - 1));

    // report results
    fprintf(g_logFile, "%s %zu error\n", label, frameIndex);
    fprintf(g_logFile, "  min error: %zu\n", minError);
    fprintf(g_logFile, "  max error: %zu\n", maxError);
    fprintf(g_logFile, "  avg error: %0.2f\n", averageError);
    fprintf(g_logFile, "  stddev: %0.2f\n", stdDev);
    fprintf(g_logFile, "\n");
}

//======================================================================================
void HistogramTest (const SImageData& noise, size_t frameIndex, const char* label)
{
    std::array<size_t, 256> counts;
    std::fill(counts.begin(), counts.end(), 0);

    ImageForEachPixel(
        noise,
        [&] (const SColor& pixel, size_t pixelIndex)
        {
            counts[pixel.R]++;
        }
    );

    // calculate min, max, total and average
    size_t minCount = 0;
    size_t maxCount = 0;
    size_t totalCount = 0;
    for (size_t i = 0; i < 256; ++i)
    {
        if (i == 0 || counts[i] < minCount)
            minCount = counts[i];

        if (i == 0 || counts[i] > maxCount)
            maxCount = counts[i];

        totalCount += counts[i];
    }
    float averageCount = float(totalCount) / float(256.0f);

    // calculate standard deviation
    float sumSquaredDiff = 0.0f;
    for (size_t i = 0; i < 256; ++i)
    {
        float diff = float(counts[i]) - averageCount;
        sumSquaredDiff += diff*diff;
    }
    float stdDev = std::sqrtf(sumSquaredDiff / 255.0f);

    // report results
    fprintf(g_logFile, "%s %zu histogram\n", label, frameIndex);
    fprintf(g_logFile, "  min count: %zu\n", minCount);
    fprintf(g_logFile, "  max count: %zu\n", maxCount);
    fprintf(g_logFile, "  avg count: %0.2f\n", averageCount);
    fprintf(g_logFile, "  stddev: %0.2f\n", stdDev);
    fprintf(g_logFile, "  counts: ");
    for (size_t i = 0; i < 256; ++i)
    {
        if (i > 0)
            fprintf(g_logFile, ", ");
        fprintf(g_logFile, "%zu", counts[i]);
    }

    fprintf(g_logFile, "\n\n");
}

//======================================================================================
void GenerateWhiteNoise (SImageData& image, size_t width, size_t height)
{
    ImageInit(image, width, height);

    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_int_distribution<unsigned int> dist(0, 255);

    ImageForEachPixel(
        image,
        [&] (SColor& pixel, size_t pixelIndex)
        {
            uint8 value = dist(rng);
            pixel.R = value;
            pixel.G = value;
            pixel.B = value;
        }
    );
}

//======================================================================================
void GenerateInterleavedGradientNoise (SImageData& image, size_t width, size_t height, float offsetX, float offsetY)
{
    ImageInit(image, width, height);

    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_int_distribution<unsigned int> dist(0, 255);

    for (size_t y = 0; y < height; ++y)
    {
        SColor* pixel = (SColor*)&image.m_pixels[y * image.m_pitch];
        for (size_t x = 0; x < width; ++x)
        {
            float valueFloat = std::fmodf(52.9829189f * std::fmod(0.06711056f*float(x + offsetX) + 0.00583715f*float(y + offsetY), 1.0f), 1.0f);
            size_t valueBig = size_t(valueFloat * 256.0f);
            uint8 value = uint8(valueBig % 256);
            pixel->R = value;
            pixel->G = value;
            pixel->B = value;
            ++pixel;
        }
    }
}

//======================================================================================
template <size_t NUM_SAMPLES>
void GenerateVanDerCoruptSequence (std::array<float, NUM_SAMPLES>& samples, size_t base)
{
    for (size_t i = 0; i < NUM_SAMPLES; ++i)
    {
        samples[i] = 0.0f;
        float denominator = float(base);
        size_t n = i;
        while (n > 0)
        {
            size_t multiplier = n % base;
            samples[i] += float(multiplier) / denominator;
            n = n / base;
            denominator *= base;
        }
    }
}

//======================================================================================
void DitherWithTexture (const SImageData& ditherImage, const SImageData& noiseImage, SImageData& result)
{
    // init the result image
    ImageInit(result, ditherImage.m_width, ditherImage.m_height);

    // make the result image
    for (size_t y = 0; y < ditherImage.m_height; ++y)
    {
        SColor* srcDitherPixel = (SColor*)&ditherImage.m_pixels[y * ditherImage.m_pitch];
        SColor* destDitherPixel = (SColor*)&result.m_pixels[y * result.m_pitch];

        for (size_t x = 0; x < ditherImage.m_width; ++x)
        {
            // tile the noise in case it isn't the same size as the image we are dithering
            size_t noiseX = x % noiseImage.m_width;
            size_t noiseY = y % noiseImage.m_height;
            SColor* noisePixel = (SColor*)&noiseImage.m_pixels[noiseY * noiseImage.m_pitch + noiseX * 3];

            uint8 value = 0;
            if (noisePixel->R < srcDitherPixel->R)
                value = 255;

            destDitherPixel->R = value;
            destDitherPixel->G = value;
            destDitherPixel->B = value;

            ++srcDitherPixel;
            ++destDitherPixel;
        }
    }
}

//======================================================================================
void DitherWhiteNoise (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    // make noise
    SImageData noise;
    GenerateWhiteNoise(noise, ditherImage.m_width, ditherImage.m_height);

    // dither the image
    SImageData dither;
    DitherWithTexture(ditherImage, noise, dither);

    // save the results
    SImageData combined;
    ImageCombine3(ditherImage, noise, dither, combined);
    ImageSave(combined, "out/still_whitenoise.bmp");
}

//======================================================================================
void DitherInterleavedGradientNoise (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    // make noise
    SImageData noise;
    GenerateInterleavedGradientNoise(noise, ditherImage.m_width, ditherImage.m_height, 0.0f, 0.0f);

    // dither the image
    SImageData dither;
    DitherWithTexture(ditherImage, noise, dither);

    // save the results
    SImageData combined;
    ImageCombine3(ditherImage, noise, dither, combined);
    ImageSave(combined, "out/still_ignoise.bmp");
}

//======================================================================================
void DitherBlueNoise (const SImageData& ditherImage, const SImageData& blueNoise)
{
    printf("\n%s\n", __FUNCTION__);

    // dither the image
    SImageData dither;
    DitherWithTexture(ditherImage, blueNoise, dither);

    // save the results
    SImageData combined;
    ImageCombine3(ditherImage, blueNoise, dither, combined);
    ImageSave(combined, "out/still_bluenoise.bmp");
}

//======================================================================================
void DitherWhiteNoiseAnimated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/anim_whitenoise%zu.bmp", i);

        // make noise
        SImageData noise;
        GenerateWhiteNoise(noise, ditherImage.m_width, ditherImage.m_height);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherInterleavedGradientNoiseAnimated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_real_distribution<float> dist(0.0f, 1000.0f);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/anim_ignoise%zu.bmp", i);

        // make noise
        SImageData noise;
        GenerateInterleavedGradientNoise(noise, ditherImage.m_width, ditherImage.m_height, dist(rng), dist(rng));

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherBlueNoiseAnimated (const SImageData& ditherImage, const SImageData blueNoise[8])
{
    printf("\n%s\n", __FUNCTION__);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/anim_bluenoise%zu.bmp", i);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, blueNoise[i], dither);

        // save the results
        SImageData combined;
        ImageCombine2(blueNoise[i], dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherWhiteNoiseAnimatedIntegrated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animint_whitenoise%zu.bmp", i);

        // make noise
        SImageData noise;
        GenerateWhiteNoise(noise, ditherImage.m_width, ditherImage.m_height);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherInterleavedGradientNoiseAnimatedIntegrated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_real_distribution<float> dist(0.0f, 1000.0f);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animint_ignoise%zu.bmp", i);

        // make noise
        SImageData noise;
        GenerateInterleavedGradientNoise(noise, ditherImage.m_width, ditherImage.m_height, dist(rng), dist(rng));

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&](SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i + 1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherBlueNoiseAnimatedIntegrated (const SImageData& ditherImage, const SImageData blueNoise[8])
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animint_bluenoise%zu.bmp", i);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, blueNoise[i], dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(blueNoise[i], dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherWhiteNoiseAnimatedGoldenRatio (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    // make noise
    SImageData noiseSrc;
    GenerateWhiteNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    SImageDataComplex noiseDFT;
    SImageData noiseDFTMag;

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animgr_whitenoise%zu.bmp", i);

        // add golden ratio to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = GoldenRatioMultiple(i);
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // DFT the noise
        if (c_doDFT)
        {
            ImageDFT(noise, noiseDFT);
            GetMagnitudeData(noiseDFT, noiseDFTMag);
        }
        else
        {
            ImageInit(noiseDFTMag, noise.m_width, noise.m_height);
            std::fill(noiseDFTMag.m_pixels.begin(), noiseDFTMag.m_pixels.end(), 0);
        }

        // Histogram test the noise
        HistogramTest(noise, i, __FUNCTION__);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // save the results
        SImageData combined;
        ImageCombine3(noiseDFTMag, noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherInterleavedGradientNoiseAnimatedGoldenRatio (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    // make noise
    SImageData noiseSrc;
    GenerateInterleavedGradientNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height, 0.0f, 0.0f);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    SImageDataComplex noiseDFT;
    SImageData noiseDFTMag;

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animgr_ignoise%zu.bmp", i);

        // add golden ratio to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = GoldenRatioMultiple(i);
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // DFT the noise
        if (c_doDFT)
        {
            ImageDFT(noise, noiseDFT);
            GetMagnitudeData(noiseDFT, noiseDFTMag);
        }
        else
        {
            ImageInit(noiseDFTMag, noise.m_width, noise.m_height);
            std::fill(noiseDFTMag.m_pixels.begin(), noiseDFTMag.m_pixels.end(), 0);
        }

        // Histogram test the noise
        HistogramTest(noise, i, __FUNCTION__);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // save the results
        SImageData combined;
        ImageCombine3(noiseDFTMag, noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherBlueNoiseAnimatedGoldenRatio (const SImageData& ditherImage, const SImageData& noiseSrc)
{
    printf("\n%s\n", __FUNCTION__);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    SImageDataComplex noiseDFT;
    SImageData noiseDFTMag;

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animgr_bluenoise%zu.bmp", i);

        // add golden ratio to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = GoldenRatioMultiple(i);
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // DFT the noise
        if (c_doDFT)
        {
            ImageDFT(noise, noiseDFT);
            GetMagnitudeData(noiseDFT, noiseDFTMag);
        }
        else
        {
            ImageInit(noiseDFTMag, noise.m_width, noise.m_height);
            std::fill(noiseDFTMag.m_pixels.begin(), noiseDFTMag.m_pixels.end(), 0);
        }

        // Histogram test the noise
        HistogramTest(noise, i, __FUNCTION__);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // save the results
        SImageData combined;
        ImageCombine3(noiseDFTMag, noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherWhiteNoiseAnimatedUniform (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    // make noise
    SImageData noiseSrc;
    GenerateWhiteNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    SImageDataComplex noiseDFT;
    SImageData noiseDFTMag;

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animuni_whitenoise%zu.bmp", i);

        // add uniform value to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = float(i) / 8.0f;
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // DFT the noise
        if (c_doDFT)
        {
            ImageDFT(noise, noiseDFT);
            GetMagnitudeData(noiseDFT, noiseDFTMag);
        }
        else
        {
            ImageInit(noiseDFTMag, noise.m_width, noise.m_height);
            std::fill(noiseDFTMag.m_pixels.begin(), noiseDFTMag.m_pixels.end(), 0);
        }

        // Histogram test the noise
        HistogramTest(noise, i, __FUNCTION__);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // save the results
        SImageData combined;
        ImageCombine3(noiseDFTMag, noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherInterleavedGradientNoiseAnimatedUniform (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    // make noise
    SImageData noiseSrc;
    GenerateInterleavedGradientNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height, 0.0f, 0.0f);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    SImageDataComplex noiseDFT;
    SImageData noiseDFTMag;

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animuni_ignoise%zu.bmp", i);

        // add uniform value to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = float(i) / 8.0f;
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // DFT the noise
        if (c_doDFT)
        {
            ImageDFT(noise, noiseDFT);
            GetMagnitudeData(noiseDFT, noiseDFTMag);
        }
        else
        {
            ImageInit(noiseDFTMag, noise.m_width, noise.m_height);
            std::fill(noiseDFTMag.m_pixels.begin(), noiseDFTMag.m_pixels.end(), 0);
        }

        // Histogram test the noise
        HistogramTest(noise, i, __FUNCTION__);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // save the results
        SImageData combined;
        ImageCombine3(noiseDFTMag, noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherBlueNoiseAnimatedUniform (const SImageData& ditherImage, const SImageData& noiseSrc)
{
    printf("\n%s\n", __FUNCTION__);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    SImageDataComplex noiseDFT;
    SImageData noiseDFTMag;

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animuni_bluenoise%zu.bmp", i);

        // add uniform value to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = float(i) / 8.0f;
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // DFT the noise
        if (c_doDFT)
        {
            ImageDFT(noise, noiseDFT);
            GetMagnitudeData(noiseDFT, noiseDFTMag);
        }
        else
        {
            ImageInit(noiseDFTMag, noise.m_width, noise.m_height);
            std::fill(noiseDFTMag.m_pixels.begin(), noiseDFTMag.m_pixels.end(), 0);
        }

        // Histogram test the noise
        HistogramTest(noise, i, __FUNCTION__);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // save the results
        SImageData combined;
        ImageCombine3(noiseDFTMag, noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherWhiteNoiseAnimatedGoldenRatioIntegrated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    // make noise
    SImageData noiseSrc;
    GenerateWhiteNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animgrint_whitenoise%zu.bmp", i);

        // add golden ratio to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = GoldenRatioMultiple(i);
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherInterleavedGradientNoiseAnimatedGoldenRatioIntegrated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    // make noise
    SImageData noiseSrc;
    GenerateInterleavedGradientNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height, 0.0f, 0.0f);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animgrint_ignoise%zu.bmp", i);

        // add golden ratio to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = GoldenRatioMultiple(i);
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherBlueNoiseAnimatedGoldenRatioIntegrated (const SImageData& ditherImage, const SImageData& noiseSrc)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animgrint_bluenoise%zu.bmp", i);

        // add golden ratio to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = GoldenRatioMultiple(i);
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherWhiteNoiseAnimatedUniformIntegrated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    // make noise
    SImageData noiseSrc;
    GenerateWhiteNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animuniint_whitenoise%zu.bmp", i);

        // add uniform value to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = float(i) / 8.0f;
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherInterleavedGradientNoiseAnimatedUniformIntegrated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    // make noise
    SImageData noiseSrc;
    GenerateInterleavedGradientNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height, 0.0f, 0.0f);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animuniint_ignoise%zu.bmp", i);

        // add uniform value to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = float(i) / 8.0f;
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherBlueNoiseAnimatedUniformIntegrated (const SImageData& ditherImage, const SImageData& noiseSrc)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animuniint_bluenoise%zu.bmp", i);

        // add uniform value to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = float(i) / 8.0f;
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherWhiteNoiseAnimatedVDCIntegrated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    // make noise
    SImageData noiseSrc;
    GenerateWhiteNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    // Make Van Der Corput sequence
    std::array<float, 8> VDC;
    GenerateVanDerCoruptSequence(VDC, 2);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animvdcint_whitenoise%zu.bmp", i);

        // add uniform value to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = VDC[i];
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherInterleavedGradientNoiseAnimatedVDCIntegrated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    // make noise
    SImageData noiseSrc;
    GenerateInterleavedGradientNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height, 0.0f, 0.0f);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    // Make Van Der Corput sequence
    std::array<float, 8> VDC;
    GenerateVanDerCoruptSequence(VDC, 2);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animvdcint_ignoise%zu.bmp", i);

        // add uniform value to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = VDC[i];
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherBlueNoiseAnimatedVDCIntegrated (const SImageData& ditherImage, const SImageData& noiseSrc)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    // Make Van Der Corput sequence
    std::array<float, 8> VDC;
    GenerateVanDerCoruptSequence(VDC, 2);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animvdcint_bluenoise%zu.bmp", i);

        // add uniform value to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = VDC[i];
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
int main (int argc, char** argv)
{
    // load the dither image and convert it to greyscale (luma)
    SImageData ditherImage;
    if (!ImageLoad("src/ditherimage.bmp", ditherImage))
    {
        printf("Could not load src/ditherimage.bmp");
        return 0;
    }
    ImageConvertToLuma(ditherImage);

    // load the blue noise images.
    SImageData blueNoise[8];
    for (size_t i = 0; i < 8; ++i)
    {
        char buffer[256];
        sprintf(buffer, "src/BN%zu.bmp", i);
        if (!ImageLoad(buffer, blueNoise[i]))
        {
            printf("Could not load %s", buffer);
            return 0;
        }

        // They have different values in R, G, B so make R be the value for all channels
        ImageForEachPixel(
            blueNoise[i],
            [] (SColor& pixel, size_t pixelIndex)
            {
                pixel.G = pixel.R;
                pixel.B = pixel.R;
            }
        );
    }

    g_logFile = fopen("log.txt", "w+t");
    
    // still image dither tests
    DitherWhiteNoise(ditherImage);
    DitherInterleavedGradientNoise(ditherImage);
    DitherBlueNoise(ditherImage, blueNoise[0]);

    // Animated dither tests
    DitherWhiteNoiseAnimated(ditherImage);
    DitherInterleavedGradientNoiseAnimated(ditherImage);
    DitherBlueNoiseAnimated(ditherImage, blueNoise);

    // Golden ratio animated dither tests
    DitherWhiteNoiseAnimatedGoldenRatio(ditherImage);
    DitherInterleavedGradientNoiseAnimatedGoldenRatio(ditherImage);
    DitherBlueNoiseAnimatedGoldenRatio(ditherImage, blueNoise[0]);

    // Uniform animated dither tests
    DitherWhiteNoiseAnimatedUniform(ditherImage);
    DitherInterleavedGradientNoiseAnimatedUniform(ditherImage);
    DitherBlueNoiseAnimatedUniform(ditherImage, blueNoise[0]);

    // Animated dither integration tests
    DitherWhiteNoiseAnimatedIntegrated(ditherImage);
    DitherInterleavedGradientNoiseAnimatedIntegrated(ditherImage);
    DitherBlueNoiseAnimatedIntegrated(ditherImage, blueNoise);

    // Golden ratio animated dither integration tests
    DitherWhiteNoiseAnimatedGoldenRatioIntegrated(ditherImage);
    DitherInterleavedGradientNoiseAnimatedGoldenRatioIntegrated(ditherImage);
    DitherBlueNoiseAnimatedGoldenRatioIntegrated(ditherImage, blueNoise[0]);

    // Uniform animated dither integration tests
    DitherWhiteNoiseAnimatedUniformIntegrated(ditherImage);
    DitherInterleavedGradientNoiseAnimatedUniformIntegrated(ditherImage);
    DitherBlueNoiseAnimatedUniformIntegrated(ditherImage, blueNoise[0]);

    // Van der corput animated dither integration tests
    DitherWhiteNoiseAnimatedVDCIntegrated(ditherImage);
    DitherInterleavedGradientNoiseAnimatedVDCIntegrated(ditherImage);
    DitherBlueNoiseAnimatedVDCIntegrated(ditherImage, blueNoise[0]);

    fclose(g_logFile);

    return 0;
}

Animating Noise For Integration Over Time

You can use noise textures (like the ones from the last post) to do dithering.

For instance, you can do the process below to make a 1 bit (black and white) dithered image using a gray scale source image and a gray scale noise texture. This would be useful if you had a 1 bit display that you were trying to display an image on.

  1. For each pixel in the source image…
  2. If the source image pixel is brighter than the noise texture, put a white pixel.
  3. Else put a black pixel.

(info on converting images to grey scale here: Converting RGB to Grayscale)

The quality of the result depends on the type of noise you use.

If you use pure random numbers (white noise) it looks like this:

You could also use something called “Interleaved Gradient Noise” which would look like this:

Or you could use blue noise which would look like this:

As you can see, white noise was the worst looking, interleaved gradient noise is is the middle, and blue noise looked the best.

White noise is very cheap to generate and can be done in real time on either the CPU or GPU – you just use random numbers.

Blue noise is more expensive to generate and usually must be done in advance, but gives high quality results.

Interleaved gradient noise, which gives middle results, is actually very similar in generation costs as white noise believe it or not, and so can also be done in real time on either the CPU or GPU.

If you have X and Y pixel coordinates (not uv coordinates), you can generate the noise value for the pixel by using this formula:

float noise = std::fmodf(52.9829189f * std::fmodf(0.06711056f*float(x) + 0.00583715f*float(y), 1.0f), 1.0f);

Interleaved gradient noise was made by Jorge Jimenez (@iryoku1) and you can read more about it at these links:
Next Generation Post Processing in Call Of Duty: Advanced Warfare
Dithering part three – real world 2D quantization dithering (Bart Wronksi)

Dithering still images is fun, but in the context of video games, we are more interested in animated images, so let’s look at things in motion.

Animated Noise

Let’s start by just animating those three noise types over 8 frames.

For white noise, we’ll generate a new white noise texture every frame.

For interleaved gradient noise, we’ll add a random offset (0 to 1000) to the pixel each frame, so we get 8 different interleaved gradient noise textures.

For blue noise, we’ll just have 8 different blue noise textures that we generate in advance.

These are playing at 8 fps, so loop every second.

White Noise:

IG Noise:

Blue Noise:

Once again we can see that white noise is worst, blue noise is best, and interleaved gradient noise is in the middle.

When you think about it though, despite these animations all using different types of noise over SPACE, they all use white noise over time. What i mean by that is if you isolate any individual pixel in any of the images and look at it over the 8 frames, that single pixel will look like white noise.

Let’s see if we can improve that.

Golden Ratio Animated Noise

In a conversation on twitter, @R4_Unit told me that in the past he had good success by adding the golden ratio to blue noise textures to make the noise more blue over time.

The background here is that repeatedly adding the golden ratio to any number will make a low discrepancy sequence (details: When Random Numbers Are Too Random: Low Discrepancy Sequences)

The golden ratio is \frac{1+\sqrt{5}}{2} or approximately 1.61803398875, and interestingly is THE MOST irrational number that there is. Weird right?

For each of the noise types, we’ll generate a single texture for frame 0, and each subsequent frame we will add the golden ratio to each pixel. The pixel values are in the 0 to 1 space when adding the golden ratio (not 0 to 255) and we use modulus to wrap it around.

The DFT magnitude is shown on the left to show how adding the golden ratio affects frequency components.

White Noise:

IG Noise:

Blue Noise:

When I look at these side by side with the previous animations, it’s hard for me to see much of a difference. That is interesting for the case of blue noise, where it’s difficult to generate multiple blue noise textures. It means that you can get a fairly decent “blue noise” texture by adding multiples of the golden ratio to an existing blue noise texture (aka recycling!).

It’s interesting that the white noise and interleaved gradient noise don’t change their frequency spectrum much over time. On the other hand, it’s a bit sad to see that the blue noise texture gains some low frequency content so the blue noise becomes lower quality. You aren’t just getting more blue noise textures for free by adding the golden ratio, even though they are blue-ish.

Another important aspect to look at is the histogram of colors used of these images when adding golden ratio. The ideal situation is that the starting images have roughly the same number of every color in the image, and that when adding the golden ratio for each frame, that we still use roughly the same number of every color. That turns out to be the case luckily.

The white noise histogram has a minimum count of 213, a maximum count of 303, an average count of 256 (the image is 256×256), and a standard deviation of 15.64. Those values are the same for each frame of the animation.

For interleaved gradient noise, it has a minimum count of 245, a maximum count of 266, an average count of 256 and a standard deviation of 2.87. Those values are the same for the entire animation.

Lastly, for blue noise, it has a minimum, maximum, and average count of 256, and a standard deviation of 0. This also remains true for the entire animation.

Integration Over Time

A big reason we might want animated noise in graphics is because we are taking multiple samples and want to numerically integrate them.

Lets analyze how these two types of animations (regular and golden ratio) compare for integration.

These animations are the same as before, but on frame 1, we show the average of frame 0 and 1. On frame 2 we show the average of frame 0, 1 and 2. And so on to frame 7 which is the average of all 8 frames. This is an integration of our black and white sample points we are taking, where the correct value of the integration is the greyscale image we started with.

Here is white noise, IG noise and blue noise animated (new noise each frame), integrated over those 8 frames, playing at 8 frames a second:



Here is the same using the golden ratio to animate the noise instead:



Since it can be a little difficult to compare these things while they are in motion, here is the final frames of each method and some graphs to show the average error and standard deviation of the error, compared to the ground truth source image.

White Noise vs White Noise Golden Ratio:


IG Noise vs IG Noise Golden Ratio:


Blue Noise vs Blue Noise Golden Ratio:


Interestingly, the golden ratio average error and standard deviation (from the ground truth) are pretty even for all types of noise by frame 7, even though the blue noise is perceptually superior. This also happens for the non golden ratio integrations of blue noise and white noise. That’s part of the value of blue noise, that even if it has the same amount of error as say, white noise, it still looks better.

Another interesting observation is that interleaved gradient noise performs better at integration (at least numerically) than white or blue noise, when not using the golden ratio. The only way I can explain this is that when picking random pixel offsets to generate each frame of interleaved gradient noise, it’s somehow more blue over time than the other two methods. It’s a strange but pretty useful property.

Despite IG having success when looking at the numbers, it has very visible directional patterns which are not so nice. The fact that it is as cheap as white noise to generate, but has results much closer to blue noise perceptually is pretty awesome though.

Something else important to note is that white noise beats blue noise in the long run (higher sample counts). It’s only at these lower sample counts that blue noise is the clear winner.

Lastly, it seems like the ideal setup for integrating some values over time with a lower sample count would be to have N blue noise textures to use over N frames, but *somehow* have a constraint on those textures generated such that each individual pixel over time has blue noise distributed values.

I’m not sure how to generate that, or if it’s even possible to do so, but doing that seems like it would be pretty near the ideal for doing integration like the above.

Taking a guess at how the DFT’s would look, each individual slice seems like it should look like a totally normal blue noise texture where it’s black in the middle (low frequencies) and noisy elsewhere (high frequencies). If you had N slices of these it would look like a black cylinder surrounded by noise when taking the 3D DFT. I’m not sure though how having the constraint on individual pixels would modify the DFT, or if it even would.

This “ideal” I’m describing is different than vanilla 3d blue noise. The 3d DFT of 3d blue noise is a black sphere surrounded by noise. What I’m describing is a cylinder instead of a sphere.

3d blue noise turns out not to be great for these needs. You can read about that here:

The problem with 3D blue noise

That author also has some an interesting post on blue noise, and a zip file full of blue noise textures that you can take and use.

Free Blue Noise Textures

I have some thoughts on generating this blue noise cylinder that if they work out may very well be the next blog post.

Code

Here is the code used to generate the images in this post. It’s also on github, which also contains the source images used.

Atrix256: RandomCode/AnimatedNoise

#define _CRT_SECURE_NO_WARNINGS

#include <windows.h>  // for bitmap headers.  Sorry non windows people!
#include <stdint.h>
#include <vector>
#include <random>
#include <atomic>
#include <thread>
#include <complex>
#include <array>

typedef uint8_t uint8;

const float c_pi = 3.14159265359f;

// settings
const bool c_doDFT = true;

// globals 
FILE* g_logFile = nullptr;

//======================================================================================
inline float Lerp (float A, float B, float t)
{
    return A * (1.0f - t) + B * t;
}

//======================================================================================
struct SImageData
{
    SImageData ()
        : m_width(0)
        , m_height(0)
    { }
   
    size_t m_width;
    size_t m_height;
    size_t m_pitch;
    std::vector<uint8> m_pixels;
};
 
//======================================================================================
struct SColor
{
    SColor (uint8 _R = 0, uint8 _G = 0, uint8 _B = 0)
        : R(_R), G(_G), B(_B)
    { }

    inline void Set (uint8 _R, uint8 _G, uint8 _B)
    {
        R = _R;
        G = _G;
        B = _B;
    }
 
    uint8 B, G, R;
};

//======================================================================================
struct SImageDataComplex
{
    SImageDataComplex ()
        : m_width(0)
        , m_height(0)
    { }
  
    size_t m_width;
    size_t m_height;
    std::vector<std::complex<float>> m_pixels;
};
 
//======================================================================================
std::complex<float> DFTPixel (const SImageData &srcImage, size_t K, size_t L)
{
    std::complex<float> ret(0.0f, 0.0f);
  
    for (size_t x = 0; x < srcImage.m_width; ++x)
    {
        for (size_t y = 0; y < srcImage.m_height; ++y)
        {
            // Get the pixel value (assuming greyscale) and convert it to [0,1] space
            const uint8 *src = &srcImage.m_pixels[(y * srcImage.m_pitch) + x * 3];
            float grey = float(src[0]) / 255.0f;
  
            // Add to the sum of the return value
            float v = float(K * x) / float(srcImage.m_width);
            v += float(L * y) / float(srcImage.m_height);
            ret += std::complex<float>(grey, 0.0f) * std::polar<float>(1.0f, -2.0f * c_pi * v);
        }
    }
  
    return ret;
}
  
//======================================================================================
void ImageDFT (const SImageData &srcImage, SImageDataComplex &destImage)
{
    // NOTE: this function assumes srcImage is greyscale, so works on only the red component of srcImage.
    // ImageToGrey() will convert an image to greyscale.
 
    // size the output dft data
    destImage.m_width = srcImage.m_width;
    destImage.m_height = srcImage.m_height;
    destImage.m_pixels.resize(destImage.m_width*destImage.m_height);
 
    size_t numThreads = std::thread::hardware_concurrency();
    //if (numThreads > 0)
        //numThreads = numThreads - 1;
 
    std::vector<std::thread> threads;
    threads.resize(numThreads);
 
    printf("Doing DFT with %zu threads...\n", numThreads);
 
    // calculate 2d dft (brute force, not using fast fourier transform) multithreadedly
    std::atomic<size_t> nextRow(0);
    for (std::thread& t : threads)
    {
        t = std::thread(
            [&] ()
            {
                size_t row = nextRow.fetch_add(1);
                bool reportProgress = (row == 0);
                int lastPercent = -1;
 
                while (row < srcImage.m_height)
                {
                    // calculate the DFT for every pixel / frequency in this row
                    for (size_t x = 0; x < srcImage.m_width; ++x)
                    {
                        destImage.m_pixels[row * destImage.m_width + x] = DFTPixel(srcImage, x, row);
                    }
 
                    // report progress if we should
                    if (reportProgress)
                    {
                        int percent = int(100.0f * float(row) / float(srcImage.m_height));
                        if (lastPercent != percent)
                        {
                            lastPercent = percent;
                            printf("            \rDFT: %i%%", lastPercent);
                        }
                    }
 
                    // go to the next row
                    row = nextRow.fetch_add(1);
                }
            }
        );
    }
 
    for (std::thread& t : threads)
        t.join();
 
    printf("\n");
}
 
//======================================================================================
void GetMagnitudeData (const SImageDataComplex& srcImage, SImageData& destImage)
{
    // size the output image
    destImage.m_width = srcImage.m_width;
    destImage.m_height = srcImage.m_height;
    destImage.m_pitch = 4 * ((srcImage.m_width * 24 + 31) / 32);
    destImage.m_pixels.resize(destImage.m_pitch*destImage.m_height);
  
    // get floating point magnitude data
    std::vector<float> magArray;
    magArray.resize(srcImage.m_width*srcImage.m_height);
    float maxmag = 0.0f;
    for (size_t x = 0; x < srcImage.m_width; ++x)
    {
        for (size_t y = 0; y < srcImage.m_height; ++y)
        {
            // Offset the information by half width & height in the positive direction.
            // This makes frequency 0 (DC) be at the image origin, like most diagrams show it.
            int k = (x + (int)srcImage.m_width / 2) % (int)srcImage.m_width;
            int l = (y + (int)srcImage.m_height / 2) % (int)srcImage.m_height;
            const std::complex<float> &src = srcImage.m_pixels[l*srcImage.m_width + k];
  
            float mag = std::abs(src);
            if (mag > maxmag)
                maxmag = mag;
  
            magArray[y*srcImage.m_width + x] = mag;
        }
    }
    if (maxmag == 0.0f)
        maxmag = 1.0f;
  
    const float c = 255.0f / log(1.0f+maxmag);
  
    // normalize the magnitude data and send it back in [0, 255]
    for (size_t x = 0; x < srcImage.m_width; ++x)
    {
        for (size_t y = 0; y < srcImage.m_height; ++y)
        {
            float src = c * log(1.0f + magArray[y*srcImage.m_width + x]);
  
            uint8 magu8 = uint8(src);
  
            uint8* dest = &destImage.m_pixels[y*destImage.m_pitch + x * 3];
            dest[0] = magu8;
            dest[1] = magu8;
            dest[2] = magu8;
        }
    }
}

//======================================================================================
bool ImageSave (const SImageData &image, const char *fileName)
{
    // open the file if we can
    FILE *file;
    file = fopen(fileName, "wb");
    if (!file) {
        printf("Could not save %s\n", fileName);
        return false;
    }
   
    // make the header info
    BITMAPFILEHEADER header;
    BITMAPINFOHEADER infoHeader;
   
    header.bfType = 0x4D42;
    header.bfReserved1 = 0;
    header.bfReserved2 = 0;
    header.bfOffBits = 54;
   
    infoHeader.biSize = 40;
    infoHeader.biWidth = (LONG)image.m_width;
    infoHeader.biHeight = (LONG)image.m_height;
    infoHeader.biPlanes = 1;
    infoHeader.biBitCount = 24;
    infoHeader.biCompression = 0;
    infoHeader.biSizeImage = (DWORD) image.m_pixels.size();
    infoHeader.biXPelsPerMeter = 0;
    infoHeader.biYPelsPerMeter = 0;
    infoHeader.biClrUsed = 0;
    infoHeader.biClrImportant = 0;
   
    header.bfSize = infoHeader.biSizeImage + header.bfOffBits;
   
    // write the data and close the file
    fwrite(&header, sizeof(header), 1, file);
    fwrite(&infoHeader, sizeof(infoHeader), 1, file);
    fwrite(&image.m_pixels[0], infoHeader.biSizeImage, 1, file);
    fclose(file);
  
    return true;
}

//======================================================================================
bool ImageLoad (const char *fileName, SImageData& imageData)
{
    // open the file if we can
    FILE *file;
    file = fopen(fileName, "rb");
    if (!file)
        return false;
 
    // read the headers if we can
    BITMAPFILEHEADER header;
    BITMAPINFOHEADER infoHeader;
    if (fread(&header, sizeof(header), 1, file) != 1 ||
        fread(&infoHeader, sizeof(infoHeader), 1, file) != 1 ||
        header.bfType != 0x4D42 || infoHeader.biBitCount != 24)
    {
        fclose(file);
        return false;
    }
 
    // read in our pixel data if we can. Note that it's in BGR order, and width is padded to the next power of 4
    imageData.m_pixels.resize(infoHeader.biSizeImage);
    fseek(file, header.bfOffBits, SEEK_SET);
    if (fread(&imageData.m_pixels[0], imageData.m_pixels.size(), 1, file) != 1)
    {
        fclose(file);
        return false;
    }
 
    imageData.m_width = infoHeader.biWidth;
    imageData.m_height = infoHeader.biHeight;
    imageData.m_pitch = 4 * ((imageData.m_width * 24 + 31) / 32);
 
    fclose(file);
    return true;
}

//======================================================================================
void ImageInit (SImageData& image, size_t width, size_t height)
{
    image.m_width = width;
    image.m_height = height;
    image.m_pitch = 4 * ((width * 24 + 31) / 32);
    image.m_pixels.resize(image.m_pitch * image.m_height);
    std::fill(image.m_pixels.begin(), image.m_pixels.end(), 0);
}

//======================================================================================
template <typename LAMBDA>
void ImageForEachPixel (SImageData& image, const LAMBDA& lambda)
{
    size_t pixelIndex = 0;
    for (size_t y = 0; y < image.m_height; ++y)
    {
        SColor* pixel = (SColor*)&image.m_pixels[y * image.m_pitch];
        for (size_t x = 0; x < image.m_width; ++x)
        {
            lambda(*pixel, pixelIndex);
            ++pixel;
            ++pixelIndex;
        }
    }
}

//======================================================================================
template <typename LAMBDA>
void ImageForEachPixel (const SImageData& image, const LAMBDA& lambda)
{
    size_t pixelIndex = 0;
    for (size_t y = 0; y < image.m_height; ++y)
    {
        SColor* pixel = (SColor*)&image.m_pixels[y * image.m_pitch];
        for (size_t x = 0; x < image.m_width; ++x)
        {
            lambda(*pixel, pixelIndex);
            ++pixel;
            ++pixelIndex;
        }
    }
}

//======================================================================================
void ImageConvertToLuma (SImageData& image)
{
    ImageForEachPixel(
        image,
        [] (SColor& pixel, size_t pixelIndex)
        {
            float luma = float(pixel.R) * 0.3f + float(pixel.G) * 0.59f + float(pixel.B) * 0.11f;
            uint8 lumau8 = uint8(luma + 0.5f);
            pixel.R = lumau8;
            pixel.G = lumau8;
            pixel.B = lumau8;
        }
    );
}

//======================================================================================
void ImageCombine2 (const SImageData& imageA, const SImageData& imageB, SImageData& result)
{
    // put the images side by side. A on left, B on right
    ImageInit(result, imageA.m_width + imageB.m_width, max(imageA.m_height, imageB.m_height));
    std::fill(result.m_pixels.begin(), result.m_pixels.end(), 0);

    // image A on left
    for (size_t y = 0; y < imageA.m_height; ++y)
    {
        SColor* destPixel = (SColor*)&result.m_pixels[y * result.m_pitch];
        SColor* srcPixel = (SColor*)&imageA.m_pixels[y * imageA.m_pitch];
        for (size_t x = 0; x < imageA.m_width; ++x)
        {
            destPixel[0] = srcPixel[0];
            ++destPixel;
            ++srcPixel;
        }
    }

    // image B on right
    for (size_t y = 0; y < imageB.m_height; ++y)
    {
        SColor* destPixel = (SColor*)&result.m_pixels[y * result.m_pitch + imageA.m_width * 3];
        SColor* srcPixel = (SColor*)&imageB.m_pixels[y * imageB.m_pitch];
        for (size_t x = 0; x < imageB.m_width; ++x)
        {
            destPixel[0] = srcPixel[0];
            ++destPixel;
            ++srcPixel;
        }
    }
}

//======================================================================================
void ImageCombine3 (const SImageData& imageA, const SImageData& imageB, const SImageData& imageC, SImageData& result)
{
    // put the images side by side. A on left, B in middle, C on right
    ImageInit(result, imageA.m_width + imageB.m_width + imageC.m_width, max(max(imageA.m_height, imageB.m_height), imageC.m_height));
    std::fill(result.m_pixels.begin(), result.m_pixels.end(), 0);

    // image A on left
    for (size_t y = 0; y < imageA.m_height; ++y)
    {
        SColor* destPixel = (SColor*)&result.m_pixels[y * result.m_pitch];
        SColor* srcPixel = (SColor*)&imageA.m_pixels[y * imageA.m_pitch];
        for (size_t x = 0; x < imageA.m_width; ++x)
        {
            destPixel[0] = srcPixel[0];
            ++destPixel;
            ++srcPixel;
        }
    }

    // image B in middle
    for (size_t y = 0; y < imageB.m_height; ++y)
    {
        SColor* destPixel = (SColor*)&result.m_pixels[y * result.m_pitch + imageA.m_width * 3];
        SColor* srcPixel = (SColor*)&imageB.m_pixels[y * imageB.m_pitch];
        for (size_t x = 0; x < imageB.m_width; ++x)
        {
            destPixel[0] = srcPixel[0];
            ++destPixel;
            ++srcPixel;
        }
    }

    // image C on right
    for (size_t y = 0; y < imageC.m_height; ++y)
    {
        SColor* destPixel = (SColor*)&result.m_pixels[y * result.m_pitch + imageA.m_width * 3 + imageC.m_width * 3];
        SColor* srcPixel = (SColor*)&imageC.m_pixels[y * imageC.m_pitch];
        for (size_t x = 0; x < imageC.m_width; ++x)
        {
            destPixel[0] = srcPixel[0];
            ++destPixel;
            ++srcPixel;
        }
    }
}

//======================================================================================
float GoldenRatioMultiple (size_t multiple)
{
    return float(multiple) * (1.0f + std::sqrtf(5.0f)) / 2.0f;
}

//======================================================================================
void IntegrationTest (const SImageData& dither, const SImageData& groundTruth, size_t frameIndex, const char* label)
{
    // calculate min, max, total and average error
    size_t minError = 0;
    size_t maxError = 0;
    size_t totalError = 0;
    size_t pixelCount = 0;
    for (size_t y = 0; y < dither.m_height; ++y)
    {
        SColor* ditherPixel = (SColor*)&dither.m_pixels[y * dither.m_pitch];
        SColor* truthPixel = (SColor*)&groundTruth.m_pixels[y * groundTruth.m_pitch];
        for (size_t x = 0; x < dither.m_width; ++x)
        {
            size_t error = 0;
            if (ditherPixel->R > truthPixel->R)
                error = ditherPixel->R - truthPixel->R;
            else
                error = truthPixel->R - ditherPixel->R;

            totalError += error;

            if ((x == 0 && y == 0) || error < minError)
                minError = error;

            if ((x == 0 && y == 0) || error > maxError)
                maxError = error;

            ++ditherPixel;
            ++truthPixel;
            ++pixelCount;
        }
    }
    float averageError = float(totalError) / float(pixelCount);

    // calculate standard deviation
    float sumSquaredDiff = 0.0f;
    for (size_t y = 0; y < dither.m_height; ++y)
    {
        SColor* ditherPixel = (SColor*)&dither.m_pixels[y * dither.m_pitch];
        SColor* truthPixel = (SColor*)&groundTruth.m_pixels[y * groundTruth.m_pitch];
        for (size_t x = 0; x < dither.m_width; ++x)
        {
            size_t error = 0;
            if (ditherPixel->R > truthPixel->R)
                error = ditherPixel->R - truthPixel->R;
            else
                error = truthPixel->R - ditherPixel->R;

            float diff = float(error) - averageError;

            sumSquaredDiff += diff*diff;
        }
    }
    float stdDev = std::sqrtf(sumSquaredDiff / float(pixelCount - 1));

    // report results
    fprintf(g_logFile, "%s %zu error\n", label, frameIndex);
    fprintf(g_logFile, "  min error: %zu\n", minError);
    fprintf(g_logFile, "  max error: %zu\n", maxError);
    fprintf(g_logFile, "  avg error: %0.2f\n", averageError);
    fprintf(g_logFile, "  stddev: %0.2f\n", stdDev);
    fprintf(g_logFile, "\n");
}

//======================================================================================
void HistogramTest (const SImageData& noise, size_t frameIndex, const char* label)
{
    std::array<size_t, 256> counts;
    std::fill(counts.begin(), counts.end(), 0);

    ImageForEachPixel(
        noise,
        [&] (const SColor& pixel, size_t pixelIndex)
        {
            counts[pixel.R]++;
        }
    );

    // calculate min, max, total and average
    size_t minCount = 0;
    size_t maxCount = 0;
    size_t totalCount = 0;
    for (size_t i = 0; i < 256; ++i)
    {
        if (i == 0 || counts[i] < minCount)
            minCount = counts[i];

        if (i == 0 || counts[i] > maxCount)
            maxCount = counts[i];

        totalCount += counts[i];
    }
    float averageCount = float(totalCount) / float(256.0f);

    // calculate standard deviation
    float sumSquaredDiff = 0.0f;
    for (size_t i = 0; i < 256; ++i)
    {
        float diff = float(counts[i]) - averageCount;
        sumSquaredDiff += diff*diff;
    }
    float stdDev = std::sqrtf(sumSquaredDiff / 255.0f);

    // report results
    fprintf(g_logFile, "%s %zu histogram\n", label, frameIndex);
    fprintf(g_logFile, "  min count: %zu\n", minCount);
    fprintf(g_logFile, "  max count: %zu\n", maxCount);
    fprintf(g_logFile, "  avg count: %0.2f\n", averageCount);
    fprintf(g_logFile, "  stddev: %0.2f\n", stdDev);
    fprintf(g_logFile, "  counts: ");
    for (size_t i = 0; i < 256; ++i)
    {
        if (i > 0)
            fprintf(g_logFile, ", ");
        fprintf(g_logFile, "%zu", counts[i]);
    }

    fprintf(g_logFile, "\n\n");
}

//======================================================================================
void GenerateWhiteNoise (SImageData& image, size_t width, size_t height)
{
    ImageInit(image, width, height);

    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_int_distribution<unsigned int> dist(0, 255);

    ImageForEachPixel(
        image,
        [&] (SColor& pixel, size_t pixelIndex)
        {
            uint8 value = dist(rng);
            pixel.R = value;
            pixel.G = value;
            pixel.B = value;
        }
    );
}

//======================================================================================
void GenerateInterleavedGradientNoise (SImageData& image, size_t width, size_t height, float offsetX, float offsetY)
{
    ImageInit(image, width, height);

    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_int_distribution<unsigned int> dist(0, 255);

    for (size_t y = 0; y < height; ++y)
    {
        SColor* pixel = (SColor*)&image.m_pixels[y * image.m_pitch];
        for (size_t x = 0; x < width; ++x)
        {
            float valueFloat = std::fmodf(52.9829189f * std::fmod(0.06711056f*float(x + offsetX) + 0.00583715f*float(y + offsetY), 1.0f), 1.0f);
            size_t valueBig = size_t(valueFloat * 256.0f);
            uint8 value = uint8(valueBig % 256);
            pixel->R = value;
            pixel->G = value;
            pixel->B = value;
            ++pixel;
        }
    }
}

//======================================================================================
void DitherWithTexture (const SImageData& ditherImage, const SImageData& noiseImage, SImageData& result)
{
    // init the result image
    ImageInit(result, ditherImage.m_width, ditherImage.m_height);

    // make the result image
    for (size_t y = 0; y < ditherImage.m_height; ++y)
    {
        SColor* srcDitherPixel = (SColor*)&ditherImage.m_pixels[y * ditherImage.m_pitch];
        SColor* destDitherPixel = (SColor*)&result.m_pixels[y * result.m_pitch];

        for (size_t x = 0; x < ditherImage.m_width; ++x)
        {
            // tile the noise in case it isn't the same size as the image we are dithering
            size_t noiseX = x % noiseImage.m_width;
            size_t noiseY = y % noiseImage.m_height;
            SColor* noisePixel = (SColor*)&noiseImage.m_pixels[noiseY * noiseImage.m_pitch + noiseX * 3];

            uint8 value = 0;
            if (noisePixel->R < srcDitherPixel->R)
                value = 255;

            destDitherPixel->R = value;
            destDitherPixel->G = value;
            destDitherPixel->B = value;

            ++srcDitherPixel;
            ++destDitherPixel;
        }
    }
}

//======================================================================================
void DitherWhiteNoise (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    // make noise
    SImageData noise;
    GenerateWhiteNoise(noise, ditherImage.m_width, ditherImage.m_height);

    // dither the image
    SImageData dither;
    DitherWithTexture(ditherImage, noise, dither);

    // save the results
    SImageData combined;
    ImageCombine3(ditherImage, noise, dither, combined);
    ImageSave(combined, "out/still_whitenoise.bmp");
}

//======================================================================================
void DitherInterleavedGradientNoise (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    // make noise
    SImageData noise;
    GenerateInterleavedGradientNoise(noise, ditherImage.m_width, ditherImage.m_height, 0.0f, 0.0f);

    // dither the image
    SImageData dither;
    DitherWithTexture(ditherImage, noise, dither);

    // save the results
    SImageData combined;
    ImageCombine3(ditherImage, noise, dither, combined);
    ImageSave(combined, "out/still_ignoise.bmp");
}

//======================================================================================
void DitherBlueNoise (const SImageData& ditherImage, const SImageData& blueNoise)
{
    printf("\n%s\n", __FUNCTION__);

    // dither the image
    SImageData dither;
    DitherWithTexture(ditherImage, blueNoise, dither);

    // save the results
    SImageData combined;
    ImageCombine3(ditherImage, blueNoise, dither, combined);
    ImageSave(combined, "out/still_bluenoise.bmp");
}

//======================================================================================
void DitherWhiteNoiseAnimated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/anim_whitenoise%zu.bmp", i);

        // make noise
        SImageData noise;
        GenerateWhiteNoise(noise, ditherImage.m_width, ditherImage.m_height);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherInterleavedGradientNoiseAnimated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_real_distribution<float> dist(0.0f, 1000.0f);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/anim_ignoise%zu.bmp", i);

        // make noise
        SImageData noise;
        GenerateInterleavedGradientNoise(noise, ditherImage.m_width, ditherImage.m_height, dist(rng), dist(rng));

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherBlueNoiseAnimated (const SImageData& ditherImage, const SImageData blueNoise[8])
{
    printf("\n%s\n", __FUNCTION__);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/anim_bluenoise%zu.bmp", i);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, blueNoise[i], dither);

        // save the results
        SImageData combined;
        ImageCombine2(blueNoise[i], dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherWhiteNoiseAnimatedIntegrated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animint_whitenoise%zu.bmp", i);

        // make noise
        SImageData noise;
        GenerateWhiteNoise(noise, ditherImage.m_width, ditherImage.m_height);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherInterleavedGradientNoiseAnimatedIntegrated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_real_distribution<float> dist(0.0f, 1000.0f);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animint_ignoise%zu.bmp", i);

        // make noise
        SImageData noise;
        GenerateInterleavedGradientNoise(noise, ditherImage.m_width, ditherImage.m_height, dist(rng), dist(rng));

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&](SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i + 1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherBlueNoiseAnimatedIntegrated (const SImageData& ditherImage, const SImageData blueNoise[8])
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animint_bluenoise%zu.bmp", i);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, blueNoise[i], dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(blueNoise[i], dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherWhiteNoiseAnimatedGoldenRatio (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    // make noise
    SImageData noiseSrc;
    GenerateWhiteNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    SImageDataComplex noiseDFT;
    SImageData noiseDFTMag;

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animgr_whitenoise%zu.bmp", i);

        // add golden ratio to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = GoldenRatioMultiple(i);
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // DFT the noise
        if (c_doDFT)
        {
            ImageDFT(noise, noiseDFT);
            GetMagnitudeData(noiseDFT, noiseDFTMag);
        }
        else
        {
            ImageInit(noiseDFTMag, noise.m_width, noise.m_height);
            std::fill(noiseDFTMag.m_pixels.begin(), noiseDFTMag.m_pixels.end(), 0);
        }

        // Histogram test the noise
        HistogramTest(noise, i, __FUNCTION__);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // save the results
        SImageData combined;
        ImageCombine3(noiseDFTMag, noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherInterleavedGradientNoiseAnimatedGoldenRatio (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    // make noise
    SImageData noiseSrc;
    GenerateInterleavedGradientNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height, 0.0f, 0.0f);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    SImageDataComplex noiseDFT;
    SImageData noiseDFTMag;

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animgr_ignoise%zu.bmp", i);

        // add golden ratio to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = GoldenRatioMultiple(i);
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // DFT the noise
        if (c_doDFT)
        {
            ImageDFT(noise, noiseDFT);
            GetMagnitudeData(noiseDFT, noiseDFTMag);
        }
        else
        {
            ImageInit(noiseDFTMag, noise.m_width, noise.m_height);
            std::fill(noiseDFTMag.m_pixels.begin(), noiseDFTMag.m_pixels.end(), 0);
        }

        // Histogram test the noise
        HistogramTest(noise, i, __FUNCTION__);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // save the results
        SImageData combined;
        ImageCombine3(noiseDFTMag, noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherBlueNoiseAnimatedGoldenRatio (const SImageData& ditherImage, const SImageData& noiseSrc)
{
    printf("\n%s\n", __FUNCTION__);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    SImageDataComplex noiseDFT;
    SImageData noiseDFTMag;

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animgr_bluenoise%zu.bmp", i);

        // add golden ratio to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = GoldenRatioMultiple(i);
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // DFT the noise
        if (c_doDFT)
        {
            ImageDFT(noise, noiseDFT);
            GetMagnitudeData(noiseDFT, noiseDFTMag);
        }
        else
        {
            ImageInit(noiseDFTMag, noise.m_width, noise.m_height);
            std::fill(noiseDFTMag.m_pixels.begin(), noiseDFTMag.m_pixels.end(), 0);
        }

        // Histogram test the noise
        HistogramTest(noise, i, __FUNCTION__);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // save the results
        SImageData combined;
        ImageCombine3(noiseDFTMag, noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherWhiteNoiseAnimatedGoldenRatioIntegrated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    // make noise
    SImageData noiseSrc;
    GenerateWhiteNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animgrint_whitenoise%zu.bmp", i);

        // add golden ratio to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = GoldenRatioMultiple(i);
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherInterleavedGradientNoiseAnimatedGoldenRatioIntegrated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    // make noise
    SImageData noiseSrc;
    GenerateInterleavedGradientNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height, 0.0f, 0.0f);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animgrint_ignoise%zu.bmp", i);

        // add golden ratio to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = GoldenRatioMultiple(i);
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherBlueNoiseAnimatedGoldenRatioIntegrated (const SImageData& ditherImage, const SImageData& noiseSrc)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animgrint_bluenoise%zu.bmp", i);

        // add golden ratio to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = GoldenRatioMultiple(i);
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
int main (int argc, char** argv)
{
    // load the dither image and convert it to greyscale (luma)
    SImageData ditherImage;
    if (!ImageLoad("src/ditherimage.bmp", ditherImage))
    {
        printf("Could not load src/ditherimage.bmp");
        return 0;
    }
    ImageConvertToLuma(ditherImage);

    // load the blue noise images.
    SImageData blueNoise[8];
    for (size_t i = 0; i < 8; ++i)
    {
        char buffer[256];
        sprintf(buffer, "src/BN%zu.bmp", i);
        if (!ImageLoad(buffer, blueNoise[i]))
        {
            printf("Could not load %s", buffer);
            return 0;
        }

        // They have different values in R, G, B so make R be the value for all channels
        ImageForEachPixel(
            blueNoise[i],
            [] (SColor& pixel, size_t pixelIndex)
            {
                pixel.G = pixel.R;
                pixel.B = pixel.R;
            }
        );
    }

    g_logFile = fopen("log.txt", "w+t");
    
    // still image dither tests
    DitherWhiteNoise(ditherImage);
    DitherInterleavedGradientNoise(ditherImage);
    DitherBlueNoise(ditherImage, blueNoise[0]);

    // Animated dither tests
    DitherWhiteNoiseAnimated(ditherImage);
    DitherInterleavedGradientNoiseAnimated(ditherImage);
    DitherBlueNoiseAnimated(ditherImage, blueNoise);

    // Golden ratio animated dither tests
    DitherWhiteNoiseAnimatedGoldenRatio(ditherImage);
    DitherInterleavedGradientNoiseAnimatedGoldenRatio(ditherImage);
    DitherBlueNoiseAnimatedGoldenRatio(ditherImage, blueNoise[0]);

    // Animated dither integration tests
    DitherWhiteNoiseAnimatedIntegrated(ditherImage);
    DitherInterleavedGradientNoiseAnimatedIntegrated(ditherImage);
    DitherBlueNoiseAnimatedIntegrated(ditherImage, blueNoise);

    // Golden ratio animated dither integration tests
    DitherWhiteNoiseAnimatedGoldenRatioIntegrated(ditherImage);
    DitherInterleavedGradientNoiseAnimatedGoldenRatioIntegrated(ditherImage);
    DitherBlueNoiseAnimatedGoldenRatioIntegrated(ditherImage, blueNoise[0]);

    fclose(g_logFile);

    return 0;
}

Calculating the Distance Between Points in “Wrap Around” (Toroidal) Space

Let’s say you are trying to find the distance between two points in 2D, but that these points are in a universe that “wraps around” like old video games – leaving the screen on the right, left, top or bottom side makes you re-appear on the opposite edge.

This universe is actually shaped like a toroid, also known as a doughnut. It’s actually an impossible object, a “flat torus”, so not exactly a doughnut, but whatever.

If you imagine yourself on the surface of a doughnut, it would behave exactly this way. If you go “down” you end up where you previously considered “up”. If you go far enough “left” you end up where you previously considered “right”.

How would you calculate the distance between two points in a universe like this?

Let’s imagine the situation below where we are trying to find the distance between the red point and the green point:

One way to do this would be to pick one of the points (I’m picking red in this case) and clone it 8 times to surround the cell like the below. You’d calculate the distance from the green point to each of the 9 red points, and whatever distance was smallest would be the answer.

Something not so desirable about this is that it takes 9 distance calculations to find the minimum distance. You can work with squared distances instead of regular distances to avoid a square root on each of these distance calculations, but that’s still a bit of calculation to do.

Going up in dimensions makes the problem even worse. In 3D, it requires 27 distance calculations to find the shortest point, and 81 distance calculations in 4D!

Luckily there’s a better way to approach this.

Let’s say that our universe (image) is 1 unit by 1 unit big (aka we are working in texture UVs). If you look at the image with 9 copies of the red dot, you can see that they are just the 9 possible combinations of having -1, +0, +1 on each axis added to the red dot’s coordinates. All possible combinations of the x and y axis having -1, +0 or +1 added to them are valid locations of the red dot.

Looking at the distance formula we can see that if we minimize each axis individually, that we will also end up with the minimal distance overall.

d = \sqrt{(x_2-x_1)^2+(y_2-y_1)^2}

So, the better way is to minimize each axis individually.

On the x axis you’d find if the x axis distance between the red and green point is minimal when you subtract 1 from the red dot’s x axis position, leave it alone, or add 1.

Whichever x axis value of the red dot gives you the minimal x axis 1D distance is the x axis location to use.

You’d repeat for the y axis to get the y axis location to use (and would repeat for any further axes for higher dimensions).

This gives you the closest point which you can then plug into the distance formula to get the distance between the points in this wrap around space.

You can actually do better though.

Still working on each axis individually, you can calculate the absoluate value of the 1D distance between the two points on that axis. If that distance is greater than 0.5, the real distance for that axis is 1-distance.

The intuition here is that if you are in a 1d repeating space, if going from A to B is more than half the distance, it means that you went the wrong way, and that going the other way is shorter. The distance of that other way is one minus whatever distance you just calculated since the distance from one point to itself is 1!

Do that for each axis and use those 1d distances in the distance formula to get the actual distance.

This lets you minimize the distance without having to explicitly figure out which combination makes the point closest.

More importantly, it lets you efficiently calculate the distance between the two points in toroidal space (doughnut space!)

The computational complexity is a lot better. It’s now linear in the number of dimensions: O(N), instead of O(3^N).

Here is some C++ to show you how it would work in 2D.

float ToroidalDistance (float x1, float y1, float x2, float y2)
{
    float dx = std::abs(x2 - x1);
    float dy = std::abs(y2 - y1);

    if (dx > 0.5f)
        dx = 1.0f - dx;

    if (dy > 0.5f)
        dy = 1.0f - dy;

    return std::sqrt(dx*dx + dy*dy);
}

I hit this problem trying to make a tileable texture. I needed to place a few circles on a texture such that the circles weren’t too close to each other, even when the texture was tiled.

The calculations above gave me the basic tool needed to be able to calculate distances between points. Subtracting circle radii from the distance between points let me get toroidal distance between circles and make sure I didn’t place them too closely to each other.

That let me make an image that kept the distance constraints even when it was tiled.

Here’s an example image by itself:

Here is the image tiled:

Half Tile Offset Streaming World Grids

A number of years ago I worked on an open world game that got cancelled. Despite it not being released, I learned a few things.

If you try to make a game where the player can walk around a large world without loading screens, chances are that the whole world won’t fit in memory at once.

Since it can’t all fit in memory at once, as the player moves around you are going to have to unload old chunks of the world to make room for the new chunks that the player is about to enter.

To simplify the problem, it’s really common to break the world up into a grid, and keep a radius of tiles loaded around the player at any one time.

In the above you can see that 9 tiles are kept in memory. If the player crosses the boundary to the cell to the left, we unload the cells on the right (red cells) and load new cells in on the left (green cells).

The idea is that we keep a border of cells loaded around the player at all times so that they never see the edge of the world, but we don’t have to keep the whole world in memory at the same time.

The size of the cells can vary depending on the actual needs of your game. If you can travel very quickly in your game, you may need larger cells.

Instead of just having 9 large cells loaded at any one time, you may instead opt to have smaller cells but have more layers of them loaded at any one time. The below has 2 layers of cells loaded around the player, so has to keep 5×5 = 25 cells loaded at any one time. This can give you more granularity if you need it.

The number of cells you have to keep in memory when keeping N layers of cells loaded around the player is (2N+1)^2.

1 layer means 9 cells, 2 layers mean 25 cells, 3 layers mean 49 cells, 4 layers mean 81 cells and so on.

This isn’t the only way to arrange your world tiles though. You can also make it so every row of tiles is offset by half a tile from the one above it. That gives you a setup like this:

With this setup, keeping a single layer of cells loaded around the player takes only 7 cells instead of 9. That might not sound like much, but that means that your memory budget is 129% of what it was the other way.

Alternately, it means you can keep your cells at the same quality level but only have to load 78% as much stuff from disk as the player moves around the world.

For keeping N layers of cells around the player you need to keep 3N^2+3N+1 cells in memory.

1 layer means 7 cells, 2 layers mean 19 cells, 3 layers mean 37 cells, 4 layers mean 61 cells.

Here’s a table to show you how the regular grid compares to the half offset grid for cell counts. The savings do get better with more layers, but not very quickly.

\begin{array}{c|c|c|c} \text{Layers} & \text{Regular Grid} & \text{Half Offset Grid} & \text{Size} \\ \hline 1 & 9 & 7 & 77.8\%\\ 2 & 25 & 19 & 76\%\\ 3 & 49 & 37 & 75.5\%\\ 4 & 81 & 61 & 75.3\%\\ \end{array}

Overall, this is a pretty cool technique that is pretty low cost if you do this early in the project. Once you have a lot of content divided into a regular grid, it can be a challenge to move over to this half tile offset grid.

Some Notes From Readers

@chrispewebb said that an issue he’s faced when going this method is having T junctions on LODed terrain, but that skirts should be able to help there.

@runevision pointed out that while the memory requirements are lowered, so is the shortest distance (radius) from the player to data that isn’t loaded. One idea to deal with this if it’s a problem could be to use smaller cell sizes and to do more layers to make up for it.