Dissecting “Tiny Clouds”

There is an amazing shadertoy called “Tiny Clouds” by stubbe (twitter: @Stubbesaurus) which flies you through nearly photorealistic clouds in only 10 lines of code / 280 characters (2 old sized tweets or 1 new larger sized tweet).

The code is a bit dense, so I wanted to take some time to understand it and share the explanation for anyone else who was interested. Rune (the author) kindly answered a couple questions for me as well. Thanks Rune!

Link: [SH17A] Tiny Clouds (Check out this link, it looks even more amazing in motion)

Here is the code in full. The texture in iChannel0 is just a white noise texture that is bilinearly sampled.

#define T texture(iChannel0,(s*p.zw+ceil(s*p.x))/2e2).y/(s+=s)*4.
void mainImage(out vec4 O,vec2 x){
    vec4 p,d=vec4(.8,0,x/iResolution.y-.8),c=vec4(.6,.7,d);
    O=c-d.w;
    for(float f,s,t=2e2+sin(dot(x,x));--t>0.;p=.05*t*d)
        p.xz+=iTime,
        s=2.,
        f=p.w+1.-T-T-T-T,
    	f<0.?O+=(O-1.-f*c.zyxw)*f*.4:O;
}

BTW this shadertoy is a shrunken & reinterpreted version of a larger, more feature rich shadertoy by iq: Clouds

Before diving into the details of the code, here is how it works in short:

  • Every pixel does a ray march from far to near. It does it backwards to make for simpler alpha blending math.
  • At every ray step, it samples FBM data (fractal brownian motion) to figure out if the current position is below the surface of the cloud or above it.
  • If below, it alpha blends the pixel color with the cloud color at that point, using the vertical distance into the cloud as the cloud density.

Pretty reasonable and simple – and it would have to be, to look so good in so few characters! Let’s dig into the code.

#define T texture(iChannel0,(s*p.zw+ceil(s*p.x))/2e2).y/(s+=s)*4.
void mainImage(out vec4 O,vec2 x){
    vec4 p,d=vec4(.8,0,x/iResolution.y-.8),c=vec4(.6,.7,d);
    O=c-d.w;
    for(float f,s,t=2e2+sin(dot(x,x));--t>0.;p=.05*t*d)
        p.xz+=iTime,
        s=2.,
        f=p.w+1.-T-T-T-T,
    	f<0.?O+=(O-1.-f*c.zyxw)*f*.4:O;
}

Line 1 is a define that we’ll come back to and line 2 is just a minimal definition of the mainImage function.

#define T texture(iChannel0,(s*p.zw+ceil(s*p.x))/2e2).y/(s+=s)*4.
void mainImage(out vec4 O,vec2 x){
    vec4 p,d=vec4(.8,0,x/iResolution.y-.8),c=vec4(.6,.7,d);
    O=c-d.w;
    for(float f,s,t=2e2+sin(dot(x,x));--t>0.;p=.05*t*d)
        p.xz+=iTime,
        s=2.,
        f=p.w+1.-T-T-T-T,
    	f<0.?O+=(O-1.-f*c.zyxw)*f*.4:O;
}

On line 3 several variables are declared:

  • p – this is the variable that holds the position of the ray during the ray march. It isn’t initialized here, but that’s ok because the position is calculated each step in the loop. It is interesting to see that the y component of p is never used. p.x is actually depth into the screen, p.z is the screen x axis, and p.w is the screen y axis (aka the up axis). I believe that the axis choices and the fact that the y component is never used is purely to make the code smaller.
  • d – this is the direction that the ray for this pixel travels in. It uses the same axis conventions as p, and the y component is also never used (except implicitly for calculating p.y, which is never used). 0.8 is subtracted from d.z and d.w (the screen x and screen y axes). Interestingly that makes the screen x axis 0 nearly centered on the screen. It also points the screen y axis downward a bit, putting the 0 value near the top of the screen to make the camera look more downward at the clouds.
  • c – this is the color of the sky, which is a nice sky blue. It’s initialized with constants in x and y, and then d is used for z and w. d.xy goes into c.zw. That gives c the 0.8 value in the z field. I’m sure it was done this way because it’s fewer characters to initialize using “d” compared to “.8,0.” for the same effect. Note that c.w is used to calculate O.w (O.a) but that the alpha channel of the output pixel value is currently ignored by shadertoy, so this is a meaningless by product of the code, not a desired feature.
#define T texture(iChannel0,(s*p.zw+ceil(s*p.x))/2e2).y/(s+=s)*4.
void mainImage(out vec4 O,vec2 x){
    vec4 p,d=vec4(.8,0,x/iResolution.y-.8),c=vec4(.6,.7,d);
    O=c-d.w;
    for(float f,s,t=2e2+sin(dot(x,x));--t>0.;p=.05*t*d)
        p.xz+=iTime,
        s=2.,
        f=p.w+1.-T-T-T-T,
    	f<0.?O+=(O-1.-f*c.zyxw)*f*.4:O;
}

Line 4 initializes the output pixel color to be the sky color (c), but then subtracts d.w which is the pixel’s ray march direction on the screen y axis. This has a nice effect of making a nice sky blue gradient.

To see this in action, here we set O to c:

Here we set O to c-d.w:

It gets darker blue towards the top – where d.w is positive – because a positive number is being subtracted from the sky color. The color values get smaller.

It gets lighter towards the bottom – where d.w is negative – because a negative number is being subtracted from the sky color. The color values get larger.

#define T texture(iChannel0,(s*p.zw+ceil(s*p.x))/2e2).y/(s+=s)*4.
void mainImage(out vec4 O,vec2 x){
    vec4 p,d=vec4(.8,0,x/iResolution.y-.8),c=vec4(.6,.7,d);
    O=c-d.w;
    for(float f,s,t=2e2+sin(dot(x,x));--t>0.;p=.05*t*d)
        p.xz+=iTime,
        s=2.,
        f=p.w+1.-T-T-T-T,
    	f<0.?O+=(O-1.-f*c.zyxw)*f*.4:O;
}

On line 5, the for loop for ray marching starts. A few things happen here:

  • f is declared – f is the signed vertical distance from the current point in space to the cloud. If negative, it means that the point is inside the cloud. If positive, it means that the point is outside (above) the cloud. It isn’t initialized here, but it’s calculated each iteration of the loop so that’s fine.
  • s is declared – s is a scale value for use with the FBM data. FBMs work by sampling multiple octaves of data. You scale up the position and scale down the value for each octave. s is that scale value, used for both purposes. This isn’t initialized but is calculated each frame so that’s fine.
  • t is declared and initialized – t (aka ray march step index) is initialized to 2e2 aka 200. It was done this way because “2e2” is smaller than “200.” by one character. Note that the for loop takes t from 200 to 0. The ray marching happens back to front to simplify alpha blending. The sin(dot(x,x)) part I want to talk about briefly below.
  • p is calculated – p (aka the position in the current step of the ray march) is calculated, and this happens every step of the loop. p is t (time) multiplied by the direction of the ray for this pixel, and multiplied by .05 to scale it down.

The reason that sin(dot(x,x)) is added to the “ray time” is because the ray is marching through voxelized data (boxes). Unlike boxes, clouds are supposed to look organic, and not geometric. A way to fight the problem of the data looking boxy is to add a little noise to each ray to break up the geometric pattern. You can either literally add some noise to the result, or do what this shader does, which is add some noise to the starting position of the ray so that neighboring rays will cross the box (voxel) boundaries at different times and will look noisy instead of geometric.

I can’t see a difference when removing this from the shader, and other people have said the same. Rune says in the comments that it’d be on the chopping block for sure if he needed to shave off some more characters. He reached his 280 character goal, so no there is no need to remove it.

For what it’s worth, here is that expression visualized in the blue channel. the -1 to +1 is mapped to 0 to 1 by multiplying it by a half and adding a half:

#define T texture(iChannel0,(s*p.zw+ceil(s*p.x))/2e2).y/(s+=s)*4.
void mainImage(out vec4 O,vec2 x){
    vec4 p,d=vec4(.8,0,x/iResolution.y-.8),c=vec4(.6,.7,d);
    O=c-d.w;
    for(float f,s,t=2e2+sin(dot(x,x));--t>0.;p=.05*t*d)
        p.xz+=iTime,
        s=2.,
        f=p.w+1.-T-T-T-T,
    	f<0.?O+=(O-1.-f*c.zyxw)*f*.4:O;
}

Line 6 adds the current time to p.x and p.z. Remember that the x component is the axis pointing into the screen and the z component is the screen space x axis, so this line of code moves the camera forward and to the right over time.

If you are wondering why the lines in the for loop end in a comma instead of a semicolon, the reason is because if a semicolon was used instead, the for loop would require two more characters: “{” and “}” to show where the scope of the loop started and ended. Ending the lines with commas mean it’s one long statement, so the single line version of a for loop can be used. An interesting trick 😛

#define T texture(iChannel0,(s*p.zw+ceil(s*p.x))/2e2).y/(s+=s)*4.
void mainImage(out vec4 O,vec2 x){
    vec4 p,d=vec4(.8,0,x/iResolution.y-.8),c=vec4(.6,.7,d);
    O=c-d.w;
    for(float f,s,t=2e2+sin(dot(x,x));--t>0.;p=.05*t*d)
        p.xz+=iTime,
        s=2.,
        f=p.w+1.-T-T-T-T,
    	f<0.?O+=(O-1.-f*c.zyxw)*f*.4:O;
}

Line 7 sets / initializes s to 2. Remember that s is used as the octave scale for sample position and resulting value. That will come into play in the next line.

#define T texture(iChannel0,(s*p.zw+ceil(s*p.x))/2e2).y/(s+=s)*4.
void mainImage(out vec4 O,vec2 x){
    vec4 p,d=vec4(.8,0,x/iResolution.y-.8),c=vec4(.6,.7,d);
    O=c-d.w;
    for(float f,s,t=2e2+sin(dot(x,x));--t>0.;p=.05*t*d)
        p.xz+=iTime,
        s=2.,
        f=p.w+1.-T-T-T-T,
    	f<0.?O+=(O-1.-f*c.zyxw)*f*.4:O;
}

First let’s look at line 1, which is the “T” macro.

That macro samples the texture (which is just white noise) at a position described by the current ray position in the ray march. the s variable is used to scale up the position, and it’s also used to scale down the noise value at that position. The same position involves p.zw which is the screen space x and y axis respectively, but also includes p.x which is the axis pointing into the screen. This maps a 3d coordinate to a 2d texture location. I have tried making the shader sample a 3d white noise texture instead of doing this and get what looks to be the same quality results.

The macro also multiplies s by 2 each sample, so that the next sample will sample the next octave.

An interesting part of this texture coordinate conversion from 3d to 2d though is that the x component is ceil’d(the axis that goes into the screen). I’m not sure if there is any logic to this other than it’s a way to transform the 3d coordinates into a 2d one for the texture lookup.

Below is what it looks like without the ceil in the macro for s*p.x. It stretches the noise in a weird way.

The uv coordinates sampled are divided by 2e2 (which is 200, but again, fewer characters than “200.”). I believe this value of 200 matches the number of ray march steps intentionally, so that the ray marches across the entire texture (with wrap around) each time.

Line 8 uses this macro. We set f to be p.w, which is the ray’s height. 1 is added to the height which moves the camera up one unit. Lastly, the T macro is used to subtract 4 octaves of noise from f.

The result of this is that f gives us a signed distance to the cloud on the vertical axis. In other words, f tells us how far above or below the surface of the clouds we are. A positive value means the position is above the clouds, and a negative value means the position is below the clouds.

#define T texture(iChannel0,(s*p.zw+ceil(s*p.x))/2e2).y/(s+=s)*4.
void mainImage(out vec4 O,vec2 x){
    vec4 p,d=vec4(.8,0,x/iResolution.y-.8),c=vec4(.6,.7,d);
    O=c-d.w;
    for(float f,s,t=2e2+sin(dot(x,x));--t>0.;p=.05*t*d)
        p.xz+=iTime,
        s=2.,
        f=p.w+1.-T-T-T-T,
    	f<0.?O+=(O-1.-f*c.zyxw)*f*.4:O;
}

Line 10 is the close of the function, so line 9 is the last meaningful line of code.

This line of code says:

  • If f less than zero (“If the point is inside the cloud”)
  • Then add “some formula” to the pixel color (more info on that in a moment)
  • Else, “O”. This is a dummy statement with no side effects that is there to satisfy the ternary operator syntax with a minimal number of characters.

I was looking at that formula for a while, trying to figure it out. I was thinking maybe it was something like a cheaper function fitting of some more complex light scattering / absorption function.

I asked Rune and he explained it. All it’s doing is doing an alpha blend (a lerp) from the current pixel color to the color of the cloud at this position. If you do the lerp mathematically, expand the function and combine terms, you get the above. Here’s his explanation from twitter (link to twitter thread):

Alpha blending between accumulated color (O) and incoming cloud color (1+f*c.zyxw). Note density (f) is negative:
O = lerp(O, 1+f*c.zyxw, -f*.4)
O = O * (1+f*.4) + (1+f*c.zyxw)*-f*.4
O = O + O*f*.4 + (1+f*c.zyxw)*-f*.4
O = O + (O-1-f*c.zyxw)*f*.4
O += (O-1-f*c.zyxw)*f*.4

Remember the marching is from far to near which simplifies the calculations quite a bit. If the marching was reversed then you would also need to keep track of an accumulated density.

One obvious question then would be: why is “1+f*c.zyxw” the cloud color of the current sample?

One thing that helps clear that up is that f is negative. if you make “f” mean “density” and flip it’s sign, the equation becomes: “1-density*c.zyxw”

We can then realize that “1” when interpreted as a vec4 is the color white, and that c is the sky color. We can also throw out the w since we (and shadertoy) don’t care about the alpha channel. We can also replace x,y,z with r,g,b. That makes the equation become: “white-density*skycolor.bgr”

In that equation, when density is 0, all we are left with is white. As density increases, the color gets darker.

The colors are the reversed sky color, because the sky color is (0.6, 0.7, 0.8). if we used the sky color instead of the reversed sky color, you can see that blue would drop away faster than green, which would drop away faster than red. If you do that, the clouds turn a reddish color like you can see here:

I’m not an expert in atmospheric rendering (check links at the bottom for more info on that!), but it looks more natural and correct for it to do the reverse. What we really want is for red to drop off the quickest, then green, then blue. I believe a more correct thing to do would be to subtract sky color from 1.0 and use that color to multiply density by. However, reversing the color channels works fine in this case, so no need to spend the extra characters!

Another obvious question might be: why is the amount of lerp “-f*.4”?

It probably looks strange to see a negative value in a lerp amount, but remembering that f is negative when it’s inside a cloud means that it’s a positive value, multiplied by 0.4 to make it smaller. It’s just scaling the density a bit.

Other Notes

Using bilinear interpolation of the texture makes a big difference. If you switch the texture to using nearest neighbor point sampling you get something like this which looks very boxy. It looks even more boxy when it’s in motion.

One thing I wanted to try when understanding this shader was to try to replace the white noise texture lookup with a white noise function. It does indeed work as you can see below, but it got noticeably slower on my machine doing that. I’m so used to things being texture bound that getting rid of texture reads is usually a win. I didn’t stop to think that in this situation all that was happening was compute and no texture reads. In a more fully featured renderer, you may indeed find yourself texture read bound, and moving it out of a texture read could help speed things up – profile and see! It’s worth noting that to get proper results you need to discretize your noise function into a grid and use bilinear interpolation between the values – mimicing what the texture read does. Check my unpacked version of the shader in the links section for more details!

Something kind of fun is that you can replace the white noise texture with other textures. The results seem to be pretty good usually! Below is where i made the shadertoy use the “Abstract1” image as a source. The clouds got a lot more soft.

Thanks for reading. Anything that I got wrong or missed, please let me and the other readers know!

Links

Here is my unpacked version of the shader, which includes the option to use a white noise function instead of a white noise texture: Tiny Clouds: Unpacked & No Tex

Here are two great links for more information on how to render atmospheric and volumetric effects:

Volumetric Atmospheric Scattering

Creating a Volumetric Ray Marcher

Demystifying Floating Point Precision

Floating point numbers have limited precision. If you are a game programmer, you have likely encountered bugs where things start breaking after too much time has elapsed, or after something has moved too far from the origin.

This post aims to show you how to answer the questions:

  1. What precision do I have at a number?
  2. When will I hit precision issues?

First, a very quick look at the floating point format.

Floating Point Format

Floating point numbers (Wikipedia: IEEE 754) have three components:

  1. Sign bit – whether the number is positive or negative
  2. Exponent bits – the magnitude of the number
  3. Mantissa bits – the fractional bits

32 bit floats use 1 bit for sign, 8 bits for exponent and 23 bits for mantissa. Whatever number is encoded in the exponent bits, you subtract 127 to get the actual exponent, meaning the exponent can be from -126 to +127.

64 bit doubles use 1 bit for sign, 11 bits for exponent and 52 bits for mantissa. Whatever number is encoded in the exponent bits, you subtract 1023 to get the actual exponent, meaning the exponent can be from -1022 to +1023.

16 bit half floats use 1 bit for sign, 5 bits for exponent and 10 bits for mantissa. Whatever number is encoded in the exponent bits, you subtract 15 to get the actual exponent, meaning the exponent can be from -14 to +15.

For all of the above, an exponent of all zeros has the special meaning “exponent 0” (and this is where the denormals / subnormals come into play) and all ones has the special meaning “infinity”

The exponent bits tell you which power of two numbers you are between – [2^{exponent}, 2^{exponent+1}) – and the mantissa tells you where you are in that range.

What precision do I have at a number?

Let’s look at the number 3.5.

To figure out the precision we have at that number, we figure out what power of two range it’s between and then subdivide that range using the mantissa bits.

3.5 is between 2 and 4. That means we are diving the range of numbers 2 to 4 using the mantissa bits. A float has 23 bits of mantissa, so the precision we have at 3.5 is:

\frac{4-2}{2^{23}} = \frac{2}{8388608} \approx 0.000000238418579

3.5 itself is actually exactly representable by a float, double or half, but the amount of precision numbers have at that scale is that value. The smallest number you can add or subtract to a value between 2 and 4 is that value. That is the resolution of the values you are working with when working between 2 and 4 using a float.

Using a double instead of a float gives us 52 bits of mantissa, making the precision:

\frac{4-2}{2^{52}} = \frac{2}{4503599627370496} \approx 0.00000000000000044408921

Using a half float with 10 bits of mantissa it becomes:

\frac{4-2}{2^{10}} = \frac{2}{1024} = 0.001953125

Here’s a table showing the amount of precision you get with each data type at various exponent values. N/A is used when an exponent is out of range for the specific data type.

\begin{array}{c|c|c|c|c} exponent & range & half & float & double \\ \hline 0 & [1,2) & 0.0009765625 & 0.00000011920929 & 0.0000000000000002220446 \\ 1 & [2,4) & 0.001953125 & 0.000000238418579 & 0.00000000000000044408921 \\ 2 & [4,8) & 0.00390625 & 0.000000476837158 & 0.00000000000000088817842 \\ 9 & [512, 1024) & 0.5 & 0.00006103515 & 0.00000000000011368684 \\ 10 & [1024,2048) & 1 & 0.00012207031 & 0.00000000000022737368 \\ 11 & [2048,4096) & 2 & 0.00024414062 & 0.00000000000045474735 \\ 12 & [4096,8192) & 4 & 0.00048828125 & 0.0000000000009094947 \\ 15 & [32768, 65536) & 32 & 0.00390625 & 0.0000000000072759576 \\ 16 & [65536, 131072) & N/A & 0.0078125 & 0.0000000000014551915 \\ 17 & [131072, 262144) & N/A & 0.015625 & 0.00000000002910383 \\ 18 & [262144, 524288) & N/A & 0.03125 & 0.000000000058207661 \\ 19 & [524288, 1048576) & N/A & 0.0625 & 0.00000000011641532 \\ 23 & [8388608,16777216) & N/A & 1 & 0.00000000186264515 \\ 52 & [4503599627370496, 9007199254740992) & N/A & 536870912 & 1 \\ \end{array}

A quick note on the maximum number you can store in floating point numbers, by looking at the half float specifically:

A half float has a maximum exponent of 15, which you can see above puts the number range between 32768 and 65536. The precision is 32 which is the smallest step that can be made in a half float at that scale. That range includes the smaller number but not the larger number. That means that the largest number a half float can store is one step away (32) from the high side of that range. So, the largest number that can be stored is 65536 – 32 = 65504.

How Many Digits Can I Rely On?

Another helpful way of looking at floating point precision is how many digits of precision you can rely on.

A float has 23 bits of mantissa, and 2^23 is 8,388,608. 23 bits let you store all 6 digit numbers or lower, and most of the 7 digit numbers. This means that floating point numbers have between 6 and 7 digits of precision, regardless of exponent.

That means that from 0 to 1, you have quite a few decimal places to work with. If you go into the hundreds or thousands, you’ve lost a few. When you get up into the tens of millions, you’ve run out of digits for anything beyond the decimal place.

You can actually see that this is true in the table in the last section. With floating point numbers, it’s at exponent 23 (8,388,608 to 16,777,216) that the precision is at 1. The smallest value that you can add to a floating point value in that range is in fact 1. It’s at this point that you have lost all precision to the right of the decimal place. Interestingly, you still have perfect precision of the integers though.

Half floats have 10 mantissa bits and 2^10 = 1024, so they just barely have 3 digits of precision.

Doubles have 52 mantissa bits and 2^52 = 4,503,599,627,370,496. That means doubles have between 15 and 16 digits of precision.

This can help you understand how precision will break down for you when using a specific data type for a specific magnitude of numbers.

When will I hit precision issues?

Besides the loose rules above about how many digits of precision you can count on, you can also solve to see when precision will break down for you.

Let’s say that you are tracking how long your game has been running (in seconds), and you do so by adding your frame delta (in seconds) to a variable every frame.

If you have a 30fps game, your frame delta is going to be 0.0333.

Adding that each frame to a float will eventually cause the float to reach a value where that number is smaller than the smallest difference representable (smaller than the precision), at which point things will start breaking. At first your accuracy will drop and your time will be wrong, but eventually adding your frame delta to the current time won’t even change the value of the current time. Time will effectively stop!

When will this happen though?

We’ll start with the formula we saw earlier and do one step of simple algebra to get us an equation which can give us this answer.

\frac{range}{mantissa} = precision \\ \\ range = mantissa * precision

How we use this formula is we put the precision we want into “precision” and we put the size of the mantissa (2^{MantissaBits}) into “mantissa”. The result tells us the range that we’ll get the precision at.

Let’s plug in our numbers:

range = 8388608 * 0.0333 = 279340.6464

This tells us the range of the floating point numbers where we’ll have our problems, but this isn’t the value that we’ll have our problems at, so we have another step to do. We need to find what exponent has this range.

Looking at the table earlier in the post you might notice that the range at an exponent also happens to be just 2^{exponent}.

That’s helpful because that just means we take log2 of the answer we got:

log2(279340.6464) = 18.0916659875

Looking at the table again, we can see that floating point numbers have a precision of 0.03125 at exponent value 18. So, exponent 18 is close, but it’s precision is smaller than what we want – aka the precision is still ok.

That means we need to ceil() the number we got from the log2.

Doing that, we see that things break down at exponent 19, which has precision of 0.0625. This actual value it has this problem at is 528,288 (which is 2^{19}).

So, our final formula for “where does precision become this value?” becomes:

value = pow(2, ceil(log2(mantissa * precision)))

Note that at exponent 18, there is still imprecision happening. When adding 1/30 to 264144, It goes from 264144 to 264144.031 to 264144.063, instead of 264144, 264144.033, 264144.066. There is error, but it’s fairly small.

At exponent 19 though, things fall apart a lot more noticeably. When adding 1/30 to 528288, it goes from 528288 to 528288.063 to 528288.125. Time is actually moving almost twice as fast in this case!

At exponent 20, we start at 1056576.00 and adding 1/30 doesn’t even change the value. Time is now stopped.

It does take 6.1 days (528,288 seconds) to reach exponent 19 though, so that’s quite a long time.

If we use half floats, it falls apart at value 64. That’s right, it only takes 64 seconds for this to fall apart when using 16 bit half floats, compared to 6.1 days when using 32 bit floats!

With doubles, it falls apart at value 281,474,976,710,656. That is 8,925,512 years!

Let’s check out that equation again:

value = pow(2, ceil(log2(mantissa * precision)))

A possibly more programmer friendly way to do the above would be to calculate mantissa * precision and then round up to the next power of 2. That’s exactly what the formula is doing above, but in math terms, not programming terms.

Storing Integers

I recently learned that floating point numbers can store integers surprisingly well. It blows my mind that I never knew this. Maybe you are in the same boat 😛

Here’s the setup:

  1. For any exponent, the range of numbers it represents is a power of 2.
  2. The mantissa will always divide that range into a power of 2 different values.

It might take some time and/or brain power to soak that up (it did for me!) but what that ends up ultimately meaning is that floating point numbers can exactly represent a large number of integers.

In fact, a floating point number can EXACTLY store all integers from -2^{MantissaBits+1} to +2^{MantissaBits+1}.

For half floats that means you can store all integers between (and including) -2048 to +2048. (\pm 2^{11})

For floats, it’s -16,777,216 to +16,777,216. (\pm 2^{24})

For doubles it’s -9,007,199,254,740,992 to +9,007,199,254,740,992. (\pm 2^{53})

Doubles can in fact exactly represent any 32 bit unsigned integer, since 2^32 = 4,294,967,296.

Links

Here are some links you might find interesting!

Floating point visually explained:
http://fabiensanglard.net/floating_point_visually_explained/

What Every Computer Scientist Should Know About Floating-Point Arithmetic:
https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

A matter of precision:
http://tomforsyth1000.github.io/blog.wiki.html#[[A%20matter%20of%20precision]]

Denormal numbers – aka very small numbers that make computations slow when you use them:
https://en.m.wikipedia.org/wiki/Denormal_number

Catastrophic Cancellation – a problem you can run into when doing floating point math:
https://en.wikipedia.org/wiki/Loss_of_significance

A handy web page that lets you play with the binary representation of a float and what number it comes out as:
https://www.h-schmidt.net/FloatConverter/IEEE754.html

Half precision floating point format:
https://en.wikipedia.org/wiki/Half-precision_floating-point_format

What is the first integer that a float is incapable of representing?
https://stackoverflow.com/questions/3793838/which-is-the-first-integer-that-an-ieee-754-float-is-incapable-of-representing-e

Ready to go deeper? Bruce Dawson has some amazing write ups on deeper floating point issues:
https://randomascii.wordpress.com/category/floating-point/

This talks about how to use floating point precision limits as an activation function in a neural network (?!)
https://blog.openai.com/nonlinear-computation-in-linear-networks/

My Old Master: On Optimism

The “My Old Master” posts are non technical posts in reference to my karate (shaolin kempo) teacher, and the things he taught my friends and I over a decade to be martial artists (peaceful warriors), instructors, and better human beings.

I’ve been in a funk for a few weeks – ever since the time change.

Several things have aligned just so to make things particularly shitty, such as the children being sick, them not sleeping, our son transitioning to preschool, holiday and other responsibilities eating up the almost non existent free time, and perhaps most of all, me missing/skipping my weekly exercise routine.

I’m starting to recover (sleep and exercise have been helping a lot) but in doing so, I’m reminded of some things “my old master” told me on the topic of optimism, that I think are worth sharing.

As time passes, I see more and more about how the most successful people use “brain hacks” to help them ensure success. It’s weird to think of your brain and your instincts as tools to leverage to your advantage but they totally are.

As a quick example, if you are aiming to eat fewer sweets, making sure you don’t have any around the house is a great first step to achieving your goal.

Ages and ages of evolution of our species have hard wired us to go after those calories so you don’t starve. It’s very very difficult (read: impossible) to combat that. The best thing to do is to not even have the option.

This is a hack to help you succeed in your goal.

Optimism, But Not Blind Optimism

Just like in the sweets example, optimism can be a great tool for achieving your goals, but as we all know, blind optimism is foolish and can definitely negatively impact your goals.

To reconcile these two things, we should “Expect the best, but plan for the worst”.

This makes us optimistic, but if things go wrong, we aren’t completely blindsided and unprepared.

Why should we be optimistic to begin with though?

Because: “You get what you expect”

Imagine your neighbors dog is consistently pooping near your house and the owner is not cleaning it up. You decide it’s time to confront them about it and go knock on their door.

There’s two ways you might go into this situation.

The first way might be, you are pissed, and you expect a fight. They open the door, see you angry, immediately their “guard goes up” and there’s little chance the outcome will be anything other than an awful experience for one or both of you. The person may even make it a point to have their dog poop on your lawn and not clean it up.

The second way might be that you realize you’ve seen your neighbor playing with their kids, being a good parent, and that in general they seem like a good natured person and a good neighbor except for this one issue. Because of this, you figure the conversation will be completely peaceful, it will be totally fine, and your neighbor will “get it”. They open the door, see you smiling and hear you using a friendly respectful tone, and they respond similarly. Perhaps they are embarrassed about it even, and profusely apologize.

It’s definitely true that neither of these situations are guaranteed to play out like this, but the odds are improved that they will.

Improving the odds for getting what you want is a good thing. If you don’t go into it blindly (prepare for the worst), you are also in a reasonable position if things don’t go like you want them to.

Finding The Positive

There are negatives and positives to every situation. Whichever you focus on is up to you.

Imagine yourself in a dark room where there is sewage in one corner and a pile of shiny gold in the other. You have a flash light. Which are you going to look at?

Whatever you choose to focus on will rattle around in your head and become amplified. This is the story about there being two wolves in us, and whichever we feed is the one that gets stronger.

You may notice this in yourself in fact. Have you ever dwelled on something negative only to have it get worse and worse in your mind, til it was unbearable and causes you to do something? Perhaps quiting a job, telling someone off, or similar? Maybe you have some of this going on right now somewhere in your life?

Recognizing and disrupting those patterns can help you keep from over-reacting or incorrectly reacting to situations, both of which are inappropriate because of the fact (and identified by the fact…) that they actually set you back towards achieving your goals.

Taking this mental life hack a bit further, there are concepts to help you visualize your goals, how you are going to achieve those goals, and constantly remind you of these things.

A vision board is one such example. You find imagery that speaks to you and reminds you of what you want, and how you are going to get there, and you put it somewhere highly visible to you that you see every day.

Seeing this stuff daily ingrains in your mind what it is you want to do and how you are going to do it. Any opportunities that come up that help you get closer will more easily be identified and you’ll be in a better position to take them. “Luck is where opportunity meets preparedness”.

For me, this blog is in many ways similar to a vision board. Besides being external memory (for me to re-learn things) and a resume helper, it also helps me remember that I am experienced, skilled and decently talented – or at least persistent enough to achieve meaningful things.

We humans sometime look at how others see us to get an idea of ourselves and base our self worth on that. That is a pretty awful idea though, as other people don’t know always what we are capable of, and frankly probably don’t even care, as they have their own agenda and goals.

Whatever you can do to help you visualize your goals and how you are going to achieve them is going to help you succeed.

If you ever find yourself in a funk, I highly recommend these three things:

  • Make sure you are getting enough sleep
  • Make sure you are getting some exercise (an hour martial arts class a week is enough for me to feel the benefits!)
  • Look to see if you are having any cyclical negative thoughts. If so, see if you can break out of them by turning your flashlight onto the gold, instead of the poop. Possibly using something like a vision board, or whatever works for you.

Thanks for reading!

Animating Noise For Integration Over Time 2: Uniform Over Time

After I put out the last post, Mikkel Gjoel (@pixelmager), made an interesting observation that you can see summarized in his image below. (tweet / thread here)

BTW Mikkel has an amazing presentation about rendering the beautiful game “Inside” that you should check out. Lots of interesting techniques used, including some enlightening uses of noise.
YouTube –
Low Complexity, High Fidelity: The Rendering of INSIDE

The images left to right are:

  • One frame of white noise
  • N frames of white noise averaged.
  • N frames averaged where the first frame is white noise, and a per frame random number is added to all pixels every frame.
  • N frames averaged where the first frame is white noise, and 1/N is added to all pixels every frame.
  • N frames averaged where the first frame is white noise, and the golden ratio is added to all pixels every frame.

In the above, the smoother and closer to middle grey that an image is, the better it is – that means it converged to the true result of the integral better.

Surprisingly it looks like adding 1/N outperforms the golden ratio, which means that regular spaced samples are outperforming a low discrepancy sequence!

To compare apples to apples, we’ll do the “golden ratio” tests we did last post, but instead do them with adding this uniform value instead.

To be explicit, there are 8 frames and they are:

  • Frame 0: The noise
  • Frame 1: The noise + 1/8
  • Frame 2: The noise + 2/8
  • Frame 7: the noise + 7/8

Modulus is used to keep the values between 0 and 1.

Below is how white noise looks animated with golden ratio (top) vs uniform values (bottom). There are 8 frames and it’s played at 8fps so it loops every second.

Interleaved Gradient Noise. Top is golden ratio, bottom is uniform.

Blue Noise. Top is golden ratio, bottom is uniform.

The uniform ones look pretty similar. Maybe a little smoother, but it’s hard to tell by just looking at it. Interestingly, the frequency content of the blue noise seems more stable using these uniform values instead of golden ratio.

The histogram data of the noise was the same for all frames of animation, just like in last post, which is a good thing. The important bit is that adding a uniform value doesn’t modify the histogram shape, other than changing which counts go to which specific buckets. Ideally the histogram would start out perfectly even like the blue noise does, but since this post is about the “adding uniform values” process, and not about the starting noise, this shows that the process does the right thing with the histogram.

  • White Noise – min 213, max 306, average 256, std dev 16.51
  • Interleaved Gradient Noise – min 245, max 266, average 256, std dev 2.87
  • Blue Noise – min, max, average are 256, std dev 0.

Let’s look at the integrated animations.

White noise. Top is golden ratio, bottom is uniform.

Interleaved gradient noise. Top is golden ratio, bottom is uniform.

Blue noise. Top is golden ratio, bottom is uniform.

The differences between these animations are subtle unless you know what you are looking for specifically so let’s check out the final frames and the error graphs.

Each noise comparison below has three images. The first image is the “naive” way to animate the noise. The second uses golden ratio instead. The third one uses 1/N. The first two images (and techniques) are from (and explained in) the last post, and the third image is the technique from this post.

White noise. Naive (top), golden ratio (mid), uniform (bottom).


Interleaved gradient noise. Naive (top), golden ratio (mid), uniform (bottom).


Blue noise. Naive (top), golden ratio (mid), uniform (bottom).


So, what’s interesting is that the uniform sampling over time has lower error and standard deviation (variance) than golden ratio, which has less than the naive method. However, it’s only at the end that the uniform sampling over time has the best results, and it’s actually quite terrible until then.

The reason for this is that uniform has good coverage over the sample space, but it takes until the last frame to get that good coverage because each frame takes a small step over the remaining sample space.

What might work out better would be if our first frame was the normal noise, but then the second frame was the normal noise plus a half, so we get the most information we possibly can from that sample by splitting the sample space in half. We would then want to cut the two halves of the space space in half, and so the next two frames would be the noise plus 1/4 and the noise plus 3/4. We would then continue with 1/8, 5/8, 3/8 and 7/8 (note we didn’t do these 1/8 steps in order. We did it in the order that gives us the most information the most quickly!). At the end of all this, we would have our 8 uniformly spaced samples over time, but we would have taken the samples in an order that makes our intermediate frames look better hopefully.

Now, interestingly, that number sequence I just described has a name. It’s the base 2 Van Der Corput sequence, which is a type of low discrepancy sequence. It’s also the 1D version of the Halton sequence, and is related to other sequences as well. More info here: When Random Numbers Are Too Random: Low Discrepancy Sequences

Mikkel mentioned he thought this would be helpful, and I was thinking the same thing too. Let’s see how it does!

White noise. Uniform (top), Van Der Corput (bottom).

Interleaved gradient noise. Uniform (top), Van Der Corput (bottom).

Blue noise. Uniform (top), Van Der Corput (bottom).

The final frames look the same as before (and the same as each other), so I won’t show those again but here are the updated graphs.



Interestingly, using the Van Der Corput sequence has put intermediate frames more in line with golden ratio, while of course still being superior at the final frame.

I’ve been trying to understand why uniform sampling over time out performs the golden ratio which acts more like blue noise over time. I still don’t grasp why it works as well as it does, but the proof is in the pudding.

Theoretically, this uniform sampling over time should lead to the possibility of aliasing on the time axis, since blue noise / white noise (and other randomness) get rid of the aliasing in exchange for noise.

Noise over the time dimension would mean missing details that were smaller than the sample spacing size. in our case, we are using the time sampled values (noise + uniform value) to threshold a source image to make a sample. It may be that since we are thresholding, that aliasing isn’t possible since our sample represents everything below or equal to the value?

I’m not really sure, but will be thinking about it for a while. If you have any insights please let me know!

It would be interesting to try an actual 1d blue noise sequence and see how it compares. If it does better, it sounds like it would be worth while to try jittering the uniform sampled values on the time axis to try and approximate blue noise a bit. Mikkel tried the jittering and said it gave significantly worse results, so that seems like a no go.

Lastly, some other logical experiments from here seem to be…

  • See how other forms of noise and ordered dithers do, including perhaps a Bayer Matrix. IG noise seems to naturally do better on the time axis for some reason I don’t fully understand yet. There may be some interesting properties of other noise waiting to be found.
  • Do we get any benefits in this context by arranging the interleaved gradient noise in a spiral like Jorge mentions in his presentation? (Next Generation Post Processing In Call Of Duty: Advanced Warfare
  • It would be interesting to see how this works in a more open ended case – such as if you had temporal AA which was averaging a variable number of pixels each frame. Would doing a van Der Corput sequence give good results there? Would you keep track of sample counts per pixel and keep marching the Van Der Corput forward for each pixel individually? Or would you just pick something like an 8 Van Der Corput sequence, adding the current sequence to all pixels and looping that sequence every 8 frames? It really would be interesting to see what is best in that sort of a setup.

I’m sure there are all sorts of other things to try to. This is a deep, interesting and important topic for graphics and beyond (:

Code

Source code below, but it’s also available on github, along with the source images used: Github:
Atrix256/RandomCode/AnimatedNoise

#define _CRT_SECURE_NO_WARNINGS

#include <windows.h>  // for bitmap headers.  Sorry non windows people!
#include <stdint.h>
#include <vector>
#include <random>
#include <atomic>
#include <thread>
#include <complex>
#include <array>

typedef uint8_t uint8;

const float c_pi = 3.14159265359f;

// settings
const bool c_doDFT = true;

// globals 
FILE* g_logFile = nullptr;

//======================================================================================
inline float Lerp (float A, float B, float t)
{
    return A * (1.0f - t) + B * t;
}

//======================================================================================
struct SImageData
{
    SImageData ()
        : m_width(0)
        , m_height(0)
    { }
   
    size_t m_width;
    size_t m_height;
    size_t m_pitch;
    std::vector<uint8> m_pixels;
};
 
//======================================================================================
struct SColor
{
    SColor (uint8 _R = 0, uint8 _G = 0, uint8 _B = 0)
        : R(_R), G(_G), B(_B)
    { }

    inline void Set (uint8 _R, uint8 _G, uint8 _B)
    {
        R = _R;
        G = _G;
        B = _B;
    }
 
    uint8 B, G, R;
};

//======================================================================================
struct SImageDataComplex
{
    SImageDataComplex ()
        : m_width(0)
        , m_height(0)
    { }
  
    size_t m_width;
    size_t m_height;
    std::vector<std::complex<float>> m_pixels;
};
 
//======================================================================================
std::complex<float> DFTPixel (const SImageData &srcImage, size_t K, size_t L)
{
    std::complex<float> ret(0.0f, 0.0f);
  
    for (size_t x = 0; x < srcImage.m_width; ++x)
    {
        for (size_t y = 0; y < srcImage.m_height; ++y)
        {
            // Get the pixel value (assuming greyscale) and convert it to [0,1] space
            const uint8 *src = &srcImage.m_pixels[(y * srcImage.m_pitch) + x * 3];
            float grey = float(src[0]) / 255.0f;
  
            // Add to the sum of the return value
            float v = float(K * x) / float(srcImage.m_width);
            v += float(L * y) / float(srcImage.m_height);
            ret += std::complex<float>(grey, 0.0f) * std::polar<float>(1.0f, -2.0f * c_pi * v);
        }
    }
  
    return ret;
}
  
//======================================================================================
void ImageDFT (const SImageData &srcImage, SImageDataComplex &destImage)
{
    // NOTE: this function assumes srcImage is greyscale, so works on only the red component of srcImage.
    // ImageToGrey() will convert an image to greyscale.
 
    // size the output dft data
    destImage.m_width = srcImage.m_width;
    destImage.m_height = srcImage.m_height;
    destImage.m_pixels.resize(destImage.m_width*destImage.m_height);
 
    size_t numThreads = std::thread::hardware_concurrency();
    //if (numThreads > 0)
        //numThreads = numThreads - 1;
 
    std::vector<std::thread> threads;
    threads.resize(numThreads);
 
    printf("Doing DFT with %zu threads...\n", numThreads);
 
    // calculate 2d dft (brute force, not using fast fourier transform) multithreadedly
    std::atomic<size_t> nextRow(0);
    for (std::thread& t : threads)
    {
        t = std::thread(
            [&] ()
            {
                size_t row = nextRow.fetch_add(1);
                bool reportProgress = (row == 0);
                int lastPercent = -1;
 
                while (row < srcImage.m_height)
                {
                    // calculate the DFT for every pixel / frequency in this row
                    for (size_t x = 0; x < srcImage.m_width; ++x)
                    {
                        destImage.m_pixels[row * destImage.m_width + x] = DFTPixel(srcImage, x, row);
                    }
 
                    // report progress if we should
                    if (reportProgress)
                    {
                        int percent = int(100.0f * float(row) / float(srcImage.m_height));
                        if (lastPercent != percent)
                        {
                            lastPercent = percent;
                            printf("            \rDFT: %i%%", lastPercent);
                        }
                    }
 
                    // go to the next row
                    row = nextRow.fetch_add(1);
                }
            }
        );
    }
 
    for (std::thread& t : threads)
        t.join();
 
    printf("\n");
}
 
//======================================================================================
void GetMagnitudeData (const SImageDataComplex& srcImage, SImageData& destImage)
{
    // size the output image
    destImage.m_width = srcImage.m_width;
    destImage.m_height = srcImage.m_height;
    destImage.m_pitch = 4 * ((srcImage.m_width * 24 + 31) / 32);
    destImage.m_pixels.resize(destImage.m_pitch*destImage.m_height);
  
    // get floating point magnitude data
    std::vector<float> magArray;
    magArray.resize(srcImage.m_width*srcImage.m_height);
    float maxmag = 0.0f;
    for (size_t x = 0; x < srcImage.m_width; ++x)
    {
        for (size_t y = 0; y < srcImage.m_height; ++y)
        {
            // Offset the information by half width & height in the positive direction.
            // This makes frequency 0 (DC) be at the image origin, like most diagrams show it.
            int k = (x + (int)srcImage.m_width / 2) % (int)srcImage.m_width;
            int l = (y + (int)srcImage.m_height / 2) % (int)srcImage.m_height;
            const std::complex<float> &src = srcImage.m_pixels[l*srcImage.m_width + k];
  
            float mag = std::abs(src);
            if (mag > maxmag)
                maxmag = mag;
  
            magArray[y*srcImage.m_width + x] = mag;
        }
    }
    if (maxmag == 0.0f)
        maxmag = 1.0f;
  
    const float c = 255.0f / log(1.0f+maxmag);
  
    // normalize the magnitude data and send it back in [0, 255]
    for (size_t x = 0; x < srcImage.m_width; ++x)
    {
        for (size_t y = 0; y < srcImage.m_height; ++y)
        {
            float src = c * log(1.0f + magArray[y*srcImage.m_width + x]);
  
            uint8 magu8 = uint8(src);
  
            uint8* dest = &destImage.m_pixels[y*destImage.m_pitch + x * 3];
            dest[0] = magu8;
            dest[1] = magu8;
            dest[2] = magu8;
        }
    }
}

//======================================================================================
bool ImageSave (const SImageData &image, const char *fileName)
{
    // open the file if we can
    FILE *file;
    file = fopen(fileName, "wb");
    if (!file) {
        printf("Could not save %s\n", fileName);
        return false;
    }
   
    // make the header info
    BITMAPFILEHEADER header;
    BITMAPINFOHEADER infoHeader;
   
    header.bfType = 0x4D42;
    header.bfReserved1 = 0;
    header.bfReserved2 = 0;
    header.bfOffBits = 54;
   
    infoHeader.biSize = 40;
    infoHeader.biWidth = (LONG)image.m_width;
    infoHeader.biHeight = (LONG)image.m_height;
    infoHeader.biPlanes = 1;
    infoHeader.biBitCount = 24;
    infoHeader.biCompression = 0;
    infoHeader.biSizeImage = (DWORD) image.m_pixels.size();
    infoHeader.biXPelsPerMeter = 0;
    infoHeader.biYPelsPerMeter = 0;
    infoHeader.biClrUsed = 0;
    infoHeader.biClrImportant = 0;
   
    header.bfSize = infoHeader.biSizeImage + header.bfOffBits;
   
    // write the data and close the file
    fwrite(&header, sizeof(header), 1, file);
    fwrite(&infoHeader, sizeof(infoHeader), 1, file);
    fwrite(&image.m_pixels[0], infoHeader.biSizeImage, 1, file);
    fclose(file);
  
    return true;
}

//======================================================================================
bool ImageLoad (const char *fileName, SImageData& imageData)
{
    // open the file if we can
    FILE *file;
    file = fopen(fileName, "rb");
    if (!file)
        return false;
 
    // read the headers if we can
    BITMAPFILEHEADER header;
    BITMAPINFOHEADER infoHeader;
    if (fread(&header, sizeof(header), 1, file) != 1 ||
        fread(&infoHeader, sizeof(infoHeader), 1, file) != 1 ||
        header.bfType != 0x4D42 || infoHeader.biBitCount != 24)
    {
        fclose(file);
        return false;
    }
 
    // read in our pixel data if we can. Note that it's in BGR order, and width is padded to the next power of 4
    imageData.m_pixels.resize(infoHeader.biSizeImage);
    fseek(file, header.bfOffBits, SEEK_SET);
    if (fread(&imageData.m_pixels[0], imageData.m_pixels.size(), 1, file) != 1)
    {
        fclose(file);
        return false;
    }
 
    imageData.m_width = infoHeader.biWidth;
    imageData.m_height = infoHeader.biHeight;
    imageData.m_pitch = 4 * ((imageData.m_width * 24 + 31) / 32);
 
    fclose(file);
    return true;
}

//======================================================================================
void ImageInit (SImageData& image, size_t width, size_t height)
{
    image.m_width = width;
    image.m_height = height;
    image.m_pitch = 4 * ((width * 24 + 31) / 32);
    image.m_pixels.resize(image.m_pitch * image.m_height);
    std::fill(image.m_pixels.begin(), image.m_pixels.end(), 0);
}

//======================================================================================
template <typename LAMBDA>
void ImageForEachPixel (SImageData& image, const LAMBDA& lambda)
{
    size_t pixelIndex = 0;
    for (size_t y = 0; y < image.m_height; ++y)
    {
        SColor* pixel = (SColor*)&image.m_pixels[y * image.m_pitch];
        for (size_t x = 0; x < image.m_width; ++x)
        {
            lambda(*pixel, pixelIndex);
            ++pixel;
            ++pixelIndex;
        }
    }
}

//======================================================================================
template <typename LAMBDA>
void ImageForEachPixel (const SImageData& image, const LAMBDA& lambda)
{
    size_t pixelIndex = 0;
    for (size_t y = 0; y < image.m_height; ++y)
    {
        SColor* pixel = (SColor*)&image.m_pixels[y * image.m_pitch];
        for (size_t x = 0; x < image.m_width; ++x)
        {
            lambda(*pixel, pixelIndex);
            ++pixel;
            ++pixelIndex;
        }
    }
}

//======================================================================================
void ImageConvertToLuma (SImageData& image)
{
    ImageForEachPixel(
        image,
        [] (SColor& pixel, size_t pixelIndex)
        {
            float luma = float(pixel.R) * 0.3f + float(pixel.G) * 0.59f + float(pixel.B) * 0.11f;
            uint8 lumau8 = uint8(luma + 0.5f);
            pixel.R = lumau8;
            pixel.G = lumau8;
            pixel.B = lumau8;
        }
    );
}

//======================================================================================
void ImageCombine2 (const SImageData& imageA, const SImageData& imageB, SImageData& result)
{
    // put the images side by side. A on left, B on right
    ImageInit(result, imageA.m_width + imageB.m_width, max(imageA.m_height, imageB.m_height));
    std::fill(result.m_pixels.begin(), result.m_pixels.end(), 0);

    // image A on left
    for (size_t y = 0; y < imageA.m_height; ++y)
    {
        SColor* destPixel = (SColor*)&result.m_pixels[y * result.m_pitch];
        SColor* srcPixel = (SColor*)&imageA.m_pixels[y * imageA.m_pitch];
        for (size_t x = 0; x < imageA.m_width; ++x)
        {
            destPixel[0] = srcPixel[0];
            ++destPixel;
            ++srcPixel;
        }
    }

    // image B on right
    for (size_t y = 0; y < imageB.m_height; ++y)
    {
        SColor* destPixel = (SColor*)&result.m_pixels[y * result.m_pitch + imageA.m_width * 3];
        SColor* srcPixel = (SColor*)&imageB.m_pixels[y * imageB.m_pitch];
        for (size_t x = 0; x < imageB.m_width; ++x)
        {
            destPixel[0] = srcPixel[0];
            ++destPixel;
            ++srcPixel;
        }
    }
}

//======================================================================================
void ImageCombine3 (const SImageData& imageA, const SImageData& imageB, const SImageData& imageC, SImageData& result)
{
    // put the images side by side. A on left, B in middle, C on right
    ImageInit(result, imageA.m_width + imageB.m_width + imageC.m_width, max(max(imageA.m_height, imageB.m_height), imageC.m_height));
    std::fill(result.m_pixels.begin(), result.m_pixels.end(), 0);

    // image A on left
    for (size_t y = 0; y < imageA.m_height; ++y)
    {
        SColor* destPixel = (SColor*)&result.m_pixels[y * result.m_pitch];
        SColor* srcPixel = (SColor*)&imageA.m_pixels[y * imageA.m_pitch];
        for (size_t x = 0; x < imageA.m_width; ++x)
        {
            destPixel[0] = srcPixel[0];
            ++destPixel;
            ++srcPixel;
        }
    }

    // image B in middle
    for (size_t y = 0; y < imageB.m_height; ++y)
    {
        SColor* destPixel = (SColor*)&result.m_pixels[y * result.m_pitch + imageA.m_width * 3];
        SColor* srcPixel = (SColor*)&imageB.m_pixels[y * imageB.m_pitch];
        for (size_t x = 0; x < imageB.m_width; ++x)
        {
            destPixel[0] = srcPixel[0];
            ++destPixel;
            ++srcPixel;
        }
    }

    // image C on right
    for (size_t y = 0; y < imageC.m_height; ++y)
    {
        SColor* destPixel = (SColor*)&result.m_pixels[y * result.m_pitch + imageA.m_width * 3 + imageC.m_width * 3];
        SColor* srcPixel = (SColor*)&imageC.m_pixels[y * imageC.m_pitch];
        for (size_t x = 0; x < imageC.m_width; ++x)
        {
            destPixel[0] = srcPixel[0];
            ++destPixel;
            ++srcPixel;
        }
    }
}

//======================================================================================
float GoldenRatioMultiple (size_t multiple)
{
    return float(multiple) * (1.0f + std::sqrtf(5.0f)) / 2.0f;
}

//======================================================================================
void IntegrationTest (const SImageData& dither, const SImageData& groundTruth, size_t frameIndex, const char* label)
{
    // calculate min, max, total and average error
    size_t minError = 0;
    size_t maxError = 0;
    size_t totalError = 0;
    size_t pixelCount = 0;
    for (size_t y = 0; y < dither.m_height; ++y)
    {
        SColor* ditherPixel = (SColor*)&dither.m_pixels[y * dither.m_pitch];
        SColor* truthPixel = (SColor*)&groundTruth.m_pixels[y * groundTruth.m_pitch];
        for (size_t x = 0; x < dither.m_width; ++x)
        {
            size_t error = 0;
            if (ditherPixel->R > truthPixel->R)
                error = ditherPixel->R - truthPixel->R;
            else
                error = truthPixel->R - ditherPixel->R;

            totalError += error;

            if ((x == 0 && y == 0) || error < minError)
                minError = error;

            if ((x == 0 && y == 0) || error > maxError)
                maxError = error;

            ++ditherPixel;
            ++truthPixel;
            ++pixelCount;
        }
    }
    float averageError = float(totalError) / float(pixelCount);

    // calculate standard deviation
    float sumSquaredDiff = 0.0f;
    for (size_t y = 0; y < dither.m_height; ++y)
    {
        SColor* ditherPixel = (SColor*)&dither.m_pixels[y * dither.m_pitch];
        SColor* truthPixel = (SColor*)&groundTruth.m_pixels[y * groundTruth.m_pitch];
        for (size_t x = 0; x < dither.m_width; ++x)
        {
            size_t error = 0;
            if (ditherPixel->R > truthPixel->R)
                error = ditherPixel->R - truthPixel->R;
            else
                error = truthPixel->R - ditherPixel->R;

            float diff = float(error) - averageError;

            sumSquaredDiff += diff*diff;
        }
    }
    float stdDev = std::sqrtf(sumSquaredDiff / float(pixelCount - 1));

    // report results
    fprintf(g_logFile, "%s %zu error\n", label, frameIndex);
    fprintf(g_logFile, "  min error: %zu\n", minError);
    fprintf(g_logFile, "  max error: %zu\n", maxError);
    fprintf(g_logFile, "  avg error: %0.2f\n", averageError);
    fprintf(g_logFile, "  stddev: %0.2f\n", stdDev);
    fprintf(g_logFile, "\n");
}

//======================================================================================
void HistogramTest (const SImageData& noise, size_t frameIndex, const char* label)
{
    std::array<size_t, 256> counts;
    std::fill(counts.begin(), counts.end(), 0);

    ImageForEachPixel(
        noise,
        [&] (const SColor& pixel, size_t pixelIndex)
        {
            counts[pixel.R]++;
        }
    );

    // calculate min, max, total and average
    size_t minCount = 0;
    size_t maxCount = 0;
    size_t totalCount = 0;
    for (size_t i = 0; i < 256; ++i)
    {
        if (i == 0 || counts[i] < minCount)
            minCount = counts[i];

        if (i == 0 || counts[i] > maxCount)
            maxCount = counts[i];

        totalCount += counts[i];
    }
    float averageCount = float(totalCount) / float(256.0f);

    // calculate standard deviation
    float sumSquaredDiff = 0.0f;
    for (size_t i = 0; i < 256; ++i)
    {
        float diff = float(counts[i]) - averageCount;
        sumSquaredDiff += diff*diff;
    }
    float stdDev = std::sqrtf(sumSquaredDiff / 255.0f);

    // report results
    fprintf(g_logFile, "%s %zu histogram\n", label, frameIndex);
    fprintf(g_logFile, "  min count: %zu\n", minCount);
    fprintf(g_logFile, "  max count: %zu\n", maxCount);
    fprintf(g_logFile, "  avg count: %0.2f\n", averageCount);
    fprintf(g_logFile, "  stddev: %0.2f\n", stdDev);
    fprintf(g_logFile, "  counts: ");
    for (size_t i = 0; i < 256; ++i)
    {
        if (i > 0)
            fprintf(g_logFile, ", ");
        fprintf(g_logFile, "%zu", counts[i]);
    }

    fprintf(g_logFile, "\n\n");
}

//======================================================================================
void GenerateWhiteNoise (SImageData& image, size_t width, size_t height)
{
    ImageInit(image, width, height);

    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_int_distribution<unsigned int> dist(0, 255);

    ImageForEachPixel(
        image,
        [&] (SColor& pixel, size_t pixelIndex)
        {
            uint8 value = dist(rng);
            pixel.R = value;
            pixel.G = value;
            pixel.B = value;
        }
    );
}

//======================================================================================
void GenerateInterleavedGradientNoise (SImageData& image, size_t width, size_t height, float offsetX, float offsetY)
{
    ImageInit(image, width, height);

    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_int_distribution<unsigned int> dist(0, 255);

    for (size_t y = 0; y < height; ++y)
    {
        SColor* pixel = (SColor*)&image.m_pixels[y * image.m_pitch];
        for (size_t x = 0; x < width; ++x)
        {
            float valueFloat = std::fmodf(52.9829189f * std::fmod(0.06711056f*float(x + offsetX) + 0.00583715f*float(y + offsetY), 1.0f), 1.0f);
            size_t valueBig = size_t(valueFloat * 256.0f);
            uint8 value = uint8(valueBig % 256);
            pixel->R = value;
            pixel->G = value;
            pixel->B = value;
            ++pixel;
        }
    }
}

//======================================================================================
template <size_t NUM_SAMPLES>
void GenerateVanDerCoruptSequence (std::array<float, NUM_SAMPLES>& samples, size_t base)
{
    for (size_t i = 0; i < NUM_SAMPLES; ++i)
    {
        samples[i] = 0.0f;
        float denominator = float(base);
        size_t n = i;
        while (n > 0)
        {
            size_t multiplier = n % base;
            samples[i] += float(multiplier) / denominator;
            n = n / base;
            denominator *= base;
        }
    }
}

//======================================================================================
void DitherWithTexture (const SImageData& ditherImage, const SImageData& noiseImage, SImageData& result)
{
    // init the result image
    ImageInit(result, ditherImage.m_width, ditherImage.m_height);

    // make the result image
    for (size_t y = 0; y < ditherImage.m_height; ++y)
    {
        SColor* srcDitherPixel = (SColor*)&ditherImage.m_pixels[y * ditherImage.m_pitch];
        SColor* destDitherPixel = (SColor*)&result.m_pixels[y * result.m_pitch];

        for (size_t x = 0; x < ditherImage.m_width; ++x)
        {
            // tile the noise in case it isn't the same size as the image we are dithering
            size_t noiseX = x % noiseImage.m_width;
            size_t noiseY = y % noiseImage.m_height;
            SColor* noisePixel = (SColor*)&noiseImage.m_pixels[noiseY * noiseImage.m_pitch + noiseX * 3];

            uint8 value = 0;
            if (noisePixel->R < srcDitherPixel->R)
                value = 255;

            destDitherPixel->R = value;
            destDitherPixel->G = value;
            destDitherPixel->B = value;

            ++srcDitherPixel;
            ++destDitherPixel;
        }
    }
}

//======================================================================================
void DitherWhiteNoise (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    // make noise
    SImageData noise;
    GenerateWhiteNoise(noise, ditherImage.m_width, ditherImage.m_height);

    // dither the image
    SImageData dither;
    DitherWithTexture(ditherImage, noise, dither);

    // save the results
    SImageData combined;
    ImageCombine3(ditherImage, noise, dither, combined);
    ImageSave(combined, "out/still_whitenoise.bmp");
}

//======================================================================================
void DitherInterleavedGradientNoise (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    // make noise
    SImageData noise;
    GenerateInterleavedGradientNoise(noise, ditherImage.m_width, ditherImage.m_height, 0.0f, 0.0f);

    // dither the image
    SImageData dither;
    DitherWithTexture(ditherImage, noise, dither);

    // save the results
    SImageData combined;
    ImageCombine3(ditherImage, noise, dither, combined);
    ImageSave(combined, "out/still_ignoise.bmp");
}

//======================================================================================
void DitherBlueNoise (const SImageData& ditherImage, const SImageData& blueNoise)
{
    printf("\n%s\n", __FUNCTION__);

    // dither the image
    SImageData dither;
    DitherWithTexture(ditherImage, blueNoise, dither);

    // save the results
    SImageData combined;
    ImageCombine3(ditherImage, blueNoise, dither, combined);
    ImageSave(combined, "out/still_bluenoise.bmp");
}

//======================================================================================
void DitherWhiteNoiseAnimated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/anim_whitenoise%zu.bmp", i);

        // make noise
        SImageData noise;
        GenerateWhiteNoise(noise, ditherImage.m_width, ditherImage.m_height);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherInterleavedGradientNoiseAnimated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_real_distribution<float> dist(0.0f, 1000.0f);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/anim_ignoise%zu.bmp", i);

        // make noise
        SImageData noise;
        GenerateInterleavedGradientNoise(noise, ditherImage.m_width, ditherImage.m_height, dist(rng), dist(rng));

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherBlueNoiseAnimated (const SImageData& ditherImage, const SImageData blueNoise[8])
{
    printf("\n%s\n", __FUNCTION__);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/anim_bluenoise%zu.bmp", i);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, blueNoise[i], dither);

        // save the results
        SImageData combined;
        ImageCombine2(blueNoise[i], dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherWhiteNoiseAnimatedIntegrated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animint_whitenoise%zu.bmp", i);

        // make noise
        SImageData noise;
        GenerateWhiteNoise(noise, ditherImage.m_width, ditherImage.m_height);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherInterleavedGradientNoiseAnimatedIntegrated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_real_distribution<float> dist(0.0f, 1000.0f);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animint_ignoise%zu.bmp", i);

        // make noise
        SImageData noise;
        GenerateInterleavedGradientNoise(noise, ditherImage.m_width, ditherImage.m_height, dist(rng), dist(rng));

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&](SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i + 1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherBlueNoiseAnimatedIntegrated (const SImageData& ditherImage, const SImageData blueNoise[8])
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animint_bluenoise%zu.bmp", i);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, blueNoise[i], dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(blueNoise[i], dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherWhiteNoiseAnimatedGoldenRatio (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    // make noise
    SImageData noiseSrc;
    GenerateWhiteNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    SImageDataComplex noiseDFT;
    SImageData noiseDFTMag;

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animgr_whitenoise%zu.bmp", i);

        // add golden ratio to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = GoldenRatioMultiple(i);
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // DFT the noise
        if (c_doDFT)
        {
            ImageDFT(noise, noiseDFT);
            GetMagnitudeData(noiseDFT, noiseDFTMag);
        }
        else
        {
            ImageInit(noiseDFTMag, noise.m_width, noise.m_height);
            std::fill(noiseDFTMag.m_pixels.begin(), noiseDFTMag.m_pixels.end(), 0);
        }

        // Histogram test the noise
        HistogramTest(noise, i, __FUNCTION__);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // save the results
        SImageData combined;
        ImageCombine3(noiseDFTMag, noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherInterleavedGradientNoiseAnimatedGoldenRatio (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    // make noise
    SImageData noiseSrc;
    GenerateInterleavedGradientNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height, 0.0f, 0.0f);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    SImageDataComplex noiseDFT;
    SImageData noiseDFTMag;

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animgr_ignoise%zu.bmp", i);

        // add golden ratio to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = GoldenRatioMultiple(i);
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // DFT the noise
        if (c_doDFT)
        {
            ImageDFT(noise, noiseDFT);
            GetMagnitudeData(noiseDFT, noiseDFTMag);
        }
        else
        {
            ImageInit(noiseDFTMag, noise.m_width, noise.m_height);
            std::fill(noiseDFTMag.m_pixels.begin(), noiseDFTMag.m_pixels.end(), 0);
        }

        // Histogram test the noise
        HistogramTest(noise, i, __FUNCTION__);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // save the results
        SImageData combined;
        ImageCombine3(noiseDFTMag, noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherBlueNoiseAnimatedGoldenRatio (const SImageData& ditherImage, const SImageData& noiseSrc)
{
    printf("\n%s\n", __FUNCTION__);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    SImageDataComplex noiseDFT;
    SImageData noiseDFTMag;

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animgr_bluenoise%zu.bmp", i);

        // add golden ratio to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = GoldenRatioMultiple(i);
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // DFT the noise
        if (c_doDFT)
        {
            ImageDFT(noise, noiseDFT);
            GetMagnitudeData(noiseDFT, noiseDFTMag);
        }
        else
        {
            ImageInit(noiseDFTMag, noise.m_width, noise.m_height);
            std::fill(noiseDFTMag.m_pixels.begin(), noiseDFTMag.m_pixels.end(), 0);
        }

        // Histogram test the noise
        HistogramTest(noise, i, __FUNCTION__);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // save the results
        SImageData combined;
        ImageCombine3(noiseDFTMag, noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherWhiteNoiseAnimatedUniform (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    // make noise
    SImageData noiseSrc;
    GenerateWhiteNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    SImageDataComplex noiseDFT;
    SImageData noiseDFTMag;

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animuni_whitenoise%zu.bmp", i);

        // add uniform value to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = float(i) / 8.0f;
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // DFT the noise
        if (c_doDFT)
        {
            ImageDFT(noise, noiseDFT);
            GetMagnitudeData(noiseDFT, noiseDFTMag);
        }
        else
        {
            ImageInit(noiseDFTMag, noise.m_width, noise.m_height);
            std::fill(noiseDFTMag.m_pixels.begin(), noiseDFTMag.m_pixels.end(), 0);
        }

        // Histogram test the noise
        HistogramTest(noise, i, __FUNCTION__);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // save the results
        SImageData combined;
        ImageCombine3(noiseDFTMag, noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherInterleavedGradientNoiseAnimatedUniform (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    // make noise
    SImageData noiseSrc;
    GenerateInterleavedGradientNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height, 0.0f, 0.0f);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    SImageDataComplex noiseDFT;
    SImageData noiseDFTMag;

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animuni_ignoise%zu.bmp", i);

        // add uniform value to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = float(i) / 8.0f;
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // DFT the noise
        if (c_doDFT)
        {
            ImageDFT(noise, noiseDFT);
            GetMagnitudeData(noiseDFT, noiseDFTMag);
        }
        else
        {
            ImageInit(noiseDFTMag, noise.m_width, noise.m_height);
            std::fill(noiseDFTMag.m_pixels.begin(), noiseDFTMag.m_pixels.end(), 0);
        }

        // Histogram test the noise
        HistogramTest(noise, i, __FUNCTION__);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // save the results
        SImageData combined;
        ImageCombine3(noiseDFTMag, noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherBlueNoiseAnimatedUniform (const SImageData& ditherImage, const SImageData& noiseSrc)
{
    printf("\n%s\n", __FUNCTION__);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    SImageDataComplex noiseDFT;
    SImageData noiseDFTMag;

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animuni_bluenoise%zu.bmp", i);

        // add uniform value to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = float(i) / 8.0f;
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // DFT the noise
        if (c_doDFT)
        {
            ImageDFT(noise, noiseDFT);
            GetMagnitudeData(noiseDFT, noiseDFTMag);
        }
        else
        {
            ImageInit(noiseDFTMag, noise.m_width, noise.m_height);
            std::fill(noiseDFTMag.m_pixels.begin(), noiseDFTMag.m_pixels.end(), 0);
        }

        // Histogram test the noise
        HistogramTest(noise, i, __FUNCTION__);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // save the results
        SImageData combined;
        ImageCombine3(noiseDFTMag, noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherWhiteNoiseAnimatedGoldenRatioIntegrated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    // make noise
    SImageData noiseSrc;
    GenerateWhiteNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animgrint_whitenoise%zu.bmp", i);

        // add golden ratio to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = GoldenRatioMultiple(i);
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherInterleavedGradientNoiseAnimatedGoldenRatioIntegrated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    // make noise
    SImageData noiseSrc;
    GenerateInterleavedGradientNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height, 0.0f, 0.0f);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animgrint_ignoise%zu.bmp", i);

        // add golden ratio to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = GoldenRatioMultiple(i);
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherBlueNoiseAnimatedGoldenRatioIntegrated (const SImageData& ditherImage, const SImageData& noiseSrc)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animgrint_bluenoise%zu.bmp", i);

        // add golden ratio to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = GoldenRatioMultiple(i);
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherWhiteNoiseAnimatedUniformIntegrated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    // make noise
    SImageData noiseSrc;
    GenerateWhiteNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animuniint_whitenoise%zu.bmp", i);

        // add uniform value to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = float(i) / 8.0f;
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherInterleavedGradientNoiseAnimatedUniformIntegrated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    // make noise
    SImageData noiseSrc;
    GenerateInterleavedGradientNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height, 0.0f, 0.0f);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animuniint_ignoise%zu.bmp", i);

        // add uniform value to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = float(i) / 8.0f;
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherBlueNoiseAnimatedUniformIntegrated (const SImageData& ditherImage, const SImageData& noiseSrc)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animuniint_bluenoise%zu.bmp", i);

        // add uniform value to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = float(i) / 8.0f;
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherWhiteNoiseAnimatedVDCIntegrated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    // make noise
    SImageData noiseSrc;
    GenerateWhiteNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    // Make Van Der Corput sequence
    std::array<float, 8> VDC;
    GenerateVanDerCoruptSequence(VDC, 2);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animvdcint_whitenoise%zu.bmp", i);

        // add uniform value to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = VDC[i];
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherInterleavedGradientNoiseAnimatedVDCIntegrated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    // make noise
    SImageData noiseSrc;
    GenerateInterleavedGradientNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height, 0.0f, 0.0f);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    // Make Van Der Corput sequence
    std::array<float, 8> VDC;
    GenerateVanDerCoruptSequence(VDC, 2);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animvdcint_ignoise%zu.bmp", i);

        // add uniform value to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = VDC[i];
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherBlueNoiseAnimatedVDCIntegrated (const SImageData& ditherImage, const SImageData& noiseSrc)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    // Make Van Der Corput sequence
    std::array<float, 8> VDC;
    GenerateVanDerCoruptSequence(VDC, 2);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animvdcint_bluenoise%zu.bmp", i);

        // add uniform value to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = VDC[i];
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
int main (int argc, char** argv)
{
    // load the dither image and convert it to greyscale (luma)
    SImageData ditherImage;
    if (!ImageLoad("src/ditherimage.bmp", ditherImage))
    {
        printf("Could not load src/ditherimage.bmp");
        return 0;
    }
    ImageConvertToLuma(ditherImage);

    // load the blue noise images.
    SImageData blueNoise[8];
    for (size_t i = 0; i < 8; ++i)
    {
        char buffer[256];
        sprintf(buffer, "src/BN%zu.bmp", i);
        if (!ImageLoad(buffer, blueNoise[i]))
        {
            printf("Could not load %s", buffer);
            return 0;
        }

        // They have different values in R, G, B so make R be the value for all channels
        ImageForEachPixel(
            blueNoise[i],
            [] (SColor& pixel, size_t pixelIndex)
            {
                pixel.G = pixel.R;
                pixel.B = pixel.R;
            }
        );
    }

    g_logFile = fopen("log.txt", "w+t");
    
    // still image dither tests
    DitherWhiteNoise(ditherImage);
    DitherInterleavedGradientNoise(ditherImage);
    DitherBlueNoise(ditherImage, blueNoise[0]);

    // Animated dither tests
    DitherWhiteNoiseAnimated(ditherImage);
    DitherInterleavedGradientNoiseAnimated(ditherImage);
    DitherBlueNoiseAnimated(ditherImage, blueNoise);

    // Golden ratio animated dither tests
    DitherWhiteNoiseAnimatedGoldenRatio(ditherImage);
    DitherInterleavedGradientNoiseAnimatedGoldenRatio(ditherImage);
    DitherBlueNoiseAnimatedGoldenRatio(ditherImage, blueNoise[0]);

    // Uniform animated dither tests
    DitherWhiteNoiseAnimatedUniform(ditherImage);
    DitherInterleavedGradientNoiseAnimatedUniform(ditherImage);
    DitherBlueNoiseAnimatedUniform(ditherImage, blueNoise[0]);

    // Animated dither integration tests
    DitherWhiteNoiseAnimatedIntegrated(ditherImage);
    DitherInterleavedGradientNoiseAnimatedIntegrated(ditherImage);
    DitherBlueNoiseAnimatedIntegrated(ditherImage, blueNoise);

    // Golden ratio animated dither integration tests
    DitherWhiteNoiseAnimatedGoldenRatioIntegrated(ditherImage);
    DitherInterleavedGradientNoiseAnimatedGoldenRatioIntegrated(ditherImage);
    DitherBlueNoiseAnimatedGoldenRatioIntegrated(ditherImage, blueNoise[0]);

    // Uniform animated dither integration tests
    DitherWhiteNoiseAnimatedUniformIntegrated(ditherImage);
    DitherInterleavedGradientNoiseAnimatedUniformIntegrated(ditherImage);
    DitherBlueNoiseAnimatedUniformIntegrated(ditherImage, blueNoise[0]);

    // Van der corput animated dither integration tests
    DitherWhiteNoiseAnimatedVDCIntegrated(ditherImage);
    DitherInterleavedGradientNoiseAnimatedVDCIntegrated(ditherImage);
    DitherBlueNoiseAnimatedVDCIntegrated(ditherImage, blueNoise[0]);

    fclose(g_logFile);

    return 0;
}

Animating Noise For Integration Over Time

You can use noise textures (like the ones from the last post) to do dithering.

For instance, you can do the process below to make a 1 bit (black and white) dithered image using a gray scale source image and a gray scale noise texture. This would be useful if you had a 1 bit display that you were trying to display an image on.

  1. For each pixel in the source image…
  2. If the source image pixel is brighter than the noise texture, put a white pixel.
  3. Else put a black pixel.

(info on converting images to grey scale here: Converting RGB to Grayscale)

The quality of the result depends on the type of noise you use.

If you use pure random numbers (white noise) it looks like this:

You could also use something called “Interleaved Gradient Noise” which would look like this:

Or you could use blue noise which would look like this:

As you can see, white noise was the worst looking, interleaved gradient noise is is the middle, and blue noise looked the best.

White noise is very cheap to generate and can be done in real time on either the CPU or GPU – you just use random numbers.

Blue noise is more expensive to generate and usually must be done in advance, but gives high quality results.

Interleaved gradient noise, which gives middle results, is actually very similar in generation costs as white noise believe it or not, and so can also be done in real time on either the CPU or GPU.

If you have X and Y pixel coordinates (not uv coordinates), you can generate the noise value for the pixel by using this formula:

float noise = std::fmodf(52.9829189f * std::fmodf(0.06711056f*float(x) + 0.00583715f*float(y), 1.0f), 1.0f);

Interleaved gradient noise was made by Jorge Jimenez (@iryoku1) and you can read more about it at these links:
Next Generation Post Processing in Call Of Duty: Advanced Warfare
Dithering part three – real world 2D quantization dithering (Bart Wronksi)

Dithering still images is fun, but in the context of video games, we are more interested in animated images, so let’s look at things in motion.

Animated Noise

Let’s start by just animating those three noise types over 8 frames.

For white noise, we’ll generate a new white noise texture every frame.

For interleaved gradient noise, we’ll add a random offset (0 to 1000) to the pixel each frame, so we get 8 different interleaved gradient noise textures.

For blue noise, we’ll just have 8 different blue noise textures that we generate in advance.

These are playing at 8 fps, so loop every second.

White Noise:

IG Noise:

Blue Noise:

Once again we can see that white noise is worst, blue noise is best, and interleaved gradient noise is in the middle.

When you think about it though, despite these animations all using different types of noise over SPACE, they all use white noise over time. What i mean by that is if you isolate any individual pixel in any of the images and look at it over the 8 frames, that single pixel will look like white noise.

Let’s see if we can improve that.

Golden Ratio Animated Noise

In a conversation on twitter, @R4_Unit told me that in the past he had good success by adding the golden ratio to blue noise textures to make the noise more blue over time.

The background here is that repeatedly adding the golden ratio to any number will make a low discrepancy sequence (details: When Random Numbers Are Too Random: Low Discrepancy Sequences)

The golden ratio is \frac{1+\sqrt{5}}{2} or approximately 1.61803398875, and interestingly is THE MOST irrational number that there is. Weird right?

For each of the noise types, we’ll generate a single texture for frame 0, and each subsequent frame we will add the golden ratio to each pixel. The pixel values are in the 0 to 1 space when adding the golden ratio (not 0 to 255) and we use modulus to wrap it around.

The DFT magnitude is shown on the left to show how adding the golden ratio affects frequency components.

White Noise:

IG Noise:

Blue Noise:

When I look at these side by side with the previous animations, it’s hard for me to see much of a difference. That is interesting for the case of blue noise, where it’s difficult to generate multiple blue noise textures. It means that you can get a fairly decent “blue noise” texture by adding multiples of the golden ratio to an existing blue noise texture (aka recycling!).

It’s interesting that the white noise and interleaved gradient noise don’t change their frequency spectrum much over time. On the other hand, it’s a bit sad to see that the blue noise texture gains some low frequency content so the blue noise becomes lower quality. You aren’t just getting more blue noise textures for free by adding the golden ratio, even though they are blue-ish.

Another important aspect to look at is the histogram of colors used of these images when adding golden ratio. The ideal situation is that the starting images have roughly the same number of every color in the image, and that when adding the golden ratio for each frame, that we still use roughly the same number of every color. That turns out to be the case luckily.

The white noise histogram has a minimum count of 213, a maximum count of 303, an average count of 256 (the image is 256×256), and a standard deviation of 15.64. Those values are the same for each frame of the animation.

For interleaved gradient noise, it has a minimum count of 245, a maximum count of 266, an average count of 256 and a standard deviation of 2.87. Those values are the same for the entire animation.

Lastly, for blue noise, it has a minimum, maximum, and average count of 256, and a standard deviation of 0. This also remains true for the entire animation.

Integration Over Time

A big reason we might want animated noise in graphics is because we are taking multiple samples and want to numerically integrate them.

Lets analyze how these two types of animations (regular and golden ratio) compare for integration.

These animations are the same as before, but on frame 1, we show the average of frame 0 and 1. On frame 2 we show the average of frame 0, 1 and 2. And so on to frame 7 which is the average of all 8 frames. This is an integration of our black and white sample points we are taking, where the correct value of the integration is the greyscale image we started with.

Here is white noise, IG noise and blue noise animated (new noise each frame), integrated over those 8 frames, playing at 8 frames a second:



Here is the same using the golden ratio to animate the noise instead:



Since it can be a little difficult to compare these things while they are in motion, here is the final frames of each method and some graphs to show the average error and standard deviation of the error, compared to the ground truth source image.

White Noise vs White Noise Golden Ratio:


IG Noise vs IG Noise Golden Ratio:


Blue Noise vs Blue Noise Golden Ratio:


Interestingly, the golden ratio average error and standard deviation (from the ground truth) are pretty even for all types of noise by frame 7, even though the blue noise is perceptually superior. This also happens for the non golden ratio integrations of blue noise and white noise. That’s part of the value of blue noise, that even if it has the same amount of error as say, white noise, it still looks better.

Another interesting observation is that interleaved gradient noise performs better at integration (at least numerically) than white or blue noise, when not using the golden ratio. The only way I can explain this is that when picking random pixel offsets to generate each frame of interleaved gradient noise, it’s somehow more blue over time than the other two methods. It’s a strange but pretty useful property.

Despite IG having success when looking at the numbers, it has very visible directional patterns which are not so nice. The fact that it is as cheap as white noise to generate, but has results much closer to blue noise perceptually is pretty awesome though.

Something else important to note is that white noise beats blue noise in the long run (higher sample counts). It’s only at these lower sample counts that blue noise is the clear winner.

Lastly, it seems like the ideal setup for integrating some values over time with a lower sample count would be to have N blue noise textures to use over N frames, but *somehow* have a constraint on those textures generated such that each individual pixel over time has blue noise distributed values.

I’m not sure how to generate that, or if it’s even possible to do so, but doing that seems like it would be pretty near the ideal for doing integration like the above.

Taking a guess at how the DFT’s would look, each individual slice seems like it should look like a totally normal blue noise texture where it’s black in the middle (low frequencies) and noisy elsewhere (high frequencies). If you had N slices of these it would look like a black cylinder surrounded by noise when taking the 3D DFT. I’m not sure though how having the constraint on individual pixels would modify the DFT, or if it even would.

This “ideal” I’m describing is different than vanilla 3d blue noise. The 3d DFT of 3d blue noise is a black sphere surrounded by noise. What I’m describing is a cylinder instead of a sphere.

3d blue noise turns out not to be great for these needs. You can read about that here:

The problem with 3D blue noise

That author also has some an interesting post on blue noise, and a zip file full of blue noise textures that you can take and use.

Free Blue Noise Textures

I have some thoughts on generating this blue noise cylinder that if they work out may very well be the next blog post.

Code

Here is the code used to generate the images in this post. It’s also on github, which also contains the source images used.

Atrix256: RandomCode/AnimatedNoise

#define _CRT_SECURE_NO_WARNINGS

#include <windows.h>  // for bitmap headers.  Sorry non windows people!
#include <stdint.h>
#include <vector>
#include <random>
#include <atomic>
#include <thread>
#include <complex>
#include <array>

typedef uint8_t uint8;

const float c_pi = 3.14159265359f;

// settings
const bool c_doDFT = true;

// globals 
FILE* g_logFile = nullptr;

//======================================================================================
inline float Lerp (float A, float B, float t)
{
    return A * (1.0f - t) + B * t;
}

//======================================================================================
struct SImageData
{
    SImageData ()
        : m_width(0)
        , m_height(0)
    { }
   
    size_t m_width;
    size_t m_height;
    size_t m_pitch;
    std::vector<uint8> m_pixels;
};
 
//======================================================================================
struct SColor
{
    SColor (uint8 _R = 0, uint8 _G = 0, uint8 _B = 0)
        : R(_R), G(_G), B(_B)
    { }

    inline void Set (uint8 _R, uint8 _G, uint8 _B)
    {
        R = _R;
        G = _G;
        B = _B;
    }
 
    uint8 B, G, R;
};

//======================================================================================
struct SImageDataComplex
{
    SImageDataComplex ()
        : m_width(0)
        , m_height(0)
    { }
  
    size_t m_width;
    size_t m_height;
    std::vector<std::complex<float>> m_pixels;
};
 
//======================================================================================
std::complex<float> DFTPixel (const SImageData &srcImage, size_t K, size_t L)
{
    std::complex<float> ret(0.0f, 0.0f);
  
    for (size_t x = 0; x < srcImage.m_width; ++x)
    {
        for (size_t y = 0; y < srcImage.m_height; ++y)
        {
            // Get the pixel value (assuming greyscale) and convert it to [0,1] space
            const uint8 *src = &srcImage.m_pixels[(y * srcImage.m_pitch) + x * 3];
            float grey = float(src[0]) / 255.0f;
  
            // Add to the sum of the return value
            float v = float(K * x) / float(srcImage.m_width);
            v += float(L * y) / float(srcImage.m_height);
            ret += std::complex<float>(grey, 0.0f) * std::polar<float>(1.0f, -2.0f * c_pi * v);
        }
    }
  
    return ret;
}
  
//======================================================================================
void ImageDFT (const SImageData &srcImage, SImageDataComplex &destImage)
{
    // NOTE: this function assumes srcImage is greyscale, so works on only the red component of srcImage.
    // ImageToGrey() will convert an image to greyscale.
 
    // size the output dft data
    destImage.m_width = srcImage.m_width;
    destImage.m_height = srcImage.m_height;
    destImage.m_pixels.resize(destImage.m_width*destImage.m_height);
 
    size_t numThreads = std::thread::hardware_concurrency();
    //if (numThreads > 0)
        //numThreads = numThreads - 1;
 
    std::vector<std::thread> threads;
    threads.resize(numThreads);
 
    printf("Doing DFT with %zu threads...\n", numThreads);
 
    // calculate 2d dft (brute force, not using fast fourier transform) multithreadedly
    std::atomic<size_t> nextRow(0);
    for (std::thread& t : threads)
    {
        t = std::thread(
            [&] ()
            {
                size_t row = nextRow.fetch_add(1);
                bool reportProgress = (row == 0);
                int lastPercent = -1;
 
                while (row < srcImage.m_height)
                {
                    // calculate the DFT for every pixel / frequency in this row
                    for (size_t x = 0; x < srcImage.m_width; ++x)
                    {
                        destImage.m_pixels[row * destImage.m_width + x] = DFTPixel(srcImage, x, row);
                    }
 
                    // report progress if we should
                    if (reportProgress)
                    {
                        int percent = int(100.0f * float(row) / float(srcImage.m_height));
                        if (lastPercent != percent)
                        {
                            lastPercent = percent;
                            printf("            \rDFT: %i%%", lastPercent);
                        }
                    }
 
                    // go to the next row
                    row = nextRow.fetch_add(1);
                }
            }
        );
    }
 
    for (std::thread& t : threads)
        t.join();
 
    printf("\n");
}
 
//======================================================================================
void GetMagnitudeData (const SImageDataComplex& srcImage, SImageData& destImage)
{
    // size the output image
    destImage.m_width = srcImage.m_width;
    destImage.m_height = srcImage.m_height;
    destImage.m_pitch = 4 * ((srcImage.m_width * 24 + 31) / 32);
    destImage.m_pixels.resize(destImage.m_pitch*destImage.m_height);
  
    // get floating point magnitude data
    std::vector<float> magArray;
    magArray.resize(srcImage.m_width*srcImage.m_height);
    float maxmag = 0.0f;
    for (size_t x = 0; x < srcImage.m_width; ++x)
    {
        for (size_t y = 0; y < srcImage.m_height; ++y)
        {
            // Offset the information by half width & height in the positive direction.
            // This makes frequency 0 (DC) be at the image origin, like most diagrams show it.
            int k = (x + (int)srcImage.m_width / 2) % (int)srcImage.m_width;
            int l = (y + (int)srcImage.m_height / 2) % (int)srcImage.m_height;
            const std::complex<float> &src = srcImage.m_pixels[l*srcImage.m_width + k];
  
            float mag = std::abs(src);
            if (mag > maxmag)
                maxmag = mag;
  
            magArray[y*srcImage.m_width + x] = mag;
        }
    }
    if (maxmag == 0.0f)
        maxmag = 1.0f;
  
    const float c = 255.0f / log(1.0f+maxmag);
  
    // normalize the magnitude data and send it back in [0, 255]
    for (size_t x = 0; x < srcImage.m_width; ++x)
    {
        for (size_t y = 0; y < srcImage.m_height; ++y)
        {
            float src = c * log(1.0f + magArray[y*srcImage.m_width + x]);
  
            uint8 magu8 = uint8(src);
  
            uint8* dest = &destImage.m_pixels[y*destImage.m_pitch + x * 3];
            dest[0] = magu8;
            dest[1] = magu8;
            dest[2] = magu8;
        }
    }
}

//======================================================================================
bool ImageSave (const SImageData &image, const char *fileName)
{
    // open the file if we can
    FILE *file;
    file = fopen(fileName, "wb");
    if (!file) {
        printf("Could not save %s\n", fileName);
        return false;
    }
   
    // make the header info
    BITMAPFILEHEADER header;
    BITMAPINFOHEADER infoHeader;
   
    header.bfType = 0x4D42;
    header.bfReserved1 = 0;
    header.bfReserved2 = 0;
    header.bfOffBits = 54;
   
    infoHeader.biSize = 40;
    infoHeader.biWidth = (LONG)image.m_width;
    infoHeader.biHeight = (LONG)image.m_height;
    infoHeader.biPlanes = 1;
    infoHeader.biBitCount = 24;
    infoHeader.biCompression = 0;
    infoHeader.biSizeImage = (DWORD) image.m_pixels.size();
    infoHeader.biXPelsPerMeter = 0;
    infoHeader.biYPelsPerMeter = 0;
    infoHeader.biClrUsed = 0;
    infoHeader.biClrImportant = 0;
   
    header.bfSize = infoHeader.biSizeImage + header.bfOffBits;
   
    // write the data and close the file
    fwrite(&header, sizeof(header), 1, file);
    fwrite(&infoHeader, sizeof(infoHeader), 1, file);
    fwrite(&image.m_pixels[0], infoHeader.biSizeImage, 1, file);
    fclose(file);
  
    return true;
}

//======================================================================================
bool ImageLoad (const char *fileName, SImageData& imageData)
{
    // open the file if we can
    FILE *file;
    file = fopen(fileName, "rb");
    if (!file)
        return false;
 
    // read the headers if we can
    BITMAPFILEHEADER header;
    BITMAPINFOHEADER infoHeader;
    if (fread(&header, sizeof(header), 1, file) != 1 ||
        fread(&infoHeader, sizeof(infoHeader), 1, file) != 1 ||
        header.bfType != 0x4D42 || infoHeader.biBitCount != 24)
    {
        fclose(file);
        return false;
    }
 
    // read in our pixel data if we can. Note that it's in BGR order, and width is padded to the next power of 4
    imageData.m_pixels.resize(infoHeader.biSizeImage);
    fseek(file, header.bfOffBits, SEEK_SET);
    if (fread(&imageData.m_pixels[0], imageData.m_pixels.size(), 1, file) != 1)
    {
        fclose(file);
        return false;
    }
 
    imageData.m_width = infoHeader.biWidth;
    imageData.m_height = infoHeader.biHeight;
    imageData.m_pitch = 4 * ((imageData.m_width * 24 + 31) / 32);
 
    fclose(file);
    return true;
}

//======================================================================================
void ImageInit (SImageData& image, size_t width, size_t height)
{
    image.m_width = width;
    image.m_height = height;
    image.m_pitch = 4 * ((width * 24 + 31) / 32);
    image.m_pixels.resize(image.m_pitch * image.m_height);
    std::fill(image.m_pixels.begin(), image.m_pixels.end(), 0);
}

//======================================================================================
template <typename LAMBDA>
void ImageForEachPixel (SImageData& image, const LAMBDA& lambda)
{
    size_t pixelIndex = 0;
    for (size_t y = 0; y < image.m_height; ++y)
    {
        SColor* pixel = (SColor*)&image.m_pixels[y * image.m_pitch];
        for (size_t x = 0; x < image.m_width; ++x)
        {
            lambda(*pixel, pixelIndex);
            ++pixel;
            ++pixelIndex;
        }
    }
}

//======================================================================================
template <typename LAMBDA>
void ImageForEachPixel (const SImageData& image, const LAMBDA& lambda)
{
    size_t pixelIndex = 0;
    for (size_t y = 0; y < image.m_height; ++y)
    {
        SColor* pixel = (SColor*)&image.m_pixels[y * image.m_pitch];
        for (size_t x = 0; x < image.m_width; ++x)
        {
            lambda(*pixel, pixelIndex);
            ++pixel;
            ++pixelIndex;
        }
    }
}

//======================================================================================
void ImageConvertToLuma (SImageData& image)
{
    ImageForEachPixel(
        image,
        [] (SColor& pixel, size_t pixelIndex)
        {
            float luma = float(pixel.R) * 0.3f + float(pixel.G) * 0.59f + float(pixel.B) * 0.11f;
            uint8 lumau8 = uint8(luma + 0.5f);
            pixel.R = lumau8;
            pixel.G = lumau8;
            pixel.B = lumau8;
        }
    );
}

//======================================================================================
void ImageCombine2 (const SImageData& imageA, const SImageData& imageB, SImageData& result)
{
    // put the images side by side. A on left, B on right
    ImageInit(result, imageA.m_width + imageB.m_width, max(imageA.m_height, imageB.m_height));
    std::fill(result.m_pixels.begin(), result.m_pixels.end(), 0);

    // image A on left
    for (size_t y = 0; y < imageA.m_height; ++y)
    {
        SColor* destPixel = (SColor*)&result.m_pixels[y * result.m_pitch];
        SColor* srcPixel = (SColor*)&imageA.m_pixels[y * imageA.m_pitch];
        for (size_t x = 0; x < imageA.m_width; ++x)
        {
            destPixel[0] = srcPixel[0];
            ++destPixel;
            ++srcPixel;
        }
    }

    // image B on right
    for (size_t y = 0; y < imageB.m_height; ++y)
    {
        SColor* destPixel = (SColor*)&result.m_pixels[y * result.m_pitch + imageA.m_width * 3];
        SColor* srcPixel = (SColor*)&imageB.m_pixels[y * imageB.m_pitch];
        for (size_t x = 0; x < imageB.m_width; ++x)
        {
            destPixel[0] = srcPixel[0];
            ++destPixel;
            ++srcPixel;
        }
    }
}

//======================================================================================
void ImageCombine3 (const SImageData& imageA, const SImageData& imageB, const SImageData& imageC, SImageData& result)
{
    // put the images side by side. A on left, B in middle, C on right
    ImageInit(result, imageA.m_width + imageB.m_width + imageC.m_width, max(max(imageA.m_height, imageB.m_height), imageC.m_height));
    std::fill(result.m_pixels.begin(), result.m_pixels.end(), 0);

    // image A on left
    for (size_t y = 0; y < imageA.m_height; ++y)
    {
        SColor* destPixel = (SColor*)&result.m_pixels[y * result.m_pitch];
        SColor* srcPixel = (SColor*)&imageA.m_pixels[y * imageA.m_pitch];
        for (size_t x = 0; x < imageA.m_width; ++x)
        {
            destPixel[0] = srcPixel[0];
            ++destPixel;
            ++srcPixel;
        }
    }

    // image B in middle
    for (size_t y = 0; y < imageB.m_height; ++y)
    {
        SColor* destPixel = (SColor*)&result.m_pixels[y * result.m_pitch + imageA.m_width * 3];
        SColor* srcPixel = (SColor*)&imageB.m_pixels[y * imageB.m_pitch];
        for (size_t x = 0; x < imageB.m_width; ++x)
        {
            destPixel[0] = srcPixel[0];
            ++destPixel;
            ++srcPixel;
        }
    }

    // image C on right
    for (size_t y = 0; y < imageC.m_height; ++y)
    {
        SColor* destPixel = (SColor*)&result.m_pixels[y * result.m_pitch + imageA.m_width * 3 + imageC.m_width * 3];
        SColor* srcPixel = (SColor*)&imageC.m_pixels[y * imageC.m_pitch];
        for (size_t x = 0; x < imageC.m_width; ++x)
        {
            destPixel[0] = srcPixel[0];
            ++destPixel;
            ++srcPixel;
        }
    }
}

//======================================================================================
float GoldenRatioMultiple (size_t multiple)
{
    return float(multiple) * (1.0f + std::sqrtf(5.0f)) / 2.0f;
}

//======================================================================================
void IntegrationTest (const SImageData& dither, const SImageData& groundTruth, size_t frameIndex, const char* label)
{
    // calculate min, max, total and average error
    size_t minError = 0;
    size_t maxError = 0;
    size_t totalError = 0;
    size_t pixelCount = 0;
    for (size_t y = 0; y < dither.m_height; ++y)
    {
        SColor* ditherPixel = (SColor*)&dither.m_pixels[y * dither.m_pitch];
        SColor* truthPixel = (SColor*)&groundTruth.m_pixels[y * groundTruth.m_pitch];
        for (size_t x = 0; x < dither.m_width; ++x)
        {
            size_t error = 0;
            if (ditherPixel->R > truthPixel->R)
                error = ditherPixel->R - truthPixel->R;
            else
                error = truthPixel->R - ditherPixel->R;

            totalError += error;

            if ((x == 0 && y == 0) || error < minError)
                minError = error;

            if ((x == 0 && y == 0) || error > maxError)
                maxError = error;

            ++ditherPixel;
            ++truthPixel;
            ++pixelCount;
        }
    }
    float averageError = float(totalError) / float(pixelCount);

    // calculate standard deviation
    float sumSquaredDiff = 0.0f;
    for (size_t y = 0; y < dither.m_height; ++y)
    {
        SColor* ditherPixel = (SColor*)&dither.m_pixels[y * dither.m_pitch];
        SColor* truthPixel = (SColor*)&groundTruth.m_pixels[y * groundTruth.m_pitch];
        for (size_t x = 0; x < dither.m_width; ++x)
        {
            size_t error = 0;
            if (ditherPixel->R > truthPixel->R)
                error = ditherPixel->R - truthPixel->R;
            else
                error = truthPixel->R - ditherPixel->R;

            float diff = float(error) - averageError;

            sumSquaredDiff += diff*diff;
        }
    }
    float stdDev = std::sqrtf(sumSquaredDiff / float(pixelCount - 1));

    // report results
    fprintf(g_logFile, "%s %zu error\n", label, frameIndex);
    fprintf(g_logFile, "  min error: %zu\n", minError);
    fprintf(g_logFile, "  max error: %zu\n", maxError);
    fprintf(g_logFile, "  avg error: %0.2f\n", averageError);
    fprintf(g_logFile, "  stddev: %0.2f\n", stdDev);
    fprintf(g_logFile, "\n");
}

//======================================================================================
void HistogramTest (const SImageData& noise, size_t frameIndex, const char* label)
{
    std::array<size_t, 256> counts;
    std::fill(counts.begin(), counts.end(), 0);

    ImageForEachPixel(
        noise,
        [&] (const SColor& pixel, size_t pixelIndex)
        {
            counts[pixel.R]++;
        }
    );

    // calculate min, max, total and average
    size_t minCount = 0;
    size_t maxCount = 0;
    size_t totalCount = 0;
    for (size_t i = 0; i < 256; ++i)
    {
        if (i == 0 || counts[i] < minCount)
            minCount = counts[i];

        if (i == 0 || counts[i] > maxCount)
            maxCount = counts[i];

        totalCount += counts[i];
    }
    float averageCount = float(totalCount) / float(256.0f);

    // calculate standard deviation
    float sumSquaredDiff = 0.0f;
    for (size_t i = 0; i < 256; ++i)
    {
        float diff = float(counts[i]) - averageCount;
        sumSquaredDiff += diff*diff;
    }
    float stdDev = std::sqrtf(sumSquaredDiff / 255.0f);

    // report results
    fprintf(g_logFile, "%s %zu histogram\n", label, frameIndex);
    fprintf(g_logFile, "  min count: %zu\n", minCount);
    fprintf(g_logFile, "  max count: %zu\n", maxCount);
    fprintf(g_logFile, "  avg count: %0.2f\n", averageCount);
    fprintf(g_logFile, "  stddev: %0.2f\n", stdDev);
    fprintf(g_logFile, "  counts: ");
    for (size_t i = 0; i < 256; ++i)
    {
        if (i > 0)
            fprintf(g_logFile, ", ");
        fprintf(g_logFile, "%zu", counts[i]);
    }

    fprintf(g_logFile, "\n\n");
}

//======================================================================================
void GenerateWhiteNoise (SImageData& image, size_t width, size_t height)
{
    ImageInit(image, width, height);

    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_int_distribution<unsigned int> dist(0, 255);

    ImageForEachPixel(
        image,
        [&] (SColor& pixel, size_t pixelIndex)
        {
            uint8 value = dist(rng);
            pixel.R = value;
            pixel.G = value;
            pixel.B = value;
        }
    );
}

//======================================================================================
void GenerateInterleavedGradientNoise (SImageData& image, size_t width, size_t height, float offsetX, float offsetY)
{
    ImageInit(image, width, height);

    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_int_distribution<unsigned int> dist(0, 255);

    for (size_t y = 0; y < height; ++y)
    {
        SColor* pixel = (SColor*)&image.m_pixels[y * image.m_pitch];
        for (size_t x = 0; x < width; ++x)
        {
            float valueFloat = std::fmodf(52.9829189f * std::fmod(0.06711056f*float(x + offsetX) + 0.00583715f*float(y + offsetY), 1.0f), 1.0f);
            size_t valueBig = size_t(valueFloat * 256.0f);
            uint8 value = uint8(valueBig % 256);
            pixel->R = value;
            pixel->G = value;
            pixel->B = value;
            ++pixel;
        }
    }
}

//======================================================================================
void DitherWithTexture (const SImageData& ditherImage, const SImageData& noiseImage, SImageData& result)
{
    // init the result image
    ImageInit(result, ditherImage.m_width, ditherImage.m_height);

    // make the result image
    for (size_t y = 0; y < ditherImage.m_height; ++y)
    {
        SColor* srcDitherPixel = (SColor*)&ditherImage.m_pixels[y * ditherImage.m_pitch];
        SColor* destDitherPixel = (SColor*)&result.m_pixels[y * result.m_pitch];

        for (size_t x = 0; x < ditherImage.m_width; ++x)
        {
            // tile the noise in case it isn't the same size as the image we are dithering
            size_t noiseX = x % noiseImage.m_width;
            size_t noiseY = y % noiseImage.m_height;
            SColor* noisePixel = (SColor*)&noiseImage.m_pixels[noiseY * noiseImage.m_pitch + noiseX * 3];

            uint8 value = 0;
            if (noisePixel->R < srcDitherPixel->R)
                value = 255;

            destDitherPixel->R = value;
            destDitherPixel->G = value;
            destDitherPixel->B = value;

            ++srcDitherPixel;
            ++destDitherPixel;
        }
    }
}

//======================================================================================
void DitherWhiteNoise (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    // make noise
    SImageData noise;
    GenerateWhiteNoise(noise, ditherImage.m_width, ditherImage.m_height);

    // dither the image
    SImageData dither;
    DitherWithTexture(ditherImage, noise, dither);

    // save the results
    SImageData combined;
    ImageCombine3(ditherImage, noise, dither, combined);
    ImageSave(combined, "out/still_whitenoise.bmp");
}

//======================================================================================
void DitherInterleavedGradientNoise (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    // make noise
    SImageData noise;
    GenerateInterleavedGradientNoise(noise, ditherImage.m_width, ditherImage.m_height, 0.0f, 0.0f);

    // dither the image
    SImageData dither;
    DitherWithTexture(ditherImage, noise, dither);

    // save the results
    SImageData combined;
    ImageCombine3(ditherImage, noise, dither, combined);
    ImageSave(combined, "out/still_ignoise.bmp");
}

//======================================================================================
void DitherBlueNoise (const SImageData& ditherImage, const SImageData& blueNoise)
{
    printf("\n%s\n", __FUNCTION__);

    // dither the image
    SImageData dither;
    DitherWithTexture(ditherImage, blueNoise, dither);

    // save the results
    SImageData combined;
    ImageCombine3(ditherImage, blueNoise, dither, combined);
    ImageSave(combined, "out/still_bluenoise.bmp");
}

//======================================================================================
void DitherWhiteNoiseAnimated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/anim_whitenoise%zu.bmp", i);

        // make noise
        SImageData noise;
        GenerateWhiteNoise(noise, ditherImage.m_width, ditherImage.m_height);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherInterleavedGradientNoiseAnimated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_real_distribution<float> dist(0.0f, 1000.0f);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/anim_ignoise%zu.bmp", i);

        // make noise
        SImageData noise;
        GenerateInterleavedGradientNoise(noise, ditherImage.m_width, ditherImage.m_height, dist(rng), dist(rng));

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherBlueNoiseAnimated (const SImageData& ditherImage, const SImageData blueNoise[8])
{
    printf("\n%s\n", __FUNCTION__);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/anim_bluenoise%zu.bmp", i);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, blueNoise[i], dither);

        // save the results
        SImageData combined;
        ImageCombine2(blueNoise[i], dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherWhiteNoiseAnimatedIntegrated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animint_whitenoise%zu.bmp", i);

        // make noise
        SImageData noise;
        GenerateWhiteNoise(noise, ditherImage.m_width, ditherImage.m_height);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherInterleavedGradientNoiseAnimatedIntegrated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_real_distribution<float> dist(0.0f, 1000.0f);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animint_ignoise%zu.bmp", i);

        // make noise
        SImageData noise;
        GenerateInterleavedGradientNoise(noise, ditherImage.m_width, ditherImage.m_height, dist(rng), dist(rng));

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&](SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i + 1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherBlueNoiseAnimatedIntegrated (const SImageData& ditherImage, const SImageData blueNoise[8])
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animint_bluenoise%zu.bmp", i);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, blueNoise[i], dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(blueNoise[i], dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherWhiteNoiseAnimatedGoldenRatio (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    // make noise
    SImageData noiseSrc;
    GenerateWhiteNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    SImageDataComplex noiseDFT;
    SImageData noiseDFTMag;

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animgr_whitenoise%zu.bmp", i);

        // add golden ratio to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = GoldenRatioMultiple(i);
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // DFT the noise
        if (c_doDFT)
        {
            ImageDFT(noise, noiseDFT);
            GetMagnitudeData(noiseDFT, noiseDFTMag);
        }
        else
        {
            ImageInit(noiseDFTMag, noise.m_width, noise.m_height);
            std::fill(noiseDFTMag.m_pixels.begin(), noiseDFTMag.m_pixels.end(), 0);
        }

        // Histogram test the noise
        HistogramTest(noise, i, __FUNCTION__);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // save the results
        SImageData combined;
        ImageCombine3(noiseDFTMag, noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherInterleavedGradientNoiseAnimatedGoldenRatio (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    // make noise
    SImageData noiseSrc;
    GenerateInterleavedGradientNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height, 0.0f, 0.0f);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    SImageDataComplex noiseDFT;
    SImageData noiseDFTMag;

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animgr_ignoise%zu.bmp", i);

        // add golden ratio to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = GoldenRatioMultiple(i);
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // DFT the noise
        if (c_doDFT)
        {
            ImageDFT(noise, noiseDFT);
            GetMagnitudeData(noiseDFT, noiseDFTMag);
        }
        else
        {
            ImageInit(noiseDFTMag, noise.m_width, noise.m_height);
            std::fill(noiseDFTMag.m_pixels.begin(), noiseDFTMag.m_pixels.end(), 0);
        }

        // Histogram test the noise
        HistogramTest(noise, i, __FUNCTION__);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // save the results
        SImageData combined;
        ImageCombine3(noiseDFTMag, noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherBlueNoiseAnimatedGoldenRatio (const SImageData& ditherImage, const SImageData& noiseSrc)
{
    printf("\n%s\n", __FUNCTION__);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    SImageDataComplex noiseDFT;
    SImageData noiseDFTMag;

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animgr_bluenoise%zu.bmp", i);

        // add golden ratio to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = GoldenRatioMultiple(i);
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // DFT the noise
        if (c_doDFT)
        {
            ImageDFT(noise, noiseDFT);
            GetMagnitudeData(noiseDFT, noiseDFTMag);
        }
        else
        {
            ImageInit(noiseDFTMag, noise.m_width, noise.m_height);
            std::fill(noiseDFTMag.m_pixels.begin(), noiseDFTMag.m_pixels.end(), 0);
        }

        // Histogram test the noise
        HistogramTest(noise, i, __FUNCTION__);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // save the results
        SImageData combined;
        ImageCombine3(noiseDFTMag, noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherWhiteNoiseAnimatedGoldenRatioIntegrated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    // make noise
    SImageData noiseSrc;
    GenerateWhiteNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animgrint_whitenoise%zu.bmp", i);

        // add golden ratio to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = GoldenRatioMultiple(i);
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherInterleavedGradientNoiseAnimatedGoldenRatioIntegrated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    // make noise
    SImageData noiseSrc;
    GenerateInterleavedGradientNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height, 0.0f, 0.0f);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animgrint_ignoise%zu.bmp", i);

        // add golden ratio to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = GoldenRatioMultiple(i);
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherBlueNoiseAnimatedGoldenRatioIntegrated (const SImageData& ditherImage, const SImageData& noiseSrc)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animgrint_bluenoise%zu.bmp", i);

        // add golden ratio to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = GoldenRatioMultiple(i);
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
int main (int argc, char** argv)
{
    // load the dither image and convert it to greyscale (luma)
    SImageData ditherImage;
    if (!ImageLoad("src/ditherimage.bmp", ditherImage))
    {
        printf("Could not load src/ditherimage.bmp");
        return 0;
    }
    ImageConvertToLuma(ditherImage);

    // load the blue noise images.
    SImageData blueNoise[8];
    for (size_t i = 0; i < 8; ++i)
    {
        char buffer[256];
        sprintf(buffer, "src/BN%zu.bmp", i);
        if (!ImageLoad(buffer, blueNoise[i]))
        {
            printf("Could not load %s", buffer);
            return 0;
        }

        // They have different values in R, G, B so make R be the value for all channels
        ImageForEachPixel(
            blueNoise[i],
            [] (SColor& pixel, size_t pixelIndex)
            {
                pixel.G = pixel.R;
                pixel.B = pixel.R;
            }
        );
    }

    g_logFile = fopen("log.txt", "w+t");
    
    // still image dither tests
    DitherWhiteNoise(ditherImage);
    DitherInterleavedGradientNoise(ditherImage);
    DitherBlueNoise(ditherImage, blueNoise[0]);

    // Animated dither tests
    DitherWhiteNoiseAnimated(ditherImage);
    DitherInterleavedGradientNoiseAnimated(ditherImage);
    DitherBlueNoiseAnimated(ditherImage, blueNoise);

    // Golden ratio animated dither tests
    DitherWhiteNoiseAnimatedGoldenRatio(ditherImage);
    DitherInterleavedGradientNoiseAnimatedGoldenRatio(ditherImage);
    DitherBlueNoiseAnimatedGoldenRatio(ditherImage, blueNoise[0]);

    // Animated dither integration tests
    DitherWhiteNoiseAnimatedIntegrated(ditherImage);
    DitherInterleavedGradientNoiseAnimatedIntegrated(ditherImage);
    DitherBlueNoiseAnimatedIntegrated(ditherImage, blueNoise);

    // Golden ratio animated dither integration tests
    DitherWhiteNoiseAnimatedGoldenRatioIntegrated(ditherImage);
    DitherInterleavedGradientNoiseAnimatedGoldenRatioIntegrated(ditherImage);
    DitherBlueNoiseAnimatedGoldenRatioIntegrated(ditherImage, blueNoise[0]);

    fclose(g_logFile);

    return 0;
}

Transmuting White Noise To Blue, Red, Green, Purple

There are many algorithms for generating blue noise, and there are a lot of people working on new ways to do it.

It made me wonder: why don’t people just use the inverse discrete Fourier transform to make noise that has the desired frequency spectrum?

I knew there had to be a reason, since that is a pretty obvious thing to try, but I wasn’t sure if it was due to poor quality results, slower execution times, or some other reason.

After trying it myself and not getting great results I asked on twitter and Bart Wronski (@BartWronsk) clued me in.

It turns out that you can set up your frequency magnitudes such that there are only high frequencies, giving them random amplitudes, and random phases, but when you do the inverse DFT, the result isn’t guaranteed to use all possible color values (0-255) and even if it does, it may not use them evenly.

He pointed me at something that Timothy Lottes wrote up (@TimothyLottes), which talked about using some basic DSP operations to transform white noise into blue noise.

This post uses that technique to do some “Noise Alchemy” and turn white noise into a couple other types of noise. Simple single file standalone C++ source code included at bottom of the post!

Red Noise

We’ll start with red noise because it’s the simplest. Here’s how you do it:

  1. Start with white noise
  2. Low pass filter the white noise
  3. Re-normalize the histogram
  4. Repeat from step 2 as many times as desired

That’s all there is to it.

If you are wondering how you low pass filter an image, that’s another way of saying “blur”. Blurring makes the high frequency details go away, leaving the low frequency smooth shapes.

There are multiple ways to do a blur, including: box blur (averaging pixels), Gaussian blur, sinc filtering. In this post I use a Gaussian blur and get decent results, but box blurring would be cheaper/faster, and sinc filtering would be the most correct results.

An important detail about doing the blur is that your blur needs to “wrap around”. If you are blurring a pixel on the edge of the image, it should smear over to the other side of the image.

You might be wondering how you would normalize the histogram. Normalizing the histogram just means that we want to make sure that the image uses the full range of greyscale values evenly. We don’t want the noise to only use bright colors or only use dark colors, or even just MOSTLY use dark colors, for instance. If we count each color used in the image (which is the histogram I’m referring to), the counts for each color should be roughly equal.

To fix the histogram, Timothy Lottes suggests making an array that contains each pixel location and the brightness of that pixel. You first shuffle the array and then sort by brightness (Timothy uses a 64 bit int to store the pixel information, so uses a radix sort which is more efficient for fixed size keys). Next set the brightness of each item in the array to be it’s index divided by the number of items in the list to put them in the 0 to 1 range. Lastly you write the brightness values back out to the image, using the pixel locations you stored off.

What this process does is makes sure that the full range of greyscale values are used, and that they are used evenly. It also preserves the order of the brightness of the pixels; if pixel A was darker than pixel B before this process, it still will be darker after this process.

You may wonder why the shuffle is done before the sort. That’s done so that if there are any ties between values that it will be random which one is darker after this process. This is important because if it wasn’t random, there may be obvious (ugly/incorrect) patterns in the results.

When normalizing the histogram, it affects the frequency composition of the image, but if doing this process multiple times, it seems to do an OK job of converging to decent results.

Red noise has low frequency content which means it doesn’t have sharp details / fast data changes. An interesting property of 2d red noise is that if you take a random walk on the 2d red noise texture, that the values you’d hit would be a random walk of 1d values. Also, if you draw a straight line randomly on the texture, the pixels it passes through will be a random walk. That is, you’ll get random numbers, but each number will be pretty close to the previous one.

The formal definition of red noise has a more strict definition about frequency content than what we are going for in this post. (Wikipedia: red noise)

Here’s red noise (top) and the frequency magnitudes (bottom) using 5 iterations of the described process, and a blur sigma (strength of blur) of 1.0:


Using different blur strengths controls what frequencies get attenuated. Weaker blurs leave higher frequency components.

Here is red noise generated the same way but using a blur sigma of 0.5:


And here is red noise generated using a blur sigma of 2.0


Here are some animated gifs showing the evolution of the noise as well as the frequencies over time:

Sigma 0.5:

Sigma 1.0:

Sigma 2.0:

Blue Noise

To make blue noise, you use the exact same method but instead of using a low pass filter you use a high pass filter.

An easy way to high pass filter an image is to do a low pass filter to get the low frequency content, and subtract that from the original image so that you are left with the high frequency content.

Blue noise has high frequency content which means it is only made up of sharp details / fast data changes. An interesting property of 2d blue noise is that if you take a random walk (or a straight line walk) on it in any direction, you’ll get a low discrepancy sequence. That is, you’ll get random numbers, but each number will be very different from the previous one.

The formal definition of blue noise has a more strict definition about frequency content than what we are going for in this post. (Wikipedia: blue noise)

Here is blue noise using 5 iterations and a blur sigma of 1.0:

Just like with red noise, changing the strength of the blur controls what frequencies get attenuated.

Here is a sigma of 0.5:


Here is a sigma of 2.0:


Animations of sigma 0.5:

Animations of sigma 1.0:

Animations of sigma 2.0:

Green Noise

Green noise is noise that doesn’t have either low or high frequency components, only mid frequency components.

To make green noise, use you a “band pass” filter, which is a filter that gets rid of both high and low frequency components leaving only the middle.

Here’s how to make a band pass filter:

  1. Make a weak blur of the image – this is the image without the highest frequencies.
  2. Make a strong blur of the image – this is the image with just the lowest frequencies.
  3. Subtract the strong blur from the weak blur – this is the image with just the middle frequencies.

Here is 5 iterations using a sigma of 0.5 and 2.0:

Here is the animation of it evolving:

Nathan Reed (@ReedBeta) mentioned that the green noise looked a lot like Perlin noise, which made sense due to Perlin noise being designed to be band limited, which makes it easier to control the look of perlin noise by summing mulitple octaves. This makes sense to me because you basically can control what frequencies you put noise into by scaling the frequency ring.

Fabien Giesen (@rygorous) said this also helps with mipmapping. This makes sense to me because there can’t be (as much) aliasing with the higher frequencies missing from the noise.

Purple Noise

I’ve never heard of this noise so it may have another name, but what I’m calling purple noise is just noise which has high and low frequency content, but no middle frequency content. It’s basically red noise plus blue noise.

You could literally make red noise and add it to blue noise to make purple noise, but how I made it for this post is to use a “band stop” filter.

A band stop filter is a filter that gets rid of middle frequencies and leaves high and low frequencies alone.

To band stop filter an image, you do a band pass filter to get the middle frequencies (as described in the last section!), and then subtract that from the original image to get only the low and high frequencies.

Here is 5 iterations using a sigma of 0.5 and 2.0:

Here is the animation:

Links

This technique might be useful if you ever need to generate specific types of noise quickly, but if you are just generating noise textures to use later in performance critical situations, there are better algorithms to use. When generating textures offline in advance, you have “all the time in the world”, so it is probably not worth the simplicity of this algorithm, when the trade off is less good noise results.

Dithering part two – golden ratio sequence, blue noise and highpass-and-remap (Bart Wronski)

VDR Follow Up – Fine Art of Film Grain (Timothy Lottes)

Gaussian Blur (Me)

Image DFT / IDFT (me)

Blue-noise Dithered Sampling (Solid Angle) – a better way to generate colored noises

Apparently there is a relation between blue noise, turing patterns / reaction diffusion and these filtering techniques. (Thanks @R4_Unit!)
Turing Patterns in Photoshop

Here’s a link about generating point samples in specific color distributions (Thanks @nostalgiadriven!)
Point Sampling with General Noise Spectrum

Here is an interesting shadertoy which uses the mip map of a noise texture to get the low frequency content to do a high pass filter: (found by @paniq, who unfortunately got nerd sniped by this noise generation stuff hehe)
pseudo blue noise 2

Source Code

The source code to generate the images is below, but is also on githib at Atrix256 – NoiseShaping

#define _CRT_SECURE_NO_WARNINGS

#include <windows.h>  // for bitmap headers.  Sorry non windows people!
#include <stdint.h>
#include <vector>
#include <random>
#include <array>
#include <thread>
#include <complex>
#include <atomic>

typedef uint8_t uint8;
typedef int64_t int64;

const float c_pi = 3.14159265359f;

// settings
const size_t    c_imageSize = 256;
const bool      c_doDFT = true;
const float     c_blurThresholdPercent = 0.005f; // lower numbers give higher quality results, but take longer. This is 0.5%
const float     c_numBlurs = 5;

//======================================================================================
struct SImageData
{
    SImageData ()
        : m_width(0)
        , m_height(0)
    { }
   
    size_t m_width;
    size_t m_height;
    size_t m_pitch;
    std::vector<uint8> m_pixels;
};
 
//======================================================================================
struct SColor
{
    SColor (uint8 _R = 0, uint8 _G = 0, uint8 _B = 0)
        : R(_R), G(_G), B(_B)
    { }

    inline void Set (uint8 _R, uint8 _G, uint8 _B)
    {
        R = _R;
        G = _G;
        B = _B;
    }
 
    uint8 B, G, R;
};

//======================================================================================
struct SImageDataFloat
{
    SImageDataFloat()
        : m_width(0)
        , m_height(0)
    { }
   
    size_t m_width;
    size_t m_height;
    std::vector<float> m_pixels;
};

//======================================================================================
struct SImageDataComplex
{
    SImageDataComplex ()
        : m_width(0)
        , m_height(0)
    { }
  
    size_t m_width;
    size_t m_height;
    std::vector<std::complex<float>> m_pixels;
};
 
//======================================================================================
std::complex<float> DFTPixel (const SImageData &srcImage, size_t K, size_t L)
{
    std::complex<float> ret(0.0f, 0.0f);
  
    for (size_t x = 0; x < srcImage.m_width; ++x)
    {
        for (size_t y = 0; y < srcImage.m_height; ++y)
        {
            // Get the pixel value (assuming greyscale) and convert it to [0,1] space
            const uint8 *src = &srcImage.m_pixels[(y * srcImage.m_pitch) + x * 3];
            float grey = float(src[0]) / 255.0f;
  
            // Add to the sum of the return value
            float v = float(K * x) / float(srcImage.m_width);
            v += float(L * y) / float(srcImage.m_height);
            ret += std::complex<float>(grey, 0.0f) * std::polar<float>(1.0f, -2.0f * c_pi * v);
        }
    }
  
    return ret;
}
  
//======================================================================================
void ImageDFT (const SImageData &srcImage, SImageDataComplex &destImage)
{
    // NOTE: this function assumes srcImage is greyscale, so works on only the red component of srcImage.
    // ImageToGrey() will convert an image to greyscale.
 
    // size the output dft data
    destImage.m_width = srcImage.m_width;
    destImage.m_height = srcImage.m_height;
    destImage.m_pixels.resize(destImage.m_width*destImage.m_height);
 
    size_t numThreads = std::thread::hardware_concurrency();
    //if (numThreads > 0)
        //numThreads = numThreads - 1;
 
    std::vector<std::thread> threads;
    threads.resize(numThreads);
 
    printf("Doing DFT with %zu threads...\n", numThreads);
 
    // calculate 2d dft (brute force, not using fast fourier transform) multithreadedly
    std::atomic<size_t> nextRow(0);
    for (std::thread& t : threads)
    {
        t = std::thread(
            [&] ()
            {
                size_t row = nextRow.fetch_add(1);
                bool reportProgress = (row == 0);
                int lastPercent = -1;
 
                while (row < srcImage.m_height)
                {
                    // calculate the DFT for every pixel / frequency in this row
                    for (size_t x = 0; x < srcImage.m_width; ++x)
                    {
                        destImage.m_pixels[row * destImage.m_width + x] = DFTPixel(srcImage, x, row);
                    }
 
                    // report progress if we should
                    if (reportProgress)
                    {
                        int percent = int(100.0f * float(row) / float(srcImage.m_height));
                        if (lastPercent != percent)
                        {
                            lastPercent = percent;
                            printf("            \rDFT: %i%%", lastPercent);
                        }
                    }
 
                    // go to the next row
                    row = nextRow.fetch_add(1);
                }
            }
        );
    }
 
    for (std::thread& t : threads)
        t.join();
 
    printf("\n");
}
 
//======================================================================================
void GetMagnitudeData (const SImageDataComplex& srcImage, SImageData& destImage)
{
    // size the output image
    destImage.m_width = srcImage.m_width;
    destImage.m_height = srcImage.m_height;
    destImage.m_pitch = 4 * ((srcImage.m_width * 24 + 31) / 32);
    destImage.m_pixels.resize(destImage.m_pitch*destImage.m_height);
  
    // get floating point magnitude data
    std::vector<float> magArray;
    magArray.resize(srcImage.m_width*srcImage.m_height);
    float maxmag = 0.0f;
    for (size_t x = 0; x < srcImage.m_width; ++x)
    {
        for (size_t y = 0; y < srcImage.m_height; ++y)
        {
            // Offset the information by half width & height in the positive direction.
            // This makes frequency 0 (DC) be at the image origin, like most diagrams show it.
            int k = (x + (int)srcImage.m_width / 2) % (int)srcImage.m_width;
            int l = (y + (int)srcImage.m_height / 2) % (int)srcImage.m_height;
            const std::complex<float> &src = srcImage.m_pixels[l*srcImage.m_width + k];
  
            float mag = std::abs(src);
            if (mag > maxmag)
                maxmag = mag;
  
            magArray[y*srcImage.m_width + x] = mag;
        }
    }
    if (maxmag == 0.0f)
        maxmag = 1.0f;
  
    const float c = 255.0f / log(1.0f+maxmag);
  
    // normalize the magnitude data and send it back in [0, 255]
    for (size_t x = 0; x < srcImage.m_width; ++x)
    {
        for (size_t y = 0; y < srcImage.m_height; ++y)
        {
            float src = c * log(1.0f + magArray[y*srcImage.m_width + x]);
  
            uint8 magu8 = uint8(src);
  
            uint8* dest = &destImage.m_pixels[y*destImage.m_pitch + x * 3];
            dest[0] = magu8;
            dest[1] = magu8;
            dest[2] = magu8;
        }
    }
}

//======================================================================================
bool ImageSave (const SImageData &image, const char *fileName)
{
    // open the file if we can
    FILE *file;
    file = fopen(fileName, "wb");
    if (!file) {
        printf("Could not save %s\n", fileName);
        return false;
    }
   
    // make the header info
    BITMAPFILEHEADER header;
    BITMAPINFOHEADER infoHeader;
   
    header.bfType = 0x4D42;
    header.bfReserved1 = 0;
    header.bfReserved2 = 0;
    header.bfOffBits = 54;
   
    infoHeader.biSize = 40;
    infoHeader.biWidth = (LONG)image.m_width;
    infoHeader.biHeight = (LONG)image.m_height;
    infoHeader.biPlanes = 1;
    infoHeader.biBitCount = 24;
    infoHeader.biCompression = 0;
    infoHeader.biSizeImage = (DWORD) image.m_pixels.size();
    infoHeader.biXPelsPerMeter = 0;
    infoHeader.biYPelsPerMeter = 0;
    infoHeader.biClrUsed = 0;
    infoHeader.biClrImportant = 0;
   
    header.bfSize = infoHeader.biSizeImage + header.bfOffBits;
   
    // write the data and close the file
    fwrite(&header, sizeof(header), 1, file);
    fwrite(&infoHeader, sizeof(infoHeader), 1, file);
    fwrite(&image.m_pixels[0], infoHeader.biSizeImage, 1, file);
    fclose(file);
  
    return true;
}

//======================================================================================
bool ImageLoad (const char *fileName, SImageData& imageData)
{
    // open the file if we can
    FILE *file;
    file = fopen(fileName, "rb");
    if (!file)
        return false;
 
    // read the headers if we can
    BITMAPFILEHEADER header;
    BITMAPINFOHEADER infoHeader;
    if (fread(&header, sizeof(header), 1, file) != 1 ||
        fread(&infoHeader, sizeof(infoHeader), 1, file) != 1 ||
        header.bfType != 0x4D42 || infoHeader.biBitCount != 24)
    {
        fclose(file);
        return false;
    }
 
    // read in our pixel data if we can. Note that it's in BGR order, and width is padded to the next power of 4
    imageData.m_pixels.resize(infoHeader.biSizeImage);
    fseek(file, header.bfOffBits, SEEK_SET);
    if (fread(&imageData.m_pixels[0], imageData.m_pixels.size(), 1, file) != 1)
    {
        fclose(file);
        return false;
    }
 
    imageData.m_width = infoHeader.biWidth;
    imageData.m_height = infoHeader.biHeight;
    imageData.m_pitch = 4 * ((imageData.m_width * 24 + 31) / 32);
 
    fclose(file);
    return true;
}

//======================================================================================
void ImageInit (SImageData& image, size_t width, size_t height)
{
    image.m_width = width;
    image.m_height = height;
    image.m_pitch = 4 * ((width * 24 + 31) / 32);
    image.m_pixels.resize(image.m_pitch * image.m_height);
    std::fill(image.m_pixels.begin(), image.m_pixels.end(), 0);
}

//======================================================================================
void ImageFloatInit (SImageDataFloat& image, size_t width, size_t height)
{
    image.m_width = width;
    image.m_height = height;
    image.m_pixels.resize(image.m_width * image.m_height);
    std::fill(image.m_pixels.begin(), image.m_pixels.end(), 0.0f);
}

//======================================================================================
int PixelsNeededForSigma (float sigma)
{
    // returns the number of pixels needed to represent a gaussian kernal that has values
    // down to the threshold amount.  A gaussian function technically has values everywhere
    // on the image, but the threshold lets us cut it off where the pixels contribute to
    // only small amounts that aren't as noticeable.
    return int(floor(1.0f + 2.0f * sqrtf(-2.0f * sigma * sigma * log(c_blurThresholdPercent)))) + 1;
}

//======================================================================================
float Gaussian (float sigma, float x)
{
    return expf(-(x*x) / (2.0f * sigma*sigma));
}

//======================================================================================
float GaussianSimpsonIntegration (float sigma, float a, float b)
{
    return
        ((b - a) / 6.0f) *
        (Gaussian(sigma, a) + 4.0f * Gaussian(sigma, (a + b) / 2.0f) + Gaussian(sigma, b));
}

//======================================================================================
std::vector<float> GaussianKernelIntegrals (float sigma, int taps)
{
    std::vector<float> ret;
    float total = 0.0f;
    for (int i = 0; i < taps; ++i)
    {
        float x = float(i) - float(taps / 2);
        float value = GaussianSimpsonIntegration(sigma, x - 0.5f, x + 0.5f);
        ret.push_back(value);
        total += value;
    }
    // normalize it
    for (unsigned int i = 0; i < ret.size(); ++i)
    {
        ret[i] /= total;
    }
    return ret;
}

//======================================================================================
const float* GetPixelWrapAround (const SImageDataFloat& image, int x, int y)
{
    if (x >= (int)image.m_width)
    {
        x = x % (int)image.m_width;
    }
    else
    {
        while (x < 0)
            x += (int)image.m_width;
    }

    if (y >= (int)image.m_height)
    {
        y = y % (int)image.m_height;
    }
    else
    {
        while (y < 0)
            y += (int)image.m_height;
    }

    return &image.m_pixels[(y * image.m_width) + x];
}

//======================================================================================
void ImageGaussianBlur (const SImageDataFloat& srcImage, SImageDataFloat &destImage, float xblursigma, float yblursigma, unsigned int xblursize, unsigned int yblursize)
{
    // allocate space for copying the image for destImage and tmpImage
    ImageFloatInit(destImage, srcImage.m_width, srcImage.m_height);
 
    SImageDataFloat tmpImage;
    ImageFloatInit(tmpImage, srcImage.m_width, srcImage.m_height);
 
    // horizontal blur from srcImage into tmpImage
    {
        auto row = GaussianKernelIntegrals(xblursigma, xblursize);
 
        int startOffset = -1 * int(row.size() / 2);
 
        for (int y = 0; y < tmpImage.m_height; ++y)
        {
            for (int x = 0; x < tmpImage.m_width; ++x)
            {
                float blurredPixel = 0.0f;
                for (unsigned int i = 0; i < row.size(); ++i)
                {
                    const float *pixel = GetPixelWrapAround(srcImage, x + startOffset + i, y);
                    blurredPixel += pixel[0] * row[i];
                }
                 
                float *destPixel = &tmpImage.m_pixels[y * tmpImage.m_width + x];
                destPixel[0] = blurredPixel;
            }
        }
    }
 
    // vertical blur from tmpImage into destImage
    {
        auto row = GaussianKernelIntegrals(yblursigma, yblursize);
 
        int startOffset = -1 * int(row.size() / 2);
 
        for (int y = 0; y < destImage.m_height; ++y)
        {
            for (int x = 0; x < destImage.m_width; ++x)
            {
                float blurredPixel = 0.0f;
                for (unsigned int i = 0; i < row.size(); ++i)
                {
                    const float *pixel = GetPixelWrapAround(tmpImage, x, y + startOffset + i);
                    blurredPixel += pixel[0] * row[i];
                }
 
                float *destPixel = &destImage.m_pixels[y * destImage.m_width + x];
                destPixel[0] = blurredPixel;
            }
        }
    }
}

//======================================================================================
void SaveImageFloatAsBMP (const SImageDataFloat& imageFloat, const char* fileName)
{
    printf("\n%s\n", fileName);

    // init the image
    SImageData image;
    ImageInit(image, imageFloat.m_width, imageFloat.m_height);

    // write the data to the image
    const float* srcData = &imageFloat.m_pixels[0];
    for (size_t y = 0; y < image.m_height; ++y)
    {
        SColor* pixel = (SColor*)&image.m_pixels[y*image.m_pitch];

        for (size_t x = 0; x < image.m_width; ++x)
        {
            uint8 value = uint8(255.0f * srcData[0]);

            pixel->Set(value, value, value);

            ++pixel;
            ++srcData;
        }
    }

    // save the image
    ImageSave(image, fileName);

    // also save a DFT of the image
    if (c_doDFT)
    {
        SImageDataComplex dftData;
        ImageDFT(image, dftData);

        SImageData DFTMagImage;
        GetMagnitudeData(dftData, DFTMagImage);

        char buffer[256];
        sprintf(buffer, "%s.mag.bmp", fileName);

        ImageSave(DFTMagImage, buffer);
    }
}

//======================================================================================
void NormalizeHistogram (SImageDataFloat& image)
{
    struct SHistogramHelper
    {
        float value;
        size_t pixelIndex;
    };
    static std::vector<SHistogramHelper> pixels;
    pixels.resize(image.m_width * image.m_height);

    // put all the pixels into the array
    for (size_t i = 0, c = image.m_width * image.m_height; i < c; ++i)
    {
        pixels[i].value = image.m_pixels[i];
        pixels[i].pixelIndex = i;
    }

    // shuffle the pixels to randomly order ties. not as big a deal with floating point pixel values though
    static std::random_device rd;
    static std::mt19937 rng(rd());
    std::shuffle(pixels.begin(), pixels.end(), rng);

    // sort the pixels by value
    std::sort(
        pixels.begin(),
        pixels.end(),
        [] (const SHistogramHelper& a, const SHistogramHelper& b)
        {
            return a.value < b.value;
        }
    );

    // use the pixel's place in the array as the new value, and write it back to the image
    for (size_t i = 0, c = image.m_width * image.m_height; i < c; ++i)
    {
        float value = float(i) / float(c - 1);
        image.m_pixels[pixels[i].pixelIndex] = value;
    }
}

//======================================================================================
void BlueNoiseTest (float blurSigma)
{
    // calculate the blur size from our sigma
    int blurSize = PixelsNeededForSigma(blurSigma) | 1;

    // setup the randon number generator
    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);

    // generate some white noise
    SImageDataFloat noise;
    ImageFloatInit(noise, c_imageSize, c_imageSize);
    for (float& v : noise.m_pixels)
    {
        v = dist(rng);
    }

    // save off the starting white noise
    const char* baseFileName = "bluenoise_%i_%zu.bmp";
    char fileName[256];

    sprintf(fileName, baseFileName, int(blurSigma * 100.0f), 0);
    SaveImageFloatAsBMP(noise, fileName);

    // iteratively high pass filter and rescale histogram to the 0 to 1 range
    SImageDataFloat blurredImage;
    for (size_t blurIndex = 0; blurIndex < c_numBlurs; ++blurIndex)
    {
        // get a low passed version of the current image
        ImageGaussianBlur(noise, blurredImage, blurSigma, blurSigma, blurSize, blurSize);

        // subtract the low passed version to get the high passed version
        for (size_t pixelIndex = 0; pixelIndex < c_imageSize * c_imageSize; ++pixelIndex)
            noise.m_pixels[pixelIndex] -= blurredImage.m_pixels[pixelIndex];

        // put all pixels between 0.0 and 1.0 again
        NormalizeHistogram(noise);

        // save this image
        sprintf(fileName, baseFileName, int(blurSigma * 100.0f), blurIndex + 1);
        SaveImageFloatAsBMP(noise, fileName);
    }
}

//======================================================================================
void RedNoiseTest (float blurSigma)
{
    // calculate the blur size from our sigma
    int blurSize = PixelsNeededForSigma(blurSigma) | 1;

    // setup the randon number generator
    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);

    // generate some white noise
    SImageDataFloat noise;
    ImageFloatInit(noise, c_imageSize, c_imageSize);
    for (float& v : noise.m_pixels)
    {
        v = dist(rng);
    }

    // save off the starting white noise
    const char* baseFileName = "rednoise_%i_%zu.bmp";
    char fileName[256];

    sprintf(fileName, baseFileName, int(blurSigma * 100.0f), 0);
    SaveImageFloatAsBMP(noise, fileName);

    // iteratively high pass filter and rescale histogram to the 0 to 1 range
    SImageDataFloat blurredImage;
    for (size_t blurIndex = 0; blurIndex < c_numBlurs; ++blurIndex)
    {
        // get a low passed version of the current image
        ImageGaussianBlur(noise, blurredImage, blurSigma, blurSigma, blurSize, blurSize);

        // set noise image to the low passed version
        noise.m_pixels = blurredImage.m_pixels;

        // put all pixels between 0.0 and 1.0 again
        NormalizeHistogram(noise);

        // save this image
        sprintf(fileName, baseFileName, int(blurSigma * 100.0f), blurIndex + 1);
        SaveImageFloatAsBMP(noise, fileName);
    }
}

//======================================================================================
void BandPassTest (float blurSigma1, float blurSigma2)
{
    // calculate the blur size from our sigma
    int blurSize1 = PixelsNeededForSigma(blurSigma1) | 1;
    int blurSize2 = PixelsNeededForSigma(blurSigma2) | 1;

    // setup the randon number generator
    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);

    // generate some white noise
    SImageDataFloat noise;
    ImageFloatInit(noise, c_imageSize, c_imageSize);
    for (float& v : noise.m_pixels)
    {
        v = dist(rng);
    }

    // save off the starting white noise
    const char* baseFileName = "bandpass_%i_%i_%zu.bmp";
    char fileName[256];

    sprintf(fileName, baseFileName, int(blurSigma1 * 100.0f), int(blurSigma2 * 100.0f), 0);
    SaveImageFloatAsBMP(noise, fileName);

    // iteratively high pass filter and rescale histogram to the 0 to 1 range
    SImageDataFloat blurredImage1;
    SImageDataFloat blurredImage2;
    for (size_t blurIndex = 0; blurIndex < c_numBlurs; ++blurIndex)
    {
        // get two low passed versions of the current image
        ImageGaussianBlur(noise, blurredImage1, blurSigma1, blurSigma1, blurSize1, blurSize1);
        ImageGaussianBlur(noise, blurredImage2, blurSigma2, blurSigma2, blurSize2, blurSize2);

        // subtract one low passed version from the other
        for (size_t pixelIndex = 0; pixelIndex < c_imageSize * c_imageSize; ++pixelIndex)
            noise.m_pixels[pixelIndex] = blurredImage1.m_pixels[pixelIndex] - blurredImage2.m_pixels[pixelIndex];

        // put all pixels between 0.0 and 1.0 again
        NormalizeHistogram(noise);

        // save this image
        sprintf(fileName, baseFileName, int(blurSigma1 * 100.0f), int(blurSigma2 * 100.0f), blurIndex + 1);
        SaveImageFloatAsBMP(noise, fileName);
    }
}

//======================================================================================
void BandStopTest (float blurSigma1, float blurSigma2)
{
    // calculate the blur size from our sigma
    int blurSize1 = PixelsNeededForSigma(blurSigma1) | 1;
    int blurSize2 = PixelsNeededForSigma(blurSigma2) | 1;

    // setup the randon number generator
    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);

    // generate some white noise
    SImageDataFloat noise;
    ImageFloatInit(noise, c_imageSize, c_imageSize);
    for (float& v : noise.m_pixels)
    {
        v = dist(rng);
    }

    // save off the starting white noise
    const char* baseFileName = "bandstop_%i_%i_%zu.bmp";
    char fileName[256];

    sprintf(fileName, baseFileName, int(blurSigma1 * 100.0f), int(blurSigma2 * 100.0f), 0);
    SaveImageFloatAsBMP(noise, fileName);

    // iteratively high pass filter and rescale histogram to the 0 to 1 range
    SImageDataFloat blurredImage1;
    SImageDataFloat blurredImage2;
    for (size_t blurIndex = 0; blurIndex < c_numBlurs; ++blurIndex)
    {
        // get two low passed versions of the current image
        ImageGaussianBlur(noise, blurredImage1, blurSigma1, blurSigma1, blurSize1, blurSize1);
        ImageGaussianBlur(noise, blurredImage2, blurSigma2, blurSigma2, blurSize2, blurSize2);

        // subtract one low passed version from the other to get the pandpass noise, and subtract that from the original noise to get the band stop noise
        for (size_t pixelIndex = 0; pixelIndex < c_imageSize * c_imageSize; ++pixelIndex)
            noise.m_pixels[pixelIndex] -= (blurredImage1.m_pixels[pixelIndex] - blurredImage2.m_pixels[pixelIndex]);

        // put all pixels between 0.0 and 1.0 again
        NormalizeHistogram(noise);

        // save this image
        sprintf(fileName, baseFileName, int(blurSigma1 * 100.0f), int(blurSigma2 * 100.0f), blurIndex + 1);
        SaveImageFloatAsBMP(noise, fileName);
    }
}

//======================================================================================
int main (int argc, char ** argv)
{
    BlueNoiseTest(0.5f);
    BlueNoiseTest(1.0f);
    BlueNoiseTest(2.0f);

    RedNoiseTest(0.5f);
    RedNoiseTest(1.0f);
    RedNoiseTest(2.0f);

    BandPassTest(0.5f, 2.0f);

    BandStopTest(0.5f, 2.0f);

    return 0;
}

Generating Blue Noise Sample Points With Mitchell’s Best Candidate Algorithm

Lately I’ve been eyeball deep in noise, ordered dithering and related topics, and have been learning some really interesting things.

As the information coalesces it’ll become apparent whether there is going to be a huge mega post coming, or if there will be several smaller ones.

In the meantime, I wanted to share this bite sized chunk of information.

Three sampling patterns that are most often used when sampling (say, when numerically integrating a lighting function for graphics/rendering purposes) are: regular samples, white noise samples, and blue noise samples.

Regular Sampling

Regular sampling just means evenly spacing the samples. This sampling strategy can suffer from aliasing, but gives good coverage of the sample space.

Here are 256, 1024 and 4096 samples:


Here are those samples taken from a source image:


Here is the DFT (frequency amplitude) of those samples:


White Noise Sampling

White noise sampling just chooses random numbers for where to place the samples.
The random numbers are uniformly distributed, meaning they are just plain vanilla random numbers where each number is equally likely to come up.

White noise sampling can make for noisy results, and suffers from the fact that white noise sample points can clump together and leave empty space. Clumped sample points give redundant information while empty space is information that you are lacking in the sample space. In general, noise is often desired over aliasing though, so white noise samples are generally preferred over regular sampling. Monte Carlo integration also requires random samples.

White noise is called white noise because it contains all frequencies approximately evenly, like how white light is made up of all frequencies of light.

Here are 256, 1024 and 4096 samples:


Here are those samples taken from a source image:


Here is the DFT (frequency amplitude) of those samples:


Blue Noise Sampling

Lastly is blue noise sampling which is somewhere between regular sampling and white noise sampling. Blue noise sampling has randomly placed points like white noise does, but the randomly placed points are approximately evenly spaced, which is more like regular sampling.

Things like low discrepancy sequences, stratified sampling, and jittered regular sampling mimic blue noise, and are often a cheaper alternative when an approximation is acceptable. More info on low discrepancy sequences is available on my post here: When Random Numbers Are Too Random: Low Discrepancy Sequences

Blue noise is called blue noise because it contains higher amounts of higher frequencies and lower amounts of lower frequencies. This is the same of blue light, which contains higher frequency (bluer) light.

Here are 256, 1024 and 4096 samples:


Here are those samples taken from a source image:


Here is the DFT (frequency amplitude) of those samples:


Comparison

Imagine you were a robot with 4096 light/color sensors. Which of the arrangements below do you think would give you the best information about the world around you with that number of sensors?



To me, the regular grid and the blue noise are a bit of a coin toss, while the white noise version is awful.

The regular grid does seem to show me straight lined things better (the road, sidewalk, etc), but it makes non straight lined things – like the trees – look blocky. The blue noise grid does the reverse and makes straight things look wavy, while making it easier to see the true shape of non straight things.

Mathematically, blue noise is superior sampling, so maybe this example isn’t the best way to show the value of blue noise.

Here is the real image:

Apparently the photo-receptors in our eyes are arranged in a blue noise pattern. Some people say this is why blue noise is more agreeable with our perception, but since it also helps numerical integration converge faster for lower sample counts (compared to white noise – which wins out with larger sample counts BTW!), it seems like there is a more fundamental reason which would cause an evolutionary advantage for them to be arranged that way in the first place.

Generating Blue Noise Sample Points

The obvious question is: I know how to make uniform and random sample points. How do I make blue noise sample points?

There are multiple ways to do it, but a method that I find very easy to understand and to implement is “Mitchell’s Best Candidate Algorithm”.

The algorithm is as follows:

  1. Place a random sample point as the first sample point.
  2. Generate some number of random dots as candidates to be the next sample point.
  3. Whichever of these dots is farthest away from the closest existing sample point is the winner. Place that dot as the new sample point.
  4. GOTO 2 and Repeat until you have as many sample points as you want.

The algorithm is pretty simple, but there are two other important details that are needed to give you good results:

  • When calculating distance between dots, you need to consider wrap around. More info on how to do that here: Calculating the Distance Between Points in “Wrap Around” (Toroidal) Space.
  • The number of candidates you generate should scale up with the number of existing sample points. As the original paper describing this technique says, doing that helps ensure that the statistics of your sample points stay constant as the number of sample points changes.

When I first tried to get this algorithm working, I was generating a fixed number of candidates each time. That gave me these pretty terrible results:

However, when I multiplied the number of existing sample points by a constant “m” as the number of sample points to generate, I got much better results, even when m was 1! (Note: m=0 is the same as white noise in this image. I did NumSamples*m+1 candidates each time.)

Related Computer Graphics Stack Exchange Question: Mitchell’s Best Candidate Algorithm

If you store existing sample points in a grid, you can speed up the algorithm since it will be faster to find the closest point to a candidate. In the implementation on this post I didn’t do that.

You may be able to multithread this algorithm but I haven’t tried it. The idea would be if you needed to make and test N candidates, that you could split that among M threads, so long as N was large enough to make that worth while. I didn’t do that in this post.

Lastly, instead of working with distance, you can work with SQUARED distance to avoid many unnecessary square root calculations. The example code here does that optimization.

Links

The 1991 paper that described this technique:
Spectrally optimal sampling for distribution ray tracing

Another interesting link on this algorithm:
Mitchell’s Best-Candidate

This algorithm isn’t that great for making dense sample points, or for use in dithering / stippling. Look for a future blog post about those usage cases, but for now, this is a great resource:
Free Blue Noise Textures (and info / examples on blue noise texture usage)

A physics based approach to blue noise distributed samples:
Electrostatic Half Toning

A neat read on the “void and cluster” method for generating blue noise, and also a good read on what ordered dithering is all about:
The void and cluster method for dither array generation

Source Code

Here is some simple standalone C++ source code which can generate blue noise sample points, and also generated the images used in this post.

It’s also on github (along with the source image) at https://github.com/Atrix256/RandomCode/tree/master/Mitchell

#define _CRT_SECURE_NO_WARNINGS

#include <windows.h>  // for bitmap headers.  Sorry non windows people!
#include <stdint.h>
#include <vector>
#include <complex>
#include <thread>
#include <atomic>
#include <random>
#include <array>

typedef uint8_t uint8;
typedef int64_t int64;

const float c_pi = 3.14159265359f;

//======================================================================================
struct SImageData
{
    SImageData ()
        : m_width(0)
        , m_height(0)
    { }
   
    size_t m_width;
    size_t m_height;
    size_t m_pitch;
    std::vector<uint8> m_pixels;
};

SImageData s_stippleImage;
 
//======================================================================================
struct SColor
{
    SColor (uint8 _R = 0, uint8 _G = 0, uint8 _B = 0)
        : R(_R), G(_G), B(_B)
    { }

    inline void Set (uint8 _R, uint8 _G, uint8 _B)
    {
        R = _R;
        G = _G;
        B = _B;
    }
 
    uint8 B, G, R;
};

//======================================================================================
struct SImageDataComplex
{
    SImageDataComplex ()
        : m_width(0)
        , m_height(0)
    { }
 
    size_t m_width;
    size_t m_height;
    std::vector<std::complex<float>> m_pixels;
};

//======================================================================================
std::complex<float> DFTPixel (const SImageData &srcImage, size_t K, size_t L)
{
    std::complex<float> ret(0.0f, 0.0f);
 
    for (size_t x = 0; x < srcImage.m_width; ++x)
    {
        for (size_t y = 0; y < srcImage.m_height; ++y)
        {
            // Get the pixel value (assuming greyscale) and convert it to [0,1] space
            const uint8 *src = &srcImage.m_pixels[(y * srcImage.m_pitch) + x * 3];
            float grey = float(src[0]) / 255.0f;
 
            // Add to the sum of the return value
            float v = float(K * x) / float(srcImage.m_width);
            v += float(L * y) / float(srcImage.m_height);
            ret += std::complex<float>(grey, 0.0f) * std::polar<float>(1.0f, -2.0f * c_pi * v);
        }
    }
 
    return ret;
}
 
//======================================================================================
void DFTImage (const SImageData &srcImage, SImageDataComplex &destImage)
{
    // NOTE: this function assumes srcImage is greyscale, so works on only the red component of srcImage.
    // ImageToGrey() will convert an image to greyscale.

    // size the output dft data
    destImage.m_width = srcImage.m_width;
    destImage.m_height = srcImage.m_height;
    destImage.m_pixels.resize(destImage.m_width*destImage.m_height);

    size_t numThreads = std::thread::hardware_concurrency();
    //if (numThreads > 0)
        //numThreads = numThreads - 1;

    std::vector<std::thread> threads;
    threads.resize(numThreads);

    printf("Doing DFT with %zu threads...\n", numThreads);

    // calculate 2d dft (brute force, not using fast fourier transform) multithreadedly
    std::atomic<size_t> nextRow(0);
    for (std::thread& t : threads)
    {
        t = std::thread(
            [&] ()
            {
                size_t row = nextRow.fetch_add(1);
                bool reportProgress = (row == 0);
                int lastPercent = -1;

                while (row < srcImage.m_height)
                {
                    // calculate the DFT for every pixel / frequency in this row
                    for (size_t x = 0; x < srcImage.m_width; ++x)
                    {
                        destImage.m_pixels[row * destImage.m_width + x] = DFTPixel(srcImage, x, row);
                    }

                    // report progress if we should
                    if (reportProgress)
                    {
                        int percent = int(100.0f * float(row) / float(srcImage.m_height));
                        if (lastPercent != percent)
                        {
                            lastPercent = percent;
                            printf("            \rDFT: %i%%", lastPercent);
                        }
                    }

                    // go to the next row
                    row = nextRow.fetch_add(1);
                }
            }
        );
    }

    for (std::thread& t : threads)
        t.join();

    printf("\n");
}

//======================================================================================
void GetMagnitudeData (const SImageDataComplex& srcImage, SImageData& destImage)
{
    // size the output image
    destImage.m_width = srcImage.m_width;
    destImage.m_height = srcImage.m_height;
    destImage.m_pitch = 4 * ((srcImage.m_width * 24 + 31) / 32);
    destImage.m_pixels.resize(destImage.m_pitch*destImage.m_height);
 
    // get floating point magnitude data
    std::vector<float> magArray;
    magArray.resize(srcImage.m_width*srcImage.m_height);
    float maxmag = 0.0f;
    for (size_t x = 0; x < srcImage.m_width; ++x)
    {
        for (size_t y = 0; y < srcImage.m_height; ++y)
        {
            // Offset the information by half width & height in the positive direction.
            // This makes frequency 0 (DC) be at the image origin, like most diagrams show it.
            int k = (x + (int)srcImage.m_width / 2) % (int)srcImage.m_width;
            int l = (y + (int)srcImage.m_height / 2) % (int)srcImage.m_height;
            const std::complex<float> &src = srcImage.m_pixels[l*srcImage.m_width + k];
 
            float mag = std::abs(src);
            if (mag > maxmag)
                maxmag = mag;
 
            magArray[y*srcImage.m_width + x] = mag;
        }
    }
    if (maxmag == 0.0f)
        maxmag = 1.0f;
 
    const float c = 255.0f / log(1.0f+maxmag);
 
    // normalize the magnitude data and send it back in [0, 255]
    for (size_t x = 0; x < srcImage.m_width; ++x)
    {
        for (size_t y = 0; y < srcImage.m_height; ++y)
        {
            float src = c * log(1.0f + magArray[y*srcImage.m_width + x]);
 
            uint8 magu8 = uint8(src);
 
            uint8* dest = &destImage.m_pixels[y*destImage.m_pitch + x * 3];
            dest[0] = magu8;
            dest[1] = magu8;
            dest[2] = magu8;
        }
    }
}
 
//======================================================================================
void GetPhaseData (const SImageDataComplex& srcImage, SImageData& destImage)
{
    // size the output image
    destImage.m_width = srcImage.m_width;
    destImage.m_height = srcImage.m_height;
    destImage.m_pitch = 4 * ((srcImage.m_width * 24 + 31) / 32);
    destImage.m_pixels.resize(destImage.m_pitch*destImage.m_height);
 
    // get floating point phase data, and encode it in [0,255]
    for (size_t x = 0; x < srcImage.m_width; ++x)
    {
        for (size_t y = 0; y < srcImage.m_height; ++y)
        {
            // Offset the information by half width & height in the positive direction.
            // This makes frequency 0 (DC) be at the image origin, like most diagrams show it.
            int k = (x + (int)srcImage.m_width / 2) % (int)srcImage.m_width;
            int l = (y + (int)srcImage.m_height / 2) % (int)srcImage.m_height;
            const std::complex<float> &src = srcImage.m_pixels[l*srcImage.m_width + k];
 
            // get phase, and change it from [-pi,+pi] to [0,255]
            float phase = (0.5f + 0.5f * std::atan2(src.real(), src.imag()) / c_pi);
            if (phase < 0.0f)
                phase = 0.0f;
            if (phase > 1.0f)
                phase = 1.0;
            uint8 phase255 = uint8(phase * 255);
 
            // write the phase as grey scale color
            uint8* dest = &destImage.m_pixels[y*destImage.m_pitch + x * 3];
            dest[0] = phase255;
            dest[1] = phase255;
            dest[2] = phase255;
        }
    }
}

//======================================================================================
bool ImageSave (const SImageData &image, const char *fileName)
{
    // open the file if we can
    FILE *file;
    file = fopen(fileName, "wb");
    if (!file) {
        printf("Could not save %s\n", fileName);
        return false;
    }
   
    // make the header info
    BITMAPFILEHEADER header;
    BITMAPINFOHEADER infoHeader;
   
    header.bfType = 0x4D42;
    header.bfReserved1 = 0;
    header.bfReserved2 = 0;
    header.bfOffBits = 54;
   
    infoHeader.biSize = 40;
    infoHeader.biWidth = (LONG)image.m_width;
    infoHeader.biHeight = (LONG)image.m_height;
    infoHeader.biPlanes = 1;
    infoHeader.biBitCount = 24;
    infoHeader.biCompression = 0;
    infoHeader.biSizeImage = (DWORD) image.m_pixels.size();
    infoHeader.biXPelsPerMeter = 0;
    infoHeader.biYPelsPerMeter = 0;
    infoHeader.biClrUsed = 0;
    infoHeader.biClrImportant = 0;
   
    header.bfSize = infoHeader.biSizeImage + header.bfOffBits;
   
    // write the data and close the file
    fwrite(&header, sizeof(header), 1, file);
    fwrite(&infoHeader, sizeof(infoHeader), 1, file);
    fwrite(&image.m_pixels[0], infoHeader.biSizeImage, 1, file);
    fclose(file);
  
    return true;
}

//======================================================================================
bool ImageLoad (const char *fileName, SImageData& imageData)
{
    // open the file if we can
    FILE *file;
    file = fopen(fileName, "rb");
    if (!file)
        return false;
 
    // read the headers if we can
    BITMAPFILEHEADER header;
    BITMAPINFOHEADER infoHeader;
    if (fread(&header, sizeof(header), 1, file) != 1 ||
        fread(&infoHeader, sizeof(infoHeader), 1, file) != 1 ||
        header.bfType != 0x4D42 || infoHeader.biBitCount != 24)
    {
        fclose(file);
        return false;
    }
 
    // read in our pixel data if we can. Note that it's in BGR order, and width is padded to the next power of 4
    imageData.m_pixels.resize(infoHeader.biSizeImage);
    fseek(file, header.bfOffBits, SEEK_SET);
    if (fread(&imageData.m_pixels[0], imageData.m_pixels.size(), 1, file) != 1)
    {
        fclose(file);
        return false;
    }
 
    imageData.m_width = infoHeader.biWidth;
    imageData.m_height = infoHeader.biHeight;
    imageData.m_pitch = 4 * ((imageData.m_width * 24 + 31) / 32);
 
    fclose(file);
    return true;
}

//======================================================================================
void ImageInit (SImageData& image, size_t width, size_t height)
{
    image.m_width = width;
    image.m_height = height;
    image.m_pitch = 4 * ((width * 24 + 31) / 32);
    image.m_pixels.resize(image.m_pitch * image.m_width);
    std::fill(image.m_pixels.begin(), image.m_pixels.end(), 0);
}

//======================================================================================
void SampleTest (const SImageData& image, const SImageData& samples, const char* fileName)
{
    SImageData outImage;
    ImageInit(outImage, image.m_width, image.m_height);

    for (size_t y = 0; y < image.m_height; ++y)
    {
        size_t sampleY = y % samples.m_height;
        for (size_t x = 0; x < image.m_width; ++x)
        {
            size_t sampleX = x % samples.m_width;

            const SColor* samplePixel = (SColor*)&samples.m_pixels[sampleY*samples.m_pitch + sampleX * 3];
            const SColor* imagePixel = (SColor*)&image.m_pixels[y*image.m_pitch + x * 3];

            SColor* outPixel = (SColor*)&outImage.m_pixels[y*outImage.m_pitch + x * 3];

            if (samplePixel->R == 255)
                *outPixel = *imagePixel;
        }
    }

    ImageSave(outImage, fileName);
}

//======================================================================================
inline float Distance (size_t x1, size_t y1, size_t x2, size_t y2, int imageWidth)
{
    // this returns the toroidal distance between the points
    // aka the interval [0, width) wraps around
    float dx = std::abs(float(x2) - float(x1));
    float dy = std::abs(float(y2) - float(y1));

    if (dx > float(imageWidth / 2))
        dx = float(imageWidth) - dx;

    if (dy > float(imageWidth / 2))
        dy = float(imageWidth) - dy;

    // returning squared distance cause why not
    return dx*dx + dy*dy;
}

//======================================================================================
int main (int argc, char** argv)
{
    const size_t c_imageSize = 256;
    const bool c_doDFT = true;

    const size_t c_blueNoiseSampleMultiplier = 1;

    const size_t samples1 = 256;   // 16x16
    const size_t samples2 = 1024;  // 32x32
    const size_t samples3 = 4096; // 128x128

    // load the source image
    SImageData image;
    ImageLoad("Image.bmp", image);

    // init random number generator
    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_int_distribution<int> dist(0, c_imageSize - 1);

    // white noise
    {
        SImageData samples;
        ImageInit(samples, c_imageSize, c_imageSize);

        for (size_t i = 1; i <= samples3; ++i)
        {
            size_t x = dist(rng);
            size_t y = dist(rng);

            SColor* pixel = (SColor*)&samples.m_pixels[y*samples.m_pitch + x * 3];
            pixel->R = pixel->G = pixel->B = 255;

            if (i == samples1 || i == samples2 || i == samples3)
            {
                printf("White Noise %zu samples\n", i);

                char fileName[256];
                sprintf(fileName, "WhiteNoise_%zu.bmp", i);
                ImageSave(samples, fileName);

                sprintf(fileName, "WhiteNoise_%zu_samples.bmp", i);
                SampleTest(image, samples, fileName);

                if (c_doDFT)
                {
                    SImageDataComplex frequencyData;
                    DFTImage(samples, frequencyData);

                    SImageData magnitudeData;
                    GetMagnitudeData(frequencyData, magnitudeData);

                    sprintf(fileName, "WhiteNoise_%zu_mag.bmp", i);
                    ImageSave(magnitudeData, fileName);
                }
            }
        }
    }

    // regular samples
    {

        auto GridTest = [&] (size_t sampleCount) {
            SImageData samples;
            ImageInit(samples, c_imageSize, c_imageSize);

            size_t side = size_t(std::sqrt(float(sampleCount)));

            size_t pixels = c_imageSize / side;

            for (size_t y = 0; y < side; ++y)
            {
                size_t pixelY = y * pixels;
                for (size_t x = 0; x < side; ++x)
                {
                    size_t pixelX = x * pixels;

                    SColor* pixel = (SColor*)&samples.m_pixels[pixelY*samples.m_pitch + pixelX * 3];
                    pixel->R = pixel->G = pixel->B = 255;
                }
            }

            printf("Regular %zu samples\n", sampleCount);

            char fileName[256];
            sprintf(fileName, "Regular_%zu.bmp", sampleCount);
            ImageSave(samples, fileName);

            sprintf(fileName, "Regular_%zu_samples.bmp", sampleCount);
            SampleTest(image, samples, fileName);

            if (c_doDFT)
            {
                SImageDataComplex frequencyData;
                DFTImage(samples, frequencyData);

                SImageData magnitudeData;
                GetMagnitudeData(frequencyData, magnitudeData);

                sprintf(fileName, "Regular_%zu_mag.bmp", sampleCount);
                ImageSave(magnitudeData, fileName);
            }
        };

        GridTest(samples1);
        GridTest(samples2);
        GridTest(samples3);
    }

    // blue noise
    {
        SImageData samples;
        ImageInit(samples, c_imageSize, c_imageSize);

        std::vector<std::array<size_t, 2>> samplesPos;

        size_t percent = (size_t)-1;

        for (size_t i = 1; i <= samples3; ++i)
        {
            size_t newPercent;
            if (i <= samples1)
                newPercent = size_t(100.0f * float(i) / float(samples1));
            else if (i <= samples2)
                newPercent = size_t(100.0f * float(i - samples1) / float(samples2 - samples1));
            else
                newPercent = size_t(100.0f * float(i - samples2) / float(samples3 - samples2));
            if (percent != newPercent)
            {
                percent = newPercent;
                printf("\rGenerating Blue Noise Samples: %zu%%", percent);
            }

            // keep the candidate that is farthest from it's closest point
            size_t numCandidates = samplesPos.size() * c_blueNoiseSampleMultiplier + 1;
            float bestDistance = 0.0f;
            size_t bestCandidateX = 0;
            size_t bestCandidateY = 0;
            for (size_t candidate = 0; candidate < numCandidates; ++candidate)
            {
                size_t x = dist(rng);
                size_t y = dist(rng);

                // calculate the closest distance from this point to an existing sample
                float minDist = FLT_MAX;
                for (const std::array<size_t, 2>& samplePos : samplesPos)
                {
                    float dist = Distance(x, y, samplePos[0], samplePos[1], c_imageSize);
                    if (dist < minDist)
                        minDist = dist;
                }

                if (minDist > bestDistance)
                {
                    bestDistance = minDist;
                    bestCandidateX = x;
                    bestCandidateY = y;
                }
            }
            samplesPos.push_back({ bestCandidateX, bestCandidateY });

            SColor* pixel = (SColor*)&samples.m_pixels[bestCandidateY*samples.m_pitch + bestCandidateX * 3];
            pixel->R = pixel->G = pixel->B = 255;

            if (i == samples1 || i == samples2 || i == samples3)
            {
                printf("\nBlue Noise %zu samples\n", i);

                char fileName[256];
                sprintf(fileName, "BlueNoise_%zu.bmp", i);
                ImageSave(samples, fileName);

                sprintf(fileName, "BlueNoise_%zu_samples.bmp", i);
                SampleTest(image, samples, fileName);

                if (c_doDFT)
                {
                    SImageDataComplex frequencyData;
                    DFTImage(samples, frequencyData);

                    SImageData magnitudeData;
                    GetMagnitudeData(frequencyData, magnitudeData);

                    sprintf(fileName, "BlueNoise_%zu_mag.bmp", i);
                    ImageSave(magnitudeData, fileName);
                }
            }
        }
    }

    return 0;
}

Calculating the Distance Between Points in “Wrap Around” (Toroidal) Space

Let’s say you are trying to find the distance between two points in 2D, but that these points are in a universe that “wraps around” like old video games – leaving the screen on the right, left, top or bottom side makes you re-appear on the opposite edge.

This universe is actually shaped like a toroid, also known as a doughnut. It’s actually an impossible object, a “flat torus”, so not exactly a doughnut, but whatever.

If you imagine yourself on the surface of a doughnut, it would behave exactly this way. If you go “down” you end up where you previously considered “up”. If you go far enough “left” you end up where you previously considered “right”.

How would you calculate the distance between two points in a universe like this?

Let’s imagine the situation below where we are trying to find the distance between the red point and the green point:

One way to do this would be to pick one of the points (I’m picking red in this case) and clone it 8 times to surround the cell like the below. You’d calculate the distance from the green point to each of the 9 red points, and whatever distance was smallest would be the answer.

Something not so desirable about this is that it takes 9 distance calculations to find the minimum distance. You can work with squared distances instead of regular distances to avoid a square root on each of these distance calculations, but that’s still a bit of calculation to do.

Going up in dimensions makes the problem even worse. In 3D, it requires 27 distance calculations to find the shortest point, and 81 distance calculations in 4D!

Luckily there’s a better way to approach this.

Let’s say that our universe (image) is 1 unit by 1 unit big (aka we are working in texture UVs). If you look at the image with 9 copies of the red dot, you can see that they are just the 9 possible combinations of having -1, +0, +1 on each axis added to the red dot’s coordinates. All possible combinations of the x and y axis having -1, +0 or +1 added to them are valid locations of the red dot.

Looking at the distance formula we can see that if we minimize each axis individually, that we will also end up with the minimal distance overall.

d = \sqrt{(x_2-x_1)^2+(y_2-y_1)^2}

So, the better way is to minimize each axis individually.

On the x axis you’d find if the x axis distance between the red and green point is minimal when you subtract 1 from the red dot’s x axis position, leave it alone, or add 1.

Whichever x axis value of the red dot gives you the minimal x axis 1D distance is the x axis location to use.

You’d repeat for the y axis to get the y axis location to use (and would repeat for any further axes for higher dimensions).

This gives you the closest point which you can then plug into the distance formula to get the distance between the points in this wrap around space.

You can actually do better though.

Still working on each axis individually, you can calculate the absoluate value of the 1D distance between the two points on that axis. If that distance is greater than 0.5, the real distance for that axis is 1-distance.

The intuition here is that if you are in a 1d repeating space, if going from A to B is more than half the distance, it means that you went the wrong way, and that going the other way is shorter. The distance of that other way is one minus whatever distance you just calculated since the distance from one point to itself is 1!

Do that for each axis and use those 1d distances in the distance formula to get the actual distance.

This lets you minimize the distance without having to explicitly figure out which combination makes the point closest.

More importantly, it lets you efficiently calculate the distance between the two points in toroidal space (doughnut space!)

The computational complexity is a lot better. It’s now linear in the number of dimensions: O(N), instead of O(3^N).

Here is some C++ to show you how it would work in 2D.

float ToroidalDistance (float x1, float y1, float x2, float y2)
{
    float dx = std::abs(x2 - x1);
    float dy = std::abs(y2 - y1);

    if (dx > 0.5f)
        dx = 1.0f - dx;

    if (dy > 0.5f)
        dy = 1.0f - dy;

    return std::sqrt(dx*dx + dy*dy);
}

I hit this problem trying to make a tileable texture. I needed to place a few circles on a texture such that the circles weren’t too close to each other, even when the texture was tiled.

The calculations above gave me the basic tool needed to be able to calculate distances between points. Subtracting circle radii from the distance between points let me get toroidal distance between circles and make sure I didn’t place them too closely to each other.

That let me make an image that kept the distance constraints even when it was tiled.

Here’s an example image by itself:

Here is the image tiled:

Half Tile Offset Streaming World Grids

A number of years ago I worked on an open world game that got cancelled. Despite it not being released, I learned a few things.

If you try to make a game where the player can walk around a large world without loading screens, chances are that the whole world won’t fit in memory at once.

Since it can’t all fit in memory at once, as the player moves around you are going to have to unload old chunks of the world to make room for the new chunks that the player is about to enter.

To simplify the problem, it’s really common to break the world up into a grid, and keep a radius of tiles loaded around the player at any one time.

In the above you can see that 9 tiles are kept in memory. If the player crosses the boundary to the cell to the left, we unload the cells on the right (red cells) and load new cells in on the left (green cells).

The idea is that we keep a border of cells loaded around the player at all times so that they never see the edge of the world, but we don’t have to keep the whole world in memory at the same time.

The size of the cells can vary depending on the actual needs of your game. If you can travel very quickly in your game, you may need larger cells.

Instead of just having 9 large cells loaded at any one time, you may instead opt to have smaller cells but have more layers of them loaded at any one time. The below has 2 layers of cells loaded around the player, so has to keep 5×5 = 25 cells loaded at any one time. This can give you more granularity if you need it.

The number of cells you have to keep in memory when keeping N layers of cells loaded around the player is (2N+1)^2.

1 layer means 9 cells, 2 layers mean 25 cells, 3 layers mean 49 cells, 4 layers mean 81 cells and so on.

This isn’t the only way to arrange your world tiles though. You can also make it so every row of tiles is offset by half a tile from the one above it. That gives you a setup like this:

With this setup, keeping a single layer of cells loaded around the player takes only 7 cells instead of 9. That might not sound like much, but that means that your memory budget is 129% of what it was the other way.

Alternately, it means you can keep your cells at the same quality level but only have to load 78% as much stuff from disk as the player moves around the world.

For keeping N layers of cells around the player you need to keep 3N^2+3N+1 cells in memory.

1 layer means 7 cells, 2 layers mean 19 cells, 3 layers mean 37 cells, 4 layers mean 61 cells.

Here’s a table to show you how the regular grid compares to the half offset grid for cell counts. The savings do get better with more layers, but not very quickly.

\begin{array}{c|c|c|c} \text{Layers} & \text{Regular Grid} & \text{Half Offset Grid} & \text{Size} \\ \hline 1 & 9 & 7 & 77.8\%\\ 2 & 25 & 19 & 76\%\\ 3 & 49 & 37 & 75.5\%\\ 4 & 81 & 61 & 75.3\%\\ \end{array}

Overall, this is a pretty cool technique that is pretty low cost if you do this early in the project. Once you have a lot of content divided into a regular grid, it can be a challenge to move over to this half tile offset grid.

Some Notes From Readers

@chrispewebb said that an issue he’s faced when going this method is having T junctions on LODed terrain, but that skirts should be able to help there.

@runevision pointed out that while the memory requirements are lowered, so is the shortest distance (radius) from the player to data that isn’t loaded. One idea to deal with this if it’s a problem could be to use smaller cell sizes and to do more layers to make up for it.

Generating Random Numbers From a Specific Distribution With Rejection Sampling

The last post showed how to transform uniformly generated random numbers into any random number distribution you desired.

It did so by turning the PDF (probability density function) into a CDF (cumulative density function) and then inverting it – either analytically (making a function) or numerically (making a look up table).

This post will show you how to generate numbers from a PDF as well, but will do so using rejection sampling.

Dice

Let’s say you wanted to simulate a fair five sided die but that you only had a six sided die.

You can use rejection sampling for this by rolling a six sided die and ignoring the roll any time a six came up. Doing that, you do in fact get a fair five sided die roll!

This shows doing that to get 10,000 five sided die rolls:

One disadvantage to this method is that you are throwing away die rolls which can be a source of inefficiency. In this setup it takes 1.2 six sided die rolls on average to get a valid five sided die roll since a roll will be thrown away 1/6 of the time.

Another disadvantage is that each time you need a new value, there are an unknown number of die rolls needed to get it. On average it’s true that you only need 1.2 die rolls, but in reality, it’s possible you may roll 10 sixes in a row. Heck it’s even technically possible (but very unlikely) that you could be rolling dice until the end of time and keep getting sixes. (Using PRNG’s in computers, this won’t happen, but it does take a variable number of rolls).

This is just to say: there is uneven and unpredictable execution time of this algorithm, and it needs an unknown (but somewhat predictable) amount of random numbers to work. This is true of the other forms of sampling methods I talk about lower down as well.

Instead of using a six sided die you could use a die of any size that is greater than (or equal to…) five. Here shows a twenty sided die simulating a five sided die:

It looks basically the same as using a six sided die, which makes sense (that shows that it works), but in this case, it actually took 4 rolls on average to make a valid five sided die roll, since the roll fails 15/20 times (3 out of 4 rolls will fail).

Quick Asides:

  • If straying from rejection sampling ideas for a minute, in the case of the twenty sided die, you could use modulus to get a fair five sided die roll each time: ((roll - 1) \% 5) + 1. This works because there is no remainder for 20 % 5. If there was a remainder it would bias the rolls towards the numbers <= the remainder, making them more likely to come up than the other numbers.
  • You could also get a four sided die roll at the same time if you didn’t want to waste any of this precious random information: ((roll - 1) / 5) + 1
  • Another algorithm to check out for discrete (integer) weighted random numbers is Vose’s method: Vose’s Method.

Box Around PDF

Moving back into the world of continuous valued random numbers and PDF’s, a simple version of how rejection sampling can be used is like this:

  1. Graph your PDF
  2. Draw a box around the PDF
  3. Generate a (uniform) random point in that box
  4. If the point is under the curve of the PDF, use the x axis value as your random number, else throw it out and go to 1

That’s all there is to it!

This works because the x axis value of your 2d point is the random number you might be choosing. The y axis value of your 2d point is a probability of choosing that point. Since the PDF graph is higher in places that are more probable, those places are more likely to accept your 2d point than places that have lower PDF values.

Furthermore, the average number of rejected samples vs accepted samples is based on the area under the PDF compared to the area of the box.

The number of samples on average will be the area of the box divided by the area of the PDF.

Since PDF’s by definition have to integrate to 1, that means that you are dividing by 1. So, to simplify: The number of samples on average will be the same as the area of the box!

If it’s hard to come up with the exact size of the box for the PDF, the box doesn’t have to fit exactly, but of course the tighter you can fit the box around the PDF, the fewer rejected samples you’ll have.

You don’t actually need to graph the PDF and draw a box to do this though. Just generate a 2d random number (a random x and a random y) and reject the point if PDF(x) < y.

Here I'm using this technique with the PDF y=2x where x is in [0,1) and I'm using a box that goes from (0,0) to (1,2) to get 100,000 samples.

As expected, it took on average 2 points to get a single valid point since the area of the box is 2. Here are how many failed tests each histogram bucket had. Unsurprisingly, lower values of the PDF have more failed tests!

Moving to a more complex PDF, let’s look at y=\frac{x^3-10x^2+5x+11}{10.417}

Here are 10 million samples (lots of samples to minimize the noise), using a box height of 1.2, which unsurprisingly takes 1.2 samples on average to get a valid sample:

Here is the graph of the failure counts:

Here the box has a height of 2.8. It still works, but uses 2.8 samples on average which is less efficient:

Here’s the graph of failure counts:

Something interesting about this technique is that technically, the distribution you are sampling from doesn’t even have to be a PDF! If you have negative parts of the graph, they will be treated as zero, assuming your box has a minimum y of 0. Also, the fact that your function may not integrate to (have an area of) 1 doesn’t matter at all.

Here we take the PDF from the last examples, and take off the division by a constant, so that it doesn’t integrate to 1: y=x^3-10x^2+5x+11

The interesting thing is that we get as output a normalized PDF (the red line), even though the distribution we were using to sample was not normalized (the blue line, which is mostly hidden behind the yellow line).

Here are the rejection counts:

Generating One PDF from Another PDF

In the last section we showed how to enclose a PDF in a box, make uniformly random 2d points, and use them to generate points from the PDF.

By enclosing it in a box, all we were really doing is putting it under a uniform distribition that was scaled up to be larger than the PDF at all points.

Now here’s the interesting thing: We aren’t limited to using the uniform distribution!

To generalize this technique, if you are trying to sample from a PDF f(x), you can use any PDF g(x) to do so, so long as you multiply g(x) by a scalar value M so that M*g(x)>= f(x) for all values of x. In other words: scale up g so that it’s always bigger than f.

Using this more generalized technique has one or two more steps than the other way, but allows for a tighter fit of a generating function, resulting in fewer samples thrown away.

Here’s how to do it:

  1. Generate a random number from the distribution g, and call it x.
  2. Calculate the percentage chance of x being chosen by getting a ratio of how likely that number is to be chosen in each PDF: \frac{f(x)}{M*g(x)}
  3. Generate a uniform random number from 0 to 1. If it’s less than the value you just calculated, accept x as the random number, else reject it and go back to 1.

Let’s see this in action!

We’ll generate numbers in a Gaussian distribution with a mean of 15 and a standard deviation of 5. We’ll truncate it to +/- 3 standard deviations so we want to generate random numbers from [0,30).

To generate these numbers, we’ll draw random numbers from the PDF y=x*0.002222. We’ll use an M value of 3 to scale up this PDF to always be greater than the Gaussian one.

Here is how it looks doing this with 20,000 samples:

We generate random numbers along the red line, multiply them by 3 to make them be the yellow line. Then, at whatever point we are at on the x axis, we divide the blue line value by the yellow line value and use that as an acceptance probability. Doing this and counting numbers in a histogram gives us our result – the green line. Since the end goal is the blue line, you can see it is indeed working! With a larger number of samples, the green line would more closely match the blue line.

Here’s the graph of the failed tests:

We have to take on average 3 samples before we get a valid random number. That shouldn’t be too surprising because both PDF’s start with area of 1, but we are multiplying one of them by 3 to make it always be larger than the other.

Something else interesting you might notice is that we have a lot fewer failed tests where the two PDF functions are more similar.

That is the power of this technique: If you can cheaply and easily generate samples that are “pretty close” to a harder distribution to sample from, you can use this technique to more cheaply sample from it.

Something to note is that just like in the last section, the target PDF doesn’t necessarily need to be a real PDF with only positive values and integrating to 1. It would work just the same with a non PDF function, just so long as the PDF generating the random numbers you start with is always above the function.

Some Other Notes

There is family of techniques called “adaptive rejection sampling” that will change the PDF they are drawing from whenever there is a failed test.

Basically, if you imagine the PDF you are drawing from as being a bunch of line segments connected together, you could imagine that whenever you failed a test, you moved a line segment down to be closer to the curve, so that when you sampled from that area again, the chances would be lower that you’d fail the test again.

Taking this to the limit, your sampling PDF will eventually become the PDF you are trying to sample from, and then using this PDF will be a no-op.

These techniques are a continued area of research.

Something else to note is that rejection sampling can be used to find random points within shapes.

For instance, a random point on a triangle, ellipse or circle could be done by putting a (tight) bounding box around the shape, generating points randomly in that box, and only accepting ones within the inner shape.

This can be extended to 3d shapes as well.

Some shapes have better ways to generate points within them that don’t involve iteration and rejected samples, but if all else fails, rejection sampling does indeed work!

At some point in the future I’d like to look into “Markov Chain Monte Carlo” more deeply. It seems like a very interesting technique to approach this same problem, but I have no idea if it’s used often in graphics, especially real time graphics.

Code

Here is the code that generated all the data from this post. The data was visualized with open office.

#define _CRT_SECURE_NO_WARNINGS
 
#include <stdio.h>
#include <random>
#include <array>
#include <unordered_map>

template <size_t NUM_TEST_SAMPLES, size_t SIMULATED_DICE_SIDES, size_t ACTUAL_DICE_SIDES>
void TestDice (const char* fileName)
{
    // seed the random number generator
    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_int_distribution<size_t> dist(0, ACTUAL_DICE_SIDES-1);

    // generate the histogram
    std::array<size_t, SIMULATED_DICE_SIDES> histogram = { 0 };
    size_t rejectedSamples = 0;
    for (size_t i = 0; i < NUM_TEST_SAMPLES; ++i)
    {
        size_t roll = dist(rng);
        while (roll >= SIMULATED_DICE_SIDES)
        {
            ++rejectedSamples;
            roll = dist(rng);
        }
        histogram[roll]++;
    }

    // write the histogram and rejected sample count to a csv
    // an extra 0 data point forces the graph to include 0 in the scale. hack to make the data not look noisier than it really is.
    FILE *file = fopen(fileName, "w+t");
    fprintf(file, "Actual Count, Expected Count, , %0.2f samples needed per roll on average.\n", (float(NUM_TEST_SAMPLES) + float(rejectedSamples)) / float(NUM_TEST_SAMPLES));
    for (size_t value : histogram)
        fprintf(file, "%zu,%zu,0\n", value, (size_t)(float(NUM_TEST_SAMPLES) / float(SIMULATED_DICE_SIDES)));
    fclose(file);
}
 
template <size_t NUM_TEST_SAMPLES, size_t NUM_HISTOGRAM_BUCKETS, typename PDF_LAMBDA>
void Test (const char* fileName, float maxPDFValue, const PDF_LAMBDA& PDF)
{
    // seed the random number generator
    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
 
    // generate the histogram
    std::array<size_t, NUM_HISTOGRAM_BUCKETS> histogram = { 0 };
    std::array<size_t, NUM_HISTOGRAM_BUCKETS> failedTestCounts = { 0 };
    size_t rejectedSamples = 0;
    for (size_t i = 0; i < NUM_TEST_SAMPLES; ++i)
    {
        // Generate a sample from the PDF by generating a random 2d point.
        // If the y axis of the value is <= the value returned by PDF(x), accept it, else reject it.
        // NOTE: this takes an unknown number of iterations, and technically may NEVER finish.
        float pointX = 0.0f;
        float pointY = 0.0f;
        bool validPoint = false;
        while (!validPoint)
        {
            pointX = dist(rng);
            pointY = dist(rng) * maxPDFValue;
            float pdfValue = PDF(pointX);
            validPoint = (pointY <= pdfValue);

            // track number of failed tests per histogram bucket
            if (!validPoint)
            {
                size_t bin = (size_t)std::floor(pointX * float(NUM_HISTOGRAM_BUCKETS));
                failedTestCounts[std::min(bin, NUM_HISTOGRAM_BUCKETS - 1)]++;
                ++rejectedSamples;
            }
        }
 
        // increment the correct bin in the histogram
        size_t bin = (size_t)std::floor(pointX * float(NUM_HISTOGRAM_BUCKETS));
        histogram[std::min(bin, NUM_HISTOGRAM_BUCKETS -1)]++;
    }
 
    // write the histogram and pdf sample to a csv
    FILE *file = fopen(fileName, "w+t");
    fprintf(file, "PDF, Simulated PDF, Generating Function, Failed Tests, %0.2f samples needed per value on average.\n", (float(NUM_TEST_SAMPLES) + float(rejectedSamples)) / float(NUM_TEST_SAMPLES));
    for (size_t i = 0; i < NUM_HISTOGRAM_BUCKETS; ++i)
    {
        float x = (float(i) + 0.5f) / float(NUM_HISTOGRAM_BUCKETS);
        float pdfSample = PDF(x);
        fprintf(file, "%f,%f,%f,%f\n",
            pdfSample,
            NUM_HISTOGRAM_BUCKETS * float(histogram[i]) / float(NUM_TEST_SAMPLES),
            maxPDFValue,
            float(failedTestCounts[i])
        );
    }
    fclose(file);
}

template <size_t NUM_TEST_SAMPLES, size_t NUM_HISTOGRAM_BUCKETS, typename PDF_LAMBDA>
void TestNotPDF (const char* fileName, float maxPDFValue, float normalizationConstant, const PDF_LAMBDA& PDF)
{
    // seed the random number generator
    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
 
    // generate the histogram
    std::array<size_t, NUM_HISTOGRAM_BUCKETS> histogram = { 0 };
    std::array<size_t, NUM_HISTOGRAM_BUCKETS> failedTestCounts = { 0 };
    size_t rejectedSamples = 0;
    for (size_t i = 0; i < NUM_TEST_SAMPLES; ++i)
    {
        // Generate a sample from the PDF by generating a random 2d point.
        // If the y axis of the value is <= the value returned by PDF(x), accept it, else reject it.
        // NOTE: this takes an unknown number of iterations, and technically may NEVER finish.
        float pointX = 0.0f;
        float pointY = 0.0f;
        bool validPoint = false;
        while (!validPoint)
        {
            pointX = dist(rng);
            pointY = dist(rng) * maxPDFValue;
            float pdfValue = PDF(pointX);
            validPoint = (pointY <= pdfValue);

            // track number of failed tests per histogram bucket
            if (!validPoint)
            {
                size_t bin = (size_t)std::floor(pointX * float(NUM_HISTOGRAM_BUCKETS));
                failedTestCounts[std::min(bin, NUM_HISTOGRAM_BUCKETS - 1)]++;
                ++rejectedSamples;
            }
        }
 
        // increment the correct bin in the histogram
        size_t bin = (size_t)std::floor(pointX * float(NUM_HISTOGRAM_BUCKETS));
        histogram[std::min(bin, NUM_HISTOGRAM_BUCKETS -1)]++;
    }
 
    // write the histogram and pdf sample to a csv
    FILE *file = fopen(fileName, "w+t");
    fprintf(file, "Function, Simulated PDF, Scaled Simulated PDF, Generating Function, Failed Tests, %0.2f samples needed per value on average.\n", (float(NUM_TEST_SAMPLES) + float(rejectedSamples)) / float(NUM_TEST_SAMPLES));
    for (size_t i = 0; i < NUM_HISTOGRAM_BUCKETS; ++i)
    {
        float x = (float(i) + 0.5f) / float(NUM_HISTOGRAM_BUCKETS);
        float pdfSample = PDF(x);
        fprintf(file, "%f,%f,%f,%f,%f\n",
            pdfSample,
            NUM_HISTOGRAM_BUCKETS * float(histogram[i]) / float(NUM_TEST_SAMPLES),
            NUM_HISTOGRAM_BUCKETS * float(histogram[i]) / float(NUM_TEST_SAMPLES) * normalizationConstant,
            maxPDFValue,
            float(failedTestCounts[i])
        );
    }
    fclose(file);
}

template <size_t NUM_TEST_SAMPLES, size_t NUM_HISTOGRAM_BUCKETS, typename PDF_F_LAMBDA, typename PDF_G_LAMBDA, typename INVERSE_CDF_G_LAMBDA>
void TestPDFToPDF (const char* fileName, const PDF_F_LAMBDA& PDF_F, const PDF_G_LAMBDA& PDF_G, float M, const INVERSE_CDF_G_LAMBDA& Inverse_CDF_G, float rngRange)
{
    // We generate a sample from PDF F by generating a sample from PDF G, and accepting it with probability PDF_F(x)/(M*PDF_G(x))

    // seed the random number generator
    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
 
    // generate the histogram
    std::array<size_t, NUM_HISTOGRAM_BUCKETS> histogram = { 0 };
    std::array<size_t, NUM_HISTOGRAM_BUCKETS> failedTestCounts = { 0 };
    size_t rejectedSamples = 0;
    for (size_t i = 0; i < NUM_TEST_SAMPLES; ++i)
    {
        // generate random points until we have one that's accepted
        // NOTE: this takes an unknown number of iterations, and technically may NEVER finish.
        float sampleG = 0.0f;
        bool validPoint = false;
        while (!validPoint)
        {
            // Generate a sample from the soure PDF G
            sampleG = Inverse_CDF_G(dist(rng));

            // calculate the ratio of how likely we are to accept this sample
            float acceptChance = PDF_F(sampleG) / (M * PDF_G(sampleG));

            // see if we should accept it
            validPoint = dist(rng) <= acceptChance;

            // track number of failed tests per histogram bucket
            if (!validPoint)
            {
                size_t bin = (size_t)std::floor(sampleG * float(NUM_HISTOGRAM_BUCKETS) / rngRange);
                failedTestCounts[std::min(bin, NUM_HISTOGRAM_BUCKETS - 1)]++;
                ++rejectedSamples;
            }
        }

        // increment the correct bin in the histogram
        size_t bin = (size_t)std::floor(sampleG * float(NUM_HISTOGRAM_BUCKETS) / rngRange);
        histogram[std::min(bin, NUM_HISTOGRAM_BUCKETS - 1)]++;
    }
 
    // write the histogram and pdf sample to a csv
    FILE *file = fopen(fileName, "w+t");
    fprintf(file, "PDF F,PDF G,Scaled PDF G,Simulated PDF,Failed Tests,%0.2f samples needed per value on average.\n", (float(NUM_TEST_SAMPLES) + float(rejectedSamples)) / float(NUM_TEST_SAMPLES));
    for (size_t i = 0; i < NUM_HISTOGRAM_BUCKETS; ++i)
    {
        float x = (float(i) + 0.5f) * rngRange / float(NUM_HISTOGRAM_BUCKETS);
        
        fprintf(file, "%f,%f,%f,%f,%f\n",
            PDF_F(x),
            PDF_G(x),
            PDF_G(x)*M,
            NUM_HISTOGRAM_BUCKETS * float(histogram[i]) / (float(NUM_TEST_SAMPLES)*rngRange),
            float(failedTestCounts[i])
        );
    }
    fclose(file);
}
 
int main(int argc, char **argv)
{
    // Dice
    {
        // Simulate a 5 sided dice with a 6 sided dice
        TestDice<10000, 5, 6>("test1_5_6.csv");

        // Simulate a 5 sided dice with a 20 sided dice
        TestDice<10000, 5, 20>("test1_5_20.csv");
    }

    // PDF y=2x, simulated with a uniform distribution
    {
        auto PDF = [](float x) { return 2.0f * x; };

        Test<1000, 100>("test2_1k.csv", 2.0f, PDF);
        Test<100000, 100>("test2_100k.csv", 2.0f, PDF);
        Test<1000000, 100>("test2_1m.csv", 2.0f, PDF);
    }

    // PDF y=(x^3-10x^2+5x+11)/10.417, simulated with a uniform distribution
    {
        auto PDF = [](float x) {return (x*x*x - 10.0f*x*x + 5.0f*x + 11.0f) / (10.417f); };
        Test<10000000, 100>("test3_10m_1_15.csv", 1.15f, PDF);
        Test<10000000, 100>("test3_10m_1_5.csv", 1.5f, PDF);
        Test<10000000, 100>("test3_10m_2_8.csv", 2.8f, PDF);
    }

    // function (not PDF, Doesn't integrate to 1!) y=(x^3-10x^2+5x+11), simulated with a scaled up uniform distribution
    {
        auto PDF = [](float x) {return (x*x*x - 10.0f*x*x + 5.0f*x + 11.0f); };
        TestNotPDF<10000000, 100>("test4_10m_12_5.csv", 12.5f, 10.417f, PDF);
    }

    // Generate samples from PDF F using samples from PDF G.  random numbers are from 0 to 30.
    // F PDF = gaussian distribution, mean 15, std dev of 5.  Truncated to +/- 3 stddeviations.
    // G PDF = x*0.002222
    // G CDF = 0.001111 * x^2
    // G inverted CDF = (1000 * sqrt(x)) / sqrt(1111)
    // M = 3
    {
        // gaussian PDF F
        const float mean = 15.0f;
        const float stddev = 5.0f;
        auto PDF_F = [=] (float x) -> float
        {
            return (1.0f / (stddev * sqrt(2.0f * (float)std::_Pi))) * std::exp(-0.5f * pow((x - mean) / stddev, 2.0f));
        };

        // PDF G
        auto PDF_G = [](float x) -> float
        {
            return x * 0.002222f;
        };

        // Inverse CDF of G
        auto Inverse_CDF_G = [] (float x) -> float
        {
            return 1000.0f * std::sqrtf(x) / std::sqrtf(1111.0f);
        };

        TestPDFToPDF<20000, 100>("test5.csv", PDF_F, PDF_G, 3.0f, Inverse_CDF_G, 30.0f);
    }

    return 0;
}