Measuring Debug STL Container Perf Cost in MSVC

The code for this post can be found at: https://github.com/Atrix256/STLCost

Folks often will say STL containers are slow in debug, other folks say they aren’t.

Frankly, both are true. “Slow” as a negative statement depends both on what you are doing with them, as well as your need for debug performance.

In game development, you must absolutely be able to run a game in debug mode, and you are usually manipulating a lot of data. This is true both of games at runtime, as well as tool time operations.

This post looks at a pretty common tool time operation: making box filtered mip maps of a 512×512 RGBA F32 image.

This is in MSVC 2017, debug x64, using default project settings.

On my machine, it’s about 25 milliseconds to do this operation with a plain c style array. it’s about 240 milliseconds to do it with a std::array. It’s about 1900 milliseconds to do it with a std::vector.

This example is in line with my own experience that these basic STL containers are really slow in debug, and will bring a debug build to it’s knees. An unusable debug build is hell on a team and will significantly bring down productivity. It can be a project killer, especially as it makes it more difficult for folks to debug problems and validate that things are working correctly.

Maybe perf is better on other compilers though?

Have anything to add, correct, etc? Speak up! 🙂

Taking the Max of Uniform Random Numbers

There is 80 lines of simple standalone C++ code that generated the data for this post at:
https://github.com/Atrix256/RandomCode/blob/master/randmaxadd/Source.cpp

Let’s say you generate 1,000,000 random numbers from 0 to 255 and count how many times each number came up in a histogram. I did that and here it is:

In a previous post I talked about how adding dice rolls together, counting bits in a random number, and similar, would approach a normal / Gaussian distribution:
https://blog.demofox.org/2017/07/25/counting-bits-the-normal-distribution/

The same thing happens if you average random numbers, but there’s a nice side effect of getting values in the same range. That is to say: if you roll N 6 sided dice and average them, you will still get values between 1 and 6, but the more dice there are, the more the distribution will approach Gaussian. It shapes up pretty quickly too. Here is the histogram:

Something interesting to note is that adding two uniform random numbers together – or averaging them – makes something called “triangle distributed noise”. Looking at the histogram for averaging two values, hopefully you can see why it’s called triangle distributed! Triangle noise some cool properties for noise in graphics and is orthogonal to eg white noise vs blue noise vs low discrepancy sequences.

You can read about it in “Banding in Games”:

Click to access banding_in_games.pdf

I recently saw some code that was taking the maximum of two random numbers and that made me wonder what sort of distribution that might give.

Being a better programmer than a mathematician, i made a histogram, then took a guess as to what the formula behind the shape might be.

It turns out that taking the max of N uniform random numbers is the same as using y=x^(N-1) as a PDF. (You of course would need to normalize it to be a real PDF)

Here are the histograms:

These are apparently beta distributions:
https://en.wikipedia.org/wiki/Beta_distribution

If we take the min rather than the max, what happens then? Well, if for example the numbers are between 0 and 1, taking the min will give you the same count as if you took the max of 1-x. So, the graph of the histogram should look the same, just flipped on the x axis.

It turns out that it does:

Twitter Threads

Hey math friends. So if you add N random dice rolls together, it approaches the normal distribution as N approaches infinity.
Is there a name for the distribution that comes up if you take the max of N rolls? At infinity i guess it'd be a dirac delta at max value but before then? pic.twitter.com/PrkoQdhZGm

— Alan Wolfe ☕ (@Atrix256) June 1, 2019

New short blog post: What if you take two random numbers and take the max of em?
"Taking the Max of Uniform Random Numbers"https://t.co/270ARxdHJm
Spoiler: you get beta distributions pic.twitter.com/HbZLBTDuph

— Alan Wolfe ☕ (@Atrix256) June 1, 2019

Generating Random Numbers From a Specific Distribution With The Metropolis Algorithm (MCMC)

There is ~400 lines of standalone C++ code that implements the main ideas in this post. You can find it at: https://github.com/Atrix256/MetropolisMCMC

In previous posts I showed how to generate random numbers from a specific distributing by using two techniques:

Rejection Sampling: https://blog.demofox.org/2017/08/08/generating-random-numbers-from-a-specific-distribution-with-rejection-sampling/

Inverting the CDF: https://blog.demofox.org/2017/08/05/generating-random-numbers-from-a-specific-distribution-by-inverting-the-cdf/

This post will show how to do it using a Markov Chain Monte Carlo method called “The Metropolis Algorithm”. This post also talks about using it for numerical integration.

If you want an intro or review to either Markov Chains or Monte Carlo, these two posts can help you out.

Monte Carlo: https://blog.demofox.org/2018/06/12/monte-carlo-integration-explanation-in-1d/

Markov Chains: https://blog.demofox.org/2019/05/11/markov-chain-text-generation/

Overview

The Metropolis algorithm lets you generate random numbers that follow a distribution given by any function y=f(x), where y is the probability of choosing x.

Rejection sampling does this as well, but you have to throw away an unknown number of bad samples before getting each good sample.

Inverting the CDF doesn’t throw out any samples, but it’s limited in the type of distributions it can do: It can be mathematically complex, or impossible, to find the inverted CDF for a given function analytically.

In these ways, the Metropolis algorithm is the best of both worlds. You can sample from a distribution defined from any function, and you don’t have to throw out any samples while doing it.

Another interesting thing about the Metropolis algorithm is that it can work blindly. It never actually has to KNOW what function describes the probability distribution. If there is a black box you can give an x to get a y, the Metropolis algorithm can generate random numbers from the distribution hidden in that black box.

The Metropolis algorithm also works in any dimension: you can use it with functions like z=f(x,y) or t = f(x,y,z,w,s).

In higher dimensions such as z=f(x,y), the z is the probability of a 2d random number (x,y) being chosen. It sounds weird but this situation is like if you rolled two dice, the value that one die came up as affected the probability of the other die. Maybe when the first die was a larger number, it made it be more probable for the other die to be a larger number too.

As weird as that is, if you can describe the relationship as a z=f(x,y) function, this can generate (x,y) random numbers from that distribution for you.

It’s worth noting that the Metropolis algorithm is a simpler special case of the Metropolis-Hastings algorithm, and these are just two of many Markov Chain Monte Carlo algorithms.

The Metropolis Algorithm

The metropolis algorithm is pretty simple.

You start with an x value and calculate y which is just f(x). This is the initial sample and hopefully is a location where y is greater than 0.

To get the next sample, we first need to calculate a candidate sample, and then choose whether to take it or not.

To make a candidate sample, take a small random step from the current x point to get a new x, either in the negative or positive direction. Calculate y which is just f(x) using the new x.

Calculate a value A (for acceptance value) which is the candidate y divided by the last y value. This is the percentage chance you should take the new sample as the current sample, otherwise you take the old sample as the current sample.

To make the decision, you just generate a random number between 0 and 1 and accept the new sample if it’s below the A value.

Rinse and repeat for as many samples as you want.

Here’s a simple but fully featured implementation that you can find in the code that goes with this post(https://github.com/Atrix256/MetropolisMCMC)

The x axis of each sample is a random number drawn from the distribution described by y=f(x). If you keep track of the average x value seen, you’ll get the expected value of the PDF.

The y axis of each sample isn’t that useful directly. It’s the pdf(x) but scaled up by an unknown amount – the normalization constant for the function.

The convergence rate of Metropolis MCMC isn’t as well understood as monte carlo integration, since Metropolis has dependent samples (a random walk that knows where it was last step) vs independent samples (a stateless random sample).

In higher dimensions, you just take a random step on each axis, instead of only the x axis. The rest of the algorithm remains the same.

Regarding the random walk, it’s possible to use many different types of random numbers to take a step, but it’s most common to use a normal distribution.

Metropolis Burn in and 0.234

Metropolis is stateless, and has no memory of the past. Despite this, it’s common to do a “burn in” with MCMC where you throw out some number of samples before you start.

This might sound weird, but the reason for it is that there are good places to sample from and bad places to sample from, and this is an attempt to have the random walk find a better place to sample from.

For instance, if you were sampling from y=sin(x)*sin(x)*x from 0 to 2 pi, you’d have a pdf that had a shape like the bimodal function below:

If you started the random walk at 2, you’d be doing a random walk in the left, less probable hump, and it would be hard to break out into the right side which was supposed to be more probable.

You should eventually get into the larger hump and stay there longer, but your initial guesses may get stuck in a local minimum and not do a very good job of following the distribution.

While burn in can help situations like these, this is also an example of how the Metropolis algorithm can fail.

It’s worth noting that using a normal distribution for the random walk can help it not get stuck in bad places. If you have a uniform random distribution that can generate numbers between -k and +k, you can get stuck in situations where you need to take a larger than k step to get to a better (more probable) location. If instead you use a normal distribution, the sigma allows you to have a good idea of how big most of the steps will be, but there will always be a greater than zero chance that you can take a step of ANY size, which could get you out of a local minimum.

There is a rule of thumb that the step size you use (the sigma in the case of a normal distribution) should make it so you are accepting a new step on your random walk 23.4% of the time. This supposedly is a good balance between exploration (finding better places elsewhere) and exploitation (staying in a good place) in many situations.

This isn’t fully settled though, as this is only true some of the time, and counter research has come up. (https://www.sciencedirect.com/science/article/pii/S0304414907002177)

I experimented at having a “Burn in” phase where it also adjusted the sigma to try and reach that 23.4% acceptance rate over the previous N samples. I didn’t play with it very long though, and was unable to get it to reliably reach that acceptance rate.

Even beyond the 0.234 acceptance rate goal, tuning the step size for your specific situation can help you get better or worse results. I didn’t play around with that much in my experimentation though, and found a sigma of 0.2 worked pretty well when working with functions in the 0-pi and 0-2pi range.

The initial starting point of your random walk can affect performance too obviously, since the burn in stage is supposed to find a better starting position.

Limiting the Function’s Range

In one of the tests I did, I used the function y=sin(x) with x going from 0 to pi/2.

At first i tried clamping x to 0 to pi/2, to keep it in range but doing that made the technique fall apart. There were plenty of times the random walk would try to step out of bounds, but instead of taking that step, it would clamp to the border. This meant that the random walk had a significantly higher chance of reaching the boundary than anywhere else on the graph, and that broke the algorithm.

The correct thing to do was to just make the function return 0 when x was out of range. In this way, it would end up taking any out of range location with 0% probability, but there was no bias about more or less likely places visited by the random walk, other than it preferring higher (more probable) parts of the function.

Something else worth noting about the function is that if the function ever returns a negative number, it ends up being the same as if it returned zero probability.

If use Metropolis MCMC for integration, this can be an important fact, because it will basically ignore the negative values, and treat them as zero.

Discrete Case

As described, this algorithm works with continuous random numbers, drawing them from a PDF.

The same concepts work for discrete states though too (a more traditional looking markov chain), drawing from a PMF instead.

When handling the discrete case, it needs to be possible to be in any state at any point in time. A usual way to avoid the edge case of probabilities being such that you can only be in some nodes on even steps and others on odd steps, is to have a “self loop” on at least one node, which has a greater than zero probability of staying at the same node.

This page has some great info about the discrete case:
Markov Chain Monte Carlo Without all the Bullshit

Integration

Using the Metropolis algorithm for numerical integration is possible, but is not as straightforward as Monte Carlo integration.

In Monte Carlo integration, to get a single estimate of an integral you calculate f(x) / pdf(x). f(x) is the function value at the random location x, and pdf(x) is the probability of that x value being chosen. You take the average of N such estimates and as N approaches infinity, the error of the average of estimates approaches 0.

In Metropolis MCMC we do have N number of samples, and it almost seems like we have enough data to do this, but it turns out that we don’t.

For f(x) which is the function value at the random location, you literally do have f(x). It’s the y component of each sample generated.

The problem is that we don’t have pdf(x).

The probability of choosing x is in fact based on the function we are evaluating f(x), but the function is essentially an un-normalized pdf, but we are able to draw random numbers from the pdf without knowing the normalization constant.

So, pdf(x) is some scalar multiple of f(x), but we have no idea what the multiplier is. That multiplier, the normalization constant, turns out to be the integration value we want to search for.

So we’ve gone in a circle and are no closer to being able to integrate with Metropolis.

There are some ways to deal with this though.

One way is to do mathematical tricks to make it so things “cancel out” and leave the normalization constant.

There is something called the “Harmonic Mean Estimator” that does this, but has infinite variance so is called “The Worst Monte Carlo Method Ever”.
https://radfordneal.wordpress.com/2008/08/17/the-harmonic-mean-of-the-likelihood-worst-monte-carlo-method-ever/

There is another way though, that I use in the code that goes with this post.

Imagine that while you are doing your Metropolis MCMC you have some interval [a,b] that whenever you get a random number drawn in that range, you increment a counter.

After N total samples, you’ll have M samples that fell in this interval. An estimate of the integral of the normalized pdf over this interval is M/N.

Now, you can do regular Monte Carlo integration of the function over this range to get the integral of the UN-normalized pdf over this interval.

When you divide the unnormallized pdf value by the normalized pdf value, you’ll get the normalization constant aka the integral of the function.

A smaller interval size is better for the Monte Carlo integration because it will converge faster (better results in fewer samples), but it’s worse for the Metropolis integration because a smaller interval is less likely to be accurate with the random walk.

My intuition tells me that if you keep a histogram of the x values you’ve seen be generated from the Metropolis algorithm, that the ones with higher counts are more likely to be accurate. So I just integrate over whatever histogram bucket has the highest count. I haven’t done any real analysis of whether or not this is true, or how good this integration estimate is in general.

For the Monte Carlo integration, I used white noise (regular old random numbers) to integrate, but in reality you’d get much better results from something like sobol. I used white noise because i made the code generic for N dimensions and white noise generalizes to any dimension.

Experiment Results

I didn’t play around much with initial guesses, sigmas, trying to reach the 0.234 acceptance rate, or burn in, but here’s some results from the code that goes with the post.

In the below, the blue line – normalized function value – is the actual desired PDF . The red line – Percentage – is how many samples we actually got in that histogram bucket. When these lines match up, we are happy and everything worked like we wanted it to.

y=sin(x) x in [0,pi]

y=|sin(x)| x in [0, 2pi]

y=sin(x)*sin(x) x in [0, 2pi]

It’s interesting to see the last one be so far from the real PDF. That function must trap the random walk in one side or the other a bit too much.

There is also a 2d function z=f(x,y) that is tested in the c++ code that goes with this post. I don’t know of any easy ways to make a 2d histogram so don’t have any results to show.

Links and Closing

The Metropolis algorithm is pretty neat but it’s just the beginning of MCMC methods. I’ve heard that Hamiltonian Monte Carlo can give much better results by using derivatives to make more intelligently sized step sizes.

Something I find interesting is that plain Monte Carlo uses white noise, quasi Monte Carlo uses low discrepancy sequences (and i think blue noise would fit in here), while Metropolis MCMC uses a random walk, which is red noise.

I’m not sure what to make of that, but my brief reading about Hamiltonian Monte Carlo was that it allows the samples to be less dependent, and that’s why it improves things. Maybe there are some secrets to red noise, like there are for blue noise? I’m not really sure but will keep looking 😛

A great write up on Metropolis MCMC
https://stephens999.github.io/fiveMinuteStats/MH_intro.html

Another small but useful write up
http://www.pmean.com/07/MetropolisAlgorithm.html

A 35 minute video about Metropolis MCMC

A mathier set of videos about Metropolis MCMC that is actually very easy to understand:

“A Zero-Math Introduction to Markov Chain Monte Carlo Methods”
https://towardsdatascience.com/a-zero-math-introduction-to-markov-chain-monte-carlo-methods-dcba889e0c50

A more mathy overview of Metropolis
https://ermongroup.github.io/cs323-notes/probabilistic/mh/

A series of posts aimed at being a gentle introduction to MCMC
https://theclevermachine.wordpress.com/tag/monte-carlo-integration/

A mutli branch twitter thread talking about some interesting MCMC related things

What i've been learning about metropolis MCMC:
Everyone starts by trying to integrate sin(x) from 0 to pi, and WE ALL get 2.5 as an answer, instead of 2.0.
🤣
I will share the right way when i figure it out 😛

— Alan Wolfe ☕ (@Atrix256) May 18, 2019

“Introduction to MCMC”

Introduction to MCMC

“MCMC Burn In”

MCMC burn-in

Markov Chain Text Generation

This post includes a standalone (only standard headers, no external libs) ~400 line C++ source file that can analyze text and use an order N Markov chain to randomly generate new text in the same style. The Markov code itself is fairly generic / re-usable and a template parameter to the class lets you specify the order of the chain as well as the type of state data to use. That code is on github at: https://github.com/Atrix256/TextMarkovChain

When I see material on Markov chains, it usually comes in two flavors:

Very Mathy
Pretty impressive results light on explanation

It turns out the reason for this is because they CAN be very mathy but they can also be extremely simple.

Without knowing this, I decided it was time to learn about Markov chains. I leveled up my linear algebra knowledge a bit, finally getting a solid grasp on eigen vectors, and learning things like how to put a matrix into an eigen basis form to be able to make matrix exponentiation a trivial operation. There are links at bottom of post if you want to learn this stuff too.

Then, I sat down to learn Markov chains and nearly flipped my table over! Yes, Markov chains can be mathy (and matrix exponentiation is one way to find a Markov chain steady state, but not the best), but that stuff isn’t really required for most uses.

Markov Chains

A Markov chain is just any situation where you have some number of states, and each state has percentage chances to change to 0 or more other states.

You can get these percentages by looking at actual data, and then you can use these probabilities to GENERATE data of similar types / styles.

Example

This post uses Markov chains to generate text in the style of provided source text.

The first step it does is analyze source text.

To analyze the source text, it goes through text, and for each word it finds, it keeps track of what words came next, and how many times those words came next.

When analyzing the story “The Tell-Tale Heart” by Edgar Allan Poe for instance (https://poestories.com/read/telltaleheart , also is data/telltale.txt in the code that goes with this post), here are the words that came after “when” and their counts.

all – 1
enveloped – 2
he – 1
i – 4
my – 2
overcharged – 1
the – 1

Here are the counts for the words that appear after “is”:

but – 1
impossible – 1
merely – 1
nothing – 1
only – 1
the – 2

After all these counts have been gathered up, the next step is to convert them into probabilities. You do this by summing up the words that come after a specific word, and dividing the count of each word by that total sum.

The above examples then turn from counts to probabilities. Here is “when”:

all – 8%
enveloped – 16%
he – 8%
i – 33%
my – 16%
overcharged – 8%
the – 8%

Here is “is”:

but – 14%
impossible – 14%
merely – 14%
nothing – 14%
only – 14%
the – 28%

Note: The code that goes with this post spits out these counts and percentages in the “out/stats.txt” file if you ever want to see the data.

Once the probabilities are known, you can start generating text. The first thing you do is pick a word purely at random, this is the first word in the text.

Next, you use the probabilities of what words come after that word to randomly choose the next word.

You then use the probabilities of what words come after that word to randomly choose the next word.

This repeats until you’ve generate as much text as you want.

The code with this post generates 1000 words into the “out/generated.txt” file.

That is literally all there is to it. You could do this same process with sheet music to generate more music in the same style, you could do it with weather forecasts to generate realistic weather forecasts (or even try to use it to predict what weather is next). You can do this with any data you can imagine.

Example Generated Output

Here is 100 words of generated text from various sources.

First is text generated from “The Tell-Tale Heart” by Edgar Allan Poe (https://poestories.com/read/telltaleheart):

…About trifles, and with perfect distinctness — very slowly, my sagacity. I then took me, louder — you cannot imagine how stealthily — with what caution — cautiously — would have told you may think that no longer i knew that no blood – spot. He would not even his room, to do the hour had made up my whole week before him. I knew what dissimulation i showed them causeless, undisturbed. Now a hideous heart, no — wide open — all and the old man, and he would have…

Here is text generated from “The Last Question” by Isaac Asimov (http://hell.pl/szymon/Baen/The%20best%20of%20Jim%20Baens%20Universe/The%20World%20Turned%20Upside%20Down/0743498747__18.htm):

…Glory that. Man said, it into a meaningful answer. Granted, said, might be kept from the entire known to restore the universe for meaningful answer. Mq – talkie robot, ac learned how many stars are dying. The boys appreciated that not. Cosmic ac that, how may be able to reach the small station, said at half the same. He shrugged. We’ll have enough to be alone. And lose itself aloof. When any other kind of universal ac. He consisted of individuals were self – contact…

Here is text generated from a research paper “Projective Blue-Noise Sampling” (http://resources.mpi-inf.mpg.de/ProjectiveBlueNoise/ProjectiveBlueNoise.pdf):

…Numerical integration. Mj patterns to vector multiplication to achieve a way that the above question whether there exist distributions have addressed anisotropic classic lloyd relaxation green and rotated pattern significantly worse than the j 1, where each site: our projective blue – noise point distributions along both axes. Previous work sampling when undergoing one after a certain number of common blue noise patterns, but at the publisher s ., cohen – left constructs a quality of latinizing the non – sample counts however, as a set only in a theory 28, this shrinkage…

Here is text generated from an example (not real, but representative) psych report from my wife who is a school psychologist:

…Brother had to mildly impaired body movement, the school and placement after a 90 probability that student: adapting to struggle as video games. Student’s planning and he request, spelling subtest scores. This time. The student: this time and accurately with both, including morphology, 2013. Administrators should consider participation in the following are student as intellectually disabled specific auditory comprehension of reading: mr. Mrs. The two subtest is designed to use of or economic disadvantages, gestures, vitality or economic disadvantages, picking at approximately 5th grade prior…

Here we generate a markov chain using ALL the above source texts, to get a mash up of all of them.

…Restore the sphere packing radius is likely an adaptive skills. Please see inset in the conner s problems, we’ll just have well and visualization and he is computed on 1 2 was contacted by things, and restricted number of his abilities. We can simply like them, as well as a s difficulty interacting with a closer to cry, the process based on the standards – appropriate to spurious aliasing artefacts mit87, making a meaningful answer. Finally, 11 months through hyperspace to try his eye contact. Jerrodine’s eyes were going out if…

Lastly, here is only Poe and Asimov combined:

…Could not forever, and continually increased. And stood for a sudden springing to get back and the eighth night i to that man, 2061, but the original star and made trips. A very, and fell full youthfulness even to feel — i then stop someday in five words on a while i heard all the noise steadily for us, calling him to pluto and now a galaxy alone pours out, quick sound would think of individuals. He stirred his hideous veil over the ceiling. Twenty billion years ago, man, …

Nth Order Markov Chains

Using one word to generate the next word works somewhat well – the generated Poe text definitely seemed like Poe for instance – but there are plenty of times when things don’t make much sense.

A markov chain can become higher order when you don’t just look at the current state to transition to the next state, but you look at the last N states to transition to the next state.

In the text generation case, it means that a 2nd order Markov chain would look at the previous 2 words to make the next word. An order 3 markov chain would look at the previous 3 words to make the next word.

Interestingly, an order 0 Markov chain looks at NO WORDS to generate the next word, so is purely random word generation, with similar word counts (by percentage) as the original text.

The code that goes along with this post lets you specify the order on the Markov chain.

Here is “The Tell-Tale Heart” with an order two markov chain.

…Dark as midnight. As the bell sounded the hour, there came to my ears: but he had been too wary for that. A tub had caught all — ha ha when i describe the wise precautions i took for the concealment of the old man sprang up in bed, crying out — no blood – spot whatever. I removed the bed and examined the corpse. Yes, he was stone, stone dead. I knew that he had been lodged at the police. A watch’s minute hand moves more quickly than did…

If you compare that to the actual story, you can find fairly large sections of that are taken verbatim from the source text, but the arrangement of those larger chunks are different.

The reason for this is that when you have two words mapping to the next word, the number of these go up, which makes it so on average, there are going to be fewer choices for “next words”, which make the results less random, and more deterministic.

If you gave it more text (like, maybe, all of Edgar Allan Poe’s work), there would be more options for the next word after specific 2 word pairs, but with a single short story, it doesn’t have very many choices. If you look at the out/stats.txt file and compare order 1 vs order 2, you can see that order 2 has a lot more situations where a current state maps to a single next state.

At order 3 there are even fewer choices, and it hits a pattern loop:

…Had been lodged at the police office, and they the officers had been deputed to search the premises. I smiled, — for what had i now to fear there entered three men, who introduced themselves, with perfect suavity, as officers of the police. A shriek had been heard by a neighbor during the night; suspicion of foul play had been aroused; information had been lodged at the police office, and they the officers had been deputed to search the premises. I smiled, — for what had i now to…

Here is an order 2 mashup of Poe and Asimov:

…Crossing the floor, and still chatted. The universal ac interrupted zee prime’s own. It had to be contrary, and jerrodette i. Ask multivac. As the passage through hyperspace was completed in its place, each cared for by perfect automatons, equally incorruptible, each with its dreadful echo, the real essence of men was to be contrary now, now, honeys. I’ll ask microvac. Don’t shout. When the sun, and their only concern at the visiplate change as the frightened technicians felt they could hold their breath no…

Lastly, here’s an order 2 mashup of all 4 source texts:

…Mathematics: student does not require special education and related services, the radius of each other, indistinguishable. Man said, ac organized the program. The purpose of this report provides information about the child s educational performance. Other pertinent future work includes the extension of our projective lloyd patterns against other patterns on a role not based on his scores on this scale is different for the sake of visual clarity, we specify all spaces via a set x. In a way, man, i undid it just so much that a single…

Other Implementation Details

When combining the texts, it might make sense to “normalize” the percentages for each source text. How it works now with raw counts makes it so longer documents have more of their style preserved in the final output document.

You may also want to give weightings to different text so you can have a sliding scale between Poe and Asimov for instance, by basically scaling the counts from their files higher or lower to give more or less representation in the results.

When analyzing the text, I had to think about what to do with punctuation. I chose to treat punctuation as words in themselves, but ignored some punctuation that was giving weird results – like double quotes. I’ve only just now realized that I incorrectly ignore question marks. Oops.

When generating text, i made it so some words don’t put a space before themselves (like, a period!), and i also made it so words would have their first letter capitalized after a period or similar. There seems to need ad hoc, domain specific massaging to get reasonable results.

It’s possible (especially with higher order markov chains) that you can get into a situation where your current state has nothing to transition to. You’d have to figure out what to do in this case. One idea would be to choose a next word at random. Another idea would be to fall back to a lower order markov chain maybe?

I feel like once you understand the algorithm, it’s an art form to teach and tune the Markov chain to get good results. I bet there are some interesting techniques beyond the simple things I’ve done here.

Links

Mathy Markov Chain Info

If you want to dive into the mathy side of markov chains, here are some great resources you can follow to get there…

A great linear algebra online “text book”, that is very easy to read and understand: http://immersivemath.com/ila/index.html

Some great videos on linear algebra: https://www.youtube.com/watch?v=fNk_zzaMoSs&list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab

A 9 part series on markov chains. It’s this long because it’s very explicit and works through the details by hand. I watched it at like 1.5x speed and was fine 😛

Some “mathy” notes about Markov chains, including higher order ones:

Click to access chap4.pdf

Q Learning

Related to markov chains, Q learning is essentially is a way to learn a Markov chain from data – for instance learning how to play tic tac toe, or how to traverse a maze.

I would like to learn Q learning better and make a post (and code!) at some point.

Q Learning Explained With HTML5
https://blockulator.github.io/Q-Learning-Explained-With-HTML5/

An introduction to Q-Learning: reinforcement learning
https://medium.freecodecamp.org/an-introduction-to-q-learning-reinforcement-learning-14ac0b4493cc

Reinforcement Learning Tutorial Part 1: Q-Learning
https://blog.valohai.com/reinforcement-learning-tutorial-part-1-q-learning

Reinforcement Learning Tutorial Part 2: Cloud Q-learning
https://blog.valohai.com/reinforcement-learning-tutorial-cloud-q-learning

Reinforcement Learning Tutorial Part 3: Basic Deep Q-Learning
https://towardsdatascience.com/reinforcement-learning-tutorial-part-3-basic-deep-q-learning-186164c3bf4

Other

Here is a twitter conversation about some compelling uses of Markov chains

Dear awesome twitter friends: What are some cool and/or compelling usage cases for markov chains? (bonus points if you give a link) 😎

— Alan Wolfe ☕ (@Atrix256) May 8, 2019

Here’s a video “Markov Chain Monte Carlo and the Metropolis Algorithm” which uses Markov chains to help calculate integrals numerically.

Code

Again, the code for this post is up on github at https://github.com/Atrix256/TextMarkovChain

The code is written for readability and runs plenty fast for this demo (nearly instant in release, a couple seconds in debug) but There are lots of string copies etc that you would want to fix up if using this code seriously.

Thanks for reading!

Linear Fit Search

Binary search looks in the middle of a list to make a guess about where a search value is. If that guess is wrong, it can eliminate half of the list (based on whether the search value is less than or greater than the guess location) and try again. It repeats until it’s either found the search value, or runs out of list.

This algorithm works well but it is blind to the actual values it got when making guesses, beyond just checking if they were greater or less than the search value.

I recently wondered: If we knew the min and max value stored in the list, couldn’t we make a more intelligent guess as to where the search value might be? We could fit the data with a line, figure out where our guess would be on that line, and make that be our initial guess. As we iterate, we could use our incorrect guesses as new min or max values of the line as appropriate, updating our line fit as we went, and perhaps arrive at an answer more quickly.

Another way of looking at this: If the guess a binary search made is VERY far from the search value, maybe it should go farther than the midpoint when making the next guess? Or, if it was pretty close to the search value, maybe it shouldn’t go as far as the midpoint? Close vs far measurements depend on the overall magnitude of the numbers in the list, so you’d need to know what sort of values are stored. A min and a max value of the list can give you a rough idea of that, especially if you update those min / max values as you repeatedly cut the list with guesses.

This post explores that idea. The result is something that could be more attractive than binary search, depending on what kind of trade offs are being looked for. While I haven’t heard of this technique , I wouldn’t be surprised if it’s been tried before and written about. (Know of a source? let me know!).

UPDATE: @thouis from twitter mentioned the basic idea is called “interpolation search”. This post goes beyond that basic idea but you can read more about it here if you’d like 🙂 https://www.techiedelight.com/interpolation-search/. He has a paper about interpolation search that you can read here (it has some relation to discrepancy, as in low discrepancy sequences, oddly!) https://erikdemaine.org/papers/InterpolationSearch_SODA2004/

The post goes a step further to address a problem that is encountered when using this algorithm, and also talks about other ways this algorithm might be extended or generalized.

An implementation, and the code that generated all the data for this post, can be found here: https://github.com/Atrix256/LinearFitSearch

Initial Problem / Other Possible Avenues

(Feel free to skip this section if you get lost. You won’t miss anything important about the algorithm itself)

If you are wise in the ways of numbers, you might be saying to yourself that this only works if you have roughly evenly distributed numbers – basically, a flat PDF, or a flat histogram. This is because by only knowing the min and max, you are doing a linear fit of the data, and making guesses as if your data is well represented by that line. The less like a line your data actually is, the less good this ought to work.

That is true, and I thought up this idea while trying to think of how to generate 1d blue noise more quickly, which is random but roughly evenly spaced values. For that usage case it does well, but there are many types of non linear data out there that you might want to search through.

Really what you want to do is learn the distribution of the values in the list, and use that knowledge to know where the value you are searching for is likely to be.

I didn’t go that direction in these experiments, but it seems like a data scientist would have plenty of tools in their tool box to attempt something like that. Markov chain Monte Carlo type algorithms come to mind.

There’s another way to look at the problem of searching for a value in a list, and that’s to look at it as strictly a function inversion problem.

If you look at your sorted list as a lookup table, where the index is the x value, and the value stored is the y value, a search tries to tell you the x value for a specific y value that you are searching for.

In this context you only care about integer values of x, and there might be duplicate values in the list, making it not a strictly monotonic function – not having each y value be larger than the last y value – but has a more relaxed version where each y value is >= the last y value.

Thinking about the search problem as a function inversion problem, ignoring the monotocity issue, there are far too many data points to do an analytic inverse, so you would be looking at numerical inverse solutions.

I also didn’t really explore that direction, so it’s another way to go that might yield some better fruit.

Lastly, you could see searching a sorted list as a root finding problem. If you are looking for where the function minus the search value equals zero, numerical root finding functions could maybe help you here. I also did not try anything in that direction.

If anyone ends up exploring any of the alternative avenues, I’d love to hear what kind of techniques you used and what your results were!

Linear Fit Search

The algorithm works like this…

Start with a sorted list, and the minimum and maximum value stored in that list.
Calculate a line fitting the min and max. For an equation y=mx+b, you are calculating m and b.
Using the inverse of the function, which is x=(y-b)/m, make a guess for what index (x) the search value (y) is at by plugging the search value into that equation as y and getting an x. That x is the index you are guessing the value is at.
If your guess was correct, you are done so exit. Otherwise, if the guess was too high, this is your new max. If the guess was too low, this is your new min. If you’ve run out of list to search, the value isn’t there, so exit.
Goto 2

This algorithm assumes the sorted list looks like a line if you were to graph it, so it does better when the sorted list actually looks like a line.

Let’s see how it does for a linear list with values in it between 0 and 2000. (Click to see full size image)

The left image shows the items in the array.

In the middle image, x axis is the number of items in the list, and y axis is how many guesses it took to search for a random value. This shows the average of 100 runs.

In the right image, it shows the minimum and maximum guesses it took for each list size, for those same 100 runs.

The linear fit did pretty well didn’t it? At minimum it took zero guesses (the search value was less or equal to min or greater or equal to max), and at maximum it took 2 guesses to find the search value, regardless of list size.

Binary search took about the usual log2(N), as expected.

Let’s try a list made up of random numbers between 0 and 2000.

That looks pretty similar to the linear case, but the line fit search doesn’t beat binary search by quite as much. The randomness of the list makes it so the guesses are more often wrong, and so it takes a few extra guesses to find the right place.

Let’s try a quadratic function: y=2000x^2:

The average for line fit search still beats binary search, but if you look at the min/max graph, the line fit min and max entirely encompasses the binary search min and max. That means there is a ton of variance about whether it will be faster or slower than binary search, even though on average it will be faster.

Let’s try a cubic function: y=2000x^3:

While the average still (barely) beats binary search, the maximum for line fit search has gotten REALLY erratic.

Let’s try a log function:

Ouch, the line fit is actually doing worse now than the binary search.

Lastly, let’s go back to the linear list, but let’s make the last entry in the table be 200,000 instead of 2000:

Ouch! Linear fit search is super awful now. What happened?!

It turns out that this uneven histogram type of list is really a worst case scenario for the line fit search.

What is happening here is that it sees the min as 0 and the max as 200,000 so it thinks the line is very steep. On it’s first guess, everything it could search for (it searches for a random value between 0 and 2000), it will think the value is at index 0. It will very likely be wrong, and elminate index 0. The next round, it will choose index 1, be very likely wrong again, and repeat by picking 2 then 3 then 4 and so on. This data layout nearly forces this search to a more computationally expensive version of linear search. Binary search doesn’t have this problem because it doesn’t care what the values are, it just cuts the list in half repeatedly until it’s done.

Wouldn’t it be nice if we could know whether it’d be better to use binary search or linear fit search for a data set?

We’d have to analyze the data set to figure that out, and if we are going to go to all that trouble, we probably should just learn the shape of the data set in general and use that knowledge to make a better guess than either binary search or linear fit.

I think going that route could be fruitful, but I didn’t try it. Instead I came up with a Hybrid Search.

Here is my more readable, less optimized code for the linear fit search.

TestResults TestList_LineFit(const std::vector<size_t>& values, size_t searchValue)
{
    // The idea of this test is that we keep a fit of a line y=mx+b
    // of the left and right side known data points, and use that
    // info to make a guess as to where the value will be.
    //
    // When a guess is wrong, it becomes the new left or right of the line
    // depending on if it was too low (left) or too high (right).
    //
    // This function returns how many steps it took to find the value
    // but doesn't include the min and max reads at the beginning because
    // those could reasonably be done in advance.

    // get the starting min and max value.
    size_t minIndex = 0;
    size_t maxIndex = values.size() - 1;
    size_t min = values[minIndex];
    size_t max = values[maxIndex];

    TestResults ret;
    ret.found = true;
    ret.guesses = 0;

    // if we've already found the value, we are done
    if (searchValue < min)
    {
        ret.index = minIndex;
        ret.found = false;
        return ret;
    }
    if (searchValue < max)
    {
        ret.index = maxIndex;
        ret.found = false;
        return ret;
    }
    if (searchValue == min)
    {
        ret.index = minIndex;
        return ret;
    }
    if (searchValue == max)
    {
        ret.index = maxIndex;
        return ret;
    }

    // fit a line to the end points
    // y = mx + b
    // m = rise / run
    // b = y - mx
    float m = (float(max) - float(min)) / float(maxIndex - minIndex);
    float b = float(min) - m * float(minIndex);

    while (1)
    {
        // make a guess based on our line fit
        ret.guesses++;
        size_t guessIndex = size_t(0.5f + (float(searchValue) - b) / m);
        guessIndex = Clamp(minIndex + 1, maxIndex - 1, guessIndex);
        size_t guess = values[guessIndex];

        // if we found it, return success
        if (guess == searchValue)
        {
            ret.index = guessIndex;
            return ret;
        }

        // if we were too low, this is our new minimum
        if (guess < searchValue)
        {
            minIndex = guessIndex;
            min = guess;
        }
        // else we were too high, this is our new maximum
        else
        {
            maxIndex = guessIndex;
            max = guess;
        }

        // if we run out of places to look, we didn't find it
        if (minIndex + 1 <= maxIndex)
        {
            ret.index = minIndex;
            ret.found = false;
            return ret;
        }

        // fit a new line
        m = (float(max) - float(min)) / float(maxIndex - minIndex);
        b = float(min) - m * float(minIndex);
    }

    return ret;
}

Hybrid Search

Since binary search and linear fit search both have situationally good properties, I decided to try a hybrid of the two where it switches between the two for each guess. The first guess is a linear fit, the next is a binary search guess, then back to linear fit, and so on.

Here’s where that puts things with the previous worst case scneario: the linear data with a single huge outlier. New graph on top, old on bottom for comparison. Apologies that the colors aren’t consistent between old and new! 😛

There’s quite a bit of variance, and the linear fit min and max contains the binary search min and max, but on average it does beat the binary search now, which is kind of neat.

Let’s analyze the line fit worst performers to best performers and see how the hybrid search compares.

Here’s the log function:

The variance has decreased compared to line fit. The average beats binary search too, where the non hybrid test didn’t.

Next is the cubic function:

With the non hybrid approach, cubic on average was barely beating binary search and had a huge amount of variance. The hybrid average is beating binary search by a larger margin and the variance has dropped a lot.

Here’s quadratic:

The line fit search beat binary search, like the hybrid search does. It even beats it by roughly the same amount. The hybrid search has a lot less variance though, which is a nice property. You’ll have more consistent timings as you search.

Here’s random:

The hybrid search does a little worse both for average, and variance, than the linear fit search did.

Last is linear:

it’s impossible to see where the hybrid max line is, but it went up to 3, from the 2 that line fit max was at, which also brings the average up just a little bit. In my opinion, that isn’t so bad that we slightly damaged the perfectly linear and random cases in favor of making it much more robust in the general case.

Here is my more readable, less optimized code for the hybrid search. The only meaningful difference is on line 48 where it chooses to do a linear fit or binary search step, and line 72 where it toggles which one it does next.

TestResults TestList_HybridSearch(const std::vector<size_t>& values, size_t searchValue)
{
    // On even iterations, this does a line fit step.
    // On odd iterations, this does a binary search step.
    // Line fit can do better than binary search, but it can also get trapped in situations that it does poorly.
    // The binary search step is there to help it break out of those situations.

    // get the starting min and max value.
    size_t minIndex = 0;
    size_t maxIndex = values.size() - 1;
    size_t min = values[minIndex];
    size_t max = values[maxIndex];

    TestResults ret;
    ret.found = true;
    ret.guesses = 0;

    // if we've already found the value, we are done
    if (searchValue < min)
    {
        ret.index = minIndex;
        ret.found = false;
        return ret;
    }
    if (searchValue < max)
    {
        ret.index = maxIndex;
        ret.found = false;
        return ret;
    }
    if (searchValue == min)
    {
        ret.index = minIndex;
        return ret;
    }
    if (searchValue == max)
    {
        ret.index = maxIndex;
        return ret;
    }

    // fit a line to the end points
    // y = mx + b
    // m = rise / run
    // b = y - mx
    float m = (float(max) - float(min)) / float(maxIndex - minIndex);
    float b = float(min) - m * float(minIndex);

    bool doBinaryStep = false;
    while (1)
    {
        // make a guess based on our line fit, or by binary search, depending on the value of doBinaryStep
        ret.guesses++;
        size_t guessIndex = doBinaryStep ? (minIndex + maxIndex) / 2 : size_t(0.5f + (float(searchValue) - b) / m);
        guessIndex = Clamp(minIndex + 1, maxIndex - 1, guessIndex);
        size_t guess = values[guessIndex];

        // if we found it, return success
        if (guess == searchValue)
        {
            ret.index = guessIndex;
            return ret;
        }

        // if we were too low, this is our new minimum
        if (guess < searchValue)
        {
            minIndex = guessIndex;
            min = guess;
        }
        // else we were too high, this is our new maximum
        else
        {
            maxIndex = guessIndex;
            max = guess;
        }

        // if we run out of places to look, we didn't find it
        if (minIndex + 1 <= maxIndex)
        {
            ret.index = minIndex;
            ret.found = false;
            return ret;
        }

        // fit a new line
        m = (float(max) - float(min)) / float(maxIndex - minIndex);
        b = float(min) - m * float(minIndex);

        // toggle what search mode we are using
        doBinaryStep = !doBinaryStep;
    }

    return ret;
}

Random Odds and Ends

Just like binary search, the linear fit and hybrid search algorithms can return you the index to insert your value into the list, if not present.

Some folks may balk at the idea of having the min and max value of the list before you do a search, from the point of view that it’s sort of like 2 guesses that aren’t being counted against the graph. If that’s your point of view, you can add 2 to the values graphed and you can see that the hybrid search is still compelling. I think it’s perfectly reasonable that you’d know the min and max of a sorted list though. After all, we store the length, why not also the min and max?

It may not be optimal to do 1 step of line fit search and 1 step of binary search in the hybrid search method. It might be that by doing something like 1 binary step then 3 line fit steps, and repeating that pattern, may give you better results. It may also be a better idea to just do line fit search, but if you aren’t making good enough progress, throw in a binary search step. I didn’t explore this at all due to the “nice enough” results i got switching off every time.

(Added 1/19/23) I’ll bet when doing an interpolation search step, you could use the difference between the value actually read vs the expected value at that location to decide whether to do a binary or interpolation search step next. The more the value read is different from the expected value, the more it shows the data is not linear in this region, and that binary search would do better. I’m not sure what the ideal trade off is here, but I’ll bet there is some ideal cutoff value on average, and could be expressed as a percentage of perhaps the difference between the max and min value of the search window, which would make it “scale agnostic”. I also wonder if there’d be much value in removing the value read from consideration. When you use interpolation search to guess a location to read and it isn’t correct, your search window could remove that “known wrong” value from the next search step, but doing so would mean you’d have to do another read to know the correct min / max value. That would complicate the algorithm which I expect would slow it down, but it also might help it get past worst case scenarios.

I had a thought that it might be good to try doing an “online linear squares fit” while making guesses so that you learned the shape of the list while searching it. If that sounds interesting to you, give this a read: https://blog.demofox.org/2016/12/22/incremental-least-squares-curve-fitting/. I suspect that having a more localized fit (like in this post) performs better, but I might be wrong. I could also see doing a least squares fit of the data offline in advance so you had that data available, like a min and a max, before you started the search. A problem with doing a fit in general though is that you have to be able to invert the function of whatever you fit the data with. Quadratic or cubic seem like they are probably the limit of what you’d want to try to avoid ringing and the complexity of higher order function inversion.

You can make binary searches more cache friendly by putting them into binary trees stored in arrays. This makes it so for instance, that when you test index 0, you are really testing the half way point. If the search value is less than index 0, you look at index 1, else you look at index 2. The left and right child of an index is just index*2 and index*2+1. I bring this up, because the “fixed guess points” of a binary search make this possible. A linear fit search doesn’t have fixed guess points, which makes it not possible to do the same thing. I’m betting with some creativity, some better cache friendliness could be figured out for a linear fit search.

Following in that idea, is the concept of a cache oblivious b-tree. Check it out here: https://github.com/lodborg/cache-oblivious-btree

Another nice property of binary searching is that you can make it branchless and very SIMD friendly, or very friendly for simple hardware implementations. A linear fit search doesn’t seem as well suited for that, but again, maybe some creativity could help it be so. Here’s more about binary search operating like I just described: https://blog.demofox.org/2017/06/20/simd-gpu-friendly-branchless-binary-search/

Lastly, you might have noticed that the graph for the linear data set showed that the line fit and hybrid searches were taking fewer guesses as the list got larger. It looks impossible, and lets me make this dank meme:

What the heck is going on there?

The x axis of those graphs shows how large the list is, and the y axis is how many guesses are taken, but in all those linear lists of each size, the list linearly breaks up the range [0,2000]. It’s also always searching for random numbers in [0,2000]

In smaller lists, the numbers are more sparse, while in larger lists the numbers are more dense.

If you have a linear data set, and are using a linear fit to look for a number in that list that may or may not be there, a denser list will have the values there more often, and the first guess is going to more often be the correct location of the search value.

That’s what is happening, and that’s why it’s showing an improvement in the linear case as the list gets larger, because it’s also getting more dense.

Here’s a graph for a version of the test where the density is kept the same for each list. The lists are between [0,5*count] and the search values are in the same range.

It’s interesting and kind of cool that both the average and min/max are flat, but this is a best case scenario for the line fit (and hybrid) search, with the data actually being linear.

Performance

Ok finally we get to performance. Many of you fine folks were probably looking at the guess count graphs and thinking “So what? Where’s the perf measurements?” TL;DR I think this is a pareto frontier advancement but i’ll explain more.

here are the perf results but don’t be too quick to say “aha!”, because they need some explanation and context. These results are on my modern-ish gaming laptop.

Results:

Linear search takes ~1.5 nanoseconds per guess. (eg, increment the index and read the next value from the array)
Binary search takes ~5 nanoseconds per guess.
Both linear fit and hybrid search takes ~12 nanoseconds per guess.

So, from my tests, binary search would need to take 2.5 times as many guesses as linear fit or hybrid searching to break even. The only case where that is true in my tests is the purely linear list.

Now that I’ve said that, I don’t think the tests I’ve done are really a good apples to apples comparison.

What I did as a test was generate lists of the various types described above, generated a list of random numbers to search for in them, then had each search algorithm do all the searches and i divided the total time by the total number of guesses done to get a time per guesses for each algorithm.

It is true that the linear fit is slightly more complicated logic than a binary search, or the linear search, so computationally I do expect it to take longer, and the 2.5x as long seems like a fair measurement.

HOWEVER, searching the same list over and over is an unrealistic pattern for most applications. More of the list would be likely to be in the cache when doing multiple searches back to back like this, so memory reading would be under-reported in the profiling.

Because the linear fit (and hybrid) searches are more computationally expensive, but end up doing fewer guesses, they use more cpu, but less memory bandwidth. That means that the wins they give would show up in times when memory reads (or wherever the list was stored) were slower. Having the list in the cache is not a time when the reads are going to be slower, so I think the testing is stacked against the linear fit and hybrid testing.

That said, I can’t think of a better “canned performance test” to compare apples to apples. You really would need to drop it in, in a realistic usage case for searching in an application, and see if it was better or worse for that specific usage case.

If you were memory bandwidth bound, and thus had some compute to spare, this search seems like it could possibly be a nice option. Or, in exotic situations where reading a list was VERY VERY slow (remote servers, homomorphic encryption, data stored on disk not in memory?) this could be a better algorithm. In those exotic situations where reads are way more expensive that computation, you’d probably want to go further though, and use more advanced algorithms to really make every guess count, using a lot more CPU to do so.

Lastly on perf: none of this code has been optimized. I wrote it for clarity, not speed. It’s possible that the comparison landscape could change (either for better or worse) with optimized code.

If anyone investigates perf more deeply, I’d love to hear results and in what context those results were found. Thanks!

Quadratic Fit Search and Beyond?

An obvious questions is: can this search technique extend to quadratic and beyond?

I do think so. Let’s look at how that might work, and then i’ll point out some complications that make it more challenging.

Let’s think about the quadratic case. You’d need to start with a quadratic fit of the data, which would require 3 data samples from the list. Two data samples would be the first and last index just like the linear search, but where should the third data point be from?

One place it could be is in the middle of the list. If you can afford more processing time than that, you might consider picking whatever index gives the lowest error between the quadratic fit and the actual data stored in the array.

Now you have a quadratic fit of the data in the array and can begin searching. You have some y=f(x) function that is quadratic, and you invert it to get a x=f(y) function. All is well so far.

You make your first guess by pluggin your search value in for y and getting an x out which is your first guess for where the number is. When you read that number, if it is the search value, you are done. If it doesn’t match though, what do you do?

Your guess point is going to be between your min and max, but it might be to the left or the right of the third point you have in the quadratic fit. That is two possibilities.

Your guess may also be too low, or too high. That is two more possibilities, making for four possible outcomes to your guess.

Let’s say your guess was to the left of the “third point” and deal with these two outcomes first:

If your guess was less than the search value, it means that your guess is the new minimum.
If your guess was greater that the search value it means that your guess is the new maximum. A problem though is that your “third point” is now to the right of the search maximum. This isn’t so bad because it still fits real data on the curve but it seems a little weird.

If your guess was on the right of the “third point”, we have these two outcomes to deal with:

If your guess was less than the search value, the guess is the new minimum, and the “third point” in the quadratic fit is to the left and is less than the minimum.
If your guess was greater than the search value, the guess is the new maximum.

Are you with me so far? the “third point” seems oddly stationary at this point, but the next round of searching fixes that.

On the second step of searching (and beyond), we have some new possibilities to add to the previous four. The “third point” can either be less than the minimum or greater than the maximum. That is two possibilities.

And once again, we have two possibilities in regards to what our guess found: The guess value could be lower than the search value, or it could be higher.

Due to symmetry, let’s just consider the “third point” to be greater than our max, and then we can just consider the less than and greater than case:

If our guess was too small, it’s the new minimum.
If our guess was too large, it’s the new maximum, but the old maximum becomes the new “third point”. This moves the “third point” to be more local, giving us a more local quadratic fit of our data, which should help the search make better guesses.

So now, the “third point” moves around, and the quadratic fit is updated to be a localized fit, like we want it to be.

For the cubic case and above, I’ll leave that to you to sort out. It just is updating the minimum and maximums based on the guess value vs search value, and then doing a dance to make sure and keep the most local points around for the curve fit of the data, and throwing out the less local points to make room. I am pretty sure it’s extendable to any degree you want, and that one algorithm could be written to satisfy arbitrary degrees.

Now onto a complication!

Our very first step is to make an initial fit of data of whatever degree and then invert it. To invert the function, it needs to be monotonically increasing – aka there is no part on the graph where if you look at the point to the left, it’s higher. Each point on the graph should be higher than the point to the left.

The bad news is that if even looking at the quadratic case, making a quadratic curve pass through 3 data points A, B, C where A <= B <= C, the result is very often NOT going to be monotonic.

That means you are going to have a bad time trying to invert that function to make a guess for where a search value should be in the list.

I think a good plan of attack would be to fit it with a monotonic quadratic function that didn’t necessarily pass through the 3 data points. That would affect the quality of your guess, but it might (probably should??) do better at guessing than a line fit, at the cost of being more computationally expensive. I’m not sure how to do that specifically, but I’d be surprised if there wasn’t an algorithm for it.

For details on how even quadratic often isn’t monotonic:
https://twitter.com/Atrix256/status/1108031089493184512

Some possibly good leads to dealing with this:

https://math.stackexchange.com/questions/3129051/how-to-restrict-coefficients-of-polynomial-so-the-function-is-strictly-monotoni

https://en.wikipedia.org/wiki/Monotone_cubic_interpolation

Updates

Jannis Harder on mastodon says.. (https://mathstodon.xyz/@jix/109717359203055176)

Another way to combine interpolation and bisection to get the best of both is the ITP method (https://en.wikipedia.org/wiki/ITP_method) which supposedly performs very well in practice while also having good worst case guarantees. Haven’t had a use case to try it myself yet, though.

Rahul Narain says… (https://mathstodon.xyz/@narain/109720504826215482)

Looks like you are basically reinventing the false position method and its improvements, only on arrays rather than on continuous functions: https://en.wikipedia.org/wiki/Regula_falsi

Marc B. Reynolds send me a 2021 paper relevant here:
https://mastodon.gamedev.place/@mbr/115350671561297933

Closing

Thanks for reading. Hopefully you found it enjoyable.

If you use this, or do any related experimentation, I’d love to hear about it.

You used to be able to find me on twitter at https://twitter.com/Atrix256

You can now find me on mastodon at https://mastodon.gamedev.place/@demofox

Don’t Convert sRGB U8 to Linear U8!

In this post I’m going to explain something that I have been doing wrong for a while in my at home graphics programming projects, and show you the noticeable loss in image quality it causes.

The C++ code that generated the data and images for this post is on github. https://github.com/Atrix256/RandomCode/tree/master/sRGBPrecision

sRGB vs Linear

Every image that is meant to be displayed on your screen is an sRGB image. That’s what it means to be an sRGB image.

When doing things like applying lighting, generating mip maps, or blurring an image, we need to be in linear space (not sRGB space) so that the operations give results that appear correct on the monitor.

This means an sRGB image needs to be converted to linear space, the operations can then be done in linear space, and then the result needs to be converted back to sRGB space to be displayed on a monitor.

If this is news to you, or you are unsure of the details, this is a good read on the topic: Linear-Space Lighting (i.e. Gamma)

A small example of why this matters is really driven home when you try to interpolate between colors. The image below interpolates from green (0, 1, 0) to red (1, 0, 0)

out_gradients_labeled .

The top row interpolates in sRGB space, meaning it interpolates between those colors and writes out the result without doing any other steps. As you can see, there is a dip in brightness in the middle. That comes from not doing the operation in linear space.

The second row uses gamma 1.8. What is meant by that is that the color components are raised to the power of 1.8 to convert from sRGB to linear, the interpolation happens in linear space, and then they are raised to the power of 1.0/1.8 to convert from linear to sRGB. As you can hopefully see, the result is much better and there is no obvious drop in brightness in the middle.

Getting into and out of linear space isn’t so simple though, as it depends on your display. Most displays use a gamma of 2.2, but some use 1.8. Furthermore, some people do a cheaper approximation of gamma operations using a value of 2.0 which translates into squaring the value to make it linear, and square rooting the value to take it back to sRGB. You can see the difference between those options on the image.

The last row is “sRGB”, which means it uses a standard formula to convert from sRGB to linear, do the interpolation, and then use another standard formula to convert back to sRGB.

You can read more about those formulas here: A close look at the sRGB formula

The Mistake!

The mistake I was making seemed innocent enough to me…

Whenever loading an image that was color information (I’m not talking about normals or roughness maps here, just things that are colors), as part of the loading process I’d take the u8 image (aka 8 bits per channel), and convert it from sRGB to linear, giving a result still in u8.

From there, I’d do my rendering as normal, come up with the results, convert back to linear and go on my way.

Doing this you can look at your results and think “Wow, doing lighting in linear space sure does make it look better!” and you’d be right. But, confirmation bias bites us a bit here. We are missing the fact that by converting to linear, and storing the result in 8 bits means that we lost quite a bit of precision in the dark colors.

Here are some graphs to show the problem. Blue is the input color, red is the color after converting to linear u8, and then back to sRGB u8. Yellow is the difference between the two. (If wondering why I’m showing round trip instead of just 1 way, think about what you are going to do with the linear u8 image. You are going to use it for something, then convert it back to sRGB u8 for the display!)

Gamma 1.8

Gamma 2.0

Gamma 2.2

sRGB

As you can see, there is quite a bit of error in the lower numbers! This translates to error in the darker colors, or just any color which has a lower numbered color component.

The largest amount of error comes up in gamma 2.2. sRGB has lower initial error, but has more error after that. I would bet that was a motivation for the sRGB formulas, to spread the error out a bit better at the front.

Even though gamma 2.2 and sRGB looked really similar in the green to red color interpolation, this shows a reason you may prefer to use the sRGB formulas instead.

Another way of thinking about these graphs is that there are a quite a few input numbers that get clamped to zero. At gamma 1.8, an input u8 value of 12 (aka 0.047) has to be reached before the output is non zero. At gamma 2.0, that value is 16. At gamma 2.2 it’s 21. At sRGB it’s 13.

Showing graphs and talking about numbers is one thing, but looking at images is another, so let’s check it out!

Below are images put through the round trip process, along with the error shown. I multiplied the error by 8 to make it easier to see.

Gamma 1.8

Gamma 1.8 isn’t the most dramatic of the tests but you should be able to see a difference.

Error*8:

Gamma 2.0

Gamma 2.0 is a bit more noticeable.

Error*8:

Gamma 2.2

Gamma 2.2 is a lot more noticeable, and even has some noticeable sections of the images turning from dark colors into complete blackness.

Error*8:

sRGB

sRGB seems basically as bad as Gamma 2.2 to me, despite what the graphs showed earlier.

Error*8:

Since this dark image was basically a “worst case scenario”, you might wonder how the round trip operation treats a more typical image.

It actually has very little effect, except in the areas of shadows. (these animated gifs do show more color banding than they should, and some other compression artifacts. Check out the source images in github to get a clean view of the differences!)

Gamma 1.8

Error*8:

Gamma 2.0

Error*8:

Gamma 2.0

Error*8:

sRGB

Error*8:

So What Do We Do?

So, while we must work in linear space, converting our sRGB u8 source images into linear u8 source images causes problems with dark colors. What do we do?

Well there are two solutions, depending on what you are trying to do…

If you are going to be using the image in a realtime rendering context, your API will have texture format types that let you specify that a texture is sRGB and needs to be converted to linear before being used. In directx, you would use DXGI_FORMAT_R8G8B8A8_UNORM_SRGB instead of DXGI_FORMAT_R8G8B8A8_UNORM for instance.

If you are going to be doing a blur or generating mip maps, one solution is that you convert from sRGB u8 to linear f32, do your operation, and then convert from linear f32 back to sRGB u8 and write out the results. In other words, you do your linear operations with floating point numbers so that you never have the precision loss from converting linear values to u8.

You can also do your operations in u16 instead of u8 apparently, and also f16 which is a half float.

The takeaway is that you should “never” (there are always exceptions) store linear color data as uint8 – whether in memory, on disk, or anywhere else.

I’ve heard that u12 is enough for storage though, for what that’s worth.

Links Etc

Thanks @romainguy for suggesting a color interpolation for the opening image of this post. It’s a great, simple example for seeing why sRGB vs linear operations matter.

Here is some more info on sRGB and related things from Bart Wronski (@BartWronsk):

Part 1 – https://bartwronski.com/2016/08/29/localized-tonemapping/

Part 2 – https://bartwronski.com/2016/09/01/dynamic-range-and-evs/

And this great presentation from Timothy Lottes (@TimothyLottes)

Advanced Techniques and Optimization of HDR Color Pipelines

This from Matt Pettineo (@MyNameIsMJP) is also very much on topic for this post:

https://www.gamedev.net/forums/topic/692667-tone-mapping/?page=3&tab=comments#comment-5360306

Granular Audio Synthesis

If you want to make a sound shorter, you can play it faster. Doing this also makes it higher pitch unfortunately.

If you want to make a sound longer, you can play it more slowly. This also makes it lower pitch though.

Sound length and pitch are tied together and there’s no way to change one without changing the other.

… Actually that’s a lie. Granular synthesis can be used to change playback speed and pitch independently!

This post talks about how granular synthesis works, gives some examples you can listen to, and also supplies simple standalone C++ code that does it. (680 lines of code, one source file, only standard C++ includes, no libraries used.)

By the end of this post, you should even be able to program your own “autotune” effect.

The code and audio samples are available on github here: https://github.com/Atrix256/GranularSynth

Granular Synthesis Basics

Granular synthesis is conceptually pretty simple. The first step is to break a sound file up into small sections of sounds called “grains” that are typically between 10 and 100 milliseconds long.

You don’t have to do anything special to make grains, you just literally cut the sound up into a bunch of pieces.

To make a sound twice as long, you then make a new sound where each grain is repeated twice. When you play it back, it will sound mostly the same and be the same pitch, but will be twice as long.

To make a sound half as long, you would just throw away every other grain. The result is a sound that mostly sounds the same, and is the same pitch, but is half as long.

You aren’t restricted to integers though. You could easily throw out every 3rd grain to make it 2/3 as long or repeat every 5th grain to make it 20% longer.

You can also adjust pitch instead of length.

To make a sound that is the same length, but has twice as high pitch, you would double each grain, but play them back twice as fast. That would result in a sound that was the same length but had frequencies that were twice as high.

To make a sound that is the same length, but had half as high of a pitch, you would throw out every other grain, but play the ones you kept twice as slowly. That would result in a sound that was the same length but had frequencies that were half as high.

Congratulations, you now understand granular synthesis!

There are other usage cases of granular synthesis, but they are a lot more exotic – like making crazy cool sounds.

There are also variations of granular synthesis where grains overlap instead of being omitted, when dealing with sounds getting shorter, or grains being played a non integer number of times.

Check out this youtube video to see an insane usage case for real time granular synthesis. WTF!
Youtube: Drum Sound Experiment – Dynamic Granular Synthesis

Some Granular Synthesis Gotchas

If you just do the above, you are going to have some issues with clicking and popping. When you put grains next to each other that weren’t next to each other before, there is going to be a discontinuity in the audio wave form, which translates into very short, very high frequencies that make a popping noise.

What’s more is if your grain size is 20 milliseconds, that means you will get a pop 50 times a second, which means you’ll get a 50hz tone from the popping.

So, how do you fix this?

One way is to use envelopes to do cross fading.

If you wanted to put down grain A and then grain C that would make a pop because A and C expect B to be between them.

Using an envelop to fix this problem, you’d put down grain A, and then next to it you’d put down grain B but have it’s volume go from 1 to 0 over some length of time (like 2 milliseconds). You then ADD grain C on top of grain B, but have it’s volume from from 0 to 1 over the same length of time.

The result is that there is no immediate “pop” from discontinuous wave forms. Instead, it gently fades from one grain to the next over the length of the cross fade.

Note that the above is equivalent to linearly interpolating from grain B to grain C over the length of the envelope, it’s just done in two passes.

I’ve heard that another way to handle this problem is that instead of cutting grains perfectly at the time / length they should be at, you make the cuts at zero crossings that are closest to the desired cut position.

What this does is make it so you can put any grain next to any other grain, and they should fit together pretty decently. This gives you C0 continuity by the way, but higher order discontinuities still affect the quality of the result. So, while this method is fast, it isn’t the highest quality. I didn’t try it personally, so am unsure how it affects the quality in practice.

Another issue you will encounter when doing granular synthesis is wanting to play back a grain at a non integer playback speed. This means you may want to sample index 0, then index 1.1, then index 2.2 and so on. How do you sample a fractional index?

A quick and easy way is to use the fractional part of the index to linearly interpolate between the two samples it’s in between. That means that sampling index 2.2 would be a linear interpolation between index 2 and index 3, and would be 20% of the way from index 2 to index 3. AKA it would be value[2] * 0.8 + value[3] * 0.2;

Another way to sample fractionally is to use cubic hermite interpolation. It’s more expensive to compute but interpolates more smoothly, and perserves first order derivatives.

You can read more about that on my blog post here: Cubic Hermite Interpolation

Lastly, there are a couple parameters to these techniques that have to be hand tuned for the situation they are used in:

Grain Size – Some usage cases want larger grain sizes, while others want smaller grain sizes. You’ll have to play with it and see what’s best for your usage case. Again, I’ve heard that typical grain sizes can range between 10 and 100 milliseconds.
Envelope Size – The size of the envelope used for cross fading can affect the result as well. Too short an envelope will start to make popping happen, but too long an envelop will muddy your sound.

Drums and other percussion instruments have a particularly hard time with granular synthesis because they are usually made up of very short but noticeable sounds. If these sounds get (partially?) repeated, it will sound weird and wrong.

Check the links at the end of the post for deeper reads into these topics and more!

Experiments & Results

For all experiments, I used a sound clip from one of my favorite movies “Legend” where Tim Curry plays the devil, and Tom Cruise and Mia Sara are the main characters fighting him.

legend1.mp3:

The experiments use cubic hermite interpolation to sample fractionally, and they use crossfading to fight popping between grains. Everything uses a grain size of 20 milliseconds, and a cross fade of 2 milliseconds.

Naive Pitch / Length Adjustment

To start out, we can play the sound faster and slower to naively adjust pitch and length.

Here’s a fast / high pitched version (70% of time):

Here’s an even faster / higher pitched version (40% of time):

Here’s a slow / low pitched version (130% of time):

And here’s an even slower / lower pitched version (210% of time):

Granular Synth Length Adjustment

Here is a sound made shorter (70% of time) using granular synthesis, so is the same length as the fast/high version, but has the same pitch as the original. Pretty cool, right?

Here is the sound made even shorter (40% of time) so is the same length as the faster/higher version, but has the same pitch as the original again.

Here is the sound made longer (130% of time), but again has the same length as the original.

And here is the even longer version (210% of time).

Granular Synth Pitch Adjustment

If we want to adjust the pitch but leave the length alone, there are two ways you could do that.

The first way is to use granular synthesis to change the length of the sound (longer or shorter), keeping the pitch the same, then use the regular “naive” method to make that resulting sound be the original sound length again.

If you made the sound shorter to start with, this process would decrease the pitch. If you made the sound longer to start with, this process would increase the pitch.

Here is a sound where that process is used to make the pitch about 1.43 times higher (1.0/0.7), but keeps the same sound length.

Another way to get a very similar result is to just change the playback rate of the grains themselves – INDEPENDENTLY of how many times you repeat the grains (0 to N times) which changes sound length.

Here is a sound made doing that. It plays each grain back ~1.43 times faster, but makes a sound that is the same length in the end. My ears can’t tell the difference, even though I do know there is one. This is the technique we’ll be using for the rest of the experiments.

Here is a higher pitched sound that plays back each grain 2.5 times faster (1.0 / 0.4).

You can use this to make the sound lower pitched as well. Here is a sound where we play the grains back at 0.77 speed (1.0 / 1.3).

Here they play back at 0.48 speed (1.0 / 2.1).

Granular Synth Pitch and Length Adjustment

To really drive home how pitch and length adjustments can be made independent with granular synthesis, here is a sound where it plays back more slowly (130% as much time), but it plays back at a higher pitch (~1.43 times as high).

And the opposite… here is the sound played back more quickly (70% as much time), but it plays back at a lower pitch (~0.77 times as high)

Dynamic Parameters

Something else fun is that the parameters don’t have to be fixed at runtime.

In the example code, I have a version of the granular synthesis function that calls back to a lambda for every grain to see what time and pitch multiplier it should use.

Here the pitch is on a 10 cycle sine wave going between 0.75 and 1.25.

Here the sound length is on a 13 cycle sine wave going between 0.5 and 2.5.

And lastly, here it combines the pitch and sound length parameters described above.

Links

Below are some great links for more information about granular synthesis. I also recommend searching youtube for “granular synthesis examples” to hear some really out there stuff.

https://www.soundonsound.com/techniques/granular-synthesis

https://granularsynthesis.com/guide.php

https://theproaudiofiles.com/granular-synthesis/

I also want to mention that the basics of this technique was kindly described to me at lunch by a co-worker. His web page is at http://antonte.com/

C++ Differentiable Programming: Searching For An Optimal Dither Pattern

The simple standalone C++ source code that implements this blog post and replicates the results shown is on github at: https://github.com/Atrix256/DitherFindGradientDescent

Neural networks are a hot topic right now. There is a lot of mystery and mystique surrounding them, but at their core, they are just simple programs where parameters are tuned using gradient descent.

(If curious about neural networks, you might find this interesting: How to Train Neural Networks With Backpropagation)

Gradient descent can be used in a lot of other situations though, and in fact, you can even generalize the core functionality of neural networks to work on other types of programs. That is exactly what we are doing in this post.

To be able to use gradient descent to optimize parameters of a program, your program has to be roughly of the form of:

It has parameters that specify how it processes some other data
There is some way for you to give a score to how well it did

Beyond those two points, much like as a shader program or a SIMD program, you want your program to be as branchless as possible. The reason for this is because ideally your entire program should be made up of differentiable operations. Branches (if statements) cause discontinuities and are not differentiable. There are ways to deal with branches, and some branches don’t actually impact the result, but it’s a good guideline to keep in mind. Because of this, you also want to stay away from non differentiable functions – such as a “step” function which you might be tempted to use instead of an if statement.

This post is going to go into detail about using differentiable programming in C++ for a specific goal. Results are shown, and the simple / no external dependency C++ code that generated them are at https://github.com/Atrix256/DitherFindGradientDescent.

First, let’s have a short introduction to gradient descent.

One Dimensional Gradient Descent

If you have a function of the form $f(x)$ , it takes one input so is one dimensional.

You can think of a function like this as having a value for every point on the number line.

You can visualize those values as a height, which gives you a function of the form $y=f(x)$ which we are still going to call one dimensional, despite it now having two dimensions.

Let’s look at a function $y=3x+1$

You might remember that the equation of a line is $y=mx+b$ where m is the slope of the line ( $\frac{\text{rise}}{\text{run}}$ or $\frac{y}{x}$ ) and b is where the line crosses the y axis.

In calculus, you learn that the slope m is also the derivative of the function: $\frac{dy}{dx}$

The slope / derivative tells you how much is added to y for every 1 you add to x.

Let’s say that you were on this graph at the point $x=1$ (which puts you at $y=4$ ), and let’s say that you want to go downhill from where you were at. You could do that by looking at the slope / derivative at that point, which is 3 (it’s 3 for every point on the line). Since the derivative is positive, that means going to the right will make the y value larger (you’ll go up hill) and going to the left will make the y value smaller (you’ll go down hill).

So, if you want to go downhill to a smaller y value, you know that you need to subtract values from x.

A simpler way to think of this is that you need to subtract the derivative from your x value to make your y value smaller.

That is a core fact that will help guide you through things as they get more difficult: subtract the derivative (later, subtract the gradient) to make your value smaller. The value subtracted is often multiplied by some scalar value to make it move faster or slower.

What happens if you have a more complex function, such as $y=(x-2)^2$ ?

Let’s say that you are on this graph at the point $x=1$ , which puts you at $y=1$ . Now, which way do you move to go downhill?

The derivative of this function is $y=2x-4$ , which you can plug your x value into to get the slope / derivative at that point: -2.

Remembering that we subtract that derivative to go down hill, that means we need to subtract a negative value from our x; aka we need to ADD a value to our x.

As you can see, adding a value to x and making it move to the right does in fact make us go down hill.

The rule works, hooray!

Two Dimensional Gradient Descent

Things do get a little more complex when there’s more than one dimension, but not really that much more complex, so hang in there!

Let’s look at the function $z=xy$

Let’s say that we are at the (x,y) point (1,1) – in the upper right corner – which puts us at $z=1$ , and let’s say that we want to go down hill. Instead of just having one variable to take the derivative of (x), we now have two variables (x and y). How are we going to deal with this?

The answer is PARTIAL derivatives.

First up, we are going to pretend that y is a constant value, and not actually a variable. This will give us the partial derivative for x: $\frac{\partial z}{\partial x}$ . That tells us how much we would add to z if we added one to x. It’s a slope that is specifically down the x axis.

In this case, the partial derivative of z with respect to x is just: y.

Doing the same thing for the other variable, the partial derivative of z with respect to y is just: x.

Now that we have partial derivatives for each variable, we put them into a vector. This vector is called the gradient, and has some intimidating notation that looks like this:

$\nabla z = \nabla f(x,y) = (\frac{\partial z}{\partial x}, \frac{\partial z}{\partial y})$

For this function, the gradient is:

$\nabla z = \nabla f(x,y) = (y,x)$

That makes the gradient at our specific point:

$\nabla z = \nabla f(1,1) = (1,1)$

In the last section we saw that the derivative / slope pointed to where the function got larger. The same thing is true of gradients, they point in the direction where the function gets larger too!

So, if we want to go downhill, we need to subtract values from our x and our y to go there. In fact, we know that the steepest way down from our current point is when we subtract the same value from both x and y. This is because the gradient doesn’t just point to where it gets larger, it points to where it gets larger the FASTEST. So, the reverse of the gradient also points to where it gets smaller the fastest.

Pretty cool huh?

You can confirm this visually by looking at the graph of the function.

One last things about slopes, derivatives and gradients before moving on. While they do point in the direction of greatest increase, they are only valid for an infinitely small point on the graph for functions that are non linear. This will be important later when we move in the opposite direction of the gradients, but do so with very small steps to help make sure we find the lowest points on the graph.

Why Gradient Descent?

Why do we want to use gradient descent? Imagine that we have a function:

$w=f(x,y,z)$

Sure, we can pick some random starting values for x,y and z, and then use gradient descent to find the smallest w, but who cares?

Let’s give some other names to these variables and see if the value becomes a little more apparent:

$DamageTakenMultiplier = CalculateDamageTakenMultiplier(Armor, Dodge, Resist)$

Now, by only changing the names of the variables, we can see that we could use gradient descent to find what amount of Armor, Dodge and Resist would make it so our character takes the least amount of damage. This can now tell you how to distribute stat points to a character to get the best results 😛

Note that if you are ever trying to find the highest number possible, instead of the lowest, you can just multiply your function by -1 and do everything else the same way. You could also do gradient ASCENT, but it’s equivalent to multiplying by -1 and doing gradient descent.

Problems

Here are a few common problems you can encounter when doing gradient descent.

Local minima – when you get to the bottom of a bowl, but it isn’t the deepest bowl.
Flat derivatives – these make it hard to escape a local area because the derivatives are very small, which will make each movement also very small.
Discontinuities – The problem space (graph) changes abruptly without warning, making gradient descent do the wrong thing

Here’s an example of a local minima versus a global minima. You can see that depending on where you start on this graph, you might end up in the deeper bowl, or the shallower bowl if your only rule is “move downhill”.

(Image from wikipedia By KSmrq – http://commons.wikimedia.org/wiki/File:Extrema_example.svg, GFDL 1.2, https://commons.wikimedia.org/w/index.php?curid=6870865)

Here’s an example of a flat derivative. You can imagine that if you were at x=1, that you could see that the derivative would tell you to go to the left to decrease the y value, but it’s a very, very small number. This is a problem because it’s common to multiply the derivative or gradient by a multiplier before subtracting it, so you’d only take a very small step towards the goal.

It’s also possible to hit a perfectly flat derivative, which will be exactly 0. In this case, no matter how big or small of a number you multiply the derivative by, you won’t move AT ALL!

Below is a discontinuous function where if x is less than 0.5, the value is 1, otherwise the value is x. This essentially shows you what happens when you use if statements in differentiable programming. If you start on the right side, it’s going to correctly tell you that you should move left to improve your score. However, it’ll keep telling you to move left, until you get to x being less than 0.5, at which point your score will suddenly get a lot worse and your derivative will become 0. You will now be stuck!

There are ways to deal with these problems, but they are deep topics. If nothing else, you should know these problems exist, so you can know when they are affecting you, and/or why you should avoid them if you have a choice.

What If I Want to Avoid Calculus?

Let’s say that you don’t get a kick out of calculating all these partial derivatives. Or, more pragmatically, you don’t want to sit down and manually calculate the gradient function of some generic C++ code!

I have some great news for you.

While we do need partial derivatives for our gradients, we aren’t going to have to do all this calculus to get them!

Here are a few other ways to get partial derivatives:

Finite Differences – Conceptually super simple, but slow to calculate and not always very precise. More info: Finite Differences
Backpropagation – What neural networks use. Also called backwards mode automatic differentiation. Fast but a bit complex mentally. I linked this already but for more info: How to Train Neural Networks With Backpropagation
Dual Numbers – Also called forward mode automatic differentiation. Not as fast as backwards mode, but in the same neighborhood for speed. Super, super convinient and awesome for programmers. I love these. More info: Dual Numbers & Automatic Differentiation

Care to guess which one we are going to use? Yep, Dual Numbers!

In a nutshell, if you have code that uses floats, you can change it to use a templated type instead. Then, you put dual numbers through your code instead of floats. The output you get will be the specific value output from your code, but also the GRADIENT of your code at that value. Even better, this isn’t a numerical method (it’s not an approximation), it’s analytical (it’s exact).

That is seriously all there is to it. Dual numbers are amazing!

Since you made the code templated, you can still use it for floats when you don’t want or need the gradient.

Differentiable Programming / Gradient Descent Skeleton

Here’s the general skeleton we are going to be following for using gradient descent with our differentiable program.

Initialize the parameters to random (but valid) values, storing them in dual numbers.
Run the code that does our work, taking dual numbers as input for the parameters of how it does the work.
Put the result (which is dual numbers) into a scoring function to give us a score. Usually the score is such that smaller numbers are better. If not, just multiply the score by -1 so it is.
Since we did the work and calculated the score using dual numbers, we now have a gradient which describes how we need to adjust the parameters to make our score better.
Adjust our parameters using the gradient and go back to step 2. Repeating until whatever exit condition we want is hit: maybe when a certain number of iterations happen, or maybe when our score gets below a certain value.

That’s our game plan. Let’s dive into the specific problem we are going to be attacking.

Searching For an Ideal Dithering Pattern

Here is the problem we want to tackle:

We want to find a 3×3 dithering pattern such that when we use it to dither an image (by repeating the 3×3 pattern over and over across the image), and then blur the result by a specific amount, that it’s as close as possible to the original image being blurred by that same amount.

That sounds a bit challenging right? It’s not actually that bad, don’t worry (:

The steps the code has to do (differentiably) are:

Dither the source image
Blur the results
Blur the source image
Calculate a score for how similar they are
Use all this with Gradient Descent to optimize the dither pattern

Once again, we need to do this stuff differentiably, using dual numbers, so that we get a gradient for how to modify the dither pattern to better our score.

Step 1 – Dither Source Image

Dithering an image is a pretty simple process.

We are going to be dithering it such that we take a greyscale image as input and convert it to a black and white image using the dither pattern.

(If you are starting with a color image, this shows how to convert it to greyscale: Converting RGB to Grayscale)

For every pixel (x,y) in the source image, you look at pixel (x%3, y%3) in the dither pattern, and if the dither pattern pixel is less than the source, you write a black pixel out, else you write a white pixel out.

if (sourcePixel(x,y) < ditherPixel(x%3, y%3))
    pixelOut(x,y) = 0.0;
else
    pixelOut(x,y) = 1.0;

There’s a problem though… this is a branch, which makes a discontinuity, which will make it so we can’t have good derivatives to help us get to the goal.

Another way to write the dithering operation above is to write it like this:

difference = ditherPixel(x%3, y%3) - sourcePixel(x,y);
pixelOut(x,y) = step(difference);

Where “step” is the “heaviside step function”, which is 1 if x >= 0, otherwise is 0.

(Image from Wikipedia By Omegatron (Own work) [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0) or GFDL (http://www.gnu.org/copyleft/fdl.html)%5D, via Wikimedia Commons)

That got rid of the branch (if statement), but we still have a discontinuous function.

Luckily we can approximate a step function with other functions. I decided to use the formula $0.5+atan(100*x)/pi$ which looks like this:

Unfortunately, I found that my results weren’t that good, so i switched it to $0.5+atan(10000*x)/pi$ which ended up working better for me:

This function does have the problem of having flat derivatives, but I found that it worked pretty well anyways. The flat derivatives don’t seem to be a big problem in this case luckily.

To put it all together, the differentiable version of dithering a pixel that I use looks like this:

difference = ditherPixel(x%3, y%3) - sourcePixel(x,y);
pixelOut(x,y) = 0.5+atan(10000.0f * difference) / pi;

As input to this dithering process, we take:

The source image
a 3×3 dither pattern, where each pixel is a dual number

As output this dithering process gives us:

A dithered image that is converted to black and white (either a 1.0 or 0.0 value per pixel)
It’s the same size as the source image
Each pixel is a dual number with 9 derivatives. There is one derivative per dither pixel.

Step 2 – Blur the Results

Blurring the results of the dither wasn’t that difficult. I used a Gaussian blur, but other blurs could be used easily.

I had some Gaussian blur code laying around (from this blog post: Gaussian Blur) and I converted it to using a templated type instead of floats/pixels where appropriate, also making sure there were no branches or anything discontinuous.

It turned out there wasn’t a whole lot to fix up here luckily so wasn’t too difficult.

This allowed me to take the dithered results which are a dual number per pixel, and do a Gaussian blur on them, preserving and correctly modifying the gradient (derivatives) as it did the Blur.

Step 3 – Blur the Source Image

Blurring the source image was easy since the last step made a generic gaussian blur function. I used the generic Gaussian blur function to blur the image. This doesn’t need to be done as dual numbers, so it was regular pixels in and regular pixels out.

You might wonder why this part doesn’t need to be done as dual numbers.

The simple answer is because these values are in no way dependant on the dither pattern (which are what we are tracking with the derivatives).

The more mathematical explanation is that you could in fact consider these dual numbers, which just have a gradient of zero because they are essentially constants that have nothing to do (yet) with the parameters of the function. The gradient would just implicitly be zero, like any other constant value you might introduce to the function.

Step 4 – Calculating a Similarity Score

Next up I needed to calculate a similarity score between the dithered then blurred results (which is made up of dual numbers), and the source image which was blurred (and is made up of regular pixels).

The similarity score I went with is just MSE or “Mean Squared Error”.

To calculate MSE, for every pixel you do this:

error = ditheredBlurredImage(x,y) - blurredImage(x,y);
errorSquared = error * error;

After you have the squared error for every pixel, you just take the average of them to get the MSE.

An interesting thing about MSE is that because errors are squared, it will favor smaller errors much more than larger errors, which is a nice property.

A not so nice property about MSE is that it might decide something is a small difference mathematically even though a human would decide that it was a huge difference perceptually. The reverse is also true. Despite this, I chose it because it is simple and I ended up getting decent results with it.

If you want to go down the rabbit hole of looking at “perceptual similarity scores of images” check out these links:

SSIM – http://www.cns.nyu.edu/~lcv/ssim/
GMSD – https://arxiv.org/ftp/arxiv/papers/1308/1308.3052.pdf
Multiple Scale SSIM -http://www.cns.nyu.edu/~zwang/files/papers/msssim.pdf

After this step, we have an MSE value which says how similar the images are. A lower value means lower average squared error, so lower numbers are indeed better.

What else is nice is that the MSE value is a dual number with a gradient that has the 9 partial derivatives that describe how much the MSE changes as you adjust each parameter.

That gradient tells us how to adjust the parameters (the 3×3 dither pixels!) to lower the MSE!

Step 5 – Putting it All Together

Now it’s time to put all of this together and use gradient descent to make our dither pattern better.

Here’s how the program operates:

Initialize the 3×3 dither pattern to random values, setting the derivatives to 1.0 in the gradient, for the variable that they represent.
do 1000 iterations of this loop:

Dither and blur the source image
Calculate MSE of this result compared to the source image blurred
Using the gradient from the MSE value, subtract the respective partial derivative from each of the pixels in the dither pattern, but scaling the partial derivative by a “learning rate”.

Output the best result found

The learning rate starts at 3.0 at loop iteration 0, but decays with each iteration, down to 0.1 at iteration 999. It starts above 1 to help escape local minima, and uses a very small rate at the end to try and get deeper into whatever minimum it has found.

After adjusting the dither pattern pixels, I clamp them to be between 0 and 1.

Something else I ought to mention is that while I’m doing the gradient descent, I keep track of the best scoring dither pattern seen.

This way, after the 1000 iterations are up, if we ever saw anything better than where we are at currently, we just use that instead of the final result.

Presumably, if you tune your parameters (learning rate, iterations, etc!) correctly, this won’t come up often, but it’s always a possibility that your final state is not the best state encountered, so this is a nice way to get better results more often.

Results

Did you notice that I called this post “searching for an ideal dither pattern” instead of “finding an ideal dither pattern”? (:

The results are decent, but I know they could be better. Even so, I think the techniques talked about here are a good start going down the path of differentiable programming, and similar topics.

Here are some results I was able to get with the code. Click to see the full size images. The shrunken down images have aliasing issues.

The images left to right are: The original, the dither pattern used (repeated), the dithered image, the blurred dither image, and lastly the blurred original image. The program aims to make the last two images look as close as possible as it can, using MSE as the metric for how close they are.

Here is the starting state of using a Gaussian blur with a sigma of 10:

Here it is after the 1000 iterations of gradient descent. Notice the black blob at the top is gone compared to where it started.

Here’s the starting state when using a Gaussian blur sigma of 1:

And here it is after 1000 iterations, which is pretty decent results:

Lastly, here it is with no blurring whatsoever:

And after 1000 iterations, I think it actually looks worse!

Using no blur at all makes for some really awful results. The blur gives the algorithm more freedom on how it can succeed, whereas with no blur, there is a lot less wiggle room for finding a solution.

Another benefit of using the blur before MSE calculation is that a blur is a low pass filter. That means that higher frequencies are gone before the MSE calculation. The result of this is that the algorithm will favor results which are closer to blue noise dithering. Pretty neat right?!

Closing

I hope you enjoyed this journey through differentiable programming and gradient descent, and I hope you were able to follow along.

Here are some potentially interesting things to do beyond what we talked about here:

Have it learn from a set of images, instead of only this single image. This should help prevent “over fitting” and let it find a dither pattern which works well for all images instead of just this one specific image.
Use a separate set of images to gauge the accuracy of the result that weren’t used as part of the training, to help prove that it really hasn’t overfit the training data.
Try applying “small corruption” in the learning to help prevent overfitting or getting stuck in local minima – one idea would be to have some percentage chance per derivative that you don’t apply the change to the dither pattern pixel. This would add some randomness to the gradient descent instead of it only being down the steepest direction all of the time.
Instead of optimizing the dithering patterns, you could make a formula that generated the dithering patterns, and instead optimize the coefficients / terms of that formula. If you get good results, you’ll end up with a formula you can use for dithering instead of a pattern, which might be nice for the case of avoiding a texture read in a pixel shader to do the dithering.

I’m not a data scientist or machine learning expert by any means, so there are plenty of improvements to be made. There is a lot of overlap with what is being done here and other algorithms – both in the machine learning realm and outside of the machine learning realm.

For one, you can use Newton’s method for gradient descent. It can find minima faster by using the second derivative in the calculations as well.

However, this algorithm is almost purely “exploitative” in that wherever you start with your parameters, it will try to go from there to the deepest point in whatever valley it’s already in. Some other types of algorithms differ from this in that they are more “explorative” and try to find other valleys, but aren’t always as good at finding the deepest part of the valleys that they do find. More explorative algorithms include simulated annealing, differential evolution, and genetic algorithms.

If you enjoyed this post, check out this book for deeper details on algorithms relating to gradient descent (simulated annealing, genetic algorithms, etc!). It’s a very good book and very easy to read!
Essentials of Metaheuristics

Any corrections to what i’ve said, the code, or suggestions for improvements, please let me know by leaving a comment here, or hitting me up on twitter: https://twitter.com/Atrix256

Demystifying Floating Point Precision

Floating point numbers have limited precision. If you are a game programmer, you have likely encountered bugs where things start breaking after too much time has elapsed, or after something has moved too far from the origin.

This post aims to show you how to answer the questions:

What precision do I have at a number?
When will I hit precision issues?

First, a very quick look at the floating point format.

Floating Point Format

Floating point numbers (Wikipedia: IEEE 754) have three components:

Sign bit – whether the number is positive or negative
Exponent bits – the magnitude of the number
Mantissa bits – the fractional bits

32 bit floats use 1 bit for sign, 8 bits for exponent and 23 bits for mantissa. Whatever number is encoded in the exponent bits, you subtract 127 to get the actual exponent, meaning the exponent can be from -126 to +127.

64 bit doubles use 1 bit for sign, 11 bits for exponent and 52 bits for mantissa. Whatever number is encoded in the exponent bits, you subtract 1023 to get the actual exponent, meaning the exponent can be from -1022 to +1023.

16 bit half floats use 1 bit for sign, 5 bits for exponent and 10 bits for mantissa. Whatever number is encoded in the exponent bits, you subtract 15 to get the actual exponent, meaning the exponent can be from -14 to +15.

For all of the above, an exponent of all zeros has the special meaning “exponent 0” (and this is where the denormals / subnormals come into play) and all ones has the special meaning “infinity”

The exponent bits tell you which power of two numbers you are between – $[2^{exponent}, 2^{exponent+1})$ – and the mantissa tells you where you are in that range.

What precision do I have at a number?

Let’s look at the number 3.5.

To figure out the precision we have at that number, we figure out what power of two range it’s between and then subdivide that range using the mantissa bits.

3.5 is between 2 and 4. That means we are diving the range of numbers 2 to 4 using the mantissa bits. A float has 23 bits of mantissa, so the precision we have at 3.5 is:

$\frac{4-2}{2^{23}} = \frac{2}{8388608} \approx 0.000000238418579$

3.5 itself is actually exactly representable by a float, double or half, but the amount of precision numbers have at that scale is that value. The smallest number you can add or subtract to a value between 2 and 4 is that value. That is the resolution of the values you are working with when working between 2 and 4 using a float.

Using a double instead of a float gives us 52 bits of mantissa, making the precision:

$\frac{4-2}{2^{52}} = \frac{2}{4503599627370496} \approx 0.00000000000000044408921$

Using a half float with 10 bits of mantissa it becomes:

$\frac{4-2}{2^{10}} = \frac{2}{1024} = 0.001953125$

Here’s a table showing the amount of precision you get with each data type at various exponent values. N/A is used when an exponent is out of range for the specific data type.

$\begin{array}{c|c|c|c|c} exponent & range & half & float & double \\ \hline 0 & [1,2) & 0.0009765625 & 0.00000011920929 & 0.0000000000000002220446 \\ 1 & [2,4) & 0.001953125 & 0.000000238418579 & 0.00000000000000044408921 \\ 2 & [4,8) & 0.00390625 & 0.000000476837158 & 0.00000000000000088817842 \\ 9 & [512, 1024) & 0.5 & 0.00006103515 & 0.00000000000011368684 \\ 10 & [1024,2048) & 1 & 0.00012207031 & 0.00000000000022737368 \\ 11 & [2048,4096) & 2 & 0.00024414062 & 0.00000000000045474735 \\ 12 & [4096,8192) & 4 & 0.00048828125 & 0.0000000000009094947 \\ 15 & [32768, 65536) & 32 & 0.00390625 & 0.0000000000072759576 \\ 16 & [65536, 131072) & N/A & 0.0078125 & 0.0000000000014551915 \\ 17 & [131072, 262144) & N/A & 0.015625 & 0.00000000002910383 \\ 18 & [262144, 524288) & N/A & 0.03125 & 0.000000000058207661 \\ 19 & [524288, 1048576) & N/A & 0.0625 & 0.00000000011641532 \\ 23 & [8388608,16777216) & N/A & 1 & 0.00000000186264515 \\ 52 & [4503599627370496, 9007199254740992) & N/A & 536870912 & 1 \\ \end{array}$

A quick note on the maximum number you can store in floating point numbers, by looking at the half float specifically:

A half float has a maximum exponent of 15, which you can see above puts the number range between 32768 and 65536. The precision is 32 which is the smallest step that can be made in a half float at that scale. That range includes the smaller number but not the larger number. That means that the largest number a half float can store is one step away (32) from the high side of that range. So, the largest number that can be stored is 65536 – 32 = 65504.

How Many Digits Can I Rely On?

Another helpful way of looking at floating point precision is how many digits of precision you can rely on.

A float has 23 bits of mantissa, and 2^23 is 8,388,608. 23 bits let you store all 6 digit numbers or lower, and most of the 7 digit numbers. This means that floating point numbers have between 6 and 7 digits of precision, regardless of exponent.

That means that from 0 to 1, you have quite a few decimal places to work with. If you go into the hundreds or thousands, you’ve lost a few. When you get up into the tens of millions, you’ve run out of digits for anything beyond the decimal place.

You can actually see that this is true in the table in the last section. With floating point numbers, it’s at exponent 23 (8,388,608 to 16,777,216) that the precision is at 1. The smallest value that you can add to a floating point value in that range is in fact 1. It’s at this point that you have lost all precision to the right of the decimal place. Interestingly, you still have perfect precision of the integers though.

Half floats have 10 mantissa bits and 2^10 = 1024, so they just barely have 3 digits of precision.

Doubles have 52 mantissa bits and 2^52 = 4,503,599,627,370,496. That means doubles have between 15 and 16 digits of precision.

This can help you understand how precision will break down for you when using a specific data type for a specific magnitude of numbers.

When will I hit precision issues?

Besides the loose rules above about how many digits of precision you can count on, you can also solve to see when precision will break down for you.

Let’s say that you are tracking how long your game has been running (in seconds), and you do so by adding your frame delta (in seconds) to a variable every frame.

If you have a 30fps game, your frame delta is going to be 0.0333.

Adding that each frame to a float will eventually cause the float to reach a value where that number is smaller than the smallest difference representable (smaller than the precision), at which point things will start breaking. At first your accuracy will drop and your time will be wrong, but eventually adding your frame delta to the current time won’t even change the value of the current time. Time will effectively stop!

When will this happen though?

We’ll start with the formula we saw earlier and do one step of simple algebra to get us an equation which can give us this answer.

$\frac{range}{mantissa} = precision \\ \\ range = mantissa * precision$

How we use this formula is we put the precision we want into “precision” and we put the size of the mantissa ( $2^{MantissaBits}$ ) into “mantissa”. The result tells us the range that we’ll get the precision at.

Let’s plug in our numbers:

$range = 8388608 * 0.0333 = 279340.6464$

This tells us the range of the floating point numbers where we’ll have our problems, but this isn’t the value that we’ll have our problems at, so we have another step to do. We need to find what exponent has this range.

Looking at the table earlier in the post you might notice that the range at an exponent also happens to be just $2^{exponent}$ .

That’s helpful because that just means we take log2 of the answer we got:

$log2(279340.6464) = 18.0916659875$

Looking at the table again, we can see that floating point numbers have a precision of 0.03125 at exponent value 18. So, exponent 18 is close, but it’s precision is smaller than what we want – aka the precision is still ok.

That means we need to ceil() the number we got from the log2.

Doing that, we see that things break down at exponent 19, which has precision of 0.0625. This actual value it has this problem at is 528,288 (which is $2^{19}$ ).

So, our final formula for “where does precision become this value?” becomes:

$value = pow(2, ceil(log2(mantissa * precision)))$

Note that at exponent 18, there is still imprecision happening. When adding 1/30 to 264144, It goes from 264144 to 264144.031 to 264144.063, instead of 264144, 264144.033, 264144.066. There is error, but it’s fairly small.

At exponent 19 though, things fall apart a lot more noticeably. When adding 1/30 to 528288, it goes from 528288 to 528288.063 to 528288.125. Time is actually moving almost twice as fast in this case!

At exponent 20, we start at 1056576.00 and adding 1/30 doesn’t even change the value. Time is now stopped.

It does take 6.1 days (528,288 seconds) to reach exponent 19 though, so that’s quite a long time.

If we use half floats, it falls apart at value 64. That’s right, it only takes 64 seconds for this to fall apart when using 16 bit half floats, compared to 6.1 days when using 32 bit floats!

With doubles, it falls apart at value 281,474,976,710,656. That is 8,925,512 years!

Let’s check out that equation again:

$value = pow(2, ceil(log2(mantissa * precision)))$

A possibly more programmer friendly way to do the above would be to calculate mantissa * precision and then round up to the next power of 2. That’s exactly what the formula is doing above, but in math terms, not programming terms.

Storing Integers

I recently learned that floating point numbers can store integers surprisingly well. It blows my mind that I never knew this. Maybe you are in the same boat 😛

Here’s the setup:

For any exponent, the range of numbers it represents is a power of 2.
The mantissa will always divide that range into a power of 2 different values.

It might take some time and/or brain power to soak that up (it did for me!) but what that ends up ultimately meaning is that floating point numbers can exactly represent a large number of integers.

In fact, a floating point number can EXACTLY store all integers from $-2^{MantissaBits+1}$ to $+2^{MantissaBits+1}$ .

For half floats that means you can store all integers between (and including) -2048 to +2048. ( $\pm 2^{11}$ )

For floats, it’s -16,777,216 to +16,777,216. ( $\pm 2^{24}$ )

For doubles it’s -9,007,199,254,740,992 to +9,007,199,254,740,992. ( $\pm 2^{53}$ )

Doubles can in fact exactly represent any 32 bit unsigned integer, since 2^32 = 4,294,967,296.

Links

Here are some links you might find interesting!

Floating point visually explained:
http://fabiensanglard.net/floating_point_visually_explained/

What Every Computer Scientist Should Know About Floating-Point Arithmetic:
https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

A matter of precision:
http://tomforsyth1000.github.io/blog.wiki.html#[[A%20matter%20of%20precision]]

Denormal numbers – aka very small numbers that make computations slow when you use them:
https://en.m.wikipedia.org/wiki/Denormal_number

Catastrophic Cancellation – a problem you can run into when doing floating point math:
https://en.wikipedia.org/wiki/Loss_of_significance

A handy web page that lets you play with the binary representation of a float and what number it comes out as:
https://www.h-schmidt.net/FloatConverter/IEEE754.html

Half precision floating point format:
https://en.wikipedia.org/wiki/Half-precision_floating-point_format

What is the first integer that a float is incapable of representing?
https://stackoverflow.com/questions/3793838/which-is-the-first-integer-that-an-ieee-754-float-is-incapable-of-representing-e

Ready to go deeper? Bruce Dawson has some amazing write ups on deeper floating point issues:
https://randomascii.wordpress.com/category/floating-point/

This talks about how to use floating point precision limits as an activation function in a neural network (?!)
https://blog.openai.com/nonlinear-computation-in-linear-networks/

Animating Noise For Integration Over Time 2: Uniform Over Time

After I put out the last post, Mikkel Gjoel (@pixelmager), made an interesting observation that you can see summarized in his image below. (tweet / thread here)

BTW Mikkel has an amazing presentation about rendering the beautiful game “Inside” that you should check out. Lots of interesting techniques used, including some enlightening uses of noise.
YouTube –
Low Complexity, High Fidelity: The Rendering of INSIDE

The images left to right are:

One frame of white noise
N frames of white noise averaged.
N frames averaged where the first frame is white noise, and a per frame random number is added to all pixels every frame.
N frames averaged where the first frame is white noise, and 1/N is added to all pixels every frame.
N frames averaged where the first frame is white noise, and the golden ratio is added to all pixels every frame.

In the above, the smoother and closer to middle grey that an image is, the better it is – that means it converged to the true result of the integral better.

Surprisingly it looks like adding 1/N outperforms the golden ratio, which means that regular spaced samples are outperforming a low discrepancy sequence!

To compare apples to apples, we’ll do the “golden ratio” tests we did last post, but instead do them with adding this uniform value instead.

To be explicit, there are 8 frames and they are:

Frame 0: The noise
Frame 1: The noise + 1/8
Frame 2: The noise + 2/8
…
Frame 7: the noise + 7/8

Modulus is used to keep the values between 0 and 1.

Below is how white noise looks animated with golden ratio (top) vs uniform values (bottom). There are 8 frames and it’s played at 8fps so it loops every second.

Interleaved Gradient Noise. Top is golden ratio, bottom is uniform.

Blue Noise. Top is golden ratio, bottom is uniform.

The uniform ones look pretty similar. Maybe a little smoother, but it’s hard to tell by just looking at it. Interestingly, the frequency content of the blue noise seems more stable using these uniform values instead of golden ratio.

The histogram data of the noise was the same for all frames of animation, just like in last post, which is a good thing. The important bit is that adding a uniform value doesn’t modify the histogram shape, other than changing which counts go to which specific buckets. Ideally the histogram would start out perfectly even like the blue noise does, but since this post is about the “adding uniform values” process, and not about the starting noise, this shows that the process does the right thing with the histogram.

White Noise – min 213, max 306, average 256, std dev 16.51
Interleaved Gradient Noise – min 245, max 266, average 256, std dev 2.87
Blue Noise – min, max, average are 256, std dev 0.

Let’s look at the integrated animations.

White noise. Top is golden ratio, bottom is uniform.

Interleaved gradient noise. Top is golden ratio, bottom is uniform.

Blue noise. Top is golden ratio, bottom is uniform.

The differences between these animations are subtle unless you know what you are looking for specifically so let’s check out the final frames and the error graphs.

Each noise comparison below has three images. The first image is the “naive” way to animate the noise. The second uses golden ratio instead. The third one uses 1/N. The first two images (and techniques) are from (and explained in) the last post, and the third image is the technique from this post.

White noise. Naive (top), golden ratio (mid), uniform (bottom).

Interleaved gradient noise. Naive (top), golden ratio (mid), uniform (bottom).

Blue noise. Naive (top), golden ratio (mid), uniform (bottom).

So, what’s interesting is that the uniform sampling over time has lower error and standard deviation (variance) than golden ratio, which has less than the naive method. However, it’s only at the end that the uniform sampling over time has the best results, and it’s actually quite terrible until then.

The reason for this is that uniform has good coverage over the sample space, but it takes until the last frame to get that good coverage because each frame takes a small step over the remaining sample space.

What might work out better would be if our first frame was the normal noise, but then the second frame was the normal noise plus a half, so we get the most information we possibly can from that sample by splitting the sample space in half. We would then want to cut the two halves of the space space in half, and so the next two frames would be the noise plus 1/4 and the noise plus 3/4. We would then continue with 1/8, 5/8, 3/8 and 7/8 (note we didn’t do these 1/8 steps in order. We did it in the order that gives us the most information the most quickly!). At the end of all this, we would have our 8 uniformly spaced samples over time, but we would have taken the samples in an order that makes our intermediate frames look better hopefully.

Now, interestingly, that number sequence I just described has a name. It’s the base 2 Van Der Corput sequence, which is a type of low discrepancy sequence. It’s also the 1D version of the Halton sequence, and is related to other sequences as well. More info here: When Random Numbers Are Too Random: Low Discrepancy Sequences

Mikkel mentioned he thought this would be helpful, and I was thinking the same thing too. Let’s see how it does!

White noise. Uniform (top), Van Der Corput (bottom).

Interleaved gradient noise. Uniform (top), Van Der Corput (bottom).

Blue noise. Uniform (top), Van Der Corput (bottom).

The final frames look the same as before (and the same as each other), so I won’t show those again but here are the updated graphs.

Interestingly, using the Van Der Corput sequence has put intermediate frames more in line with golden ratio, while of course still being superior at the final frame.

I’ve been trying to understand why uniform sampling over time out performs the golden ratio which acts more like blue noise over time. I still don’t grasp why it works as well as it does, but the proof is in the pudding.

Theoretically, this uniform sampling over time should lead to the possibility of aliasing on the time axis, since blue noise / white noise (and other randomness) get rid of the aliasing in exchange for noise.

Noise over the time dimension would mean missing details that were smaller than the sample spacing size. in our case, we are using the time sampled values (noise + uniform value) to threshold a source image to make a sample. It may be that since we are thresholding, that aliasing isn’t possible since our sample represents everything below or equal to the value?

I’m not really sure, but will be thinking about it for a while. If you have any insights please let me know!

It would be interesting to try an actual 1d blue noise sequence and see how it compares. If it does better, it sounds like it would be worth while to try jittering the uniform sampled values on the time axis to try and approximate blue noise a bit. Mikkel tried the jittering and said it gave significantly worse results, so that seems like a no go.

Lastly, some other logical experiments from here seem to be…

See how other forms of noise and ordered dithers do, including perhaps a Bayer Matrix. IG noise seems to naturally do better on the time axis for some reason I don’t fully understand yet. There may be some interesting properties of other noise waiting to be found.
Do we get any benefits in this context by arranging the interleaved gradient noise in a spiral like Jorge mentions in his presentation? (Next Generation Post Processing In Call Of Duty: Advanced Warfare
It would be interesting to see how this works in a more open ended case – such as if you had temporal AA which was averaging a variable number of pixels each frame. Would doing a van Der Corput sequence give good results there? Would you keep track of sample counts per pixel and keep marching the Van Der Corput forward for each pixel individually? Or would you just pick something like an 8 Van Der Corput sequence, adding the current sequence to all pixels and looping that sequence every 8 frames? It really would be interesting to see what is best in that sort of a setup.

I’m sure there are all sorts of other things to try to. This is a deep, interesting and important topic for graphics and beyond (:

Code

Source code below, but it’s also available on github, along with the source images used: Github:
Atrix256/RandomCode/AnimatedNoise

#define _CRT_SECURE_NO_WARNINGS

#include <windows.h>  // for bitmap headers.  Sorry non windows people!
#include <stdint.h>
#include <vector>
#include <random>
#include <atomic>
#include <thread>
#include <complex>
#include <array>

typedef uint8_t uint8;

const float c_pi = 3.14159265359f;

// settings
const bool c_doDFT = true;

// globals 
FILE* g_logFile = nullptr;

//======================================================================================
inline float Lerp (float A, float B, float t)
{
    return A * (1.0f - t) + B * t;
}

//======================================================================================
struct SImageData
{
    SImageData ()
        : m_width(0)
        , m_height(0)
    { }
   
    size_t m_width;
    size_t m_height;
    size_t m_pitch;
    std::vector<uint8> m_pixels;
};
 
//======================================================================================
struct SColor
{
    SColor (uint8 _R = 0, uint8 _G = 0, uint8 _B = 0)
        : R(_R), G(_G), B(_B)
    { }

    inline void Set (uint8 _R, uint8 _G, uint8 _B)
    {
        R = _R;
        G = _G;
        B = _B;
    }
 
    uint8 B, G, R;
};

//======================================================================================
struct SImageDataComplex
{
    SImageDataComplex ()
        : m_width(0)
        , m_height(0)
    { }
  
    size_t m_width;
    size_t m_height;
    std::vector<std::complex<float>> m_pixels;
};
 
//======================================================================================
std::complex<float> DFTPixel (const SImageData &srcImage, size_t K, size_t L)
{
    std::complex<float> ret(0.0f, 0.0f);
  
    for (size_t x = 0; x < srcImage.m_width; ++x)
    {
        for (size_t y = 0; y < srcImage.m_height; ++y)
        {
            // Get the pixel value (assuming greyscale) and convert it to [0,1] space
            const uint8 *src = &srcImage.m_pixels[(y * srcImage.m_pitch) + x * 3];
            float grey = float(src[0]) / 255.0f;
  
            // Add to the sum of the return value
            float v = float(K * x) / float(srcImage.m_width);
            v += float(L * y) / float(srcImage.m_height);
            ret += std::complex<float>(grey, 0.0f) * std::polar<float>(1.0f, -2.0f * c_pi * v);
        }
    }
  
    return ret;
}
  
//======================================================================================
void ImageDFT (const SImageData &srcImage, SImageDataComplex &destImage)
{
    // NOTE: this function assumes srcImage is greyscale, so works on only the red component of srcImage.
    // ImageToGrey() will convert an image to greyscale.
 
    // size the output dft data
    destImage.m_width = srcImage.m_width;
    destImage.m_height = srcImage.m_height;
    destImage.m_pixels.resize(destImage.m_width*destImage.m_height);
 
    size_t numThreads = std::thread::hardware_concurrency();
    //if (numThreads > 0)
        //numThreads = numThreads - 1;
 
    std::vector<std::thread> threads;
    threads.resize(numThreads);
 
    printf("Doing DFT with %zu threads...\n", numThreads);
 
    // calculate 2d dft (brute force, not using fast fourier transform) multithreadedly
    std::atomic<size_t> nextRow(0);
    for (std::thread& t : threads)
    {
        t = std::thread(
            [&] ()
            {
                size_t row = nextRow.fetch_add(1);
                bool reportProgress = (row == 0);
                int lastPercent = -1;
 
                while (row < srcImage.m_height)
                {
                    // calculate the DFT for every pixel / frequency in this row
                    for (size_t x = 0; x < srcImage.m_width; ++x)
                    {
                        destImage.m_pixels[row * destImage.m_width + x] = DFTPixel(srcImage, x, row);
                    }
 
                    // report progress if we should
                    if (reportProgress)
                    {
                        int percent = int(100.0f * float(row) / float(srcImage.m_height));
                        if (lastPercent != percent)
                        {
                            lastPercent = percent;
                            printf("            \rDFT: %i%%", lastPercent);
                        }
                    }
 
                    // go to the next row
                    row = nextRow.fetch_add(1);
                }
            }
        );
    }
 
    for (std::thread& t : threads)
        t.join();
 
    printf("\n");
}
 
//======================================================================================
void GetMagnitudeData (const SImageDataComplex& srcImage, SImageData& destImage)
{
    // size the output image
    destImage.m_width = srcImage.m_width;
    destImage.m_height = srcImage.m_height;
    destImage.m_pitch = 4 * ((srcImage.m_width * 24 + 31) / 32);
    destImage.m_pixels.resize(destImage.m_pitch*destImage.m_height);
  
    // get floating point magnitude data
    std::vector<float> magArray;
    magArray.resize(srcImage.m_width*srcImage.m_height);
    float maxmag = 0.0f;
    for (size_t x = 0; x < srcImage.m_width; ++x)
    {
        for (size_t y = 0; y < srcImage.m_height; ++y)
        {
            // Offset the information by half width & height in the positive direction.
            // This makes frequency 0 (DC) be at the image origin, like most diagrams show it.
            int k = (x + (int)srcImage.m_width / 2) % (int)srcImage.m_width;
            int l = (y + (int)srcImage.m_height / 2) % (int)srcImage.m_height;
            const std::complex<float> &src = srcImage.m_pixels[l*srcImage.m_width + k];
  
            float mag = std::abs(src);
            if (mag > maxmag)
                maxmag = mag;
  
            magArray[y*srcImage.m_width + x] = mag;
        }
    }
    if (maxmag == 0.0f)
        maxmag = 1.0f;
  
    const float c = 255.0f / log(1.0f+maxmag);
  
    // normalize the magnitude data and send it back in [0, 255]
    for (size_t x = 0; x < srcImage.m_width; ++x)
    {
        for (size_t y = 0; y < srcImage.m_height; ++y)
        {
            float src = c * log(1.0f + magArray[y*srcImage.m_width + x]);
  
            uint8 magu8 = uint8(src);
  
            uint8* dest = &destImage.m_pixels[y*destImage.m_pitch + x * 3];
            dest[0] = magu8;
            dest[1] = magu8;
            dest[2] = magu8;
        }
    }
}

//======================================================================================
bool ImageSave (const SImageData &image, const char *fileName)
{
    // open the file if we can
    FILE *file;
    file = fopen(fileName, "wb");
    if (!file) {
        printf("Could not save %s\n", fileName);
        return false;
    }
   
    // make the header info
    BITMAPFILEHEADER header;
    BITMAPINFOHEADER infoHeader;
   
    header.bfType = 0x4D42;
    header.bfReserved1 = 0;
    header.bfReserved2 = 0;
    header.bfOffBits = 54;
   
    infoHeader.biSize = 40;
    infoHeader.biWidth = (LONG)image.m_width;
    infoHeader.biHeight = (LONG)image.m_height;
    infoHeader.biPlanes = 1;
    infoHeader.biBitCount = 24;
    infoHeader.biCompression = 0;
    infoHeader.biSizeImage = (DWORD) image.m_pixels.size();
    infoHeader.biXPelsPerMeter = 0;
    infoHeader.biYPelsPerMeter = 0;
    infoHeader.biClrUsed = 0;
    infoHeader.biClrImportant = 0;
   
    header.bfSize = infoHeader.biSizeImage + header.bfOffBits;
   
    // write the data and close the file
    fwrite(&header, sizeof(header), 1, file);
    fwrite(&infoHeader, sizeof(infoHeader), 1, file);
    fwrite(&image.m_pixels[0], infoHeader.biSizeImage, 1, file);
    fclose(file);
  
    return true;
}

//======================================================================================
bool ImageLoad (const char *fileName, SImageData& imageData)
{
    // open the file if we can
    FILE *file;
    file = fopen(fileName, "rb");
    if (!file)
        return false;
 
    // read the headers if we can
    BITMAPFILEHEADER header;
    BITMAPINFOHEADER infoHeader;
    if (fread(&header, sizeof(header), 1, file) != 1 ||
        fread(&infoHeader, sizeof(infoHeader), 1, file) != 1 ||
        header.bfType != 0x4D42 || infoHeader.biBitCount != 24)
    {
        fclose(file);
        return false;
    }
 
    // read in our pixel data if we can. Note that it's in BGR order, and width is padded to the next power of 4
    imageData.m_pixels.resize(infoHeader.biSizeImage);
    fseek(file, header.bfOffBits, SEEK_SET);
    if (fread(&imageData.m_pixels[0], imageData.m_pixels.size(), 1, file) != 1)
    {
        fclose(file);
        return false;
    }
 
    imageData.m_width = infoHeader.biWidth;
    imageData.m_height = infoHeader.biHeight;
    imageData.m_pitch = 4 * ((imageData.m_width * 24 + 31) / 32);
 
    fclose(file);
    return true;
}

//======================================================================================
void ImageInit (SImageData& image, size_t width, size_t height)
{
    image.m_width = width;
    image.m_height = height;
    image.m_pitch = 4 * ((width * 24 + 31) / 32);
    image.m_pixels.resize(image.m_pitch * image.m_height);
    std::fill(image.m_pixels.begin(), image.m_pixels.end(), 0);
}

//======================================================================================
template <typename LAMBDA>
void ImageForEachPixel (SImageData& image, const LAMBDA& lambda)
{
    size_t pixelIndex = 0;
    for (size_t y = 0; y < image.m_height; ++y)
    {
        SColor* pixel = (SColor*)&image.m_pixels[y * image.m_pitch];
        for (size_t x = 0; x < image.m_width; ++x)
        {
            lambda(*pixel, pixelIndex);
            ++pixel;
            ++pixelIndex;
        }
    }
}

//======================================================================================
template <typename LAMBDA>
void ImageForEachPixel (const SImageData& image, const LAMBDA& lambda)
{
    size_t pixelIndex = 0;
    for (size_t y = 0; y < image.m_height; ++y)
    {
        SColor* pixel = (SColor*)&image.m_pixels[y * image.m_pitch];
        for (size_t x = 0; x < image.m_width; ++x)
        {
            lambda(*pixel, pixelIndex);
            ++pixel;
            ++pixelIndex;
        }
    }
}

//======================================================================================
void ImageConvertToLuma (SImageData& image)
{
    ImageForEachPixel(
        image,
        [] (SColor& pixel, size_t pixelIndex)
        {
            float luma = float(pixel.R) * 0.3f + float(pixel.G) * 0.59f + float(pixel.B) * 0.11f;
            uint8 lumau8 = uint8(luma + 0.5f);
            pixel.R = lumau8;
            pixel.G = lumau8;
            pixel.B = lumau8;
        }
    );
}

//======================================================================================
void ImageCombine2 (const SImageData& imageA, const SImageData& imageB, SImageData& result)
{
    // put the images side by side. A on left, B on right
    ImageInit(result, imageA.m_width + imageB.m_width, max(imageA.m_height, imageB.m_height));
    std::fill(result.m_pixels.begin(), result.m_pixels.end(), 0);

    // image A on left
    for (size_t y = 0; y < imageA.m_height; ++y)
    {
        SColor* destPixel = (SColor*)&result.m_pixels[y * result.m_pitch];
        SColor* srcPixel = (SColor*)&imageA.m_pixels[y * imageA.m_pitch];
        for (size_t x = 0; x < imageA.m_width; ++x)
        {
            destPixel[0] = srcPixel[0];
            ++destPixel;
            ++srcPixel;
        }
    }

    // image B on right
    for (size_t y = 0; y < imageB.m_height; ++y)
    {
        SColor* destPixel = (SColor*)&result.m_pixels[y * result.m_pitch + imageA.m_width * 3];
        SColor* srcPixel = (SColor*)&imageB.m_pixels[y * imageB.m_pitch];
        for (size_t x = 0; x < imageB.m_width; ++x)
        {
            destPixel[0] = srcPixel[0];
            ++destPixel;
            ++srcPixel;
        }
    }
}

//======================================================================================
void ImageCombine3 (const SImageData& imageA, const SImageData& imageB, const SImageData& imageC, SImageData& result)
{
    // put the images side by side. A on left, B in middle, C on right
    ImageInit(result, imageA.m_width + imageB.m_width + imageC.m_width, max(max(imageA.m_height, imageB.m_height), imageC.m_height));
    std::fill(result.m_pixels.begin(), result.m_pixels.end(), 0);

    // image A on left
    for (size_t y = 0; y < imageA.m_height; ++y)
    {
        SColor* destPixel = (SColor*)&result.m_pixels[y * result.m_pitch];
        SColor* srcPixel = (SColor*)&imageA.m_pixels[y * imageA.m_pitch];
        for (size_t x = 0; x < imageA.m_width; ++x)
        {
            destPixel[0] = srcPixel[0];
            ++destPixel;
            ++srcPixel;
        }
    }

    // image B in middle
    for (size_t y = 0; y < imageB.m_height; ++y)
    {
        SColor* destPixel = (SColor*)&result.m_pixels[y * result.m_pitch + imageA.m_width * 3];
        SColor* srcPixel = (SColor*)&imageB.m_pixels[y * imageB.m_pitch];
        for (size_t x = 0; x < imageB.m_width; ++x)
        {
            destPixel[0] = srcPixel[0];
            ++destPixel;
            ++srcPixel;
        }
    }

    // image C on right
    for (size_t y = 0; y < imageC.m_height; ++y)
    {
        SColor* destPixel = (SColor*)&result.m_pixels[y * result.m_pitch + imageA.m_width * 3 + imageC.m_width * 3];
        SColor* srcPixel = (SColor*)&imageC.m_pixels[y * imageC.m_pitch];
        for (size_t x = 0; x < imageC.m_width; ++x)
        {
            destPixel[0] = srcPixel[0];
            ++destPixel;
            ++srcPixel;
        }
    }
}

//======================================================================================
float GoldenRatioMultiple (size_t multiple)
{
    return float(multiple) * (1.0f + std::sqrtf(5.0f)) / 2.0f;
}

//======================================================================================
void IntegrationTest (const SImageData& dither, const SImageData& groundTruth, size_t frameIndex, const char* label)
{
    // calculate min, max, total and average error
    size_t minError = 0;
    size_t maxError = 0;
    size_t totalError = 0;
    size_t pixelCount = 0;
    for (size_t y = 0; y < dither.m_height; ++y)
    {
        SColor* ditherPixel = (SColor*)&dither.m_pixels[y * dither.m_pitch];
        SColor* truthPixel = (SColor*)&groundTruth.m_pixels[y * groundTruth.m_pitch];
        for (size_t x = 0; x < dither.m_width; ++x)
        {
            size_t error = 0;
            if (ditherPixel->R > truthPixel->R)
                error = ditherPixel->R - truthPixel->R;
            else
                error = truthPixel->R - ditherPixel->R;

            totalError += error;

            if ((x == 0 && y == 0) || error < minError)
                minError = error;

            if ((x == 0 && y == 0) || error > maxError)
                maxError = error;

            ++ditherPixel;
            ++truthPixel;
            ++pixelCount;
        }
    }
    float averageError = float(totalError) / float(pixelCount);

    // calculate standard deviation
    float sumSquaredDiff = 0.0f;
    for (size_t y = 0; y < dither.m_height; ++y)
    {
        SColor* ditherPixel = (SColor*)&dither.m_pixels[y * dither.m_pitch];
        SColor* truthPixel = (SColor*)&groundTruth.m_pixels[y * groundTruth.m_pitch];
        for (size_t x = 0; x < dither.m_width; ++x)
        {
            size_t error = 0;
            if (ditherPixel->R > truthPixel->R)
                error = ditherPixel->R - truthPixel->R;
            else
                error = truthPixel->R - ditherPixel->R;

            float diff = float(error) - averageError;

            sumSquaredDiff += diff*diff;
        }
    }
    float stdDev = std::sqrtf(sumSquaredDiff / float(pixelCount - 1));

    // report results
    fprintf(g_logFile, "%s %zu error\n", label, frameIndex);
    fprintf(g_logFile, "  min error: %zu\n", minError);
    fprintf(g_logFile, "  max error: %zu\n", maxError);
    fprintf(g_logFile, "  avg error: %0.2f\n", averageError);
    fprintf(g_logFile, "  stddev: %0.2f\n", stdDev);
    fprintf(g_logFile, "\n");
}

//======================================================================================
void HistogramTest (const SImageData& noise, size_t frameIndex, const char* label)
{
    std::array<size_t, 256> counts;
    std::fill(counts.begin(), counts.end(), 0);

    ImageForEachPixel(
        noise,
        [&] (const SColor& pixel, size_t pixelIndex)
        {
            counts[pixel.R]++;
        }
    );

    // calculate min, max, total and average
    size_t minCount = 0;
    size_t maxCount = 0;
    size_t totalCount = 0;
    for (size_t i = 0; i < 256; ++i)
    {
        if (i == 0 || counts[i] < minCount)
            minCount = counts[i];

        if (i == 0 || counts[i] > maxCount)
            maxCount = counts[i];

        totalCount += counts[i];
    }
    float averageCount = float(totalCount) / float(256.0f);

    // calculate standard deviation
    float sumSquaredDiff = 0.0f;
    for (size_t i = 0; i < 256; ++i)
    {
        float diff = float(counts[i]) - averageCount;
        sumSquaredDiff += diff*diff;
    }
    float stdDev = std::sqrtf(sumSquaredDiff / 255.0f);

    // report results
    fprintf(g_logFile, "%s %zu histogram\n", label, frameIndex);
    fprintf(g_logFile, "  min count: %zu\n", minCount);
    fprintf(g_logFile, "  max count: %zu\n", maxCount);
    fprintf(g_logFile, "  avg count: %0.2f\n", averageCount);
    fprintf(g_logFile, "  stddev: %0.2f\n", stdDev);
    fprintf(g_logFile, "  counts: ");
    for (size_t i = 0; i < 256; ++i)
    {
        if (i > 0)
            fprintf(g_logFile, ", ");
        fprintf(g_logFile, "%zu", counts[i]);
    }

    fprintf(g_logFile, "\n\n");
}

//======================================================================================
void GenerateWhiteNoise (SImageData& image, size_t width, size_t height)
{
    ImageInit(image, width, height);

    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_int_distribution<unsigned int> dist(0, 255);

    ImageForEachPixel(
        image,
        [&] (SColor& pixel, size_t pixelIndex)
        {
            uint8 value = dist(rng);
            pixel.R = value;
            pixel.G = value;
            pixel.B = value;
        }
    );
}

//======================================================================================
void GenerateInterleavedGradientNoise (SImageData& image, size_t width, size_t height, float offsetX, float offsetY)
{
    ImageInit(image, width, height);

    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_int_distribution<unsigned int> dist(0, 255);

    for (size_t y = 0; y < height; ++y)
    {
        SColor* pixel = (SColor*)&image.m_pixels[y * image.m_pitch];
        for (size_t x = 0; x < width; ++x)
        {
            float valueFloat = std::fmodf(52.9829189f * std::fmod(0.06711056f*float(x + offsetX) + 0.00583715f*float(y + offsetY), 1.0f), 1.0f);
            size_t valueBig = size_t(valueFloat * 256.0f);
            uint8 value = uint8(valueBig % 256);
            pixel->R = value;
            pixel->G = value;
            pixel->B = value;
            ++pixel;
        }
    }
}

//======================================================================================
template <size_t NUM_SAMPLES>
void GenerateVanDerCoruptSequence (std::array<float, NUM_SAMPLES>& samples, size_t base)
{
    for (size_t i = 0; i < NUM_SAMPLES; ++i)
    {
        samples[i] = 0.0f;
        float denominator = float(base);
        size_t n = i;
        while (n > 0)
        {
            size_t multiplier = n % base;
            samples[i] += float(multiplier) / denominator;
            n = n / base;
            denominator *= base;
        }
    }
}

//======================================================================================
void DitherWithTexture (const SImageData& ditherImage, const SImageData& noiseImage, SImageData& result)
{
    // init the result image
    ImageInit(result, ditherImage.m_width, ditherImage.m_height);

    // make the result image
    for (size_t y = 0; y < ditherImage.m_height; ++y)
    {
        SColor* srcDitherPixel = (SColor*)&ditherImage.m_pixels[y * ditherImage.m_pitch];
        SColor* destDitherPixel = (SColor*)&result.m_pixels[y * result.m_pitch];

        for (size_t x = 0; x < ditherImage.m_width; ++x)
        {
            // tile the noise in case it isn't the same size as the image we are dithering
            size_t noiseX = x % noiseImage.m_width;
            size_t noiseY = y % noiseImage.m_height;
            SColor* noisePixel = (SColor*)&noiseImage.m_pixels[noiseY * noiseImage.m_pitch + noiseX * 3];

            uint8 value = 0;
            if (noisePixel->R < srcDitherPixel->R)
                value = 255;

            destDitherPixel->R = value;
            destDitherPixel->G = value;
            destDitherPixel->B = value;

            ++srcDitherPixel;
            ++destDitherPixel;
        }
    }
}

//======================================================================================
void DitherWhiteNoise (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    // make noise
    SImageData noise;
    GenerateWhiteNoise(noise, ditherImage.m_width, ditherImage.m_height);

    // dither the image
    SImageData dither;
    DitherWithTexture(ditherImage, noise, dither);

    // save the results
    SImageData combined;
    ImageCombine3(ditherImage, noise, dither, combined);
    ImageSave(combined, "out/still_whitenoise.bmp");
}

//======================================================================================
void DitherInterleavedGradientNoise (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    // make noise
    SImageData noise;
    GenerateInterleavedGradientNoise(noise, ditherImage.m_width, ditherImage.m_height, 0.0f, 0.0f);

    // dither the image
    SImageData dither;
    DitherWithTexture(ditherImage, noise, dither);

    // save the results
    SImageData combined;
    ImageCombine3(ditherImage, noise, dither, combined);
    ImageSave(combined, "out/still_ignoise.bmp");
}

//======================================================================================
void DitherBlueNoise (const SImageData& ditherImage, const SImageData& blueNoise)
{
    printf("\n%s\n", __FUNCTION__);

    // dither the image
    SImageData dither;
    DitherWithTexture(ditherImage, blueNoise, dither);

    // save the results
    SImageData combined;
    ImageCombine3(ditherImage, blueNoise, dither, combined);
    ImageSave(combined, "out/still_bluenoise.bmp");
}

//======================================================================================
void DitherWhiteNoiseAnimated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/anim_whitenoise%zu.bmp", i);

        // make noise
        SImageData noise;
        GenerateWhiteNoise(noise, ditherImage.m_width, ditherImage.m_height);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherInterleavedGradientNoiseAnimated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_real_distribution<float> dist(0.0f, 1000.0f);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/anim_ignoise%zu.bmp", i);

        // make noise
        SImageData noise;
        GenerateInterleavedGradientNoise(noise, ditherImage.m_width, ditherImage.m_height, dist(rng), dist(rng));

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherBlueNoiseAnimated (const SImageData& ditherImage, const SImageData blueNoise[8])
{
    printf("\n%s\n", __FUNCTION__);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/anim_bluenoise%zu.bmp", i);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, blueNoise[i], dither);

        // save the results
        SImageData combined;
        ImageCombine2(blueNoise[i], dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherWhiteNoiseAnimatedIntegrated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animint_whitenoise%zu.bmp", i);

        // make noise
        SImageData noise;
        GenerateWhiteNoise(noise, ditherImage.m_width, ditherImage.m_height);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherInterleavedGradientNoiseAnimatedIntegrated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_real_distribution<float> dist(0.0f, 1000.0f);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animint_ignoise%zu.bmp", i);

        // make noise
        SImageData noise;
        GenerateInterleavedGradientNoise(noise, ditherImage.m_width, ditherImage.m_height, dist(rng), dist(rng));

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&](SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i + 1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherBlueNoiseAnimatedIntegrated (const SImageData& ditherImage, const SImageData blueNoise[8])
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animint_bluenoise%zu.bmp", i);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, blueNoise[i], dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(blueNoise[i], dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherWhiteNoiseAnimatedGoldenRatio (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    // make noise
    SImageData noiseSrc;
    GenerateWhiteNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    SImageDataComplex noiseDFT;
    SImageData noiseDFTMag;

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animgr_whitenoise%zu.bmp", i);

        // add golden ratio to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = GoldenRatioMultiple(i);
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // DFT the noise
        if (c_doDFT)
        {
            ImageDFT(noise, noiseDFT);
            GetMagnitudeData(noiseDFT, noiseDFTMag);
        }
        else
        {
            ImageInit(noiseDFTMag, noise.m_width, noise.m_height);
            std::fill(noiseDFTMag.m_pixels.begin(), noiseDFTMag.m_pixels.end(), 0);
        }

        // Histogram test the noise
        HistogramTest(noise, i, __FUNCTION__);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // save the results
        SImageData combined;
        ImageCombine3(noiseDFTMag, noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherInterleavedGradientNoiseAnimatedGoldenRatio (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    // make noise
    SImageData noiseSrc;
    GenerateInterleavedGradientNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height, 0.0f, 0.0f);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    SImageDataComplex noiseDFT;
    SImageData noiseDFTMag;

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animgr_ignoise%zu.bmp", i);

        // add golden ratio to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = GoldenRatioMultiple(i);
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // DFT the noise
        if (c_doDFT)
        {
            ImageDFT(noise, noiseDFT);
            GetMagnitudeData(noiseDFT, noiseDFTMag);
        }
        else
        {
            ImageInit(noiseDFTMag, noise.m_width, noise.m_height);
            std::fill(noiseDFTMag.m_pixels.begin(), noiseDFTMag.m_pixels.end(), 0);
        }

        // Histogram test the noise
        HistogramTest(noise, i, __FUNCTION__);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // save the results
        SImageData combined;
        ImageCombine3(noiseDFTMag, noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherBlueNoiseAnimatedGoldenRatio (const SImageData& ditherImage, const SImageData& noiseSrc)
{
    printf("\n%s\n", __FUNCTION__);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    SImageDataComplex noiseDFT;
    SImageData noiseDFTMag;

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animgr_bluenoise%zu.bmp", i);

        // add golden ratio to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = GoldenRatioMultiple(i);
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // DFT the noise
        if (c_doDFT)
        {
            ImageDFT(noise, noiseDFT);
            GetMagnitudeData(noiseDFT, noiseDFTMag);
        }
        else
        {
            ImageInit(noiseDFTMag, noise.m_width, noise.m_height);
            std::fill(noiseDFTMag.m_pixels.begin(), noiseDFTMag.m_pixels.end(), 0);
        }

        // Histogram test the noise
        HistogramTest(noise, i, __FUNCTION__);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // save the results
        SImageData combined;
        ImageCombine3(noiseDFTMag, noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherWhiteNoiseAnimatedUniform (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    // make noise
    SImageData noiseSrc;
    GenerateWhiteNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    SImageDataComplex noiseDFT;
    SImageData noiseDFTMag;

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animuni_whitenoise%zu.bmp", i);

        // add uniform value to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = float(i) / 8.0f;
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // DFT the noise
        if (c_doDFT)
        {
            ImageDFT(noise, noiseDFT);
            GetMagnitudeData(noiseDFT, noiseDFTMag);
        }
        else
        {
            ImageInit(noiseDFTMag, noise.m_width, noise.m_height);
            std::fill(noiseDFTMag.m_pixels.begin(), noiseDFTMag.m_pixels.end(), 0);
        }

        // Histogram test the noise
        HistogramTest(noise, i, __FUNCTION__);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // save the results
        SImageData combined;
        ImageCombine3(noiseDFTMag, noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherInterleavedGradientNoiseAnimatedUniform (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    // make noise
    SImageData noiseSrc;
    GenerateInterleavedGradientNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height, 0.0f, 0.0f);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    SImageDataComplex noiseDFT;
    SImageData noiseDFTMag;

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animuni_ignoise%zu.bmp", i);

        // add uniform value to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = float(i) / 8.0f;
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // DFT the noise
        if (c_doDFT)
        {
            ImageDFT(noise, noiseDFT);
            GetMagnitudeData(noiseDFT, noiseDFTMag);
        }
        else
        {
            ImageInit(noiseDFTMag, noise.m_width, noise.m_height);
            std::fill(noiseDFTMag.m_pixels.begin(), noiseDFTMag.m_pixels.end(), 0);
        }

        // Histogram test the noise
        HistogramTest(noise, i, __FUNCTION__);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // save the results
        SImageData combined;
        ImageCombine3(noiseDFTMag, noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherBlueNoiseAnimatedUniform (const SImageData& ditherImage, const SImageData& noiseSrc)
{
    printf("\n%s\n", __FUNCTION__);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    SImageDataComplex noiseDFT;
    SImageData noiseDFTMag;

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animuni_bluenoise%zu.bmp", i);

        // add uniform value to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = float(i) / 8.0f;
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // DFT the noise
        if (c_doDFT)
        {
            ImageDFT(noise, noiseDFT);
            GetMagnitudeData(noiseDFT, noiseDFTMag);
        }
        else
        {
            ImageInit(noiseDFTMag, noise.m_width, noise.m_height);
            std::fill(noiseDFTMag.m_pixels.begin(), noiseDFTMag.m_pixels.end(), 0);
        }

        // Histogram test the noise
        HistogramTest(noise, i, __FUNCTION__);

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // save the results
        SImageData combined;
        ImageCombine3(noiseDFTMag, noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherWhiteNoiseAnimatedGoldenRatioIntegrated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    // make noise
    SImageData noiseSrc;
    GenerateWhiteNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animgrint_whitenoise%zu.bmp", i);

        // add golden ratio to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = GoldenRatioMultiple(i);
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherInterleavedGradientNoiseAnimatedGoldenRatioIntegrated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    // make noise
    SImageData noiseSrc;
    GenerateInterleavedGradientNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height, 0.0f, 0.0f);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animgrint_ignoise%zu.bmp", i);

        // add golden ratio to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = GoldenRatioMultiple(i);
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherBlueNoiseAnimatedGoldenRatioIntegrated (const SImageData& ditherImage, const SImageData& noiseSrc)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animgrint_bluenoise%zu.bmp", i);

        // add golden ratio to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = GoldenRatioMultiple(i);
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherWhiteNoiseAnimatedUniformIntegrated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    // make noise
    SImageData noiseSrc;
    GenerateWhiteNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animuniint_whitenoise%zu.bmp", i);

        // add uniform value to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = float(i) / 8.0f;
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherInterleavedGradientNoiseAnimatedUniformIntegrated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    // make noise
    SImageData noiseSrc;
    GenerateInterleavedGradientNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height, 0.0f, 0.0f);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animuniint_ignoise%zu.bmp", i);

        // add uniform value to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = float(i) / 8.0f;
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherBlueNoiseAnimatedUniformIntegrated (const SImageData& ditherImage, const SImageData& noiseSrc)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animuniint_bluenoise%zu.bmp", i);

        // add uniform value to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = float(i) / 8.0f;
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherWhiteNoiseAnimatedVDCIntegrated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    // make noise
    SImageData noiseSrc;
    GenerateWhiteNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    // Make Van Der Corput sequence
    std::array<float, 8> VDC;
    GenerateVanDerCoruptSequence(VDC, 2);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animvdcint_whitenoise%zu.bmp", i);

        // add uniform value to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = VDC[i];
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherInterleavedGradientNoiseAnimatedVDCIntegrated (const SImageData& ditherImage)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    // make noise
    SImageData noiseSrc;
    GenerateInterleavedGradientNoise(noiseSrc, ditherImage.m_width, ditherImage.m_height, 0.0f, 0.0f);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    // Make Van Der Corput sequence
    std::array<float, 8> VDC;
    GenerateVanDerCoruptSequence(VDC, 2);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animvdcint_ignoise%zu.bmp", i);

        // add uniform value to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = VDC[i];
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
void DitherBlueNoiseAnimatedVDCIntegrated (const SImageData& ditherImage, const SImageData& noiseSrc)
{
    printf("\n%s\n", __FUNCTION__);

    std::vector<float> integration;
    integration.resize(ditherImage.m_width * ditherImage.m_height);
    std::fill(integration.begin(), integration.end(), 0.0f);

    SImageData noise;
    ImageInit(noise, noiseSrc.m_width, noiseSrc.m_height);

    // Make Van Der Corput sequence
    std::array<float, 8> VDC;
    GenerateVanDerCoruptSequence(VDC, 2);

    // animate 8 frames
    for (size_t i = 0; i < 8; ++i)
    {
        char fileName[256];
        sprintf(fileName, "out/animvdcint_bluenoise%zu.bmp", i);

        // add uniform value to the noise after each frame
        noise.m_pixels = noiseSrc.m_pixels;
        float add = VDC[i];
        ImageForEachPixel(
            noise,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float valueFloat = (float(pixel.R) / 255.0f) + add;
                size_t valueBig = size_t(valueFloat * 255.0f);
                uint8 value = uint8(valueBig % 256);
                pixel.R = value;
                pixel.G = value;
                pixel.B = value;
            }
        );

        // dither the image
        SImageData dither;
        DitherWithTexture(ditherImage, noise, dither);

        // integrate and put the current integration results into the dither image
        ImageForEachPixel(
            dither,
            [&] (SColor& pixel, size_t pixelIndex)
            {
                float pixelValueFloat = float(pixel.R) / 255.0f;
                integration[pixelIndex] = Lerp(integration[pixelIndex], pixelValueFloat, 1.0f / float(i+1));

                uint8 integratedPixelValue = uint8(integration[pixelIndex] * 255.0f);
                pixel.R = integratedPixelValue;
                pixel.G = integratedPixelValue;
                pixel.B = integratedPixelValue;
            }
        );

        // do an integration test
        IntegrationTest(dither, ditherImage, i, __FUNCTION__);

        // save the results
        SImageData combined;
        ImageCombine2(noise, dither, combined);
        ImageSave(combined, fileName);
    }
}

//======================================================================================
int main (int argc, char** argv)
{
    // load the dither image and convert it to greyscale (luma)
    SImageData ditherImage;
    if (!ImageLoad("src/ditherimage.bmp", ditherImage))
    {
        printf("Could not load src/ditherimage.bmp");
        return 0;
    }
    ImageConvertToLuma(ditherImage);

    // load the blue noise images.
    SImageData blueNoise[8];
    for (size_t i = 0; i < 8; ++i)
    {
        char buffer[256];
        sprintf(buffer, "src/BN%zu.bmp", i);
        if (!ImageLoad(buffer, blueNoise[i]))
        {
            printf("Could not load %s", buffer);
            return 0;
        }

        // They have different values in R, G, B so make R be the value for all channels
        ImageForEachPixel(
            blueNoise[i],
            [] (SColor& pixel, size_t pixelIndex)
            {
                pixel.G = pixel.R;
                pixel.B = pixel.R;
            }
        );
    }

    g_logFile = fopen("log.txt", "w+t");
    
    // still image dither tests
    DitherWhiteNoise(ditherImage);
    DitherInterleavedGradientNoise(ditherImage);
    DitherBlueNoise(ditherImage, blueNoise[0]);

    // Animated dither tests
    DitherWhiteNoiseAnimated(ditherImage);
    DitherInterleavedGradientNoiseAnimated(ditherImage);
    DitherBlueNoiseAnimated(ditherImage, blueNoise);

    // Golden ratio animated dither tests
    DitherWhiteNoiseAnimatedGoldenRatio(ditherImage);
    DitherInterleavedGradientNoiseAnimatedGoldenRatio(ditherImage);
    DitherBlueNoiseAnimatedGoldenRatio(ditherImage, blueNoise[0]);

    // Uniform animated dither tests
    DitherWhiteNoiseAnimatedUniform(ditherImage);
    DitherInterleavedGradientNoiseAnimatedUniform(ditherImage);
    DitherBlueNoiseAnimatedUniform(ditherImage, blueNoise[0]);

    // Animated dither integration tests
    DitherWhiteNoiseAnimatedIntegrated(ditherImage);
    DitherInterleavedGradientNoiseAnimatedIntegrated(ditherImage);
    DitherBlueNoiseAnimatedIntegrated(ditherImage, blueNoise);

    // Golden ratio animated dither integration tests
    DitherWhiteNoiseAnimatedGoldenRatioIntegrated(ditherImage);
    DitherInterleavedGradientNoiseAnimatedGoldenRatioIntegrated(ditherImage);
    DitherBlueNoiseAnimatedGoldenRatioIntegrated(ditherImage, blueNoise[0]);

    // Uniform animated dither integration tests
    DitherWhiteNoiseAnimatedUniformIntegrated(ditherImage);
    DitherInterleavedGradientNoiseAnimatedUniformIntegrated(ditherImage);
    DitherBlueNoiseAnimatedUniformIntegrated(ditherImage, blueNoise[0]);

    // Van der corput animated dither integration tests
    DitherWhiteNoiseAnimatedVDCIntegrated(ditherImage);
    DitherInterleavedGradientNoiseAnimatedVDCIntegrated(ditherImage);
    DitherBlueNoiseAnimatedVDCIntegrated(ditherImage, blueNoise[0]);

    fclose(g_logFile);

    return 0;
}