Basic Methods For Finding Zeroes and Mins / Maxes of Functions

There is ~500 lines of simple standalone C++ code that goes with this post, that generated all data used and implements the things talked about. You can find it at: https://github.com/Atrix256/BasicRootFinding

Simple equations can be solved using algebra without too much fuss:

3x-6=0 \\ 3x=6 \\ x=2

You can even solve more complicated equations by cleverly using identities, the quadratic equation, and other mathematical tools you might have in your tool belt.

Some equations are beyond our ability to solve them using simple algebra though, like high degree polynomials:

3x^5+2x^2-20x+2=0

We still can find where functions like that equal zero, but we can’t do it analytically, we have to do it numerically.

That is, we do things like roll a ball down hill and hope we find a zero.

(Apparently there are some analytical solutions to be had from that function. I don’t know how to get them though, so would have to resort to numerical methods!)

This post talks about a few methods for finding zeroes of equations we can’t solve analytically, and talks about how to use those methods to find minimum and maximum values of functions as well.

The Basic Idea

Let’s say I tell you that the value of a function at a specific point is “3” and that for every step to the right you take, it subtracts 1 from that value.

Then I ask: “Where does this function equal zero?”

If you answered “three steps to the right” you are correct, and guess what – you’ve just used Newton’s method for root finding.

There is an asterisk here though. The slope (how the function value changes as you take a step to the right. Also called the derivative.) was constant in the above example, but is not going to be constant in the functions we want to solve. That slope is going to change depending where you are on the graph.

To deal with this, Newton’s method takes the step it thinks it should take, and then asks again for the value and slope (derivative) and takes another step.

The hope is that with a good initial guess, Newton’s method will find a zero without too much effort.

We’re going to get a little more formal, while also showing some results, but all the rest of what we are going to talk about is based on Newton’s method (or is even simpler).

Newton’s Method

We’ve seen this informally, but formally, a single step of Newton’s method looks like this:

x = x - \frac{f(x)}{f'(x)}

That is, we divide the y value at our current point by the derivative at our current point, and subtract that from our current x to get our new x.

Bullet Points:

  • Requires f(x) and f'(x)
  • Requires a “good guess” for an initial x value
  • Converges Quadratically

More info: https://en.wikipedia.org/wiki/Newton’s_method

Secant Method

What if you don’t have the analytical first derivative to your function and still want to use Newton’s Method?

Well, you can use finite differences to get the first derivative: move the x over a little bit, see how the y value changes over distance, and use that as your numerical derivative, instead of the analytical one. (A blog post of mine on finite differences: https://blog.demofox.org/2015/08/02/finite-differences/)

If you do that, you are now using the secant Method.

m = \frac{f(x+e)-f(x-e)}{2e}

x = x - \frac{f(x)}{m}

The above uses “central differences” to calculate the first derivative numerically. e is a tuneable parameter, but should be small (without triggering numerical issues) – like perhaps maybe 0.01.

Bullet Points:

  • Requires f(x)
  • Requires a “good guess” for an initial x value
  • Converges at a rate of 1.618… (the golden ratio! ??!) so is slower than Newton’s method, but can be handy if you don’t have the first derivative.

More info: https://en.wikipedia.org/wiki/Secant_method

Halley’s Method (& Householder’s method)

If the first derivative helped us find a zero, surely the second derivative can help us find it even faster, right?

Yep. If you generalize Newton’s method to using the second derivative, you get Halley’s method.

x = x - \frac{2f(x)f'(x)}{2f'(x)^2-f(x)f''(x)}

Bullet Points:

  • Requires f(x), f'(x) and f”(x). You can get the derivatives numerically but it will slow down convergence.
  • Requires a “good guess” for an initial x value
  • Converges cubically (fast!!)

Both Newton and Halley are part of a larger family of methods called “Householder’s Method” which generalizes Newton’s method to any order derivative.

More info:
https://en.wikipedia.org/wiki/Halley%27s_method
https://en.wikipedia.org/wiki/Householder%27s_method

Bisection

Bisection is simpler than the other methods. You start with a left x and a right x, with the requirement that the signs of the y values for these x’s have to be different (one positive, one negative) which means that a zero must be between the two x’s.

The algorithm itself is just a binary search, where you look at the value in the middle of the left and right x, and update the left or right x to be the middle, based on the sign of the y value at the middle.

Bullet Points:

  • Requires f(x)
  • Requires a min x and a max x that contains a zero between them, and the y values at those x’s have to have opposite signs
  • Converges Linearly (slow!)

More info:
https://en.wikipedia.org/wiki/Bisection_method

Experimental Results: y = x^2 – 1

Here’s the first equation: y=x^2-1

You can see visually that there’s a zero at x=-1 and at x=1.

Here’s how the various methods did at finding the roots. the x axis is how many steps were taken and the y axis is the absolute value of y at that step, which also happens to be the error too, since we are finding zeros.

Halley is the winner which is unsurprising with it’s cubic convergence rate, then newton with it’s quadratic convergence rate, then secant with it’s golden ratio convergence rate, and bisect comes in last after a strong start, with it’s linear convergence rate. Our experimental results match the theory, neat!

The values in the legend next to the result name specify what parameters were used. For newton and Halley, the value shown is the initial guess. For secant, x is the initial guess and px is the previous initial guess. For bisect it shows the range given to the bisection algorithm.

Choosing different parameters for those can drastically affect performance of the algorithms. Here are some less well chosen values shown along with the previous results. The previous results are the blue shades.

Bisection is now so bad that it dwarfs everything else. Let’s remove that.

Removing bisection, the first worst data point dropped to 100 from 2500 for all the “more poorly chosen parameters” which are orange, yellow and green. The previous values are the blues and the grey line, which basically look flat.

In these larger values with worse parameters, we are still seeing Halley winning, Newton coming in second, secant coming next, and then bisection, but we are seeing much worse performance. These parameters are important to getting good performance!

Experimental Results: y = sin(x)

Here’s the next equation: y=sin(x)

Here is the convergence for each technique:

You can’t see it, but the orange, grey and blue are all basically overlapping. Looking at the actual data below though, you can see that the winners are in the same order again.

Experimental Results: y = x^2-x-1

This is a fun equation, because a zero is at the golden ratio (and the other is at -1/goldenRatio): y=x^2-x-1

Here is the convergence graph:

An interesting thing happens here if we choose to start at the value “0.5” for Newton and Halley. The dark blue line for Newton is under the grey line for Halley which is flat and doesn’t go up and down. The reason this happens is that the first derivative is 0 at 0.5 so the algorithm doesn’t know which direction to go and gets stuck.

Using other initial guess values causes it not to be stuck, but you can see that newton and secant starting at 0.75 has a pretty big jump in error in the beginning.

Error Case: x^2+1

For the next equation we have: y=x^2+1

This graph has no zeroes. Our bisect bounds also do not have their needs met since one side is supposed to be negative and the other positive, but this graph is positive everywhere. That means we don’t have any results for bisect on this one.

Here’s the convergence graph:

Newton goes absolutely off the rails for a few samples. Lets remove it to see what the others are doing:

Secant goes real bad at the end, so let’s remove it to see that Halley is pretty well behaved despite there not actually being a root to find:

Ray vs Sphere

Let’s do something more interesting… let’s use these methods to find where a ray intersects a sphere. We can do this analytically (there is a formula to calculate this!) but we can use this as a simpler example of a more complex usage case.

We need a formula that is zero where a ray hits a sphere, and we need to be able to calculate it’s first and second derivative analytically ideally.

I fought with this for a bit but came up with a solution.

Here are the variables involved:

  • rayPos and rayDir, both are vec3’s
  • spherePos which is a vec3 and a sphereRadius which is a scalar
  • t – how far down the ray we are, and is a scalar

We are going to make the sphere center be the origin by subtracting it out of the ray position (then we don’t need spherePos anymore), so then all we have to find is where the magnitude of the vector from the origin to the ray position at time t is the sphere radius. To simplify calculating the derivatives, we are going to square both sides of the equation.

f(t) = MagnitudeSquared(rayPos + rayDir * t) - sphereRadius^2

If we can find where that equation is zero, that will tell us the time t that the ray hits the sphere, and we can get the actual intersected position (then normal, etc) from that.

For a 3d vector, Magnitude squared is just the below which is the distance equation without the square root.

MagnitudeSquared(P) = P.x^2+P.y^2+P.z^2

If you expand f(t) out for just the x axis to get the pattern per axis, you get:

rayPos.x^2+2*rayPos.x*rayDir.x*t+rayDir.x^2*t^2 - sphereRadius^2

You would add a similar thing for y and z – but not do the sphere radius part because it’s already handled above.

We can take the derivative of the above with respect to t, to get:

2*rayPos.x*rayDir.x+2*rayDir.x^2*t

So the first derivative is that equation, but also adding in the y and z axes:

f'(t) = 2*rayPos.x*rayDir.x+2*rayDir.x^2*t + 2*rayPos.y*rayDir.y+2*rayDir.y^2*t + 2*rayPos.z*rayDir.z+2*rayDir.z^2*t

That looks a bit like spaghetti but is way cleaner in code with a loop iteration for each axis ๐Ÿ˜›

We can then take the derivative of that first derivative to get the second derivative (still, with respect to t). That gives us this:

f''(t) = 2*rayDir.x^2 + 2*rayDir.y^2 + 2*rayDir.z^2

Ok cool, now that we have our equations, I used (0,0,0) for the ray position, (0,0,1) for the ray direction, (0.2, 0.3, 5.0) for the sphere position, and a sphere radius of 2.0.

Let’s see how our methods did at finding the intersection time of the ray vs the sphere.

Once again, Halley wins, newton comes in second, then secant, then bisect. Neat!

Finding Function Mins & Maxes

For being one of the headlines of the blog post title, this is actually a really simple thing to explain now that you know the rest.

The minimum and maximum of a function – by definition! – has a derivative (slope) of zero, because it’s right where the function goes flat. The function has either climbed to the top of a mountain, gone flat, and is heading back down, or it has gone to the bottom of a valley, gone flat, and is heading back up. There is actually also a third option called a saddlepoint where it is pausing it’s ascent or descent momentarily and then continues.

In any case, to find the minimum or maximum of a function, we need to find where it’s derivative is zero. We do this by doing root finding on it’s first derivative.

From there, we can plug that x value found into the original f(x) function to get our extrema value.

The code that goes with this blog post uses this technique to find the maximum value for the function y=-x^2+x+1.

It finds that the derivative (y=-2x+1) is zero at x=0.5. Plugging 0.5 into the original function for x gives us a value of 1.25. That does look like the correct x and y value for the maximum of the function, doesn’t it?

Links

The code that goes with this blog post can be found at: https://github.com/Atrix256/BasicRootFinding

A good 10 minute video on Newton’s Method

Here’s a way to get an analytic derivative of a function without having to do it symbolically. Dual numbers are a form of automatic differentiation, just like backpropagation is.
https://blog.demofox.org/2014/12/30/dual-numbers-automatic-differentiation/

You can extend this stuff to higher dimensions using the gradient instead of the derivative, and maybe a Jacobian matrix or Hesian matrix for higher derivatives. That might make a good future blog post.

We actually do something similar to Newton’s method when doing sphere tracing (ray marching with a distance estimate). We divide the distance value at a point in space (the y value, or f(x)) by the length of the gradient (which is basically the slope aka generalized derivative) to know that a surface can’t be any closer than that value, so we can step that length in whatever direction our ray wants to move. This also has applications to drawing 2d graphics. You can read about that here:
https://www.iquilezles.org/www/articles/distance/distance.htm

Using White Noise to Choose Between Red Noise and Blue Noise

There is source code that goes with this post, which generated all the images and did all the tests. You can find it at https://github.com/Atrix256/DFTRandomFibonacci

I recently saw a really cool video from @Numberphile which mixed some of my very favorite things: Fibonacci numbers (aka the golden ratio), red noise and blue noise.

The link to the video is below and is very much worth watching.
“Random Fibonacci Numbers”

It had me thinking though… they are using a coin flip (white noise) to determine if they should add the last two numbers (red noise / low pass filter) or subtract them (blue noise / high pass filter).

I was curious what the DFT of that would look like, which would show what sort of frequency content that number sequence had.

BTW I have a post talking about dice, probability distributions and noise color here that is a bit related: https://blog.demofox.org/2019/07/30/dice-distributions-noise-colors/

White Noise

Just to prime things, here is 100 uniform white noise values on the number line and the DFT of those points. The points clump together and leave holes, and the frequency spectrum shows all frequencies.

Here is the average DFT of 100,000 such point sets. The average flattens out and the grey lines showing 1 standard deviation are flat as well.

Regular Fibonacci Numbers

The first 90 Fibonacci numbers look like the below on the number line:

And here’s their DFT, where 0hz (DC) is in the middle:

Nothing super interesting really.

Randomized Fibonacci Numbers

Here is 90 randomized Fibonacci numbers on the numberline, the dft, and the average DFT of 100,000 such number sets.

It’s interesting to see that the individual randomized Fibonacci has strong low frequency content, but other frequencies too, while the average DFT of the number sets shows only low frequency content.

I think what’s going on here is that since the numbers start out small and grow larger over time, that they will always start out clumped together (red noise), but then depending on the coin flip (white noise) which have different frequency content in each set of numbers, as the numbers grow larger. This means that the only thing common among them is low frequency content, but the rest is just white noise and averages out to be flat.

Maybe not that interesting of a result, but it’s an answer at least ๐Ÿ˜›

Prime Numbers

I got a tweet from Tariq wondering what DFTing the prime numbers would look like.

Here are the first 25 on the numberline, and the DFT:

Here are the first 100:

Here are the first 200:

Here are the first 1000:

I don’t really see much of a pattern, but I guess if someone did, they would have gotten some prize by now? ๐Ÿ™‚

“The next coin flip has to be tails!”

If you saw a fair coin flipped and 10 heads came up in a row, you’d probably either think that the coin was a 2 headed coin, or that the next flip HAD to be tails.

In the code that goes with this post (https://github.com/Atrix256/DFTRandomFibonacci check out DoCoinTossTest()), I have a random number generator generate bits until there are 10 ones in a row, and then count how many times the next random bit is a one again.

I do that test 10000 times, and ran that overall test 5 times. Here are the results!

At an infinite number of coin flips, you can expect an even numbers of heads and tails, but until you reach infinity, all bets are off. When a coin has come up heads 10 times in a row, it still has a 50/50 chance of heads in the next coin flip.

That’s the definition of white noise – independent, random events.

Red noise would tend to have similar values – so, heads would clump together, and tails would clump together. This makes more sense with dice, where similar values would be rolled.

Blue noise would tend to have dissimilar values – so heads and tails would specifically NOT clump. And with dice, you’d rarely get the same, or similar values, in 2 dice rolls.

White noise doesn’t care about what came before, and just gives you a new number based on the probability distribution.

Keep this in mind if you ever play roulette!

How Do I Calculate Variance in 1 Pass?

If you google “how do i calculate variance?” you’ll get some nice explanations that say:

  1. Calculate the mean (average) of your numbers
  2. Calculate the average of: each number minus the mean, squared

That’s fine for people trying to just understand how the math works, but if you are calculating variance in a computer program, you might not realize there is a way to do it in a single pass over the data.

That can be significant to the performance and even feasibility of your algorithm!

Here is how you calculate variance in one pass:

  1. Calculate the mean (average) of your numbers
  2. In the same loop, calculate the mean (average) of your numbers squared
  3. After the loop, variance is the absolute value of #2, minus #1 squared

That might look like word salad, so here’s a code snippet.

float Lerp(float a, float b, float t)
{
    return a * (1.0f - t) + b * t;
}

float Variance_1Pass(const std::vector & data)
{
    // get the average (value) and average (value*value)
    float average_value = {};
    float average_valueSquared = {};
    for (size_t index = 0; index < data.size(); ++index)
    {
        float value = data[index];
        average_value = Lerp(average_value, value, 1.0f / float(index + 1));
        average_valueSquared = Lerp(average_valueSquared, value * value, 1.0f / float(index + 1));
    }

    // variance is absolute value of average(value*value) - (average_value*average_value)
    return abs(average_valueSquared - (average_value * average_value));
}

There is code that goes with this post, that implements it both ways and shows you that they are equivalent. You can find it at: https://github.com/Atrix256/CalculateVariance1Pass/blob/master/main.cpp

If you are wondering why I'm using "lerp" to average numbers, check out this post: https://blog.demofox.org/2016/08/23/incremental-averaging/

It turns out this one pass method can have numerical problems though, so no free lunch. Here is a more numerically robust way to do it, which also allows you to incrementally calculate variance, as numbers come in (Thanks Bart!): https://www.johndcook.com/blog/standard_deviation/

Why might you want to calculate variance?

One reason is if you are analyzing or reporting data, the average value is important to know, but it's also important to know if the numbers were usually pretty close to the average, or if they had lots of spikes above and below the average. You can square root variance to get the standard deviation, which is in the same units as the data you are reporting.

Assuming your data is a Gaussian distribution (due to something called the central limit theorem, a surprising number of things are actually gaussian distributed – like even rolling a bunch of dice and adding them up), 68% of the data points are + or – 1 standard deviation from the mean.

As an example, if the average temperature at your house over a month was 75 degrees Farenheit with a standard deviation of 5 degrees, that means that 68% of the days had a temperature between 70 and 80 degrees.

If the average temperature was still 75 but had a variance of 25 degrees, that means that 68% of the days had a temperature between 50 and 100 degrees. That is quite a difference! Just reporting the average temperature doesn't convey this information the same way as reporting average and standard deviation (or variance) does.

For more info on that, check this out: https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule

Lastly, I mentioned that doing 2 passes to calculate variance could be a deal breaker for an algorithm.

An example of this is a graphics technique called "Variance Shadow Maps" (paper: http://www.punkuser.net/vsm/vsm_paper.pdf) which ultimately calculates a mean and a variance for how far a group if pixels is away from a light source. When rendering the variance shadow map from the point of the view of the light, each pixel stores the depth, as well as the depth squared. A fun property is that you can blur these values with neighboring pixels without harming the mathematical integrity of the values. The result is soft shadows. (more info on soft shadows: https://blog.demofox.org/2017/07/01/why-are-some-shadows-soft-and-other-shadows-hard/)

When lighting a pixel later on and using the shadow map, it knows the pixel's distance from the light source, and can read the two values from the filtered (blurred) shadow map, which allow it to get the mean and variance of the objects in the shadow map (the things that are casting shadows). It then uses something called Chebyshev's inequality to get an estimate for how much in shadow the pixel is.

That is a lot of setup explanation, but if having to do 2 pass variance calculations, instead of the 1 pass that it does do, you'd have to run over the full screen of pixels and do logic for each one (subtract the average pixel value), to calculate the variance. In real time graphics, having to do an extra "full screen pass" can be pretty costly, and can easily put a game over budget, meaning the technique would have to be cut so the rest of the game could render fast enough.

This blog post is here in the hopes that the next time someone googles "how do i calculate variance" for use in a computer program, that they see this post, and implement it as a single pass. Fingers crossed! ๐Ÿ˜›

Using Low Discrepancy Sequences & Blue Noise in Loot Drop Tables for Games

I never thought I’d be much of a stats person but here we are. This post is low on formalism though, so may the gods of formalism have mercy on my soul!

This post includes the result of experiments showing the value of what is talked about, and includes some simple C++ that you can find at: https://github.com/Atrix256/LootLDS

If you’ve ever played a game that involved grinding for loot, you might have looked online and found the drop rate for a specific item, only to find that if it says it drops one out of 100 times, that it takes you 200-300 runs to get it, while your friends get the drop in the first 10 runs.

What the heck is that about?!

That, my friends, is the nature of regular old random numbers – aka white noise – the kind of random numbers you get from rolling fair dice, flipping fair coins, hashing values using good hash algorithms, or using well made random number generators.

The core issue is that white noise random numbers can take on any valid value with equal probability at any time, regardless of whatever has happened before.

If you were betting on whether a fair coin would come up heads or tails after seeing 10 heads, if you say the next will be tails (because of course it will!) you will still only be right 50% of the time. If you flip the coin an infinite number of times, you will get an even number of heads or tails, but before reaching infinity, all bets are off.

This can be a problem for game designers too. They can collect statistics about how their randomized systems are behaving, analyze the results and come to the conclusion that everything is balanced and even. While that may be true when looking at global averages, the individual player experience may vary wildly and not be balanced at all.

Tangent: This is called variance and is the source of noise in raytraced rendering.

Tangent: There’s a fun related story here about the U.S. air force realizing there is no such thing as an average pilot:
https://www.thestar.com/news/insight/2016/01/16/when-us-air-force-discovered-the-flaw-of-averages.html

In any case, is this “globally balanced but individually unbalanced” something we have to live with, or is there something we can do about it?

Luckily there is something we can do about it, and we can ensure that individual players have a more balanced, more pleasant, and more controlled experience, without sacrificing randomization.

Enter Low Discrepancy Sequences

A low discrepancy sequence is a sequence of numbers which are neither too close together nor too far apart.

If we put marks evenly spaced on a number line, the sequence of numbers at those marks would have zero discrepancy because they were evenly spaced. Low discrepancy numbers have a low discrepancy value that is greater than zero.

Examples of low discrepancy sequences that you may have heard of are: Sobol, Halton, Van Der Corput.

Some nice links of mine for learning more about low discrepancy sequences are:
https://blog.demofox.org/2017/05/29/when-random-numbers-are-too-random-low-discrepancy-sequences/
https://github.com/Atrix256/SampleZoo

Tangent: Going back to the raytraced noise example, regular sampling makes aliasing, and white noise sampling makes noise. Low discrepancy sequences sort of lay somewhere in the middle, gaining the benefits of both worlds, and actually having mystical powers of making numerical integration converge very quickly.

So what do low discrepancy sequences have to do with our problem?

If you use a low discrepancy sequence to generate 5 “random numbers” between 0 and 1, those 5 numbers will be roughly evenly spaced, which means that if you use those numbers on a loot table, the player is going to get a wider spread on the full possibilities of what the loot table has to offer.

If something has a very small percentage to drop, the player still has a low probability to get that drop, but if it’s a 1 in 100 drop, it’s more likely to happen at the 100th drop mark.

This is in constrast to white noise where the values may be clumped together and leave big holes in the 0 to 1 range, like: 0.114, 0.081, 0.093, 0.2, 0.95. There is a huge gap empty between 0.2 and 0.95, which is 75% of the possibilities!

There’s a problem with low discrepancy sequences though: They are deterministic – that is, they are predictable and not random. You get the same values from low discrepancy sequences every time.

Before admitting defeat though, there is another way to get randomization from this even though the sequences themselves are not random: You can shuffle the loot table!

Now, if you have thousands of players on an MMO server rolling numbers against a loot table, you probably just barfed in your mouth a little at my suggestion. There is a way to make a “shuffle iterator” though, so that you can get the benefits of shuffling a loot table, without actually having to keep copies of shuffled loot tables around. You’d use some unique player id (and more) as a random seed for the shuffle iterator, then could use a low discrepancy sequence to select loot. This way, each player would see different (randomized) loot drop sequences, but the loot rolls would still be low discrepancy.

Tangent: you can read more about a way to make shuffle iterators using low quality encryption in “Format Preserving Encryption” here: https://blog.demofox.org/2013/07/06/fast-lightweight-random-shuffle-functionality-fixed/

But we aren’t done yet…

Enter Randomized Low Discrepancy Sequences

The low discrepancy sequences talked about in the last section were deterministic, but what if we were able to make number sequences that had low discrepancy but were randomized?

That exists, and it’s known as “blue noise” because blue noise is random numbers which have only high frequency components (like how blue light is high frequency).

The property of both being low discrepancy, but also randomized, is great for many reasons way outside the scope of this article. For our loot drop purposes, it means that the loot will be both unpredictable, but also a player will get a balanced personalized experience, instead of only the global averages being balanced.

Tangent: Here’s a link about how to generate a blue noise sequence: https://blog.demofox.org/2017/10/20/generating-blue-noise-sample-points-with-mitchells-best-candidate-algorithm/

The other shoe dropping is that blue noise can take a long time to generate, so is computationally expensive. In graphics, it’s common to generate blue noise in advance and just use the pre-made sequence. In loot drops, that is a less viable option because it makes your sequence deterministic and then you are back to shuffling the loot table.

Not real sure the answer here, but it may involve just keeping track of the last N loot drops, and using Mitchell’s best candidate algorithm to generate the N+1th value, adding that to the loot drop RNG list and removing the oldest one. If you get creative you might find a solution that fits your needs.

Prove it

Before we show experimental results, I wanted to defined a couple terms in regards to low discrepancy sequences.

  1. Progressive Sequence – A progressive sequence is a sequence of N values, where if you use less than N of the values, they still have the desired properties. For instance, if you make 100 blue noise distributed numbers, but only use the first 10, if it’s a progressive sequence, those first 10 will also be blue. If it isn’t a progressive sequence, you have to use all 100 before they are blue. This is also a property of deterministic low discrepancy sequences. For our loot drop purposes we NEED to use progressive sequences because other wise, the loot drops won’t be balanced until the very end, which kind of defeats the point.
  2. Open Sequence – An open sequence is one that you can always add more items to. If you regularly space 4 samples from 0 to 1 you are going to get 0, 0.25, 0.5, 0.75. If you want to add a 5th number you can’t! That means that this sequence is not open. Many low discrepancy sequences are open, and using Mitchell’s best candidate to generate blue noise does make an open sequence. For loot drops, we generally do want open sequences, because we usually don’t know how many times the player is going to loot something in advance.

The numbers below are from experiments done using the code that goes with this blog post. It’s ~380 lines of C++ in a single file using only standard includes. You can find it at: https://github.com/Atrix256/LootLDS

I used the following sequences:

  • White Noise – Straight up vanilla RNG.
  • Blue Noise – Using Mitchell’s Best Candidate to generate a progressive, open, uniform blue noise distributed number sequence.
  • Golden Ratio – Starting at 0, i add the golden ratio to the previous loot drop value to get the next loot drop value. I use modulus to keep the value between 0 and 1. The golden ratio has some interesting & good properties as a low discrepancy sequence.
  • Sobol – This is the low discrepancy sequence that leads in numerical integration convergence speeds.

For each sequence type, I generated 10 random loot tables which had 2 to 6 items, each item having a random roll weighting between 1 and 20. I then rolled as many loot drops as i needed to make it so the actual loot drop percentages were within 1% of what the loot drop table said they should be.

Higher numbers mean it took more loot drops for the actual loot dropped to reach the probabilities the loot table said they should be. Lower numbers mean it took fewer loot drops to reach the desired probabilities. I did 10 runs so that you can see if things are consistently good or consistently bad. Just doing a single test isn’t enough to show if something just got lucky one time, or if it’s a good choice.

Here are the results….

  • White Noise: 50513, 7834, 1859, 516, 8824, 3650, 1380, 24461, 35, 12455
  • Blue Noise: 72, 77, 143, 9, 129, 308, 353, 236, 176, 205
  • Golden Ratio: 47, 34, 50, 76, 55, 51, 114, 77, 21, 105
  • Sobol: 216, 155, 161, 77, 13, 71, 56, 75, 127, 51

It’s important to note that the loot tables themselves are generated with white noise, which is a source of variance in the results above. 10 samples isn’t a whole lot, so in a real analysis you might want to do more (100,000 or more runs?) but hopefully you should see that white noise really is not great. I was also surprised to see that Sobol didn’t do that well compared to blue noise and golden ratio. It must just do better at higher dimensions.

One last thing I wanted to mention is that this isn’t limited to loot drop tables. You can use these concepts for randomized events, procedural content generation, and basically anything else you use random numbers for in games.

The important takeaway here is that even if things look right when looking at global averages, using white noise makes the individual experience very different from that global average. Having better control over crafting a player’s individual experience is possible though, and has the possibility of giving a game a more hand crafted feel, even though you are still using RNG.

FRIENDS DON’T LET FRIENDS USE WHITE NOISE!