Mean Squared Error is Variance

It’s April and this is my first blog post of the year. 2020/2021 has been a hard time for me like it has been for so many other people. After being absolutely destroyed at the end of last year, I discovered I have issues with both anxiety and depression and am talking to a therapist working through the problems, essentially debugging my life and thought patterns to live a better life. The virus and the BS related to the last president pushed me to a breaking point that I just couldn’t brute force muscle through like I normally do. Much improved now though luckily!

So, onto the main topic…

When analyzing randomized things, I often find myself wanting to graph averages to show how well things converge, and also wanting to graph variance or standard deviation to show how much they swing above and below that average. Averages alone can hide that important information. Variance shows up as noise when rendering too, so low variance is a nice thing.

I’ve seen quite a few sampling papers only report variance, not averages, and I never really understood why. The other day someone casually mentioned that mean squared error is variance and it threw me for a loop.

After thinking about it a bit, I was convinced: mean squared error is in fact variance, and root mean squared error is standard deviation. Let me show you…

To calculate the variance of a stream of values, you keep track of:

  1. Average value
  2. Average squared value

Then, variance is just this:

Variance = AverageSquaredValue – AverageValue*AverageValue

And you can square root that to get the standard deviation.

(Which BTW, there is a nice and easy numerically stable way to keep a “running average” that you can read about here: https://blog.demofox.org/2016/08/23/incremental-averaging/)

When we are talking about error, we know that the average value should be 0 if our process is unbiased, so we can modify the variance equation to be the below:

Variance = AverageSquaredValue

And since the value we are tracking is error, we can write it as:

Variance = AverageSquaredError

MSE is “mean squared error” where the word average above is the mean, so…

Variance = MeanSquaredError

And you can square root that to get the standard deviation of the error, which is also RMSE “Root Mean Squared Error”.

The nice thing about MSE being variance and RMSE being std dev is that if you are ok seeing squared error instead of regular error, you can have a single graph that communicates both error and variance in one.

I also find it interesting that squared error is used because that links it to “least squares” curve fitting (https://blog.demofox.org/2016/12/22/incremental-least-squares-curve-fitting/), which is pretty darn useful, and makes it feel a lot more ok to be looking at squared error instead of regular error. A benefit of using squared error is that it makes outliers a lot larger / more costly. This means that given the choice between one large error, or many little ones that equal the same amount of error, it will choose the many little ones instead. That means less noise in a render, and less variance.

This was a short post, but I have another one in mind I want to write next – and soon – that ought to be pretty interesting, combining my favorite noise for sampling (blue noise) and a commonly used noise for procedural content generation (Perlin noise).

Until then, stay safe!


Leave a comment