# How Do I Calculate Variance in 1 Pass?

If you google “how do i calculate variance?” you’ll get some nice explanations that say:

1. Calculate the mean (average) of your numbers
2. Calculate the average of: each number minus the mean, squared

That’s fine for people trying to just understand how the math works, but if you are calculating variance in a computer program, you might not realize there is a way to do it in a single pass over the data.

That can be significant to the performance and even feasibility of your algorithm!

Here is how you calculate variance in one pass:

1. Calculate the mean (average) of your numbers
2. In the same loop, calculate the mean (average) of your numbers squared
3. After the loop, variance is the absolute value of: #2 minus the square of #1
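
If you want to see why step 3 gives the same answer as the two pass version, here is a short derivation. The notation is my own assumption, not something from the steps above: $x_i$ are the $N$ numbers and $\mu$ is their mean.

```latex
\begin{aligned}
\mathrm{Var}(x) &= \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2 \\
                &= \frac{1}{N}\sum_{i=1}^{N} x_i^2
                   \;-\; 2\mu \cdot \frac{1}{N}\sum_{i=1}^{N} x_i
                   \;+\; \mu^2 \\
                &= \frac{1}{N}\sum_{i=1}^{N} x_i^2 \;-\; 2\mu^2 \;+\; \mu^2 \\
                &= \underbrace{\frac{1}{N}\sum_{i=1}^{N} x_i^2}_{\text{mean of squares}}
                   \;-\; \underbrace{\mu^2}_{\text{square of mean}}
\end{aligned}
```

Mathematically that difference is never negative. The absolute value is only there because floating point rounding can push a result that should be zero or tiny slightly below zero.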

That might look like word salad, so here’s a code snippet.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

float Lerp(float a, float b, float t)
{
    return a * (1.0f - t) + b * t;
}

float Variance_1Pass(const std::vector<float>& data)
{
    // get the average (value) and average (value*value)
    float average_value = {};
    float average_valueSquared = {};
    for (size_t index = 0; index < data.size(); ++index)
    {
        float value = data[index];
        average_value = Lerp(average_value, value, 1.0f / float(index + 1));
        average_valueSquared = Lerp(average_valueSquared, value * value, 1.0f / float(index + 1));
    }

    // variance is absolute value of average(value*value) - (average_value*average_value)
    return std::abs(average_valueSquared - (average_value * average_value));
}
```

The code that goes with this post implements it both ways and shows you that they are equivalent. You can find it at: https://github.com/Atrix256/CalculateVariance1Pass/blob/master/main.cpp

If you are wondering why I'm using "lerp" to average numbers, check out this post: https://blog.demofox.org/2016/08/23/incremental-averaging/

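It turns out this one pass method can have numerical problems though, so no free lunch. When the mean is large compared to the spread of the data, the two averages being subtracted are large and nearly equal, and floating point rounding can wipe out most of the meaningful digits. Here is a more numerically robust way to do it, which also allows you to incrementally calculate variance as numbers come in (Thanks Bart!): https://www.johndcook.com/blog/standard_deviation/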

Why might you want to calculate variance?

One reason is that when you are analyzing or reporting data, the average value is important to know, but it's also important to know whether the numbers were usually pretty close to the average, or whether they had lots of spikes above and below it. You can take the square root of the variance to get the standard deviation, which is in the same units as the data you are reporting.

Assuming your data follows a Gaussian distribution (due to something called the central limit theorem, a surprising number of things are approximately Gaussian distributed, like even rolling a bunch of dice and adding them up), about 68% of the data points are within 1 standard deviation of the mean.

As an example, if the average temperature at your house over a month was 75 degrees Fahrenheit with a standard deviation of 5 degrees, that means that about 68% of the days had a temperature between 70 and 80 degrees.

If the average temperature was still 75 but had a standard deviation of 25 degrees, that means that about 68% of the days had a temperature between 50 and 100 degrees. That is quite a difference! Just reporting the average temperature doesn't convey this information the same way as reporting average and standard deviation (or variance) does.
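
To make the arithmetic concrete, here is a small sketch that turns a mean and a variance into that "about 68%" range. The `Range` struct and `OneSigmaRange` function are made-up names for illustration, and the range only means "about 68% of the data" if the data is roughly Gaussian.

```cpp
#include <cmath>

// Hypothetical helper type: the low and high ends of a range.
struct Range
{
    float low;
    float high;
};

// Hypothetical helper: mean plus or minus one standard deviation.
Range OneSigmaRange(float mean, float variance)
{
    // standard deviation is the square root of variance,
    // which puts it back in the same units as the data
    float standardDeviation = std::sqrt(variance);
    return { mean - standardDeviation, mean + standardDeviation };
}
```

With a mean of 75, a variance of 25 is a standard deviation of 5, which gives 70 to 80. To get a standard deviation of 25, and a range of 50 to 100, the variance has to be 625. Since variance is in squared units, it grows a lot faster than the spread you would actually report.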