Bayes’ theorem is a way to calculate the probability of something happening based on evidence that may affect that probability. It’s a formula for combining probabilities together when they might affect each other. It’s also good for updating probabilities when you get new data.
The formula is:
$$P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}$$
The formula reads: “The probability of A happening when B has happened is equal to the probability of B happening when A has happened, multiplied by the probability of A happening, divided by the probability of B happening.”
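If it helps to see the formula as code, here is a minimal Python sketch. The function name `bayes` and all of the numbers are made up purely for illustration; they don't come from any real data set.

```python
from fractions import Fraction


def bayes(p_b_given_a, p_a, p_b):
    """Return P(A|B) given P(B|A), P(A) and P(B)."""
    if p_b == 0:
        # Dividing by P(B) only makes sense if B can actually happen.
        raise ValueError("P(B) must be non-zero")
    return p_b_given_a * p_a / p_b


# Hypothetical inputs, chosen only to show the arithmetic.
p_a = Fraction(1, 100)         # P(A)
p_b_given_a = Fraction(9, 10)  # P(B|A)
p_b = Fraction(1, 20)          # P(B)

posterior = bayes(p_b_given_a, p_a, p_b)
print(posterior)  # 9/50
```

`Fraction` is used instead of floats so the result is exact and easy to check by hand.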
Bayes’ theorem is a mouthful and looks formidable, especially if you aren’t used to the notation. Believe it or not, as the story goes, Bayes thought this formula was so trivial that he never shared it with anyone. After his death, the formula was found in his notes and made public. (https://en.wikipedia.org/wiki/Bayes%27_theorem#History)
The goal of this post is to give you the intuition for this formula such that you feel as Bayes did, that this is so trivial it isn’t even worth having a name. Despite that, it is very cool and very useful, so thanks for that, ghost of Bayes, wherever you may be.
I learned the intuition in backwards order, because I didn’t know what to look for in advance. Lucky for you, you don’t have to learn it in backwards order!
The probability of event A happening is written as $P(A)$.
The probability of event B happening is written as $P(B)$.
The probability of both events happening is written as $P(A, B)$, and this is called the joint probability.
If the two events are independent of each other (one event happening has no bearing on the other event happening), you can calculate the joint probability by multiplying the probabilities together:
$$P(A, B) = P(A) \cdot P(B)$$
An example of independent (unrelated) events would be if A was “getting struck by lightning” and B was “enjoying the song Wooly Bully”.
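Lightning strikes are hard to count, so here is a small Python sketch that checks the multiplication rule on a stand-in example: rolling two fair dice. The events and the helper `prob` are my own choices for illustration. A is “the first die shows a 6” and B is “the second die is even”, which have nothing to do with each other.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely results of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event, found by counting matching outcomes."""
    hits = sum(1 for roll in outcomes if event(roll))
    return Fraction(hits, len(outcomes))


def a(roll):
    return roll[0] == 6      # first die shows a 6


def b(roll):
    return roll[1] % 2 == 0  # second die is even


p_a = prob(a)                                # 1/6
p_b = prob(b)                                # 1/2
p_ab = prob(lambda roll: a(roll) and b(roll))  # counted directly

print(p_ab, p_a * p_b)  # 1/12 1/12
```

Counting the joint probability directly gives the same answer as multiplying the two individual probabilities, which is exactly what independence promises.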
If the events are not independent, we have to turn to conditional probabilities.
A conditional probability is written as $P(A \mid B)$ and it means “What is the probability of A happening, if B has happened?”
If we are working with a conditional probability, we can still calculate a joint probability, but it is calculated differently:
$$P(A, B) = P(A \mid B) \cdot P(B)$$
Which reads “The probability of A and B happening is the probability of A happening if B has happened, multiplied by the probability of B happening”.
You can see how this calculates the joint probability because it’s still calculating the probability of both events happening. It just so happens that in this case, the two events are related.
An example of events that are not independent would be if A was “getting struck by lightning” and B was “climbing a power line in a storm”.
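To see the difference in numbers, here is a Python sketch with two dice again, but this time with events that do affect each other. These events are my own illustration: A is “the two dice add up to at least 10” and B is “the first die shows a 6”. Knowing B happened makes A a lot more likely.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely results of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event, found by counting matching outcomes."""
    hits = sum(1 for roll in outcomes if event(roll))
    return Fraction(hits, len(outcomes))


def a(roll):
    return sum(roll) >= 10  # dice add up to at least 10


def b(roll):
    return roll[0] == 6     # first die shows a 6


p_a = prob(a)
p_b = prob(b)
p_ab = prob(lambda roll: a(roll) and b(roll))

# P(A|B): look only at the rolls where B happened, then count A among them.
b_rolls = [roll for roll in outcomes if b(roll)]
p_a_given_b = Fraction(sum(1 for roll in b_rolls if a(roll)), len(b_rolls))

print(p_ab)               # 1/12  (counted directly)
print(p_a_given_b * p_b)  # 1/12  (conditional rule)
print(p_a * p_b)          # 1/36  (independence rule, wrong here)
```

The conditional rule matches the directly counted joint probability, while naively multiplying the two probabilities together does not.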
Putting It Together
Are you ready for the intuition?
Firstly, as we said above, the joint probability for conditional probabilities is calculated like this:
$$P(A, B) = P(A \mid B) \cdot P(B)$$
Secondly, you can re-order A and B in the joint probability. You are calculating the probability of both things happening so it doesn’t matter which event is first and which one is second in the equation.
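We can write that like this:
$$P(A, B) = P(B, A)$$
And if you expand it out, it looks like this:
$$P(A \mid B) \cdot P(B) = P(B \mid A) \cdot P(A)$$
Thirdly and lastly, let’s divide both sides by the probability of B happening (which has to be non-zero for this to work). This is just a simple algebra operation.
$$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$$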
Oh my goodness, we derived Bayes’ theorem. What?!!!
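If you want to convince yourself that the result really holds, here is a Python sketch that checks it by brute-force counting on a standard 52-card deck. The events are my own illustration: A is “the card is a king” and B is “the card is a face card (jack, queen or king)”. The helpers `prob` and `prob_given` are hypothetical names that just count cards.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs


def is_king(card):
    return card[0] == "K"


def is_face(card):
    return card[0] in {"J", "Q", "K"}


def prob(event):
    """P(event), by counting cards in the full deck."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))


def prob_given(event, given):
    """P(event | given), by counting only among cards where `given` holds."""
    reduced = [card for card in deck if given(card)]
    return Fraction(sum(1 for card in reduced if event(card)), len(reduced))


# P(A|B) counted directly: kings among the face cards.
direct = prob_given(is_king, is_face)

# P(A|B) from the right-hand side of Bayes' theorem.
via_bayes = prob_given(is_face, is_king) * prob(is_king) / prob(is_face)

print(direct, via_bayes)  # 1/3 1/3
```

Both routes give the same answer, one by counting kings among the face cards and the other by going through the formula.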
Joint Probability: https://www.investopedia.com/terms/j/jointprobability.asp
An interactive visualization of Bayes’ things: https://seeing-theory.brown.edu/bayesian-inference/index.html
Bayesian Updating: http://www.statisticalengineering.com/bayesian.htm
Joint, Marginal and conditional probabilities: https://machinelearningmastery.com/joint-marginal-and-conditional-probability-for-machine-learning/