I saw this tweet go by on Twitter and wondered: who among us hasn’t been in this situation, with things such as…
- Novella comments on a blog post
- Stack Overflow answers
- “reviewer #3” response on a research paper
- Or if it’s just review time and your boss needs to come up with some text to justify review scores that were pre-ordained by corporate politics, budget constraints, and the popularity contest farce they try to pass off as meritocracy.
What all these things have in common is that after the fact, you find yourself in possession of completely useless text.
That is, completely useless until today.
John von Neumann gave us the insight we need. If you have a biased source of bits, look at them in non-overlapping pairs. If the two bits in a pair don’t match, you take the value of the first one as the unbiased bit to output. If they do match, you throw the pair away and move on to the next one. We can do this with whatever garbage we find has dropped into our lap, and turn that steaming heap of dung into something actually useful.
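The pairing rule is small enough to sketch in a few lines of Python. This is only an illustration of the trick, not the code from the repository linked below; the helper names `bytes_to_bits` and `von_neumann_extract` are made up for this example, and reading each byte most significant bit first is an arbitrary choice.

```python
from typing import Iterable, Iterator


def bytes_to_bits(data: bytes) -> Iterator[int]:
    """Yield the bits of each byte, most significant bit first (an assumed ordering)."""
    for byte in data:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1


def von_neumann_extract(bits: Iterable[int]) -> Iterator[int]:
    """Read bits in non-overlapping pairs and emit the first bit of each mismatched pair."""
    it = iter(bits)
    # zip(it, it) pulls two items at a time from the same iterator,
    # so a trailing unpaired bit is silently dropped.
    for first, second in zip(it, it):
        if first != second:
            yield first  # (0, 1) -> 0 and (1, 0) -> 1
        # (0, 0) and (1, 1) are discarded


# Hypothetical input: any otherwise useless text will do.
text = "reviewer #3 has some concerns"
salvaged = list(von_neumann_extract(bytes_to_bits(text.encode("utf-8"))))
print("".join(str(bit) for bit in salvaged))
```

Matching pairs produce nothing, so the output is always shorter than the input, and the more lopsided the source, the fewer bits you get back.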
I implemented this, and you can find the source code on GitHub at https://github.com/Atrix256/EntropySalvager
More info on the technique: https://en.m.wikipedia.org/wiki/Fair_coin#Fair_results_from_a_biased_coin
Here is some example output, using words from the great orator MacDonald Trump as input.
Remember… When life gives you lemons, make random numbers.
Small print: Yeah, this is just a joke, and the numbers coming out are only as random as the numbers going in. This technique is a way to turn biased random bits into unbiased random bits, as long as the input bits are independent of each other. It works with “random” things like coin flips and dice rolls, but less so with things like text, where neighboring bits are correlated and will make patterns in the output. You’d want to use decorrelation to turn this from a joke into a real thing 🙂 https://en.wikipedia.org/wiki/Decorrelation
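To make the small print concrete, here is a rough Python sketch comparing the two kinds of input. Everything in it is an assumption made for illustration: the helper names, the seed, the 80% bias of the simulated coin, and the most-significant-bit-first byte ordering. It is not taken from the EntropySalvager source.

```python
import random
from typing import Iterable, Iterator, List


def bytes_to_bits(data: bytes) -> Iterator[int]:
    """Yield the bits of each byte, most significant bit first (an assumed ordering)."""
    for byte in data:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1


def von_neumann_extract(bits: Iterable[int]) -> Iterator[int]:
    """Emit the first bit of every non-overlapping pair whose two bits differ."""
    it = iter(bits)
    for first, second in zip(it, it):
        if first != second:
            yield first


def biased_coin(count: int, p_one: float, rng: random.Random) -> List[int]:
    """Simulate independent coin flips that land on 1 with probability p_one."""
    return [1 if rng.random() < p_one else 0 for _ in range(count)]


def fraction_of_ones(bits: List[int]) -> float:
    return sum(bits) / len(bits) if bits else 0.0


# Case 1: independent but heavily biased flips (seeded so the run is repeatable).
rng = random.Random(12345)
flips = biased_coin(200_000, 0.8, rng)
coin_out = list(von_neumann_extract(flips))
print(f"input ones:  {fraction_of_ones(flips):.3f}")
print(f"output ones: {fraction_of_ones(coin_out):.3f} from {len(coin_out)} bits")

# Case 2: text, where the bits are not independent.
# Every 'a' is 01100001, so every byte contributes the same three output bits.
text_out = list(von_neumann_extract(bytes_to_bits(b"aaaa")))
print("".join(str(bit) for bit in text_out))  # prints 010010010010
```

The coin flips should come out close to half ones and half zeros, while the repeated letter comes out as a repeating pattern. That second case is the failure mode described above: the trick removes bias, but it does nothing about correlation.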
At the risk of taking a joke too seriously…
Doesn’t this approach require the sequential bits to be independent?
The characters in this comment are basically all lowercase letters, which fall in the range 0b01100001 – 0b01111010. If we look at every 4th bit of the generated output, they’d almost all be 0.
Yes indeed! There’s a small print section at the end. And thanks for the bits! 🙂