# Understanding The Discrete Fourier Transform

I’ve been working on getting a better understanding of the Discrete Fourier Transform. I’ve figured out some things which have really helped my intuition, and made it a lot simpler in my head, so I wanted to write these up for the benefit of other folks, as well as for my future self when I need a refresher.

The discrete Fourier transform takes in data and gives out the frequencies that the data contains. This is useful if you want to analyze data, but can also be useful if you want to modify the frequencies then use the inverse discrete Fourier transform to generate the frequency modified data.

## Multiplying By Sinusoids (Sine / Cosine)

If you had a stream of data that had a single frequency in it at a constant amplitude, and you wanted to know the amplitude of that frequency, how could you figure that out?

One way would be to make a cosine wave of that frequency and multiply each sample of your wave by the corresponding samples.

For instance, let’s say that we have a data stream of four values that represent a 1hz cosine wave: 1, 0, -1, 0.

We could then multiply each point by a corresponding point on a cosine wave, add sum them together:

We get.. 1*1 + 0*0 + -1*-1 + 0*0 = 2.

The result we get is 2, which is twice the amplitude of the data points we had. We can divide by two to get the real amplitude.

To show that this works for any amplitude, here’s the same 1hz cosine wave data with an amplitude of five: 5, 0, -5, 0.

Multiplying by the 1hz cosine wave, we get… 5*1 + 0*0 + -5*-1 + 0*0 = 10.

The actual amplitude is 5, so we can see that our result was still twice the amplitude.

In general, you will need to divide by N / 2 where N is the number of samples you have.

What happens if our stream of data has something other than a cosine wave in it though?

Let’s try a sine wave: 0, 1, 0, -1

When we multiply those values by our cosine wave values we get: 0*1 + 1*0 + 0*-1 + -1*0 = 0.

We got zero. Our method broke!

In this case, if instead of multiplying by cosine, we multiply by sine, we get what we expect: 0*0 + 1*1 + 0*0 + -1*-1 = 2. We get results consistent with before, where our answer is the amplitude times two.

That’s not very useful if we have to know whether we are looking for a sine or cosine wave though. We might even have some other type of wave that is neither one!

The solution to this problem is actually to multiply the data points by both cosine and sine waves, and keep both results.

Let’s see how that works.

For this example We’ll take a cosine wave and shift it 0.25 radians to give us samples: 0.97, -0.25, -0.97, 0.25. (The formula is cos(x*2*pi/4+0.25) from 0 to 3)

When we multiply that by a cosine wave we get: 0.97*1 + -0.25*0 + -0.97*-1 + 0.25*0 = 1.94
When we multiply it by a sine wave we get: 0.97*0 + -0.25*1 + -0.97*0 + 0.25*-1 = -0.5

Since we know both of those numbers are twice the actual amplitude, we can divide them both by two to get:
cosine: 0.97
sine: -0.25

Those numbers kind of makes sense if you think about it. We took a cosine wave and shifted it over a little bit so our combined answer is that it’s mostly a cosine wave, but is heading towards being a sine wave (technically a sine wave that has an amplitude of -1, so is a negative sine wave, but is still a sine wave).

To get the amplitude of our phase shifted wave, we treat those values as a vector, and get the magnitude. sqrt((0.97*0.97)+(-0.25*-0.25)) = 1.00. That is correct! Our wave’s amplitude is 1.0.

To get the starting angle (phase) of the wave, we use inverse tangent of sine over cosine. We want to take atan2(sine, cosine), aka atan2(-0.25, 0.97) which gives us a result of -0.25 radians. That’s the amount that we shifted our wave, so it is also correct!

## Formulas So Far

Let’s take a look at where we are at mathematically. For $N$ samples of data, we multiply each sample of data by a sample from a wave with the frequency we are looking for and sum up the results (Note that this is really just a dot product between N dimensional vectors!). We have to do this both with a cosine wave and a sine wave, and keep the two resulting values to be able to get our true amplitude and phase information.

For now, let’s drop the “divide by two” part and look at the equations we have.

$Value_{cosine} = \sum\limits_{n=0}^{N-1} Sample_n * Cos(2\pi*Frequency*n/N)$

$Value_{sine} = \sum\limits_{n=0}^{N-1} Sample_n * Sin(2\pi*Frequency*n/N)$

Those equations above do exactly what we worked through in the samples.

The part that might be confusing is the part inside of the Cos() and Sin() so I’ll explain that real quick.

Looking at the graph cos(x) and sin(x) you’ll see that they both make one full cycle between the x values of 0 and 2*pi (aprox. 6.28):

If instead we change it to a graph of cos(2*pi*x) and sin(2*pi*x) you’ll notice that they both make one full cycle between the x values of 0 and 1:

This means that we can think of the wave forms in terms of percent (0 to 1) instead of in radians (0 to 2pi).

Next, we multiply by the frequency we are looking for. We do this because that multiplication makes the wave form repeat that many times between 0 and 1. It makes us a wave of the desired frequency, still remaining in “percent space” where we can go from 0 to 1 to get samples on our cosine and sine waves.

Here’s a graph of cos(2*pi*x*2) and sin(2*pi*x*2), to show some 2hz frequency waves:

Lastly, we multiply by n/N. In our sum, n is our index variable and it goes from 0 to N-1. This is very much like a for loop. n/N is our current percent done we are in the for loop, so when we multiply by n/N, we are just sampling at a different location (by percentage done) on our cosine and sine waves that are at the specified frequencies.

Not too difficult right?

## Imaginary Numbers

Wouldn’t it be neat if instead of having to do two separate calculations for our cosine and sine values, we could just do a single calculation that would give us both values?

Well, interestingly, we can! This is where imaginary numbers come in. Don’t get scared, they are here to make things simpler and more convenient, not to make things more complicated or harder to understand (:

There is something called Euler’s Identity which states the below:

$e^{ix} = cos(x) + i*sin(x)$

That looks pretty useful for our usage case doesn’t it? If you notice that our equations both had the same parameters for cos and sin, that means that we can just use this identity instead!

We can now take our two equations and combine them into a single equation:

$Value = \sum\limits_{n=0}^{N-1} Sample_n * e^{i *2\pi * Frequency*n/N}$

When we do the above calculation, we will get a complex number out with a real and imaginary part. The real part of the complex number is just the cosine value, while the imaginary part of the complex number is the sine value. Nothing scary has happened, we’ve just used/abused complex numbers a bit to give us what we want in a more compact form.

## Multiple Frequencies

Now we know how to look for a single, specific frequency.

What if you want to look for multiple frequencies though? Further more, what if you don’t even know what frequencies to look for?

When doing the discrete Fourier transform on a stream of samples, there are a specific set of frequencies that it checks for. If you have N samples, it only checks for N different frequencies. You could ask about other frequencies, but these frequencies are enough to reconstruct the original signal from only the frequency data.

The first N/2 frequencies are going to be the frequencies 0 (DC) through N/2. N/2 is the Nyquist frequency and is the highest frequency that your signal is able of holding.

After N/2 comes the negative frequencies, counting from a large negative frequency (N/2-1) up to the last frequency which is -1.

As an example, if you had 8 samples and ran the DFT, you’d get 8 complex numbers as outputs, which represent these frequencies:

[0hz, 1hz, 2hz, 3hz, 4hz, -3hz, -2hz, -1hz]

The important take away from this section is that if you have N samples, the DFT asks only about N very specific frequencies, which will give enough information to go from frequency domain back to time domain and get the signal data back.

That then leads to this equation below, where we just put a subscript on Value, to signify which frequency we are probing for.

$Value_{Frequency} = \sum\limits_{n=0}^{N-1} Sample_n * e^{2\pi i*Frequency*n/N}$

Frequency is an integer, and is between 0 and N-1. We can say that mathematically by listing this information along with the equation:

$Frequency \in [0,N), Frequency \in \mathbb Z$

## Final Form

Math folk use letters and symbols instead of words in their equations.

The letter $k$ is used instead of frequency.

Instead of $Value_{Frequency}$, the symbol is $X_{k}$.

Instead of $Sample_n$, the symbol is $x_n$.

This gives us a more formal version of the equation:

$X_k= \sum\limits_{n=0}^{N-1} x_n * e^{2\pi ikn/N}$

$k\in [0,N), k \in \mathbb Z$

We are close to the final form, but we aren’t quite done yet!

Remember the divide by two? I had us take out the divide by two so that I could re-introduce it now in it’s correct form. Basically, since we are querying for N frequencies, we need to divide each frequency’s result by N. I mentioned that we had to divide by N/2 to make the amplitude information correct, but since we are checking both positive AND negative frequencies, we have to divide by 2*(N/2), or just N. That makes our equation become this:

$X_k= 1/N \sum\limits_{n=0}^{N-1} x_n * e^{2\pi ikn/N}$

$k\in [0,N), k \in \mathbb Z$

Lastly, we actually want to make the waves go backwards, so we make the exponent negative. The reason for this is a deeper topic than I want to get into right now, but you can read more about why here if you are curious: DFT – Why are the definitions for inverse and forward commonly switched?

That brings us to our final form of the formula for the discrete Fourier transform:

$X_k= 1/N \sum\limits_{n=0}^{N-1} x_n * e^{-2\pi ikn/N}$

$k\in [0,N), k \in \mathbb Z$

## Taking a Test Drive

If we use that formula to calculate the DFT of a simple, 4 sample, 1hz cosine wave (1, 0, -1, 0), we get as output [0, 0.5, 0, -0.5].

That means the following:

• 0hz : 0 amplitude
• 1hz : 0.5 amplitude
• 2hz : 0 amplitude
• -1hz : -0.5 amplitude

It is a bit strange that the simple 1hz, 1.0 amplitude cosine wave was split in half, and made into a 1hz cosine wave and -1hz cosine wave, each contributing half amplitude to the result, but that is “just how it works”. I don’t have a very good explanation for this though. My personal understanding just goes that if we are checking for both positive AND negative frequencies, that makes it ambiguous whether it’s a 1hz wave with 1 amplitude, or -1hz wave with -1 amplitude. Since it’s ambiguous, both must be true, and they get half the amplitude each. If I get a better understanding, I’ll update this paragraph, or make a new post on it.

## Making The Inverse Discrete Fourier Transform

Making the formula for the Inverse DFT is really simple.

We start with our DFT formula, drop the 1/N from the front, and also make the exponent to e positive instead of negative. That is all! The intuition here for me is that we are just doing the reverse of the DFT process, to do the inverse DFT process.

$x_k= \sum\limits_{n=0}^{N-1} X_n * e^{2\pi ikn/N}$

$k\in [0,N), k \in \mathbb Z$

While the DFT takes in real valued signals and gives out complex valued frequency data, the IDFT takes in complex valued frequency data, and gives out real valued signals.

A fun thing to try is to take some data, DFT it, modify the frequency data, then IDFT it to see what comes out the other side.

## Other Notes

Here are some other things of note about the discrete Fourier transform and it’s inverse.

Data Repeating Forever

When you run the DFT on a stream of data, the math is such that it assumes that the stream of data you gave it repeats forever both forwards and backwards in time. This is important because if you aren’t careful, you can add frequency content to your data that you didn’t intend to be there.

For instance, if you tile a 1hz sine wave, it’s continuous, and there is only the 1hz frequency present:

However, if you tile a 0.9hz sine wave, there is a discontinuity, which means that there will be other, unintended frequencies present when you do a DFT of the data, to be able to re-create the discontinuity:

Fast Fourier Transform

There are a group of algorithms called “Fast Fourier Transforms”. You might notice that if we have N samples, taking the DFT is an O(N^2) operation. Fast Fourier transforms can bring it down to O(N log N).

DFT / IDFT Formula Variations

The formula we came up with is one possible DFT formula, but there are a handful of variations that are acceptable, even though different variations come up with different values!

The first option is whether to make the e exponent negative or not. Check out these two formulas to see what I mean.

$X_k= \sum\limits_{n=0}^{N-1} x_n * e^{2\pi ikn/N}$
vs
$X_k= \sum\limits_{n=0}^{N-1} x_n * e^{-2\pi ikn/N}$

Either one is acceptable, but when providing DFT’d data, you should mention which one you did, and make sure and use the opposite one in your inverse DFT formula.

The next options is whether to divide by N or not, like the below:

$X_k= \sum\limits_{n=0}^{N-1} x_n * e^{2\pi ikn/N}$
vs
$X_k= 1/N \sum\limits_{n=0}^{N-1} x_n * e^{2\pi ikn/N}$

Again, either one is acceptable, but you need to make sure and let people know which one you did, and also make sure and use the opposite one in your inverse DFT formula.

Instead of dividing by N, some people actually divide by 1/sqrt(N) in both the DFT and inverse DFT. Wolfram alpha does this for instance!

One thing to note though is that if doing 1/N on the DFT (my personal preference), the 0hz (DC) frequency bin gives you the average value of the signal, and the amplitude data you get out of the other bins is actually correct (keeping in mind the amplitudes are split in half between positive and negative frequencies).

Why Calculate Negative Frequencies

The negative frequencies are able to be calculated on demand from the positive frequency information (eg complex conjugation), so why should we even bother calculating them? Sure it’d be more computationally efficient not to calculate them, especially when just doing frequency analysis, right?!

Here’s a discussion about that: Why calculate negative frequencies of DFT?

Higher Dimensions

It’s possible to DFT in 2d, 3d, and higher. The last blog post shows how to do this with 2d images, but I’d also like to write a blog post like this one specifically talking about the intuition behind multi dimensional DFTs.

## Example Program Source Code

Here’s some simple C++ source code which calculates the DFT and inverse DFT, optionally showing work in case you want to try to work some out by hand to better understand this. Working a few out by hand really helped me get a better intuition for all this stuff.

Program Output:

Source Code:

#include <stdio.h>
#include <complex>
#include <vector>

#include <fcntl.h>
#include <io.h>

// set to 1 to have it show you the steps performed.  Set to 0 to hide the work.
// useful if checking work calculated by hand.
#define SHOW_WORK 1

#if SHOW_WORK
#define PRINT_WORK(...) wprintf(__VA_ARGS__)
#else
#define PRINT_WORK(...)
#endif

// Use UTF-16 encoding for Greek letters
static const wchar_t kPi = 0x03C0;
static const wchar_t kSigma = 0x03A3;

typedef float TRealType;
typedef std::complex<TRealType> TComplexType;

const TRealType c_pi = (TRealType)3.14159265359;
const TRealType c_twoPi = (TRealType)2.0 * c_pi;

//=================================================================================
TComplexType DFTSample (const std::vector<TRealType>& samples, int k)
{
size_t N = samples.size();
TComplexType ret;
for (size_t n = 0; n < N; ++n)
{
TComplexType calc = TComplexType(samples[n], 0.0f) * std::polar<TRealType>(1.0f, -c_twoPi * TRealType(k) * TRealType(n) / TRealType(N));
PRINT_WORK(L"    n = %i : (%f, %f)n", n, calc.real(), calc.imag());
ret += calc;
}
ret /= TRealType(N);
PRINT_WORK(L"    Sum the above and divide by %in", N);
return ret;
}

//=================================================================================
std::vector<TComplexType> DFTSamples (const std::vector<TRealType>& samples)
{
PRINT_WORK(L"DFT:  X_k = 1/N %cn[0,N) x_k * e^(-2%cikn/N)n", kSigma, kPi);

size_t N = samples.size();
std::vector<TComplexType> ret;
ret.resize(N);
for (size_t k = 0; k < N; ++k)
{
PRINT_WORK(L"  k = %in", k);
ret[k] = DFTSample(samples, k);
PRINT_WORK(L"  X_%i = (%f, %f)n", k, ret[k].real(), ret[k].imag());
}
PRINT_WORK(L"n");
return ret;
}

//=================================================================================
TRealType IDFTSample (const std::vector<TComplexType>& samples, int k)
{
size_t N = samples.size();
TComplexType ret;
for (size_t n = 0; n < N; ++n)
{
TComplexType calc = samples[n] * std::polar<TRealType>(1.0f, c_twoPi * TRealType(k) * TRealType(n) / TRealType(N));
PRINT_WORK(L"    n = %i : (%f, %f)n", n, calc.real(), calc.imag());
ret += calc;
}
PRINT_WORK(L"    Sum the above and take the real componentn");
return ret.real();
}

//=================================================================================
std::vector<TRealType> IDFTSamples (const std::vector<TComplexType>& samples)
{
PRINT_WORK(L"IDFT:  x_k = %cn[0,N) X_k * e^(2%cikn/N)n", kSigma, kPi);

size_t N = samples.size();
std::vector<TRealType> ret;
ret.resize(N);
for (size_t k = 0; k < N; ++k)
{
PRINT_WORK(L"  k = %in", k);
ret[k] = IDFTSample(samples, k);
PRINT_WORK(L"  x_%i = %fn", k, ret[k]);
}
PRINT_WORK(L"n");
return ret;
}

//=================================================================================
template<typename LAMBDA>
std::vector<TRealType> GenerateSamples (int numSamples, LAMBDA lambda)
{
std::vector<TRealType> ret;
ret.resize(numSamples);
for (int i = 0; i < numSamples; ++i)
{
TRealType percent = TRealType(i) / TRealType(numSamples);
ret[i] = lambda(percent);
}
return ret;
}

//=================================================================================
int main (int argc, char **argv)
{
// Enable Unicode UTF-16 output to console
_setmode(_fileno(stdout), _O_U16TEXT);

// You can test specific data samples like this:
//std::vector<TRealType> sourceData = { 1, 0, 1, 0 };
//std::vector<TRealType> sourceData = { 1, -1, 1, -1 };

// Or you can generate data samples from a function like this
std::vector<TRealType> sourceData = GenerateSamples(
4,
[] (TRealType percent)
{
const TRealType c_frequency = TRealType(1.0);
return cos(percent * c_twoPi * c_frequency);
}
);

// Show the source data
wprintf(L"nSource = [ ");
for (TRealType v : sourceData)
wprintf(L"%f ",v);
wprintf(L"]nn");

// Do a dft and show the results
std::vector<TComplexType> dft = DFTSamples(sourceData);
wprintf(L"dft = [ ");
for (TComplexType v : dft)
wprintf(L"(%f, %f) ", v.real(), v.imag());
wprintf(L"]nn");

// Do an inverse dft of the dft data, and show the results
std::vector<TRealType> idft = IDFTSamples(dft);
wprintf(L"idft = [ ");
for (TRealType v : idft)
wprintf(L"%f ", v);
wprintf(L"]n");

return 0;
}


Explaining how to calculate the frequencies represented by the bins of output of DFT:
How do I obtain the frequencies of each value in an FFT?

Another good explanation of the Fourier transform if it isn’t quite sinking in yet:
An Interactive Guide To The Fourier Transform

Some nice dft calculators that also have inverse dft equivelants:
DFT Calculator 1
IDFT Calculator 1
DFT Calculator 2
IDFT Calculator 2

Wolfram alpha can also do DFT and IDFT, but keep in mind that the formula used there is different and divides the results by 1/sqrt(N) in both DFT and IDFT so will be different values than you will get if you use a different formula.

Wolfram Alpha: Fourier[{1, 0, -1, 0}] = [0,1,0,1]

Wolfram Alpha: Inverse Fourier[{0, 1 , 0, 1}] = [1, 0, -1, 0]

# Fourier Transform (And Inverse) Of Images

I attended SIGGRAPH this year and there were some amazing talks.

There were quite a few talks dealing with the Fourier transform of images and sampling patterns, signal frequencies and bandwidth so I feel compelled to write up a blog post about the Fourier transform and inverse Fourier transform of images, as a transition to some other things that I want to write up.

At the bottom of this post is the source code to the program i used to make the examples. It’s a single CPP file that does not link to any libraries, and does not include any non standard headers (besides windows.h). It should be super simple to copy, paste, compile and use!

## Fourier Transform Overview

The Fourier transform converts data into the frequencies of sine and cosine waves that make up that data. Since we are going to be dealing with sampled data (pixels), we are going to be using the discrete Fourier transform.

After you perform the Fourier transform, you can run the inverse Fourier transform to get the original image back out.

You can also optionally modify the frequency data before running the inverse Fourier transform, which would give you an altered image as output.

In audio, a fourier transform is 1D, while with images, it’s 2D. That slows things down because a 1D Fourier transform is $O(N^2)$ while a 2D Fourier transform is $O(N^4)$.

This is quite an expensive operation as you can see, but there are some things that can mitigate the issue:

• The operation is separable on each axis. AKA only the naive implementation is $O(N^4)$.
• There is something called “The Fast Fourier Transform” which can make a 1D fourier transform go from $O(N^2)$ to $O(N log N)$ time complexity.
• Each pixel of output can be calculated without consideration of the other output pixels. The algorithm only needs read access to the source image or source data. This means that you can run this across however many cores you have on your CPU or GPU.

The items above are true of both the Fourier transform as well as the inverse Fourier transform.

The 2D Fourier transform takes a grid of REAL values as input of size MxN and returns a grid of COMPLEX values as output, also of size MxN.

The inverse 2D Fourier transform takes a grid of COMPLEX values as input, of size MxN and returns a grid of REAL values as output, also of size MxN.

The complex values are (of course!) made up of two components. You can get the amplitude of the frequency represented by the complex value by treating these components as a vector and getting the length. You can get the phase (angle that the frequency starts at) of the frequency by treating it like a vector and getting the angle it represents – like by using atan2(imaginary, real).

For more detailed information about the Fourier transform or the inverse Fourier transform, including the mathematical equations, please see the links at the end of this post!

## Image Examples

I’m going to show some examples of images that have been Fourier transformed, modified, and inverse Fourier transformed back. This should hopefully give you a more intuitive idea of what this stuff is all about.

I’m working with the source images in grey scale so i only have to work with one color channel (more on how to do that here: Blog.Demofox.Org: Converting RGB to Grayscale). You could easily do this same stuff with color images, but you would need to work with each color channel individually.

Zelda Guy

Here is the old man from “The Legend of Zelda” who gives you the sword. This image is 84×80 which takes about 1.75 seconds to do a fourier or inverse fourier transform with my naiive implementation of unoptimized code.

Taking a fourier transform of the greyscale version of that image gives me the following frequency amplitude (first) and phase (second) information. Note that we put the frequency amplitude through a log function to make the lesser represented frequencies show up more visibly.

Note that the center of the image represents frequency 0, aka DC. As you move out from the center, you get to higher and higher frequencies.

If you put that information into the inverse Fourier transform, you get the original image back out (in greyscale of course):

What if we changed the phase information though? Here’s what it looks like if we set all the frequencies to start at phase (angle) 0 instead of the proper angles, and then do an inverse Fourier transform:

It came out to be a completely different image! It has all the right frequencies, but the image is completely unrecognizable due to us messing with the phase data.

Interestingly, while your eyes are good at noticing differences in phase, your ears are not. That means that if this was a sound, instead of an image, you wouldn’t even be able to tell the difference. Strange isn’t it?

Now let’s do a low pass filter to our data. In other words, we are going to zero out the amplitude of all frequencies that are above a certain amount. We are going to zero out the frequencies that are farther than 10% of the image diagonal radius. That makes our frequency information look like this:

If we run the inverse Fourier transform on it, we get this:

The image got blurier because the high frequencies were removed. The high frequencies represent the small details of the image.

We also got some “ringing artifacts” which are the things that look like halos around the old man. This is also due to removing high frequency details. The short explanation for this is that it is very difficult to add sinusoids of different amplitudes and frequencies together to make a flat surface. To do so, you need a lot of small high frequency waves to fill in the areas next to the round hum humps to flatten it out. It’s the same issue you see when trying to make a square wave with additive synthesis, if you’ve read any of my posts on audio synthesis.

Now let’s try a high pass filter, where we remove frequencies that are closer than 10% of the image diagonal radius. First is the frequency amplitude information, and then the resulting image:

The results look pretty strange. These are the high frequency details that the blurry image is missing!

You could use a high pass filter on an image to do edge detection. You could use a low pass filter on an image to remove high frequency details before making the image smaller to prevent the aliasing that can happen when making an image smaller.

Let’s look at some other images that have been given similar treatment.

SIGGRAPH

Here’s a picture of myself at SIGGRAPH with my friend Paul who I used to work with at inXile! The image is 100×133 and takes about 6.5-7 seconds to do a Fourier transform or an inverse Fourier transform.

Here is the Fourier transform and inverse Fourier transform:

Here is the low pass frequency info and inverse Fourier transform:

Here is the high pass:

And here is the zero phase:

Simple Images

Lastly, here are some simple images, along with their frequency magnitude and phases. Sorry that they are so small, but hopefully you get the idea.

Horizontal Stripes:

Horizontal Stripe:

Vertical Stripes:

Vertical Stripe:

Diagonal Stripe:

You might notice that the Fourier transform frequency amplitudes actually run perpendicular to the orientation of the stripes. Look for a post soon which makes use of this property (:

## Example Code

Here is the code I used to generate the examples above.

If you pass this program the name of a 24 bit bmp file, it will generate and save the DFT, and also the inverse DFT to show that the image can survive a round trip. It will also do a low pass filter, high pass filter, and set the phase of all frequencies to zero, saving off both the frequency amplitude information, as well as the image generated from the frequency information for those operations.

The program below is written for clarity, not speed. In particular, the DFT and IDFT code is naively implemented so is O(N^4). To speed it up, it should be threaded, do the work on each axis separately, and also use a fast Fourier transform implementation.

#define _CRT_SECURE_NO_WARNINGS

#include <stdio.h>
#include <stdint.h>
#include <array>
#include <vector>
#include <complex>
#include <windows.h>  // for bitmap headers and performance counter.  Sorry non windows people!

const float c_pi = 3.14159265359f;
const float c_rootTwo = 1.41421356237f;

typedef uint8_t uint8;

struct SProgress
{
SProgress (const char* message, int total) : m_message(message), m_total(total)
{
m_amount = 0;
m_lastPercent = 0;
printf("%s   0%%", message);

QueryPerformanceFrequency(&m_freq);
QueryPerformanceCounter(&m_start);
}

~SProgress ()
{
// make it show 100%
m_amount = m_total;
Update(0);

// show how long it took
LARGE_INTEGER end;
QueryPerformanceCounter(&end);
printf(" (%0.2f seconds)n", seconds);
}

void Update (int delta = 1)
{
m_amount += delta;
int percent = int(100.0f * float(m_amount) / float(m_total));
if (percent <= m_lastPercent)
return;

m_lastPercent = percent;
printf("%c%c%c%c", 8, 8, 8, 8);
if (percent < 100)
printf(" ");
if (percent < 10)
printf(" ");
printf("%i%%", percent);
}

int m_lastPercent;
int m_amount;
int m_total;
const char* m_message;

LARGE_INTEGER m_start;
LARGE_INTEGER m_freq;
};

struct SImageData
{
SImageData ()
: m_width(0)
, m_height(0)
{ }

long m_width;
long m_height;
long m_pitch;
std::vector<uint8> m_pixels;
};

struct SImageDataComplex
{
SImageDataComplex ()
: m_width(0)
, m_height(0)
{ }

long m_width;
long m_height;
std::vector<std::complex<float>> m_pixels;
};

bool LoadImage (const char *fileName, SImageData& imageData)
{
// open the file if we can
FILE *file;
file = fopen(fileName, "rb");
if (!file)
return false;

{
fclose(file);
return false;
}

// read in our pixel data if we can. Note that it's in BGR order, and width is padded to the next power of 4
if (fread(&imageData.m_pixels[0], imageData.m_pixels.size(), 1, file) != 1)
{
fclose(file);
return false;
}

imageData.m_pitch = imageData.m_width*3;
if (imageData.m_pitch & 3)
{
imageData.m_pitch &= ~3;
imageData.m_pitch += 4;
}

fclose(file);
return true;
}

bool SaveImage (const char *fileName, const SImageData &image)
{
// open the file if we can
FILE *file;
file = fopen(fileName, "wb");
if (!file) {
printf("Could not save %sn", fileName);
return false;
}

// write the data and close the file
fclose(file);

printf("%s savedn", fileName);
return true;
}

void ImageToGrey (const SImageData &srcImage, SImageData &destImage)
{
destImage = srcImage;

for (int x = 0; x < srcImage.m_width; ++x)
{
for (int y = 0; y < srcImage.m_height; ++y)
{
const uint8 *src = &srcImage.m_pixels[(y * srcImage.m_pitch) + x * 3];
uint8 *dest = &destImage.m_pixels[(y * destImage.m_pitch) + x * 3];

uint8 grey = uint8((float(src[0]) * 0.3f + float(src[1]) * 0.59f + float(src[2]) * 0.11f));
dest[0] = grey;
dest[1] = grey;
dest[2] = grey;
}
}
}

std::complex<float> DFTPixel (const SImageData &srcImage, int K, int L)
{
std::complex<float> ret(0.0f, 0.0f);

for (int x = 0; x < srcImage.m_width; ++x)
{
for (int y = 0; y < srcImage.m_height; ++y)
{
// Get the pixel value (assuming greyscale) and convert it to [0,1] space
const uint8 *src = &srcImage.m_pixels[(y * srcImage.m_pitch) + x * 3];
float grey = float(src[0]) / 255.0f;

// Add to the sum of the return value
float v = float(K * x) / float(srcImage.m_width);
v += float(L * y) / float(srcImage.m_height);
ret += std::complex<float>(grey, 0.0f) * std::polar<float>(1.0f, -2.0f * c_pi * v);
}
}

return ret;
}

void DFTImage (const SImageData &srcImage, SImageDataComplex &destImage)
{
// NOTE: this function assumes srcImage is greyscale, so works on only the red component of srcImage.
// ImageToGrey() will convert an image to greyscale.

// size the output dft data
destImage.m_width = srcImage.m_width;
destImage.m_height = srcImage.m_height;
destImage.m_pixels.resize(destImage.m_width*destImage.m_height);

SProgress progress("DFT:", srcImage.m_width * srcImage.m_height);

// calculate 2d dft (brute force, not using fast fourier transform)
for (int x = 0; x < srcImage.m_width; ++x)
{
for (int y = 0; y < srcImage.m_height; ++y)
{
// calculate DFT for that pixel / frequency
destImage.m_pixels[y * destImage.m_width + x] = DFTPixel(srcImage, x, y);

// update progress
progress.Update();
}
}
}

uint8 InverseDFTPixel (const SImageDataComplex &srcImage, int K, int L)
{
std::complex<float> total(0.0f, 0.0f);
for (int x = 0; x < srcImage.m_width; ++x)
{
for (int y = 0; y < srcImage.m_height; ++y)
{
// Get the pixel value
const std::complex<float> &src = srcImage.m_pixels[(y * srcImage.m_width) + x];

// Add to the sum of the return value
float v = float(K * x) / float(srcImage.m_width);
v += float(L * y) / float(srcImage.m_height);
std::complex<float> result = src * std::polar<float>(1.0f, 2.0f * c_pi * v);

// sum up the results
total += result;
}
}

float idft = std::abs(total) / float(srcImage.m_width*srcImage.m_height);

// make sure the values are in range
if (idft < 0.0f)
idft = 0.0f;
if (idft > 1.0f)
idft = 1.0;

return uint8(idft * 255.0f);
}

void InverseDFTImage (const SImageDataComplex &srcImage, SImageData &destImage)
{
// size the output image
destImage.m_width = srcImage.m_width;
destImage.m_height = srcImage.m_height;
destImage.m_pitch = srcImage.m_width * 3;
if (destImage.m_pitch & 3)
{
destImage.m_pitch &= ~3;
destImage.m_pitch += 4;
}
destImage.m_pixels.resize(destImage.m_pitch*destImage.m_height);

SProgress progress("Inverse DFT:", srcImage.m_width*srcImage.m_height);

// calculate inverse 2d dft (brute force, not using fast fourier transform)
for (int x = 0; x < srcImage.m_width; ++x)
{
for (int y = 0; y < srcImage.m_height; ++y)
{
// calculate DFT for that pixel / frequency
uint8 idft = InverseDFTPixel(srcImage, x, y);
uint8* dest = &destImage.m_pixels[y*destImage.m_pitch + x * 3];
dest[0] = idft;
dest[1] = idft;
dest[2] = idft;

// update progress
progress.Update();
}
}
}

void GetMagnitudeData (const SImageDataComplex& srcImage, SImageData& destImage)
{
// size the output image
destImage.m_width = srcImage.m_width;
destImage.m_height = srcImage.m_height;
destImage.m_pitch = srcImage.m_width * 3;
if (destImage.m_pitch & 3)
{
destImage.m_pitch &= ~3;
destImage.m_pitch += 4;
}
destImage.m_pixels.resize(destImage.m_pitch*destImage.m_height);

// get floating point magnitude data
std::vector<float> magArray;
magArray.resize(srcImage.m_width*srcImage.m_height);
float maxmag = 0.0f;
for (int x = 0; x < srcImage.m_width; ++x)
{
for (int y = 0; y < srcImage.m_height; ++y)
{
// Offset the information by half width & height in the positive direction.
// This makes frequency 0 (DC) be at the image origin, like most diagrams show it.
int k = (x + srcImage.m_width / 2) % srcImage.m_width;
int l = (y + srcImage.m_height / 2) % srcImage.m_height;
const std::complex<float> &src = srcImage.m_pixels[l*srcImage.m_width + k];

float mag = std::abs(src);
if (mag > maxmag)
maxmag = mag;

magArray[y*srcImage.m_width + x] = mag;
}
}
if (maxmag == 0.0f)
maxmag = 1.0f;

const float c = 255.0f / log(1.0f+maxmag);

// normalize the magnitude data and send it back in [0, 255]
for (int x = 0; x < srcImage.m_width; ++x)
{
for (int y = 0; y < srcImage.m_height; ++y)
{
float src = c * log(1.0f + magArray[y*srcImage.m_width + x]);

uint8 magu8 = uint8(src);

uint8* dest = &destImage.m_pixels[y*destImage.m_pitch + x * 3];
dest[0] = magu8;
dest[1] = magu8;
dest[2] = magu8;
}
}
}

void GetPhaseData (const SImageDataComplex& srcImage, SImageData& destImage)
{
// size the output image
destImage.m_width = srcImage.m_width;
destImage.m_height = srcImage.m_height;
destImage.m_pitch = srcImage.m_width * 3;
if (destImage.m_pitch & 3)
{
destImage.m_pitch &= ~3;
destImage.m_pitch += 4;
}
destImage.m_pixels.resize(destImage.m_pitch*destImage.m_height);

// get floating point phase data, and encode it in [0,255]
for (int x = 0; x < srcImage.m_width; ++x)
{
for (int y = 0; y < srcImage.m_height; ++y)
{
// Offset the information by half width & height in the positive direction.
// This makes frequency 0 (DC) be at the image origin, like most diagrams show it.
int k = (x + srcImage.m_width / 2) % srcImage.m_width;
int l = (y + srcImage.m_height / 2) % srcImage.m_height;
const std::complex<float> &src = srcImage.m_pixels[l*srcImage.m_width + k];

// get phase, and change it from [-pi,+pi] to [0,255]
float phase = (0.5f + 0.5f * std::atan2(src.real(), src.imag()) / c_pi);
if (phase < 0.0f)
phase = 0.0f;
if (phase > 1.0f)
phase = 1.0;
uint8 phase255 = uint8(phase * 255);

// write the phase as grey scale color
uint8* dest = &destImage.m_pixels[y*destImage.m_pitch + x * 3];
dest[0] = phase255;
dest[1] = phase255;
dest[2] = phase255;
}
}
}

int main (int argc, char **argv)
{
float scale = 1.0f;
int filter = 0;

bool showUsage = argc < 2;
char *srcFileName = argv[1];

if (showUsage)
{
printf("Usage: <source>nn");
return 1;
}

// trim off file extension from source filename so we can make our other file names
char baseFileName[1024];
strcpy(baseFileName, srcFileName);
for (int i = strlen(baseFileName) - 1; i >= 0; --i)
{
if (baseFileName[i] == '.')
{
baseFileName[i] = 0;
break;
}
}

// Load source image if we can
SImageData srcImage;
{
printf("%s loaded (%i x %i)n", srcFileName, srcImage.m_width, srcImage.m_height);

// do DFT on a greyscale version of the image, instead of doing it per color channel
SImageData greyImage;
ImageToGrey(srcImage, greyImage);
SImageDataComplex frequencyData;
DFTImage(greyImage, frequencyData);

// save magnitude information
{
char outFileName[1024];
strcpy(outFileName, baseFileName);
strcat(outFileName, ".raw.mag.bmp");

SImageData destImage;
GetMagnitudeData(frequencyData, destImage);
SaveImage(outFileName, destImage);
}

// save phase information
{
char outFileName[1024];
strcpy(outFileName, baseFileName);
strcat(outFileName, ".raw.phase.bmp");

SImageData destImage;
GetPhaseData(frequencyData, destImage);
SaveImage(outFileName, destImage);
}

// inverse dft the modified frequency and save the result
{
char outFileName[1024];
strcpy(outFileName, baseFileName);
strcat(outFileName, ".raw.idft.bmp");

SImageData modifiedImage;
InverseDFTImage(frequencyData, modifiedImage);
SaveImage(outFileName, modifiedImage);
}

// Low Pass Filter: Remove high frequencies, write out frequency magnitudes, write out inverse dft
{
printf("n=====LPF=====n");

// remove frequencies that are too far from frequency 0.
// Note that even though our output frequency images have frequency 0 (DC) in the center, that
// isn't actually how it's stored in our SImageDataComplex structure.  Pixel (0,0) is frequency 0.
SImageDataComplex dft = frequencyData;
float halfWidth = float(dft.m_width / 2);
float halfHeight = float(dft.m_height / 2);
for (int x = 0; x < dft.m_width; ++x)
{
for (int y = 0; y < dft.m_height; ++y)
{
float relX = 0.0f;
float relY = 0.0f;

if (x < halfWidth)
relX = float(x) / halfWidth;
else
relX = (float(x) - float(dft.m_width)) / halfWidth;

if (y < halfHeight)
relY = float(y) / halfHeight;
else
relY = (float(y) - float(dft.m_height)) / halfHeight;

float dist = sqrt(relX*relX + relY*relY) / c_rootTwo; // divided by root 2 so our distance is from 0 to 1
if (dist > 0.1f)
dft.m_pixels[y*dft.m_width + x] = std::complex<float>(0.0f, 0.0f);
}
}

// write dft magnitude data
char outFileName[1024];
strcpy(outFileName, baseFileName);
strcat(outFileName, ".lpf.mag.bmp");
SImageData destImage;
GetMagnitudeData(dft, destImage);
SaveImage(outFileName, destImage);

// inverse dft and save the image
strcpy(outFileName, baseFileName);
strcat(outFileName, ".lpf.idft.bmp");
SImageData modifiedImage;
InverseDFTImage(dft, modifiedImage);
SaveImage(outFileName, modifiedImage);
}

// High Pass Filter: Remove low frequencies, write out frequency magnitudes, write out inverse dft
{
printf("n=====HPF=====n");

// remove frequencies that are too close to frequency 0.
// Note that even though our output frequency images have frequency 0 (DC) in the center, that
// isn't actually how it's stored in our SImageDataComplex structure.  Pixel (0,0) is frequency 0.
SImageDataComplex dft = frequencyData;
float halfWidth = float(dft.m_width / 2);
float halfHeight = float(dft.m_height / 2);
for (int x = 0; x < dft.m_width; ++x)
{
for (int y = 0; y < dft.m_height; ++y)
{
float relX = 0.0f;
float relY = 0.0f;

if (x < halfWidth)
relX = float(x) / halfWidth;
else
relX = (float(x) - float(dft.m_width)) / halfWidth;

if (y < halfHeight)
relY = float(y) / halfHeight;
else
relY = (float(y) - float(dft.m_height)) / halfHeight;

float dist = sqrt(relX*relX + relY*relY) / c_rootTwo; // divided by root 2 so our distance is from 0 to 1
if (dist < 0.1f)
dft.m_pixels[y*dft.m_width + x] = std::complex<float>(0.0f, 0.0f);
}
}

// write dft magnitude data
char outFileName[1024];
strcpy(outFileName, baseFileName);
strcat(outFileName, ".hpf.mag.bmp");
SImageData destImage;
GetMagnitudeData(dft, destImage);
SaveImage(outFileName, destImage);

// inverse dft and save the image
strcpy(outFileName, baseFileName);
strcat(outFileName, ".hpf.idft.bmp");
SImageData modifiedImage;
InverseDFTImage(dft, modifiedImage);
SaveImage(outFileName, modifiedImage);
}

// ZeroPhase
{
printf("n=====Zero Phase=====n");

// Set phase to zero for all frequencies.
// Note that even though our output frequency images have frequency 0 (DC) in the center, that
// isn't actually how it's stored in our SImageDataComplex structure.  Pixel (0,0) is frequency 0.
SImageDataComplex dft = frequencyData;
float halfWidth = float(dft.m_width / 2);
float halfHeight = float(dft.m_height / 2);
for (int x = 0; x < dft.m_width; ++x)
{
for (int y = 0; y < dft.m_height; ++y)
{
std::complex<float>& v = dft.m_pixels[y*dft.m_width + x];
float mag = std::abs(v);
v = std::complex<float>(mag, 0.0f);
}
}

// write dft magnitude data
char outFileName[1024];
strcpy(outFileName, baseFileName);
strcat(outFileName, ".phase0.mag.bmp");
SImageData destImage;
GetMagnitudeData(dft, destImage);
SaveImage(outFileName, destImage);

// inverse dft and save the image
strcpy(outFileName, baseFileName);
strcat(outFileName, ".phase0.idft.bmp");
SImageData modifiedImage;
InverseDFTImage(dft, modifiedImage);
SaveImage(outFileName, modifiedImage);
}
}
else
printf("could not read 24 bit bmp file %snn", srcFileName);

return 0;
}


Here are some links that I found useful:
Fourier Transform
http://www.thefouriertransform.com/
Introduction To Fourier Transforms For Image Processing

# Intro To Audio Synthesis For Music Presentation

Today I gave a presentation at work on the basics of audio synthesis for music. It seemed to go fairly well and I was surprised to hear that so many others also dabbled in audio synth and music.

The slide deck and example C++ program (uses portaudio) are both up on github here:
https://github.com/Atrix256/MusicSynth

Questions, feedback, etc? Drop me a line (:

# The Beating Effect

This post is going to be a pretty short one, so here it is 😛

# The Beating Effect

The beating effect occurs when you play two similar frequencies of sound at the same time.

Because playing two sounds at once means adding them together, and due to the fact that sound waves are made up of positive and negative values (aka positive and negative pressures), the sounds playing at different frequencies will sometimes add peaks and troughs together to get louder, and other times the peak of one wave will add to the valley of another wave and the result will be quieter. This is know as constructive and destructive interference respectively.

The end result is that the sound will have a pulsing quality to it, like a tremolo effect. If one sound is played at frequency F1 and the other sound is played at frequency F2, the pulsing sound will occur at 2*(F2-F1) times a second.

Here’s a demo of this in action where the first sound is 200hz, the second sound is 205hz, and the result of them played together has a 10hz tremolo effect!

monobeat.wav

# Binaural Beat

The beating effect happens when sound waves physically mix together.

Believe it or not though, there is a part of your brain where it mixes (adds) the sounds from each ear together as well.

That means that if you play similar frequencies to different ears, they will mix INSIDE YOUR BRAIN, and you will perceive a different kind of beating effect.

Below is a demo of that. You might need a pair of headphones to get the full effect.

stereobeat.wav

If you think this is pretty neat, you might want to google “psycho-acoustics” for more stuff like this (:

## Source Code

Here’s the C++ code that generated these sound files.

#include <stdio.h>
#include <memory.h>
#include <inttypes.h>
#include <vector>

// constants
const float c_pi = 3.14159265359f;
const float c_twoPi = 2.0f * c_pi;

// typedefs
typedef uint16_t    uint16;
typedef uint32_t    uint32;
typedef int16_t     int16;
typedef int32_t     int32;

//this struct is the minimal required header data for a wav file
{
//the main chunk
unsigned char m_chunkID[4];
uint32		  m_chunkSize;
unsigned char m_format[4];

//sub chunk 1 "fmt "
unsigned char m_subChunk1ID[4];
uint32		  m_subChunk1Size;
uint16		  m_audioFormat;
uint16		  m_numChannels;
uint32		  m_sampleRate;
uint32		  m_byteRate;
uint16		  m_blockAlign;
uint16		  m_bitsPerSample;

//sub chunk 2 "data"
unsigned char m_subChunk2ID[4];
uint32		  m_subChunk2Size;

//then comes the data!
};

//this writes
template <typename T>
bool WriteWaveFile(const char *fileName, std::vector<T> data, int16 numChannels, int32 sampleRate)
{
int32 dataSize = data.size() * sizeof(T);
int32 bitsPerSample = sizeof(T) * 8;

//open the file if we can
FILE *File = nullptr;
fopen_s(&File, fileName, "w+b");
if (!File)
return false;

//fill out the main chunk

//fill out sub chunk 1 "fmt "
waveHeader.m_byteRate = sampleRate * numChannels * bitsPerSample / 8;
waveHeader.m_blockAlign = numChannels * bitsPerSample / 8;

//fill out sub chunk 2 "data"

//write the wave data itself
fwrite(&data[0], dataSize, 1, File);

//close the file and return success
fclose(File);
return true;
}

template <typename T>
void ConvertFloatSamples (const std::vector<float>& in, std::vector<T>& out)
{
// make our out samples the right size
out.resize(in.size());

// convert in format to out format !
for (size_t i = 0, c = in.size(); i < c; ++i)
{
float v = in[i];
if (v < 0.0f)
v *= -float(std::numeric_limits<T>::lowest());
else
v *= float(std::numeric_limits<T>::max());
out[i] = T(v);
}
}

void GenerateMonoBeatingSamples (std::vector<float>& samples, int sampleRate)
{
int sectionLength = samples.size() / 3;
int envelopeLen = sampleRate / 20;

for (int index = 0, numSamples = samples.size(); index < numSamples; ++index)
{
samples[index] = 0.0f;
int section = index / sectionLength;
int sectionOffset = index % sectionLength;

// apply an envelope at front and back  of each section keep it from popping between sounds
float envelope = 1.0f;
if (sectionOffset < envelopeLen)
envelope = float(sectionOffset) / float(envelopeLen);
else if (sectionOffset > sectionLength - envelopeLen)
envelope = (float(sectionLength) - float(sectionOffset)) / float(envelopeLen);

// first sound
if (section == 0 || section == 2)
samples[index] += sin(float(index) * c_twoPi * 200.0f / float(sampleRate)) * envelope;

// second sound
if (section == 1 || section == 2)
samples[index] += sin(float(index) * c_twoPi * 205.0f / float(sampleRate)) * envelope;

// scale it to prevent clipping
if (section == 2)
samples[index] *= 0.5f;
samples[index] *= 0.95f;
}
}

void GenerateStereoBeatingSamples (std::vector<float>& samples, int sampleRate)
{
int sectionLength = (samples.size() / 2) / 3;
int envelopeLen = sampleRate / 20;

for (int index = 0, numSamples = samples.size() / 2; index < numSamples; ++index)
{
samples[index * 2] = 0.0f;
samples[index * 2 + 1] = 0.0f;

int section = index / sectionLength;
int sectionOffset = index % sectionLength;

// apply an envelope at front and back  of each section keep it from popping between sounds
float envelope = 1.0f;
if (sectionOffset < envelopeLen)
envelope = float(sectionOffset) / float(envelopeLen);
else if (sectionOffset > sectionLength - envelopeLen)
envelope = (float(sectionLength) - float(sectionOffset)) / float(envelopeLen);
envelope *= 0.95f;

// first sound
if (section == 0 || section == 2)
samples[index * 2] += sin(float(index) * c_twoPi * 200.0f / float(sampleRate)) * envelope;

// second sound
if (section == 1 || section == 2)
samples[index * 2 + 1] += sin(float(index) * c_twoPi * 205.0f / float(sampleRate)) * envelope;
}
}

//the entry point of our application
int main(int argc, char **argv)
{
// generate the mono beating effect
{
// sound format parameters
const int c_sampleRate = 44100;
const int c_numSeconds = 4;
const int c_numChannels = 1;
const int c_numSamples = c_sampleRate * c_numChannels * c_numSeconds;

// make space for our samples
std::vector<float> samples;
samples.resize(c_numSamples);

// generate samples
GenerateMonoBeatingSamples(samples, c_sampleRate);

// convert from float to the final format
std::vector<int32> samplesInt;
ConvertFloatSamples(samples, samplesInt);

// write our samples to a wave file
WriteWaveFile("monobeat.wav", samplesInt, c_numChannels, c_sampleRate);
}

// generate the stereo beating effect (binaural beat)
{
// sound format parameters
const int c_sampleRate = 44100;
const int c_numSeconds = 4;
const int c_numChannels = 2;
const int c_numSamples = c_sampleRate * c_numChannels * c_numSeconds;

// make space for our samples
std::vector<float> samples;
samples.resize(c_numSamples);

// generate samples
GenerateStereoBeatingSamples(samples, c_sampleRate);

// convert from float to the final format
std::vector<int32> samplesInt;
ConvertFloatSamples(samples, samplesInt);

// write our samples to a wave file
WriteWaveFile("stereobeat.wav", samplesInt, c_numChannels, c_sampleRate);
}
}


For a more mathematical explanation of these, check out wikipedia (:
Wikipedia: Beat (acoustics)

Wikipedia: Binaural beats

# Synthesizing a Plucked String Sound With the Karplus-Strong Algorithm

If you are looking to synthesize the sound of a plucked string, there is an amazingly simple algorithm for doing so called the Karplus-Strong Algorithm.

Give it a listen: KarplusStrong.wav
Here it is with flange and reverb effects applied: KPFlangeReverb.wav

It works like this:

1. Fill a circular buffer with static (random numbers)
2. Play the contents of the circular buffer over and over
3. Each time you play a sample, replace that sample with the average of itself and the next sample in the buffer. Also multiplying that average by a feedback value (like say, 0.996)

Amazingly, that is all there is to it!

## Why Does That Work?!

The reason this works is that it is actually very similar to how a real guitar string pluck works.

When you pluck a guitar string, if you had a perfect pluck at the perfect location with the perfect transfer of energy, you’d get a note that was “perfect”. It wouldn’t be a pure sine wave since strings have harmonics (integer multiple frequencies) beyond their basic tuning, but it would be a pure note.

In reality, that isn’t what happens, so immediately after plucking the string, there is a lot of vibrations in there that “don’t belong” due to the imperfect pluck. Since the string is tuned, it wants to be vibrating a specific way, so over time the vibrations evolve from the imperfect pluck vibrations to the tuning of the guitar string. As you average the samples together, you are removing the higher frequency noise/imperfections. Averaging is a crude low pass filter. This makes it converge to the right frequencies over time.

It’s also important to note that with a real stringed instrument, when you play a note, the high frequencies disappear before the low frequencies. This averaging / low pass filter makes that happen as well and is part of what helps it sound so realistic.

Also while all that is going on, the energy in the string is being diminished as it becomes heat and sound waves and such, so the noise gets quieter over time. When you multiply the values by a feedback value which is less than 1, you are simulating this loss of energy by making the values get smaller over time.

## Tuning The Note

This wasn’t intuitive for me at first, but the frequency that the note plays at is determined ENTIRELY by the size of the circular buffer.

If your audio has a sample rate of 44100hz (44100 samples played a second), and you use this algorithm with a buffer size of 200 samples, that means that the note synthesized will be 220.5hz. This is because 44100/200 = 220.5.

Thinking about the math from another direction, we can figure out what our buffer size needs to be for a specific frequency. If our sample rate is 44100hz and we want to play a note at 440hz, that means we need a buffer size of 100.23 samples. This is because 44100/440 = 100.23. Since we can’t have a fractional number of samples, we can just round to 100.

You can actually deal with the fractional buffer size by stepping through the ring buffer in non integer steps and using the fraction to interpolate audio samples, but I’ll leave that as an exercise for you if you want that perfectly tuned note. IMO leaving it slightly off could actually be a good thing. What guitar is ever perfectly in tune, right?! With it being slightly out of tune, it’s more likely to make more realistic sounds and sound interactions when paired with other instruments.

You are probably wondering like I was, why the buffer size affects the frequency of the note. The reason for this is actually pretty simple and intuitive after all.

The reason is because the definition of frequency is just how many times a wave form repeats per second. The wave form could be a sine wave, a square wave, a triangle wave, or it could be something more complex, but frequency is always the number of repetitions per second. If you think about our ring buffer as being a wave form, you can now see that if we have a buffer size of 200 samples, and a sample rate of 44100hz, when we play that buffer continually, it’s going to play back 220.5 times every second, which means it will play with a frequency of 220.5!

Sure, we modify the buffer (and waveform) as we play it, but the modifications are small, so the waveform is similar from play to play.

## Some More Details

I’ve found that this algorithm doesn’t work as well with low frequency notes as it does with high frequency notes.

They say you can prime the buffer with a saw tooth wave (or other wave forms) instead of static (noise). While it still “kind of works”, in my experimentation, it didn’t work out that well.

You could try using other low pass filters to see if that affects the quality of the note generated. The simple averaging method works so well, I didn’t explore alternative options very much.

Kmm on hacker news commented that averaging the current sample with the last and next, instead of just the next had the benefit that the wave form didn’t move forward half a step each play through and that there is an audible difference between the techniques. I gave it a try and sure enough, there is an audible difference, the sound is less harsh on the ears. I believe this is so because averaging 3 samples instead of 2 is a stronger low pass filter, so gets rid of higher frequencies faster.

## Example Code

Here is the C++ code that generated the sample at the top of the post. Now that you can generate plucked string sounds, you can add some distortion, flange, reverb, etc and make some sweet (synthesized) metal without having to learn to play guitar and build up finger calluses 😛

#include <stdio.h>
#include <memory.h>
#include <inttypes.h>
#include <vector>

// constants
const float c_pi = 3.14159265359f;
const float c_twoPi = 2.0f * c_pi;

// typedefs
typedef uint16_t    uint16;
typedef uint32_t    uint32;
typedef int16_t     int16;
typedef int32_t     int32;

//this struct is the minimal required header data for a wav file
{
//the main chunk
unsigned char m_chunkID[4];
uint32		  m_chunkSize;
unsigned char m_format[4];

//sub chunk 1 "fmt "
unsigned char m_subChunk1ID[4];
uint32		  m_subChunk1Size;
uint16		  m_audioFormat;
uint16		  m_numChannels;
uint32		  m_sampleRate;
uint32		  m_byteRate;
uint16		  m_blockAlign;
uint16		  m_bitsPerSample;

//sub chunk 2 "data"
unsigned char m_subChunk2ID[4];
uint32		  m_subChunk2Size;

//then comes the data!
};

//this writes
template <typename T>
bool WriteWaveFile(const char *fileName, std::vector<T> data, int16 numChannels, int32 sampleRate)
{
int32 dataSize = data.size() * sizeof(T);
int32 bitsPerSample = sizeof(T) * 8;

//open the file if we can
FILE *File = nullptr;
fopen_s(&File, fileName, "w+b");
if (!File)
return false;

//fill out the main chunk

//fill out sub chunk 1 "fmt "
waveHeader.m_byteRate = sampleRate * numChannels * bitsPerSample / 8;
waveHeader.m_blockAlign = numChannels * bitsPerSample / 8;

//fill out sub chunk 2 "data"

//write the wave data itself
fwrite(&data[0], dataSize, 1, File);

//close the file and return success
fclose(File);
return true;
}

template <typename T>
void ConvertFloatSamples (const std::vector<float>& in, std::vector<T>& out)
{
// make our out samples the right size
out.resize(in.size());

// convert in format to out format !
for (size_t i = 0, c = in.size(); i < c; ++i)
{
float v = in[i];
if (v < 0.0f)
v *= -float(std::numeric_limits<T>::lowest());
else
v *= float(std::numeric_limits<T>::max());
out[i] = T(v);
}
}
//calculate the frequency of the specified note.
//fractional notes allowed!
float CalcFrequency(float octave, float note)
/*
Calculate the frequency of any note!
frequency = 440×(2^(n/12))

N=0 is A4
N=1 is A#4
etc...

notes go like so...
0  = A
1  = A#
2  = B
3  = C
4  = C#
5  = D
6  = D#
7  = E
8  = F
9  = F#
10 = G
11 = G#
*/
{
return (float)(440 * pow(2.0, ((double)((octave - 4) * 12 + note)) / 12.0));
}

class CKarplusStrongStringPluck
{
public:
CKarplusStrongStringPluck (float frequency, float sampleRate, float feedback)
{
m_buffer.resize(uint32(float(sampleRate) / frequency));
for (size_t i = 0, c = m_buffer.size(); i < c; ++i) {
m_buffer[i] = ((float)rand()) / ((float)RAND_MAX) * 2.0f - 1.0f;  // noise
//m_buffer[i] = float(i) / float(c); // saw wave
}
m_index = 0;
m_feedback = feedback;
}

float GenerateSample ()
{
// get our sample to return
float ret = m_buffer[m_index];

// low pass filter (average) some samples
float value = (m_buffer[m_index] + m_buffer[(m_index + 1) % m_buffer.size()]) * 0.5f * m_feedback;
m_buffer[m_index] = value;

// move to the next sample
m_index = (m_index + 1) % m_buffer.size();

// return the sample from the buffer
return ret;
}

private:
std::vector<float>  m_buffer;
size_t              m_index;
float               m_feedback;
};

void GenerateSamples (std::vector<float>& samples, int sampleRate)
{
std::vector<CKarplusStrongStringPluck> notes;

enum ESongMode {
e_twinkleTwinkle,
e_strum
};

int timeBegin = 0;
ESongMode mode = e_twinkleTwinkle;
for (int index = 0, numSamples = samples.size(); index < numSamples; ++index)
{
switch (mode) {
case e_twinkleTwinkle: {
const int c_noteTime = sampleRate / 2;
int time = index - timeBegin;
// if we should start a new note
if (time % c_noteTime == 0) {
int note = time / c_noteTime;
switch (note) {
case 0:
case 1: {
notes.push_back(CKarplusStrongStringPluck(CalcFrequency(3, 0), float(sampleRate), 0.996f));
break;
}
case 2:
case 3: {
notes.push_back(CKarplusStrongStringPluck(CalcFrequency(3, 7), float(sampleRate), 0.996f));
break;
}
case 4:
case 5: {
notes.push_back(CKarplusStrongStringPluck(CalcFrequency(3, 9), float(sampleRate), 0.996f));
break;
}
case 6: {
notes.push_back(CKarplusStrongStringPluck(CalcFrequency(3, 7), float(sampleRate), 0.996f));
break;
}
case 7: {
mode = e_strum;
timeBegin = index+1;
break;
}
}
}
break;
}
case e_strum: {
const int c_noteTime = sampleRate / 32;
int time = index - timeBegin - sampleRate;
// if we should start a new note
if (time % c_noteTime == 0) {
int note = time / c_noteTime;
switch (note) {
case 0: notes.push_back(CKarplusStrongStringPluck(55.0f, float(sampleRate), 0.996f)); break;
case 1: notes.push_back(CKarplusStrongStringPluck(55.0f + 110.0f, float(sampleRate), 0.996f)); break;
case 2: notes.push_back(CKarplusStrongStringPluck(55.0f + 220.0f, float(sampleRate), 0.996f)); break;
case 3: notes.push_back(CKarplusStrongStringPluck(55.0f + 330.0f, float(sampleRate), 0.996f)); break;
case 4: mode = e_strum; timeBegin = index + 1; break;
}
}
break;
}
}

// generate and mix our samples from our notes
samples[index] = 0;
for (CKarplusStrongStringPluck& note : notes)
samples[index] += note.GenerateSample();

// to keep from clipping
samples[index] *= 0.5f;
}
}

//the entry point of our application
int main(int argc, char **argv)
{
// sound format parameters
const int c_sampleRate = 44100;
const int c_numSeconds = 9;
const int c_numChannels = 1;
const int c_numSamples = c_sampleRate * c_numChannels * c_numSeconds;

// make space for our samples
std::vector<float> samples;
samples.resize(c_numSamples);

// generate samples
GenerateSamples(samples, c_sampleRate);

// convert from float to the final format
std::vector<int32> samplesInt;
ConvertFloatSamples(samples, samplesInt);

// write our samples to a wave file
WriteWaveFile("out.wav", samplesInt, c_numChannels, c_sampleRate);
}


Hacker News Discussion (This got up to topic #7, woo!)

Wikipedia: Karplus-Strong String Synthesis

Princeton COS 126: Plucking a Guitar String

Shadertoy: Karplus-Strong Variation (Audio) – I tried to make a bufferless Karplus-Strong implementation on shadertoy. It didn’t quite work out but is still a bit interesting.

# Who Cares About Dynamic Array Growth Strategies?

Let’s say that you’ve allocated an array of 20 integers and have used them all. Now it’s time to allocate more, but you aren’t quite sure how many integers you will need in the end. What do you do?

Realloc is probably what you think of first for solving this problem, but let’s ignore that option for the moment. (If you haven’t used realloc before, give this a read! Alloca and Realloc – Useful Tools, Not Ancient Relics)

Without realloc you are left with allocating a new buffer of memory, copying the old buffer to the new buffer, and then freeing the old buffer.

The question remains though, how much memory should you allocate for this new, larger buffer?

You could double your current buffer size whenever you ran out of space. This would mean that as the buffer grew over time, you would do fewer allocations but would have more wasted (allocated but unused) memory.

You could also go the other way and just add 10 more int’s every time you ran out of space. This would mean that you would do a larger number of allocations (more CPU usage, possibly more fragmentation), but you’d end up with less wasted space.

Either way, it obviously depends entirely on usage patterns and it’s all subjective and situational.

…Or is it?

## A Surprising Reason For Caring

Believe it or not, growth strategies can make a huge difference. Below we will explore the difference between the seemingly arbitrary rules of making a buffer twice as big, or 1.5 times as big.

Let’s say that we have a bunch of free memory starting at address 0. Let’s analyze what happens as we resize arrays in each scenario.

2x Buffer Size

First let’s see what happens when we double a buffer’s size when it gets full.

We start by allocating 16 bytes. The allocator gives us address 0 for our pointer.

When the buffer gets full, we allocate 32 bytes (at address 16), copy the 16 bytes into it and then free our first 16 byte buffer.

When that buffer gets full, we allocate 64 bytes (at address 48), copy the 32 bytes into it and then free our 32 byte buffer.

Lastly, that buffer gets full, so we allocate 128 bytes (at address 112), copy the 64 bytes into it and then free our 64 byte buffer.

As you can see, doubling the buffer size causes our pointer to keep moving further down in address space, and a free piece of memory before it will never be large enough to hold a future allocation.

1.5x Buffer Size

Let’s see what happens when we make a buffer 1.5x as large when it gets full.

We start by allocating 16 bytes. The allocator gives us address 0 for our pointer.

When the buffer gets full, we allocate 24 bytes (at address 16), copy the 16 bytes into it and then free our first 16 byte buffer.

When that buffer gets full, we allocate 36 bytes (at address 40), copy the 24 bytes into it and free the 24 byte buffer.

When that buffer gets full, we allocate 54 bytes (at address 76), copy the 36 bytes into it and free the 36 byte buffer.

When that buffer gets full, we allocate 81 bytes (at address 130), copy the 54 bytes into it and free the 54 byte buffer.

Lastly, when that buffer gets full, we need to allocate 122 bytes (we rounded it up). In this case, there is 130 bytes of unused memory starting at address 0, so we can just allocate 122 of those bytes, copy our 81 bytes into it and free the 81 byte buffer.

Our allocations have folded back into themselves. Our pattern of resizing hasn’t created an ever moving / ever growing memory fragmentation monster, unlike the buffer size doubling, which has!

## Small Print

The above does decrease memory fragmentation, by encouraging an allocation to tend to stay in one spot in memory, but it comes at a cost. That cost is that since it’s allocating less extra memory when it runs out, that you will end up having to do more allocations to reach the same level of memory usage.

That might be a benefit though, depending on your specific needs. Another way of looking at that is that you will end up with fewer bytes of wasted memory. By wasted memory I mean allocated bytes which are not actually used to store anything.

## Realloc Makes This Moot Right?

You may be thinking “well if I use realloc, I don’t need to care about this right?”

That isn’t exactly true. If realloc is unable to give you more memory at the current pointer location, it will allocate a new buffer, copy the old data to the new buffer, free the old buffer, and return you the pointer to the new buffer. This is exactly the case that happens when you don’t use realloc.

Using the above growth strategy with realloc makes realloc work even better. It’s a good thing!

Caveat: exotic allocator behavior may not actually benefit from using this strategy with realloc, so have a look for yourself if you are in doubt!

Here’s a discussion on the topic:
What is the ideal growth rate for a dynamically allocated array?

From the link above, apparently the ideal factor to use when upsizing a buffer in general (when worrying about fragmentation like this), is the golden ratio 1.618. Weird, huh?

Thanks to Tom for mentioning this concept. Pretty interesting and surprising IMO.

# Shamir’s Quest: Collect Any 3 Keys To Unlock The Secret!

This post is on something called Shamir’s Secret Sharing. It’s a technique where you can break a secret number up into $M$ different pieces, where if you have any $N$ of those $M$ pieces, you are able to figure out the secret.

Thinking of it in video game terms, imagine there are 10 keys hidden in a level, but you can escape the level whenever you find any 7 of them. This is what Shamir’s Secret Sharing enables you to set up cryptographically.

Interestingly in this case, the term sharing in “secret sharing” doesn’t mean sharing the secret with others. It means breaking the secret up into pieces, or SHARES. Secret sharing means that you make shares out of a secret, such that if you have enough of the shares, you can recover the secret.

## How Do You Share (Split) The Secret?

The basic idea of how it works is actually really simple. This is good for us trying to learn the technique, but also good to show it’s security since there are so few moving parts.

It relies on something called the Unisolvence Theorem which is a fancy label meaning these things:

• If you have a linear equation, it takes two (x,y) points to uniquely identify that line. No matter how you write a linear equation, if it passes through those same two points, it’s mathematically equivelant.
• If you have a quadratic equation, it takes three (x,y) points to uniquely identify that quadratic curve. Again, no matter how you write a quadratic equation, if it passes through those same three points, it’s mathematically equivalent.
• The pattern continues for equations of any degree. Cubic equations require four points to be uniquely identified, Quartic equations require five points, and so on.

At a high level, how this technique works is that the number of shares (keys) you want someone to collect ($N$) defines the degree of an equation.

You use random numbers as the coefficients of the powers of $x$ in that equation, but use your secret number as the constant term.

You then create $M$ data points of the form $(x,y)$ aka $(x,f(x))$. Those are your shares. You then give individual shares to people, or go hide them in your dungeon or do whatever you are going to do with them.

As soon as any one person has $N$ of those $M$ shares (data points), they will be able to figure out the equation of the curve and thus get the secret.

The secret number is the constant term of the polynomial, which is also just $f(0)$.

This image below from wikipedia is great for seeing how you may have two points of a cubic curve, but without a third point you can’t be sure what the quadratic equation is. In fact, there are an infinite number of quadratic curves that pass through any two points! Because of that, it takes the full number of required shares for you to be able to unlock the secret.

## Example: Sharing (Splitting) The Secret

First you decide how many shares you want it to take to unlock the secret. This determines the degree of your equation.

Let’s say you wanted a person to have to have four shares to unlock the secret. This means our equation will be a cubic equation, since it takes four points to uniquely define a cubic equation.

Our equation is:

$f(x) = R_1x^3 + R_2x^2 + R_3x + S$

Where the $R_i$ values are random numbers, and $S$ is the secret value.

Let’s say that our secret value is 435, and that we picked some random numbers for the equation, making the below:

$f(x) = 28x^3 + 64x^2 + 9x + 435$

We now have a function that is uniquely identifiable by any 4 points of data on it’s curve.

Next we decide how many pieces we are going to create total. We need at least 4 so that it is in fact solvable. Let’s make 6 shares.

To do this, you just plug in 6 different values of x and pair each x value with it’s y value. Let’s do that:

$\begin{array}{c|c} x & f(x) \\ \hline 1 & 536 \\ 2 & 933 \\ 3 & 1794 \\ 4 & 3287 \\ 5 & 5580 \\ 6 & 8841 \\ \end{array}$

When doing this part, remember that the secret number is $f(0)$, so make sure and not share what the value of the function is when x is 0!

You could then distribute the shares (data pairs) as you saw fit. Maybe some people are more important, so you give them more than one share, requiring a smaller amount of cooperation with them to unlock the secret.

Share distribution details are totally up to you, but we now have our shares, whereby if you have any of the 4 of the 6 total shares, you can unlock the secret.

## How Do You Join The Secret?

Once you have the right number of shares and you know the degree of the polynomial (pre-shared “public” information), unlocking the secret is a pretty straightforward process too. To unlock the secret, you just need to use ANY method available for creating an equation of the correct degree from a set of data points.

This can be one of several different interpolation techniques, but the most common one to use seems to be Lagrange interpolation, which is something I previously wrote up that you can read about here: Lagrange Interpolation.

Once you have the equation, you can either evaluate $f(0)$, or you can write the equation in polynomial form and the constant term will be the secret value.

## Example: Joining the Secret

Let’s say that we have these four shares and are ready to get the cubic function and then unlock the secret number:

$\begin{array}{c|c} x & y \\ \hline 1 & 536 \\ 2 & 933 \\ 4 & 3287 \\ 6 & 8841 \\ \end{array}$

We could bust out some Lagrange interpolation and figure this out, but let’s be lazy… err efficient I mean. Wolfram alpha can do this for us!

Wolfram Alpha: cubic fit (1, 536), (2, 933), (4, 3287), (6, 8841)

That gives us this equation, saying that it is a perfect fit (which it is!)
$28x^3 + 64x^2 + 9x + 435$

You can see that our constant term (and $f(0)$) is the correct secret value of 435.

Daaaayummm Bru… that is lit AF! We just got hacked by wolfram alpha 😛

## A Small Complication

Unfortunately, the above has a weakness. The weakness is that each share you get gives you a little bit more information about the secret value. You can read more about this in the links section at the end if you want to know more details.

Ideally, you wouldn’t have any information about the secret value until you had the full number of shares required to unlock the secret.

To address this problem, we are going to choose some prime number $k$ and instead of shares being $(x,y)$ data points on the curve, they are going to be $(x,y \bmod k)$. In technical terms we are going to be using points on a finite field, or a Galois field.

The value we choose for $k$ needs to be larger than any of the coefficients of our terms (the random numbers) as well as larger than our secret value and larger than the number of shares we want to create. The larger the better besides that, because a larger $k$ value means a larger “brute force” space to search.

If you want to use this technique in a situation which has real needs for security, please make sure and read more on this technique from more authoritative sources. I’m glossing over the details of security quite a bit, and just trying to give an intuitive understanding of this technique (:

## Source Code

Below is some sample source code that implements Shamir’s Secret Sharing in C++.

I use 64 bit integers, but if you were going to be using this in a realistic situation you could very well overflow 64 bit ints and get the wrong answers. I hit this problem for instance when trying to require more than about 10 shares, using a prime of 257, and generating 50 shares. If you hit the limit of 64 bit ints you can use a multi precision math library instead to have virtually unlimited sized ints. The boost multiprecision header library is a decent choice for multi precision integers, specifically cpp_int.

#include <stdio.h>
#include <array>
#include <vector>
#include <math.h>
#include <random>
#include <assert.h>
#include <stdint.h>
#include <inttypes.h>

typedef int64_t TINT;
typedef std::array<TINT, 2> TShare;
typedef std::vector<TShare> TShares;

class CShamirSecretSharing
{
public:
CShamirSecretSharing (size_t sharesNeeded, TINT prime)
: c_sharesNeeded(sharesNeeded), c_prime(prime)
{
// There needs to be at least 1 share needed
assert(sharesNeeded > 0);
}

// Generate N shares for a secretNumber
TShares GenerateShares (TINT secretNumber, TINT numShares) const
{
// calculate our curve coefficients
std::vector<TINT> coefficients;
{
// store the secret number as the first coefficient;
coefficients.resize((size_t)c_sharesNeeded);
coefficients[0] = secretNumber;

// randomize the rest of the coefficients
std::array<int, std::mt19937::state_size> seed_data;
std::random_device r;
std::generate_n(seed_data.data(), seed_data.size(), std::ref(r));
std::seed_seq seq(std::begin(seed_data), std::end(seed_data));
std::mt19937 gen(seq);
std::uniform_int_distribution<TINT> dis(1, c_prime - 1);
for (TINT i = 1; i < c_sharesNeeded; ++i)
coefficients[(size_t)i] = dis(gen);
}

// generate the shares
TShares shares;
shares.resize((size_t)numShares);
for (size_t i = 0; i < numShares; ++i)
shares[i] = GenerateShare(i + 1, coefficients);
return shares;
}

// use lagrange polynomials to find f(0) of the curve, which is the secret number
TINT JoinShares (const TShares& shares) const
{
// make sure there is at elast the minimum number of shares
assert(shares.size() >= size_t(c_sharesNeeded));

// Sigma summation loop
TINT sum = 0;
for (TINT j = 0; j < c_sharesNeeded; ++j)
{
TINT y_j = shares[(size_t)j][1];

TINT numerator = 1;
TINT denominator = 1;

// Pi product loop
for (TINT m = 0; m < c_sharesNeeded; ++m)
{
if (m == j)
continue;

numerator = (numerator * shares[(size_t)m][0]) % c_prime;
denominator = (denominator * (shares[(size_t)m][0] - shares[(size_t)j][0])) % c_prime;
}

sum = (c_prime + sum + y_j * numerator * modInverse(denominator, c_prime)) % c_prime;
}
return sum;
}

const TINT GetPrime () const { return c_prime; }
const TINT GetSharesNeeded () const { return c_sharesNeeded; }

private:

// Generate a single share in the form of (x, f(x))
TShare GenerateShare (TINT x, const std::vector<TINT>& coefficients) const
{
TINT xpow = x;
TINT y = coefficients[0];
for (TINT i = 1; i < c_sharesNeeded; ++i) {
y += coefficients[(size_t)i] * xpow;
xpow *= x;
}
return{ x, y % c_prime };
}

// Gives the decomposition of the gcd of a and b.  Returns [x,y,z] such that x = gcd(a,b) and y*a + z*b = x
static const std::array<TINT, 3> gcdD (TINT a, TINT b) {
if (b == 0)
return{ a, 1, 0 };

const TINT n = a / b;
const TINT c = a % b;
const std::array<TINT, 3> r = gcdD(b, c);

return{ r[0], r[2], r[1] - r[2] * n };
}

// Gives the multiplicative inverse of k mod prime.  In other words (k * modInverse(k)) % prime = 1 for all prime > k >= 1
static TINT modInverse (TINT k, TINT prime) {
k = k % prime;
TINT r = (k < 0) ? -gcdD(prime, -k)[2] : gcdD(prime, k)[2];
return (prime + r) % prime;
}

private:

// Publically known information
const TINT          c_prime;
const TINT          c_sharesNeeded;
};

void WaitForEnter ()
{
printf("Press Enter to quit");
fflush(stdin);
getchar();
}

int main (int argc, char **argv)
{
// Parameters
const TINT c_secretNumber = 435;
const TINT c_sharesNeeded = 7;
const TINT c_prime = 439;   // must be a prime number larger than the other three numbers above

// set up a secret sharing object with the public information
CShamirSecretSharing secretSharer(c_sharesNeeded, c_prime);

// split a secret value into multiple shares

// shuffle the shares, so it's random which ones are used to join
std::array<int, std::mt19937::state_size> seed_data;
std::random_device r;
std::generate_n(seed_data.data(), seed_data.size(), std::ref(r));
std::seed_seq seq(std::begin(seed_data), std::end(seed_data));
std::mt19937 gen(seq);
std::shuffle(shares.begin(), shares.end(), gen);

// join the shares
TINT joinedSecret = secretSharer.JoinShares(shares);

// show the public information and the secrets being joined
printf("%" PRId64 " shares needed, %i shares maden", secretSharer.GetSharesNeeded(), shares.size());
printf("Prime = %" PRId64 "nn", secretSharer.GetPrime());
for (TINT i = 0, c = secretSharer.GetSharesNeeded(); i < c; ++i)
printf("Share %" PRId64 " = (%" PRId64 ", %" PRId64 ")n", i+1, shares[i][0], shares[i][1]);

// show the result
printf("nJoined Secret = %" PRId64 "nActual Secret = %" PRId64 "nn", joinedSecret, c_secretNumber);
assert(joinedSecret == c_secretNumber);
WaitForEnter();
return 0;
}


## Example Output

Here is some example output of the program:

Wikipedia: Shamir’s Secret Sharing (Note: for some reason the example javascript implementation here only worked for odd numbered keys required)
Wikipedia: Finite Field
Cryptography.wikia.com: Shamir’s Secret Sharing
Java Implementation of Shamir’s Secret Sharing (Note: I don’t think this implementation is correct, and neither is the one that someone posted to correct them!)

When writing this post I wondered if maybe you could use the coefficients of the other terms as secrets as well. These two links talk about the details of that:
Cryptography Stack Exchange: Why only one secret value with Shamir’s secret sharing?
Cryptography Stack Exchange: Coefficients in Shamir’s Secret Sharing Scheme

Now that you understand this, you are probably ready to start reading up on elliptic curve cryptography. Give this link below a read if you are interested in a gentle introduction on that!
A (Relatively Easy To Understand) Primer on Elliptic Curve Cryptography

# Turning a Truth Table Into A digital Circuit (ANF)

In this post I’m going to show how you turn a truth table into a digital logic circuit that uses XOR and AND gates.

## My Usage Case

My specific usage case for this is in my investigations into homomorphic encryption, which as you may recall is able to perform computation on encrypted data. This lets encrypted data be operated on by an untrusted source, given back to you, and then you can decrypt your data to get a result.

Lots of use cases if this can ever get fast enough to become practical, such as doing cloud computing with private data. However, when doing homomorphic encryption (at least currently, for the techniques I’m using), you only have XOR and AND logic operations.

So, I’m using the information in this post to be able to turn a lookup table, or a specific boolean function, into a logic circuit that I can feed into a homomorphic encryption based digital circuit.

Essentially I want to figure out how to do a homomorphic table lookup to try and make some simple as possible circuits, that will in turn be as fast and lean as possible.

If you want to know more about homomorphic encryption, here’s a post I wrote which explains a very simple algorithm: Super Simple Symmetric Leveled Homomorphic Encryption Implementation

## Algebraic Normal Form

Algebraic normal form (ANF) is a way of writing a boolean function using only XOR and AND.

Since it’s a normal form, two functions that do the same thing will be the same thing in ANF.

There are other forms for writing boolean logic, but ANF suits me best for my homomorphic encryption circuit needs!

An example of boolean logic in ANF is the below:

$f(x_1, x_2, x_3, x_4) = x_1 x_2 \oplus x_1 x_3 \oplus x_1 x_4$

It is essentially a boolean polynomial, where AND is like multiplication, and XOR is like addition. It even factors the same way. In fact, ANF is not always the smallest circuit possible, you’d have to factor common ANDs to find the smallest way you could represent the circuit, like the below:

$f(x_1, x_2, x_3, x_4) = x_1 (x_2 \oplus x_3 \oplus x_4)$

That smaller form does 1 AND and 2 XORs, versus the ANF which does 3 ANDs and 2 XORs. In homomorphic encryption, since AND is so much more costly than XOR, minimizing the ANDs is a very nice win, and worth the effort.

## Truth Tables and Lookup Tables

A truth table is just where you specify the inputs into a boolean function and the output of that boolean function for the given input:

$\begin{array}{c|c|c|c} x_1 & x_2 & x_3 & f(x_1, x_2, x_3) \\ \hline 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ \end{array}$

A lookup table is similar in functionality, except that it has multi bit output. When dealing with digital circuits, you can make a lookup table by making a truth table per output bit. For instance, the above truth table might just be the low bit of the lookup table below, which is just a truth table for addition of the input bits.

$\begin{array}{c|c|c|c} x_1 & x_2 & x_3 & f(x_1, x_2, x_3) \\ \hline 0 & 0 & 0 & 00 \\ 0 & 0 & 1 & 01 \\ 0 & 1 & 0 & 01 \\ 0 & 1 & 1 & 10 \\ 1 & 0 & 0 & 01 \\ 1 & 0 & 1 & 10 \\ 1 & 1 & 0 & 10 \\ 1 & 1 & 1 & 11 \\ \end{array}$

## Converting Truth Table to ANF

When I first saw the explanation for converting a truth table to ANF, it looked pretty complicated, but luckily it turns out to be pretty easy.

The basic idea is that you make a term for each possible combination of x inputs, ANDing a term by each constant, and then solving for those constants.

Let’s use the truth table from the last section:

$\begin{array}{c|c|c|c} x_1 & x_2 & x_3 & f(x_1, x_2, x_3) \\ \hline 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ \end{array}$

For three inputs, the starting equation looks like this:

$f(x_1, x_2, x_3) = \\ a_0 \\ \oplus a_1 x_1 \oplus a_2 x_2 \oplus a_3 x_3 \\ \oplus a_{12} x_1 x_2 \oplus a_{13} x_1 x_3 \oplus a_{23} x_2 x_3 \\ \oplus a_{123} x_1 x_2 x_3$

Now we have to solve for the a values.

To solve for $a_{123}$, we just look in the truth table for function $f(x_1, x_2, x_3)$ to see if we have an odd or even number of ones in the output of the function. If there is an even number, it is 0, else it is a 1.

Since we have an even number of ones, the value is 0, so our equation becomes this:

$f(x_1, x_2, x_3) = \\ a_0 \\ \oplus a_1 x_1 \oplus a_2 x_2 \oplus a_3 x_3 \\ \oplus a_{12} x_1 x_2 \oplus a_{13} x_1 x_3 \oplus a_{23} x_2 x_3 \\ \oplus 0 \land x_1 x_2 x_3$

Note that $\land$ is the symbol for AND. I’m showing it explicitly because otherwise the equation looks weird, and a multiplication symbol isn’t correct.

Since 0 ANDed with anything else is 0, and also since n XOR 0 = n, that whole last term disappears, leaving us with this equation:

$f(x_1, x_2, x_3) = \\ a_0 \\ \oplus a_1 x_1 \oplus a_2 x_2 \oplus a_3 x_3 \\ \oplus a_{12} x_1 x_2 \oplus a_{13} x_1 x_3 \oplus a_{23} x_2 x_3$

Next up, to solve for $a_{12}$, we need to limit our truth table to $f(x_1, x_2, 0)$. That truth table is below, made from the original truth table, but throwing out any row where $x_{3}$ is 1.

$\begin{array}{c|c|c|c} x_1 & x_2 & x_3 & f(x_1, x_2, 0) \\ \hline 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 \\ \end{array}$

We again just look at whether there are an odd or even number of ones in the function output, and use that to set $a_{12}$ appropriately. In this case, there are an even number, so we set it to 0, which makes that term disappear again. Our function is now down to this:

$f(x_1, x_2, x_3) = \\ a_0 \\ \oplus a_1 x_1 \oplus a_2 x_2 \oplus a_3 x_3 \\ \oplus a_{13} x_1 x_3 \oplus a_{23} x_2 x_3$

If we look at $f(x_1,0,x_3)$, we find that it also has an even number of ones, making $a_{13}$ become 0 and making that term disappear.

Looking at $f(0,x_2,x_3)$, it also has an even number of ones, making $a_{23}$ become 0 and making that term disappear as well.

That leaves us with this equation:

$f(x_1, x_2, x_3) = \\ a_0 \\ \oplus a_1 x_1 \oplus a_2 x_2 \oplus a_3 x_3$

To solve for $a_1$, we look at the truth table for $f(x_1,0,0)$, which is below:

$\begin{array}{c|c|c|c} x_1 & x_2 & x_3 & f(x_1, 0, 0) \\ \hline 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ \end{array}$

There are an odd number of ones in the output, so $a_1$ becomes 1. Finally, we get to keep a term! The equation is below:

$f(x_1, x_2, x_3) = \\ a_0 \\ \oplus 1 \land x_1 \oplus a_2 x_2 \oplus a_3 x_3$

Since 1 AND n = n, we can drop the explicit 1 to become this:

$f(x_1, x_2, x_3) = \\ a_0 \\ \oplus x_1 \oplus a_2 x_2 \oplus a_3 x_3$

If you do the same process for $a_2$ and $a_3$, you’ll find that they also have odd numbers of ones in the output so also become ones. That puts our equation at:

$f(x_1, x_2, x_3) = \\ a_0 \\ \oplus x_1 \oplus x_2 \oplus x_3$

Solving for $a_0$, is just looking at whether there are an odd or even number of ones in the function $f(0,0,0)$ which you can look up directly in the lookup table. It’s even, so $a_0$ becomes 0, which makes our full final equation into this:

$f(x_1, x_2, x_3) = x_1 \oplus x_2 \oplus x_3$

We are done! This truth table can be implemented with 3 XORs and 0 ANDs. A pretty efficient operation!

You can see this is true if you work it out with the truth table. Try it out and see!

$\begin{array}{c|c|c|c} x_1 & x_2 & x_3 & f(x_1, x_2, x_3) \\ \hline 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ \end{array}$

## Sample Code

Here is some sample code that lets you define a lookup table by implementing an integer function, and it generates the ANF for each output bit for the truth table. It also verifies that the ANF gives the correct answer. It shows you how to use this to make various circuits: bit count, addition, multiplication, division and modulus.

#include <stdio.h>
#include <array>
#include <vector>

#define PRINT_TRUTHTABLES() 0
#define PRINT_NUMOPS() 1
#define PRINT_ANF() 1

void WaitForEnter ()
{
printf("Press Enter to quit");
fflush(stdin);
getchar();
}

template <size_t NUM_INPUT_BITS>
{
for (size_t i = 0; i < NUM_INPUT_BITS; ++i)
{
const size_t bitMask = 1 << i;
return false;
}
return true;
}

template <size_t NUM_INPUT_BITS>
bool ANFHasTerm (const std::array<size_t, 1 << NUM_INPUT_BITS> &lookupTable, size_t outputBitIndex, size_t termMask)
{
const size_t c_inputValueCount = 1 << NUM_INPUT_BITS;

int onesCount = 0;
for (size_t i = 0; i < c_inputValueCount; ++i)
{
onesCount++;
}

return (onesCount & 1) != 0;
}

template <size_t NUM_INPUT_BITS>
void MakeANFTruthTable (const std::array<size_t, 1 << NUM_INPUT_BITS> &lookupTable, std::array<size_t, 1 << NUM_INPUT_BITS> &reconstructedLookupTable, size_t outputBitIndex)
{
const size_t c_inputValueCount = 1 << NUM_INPUT_BITS;
printf("-----Output Bit %u-----rn", outputBitIndex);

// print truth table if we should
#if PRINT_TRUTHTABLES()
for (size_t inputValue = 0; inputValue < c_inputValueCount; ++inputValue)
printf("  [%u] = %urn", inputValue, ((lookupTable[inputValue] >> outputBitIndex) & 1) ? 1 : 0);
printf("rn");
#endif

// find each ANF term
std::vector<size_t> terms;
{
}

// print function params
#if PRINT_ANF()
printf("f(");
for (size_t i = 0; i < NUM_INPUT_BITS; ++i)
{
if (i > 0)
printf(",");
printf("x%i",i+1);
}
printf(") = rn");
#endif

// print ANF and count XORs and ANDs
size_t numXor = 0;
size_t numAnd = 0;
if (terms.size() == 0)
{
#if PRINT_ANF()
printf("0rn");
#endif
}
else
{
for (size_t termIndex = 0, termCount = terms.size(); termIndex < termCount; ++termIndex)
{
if (termIndex > 0) {
#if PRINT_ANF()
printf("XOR ");
#endif
++numXor;
}

size_t term = terms[termIndex];
if (term == 0)
{
#if PRINT_ANF()
printf("1");
#endif
}
else
{
bool firstProduct = true;
for (size_t bitIndex = 0; bitIndex < NUM_INPUT_BITS; ++bitIndex)
{
const size_t bitMask = 1 << bitIndex;
if ((term & bitMask) != 0)
{
#if PRINT_ANF()
printf("x%i ", bitIndex + 1);
#endif
if (firstProduct)
firstProduct = false;
else
++numAnd;
}
}
}
#if PRINT_ANF()
printf("rn");
#endif
}
}
#if PRINT_ANF()
printf("rn");
#endif

#if PRINT_NUMOPS()
printf("%u XORs, %u ANDsrnrn", numXor, numAnd);
#endif

// reconstruct a bit of the reconstructedLookupTable for each entry to be able to verify correctness
const size_t c_outputBitMask = 1 << outputBitIndex;
for (size_t valueIndex = 0; valueIndex < c_inputValueCount; ++valueIndex)
{
bool xorSum = false;
for (size_t termIndex = 0, termCount = terms.size(); termIndex < termCount; ++termIndex)
{
size_t term = terms[termIndex];
if (term == 0)
{
xorSum = 1 ^ xorSum;
}
else
{
bool andProduct = true;
for (size_t bitIndex = 0; bitIndex < NUM_INPUT_BITS; ++bitIndex)
{
const size_t bitMask = 1 << bitIndex;
if ((term & bitMask) != 0)
{
if ((valueIndex & bitMask) == 0)
andProduct = false;
}
}
xorSum = andProduct ^ xorSum;
}
}
if (xorSum)
}
}

template <size_t NUM_INPUT_BITS, size_t NUM_OUTPUT_BITS, typename LAMBDA>
void MakeANFLookupTable (const LAMBDA& lambda)
{
// make lookup table
const size_t c_outputValueMask = (1 << NUM_OUTPUT_BITS) - 1;
const size_t c_inputValueCount = 1 << NUM_INPUT_BITS;
std::array<size_t, c_inputValueCount> lookupTable;
for (size_t inputValue = 0; inputValue < c_inputValueCount; ++inputValue)
lookupTable[inputValue] = lambda(inputValue, NUM_INPUT_BITS, NUM_OUTPUT_BITS) & c_outputValueMask;

// make the anf for each truth table (each output bit of the lookup table)
std::array<size_t, c_inputValueCount> reconstructedLookupTable;
std::fill(reconstructedLookupTable.begin(), reconstructedLookupTable.end(), 0);
for (size_t outputBitIndex = 0; outputBitIndex < NUM_OUTPUT_BITS; ++outputBitIndex)
MakeANFTruthTable<NUM_INPUT_BITS>(lookupTable, reconstructedLookupTable, outputBitIndex);

// verify that our anf expressions perfectly re-create the lookup table
for (size_t inputValue = 0; inputValue < c_inputValueCount; ++inputValue)
{
if (lookupTable[inputValue] != reconstructedLookupTable[inputValue])
printf("ERROR: expression / lookup mismatch for index %urn", inputValue);
}
printf("expression / lookup verification complete.rnrn");
}

size_t CountBits (size_t inputValue, size_t numInputBits, size_t numOutputBits)
{
// Count how many bits there are
int result = 0;
while (inputValue)
{
if (inputValue & 1)
result++;
inputValue = inputValue >> 1;
}
return result;
}

size_t AddBits (size_t inputValue, size_t numInputBits, size_t numOutputBits)
{
// break the input bits in half and add them
const size_t bitsA = numInputBits / 2;
const size_t mask = (1 << bitsA) - 1;

size_t a = inputValue & mask;
size_t b = inputValue >> bitsA;

return a+b;
}

size_t MultiplyBits (size_t inputValue, size_t numInputBits, size_t numOutputBits)
{
// break the input bits in half and add them
const size_t bitsA = numInputBits / 2;
const size_t mask = (1 << bitsA) - 1;

size_t a = inputValue & mask;
size_t b = inputValue >> bitsA;

return a * b;
}

size_t DivideBits (size_t inputValue, size_t numInputBits, size_t numOutputBits)
{
// break the input bits in half and add them
const size_t bitsA = numInputBits / 2;
const size_t mask = (1 << bitsA) - 1;

size_t a = inputValue & mask;
size_t b = inputValue >> bitsA;

// workaround for divide by zero
if (b == 0)
return 0;

return a / b;
}

size_t ModulusBits (size_t inputValue, size_t numInputBits, size_t numOutputBits)
{
// break the input bits in half and add them
const size_t bitsA = numInputBits / 2;
const size_t mask = (1 << bitsA) - 1;

size_t a = inputValue & mask;
size_t b = inputValue >> bitsA;

// workaround for divide by zero
if (b == 0)
return 0;

return a % b;
}

int main (int argc, char **argv)
{
//MakeANFLookupTable<3, 2>(CountBits);    // Output bits needs to be enough to store the number "input bits"
//MakeANFLookupTable<4, 3>(AddBits);      // Output bits needs to be (InputBits / 2)+1
//MakeANFLookupTable<4, 4>(MultiplyBits); // Output bits needs to be same as input bits
//MakeANFLookupTable<4, 2>(DivideBits);   // Output bits needs to be half of input bits (rounded down)
//MakeANFLookupTable<4, 2>(ModulusBits);  // Output bits needs to be half of input bits (rounded down)
//MakeANFLookupTable<10, 5>(DivideBits);  // 5 bit vs 5 bit division is amazingly complex!
MakeANFLookupTable<4, 2>(ModulusBits);  // Output bits needs to be half of input bits (rounded down)
WaitForEnter();
return 0;
}


## Sample Code Runs

Here is the program output for a “bit count” circuit. It counts the number of bits that are 1, in the 3 bit input, and outputs the answer as 2 bit output. Note that the bit 0 output is the same functionality as the example we worked through by hand, and you can see that it comes up with the same answer.

Here is the program output for an adder circuit. It adds two 2 bit numbers, and outputs a 3 bit output.

Here is the program output for a multiplication circuit. It multiplies two 2 bit numbers, and outputs a 4 bit number.

Here is the program output for a division circuit. It divides a 2 bit number by another 2 bit number and outputs a 2 bit number. When higher bit counts are involved, the division circuit gets super complicated, it’s really crazy! 5 bit divided by 5 bit is several pages of output for instance. Note that it returns 0 whenever it would divide by 0.

Lastly, here is the program output for a modulus circuit. It divides a 2 bit number by another 2 bit number and outputs the remainder as a 2 bit number.

While the above shows you how to turn a single bit truth table into ANF, extending this to a multi bit lookup table is super simple; you just do the same process for each output bit in the lookup table.

Finding Boolean/Logical Expressions for truth tables in algebraic normal form(ANF)

Finding Boolean/Logical Expressions for truth tables

# Game Development Needs Data Pipeline Middleware

In 15 years I’ve worked at 7 different game studios, ranging from small to large, working on many different kinds of projects in a variety of roles.

At almost every studio, there was some way for the game to load data at runtime that controlled how it behaved – such as the damage a weapon would do or the cost of an item upgrade.

The studios that didn’t have this setup could definitely have benefited from having it. After all, this is how game designers do their job!

Sometimes though, this data was maintained via excel spreadsheets (export as csv for instance and have the game read that). That is nearly the worst case scenario for data management. Better though is to have an editor which can edit that data, preferably able to edit data described by schemas, which the game also uses to generate code to load that data.

Each studio I’ve worked at that did have game data each had their own solution for their data pipeline, and while they are all of varying qualities, I have yet to see something that is both fast and has most of the features you’d reasonably want or expect.

We really need some middleware to tackle this “solved problem” and offer it to us at a reasonable price so we can stop dealing with it. Open sourced would be fine too. Everyone from engineers to production to content people will be much happier and more productive!

# Required Features

Here are the features I believe are required to satisfy most folks:

1. Be able to define the structure of your data in some format (define data schema).
2. Have an editor that is able to quickly launch, quickly load up data in the data schema and allow a nice interface to editing the data as well as searching the data.
3. This edited data should be in some format that merges well (for dealing with branching), and preferably is standardized so you can use common tools on the data – such as XSLT if storing data as xml. XML isn’t commonly very mergable so not sure the solution there other than perhaps a custom merge utility perhaps?
4. The “data solution” / project file should store your preferences about how you want the final data format to be: xml, json, binary, other? Checkboxes for compression and encryption, etc. Switching the data format should take moments.
5. There should be a cooking process that can be run from the data editor or via command line which transforms the edited data into whatever format the destination data should be in. AKA turn the human friendly XML into machine friendly binary files which you load in with a single read and then do pointer fixup on.
6. This pipeline should generate the code that loads and interacts with the data as described in the data schema. For instance you say “load my data” and it does all the decompression, decryption, parsing, etc giving you back a root data structure which contains compile time defined strongly typed structures. This is important because when you change the format of the data that the game uses, no game code actually has to know or care. Whatever it takes to load your data happens when you call the function.

# Bonus Points

Here are some bonus point features that would be great to have:

1. Handle live editing of data. When the game and editor is both open, and data is edited, have it change the data on the game side in real time, and perhaps allow a callback to be intercepted in case the game needs to clear out any cached values or anything. This helps iteration time by letting people make data changes without having to relaunch the game. Also needs to be able to connect to a game over tcp/ip and handle endian correction as needed as well as 32 vs 64 bit processes using the same data.
2. Handle the usual problems associated with DLC and versioning in an intelligent way. Many data systems that support DLC / Patching / Schema Updates post ship have strange rules about what data you can and can’t change. Often times if you get it wrong, you make a bug that isn’t always obvious. If support for this was built in, and people didnt have to concern themselves with it, it’d be great.
3. On some development environments, data must be both forwards and backwards compatible. Handling that under the covers in an intelligent way would be awesome.
4. The editor should be extensible with custom types and plugins for visualizations of data, as well as interactive editing of data. This same code path could be used to integrate parts of the game engine with the editor for instance (slippery slope to making the editor slow, however).
5. Being able to craft custom curves, and being able to query them simply and efficiently from the game side at runtime would be awesome.
6. Support “cook time computations”. The data the user works with isn’t always set up the way that would be best for the machine. It’d be great to be able to do custom calculations and computations at runtime. Also great for building acceleration data structures.
7. You should be able to run queries against the data or custom scripts. To answer questions like “Is anyone using this feature?” and “I need to export data from our game into a format that this other program can read”
8. Being able to export data as c++ literal data structures, for people who want to embed (at least some of) their data in the exe to reduce complexity, loading times, etc.

It should also be as fast and lightweight as possible. It should allow games to specify memory and file i/o overrides.

Localized text is also a “solved problem” that needs an available solution. It could perhaps be rolled into this, or maybe it would make most sense for it to be separate.

As another example of how having something like this would be useful, on multiple occasions at previous studios, people have suggested we change the format of the data that the game uses at runtime. For instance, from json to a binary format. In each case this has come up so far, people have said it would take too long and it got backlogged (ie killed). With data pipeline middleware that works as i describe it, you would click a few checkboxes, recook your data and test it to have your runtime results. That’s as hard as it SHOULD be, but in practice it’s much harder because everyone rolls their own and the cobbler never has time to fix his shoes (;

Anyone out there want to make this happen? (:

# Using Wang Tiles to Simulate Turing Machines

Wang tiles were invented by Hao Wang in 1961 for mathematical reasons, but they find great use in games for making tile based art which gives results that don’t look tiled – both with 2d tiled textures, as well as 3d tiled models.

Apparently Wang tiles are also able to execute Turing machines, and so are thus Turing complete – meaning they can execute any program.

That is a pretty amazing and perplexing statement, so this post explores that a bit.

## Wang Tiles Overview

Wang tiles are rectangular tiles where each edge will only fit with other specific edges, but that for any specific edge, there is more than one possible tile that can fit with that edge. By fit with that edge, I mean they are seamless when put together, without any visual artifacts to hint at there actually being a seam between the tiles.

This is useful for graphics because this lets you have seamless tiled graphics, but the specific configuration of how the tiles are placed can be completely randomized, so long as their edges are all compatible. The result is tiled graphics that doesn’t look at all tiled, due to visible patterns being much less noticeable than with traditional tiled graphics.

Here is an example I made. The graphics are programmer art but hopefully you get the idea. This was made with 16 tiles, where there were two different edge types per edge.

## Turing Machine Overview

Turing machines were invented in 1936 by Alan Turing as a generic computing machine that was proven to be able to execute any algorithm.

The turing machine is made up of a few main components: the memory tape, the read/write head, and the state machine.

The memory tape is infinitely long, so has infinite storage, and is initialized to all zeroes to start out.

The read/write head starts at a position on the tape, and can read or write values, and also move left or right on the tape.

The state machine knows what state it is in and has rules about what to do in each state when it reads a value from the tape.

For instance, in state A, if a 0 is read from the tape, the rule may be to write a 1 to the current position on the tape, move the read/write head to the right, and go to state B. State B may have completely different logic, and could either transition back to state A, state in state B, or move to another state entirely.

Using simple state transition logic like that, any computer algorithm can be performed.

In a Turing machine there can also be a “Halt State” which means the program is finished executing and the answer it was trying to calculate has been calculated.

Looking at some programs, you can easily see that they will halt eventually, or that they will be an infinite loop and never halt. Some programs in-between are complex and can’t very easily be determined if they will ever halt or not. Turing proved that there is no general solution to whether a Turing machine (aka any computer program) will halt or not, and this is called the Halting Problem. In general, the only way to know if a program will halt or not is to wait and see. So, effectively the answers to whether a program in general will halt or not are “yes” and “not yet” – although for many specific programs you can in fact see that they will halt eventually if you were to run them.

## Wang Tile Computation

It turns out that Wang tiles can simulate a Turing machine, and so are “Turing complete” meaning that they too can perform any computer algorithm.

To make this happen, we’ll make a column of tiles that represent the state of the Turing machine at a specific point in time, starting with time 0 at the left most column. We’ll place tiles in the column to the right making sure all edge rules are respected, and then do the column to the right of that one etc until the program halts (or forever if it doesn’t halt). If we set up our set of tiles correctly, the act of satisfying the edge rules as we place our tiles is enough to execute the Turing machine.

Let’s walk through a simple example where we have the following state machine logic rules:

1. When in state A, if a 0 is read, we will write a 1, move the read/write head down and move to state B.
2. When in state A, if a 1 is read, we will halt (enter the halt state).
3. When in state B, if a 0 is read, we will write a 1, move the read/write head up and move to state A.
4. When in state B, if a 1 is read, we will halt (enter the halt state).

### Tape Memory Storage

The first thing we need is persistent storage of memory for the tape. For this, we’ll need the following two tiles:

To see this working, we can set up a section of tape with some values (make a column of wang tiles), and we can see that the only valid wang tiles to place next to the starting column are tiles which propogate the 0 and the 1 values forward in time without modifying them.

In the diagram below, we initialize the tape to 0101 in the left most column (time 0). By only placing down tiles with compatible edges you can see that our memory values persist forever. Our memory storage is implemented, huzzah!

We’ll start our example with all memory initialized to 0, but the above shows that we can have persistent memory.

The read/write head of the Turing machine is represented as part of the edge information. In this way, besides an edge storing the 0 or 1, if that is where the read/write head is, it also stores the state of the state machine.

Our example uses two states (besides the halt state): A and B. If a 1 is read in while being in either state A or B, the program halts.

To handle that, we need the tiles below:

Now that we have the rules for entering the halt state done (rule #2 and rule #4), we have to figure out how to implement the rules that control switching from one state to another (rule #1 and rule #3).

Rule #1 says that if we are in state A and read a 0, we should write a 1, move the read/write head down and move to state B.

We’ll need this tile to cause reading a 0 in state A to write a 1 as output, and to tell the tile below to move to state B.

The tile below that one could either be a 0 or a 1, and without knowing which, we want it to keep it’s value but accept the read/write head and be in state B. To do that we need two tiles, one for if there is a 0 on the tape at that position, and another for if there is a 1 on the tape.

Rule #3 says that if we are in state B and read a 0, we should write a 1, move the read/write head up and move to state A.

To do that, we’ll need a similar construction as for rule #1 but we are moving up instead of down. These 3 tiles will give us what we need:

## Starting Column Tiles

We are going to treat the boundaries of our simulation area as if they have edges of “x”.

This means that to make our starting column (the Turing machine at time 0), we are going to need 2 special tiles. One tile will be for storing a 0 on the tape, which is what the tape is initialized to, and the other tile will be for storing the position of the read/write head in state A, which is our starting state.

Here are those two tiles:

## Final Tileset

Here’s the full set of 12 tiles that we are going to use:

## Full Simulation

Here is the initial setup at time 0 for our Turing machine. Note that this is one possible starting state, but this is the starting state we are choosing. We are not leaving it up to chance where the read/write head starts, or if it is even present at all. If we only followed edge rules we may get 4 read/write heads or 0, or anything in between.

From here, to build the second column, we start from the top and work towards the bottom, choosing the tile that fits the constraints of the edge it touches. In this first step, the head reads a 0, writes a 1, moves down, and moves to state B.

Heres is the second step, where the read reads a 0, writes a 1, moves up, and moves to state A.

Here is the final step, where the head reads a 1 and enters the halt state, signifying that the program has terminated.

The program halted, and gave an output value of “0110” or 6. This output isn’t really meaningful but other programs can give output that is meaningful. For instance you could have a Turing machine add two numbers, and the output would be the sum of those two numbers.

## An Important Detail

There is an important detail that the above doesn’t address, and that many explanations of Wang tile Turing machines don’t seem to talk about.

When placing the second tile for time 2, the only constraint from the edges is that the tile must have an x on top and a 1 on the left. This actually makes it ambiguous which tile should be chosen between the two tiles below.

How do we choose the right one then?

The answer is that you make a guess and just choose one. If the wrong one was chosen in this case, when we moved to the next tile, we’d be looking for a tile which had an x on top and a B0 on the left. There is no such tile so you’d be unable to place a tile. When this happened, you’d take a step back to the last tile, and try one of the other possibilities.

So, unfortunately there is some literal trial and error involved when simulating Turing machines with Wang tiles, but it is fairly manageable at least. It definitely makes it a bit more complex to calculate in a pixel shader if you were so inclined (or other massively parallel processing units), but it shouldn’t be that much more costly.

Some of the links below talk about Wang tiles and Turing machines, but don’t seem to strictly be Turing machines anymore. For instance, you might notice that some examples allow data to travel “back in time” where after the program halts, the answer is in the tape at time 0 of the Turing machine, even though that data wasn’t actually there at time 0. This shows that Wang tiles can do computation in their own right, beyond simulating Turing machines, but I’m not really sure what that technique itself would be called.

Also if you are wondering if this is useful to do computation with Wang tiles, I’m not really sure of any practical usage cases myself. However, apparently scientists have found that DNA can act much like Wang tiles act, where they will fit together only if edges are compatible. Because of this, there is ongoing research into DNA based computation that is based on the work of Wang tile computation. pretty interesting stuff!

Here is a shadertoy implementation of wang tiles computing prime numbers in a webgl pixel shader:

Here are some great videos on Turing machines and the halting problem:
Turing Machines Explained – Computerphile
Turing & The Halting Problem – Computerphile