# DIY Synth: Convolution Reverb & 1D Discrete Convolution of Audio Samples

This is a part of the DIY Synthesizer series of posts where each post is roughly built upon the knowledge of the previous posts. If you are lost, check the earlier posts!

If there is a physical location that has the reverb you are looking for (a church, a cave, your bathroom, the subway, whatever), you can go there and record the sound of some short, loud sound (a clap, a starter pistol, whatever). Now that you have this sound file, you can use a mathematical technique called “Discrete Convolution” (not as difficult as it sounds) to apply that reverb to other sounds.

Not only that, you can use convolution to apply various properties of one sound to another sound. Like you could convolve a saw wave and a recording of a voice and make it sound robotic.

This is pretty cool right?

## Convolution

There are two types of convolution out there… continuous convolution and discrete convolution.

Continuous convolution is a method where you can take two formulas and perform some operations (algebra, calculus, etc) to get a third formula.

In the cases that us programmers usually work with, however, you usually only have data, and not a mathematical formula describing that data.

When you have specific data points instead of an equation, you can use something called discrete convolution instead.

Discrete convolution is a method where you take two streams of data (IE two arrays of data!) and perform some operations (only looping, addition and multiplication!) to get a third stream of data.

Interestingly, this third stream of data will be the size of the first stream of data, plus the size of the second stream of data, minus one.

This link explains discrete convolution much better than I could, and even has an interactive demo, so check it out: Discrete Convolution

Here is some pseudo code that shows how discrete convolution works.

```// We are calculating bufferA * bufferB (discrete convolution)
// initialize out buffer to empty, and make sure it's the right size
outBuffer[sizeof(bufferA)+sizeof(bufferB)-1] = 0

// reverse one of the buffers in preperation of the convolution
reverse(bufferB);

// Convolve!
for (int outBufferIndex = 0; outBufferIndex < sizeof(outBuffer); ++outBufferIndex)
{
bufferAIndex = 0;
bufferBIndex = 0;
if (outBufferIndex < sizeof(bufferB))
bufferBIndex = sizeof(bufferB) - outBufferIndex - 1;
else
bufferAIndex = outBufferIndex - sizeof(bufferB);

for (; bufferAIndex < sizeof(bufferA) && bufferBIndex < sizeof(bufferB); ++bufferAIndex, ++bufferBIndex)
{
outBuffer&#91;outBufferIndex&#93; += bufferA&#91;bufferAIndex&#93; * bufferB&#91;bufferBIndex&#93;;
}
}
&#91;/cpp&#93;

That's all there is to it!

<h2>Convolution Reverb</h2>

Like I mentioned above, to do convolution reverb you just convolve the audio file you want to add reverb to with the recorded sound from the place that has the reverb you want.  That second sound is called the Impulse Response, or IR for short.

Besides letting you choose a reverb model (like "cavernous" or "small hallway" or "bright tiled room"), many reverbs will have a "wetness" parameter that controls how much reverb you hear versus how much of the origional noise you hear.  A wetness of 1.0 means all you hear is the reverb (the convolution), while a wetness of 0.0 means all you hear is the origional sound (the dry sound).  It's just a percentage and literally is just used to lerp between the two signals.

Here is an example of how changes in wetness can sound, and also give you a first glimpse of just how good this technique works!

Convolved with 100% wetness: <a href="http://blog.demofox.org/wp-content/uploads/2015/03/c_RL_SBite.wav" target="_blank">c_RL_SBite.wav</a>
Convolved with 66% wetness: <a href="http://blog.demofox.org/wp-content/uploads/2015/03/c_RL_SBite_66.wav" target="_blank">c_RL_SBite_66.wav</a>
Convolved with 33% wetness: <a href="http://blog.demofox.org/wp-content/uploads/2015/03/c_RL_SBite_33.wav" target="_blank">c_RL_SBite_33.wav</a>

<h2>Sample Sound Files</h2>

Let's convolve some sounds together and see what results we get!

Here are 11 sounds convoled against each other (and themselves) at 100% wetness so all you hear is the convolution.  Convolution is commutative (A*B == B*A) so that cuts down on the number of combinations (66 instead of 121!)

There are 3 impulse responses, 4 short basic wave forms, 2 voice samples a cymbal drum pattern and a jaw harp twang.

Some of the results are pretty interesting, for instance check out SoundBite vs SoundCymbal.  The beat of the cymbal pattern remains, but the actual symbal sound itself is replaced by the words!

<table>
<tr>
<td></td>
</tr>

<tr>
<td></td>
</tr>

<tr>
<td></td>
</tr>

<tr>
<td></td>
</tr>

<tr>
<td></td>
</tr>

<tr>
<td></td>
</tr>

<tr>
<td></td>
</tr>

<tr>
<td></td>
</tr>

<tr>
<td></td>
</tr>

<tr>
<td></td>
</tr>

<tr>
<td></td>
</tr>
</table>

<h2>Sample Code</h2>

Here is some sample code which will read in "in.wav" and "reverb.wav", do convolution on them and save the fully wet result (convolution result only, no original in.wav mixed into the output) as "out.wav".

Note that this code was used to process the sample sound files above.  Usual caveat: The sound loading code isn't bulletproof.  I've found it works best with 16 bit mono sound files.  If you need better sound loading, check out libsndfile!

#define _CRT_SECURE_NO_WARNINGS

#include <array>
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <cmath>
#include <vector>
#include <chrono>
#include <ctime>

#define _USE_MATH_DEFINES
#include <math.h>

//=====================================================================================
// SNumeric - uses phantom types to enforce type safety
//=====================================================================================
template <typename T, typename PHANTOM_TYPE>
struct SNumeric
{
public:
explicit SNumeric(const T &value) : m_value(value) { }
SNumeric() : m_value() { }
inline T& Value() { return m_value; }
inline const T& Value() const { return m_value; }

typedef SNumeric<T, PHANTOM_TYPE> TType;
typedef T TInnerType;

// Math Operations
TType operator+ (const TType &b) const
{
return TType(this->Value() + b.Value());
}

TType operator- (const TType &b) const
{
return TType(this->Value() - b.Value());
}

TType operator* (const TType &b) const
{
return TType(this->Value() * b.Value());
}

TType operator/ (const TType &b) const
{
return TType(this->Value() / b.Value());
}

TType& operator+= (const TType &b)
{
Value() += b.Value();
return *this;
}

TType& operator-= (const TType &b)
{
Value() -= b.Value();
return *this;
}

TType& operator*= (const TType &b)
{
Value() *= b.Value();
return *this;
}

TType& operator/= (const TType &b)
{
Value() /= b.Value();
return *this;
}

TType& operator++ ()
{
Value()++;
return *this;
}

TType& operator-- ()
{
Value()--;
return *this;
}

// Extended Math Operations
template <typename T>
T Divide(const TType &b)
{
return ((T)this->Value()) / ((T)b.Value());
}

// Logic Operations
bool operator< (const TType &b) const {
return this->Value() < b.Value();
}
bool operator<= (const TType &b) const {
return this->Value() <= b.Value();
}
bool operator> (const TType &b) const {
return this->Value() > b.Value();
}
bool operator>= (const TType &b) const {
return this->Value() >= b.Value();
}
bool operator== (const TType &b) const {
return this->Value() == b.Value();
}
bool operator!= (const TType &b) const {
return this->Value() != b.Value();
}

private:
T m_value;
};

//=====================================================================================
// Typedefs
//=====================================================================================

typedef uint8_t uint8;
typedef uint16_t uint16;
typedef uint32_t uint32;
typedef int16_t int16;
typedef int32_t int32;

// type safe types!
typedef SNumeric<float, struct S__Frequency>      TFrequency;
typedef SNumeric<uint32, struct S__TimeMs>        TTimeMs;
typedef SNumeric<uint32, struct S__Samples>       TSamples;
typedef SNumeric<float, struct S__FractSamples>   TFractionalSamples;
typedef SNumeric<float, struct S__Decibels>       TDecibels;
typedef SNumeric<float, struct S__Amplitude>      TAmplitude;
typedef SNumeric<float, struct S__Phase>          TPhase;

//=====================================================================================
// Constants
//=====================================================================================

static const float c_pi = (float)M_PI;
static const float c_twoPi = c_pi * 2.0f;

//=====================================================================================
// Structs
//=====================================================================================

struct SSoundSettings
{
TSamples        m_sampleRate;
};

//=====================================================================================
// Conversion Functions
//=====================================================================================
inline TDecibels AmplitudeToDB(TAmplitude volume)
{
return TDecibels(log10(volume.Value()));
}

inline TAmplitude DBToAmplitude(TDecibels dB)
{
return TAmplitude(pow(10.0f, dB.Value() / 20.0f));
}

TSamples SecondsToSamples(const SSoundSettings &s, float seconds)
{
return TSamples((int)(seconds * (float)s.m_sampleRate.Value()));
}

TSamples MilliSecondsToSamples(const SSoundSettings &s, float milliseconds)
{
return SecondsToSamples(s, milliseconds / 1000.0f);
}

TTimeMs SecondsToMilliseconds(float seconds)
{
return TTimeMs((uint32)(seconds * 1000.0f));
}

TFrequency Frequency(float octave, float note)
{
/* frequency = 440×(2^(n/12))
Notes:
0  = A
1  = A#
2  = B
3  = C
4  = C#
5  = D
6  = D#
7  = E
8  = F
9  = F#
10 = G
11 = G# */
return TFrequency((float)(440 * pow(2.0, ((double)((octave - 4) * 12 + note)) / 12.0)));
}

template <typename T>
T AmplitudeToAudioSample(const TAmplitude& in)
{
const T c_min = std::numeric_limits<T>::min();
const T c_max = std::numeric_limits<T>::max();
const float c_minFloat = (float)c_min;
const float c_maxFloat = (float)c_max;

float ret = in.Value() * c_maxFloat;

if (ret < c_minFloat)
return c_min;

if (ret > c_maxFloat)
return c_max;

return (T)ret;
}

TAmplitude GetLerpedAudioSample(const std::vector<TAmplitude>& samples, TFractionalSamples& index)
{
// get the index of each sample and the fractional blend amount
uint32 a = (uint32)floor(index.Value());
uint32 b = a + 1;
float fract = index.Value() - floor(index.Value());

// get our two amplitudes
float ampA = 0.0f;
if (a >= 0 && a < samples.size())
ampA = samples&#91;a&#93;.Value();

float ampB = 0.0f;
if (b >= 0 && b < samples.size())
ampB = samples&#91;b&#93;.Value();

// return the lerped result
return TAmplitude(fract * ampB + (1.0f - fract) * ampA);
}

void NormalizeSamples(std::vector<TAmplitude>& samples, TAmplitude maxAmplitude, TSamples envelopeTimeFrontBack)
{
// nothing to do if no samples
if (samples.size() == 0)
return;

// 1) find the largest absolute value in the samples.
TAmplitude largestAbsVal = TAmplitude(abs(samples.front().Value()));
std::for_each(samples.begin() + 1, samples.end(), [&largestAbsVal](const TAmplitude &a)
{
TAmplitude absVal = TAmplitude(abs(a.Value()));
if (absVal > largestAbsVal)
largestAbsVal = absVal;
}
);

// 2) adjust largestAbsVal so that when we divide all samples, none will be bigger than maxAmplitude
// if the value we are going to divide by is <= 0, bail out
largestAbsVal /= maxAmplitude;
if (largestAbsVal <= TAmplitude(0.0f))
return;

// 3) divide all numbers by the largest absolute value seen so all samples are &#91;-maxAmplitude,+maxAmplitude&#93;
// also apply front and back envelope
const TSamples c_frontEnvelopeEnd(envelopeTimeFrontBack);
const TSamples c_backEnvelopeStart(samples.size() - envelopeTimeFrontBack.Value());

for (TSamples index(0), numSamples(samples.size()); index < numSamples; ++index)
{
// calculate envelope
TAmplitude envelope(1.0f);
if (index < c_frontEnvelopeEnd)
envelope = TAmplitude(index.Divide<float>(envelopeTimeFrontBack));
else if (index > c_backEnvelopeStart)
envelope = TAmplitude(1.0f) - TAmplitude((index - c_backEnvelopeStart).Divide<float>(envelopeTimeFrontBack));

// apply envelope while normalizing
samples[index.Value()] = samples[index.Value()] * envelope / largestAbsVal;

}
}

void ResampleData(std::vector<TAmplitude>& samples, int srcSampleRate, int destSampleRate)
{
//if the requested sample rate is the sample rate it already is, bail out and do nothing
if (srcSampleRate == destSampleRate)
return;

//calculate the ratio of the old sample rate to the new
float fResampleRatio = (float)destSampleRate / (float)srcSampleRate;

//calculate how many samples the new data will have and allocate the new sample data
int nNewDataNumSamples = (int)((float)samples.size() * fResampleRatio);

std::vector<TAmplitude> newSamples;
newSamples.resize(nNewDataNumSamples);

//get each lerped output sample.  There are higher quality ways to resample
for (int nIndex = 0; nIndex < nNewDataNumSamples; ++nIndex)
newSamples&#91;nIndex&#93; = GetLerpedAudioSample(samples, TFractionalSamples((float)nIndex / fResampleRatio));

//free the old data and set the new data
std::swap(samples, newSamples);
}

void ChangeNumChannels(std::vector<TAmplitude>& samples, int nSrcChannels, int nDestChannels)
{
//if the number of channels requested is the number of channels already there, or either number of channels is not mono or stereo, return
if (nSrcChannels == nDestChannels ||
nSrcChannels < 1 || nSrcChannels > 2 ||
nDestChannels < 1 || nDestChannels > 2)
{
return;
}

//if converting from mono to stereo, duplicate the mono channel to make stereo
if (nDestChannels == 2)
{
std::vector<TAmplitude> newSamples;
newSamples.resize(samples.size() * 2);
for (size_t index = 0; index < samples.size(); ++index)
{
newSamples&#91;index * 2&#93; = samples&#91;index&#93;;
newSamples&#91;index * 2 + 1&#93; = samples&#91;index&#93;;
}

std::swap(samples, newSamples);
}
//else converting from stereo to mono, mix the stereo channels together to make mono
else
{
std::vector<TAmplitude> newSamples;
newSamples.resize(samples.size() / 2);
for (size_t index = 0; index < samples.size() / 2; ++index)
newSamples&#91;index&#93; = samples&#91;index * 2&#93; + samples&#91;index * 2 + 1&#93;;

std::swap(samples, newSamples);
}
}

float PCMToFloat(unsigned char *pPCMData, int nNumBytes)
{
switch (nNumBytes)
{
case 1:
{
uint8 data = pPCMData&#91;0&#93;;
return (float)data / 255.0f;
}
case 2:
{
int16 data = pPCMData&#91;1&#93; << 8 | pPCMData&#91;0&#93;;
return ((float)data) / ((float)0x00007fff);
}
case 3:
{
int32 data = pPCMData&#91;2&#93; << 16 | pPCMData&#91;1&#93; << 8 | pPCMData&#91;0&#93;;
return ((float)data) / ((float)0x007fffff);
}
case 4:
{
int32 data = pPCMData&#91;3&#93; << 24 | pPCMData&#91;2&#93; << 16 | pPCMData&#91;1&#93; << 8 | pPCMData&#91;0&#93;;
return ((float)data) / ((float)0x7fffffff);
}
default:
{
return 0.0f;
}
}
}

//=====================================================================================
// Wave File Writing Code
//=====================================================================================
{
//the main chunk
unsigned char m_szChunkID&#91;4&#93;;      //0
uint32        m_nChunkSize;        //4
unsigned char m_szFormat&#91;4&#93;;       //8

//sub chunk 1 "fmt "
unsigned char m_szSubChunk1ID&#91;4&#93;;  //12
uint32        m_nSubChunk1Size;    //16
uint16        m_nAudioFormat;      //18
uint16        m_nNumChannels;      //20
uint32        m_nSampleRate;       //24
uint32        m_nByteRate;         //28
uint16        m_nBlockAlign;       //30
uint16        m_nBitsPerSample;    //32

//sub chunk 2 "data"
unsigned char m_szSubChunk2ID&#91;4&#93;;  //36
uint32        m_nSubChunk2Size;    //40

//then comes the data!
};

//this writes a wave file
template <typename T>
bool WriteWaveFile(const char *fileName, const std::vector<TAmplitude> &samples, const SSoundSettings &sound)
{
//open the file if we can
FILE *file = fopen(fileName, "w+b");
if (!file)
return false;

//calculate bits per sample and the data size
const int32 bitsPerSample = sizeof(T) * 8;
const int dataSize = samples.size() * sizeof(T);

//fill out the main chunk

//fill out sub chunk 1 "fmt "
waveHeader.m_nByteRate = sound.m_sampleRate.Value() * 1 * bitsPerSample / 8;
waveHeader.m_nBlockAlign = 1 * bitsPerSample / 8;

//fill out sub chunk 2 "data"

//write the wave data itself, converting it from float to the type specified
std::vector<T> outSamples;
outSamples.resize(samples.size());
for (size_t index = 0; index < samples.size(); ++index)
outSamples&#91;index&#93; = AmplitudeToAudioSample<T>(samples[index]);
fwrite(&outSamples[0], dataSize, 1, file);

//close the file and return success
fclose(file);
return true;
}

//loads a wave file in.  Converts from source format into the specified format
// TOTAL HONESTY: some wave files seem to have problems being loaded through this function and I don't have
// time to investigate why.  It seems to work best with 16 bit mono wave files.
// If you need more robust file loading, check out libsndfile at http://www.mega-nerd.com/libsndfile/
bool ReadWaveFile(const char *fileName, std::vector<TAmplitude>& samples, int32 sampleRate)
{
//open the file if we can
FILE *File = fopen(fileName, "rb");
if (!File)
{
return false;
}

//read the main chunk ID and make sure it's "RIFF"
char buffer[5];
buffer[4] = 0;
if (fread(buffer, 4, 1, File) != 1 || strcmp(buffer, "RIFF"))
{
fclose(File);
return false;
}

uint32 nChunkSize;
if (fread(&nChunkSize, 4, 1, File) != 1)
{
fclose(File);
return false;
}

//read the format and make sure it's "WAVE"
if (fread(buffer, 4, 1, File) != 1 || strcmp(buffer, "WAVE"))
{
fclose(File);
return false;
}

long chunkPosFmt = -1;
long chunkPosData = -1;

while (chunkPosFmt == -1 || chunkPosData == -1)
{
//read a sub chunk id and a chunk size if we can
if (fread(buffer, 4, 1, File) != 1 || fread(&nChunkSize, 4, 1, File) != 1)
{
fclose(File);
return false;
}

//if we hit a fmt
if (!strcmp(buffer, "fmt "))
{
chunkPosFmt = ftell(File) - 8;
}
//else if we hit a data
else if (!strcmp(buffer, "data"))
{
chunkPosData = ftell(File) - 8;
}

fseek(File, nChunkSize, SEEK_CUR);
}

//we'll use this handy struct to load in

//load the fmt part if we can
fseek(File, chunkPosFmt, SEEK_SET);
if (fread(&waveData.m_szSubChunk1ID, 24, 1, File) != 1)
{
fclose(File);
return false;
}

//load the data part if we can
fseek(File, chunkPosData, SEEK_SET);
if (fread(&waveData.m_szSubChunk2ID, 8, 1, File) != 1)
{
fclose(File);
return false;
}

//verify a couple things about the file data
if (waveData.m_nAudioFormat != 1 ||       //only pcm data
waveData.m_nNumChannels < 1 ||        //must have a channel
waveData.m_nNumChannels > 2 ||        //must not have more than 2
waveData.m_nBitsPerSample > 32 ||     //32 bits per sample max
waveData.m_nBitsPerSample % 8 != 0 || //must be a multiple of 8 bites
waveData.m_nBlockAlign > 8)           //blocks must be 8 bytes or lower
{
fclose(File);
return false;
}

//figure out how many samples and blocks there are total in the source data
int nBytesPerBlock = waveData.m_nBlockAlign;
int nNumBlocks = waveData.m_nSubChunk2Size / nBytesPerBlock;
int nNumSourceSamples = nNumBlocks * waveData.m_nNumChannels;

//allocate space for the source samples
samples.resize(nNumSourceSamples);

//maximum size of a block is 8 bytes.  4 bytes per samples, 2 channels
unsigned char pBlockData[8];
memset(pBlockData, 0, 8);

//read in the source samples at whatever sample rate / number of channels it might be in
int nBytesPerSample = nBytesPerBlock / waveData.m_nNumChannels;
for (int nIndex = 0; nIndex < nNumSourceSamples; nIndex += waveData.m_nNumChannels)
{
if (fread(pBlockData, waveData.m_nBlockAlign, 1, File) != 1)
{
fclose(File);
return false;
}

//get the first sample
samples&#91;nIndex&#93;.Value() = PCMToFloat(pBlockData, nBytesPerSample);

//get the second sample if there is one
if (waveData.m_nNumChannels == 2)
{
samples&#91;nIndex + 1&#93;.Value() = PCMToFloat(&pBlockData&#91;nBytesPerSample&#93;, nBytesPerSample);
}
}

//re-sample the sample rate up or down as needed
ResampleData(samples, waveData.m_nSampleRate, sampleRate);

//handle switching from mono to stereo or vice versa
ChangeNumChannels(samples, waveData.m_nNumChannels, 1);

return true;
}

//=====================================================================================
// Oscilators
//=====================================================================================

void AdvancePhase(TPhase &phase, TFrequency frequency, TSamples sampleRate)
{
phase += TPhase(frequency.Value() / (float)sampleRate.Value());
while (phase >= TPhase(1.0f))
phase -= TPhase(1.0f);
while (phase < TPhase(0.0f))
phase += TPhase(1.0f);
}

TAmplitude AdvanceOscilator_Sine(TPhase &phase, TFrequency frequency, TSamples sampleRate)
{
return TAmplitude(sin(phase.Value()*c_twoPi));
}

TAmplitude AdvanceOscilator_Saw(TPhase &phase, TFrequency frequency, TSamples sampleRate)
{
return TAmplitude(phase.Value() * 2.0f - 1.0f);
}

TAmplitude AdvanceOscilator_Square(TPhase &phase, TFrequency frequency, TSamples sampleRate)
{
return TAmplitude(phase.Value() > 0.5f ? 1.0f : -1.0f);
}

TAmplitude AdvanceOscilator_Triangle(TPhase &phase, TFrequency frequency, TSamples sampleRate)
{
if (phase > TPhase(0.5f))
return TAmplitude((((1.0f - phase.Value()) * 2.0f) * 2.0f) - 1.0f);
else
return TAmplitude(((phase.Value() * 2.0f) * 2.0f) - 1.0f);
}

TAmplitude AdvanceOscilator_Saw_BandLimited(TPhase &phase, TFrequency frequency, TSamples sampleRate)
{

// sum the harmonics
TAmplitude ret(0.0f);
for (int harmonicIndex = 1; harmonicIndex <= 4; ++harmonicIndex)
{
TPhase harmonicPhase = phase * TPhase((float)harmonicIndex);
ret += TAmplitude(sin(harmonicPhase.Value()*c_twoPi) / (float)harmonicIndex);
}

ret *= TAmplitude(2.0f / c_pi);

return ret;
}

TAmplitude AdvanceOscilator_Square_BandLimited(TPhase &phase, TFrequency frequency, TSamples sampleRate)
{

// sum the harmonics
TAmplitude ret(0.0f);
for (int harmonicIndex = 1; harmonicIndex <= 4; ++harmonicIndex)
{
float harmonicFactor = (float)harmonicIndex * 2.0f - 1.0f;
TPhase harmonicPhase = phase * TPhase(harmonicFactor);
ret += TAmplitude(sin(harmonicPhase.Value()*c_twoPi) / harmonicFactor);
}

ret *= TAmplitude(4.0f / c_pi);

return ret;
}

TAmplitude AdvanceOscilator_Triangle_BandLimited(TPhase &phase, TFrequency frequency, TSamples sampleRate)
{

// sum the harmonics
TAmplitude ret(0.0f);
TAmplitude signFlip(1.0f);
for (int harmonicIndex = 1; harmonicIndex <= 4; ++harmonicIndex)
{
float harmonicFactor = (float)harmonicIndex * 2.0f - 1.0f;
TPhase harmonicPhase = phase * TPhase(harmonicFactor);
ret += TAmplitude(sin(harmonicPhase.Value()*c_twoPi) / (harmonicFactor*harmonicFactor)) * signFlip;
signFlip *= TAmplitude(-1.0f);
}

ret *= TAmplitude(8.0f / (c_pi*c_pi));

return ret;
}

//=====================================================================================
// Main
//=====================================================================================
int main(int argc, char **argv)
{
// wetness parameter of reverb.  It's a percentage from 0 to 1.
TAmplitude c_reverbWetness(1.0f);

// keep track of when the process started so we can report how long it took
auto start = std::chrono::system_clock::now().time_since_epoch();

//our desired sound parameters
SSoundSettings sound;
sound.m_sampleRate = TSamples(44100);

// load the input wave file if we can and normalize it
std::vector<TAmplitude> inputData;
{
printf("could not load input wave file!");
return 0;
}
NormalizeSamples(inputData, TAmplitude(1.0f), TSamples(0));

// load the reverb wave file if we can and normalize it
std::vector<TAmplitude> reverbData;
{
printf("could not load reverb wave file!");
return 0;
}
NormalizeSamples(reverbData, TAmplitude(1.0f), TSamples(0));

// allocate space for the output file - which will be the number of samples in the two input files minus 1
// initialize it to silence!
std::vector<TAmplitude> samples;
samples.resize(inputData.size() + reverbData.size() - 1);
std::fill(samples.begin(), samples.end(), TAmplitude(0.0f));

// reverse the reverb data in preparation of convolution
std::reverse(reverbData.begin(), reverbData.end());

// report the number of samples of each file
printf("Input samples = %u\n", inputData.size());
printf("Reverb samples = %u\n", reverbData.size());

// apply the convolution for each output sample
float lastPercentageReported = 0.0f;
char percentageText[512];
percentageText[0] = 0;
printf("Progress: ");
for (TSamples sampleIndex(0), numSamples(samples.size()); sampleIndex < numSamples; ++sampleIndex)
{
// print some periodic output since this can take a while!
float percentage = sampleIndex.Divide<float>(numSamples);
if (percentage >= lastPercentageReported + 0.01f)
{
// erase the last progress text
for (int i = 0, c = strlen(percentageText); i < c; ++i)
printf("%c %c", 8, 8);

// calculte and show progress text
lastPercentageReported = percentage;
auto duration = std::chrono::system_clock::now().time_since_epoch() - start;
auto millis = std::chrono::duration_cast<std::chrono::milliseconds>(duration).count();
float expectedMillis = millis / percentage;
sprintf(percentageText, "%i%%  %0.2f / %0.2f seconds (%0.2f remaining)", int(percentage*100.0f), ((float)millis) / 1000.0f, expectedMillis / 1000.0f, (expectedMillis - millis) / 1000.0f);
printf("%s", percentageText);
}

// get a reference to our output sample
TAmplitude &outputSample = samples[sampleIndex.Value()];

// figure out the first reverb index and input index we should process.
TSamples startReverbIndex(0);
TSamples startInputIndex(0);
if (sampleIndex < TSamples(reverbData.size()))
startReverbIndex = TSamples(reverbData.size()) - sampleIndex - TSamples(1);
else
startInputIndex = sampleIndex - TSamples(reverbData.size());

// for each reverb sample
for (TSamples reverbIndex(startReverbIndex), numReverbSamples(reverbData.size()), inputIndex(startInputIndex), numInputSamples(inputData.size());
reverbIndex < numReverbSamples && inputIndex < numInputSamples;
++reverbIndex, ++inputIndex)
{
const TAmplitude &inputSample = inputData&#91;inputIndex.Value()&#93;;
const TAmplitude &reverbSample = reverbData&#91;reverbIndex.Value()&#93;;
outputSample += inputSample * reverbSample;
}
}

// normalize the reverb to the wetness volume level
NormalizeSamples(samples, c_reverbWetness, TSamples(0));

// apply the dry sound on top, per the wetness settings
for (TSamples inputIndex = TSamples(0), numInputSamples(inputData.size()); inputIndex < numInputSamples; ++inputIndex)
{
const TAmplitude &inputSample = inputData&#91;inputIndex.Value()&#93;;
TAmplitude &outputSample = samples&#91;inputIndex.Value()&#93;;
outputSample += inputSample * (TAmplitude(1.0f) - c_reverbWetness);
}

// normalize the amplitude of the samples to make sure they are as loud as possible without clipping.
// give 3db of headroom and also put an envelope on the front and back that is 50ms
NormalizeSamples(samples, DBToAmplitude(TDecibels(-3.0f)), MilliSecondsToSamples(sound, 50.0f));

// save as a wave file
WriteWaveFile<int16_t>("out.wav", samples, sound);
return 0;
}
```

## Intuition

I had to think about it for a bit but the intuition for why this works is actually pretty straight forward!

The impulse response represents all the echos that occur over time if you make a single sample of 1.0 amplitude. This is your guide for how all samples in an impulse source (IS) should be transformed, where IS is the sound you want to add reverb to.

So, starting out, you put the entire IR into your output buffer starting at output[0] (the first sample in the output buffer), multiplied by IS[0] (the first sample of the sound you are reverbing).

This leaves you with the proper reverb of the first sample in your IS, and your total output buffer length is the same length as your IR.

We only have one sample reverbed though, so we move to the next…

Next, you put the entire IR into the output buffer again, but this time start at output[1] (the second sample in the output buffer) and those IR values you are writing need to be multiplied by IS[1] (the second sample of the sound you are reverbing).

You now have the proper reverb for the first two samples in your IS, and your total output buffer length is the length of your IR + 1.

Rinse and repeat for all samples in your IS, and at the end, the reverb of all samples in your IS will be accounted for in your output buffer, and the size of your output buffer will be the length of your IR plus the length of your IS, minus one.

Pretty simple and intuitive right?

You are essentially playing the IR in the output buffer length(IS) times, where IS[outputBufferIndex] determines the volume (amplitude) that you need to play the IR at.

Convolution does exactly this, which turns out to be pretty slow due to it being a lot of math to preform.

If you have a source file (IS) you are reverbing that is 4 seconds long, running at 44100 samples per second (176,400 samples total), and you have a reverb impulse response (IR) that is 2 seconds long at 44100 samples per second (88,200 samples total), that means that you are going to essentially be mixing the IR into an output buffer 176,400 times.

Each time you play the IR you need to do a multiplication per IR sample (to scale it to the IS[index] amplitude) and an addition to add it to the resulting output buffer.

At 88,200 IR samples with a multiply and add for each sample, and doing that 176,400 times, that means at the end you will need to do 15,558,480,000 (15.5 billion) multiplies and adds.

And remember… that is only for a 4 second sound that is receiving a 2 second reverb… those are pretty small sound files involved! And, that is only a single channel. That would double in stereo, and be even worse in 5.1 or 7.1 surround sound!

So, this method works, but unfortunately it’s pretty darn slow (it took like 5 minutes for me to convolve the legend quote with the large reverb). It could be made faster by introducing threading and SSE instructions, but there is a better way that gives us an even bigger speed up.

Having audio samples means you are working in the “time domain”. If you take a fourier transform of the audio samples, you’ll have the information in the “frequency domain”. As it turns out, if you do a multiplication in the frequency domain, that is equivalent to doing a convolution in the time domain. Using the fast fourier transform on your input sound and multiplying that by the pre-transformed FFT of the impulse response, and then doing an inverse FFT on the result to get back into the time domain (aka audio samples) is MUCH FASTER than doing actual convolution.

Another problem with convolution reverb is that you need the full sound you are convolving, which makes it basically impossible to use in a live setup. The solution to this is that people convolve windows of data at a time, instead of the whole sound. I think a common window size is 256 samples.

I’ll make a post in the future that addresses both those issues to allow for fast, real time convolution reverb via windowed FFT multiplication.

Also, I said in the last post that this technique is what the pros do. We’ll I should add that this is what they do SOMETIMES. Other times they use DSP algorithms involving comb filters and multi tap delays (and other things) to make fast reverb that sounds pretty decent without needing to do convolution or FFT/IFFT. Check the links section for a link to the details of one such algorithm.

Regarding IR samples (Impulse Response recordings), if you record your own, it should work as is. However, a common thing that people do is remove the loud sound from the impulse response (via deconvolution i think?) to get ONLY the impulse response. It basically makes it so that the IR really is just an IR of that room as if you had a single sample of a “1.0” echoing in the area. Without this step, your reverb convolution will still work, but you’ll get “smearing” of the sound, due to it not being a true 1.0 sample (or in other words, not a true dirac delta).

Luckily there are IR’s available for download online as well. Some are free, some are pay. Check the links section for a few links I found for that (:

Lastly, this post basically teaches you how to do 1 dimensional convolution, showing you an application for it. You can do convolution in higher dimensions as well though, and in fact if you have used photoshop and know about things like gaussian blur, sharpen mask, etc, those guys work by 2d convolution. Bokeh is even apparently convolution, as is a “Minkowski Sum” if you have done work in game physics.

Ill make a graphics related 2d convolution post in the future as well to delve into that stuff a bit deeper.

As one final sound sample, check out this CROSS CORRELATION (not convolution) of SoundBite.wav and ReverbLarge. Cross correlation is the same as convolution except you don’t reverse one of the samples. So, in effect, it’s like I convolved SoundBite.wav with ReverbLarge.wav played backwards.

c_ReverseRL_SBite.wav

More info on cross correlation coming soon. It has uses in audio, but more often it’s used for finding time delays of sounds compared to other sounds, recognizing patterns in sound data even with the presence of noise / distortion, versus having actual audible uses (although there are some of those too!).