DIY Synth: Convolution Reverb & 1D Discrete Convolution of Audio Samples

This is a part of the DIY Synthesizer series of posts where each post is roughly built upon the knowledge of the previous posts. If you are lost, check the earlier posts!

I just learned about this technique a couple of weeks ago and am SUPER stoked to be able to share this info. Honestly, it’s kind of badass and a little magical.

If there is a physical location that has the reverb you are looking for (a church, a cave, your bathroom, the subway, whatever), you can go there and record a short, loud sound (a clap, a starter pistol, whatever). Now that you have this recording, you can use a mathematical technique called “Discrete Convolution” (not as difficult as it sounds) to apply that reverb to other sounds.

Not only that, but you can use convolution to apply various properties of one sound to another sound. For instance, you could convolve a saw wave with a recording of a voice and make the voice sound robotic.

This is pretty cool, right?

Convolution

There are two types of convolution out there… continuous convolution and discrete convolution.

Continuous convolution is a method where you can take two formulas and perform some operations (algebra, calculus, etc) to get a third formula.

In the cases we programmers usually work with, however, you only have data, not a mathematical formula describing that data.

When you have specific data points instead of an equation, you can use something called discrete convolution instead.

Discrete convolution is a method where you take two streams of data (i.e. two arrays of data!) and perform some operations (only looping, addition and multiplication!) to get a third stream of data.

Interestingly, this third stream of data will be the size of the first stream of data, plus the size of the second stream of data, minus one. For example, convolving a 100 sample buffer with a 50 sample buffer gives a 149 sample result.

This link explains discrete convolution much better than I could, and even has an interactive demo, so check it out: Discrete Convolution

Here is some pseudo code that shows how discrete convolution works.

// We are calculating bufferA * bufferB (discrete convolution)
// initialize out buffer to empty, and make sure it's the right size
outBuffer[sizeof(bufferA)+sizeof(bufferB)-1] = 0

// reverse one of the buffers in preparation of the convolution
reverse(bufferB);

// Convolve!
for (int outBufferIndex = 0; outBufferIndex < sizeof(outBuffer); ++outBufferIndex)
{
  bufferAIndex = 0;
  bufferBIndex = 0;
  if (outBufferIndex < sizeof(bufferB))
    bufferBIndex = sizeof(bufferB) - outBufferIndex - 1;
  else
    bufferAIndex = outBufferIndex - sizeof(bufferB) + 1;

  for (; bufferAIndex < sizeof(bufferA) && bufferBIndex < sizeof(bufferB); ++bufferAIndex, ++bufferBIndex)
  {
    outBuffer[outBufferIndex] += bufferA[bufferAIndex] * bufferB[bufferBIndex];
  }
}

That's all there is to it!
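If you want to see that as real code, here is a small, runnable C++ version of the pseudo code above. It computes the same thing, but gathers with the second buffer indexed backwards instead of physically reversing it first (the Convolve helper name and the test values are mine, just for illustration):

#include <stdio.h>
#include <vector>
 
// discrete convolution: out[n] = sum of A[k] * B[n - k] over all valid k
std::vector<float> Convolve(const std::vector<float>& bufferA, const std::vector<float>& bufferB)
{
    std::vector<float> outBuffer(bufferA.size() + bufferB.size() - 1, 0.0f);
    for (size_t outIndex = 0; outIndex < outBuffer.size(); ++outIndex)
    {
        // clamp k to the range where both A[k] and B[outIndex - k] exist
        size_t kStart = (outIndex >= bufferB.size()) ? outIndex - bufferB.size() + 1 : 0;
        size_t kEnd = (outIndex < bufferA.size()) ? outIndex : bufferA.size() - 1;
        for (size_t k = kStart; k <= kEnd; ++k)
            outBuffer[outIndex] += bufferA[k] * bufferB[outIndex - k];
    }
    return outBuffer;
}
 
int main()
{
    // {1,2,3} convolved with {4,5} should print: 4.0 13.0 22.0 15.0
    std::vector<float> result = Convolve({ 1.0f, 2.0f, 3.0f }, { 4.0f, 5.0f });
    for (float f : result)
        printf("%0.1f ", f);
    printf("\n");
    return 0;
}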

Convolution Reverb

Like I mentioned above, to do convolution reverb you just convolve the audio file you want to add reverb to with the recorded sound from the place that has the reverb you want. That second sound is called the Impulse Response, or IR for short.

Besides letting you choose a reverb model (like “cavernous” or “small hallway” or “bright tiled room”), many reverbs will have a “wetness” parameter that controls how much reverb you hear versus how much of the original sound you hear. A wetness of 1.0 means all you hear is the reverb (the convolution), while a wetness of 0.0 means all you hear is the original sound (the dry sound). It’s just a percentage used to lerp between the two signals.
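In code, that wetness control is just a lerp. A minimal sketch (the function name is mine, float samples assumed):

// blend the convolved (wet) sample with the original (dry) sample
float MixWetDry(float drySample, float wetSample, float wetness)
{
    // wetness = 1.0 -> all reverb, wetness = 0.0 -> all dry sound
    return wetSample * wetness + drySample * (1.0f - wetness);
}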

Here is an example of how changes in wetness can sound, and also a first glimpse of just how well this technique works!

Dry Sound: SoundBite
Impulse Response: ReverbLarge

Convolved with 100% wetness: c_RL_SBite.wav
Convolved with 66% wetness: c_RL_SBite_66.wav
Convolved with 33% wetness: c_RL_SBite_33.wav

Sample Sound Files

Let’s convolve some sounds together and see what results we get!

Here are 11 sounds convolved against each other (and themselves) at 100% wetness, so all you hear is the convolution. Convolution is commutative (A*B == B*A), so that cuts down on the number of combinations (66 instead of 121!)

There are 3 impulse responses, 4 short basic waveforms, 2 voice samples, a cymbal drum pattern and a jaw harp twang.

Some of the results are pretty interesting. For instance, check out SoundBite vs SoundCymbal. The beat of the cymbal pattern remains, but the actual cymbal sound itself is replaced by the words!

ReverbLarge
ReverbLarge c_RL_RL.wav
ReverbMedium c_RL_RM.wav
ReverbSmall c_RL_RS.wav
SoundBite c_RL_SBite.wav
SoundCymbal c_RL_SCymbal.wav
SoundJawharp c_RL_SJawharp.wav
SoundLegend c_RL_SLegend.wav
SoundSaw c_RL_SSaw.wav
SoundSine c_RL_SSine.wav
SoundSquare c_RL_SSquare.wav
SoundTriangle c_RL_STriangle.wav
ReverbMedium
ReverbLarge c_RL_RM.wav
ReverbMedium c_RM_RM.wav
ReverbSmall c_RM_RS.wav
SoundBite c_RM_SBite.wav
SoundCymbal c_RM_SCymbal.wav
SoundJawharp c_RM_SJawharp.wav
SoundLegend c_RM_SLegend.wav
SoundSaw c_RM_SSaw.wav
SoundSine c_RM_SSine.wav
SoundSquare c_RM_SSquare.wav
SoundTriangle c_RM_STriangle.wav
ReverbSmall
ReverbLarge c_RL_RS.wav
ReverbMedium c_RM_RS.wav
ReverbSmall c_RS_RS.wav
SoundBite c_RS_SBite.wav
SoundCymbal c_RS_SCymbal.wav
SoundJawharp c_RS_SJawharp.wav
SoundLegend c_RS_SLegend.wav
SoundSaw c_RS_SSaw.wav
SoundSine c_RS_SSine.wav
SoundSquare c_RS_SSquare.wav
SoundTriangle c_RS_STriangle.wav
SoundBite
ReverbLarge c_RL_SBite.wav
ReverbMedium c_RM_SBite.wav
ReverbSmall c_RS_SBite.wav
SoundBite c_SBite_SBite.wav
SoundCymbal c_SBite_SCymbal.wav
SoundJawharp c_SBite_SJawharp.wav
SoundLegend c_SBite_SLegend.wav
SoundSaw c_SBite_SSaw.wav
SoundSine c_SBite_SSine.wav
SoundSquare c_SBite_SSquare.wav
SoundTriangle c_SBite_STriangle.wav
SoundCymbal
ReverbLarge c_RL_SCymbal.wav
ReverbMedium c_RM_SCymbal.wav
ReverbSmall c_RS_SCymbal.wav
SoundBite c_SBite_SCymbal.wav
SoundCymbal c_SCymbal_SCymbal.wav
SoundJawharp c_SCymbal_SJawharp.wav
SoundLegend c_SCymbal_SLegend.wav
SoundSaw c_SCymbal_SSaw.wav
SoundSine c_SCymbal_SSine.wav
SoundSquare c_SCymbal_SSquare.wav
SoundTriangle c_SCymbal_STriangle.wav
SoundJawharp
ReverbLarge c_RL_SJawharp.wav
ReverbMedium c_RM_SJawharp.wav
ReverbSmall c_RS_SJawharp.wav
SoundBite c_SBite_SJawharp.wav
SoundCymbal c_SCymbal_SJawharp.wav
SoundJawharp c_SJawharp_SJawharp.wav
SoundLegend c_SJawharp_SLegend.wav
SoundSaw c_SJawharp_SSaw.wav
SoundSine c_SJawharp_SSine.wav
SoundSquare c_SJawharp_SSquare.wav
SoundTriangle c_SJawharp_STriangle.wav
SoundLegend
ReverbLarge c_RL_SLegend.wav
ReverbMedium c_RM_SLegend.wav
ReverbSmall c_RS_SLegend.wav
SoundBite c_SBite_SLegend.wav
SoundCymbal c_SCymbal_SLegend.wav
SoundJawharp c_SJawharp_SLegend.wav
SoundLegend c_SLegend_SLegend.wav
SoundSaw c_SLegend_SSaw.wav
SoundSine c_SLegend_SSine.wav
SoundSquare c_SLegend_SSquare.wav
SoundTriangle c_SLegend_STriangle.wav
SoundSaw
ReverbLarge c_RL_SSaw.wav
ReverbMedium c_RM_SSaw.wav
ReverbSmall c_RS_SSaw.wav
SoundBite c_SBite_SSaw.wav
SoundCymbal c_SCymbal_SSaw.wav
SoundJawharp c_SJawharp_SSaw.wav
SoundLegend c_SLegend_SSaw.wav
SoundSaw c_SSaw_SSaw.wav
SoundSine c_SSaw_SSine.wav
SoundSquare c_SSaw_SSquare.wav
SoundTriangle c_SSaw_STriangle.wav
SoundSine
ReverbLarge c_RL_SSine.wav
ReverbMedium c_RM_SSine.wav
ReverbSmall c_RS_SSine.wav
SoundBite c_SBite_SSine.wav
SoundCymbal c_SCymbal_SSine.wav
SoundJawharp c_SJawharp_SSine.wav
SoundLegend c_SLegend_SSine.wav
SoundSaw c_SSaw_SSine.wav
SoundSine c_SSine_SSine.wav
SoundSquare c_SSine_SSquare.wav
SoundTriangle c_SSine_STriangle.wav
SoundSquare
ReverbLarge c_RL_SSquare.wav
ReverbMedium c_RM_SSquare.wav
ReverbSmall c_RS_SSquare.wav
SoundBite c_SBite_SSquare.wav
SoundCymbal c_SCymbal_SSquare.wav
SoundJawharp c_SJawharp_SSquare.wav
SoundLegend c_SLegend_SSquare.wav
SoundSaw c_SSaw_SSquare.wav
SoundSine c_SSine_SSquare.wav
SoundSquare c_SSquare_SSquare.wav
SoundTriangle c_SSquare_STriangle.wav
SoundTriangle
ReverbLarge c_RL_STriangle.wav
ReverbMedium c_RM_STriangle.wav
ReverbSmall c_RS_STriangle.wav
SoundBite c_SBite_STriangle.wav
SoundCymbal c_SCymbal_STriangle.wav
SoundJawharp c_SJawharp_STriangle.wav
SoundLegend c_SLegend_STriangle.wav
SoundSaw c_SSaw_STriangle.wav
SoundSine c_SSine_STriangle.wav
SoundSquare c_SSquare_STriangle.wav
SoundTriangle c_STriangle_STriangle.wav

Sample Code

Here is some sample code which will read in “in.wav” and “reverb.wav”, do convolution on them and save the fully wet result (convolution result only, no original in.wav mixed into the output) as “out.wav”.

Note that this code was used to process the sample sound files above. Usual caveat: The sound loading code isn’t bulletproof. I’ve found it works best with 16 bit mono sound files. If you need better sound loading, check out libsndfile!

#define _CRT_SECURE_NO_WARNINGS
 
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <stdlib.h>
#include <limits>
#include <vector>
#include <algorithm>
#include <chrono>
 
#define _USE_MATH_DEFINES
#include <math.h>
 
//=====================================================================================
// SNumeric - uses phantom types to enforce type safety
//=====================================================================================
template <typename T, typename PHANTOM_TYPE>
struct SNumeric
{
public:
    explicit SNumeric(const T &value) : m_value(value) { }
    SNumeric() : m_value() { }
    inline T& Value() { return m_value; }
    inline const T& Value() const { return m_value; }
 
    typedef SNumeric<T, PHANTOM_TYPE> TType;
    typedef T TInnerType;
 
    // Math Operations
    TType operator+ (const TType &b) const
    {
        return TType(this->Value() + b.Value());
    }
 
    TType operator- (const TType &b) const
    {
        return TType(this->Value() - b.Value());
    }
 
    TType operator* (const TType &b) const
    {
        return TType(this->Value() * b.Value());
    }
 
    TType operator/ (const TType &b) const
    {
        return TType(this->Value() / b.Value());
    }
 
    TType& operator+= (const TType &b)
    {
        Value() += b.Value();
        return *this;
    }
 
    TType& operator-= (const TType &b)
    {
        Value() -= b.Value();
        return *this;
    }
 
    TType& operator*= (const TType &b)
    {
        Value() *= b.Value();
        return *this;
    }
 
    TType& operator/= (const TType &b)
    {
        Value() /= b.Value();
        return *this;
    }
 
    TType& operator++ ()
    {
        Value()++;
        return *this;
    }
 
    TType& operator-- ()
    {
        Value()--;
        return *this;
    }
 
    // Extended Math Operations
    // callers pass the result type explicitly, e.g. Divide<float>(...)
    template <typename TResult>
    TResult Divide(const TType &b)
    {
        return ((TResult)this->Value()) / ((TResult)b.Value());
    }
 
    // Logic Operations
    bool operator< (const TType &b) const {
        return this->Value() < b.Value();
    }
    bool operator<= (const TType &b) const {
        return this->Value() <= b.Value();
    }
    bool operator> (const TType &b) const {
        return this->Value() > b.Value();
    }
    bool operator>= (const TType &b) const {
        return this->Value() >= b.Value();
    }
    bool operator== (const TType &b) const {
        return this->Value() == b.Value();
    }
    bool operator!= (const TType &b) const {
        return this->Value() != b.Value();
    }
 
private:
    T m_value;
};
 
//=====================================================================================
// Typedefs
//=====================================================================================
 
typedef uint8_t uint8;
typedef uint16_t uint16;
typedef uint32_t uint32;
typedef int16_t int16;
typedef int32_t int32;
 
// type safe types!
typedef SNumeric<float, struct S__Frequency>         TFrequency;
typedef SNumeric<uint32, struct S__TimeMs>           TTimeMs;
typedef SNumeric<uint32, struct S__Samples>          TSamples;
typedef SNumeric<float, struct S__FractionalSamples> TFractionalSamples;
typedef SNumeric<float, struct S__Decibels>          TDecibels;
typedef SNumeric<float, struct S__Amplitude>         TAmplitude;
typedef SNumeric<float, struct S__Phase>             TPhase;
 
//=====================================================================================
// Constants
//=====================================================================================
 
static const float c_pi = (float)M_PI;
static const float c_twoPi = c_pi * 2.0f;
 
//=====================================================================================
// Structs
//=====================================================================================
 
struct SSoundSettings
{
    TSamples        m_sampleRate;
};
 
//=====================================================================================
// Conversion Functions
//=====================================================================================
inline TDecibels AmplitudeToDB(TAmplitude volume)
{
    // dB = 20 * log10(amplitude), the inverse of DBToAmplitude below
    return TDecibels(20.0f * log10(volume.Value()));
}
 
inline TAmplitude DBToAmplitude(TDecibels dB)
{
    return TAmplitude(pow(10.0f, dB.Value() / 20.0f));
}
 
TSamples SecondsToSamples(const SSoundSettings &s, float seconds)
{
    return TSamples((int)(seconds * (float)s.m_sampleRate.Value()));
}
 
TSamples MilliSecondsToSamples(const SSoundSettings &s, float milliseconds)
{
    return SecondsToSamples(s, milliseconds / 1000.0f);
}
 
TTimeMs SecondsToMilliseconds(float seconds)
{
    return TTimeMs((uint32)(seconds * 1000.0f));
}
 
TFrequency Frequency(float octave, float note)
{
    /* frequency = 440×(2^(n/12))
    Notes:
    0  = A
    1  = A#
    2  = B
    3  = C
    4  = C#
    5  = D
    6  = D#
    7  = E
    8  = F
    9  = F#
    10 = G
    11 = G# */
    return TFrequency((float)(440 * pow(2.0, ((double)((octave - 4) * 12 + note)) / 12.0)));
}
 
template <typename T>
T AmplitudeToAudioSample(const TAmplitude& in)
{
    const T c_min = std::numeric_limits<T>::min();
    const T c_max = std::numeric_limits<T>::max();
    const float c_minFloat = (float)c_min;
    const float c_maxFloat = (float)c_max;
 
    float ret = in.Value() * c_maxFloat;
 
    if (ret < c_minFloat)
        return c_min;
 
    if (ret > c_maxFloat)
        return c_max;
 
    return (T)ret;
}
 
TAmplitude GetLerpedAudioSample(const std::vector<TAmplitude>& samples, const TFractionalSamples& index)
{
    // get the index of each sample and the fractional blend amount
    uint32 a = (uint32)floor(index.Value());
    uint32 b = a + 1;
    float fract = index.Value() - floor(index.Value());
 
    // get our two amplitudes
    float ampA = 0.0f;
    if (a < samples.size())
        ampA = samples[a].Value();
 
    float ampB = 0.0f;
    if (b < samples.size())
        ampB = samples[b].Value();
 
    // return the lerped result
    return TAmplitude(fract * ampB + (1.0f - fract) * ampA);
}
 
void NormalizeSamples(std::vector<TAmplitude>& samples, TAmplitude maxAmplitude, TSamples envelopeTimeFrontBack)
{
    // nothing to do if no samples
    if (samples.size() == 0)
        return;
 
    // 1) find the largest absolute value in the samples.
    TAmplitude largestAbsVal = TAmplitude(abs(samples.front().Value()));
    std::for_each(samples.begin() + 1, samples.end(), [&largestAbsVal](const TAmplitude &a)
        {
            TAmplitude absVal = TAmplitude(abs(a.Value()));
            if (absVal > largestAbsVal)
                largestAbsVal = absVal;
        }
    );
 
    // 2) adjust largestAbsVal so that when we divide all samples, none will be bigger than maxAmplitude
    // if the value we are going to divide by is <= 0, bail out
    largestAbsVal /= maxAmplitude;
    if (largestAbsVal <= TAmplitude(0.0f))
        return;
 
    // 3) divide all numbers by the largest absolute value seen so all samples are [-maxAmplitude,+maxAmplitude]
    // also apply front and back envelope
    const TSamples c_frontEnvelopeEnd(envelopeTimeFrontBack);
    const TSamples c_backEnvelopeStart(samples.size() - envelopeTimeFrontBack.Value());

    for (TSamples index(0), numSamples(samples.size()); index < numSamples; ++index)
    {
        // calculate envelope
        TAmplitude envelope(1.0f);
        if (index < c_frontEnvelopeEnd)
            envelope = TAmplitude(index.Divide<float>(envelopeTimeFrontBack));
        else if (index > c_backEnvelopeStart)
            envelope = TAmplitude(1.0f) - TAmplitude((index - c_backEnvelopeStart).Divide<float>(envelopeTimeFrontBack));

        // apply envelope while normalizing
        samples[index.Value()] = samples[index.Value()] * envelope / largestAbsVal;

    }
}
 
void ResampleData(std::vector<TAmplitude>& samples, int srcSampleRate, int destSampleRate)
{
    //if the requested sample rate is the sample rate it already is, bail out and do nothing
    if (srcSampleRate == destSampleRate)
        return;
 
    //calculate the ratio of the old sample rate to the new
    float fResampleRatio = (float)destSampleRate / (float)srcSampleRate;
 
    //calculate how many samples the new data will have and allocate the new sample data
    int nNewDataNumSamples = (int)((float)samples.size() * fResampleRatio);
 
    std::vector<TAmplitude> newSamples;
    newSamples.resize(nNewDataNumSamples);
 
    //get each lerped output sample.  There are higher quality ways to resample
    for (int nIndex = 0; nIndex < nNewDataNumSamples; ++nIndex)
        newSamples[nIndex] = GetLerpedAudioSample(samples, TFractionalSamples((float)nIndex / fResampleRatio));
 
    //free the old data and set the new data
    std::swap(samples, newSamples);
}
 
void ChangeNumChannels(std::vector<TAmplitude>& samples, int nSrcChannels, int nDestChannels)
{
    //if the number of channels requested is the number of channels already there, or either number of channels is not mono or stereo, return
    if (nSrcChannels == nDestChannels ||
        nSrcChannels < 1 || nSrcChannels > 2 ||
        nDestChannels < 1 || nDestChannels > 2)
    {
        return;
    }
 
    //if converting from mono to stereo, duplicate the mono channel to make stereo
    if (nDestChannels == 2)
    {
        std::vector<TAmplitude> newSamples;
        newSamples.resize(samples.size() * 2);
        for (size_t index = 0; index < samples.size(); ++index)
        {
            newSamples[index * 2] = samples[index];
            newSamples[index * 2 + 1] = samples[index];
        }
 
        std::swap(samples, newSamples);
    }
    //else converting from stereo to mono, mix the stereo channels together to make mono
    else
    {
        std::vector<TAmplitude> newSamples;
        newSamples.resize(samples.size() / 2);
        for (size_t index = 0; index < samples.size() / 2; ++index)
            newSamples[index] = samples[index * 2] + samples[index * 2 + 1];
 
        std::swap(samples, newSamples);
    }
}
 
float PCMToFloat(unsigned char *pPCMData, int nNumBytes)
{
    switch (nNumBytes)
    {
    case 1:
    {
        uint8 data = pPCMData[0];
        return (float)data / 255.0f;
    }
    case 2:
    {
        int16 data = pPCMData[1] << 8 | pPCMData[0];
        return ((float)data) / ((float)0x00007fff);
    }
    case 3:
    {
        int32 data = pPCMData[2] << 16 | pPCMData[1] << 8 | pPCMData[0];
        return ((float)data) / ((float)0x007fffff);
    }
    case 4:
    {
        int32 data = pPCMData[3] << 24 | pPCMData[2] << 16 | pPCMData[1] << 8 | pPCMData[0];
        return ((float)data) / ((float)0x7fffffff);
    }
    default:
    {
        return 0.0f;
    }
    }
}
 
//=====================================================================================
// Wave File Writing Code
//=====================================================================================
struct SMinimalWaveFileHeader
{
    //the main chunk
    unsigned char m_szChunkID[4];      //0
    uint32        m_nChunkSize;        //4
    unsigned char m_szFormat[4];       //8
 
    //sub chunk 1 "fmt "
    unsigned char m_szSubChunk1ID[4];  //12
    uint32        m_nSubChunk1Size;    //16
    uint16        m_nAudioFormat;      //20
    uint16        m_nNumChannels;      //22
    uint32        m_nSampleRate;       //24
    uint32        m_nByteRate;         //28
    uint16        m_nBlockAlign;       //32
    uint16        m_nBitsPerSample;    //34
 
    //sub chunk 2 "data"
    unsigned char m_szSubChunk2ID[4];  //36
    uint32        m_nSubChunk2Size;    //40
 
    //then comes the data!
};
 
//this writes a wave file
template <typename T>
bool WriteWaveFile(const char *fileName, const std::vector<TAmplitude> &samples, const SSoundSettings &sound)
{
    //open the file if we can
    FILE *file = fopen(fileName, "w+b");
    if (!file)
        return false;
 
    //calculate bits per sample and the data size
    const int32 bitsPerSample = sizeof(T) * 8;
    const int dataSize = samples.size() * sizeof(T);
 
    SMinimalWaveFileHeader waveHeader;
 
    //fill out the main chunk
    memcpy(waveHeader.m_szChunkID, "RIFF", 4);
    waveHeader.m_nChunkSize = dataSize + 36;
    memcpy(waveHeader.m_szFormat, "WAVE", 4);
 
    //fill out sub chunk 1 "fmt "
    memcpy(waveHeader.m_szSubChunk1ID, "fmt ", 4);
    waveHeader.m_nSubChunk1Size = 16;
    waveHeader.m_nAudioFormat = 1;
    waveHeader.m_nNumChannels = 1;
    waveHeader.m_nSampleRate = sound.m_sampleRate.Value();
    waveHeader.m_nByteRate = sound.m_sampleRate.Value() * 1 * bitsPerSample / 8;
    waveHeader.m_nBlockAlign = 1 * bitsPerSample / 8;
    waveHeader.m_nBitsPerSample = bitsPerSample;
 
    //fill out sub chunk 2 "data"
    memcpy(waveHeader.m_szSubChunk2ID, "data", 4);
    waveHeader.m_nSubChunk2Size = dataSize;
 
    //write the header
    fwrite(&waveHeader, sizeof(SMinimalWaveFileHeader), 1, file);
 
    //write the wave data itself, converting it from float to the type specified
    std::vector<T> outSamples;
    outSamples.resize(samples.size());
    for (size_t index = 0; index < samples.size(); ++index)
        outSamples[index] = AmplitudeToAudioSample<T>(samples[index]);
    fwrite(&outSamples[0], dataSize, 1, file);
 
    //close the file and return success
    fclose(file);
    return true;
}
 
//loads a wave file in.  Converts from source format into the specified format
// TOTAL HONESTY: some wave files seem to have problems being loaded through this function and I don't have
// time to investigate why.  It seems to work best with 16 bit mono wave files.
// If you need more robust file loading, check out libsndfile at http://www.mega-nerd.com/libsndfile/
bool ReadWaveFile(const char *fileName, std::vector<TAmplitude>& samples, int32 sampleRate)
{
    //open the file if we can
    FILE *File = fopen(fileName, "rb");
    if (!File)
    {
        return false;
    }
 
    //read the main chunk ID and make sure it's "RIFF"
    char buffer[5];
    buffer[4] = 0;
    if (fread(buffer, 4, 1, File) != 1 || strcmp(buffer, "RIFF"))
    {
        fclose(File);
        return false;
    }
 
    //read the main chunk size
    uint32 nChunkSize;
    if (fread(&nChunkSize, 4, 1, File) != 1)
    {
        fclose(File);
        return false;
    }
 
    //read the format and make sure it's "WAVE"
    if (fread(buffer, 4, 1, File) != 1 || strcmp(buffer, "WAVE"))
    {
        fclose(File);
        return false;
    }
 
    long chunkPosFmt = -1;
    long chunkPosData = -1;
 
    while (chunkPosFmt == -1 || chunkPosData == -1)
    {
        //read a sub chunk id and a chunk size if we can
        if (fread(buffer, 4, 1, File) != 1 || fread(&nChunkSize, 4, 1, File) != 1)
        {
            fclose(File);
            return false;
        }
 
        //if we hit a fmt
        if (!strcmp(buffer, "fmt "))
        {
            chunkPosFmt = ftell(File) - 8;
        }
        //else if we hit a data
        else if (!strcmp(buffer, "data"))
        {
            chunkPosData = ftell(File) - 8;
        }
 
        //skip to the next chunk
        fseek(File, nChunkSize, SEEK_CUR);
    }
 
    //we'll use this handy struct to load in the chunk data
    SMinimalWaveFileHeader waveData;
 
    //load the fmt part if we can
    fseek(File, chunkPosFmt, SEEK_SET);
    if (fread(&waveData.m_szSubChunk1ID, 24, 1, File) != 1)
    {
        fclose(File);
        return false;
    }
 
    //load the data part if we can
    fseek(File, chunkPosData, SEEK_SET);
    if (fread(&waveData.m_szSubChunk2ID, 8, 1, File) != 1)
    {
        fclose(File);
        return false;
    }
 
    //verify a couple things about the file data
    if (waveData.m_nAudioFormat != 1 ||       //only pcm data
        waveData.m_nNumChannels < 1 ||        //must have at least 1 channel
        waveData.m_nNumChannels > 2 ||        //must not have more than 2
        waveData.m_nBitsPerSample > 32 ||     //32 bits per sample max
        waveData.m_nBitsPerSample % 8 != 0 || //must be a multiple of 8 bits
        waveData.m_nBlockAlign > 8)           //blocks must be 8 bytes or lower
    {
        fclose(File);
        return false;
    }
 
    //figure out how many samples and blocks there are total in the source data
    int nBytesPerBlock = waveData.m_nBlockAlign;
    int nNumBlocks = waveData.m_nSubChunk2Size / nBytesPerBlock;
    int nNumSourceSamples = nNumBlocks * waveData.m_nNumChannels;
 
    //allocate space for the source samples
    samples.resize(nNumSourceSamples);
 
    //maximum size of a block is 8 bytes:  4 bytes per sample, 2 channels
    unsigned char pBlockData[8];
    memset(pBlockData, 0, 8);
 
    //read in the source samples at whatever sample rate / number of channels it might be in
    int nBytesPerSample = nBytesPerBlock / waveData.m_nNumChannels;
    for (int nIndex = 0; nIndex < nNumSourceSamples; nIndex += waveData.m_nNumChannels)
    {
        //read in a block
        if (fread(pBlockData, waveData.m_nBlockAlign, 1, File) != 1)
        {
            fclose(File);
            return false;
        }
 
        //get the first sample
        samples[nIndex] = TAmplitude(PCMToFloat(pBlockData, nBytesPerSample));
 
        //get the second sample if there is one
        if (waveData.m_nNumChannels == 2)
            samples[nIndex + 1] = TAmplitude(PCMToFloat(&pBlockData[nBytesPerSample], nBytesPerSample));
    }
 
    //convert the data to mono if needed
    ChangeNumChannels(samples, waveData.m_nNumChannels, 1);
 
    //resample the data to the requested sample rate
    ResampleData(samples, waveData.m_nSampleRate, sampleRate);
 
    //close the file and return success
    fclose(File);
    return true;
}
 
//=====================================================================================
// Oscillators
//=====================================================================================
void AdvancePhase(TPhase &phase, TFrequency frequency, TSamples sampleRate)
{
    phase += TPhase(frequency.Value() / (float)sampleRate.Value());
    while (phase >= TPhase(1.0f))
        phase -= TPhase(1.0f);
    while (phase < TPhase(0.0f))
        phase += TPhase(1.0f);
}
 
TAmplitude AdvanceOscilator_Sine(TPhase &phase, TFrequency frequency, TSamples sampleRate)
{
    AdvancePhase(phase, frequency, sampleRate);
    return TAmplitude(sin(phase.Value() * c_twoPi));
}
 
TAmplitude AdvanceOscilator_Saw(TPhase &phase, TFrequency frequency, TSamples sampleRate)
{
    AdvancePhase(phase, frequency, sampleRate);
    return TAmplitude(phase.Value() * 2.0f - 1.0f);
}
 
TAmplitude AdvanceOscilator_Square(TPhase &phase, TFrequency frequency, TSamples sampleRate)
{
    AdvancePhase(phase, frequency, sampleRate);
    return TAmplitude(phase.Value() > 0.5f ? 1.0f : -1.0f);
}
 
TAmplitude AdvanceOscilator_Triangle(TPhase &phase, TFrequency frequency, TSamples sampleRate)
{
    AdvancePhase(phase, frequency, sampleRate);
    if (phase > TPhase(0.5f))
        return TAmplitude((((1.0f - phase.Value()) * 2.0f) * 2.0f) - 1.0f);
    else
        return TAmplitude(((phase.Value() * 2.0f) * 2.0f) - 1.0f);
}
 
TAmplitude AdvanceOscilator_Saw_BandLimited(TPhase &phase, TFrequency frequency, TSamples sampleRate)
{
    AdvancePhase(phase, frequency, sampleRate);
 
    // sum the harmonics
    TAmplitude ret(0.0f);
    for (int harmonicIndex = 1; harmonicIndex <= 4; ++harmonicIndex)
    {
        TPhase harmonicPhase = phase * TPhase((float)harmonicIndex);
        ret += TAmplitude(sin(harmonicPhase.Value()*c_twoPi) / (float)harmonicIndex);
    }
 
    //adjust the volume
    ret *= TAmplitude(2.0f / c_pi);
 
    return ret;
}
 
TAmplitude AdvanceOscilator_Square_BandLimited(TPhase &phase, TFrequency frequency, TSamples sampleRate)
{
    AdvancePhase(phase, frequency, sampleRate);
 
    // sum the harmonics
    TAmplitude ret(0.0f);
    for (int harmonicIndex = 1; harmonicIndex <= 4; ++harmonicIndex)
    {
        float harmonicFactor = (float)harmonicIndex * 2.0f - 1.0f;
        TPhase harmonicPhase = phase * TPhase(harmonicFactor);
        ret += TAmplitude(sin(harmonicPhase.Value()*c_twoPi) / harmonicFactor);
    }
 
    //adjust the volume
    ret *= TAmplitude(4.0f / c_pi);
 
    return ret;
}
 
TAmplitude AdvanceOscilator_Triangle_BandLimited(TPhase &phase, TFrequency frequency, TSamples sampleRate)
{
    AdvancePhase(phase, frequency, sampleRate);
 
    // sum the harmonics
    TAmplitude ret(0.0f);
    TAmplitude signFlip(1.0f);
    for (int harmonicIndex = 1; harmonicIndex <= 4; ++harmonicIndex)
    {
        float harmonicFactor = (float)harmonicIndex * 2.0f - 1.0f;
        TPhase harmonicPhase = phase * TPhase(harmonicFactor);
        ret += TAmplitude(sin(harmonicPhase.Value()*c_twoPi) / (harmonicFactor*harmonicFactor)) * signFlip;
        signFlip *= TAmplitude(-1.0f);
    }
 
    //adjust the volume
    ret *= TAmplitude(8.0f / (c_pi*c_pi));
 
    return ret;
}
 
//=====================================================================================
// Main
//=====================================================================================
int main(int argc, char **argv)
{
    // wetness parameter of reverb.  It's a percentage from 0 to 1.  
    TAmplitude c_reverbWetness(1.0f);

    // keep track of when the process started so we can report how long it took
    auto start = std::chrono::system_clock::now().time_since_epoch();

    //our desired sound parameters
    SSoundSettings sound;
    sound.m_sampleRate = TSamples(44100);
 
    // load the input wave file if we can and normalize it
    std::vector<TAmplitude> inputData;
    if (!ReadWaveFile("in.wav", inputData, sound.m_sampleRate.Value()))
    {
        printf("could not load input wave file!");
        return 0;
    }
    NormalizeSamples(inputData, TAmplitude(1.0f), TSamples(0));

    // load the reverb wave file if we can and normalize it
    std::vector<TAmplitude> reverbData;
    if (!ReadWaveFile("reverb.wav", reverbData, sound.m_sampleRate.Value()))
    {
        printf("could not load reverb wave file!");
        return 0;
    }
    NormalizeSamples(reverbData, TAmplitude(1.0f), TSamples(0));

    // allocate space for the output file - which will be the number of samples in the two input files minus 1
    // initialize it to silence!
    std::vector<TAmplitude> samples;
    samples.resize(inputData.size() + reverbData.size() - 1);
    std::fill(samples.begin(), samples.end(), TAmplitude(0.0f));

    // reverse the reverb data in preparation of convolution
    std::reverse(reverbData.begin(), reverbData.end());

    // report the number of samples of each file
    printf("Input samples = %un", inputData.size());
    printf("Reverb samples = %un", reverbData.size());

    // apply the convolution for each output sample
    float lastPercentageReported = 0.0f;
    char percentageText[512];
    percentageText[0] = 0;
    printf("Progress: ");
    for (TSamples sampleIndex(0), numSamples(samples.size()); sampleIndex < numSamples; ++sampleIndex)
    {
        // print some periodic output since this can take a while!
        float percentage = sampleIndex.Divide<float>(numSamples);
        if (percentage >= lastPercentageReported + 0.01f)
        {
            // erase the last progress text
            for (int i = 0, c = strlen(percentageText); i < c; ++i)
                printf("%c %c", 8, 8);

            // calculate and show progress text
            lastPercentageReported = percentage;
            auto duration = std::chrono::system_clock::now().time_since_epoch() - start;
            auto millis = std::chrono::duration_cast<std::chrono::milliseconds>(duration).count();
            float expectedMillis = millis / percentage;
            sprintf(percentageText, "%i%%  %0.2f / %0.2f seconds (%0.2f remaining)", int(percentage*100.0f), ((float)millis) / 1000.0f, expectedMillis / 1000.0f, (expectedMillis - millis) / 1000.0f);
            printf("%s", percentageText);
        }

        // get a reference to our output sample
        TAmplitude &outputSample = samples[sampleIndex.Value()];

        // figure out the first reverb index and input index we should process.
        TSamples startReverbIndex(0);
        TSamples startInputIndex(0);
        if (sampleIndex < TSamples(reverbData.size()))
            startReverbIndex = TSamples(reverbData.size()) - sampleIndex - TSamples(1);
        else
            startInputIndex = sampleIndex - TSamples(reverbData.size()) + TSamples(1);

        // for each reverb sample
        for (TSamples reverbIndex(startReverbIndex), numReverbSamples(reverbData.size()), inputIndex(startInputIndex), numInputSamples(inputData.size());
            reverbIndex < numReverbSamples && inputIndex < numInputSamples;
            ++reverbIndex, ++inputIndex)
        {
            const TAmplitude &inputSample = inputData[inputIndex.Value()];
            const TAmplitude &reverbSample = reverbData[reverbIndex.Value()];
            outputSample += inputSample * reverbSample;
        }
    }

    // normalize the reverb to the wetness volume level
    NormalizeSamples(samples, c_reverbWetness, TSamples(0));

    // apply the dry sound on top, per the wetness settings
    for (TSamples inputIndex = TSamples(0), numInputSamples(inputData.size()); inputIndex < numInputSamples; ++inputIndex)
    {
        const TAmplitude &inputSample = inputData[inputIndex.Value()];
        TAmplitude &outputSample = samples[inputIndex.Value()];
        outputSample += inputSample * (TAmplitude(1.0f) - c_reverbWetness);
    }

    // normalize the amplitude of the samples to make sure they are as loud as possible without clipping.
    // give 3db of headroom and also put an envelope on the front and back that is 50ms
    NormalizeSamples(samples, DBToAmplitude(TDecibels(-3.0f)), MilliSecondsToSamples(sound, 50.0f));
 
    // save as a wave file
    WriteWaveFile("out.wav", samples, sound);
    return 0;
}

Intuition

I had to think about it for a bit, but the intuition for why this works is actually pretty straightforward!

The impulse response represents all the echoes that occur over time when you play a single sample of 1.0 amplitude. This is your guide for how all samples in an impulse source (IS) should be transformed, where the IS is the sound you want to add reverb to.

So, starting out, you put the entire IR into your output buffer starting at output[0] (the first sample in the output buffer), multiplied by IS[0] (the first sample of the sound you are reverbing).

This leaves you with the proper reverb of the first sample in your IS, and your total output buffer length is the same length as your IR.

We only have one sample reverbed though, so we move to the next…

Next, you put the entire IR into the output buffer again, but this time start at output[1] (the second sample in the output buffer) and those IR values you are writing need to be multiplied by IS[1] (the second sample of the sound you are reverbing).

You now have the proper reverb for the first two samples in your IS, and your total output buffer length is the length of your IR + 1.

Rinse and repeat for all samples in your IS, and at the end, the reverb of all samples in your IS will be accounted for in your output buffer, and the size of your output buffer will be the length of your IR plus the length of your IS, minus one.

Pretty simple and intuitive right?

You are essentially playing the IR into the output buffer length(IS) times, where IS[n] determines the volume (amplitude) that the nth copy of the IR gets played at.
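Here is that intuition written directly as code, in “scatter” style: for each IS sample, mix one full copy of the IR into the output at that sample's volume. It produces the same result as the loop shown earlier (plain float vectors, the function name is mine):

#include <vector>
 
std::vector<float> ConvolveScatter(const std::vector<float>& is, const std::vector<float>& ir)
{
    std::vector<float> out(is.size() + ir.size() - 1, 0.0f);
    for (size_t i = 0; i < is.size(); ++i)       // for each sample of the IS...
        for (size_t j = 0; j < ir.size(); ++j)   // ...mix in one full copy of the IR...
            out[i + j] += is[i] * ir[j];         // ...scaled by that IS sample's amplitude
    return out;
}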

Convolution does exactly this, which turns out to be pretty slow because it's a lot of math to perform.

If you have a source file (IS) you are reverbing that is 4 seconds long, running at 44100 samples per second (176,400 samples total), and you have a reverb impulse response (IR) that is 2 seconds long at 44100 samples per second (88,200 samples total), that means that you are going to essentially be mixing the IR into an output buffer 176,400 times.

Each time you play the IR you need to do a multiplication per IR sample (to scale it to the IS[index] amplitude) and an addition to add it to the resulting output buffer.

At 88,200 IR samples with a multiply and add for each sample, and doing that 176,400 times, that means at the end you will need to do 15,558,480,000 (15.5 billion) multiplies and adds.

And remember… that is only for a 4 second sound receiving a 2 second reverb… those are pretty small sound files! Also, that is only a single channel. The work would double in stereo, and be even worse in 5.1 or 7.1 surround sound!

More Info

So, this method works, but unfortunately it’s pretty darn slow (it took like 5 minutes for me to convolve the legend quote with the large reverb). It could be made faster by introducing threading and SSE instructions, but there is a better way that gives us an even bigger speed up.

Having audio samples means you are working in the “time domain”. If you take a Fourier transform of the audio samples, you’ll have the information in the “frequency domain”. As it turns out, a multiplication in the frequency domain is equivalent to a convolution in the time domain. Taking the fast Fourier transform (FFT) of your input sound, multiplying it by the pre-transformed FFT of the impulse response, and then doing an inverse FFT on the result to get back into the time domain (aka audio samples) is MUCH FASTER than doing actual convolution.
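To make that concrete, here is a minimal sketch of FFT based convolution built on a textbook recursive Cooley-Tukey FFT. This is NOT the code used for the samples in this post, and a real implementation would use an optimized FFT library, but it shows the pad / transform / multiply / inverse transform flow:

#include <cmath>
#include <complex>
#include <vector>
 
// recursive Cooley-Tukey FFT.  pass invert = true for the inverse transform.
// the buffer size must be a power of two.
void FFT(std::vector<std::complex<double>>& a, bool invert)
{
    size_t n = a.size();
    if (n == 1)
        return;
 
    // split into even and odd indexed halves and transform each
    std::vector<std::complex<double>> even(n / 2), odd(n / 2);
    for (size_t i = 0; 2 * i < n; ++i)
    {
        even[i] = a[2 * i];
        odd[i] = a[2 * i + 1];
    }
    FFT(even, invert);
    FFT(odd, invert);
 
    // combine the two halves
    double angle = 2.0 * 3.14159265358979323846 * (invert ? -1.0 : 1.0) / (double)n;
    std::complex<double> w(1.0), wn(std::cos(angle), std::sin(angle));
    for (size_t i = 0; 2 * i < n; ++i)
    {
        a[i] = even[i] + w * odd[i];
        a[i + n / 2] = even[i] - w * odd[i];
        if (invert)
        {
            // dividing by 2 at every level totals a divide by n overall
            a[i] /= 2.0;
            a[i + n / 2] /= 2.0;
        }
        w *= wn;
    }
}
 
std::vector<float> ConvolveFFT(const std::vector<float>& bufferA, const std::vector<float>& bufferB)
{
    // pad to the next power of two that can hold the full convolution result
    size_t resultSize = bufferA.size() + bufferB.size() - 1;
    size_t n = 1;
    while (n < resultSize)
        n *= 2;
 
    std::vector<std::complex<double>> fa(bufferA.begin(), bufferA.end());
    std::vector<std::complex<double>> fb(bufferB.begin(), bufferB.end());
    fa.resize(n);
    fb.resize(n);
 
    // multiplication in the frequency domain == convolution in the time domain
    FFT(fa, false);
    FFT(fb, false);
    for (size_t i = 0; i < n; ++i)
        fa[i] *= fb[i];
    FFT(fa, true);
 
    // the real parts are the convolved samples
    std::vector<float> result(resultSize);
    for (size_t i = 0; i < resultSize; ++i)
        result[i] = (float)fa[i].real();
    return result;
}

That is O(N log N) work instead of the O(N^2) work of the direct loops, which is where the huge speedup comes from.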

Another problem with convolution reverb is that you need the full sound you are convolving, which makes it basically impossible to use in a live setup. The solution to this is that people convolve windows of data at a time, instead of the whole sound. I think a common window size is 256 samples.
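As a rough sketch of that windowing idea (this is the “overlap-add” approach: convolve each block of input as it arrives and let the overlapping tails sum into the output; it assumes a direct Convolve helper like the one sketched earlier, and the block size is illustrative):

#include <algorithm>
#include <vector>
 
// from the direct convolution sketch earlier
std::vector<float> Convolve(const std::vector<float>& bufferA, const std::vector<float>& bufferB);
 
std::vector<float> ConvolveOverlapAdd(const std::vector<float>& input, const std::vector<float>& ir, size_t blockSize = 256)
{
    std::vector<float> out(input.size() + ir.size() - 1, 0.0f);
    for (size_t start = 0; start < input.size(); start += blockSize)
    {
        // convolve just this block of input against the IR
        size_t len = std::min(blockSize, input.size() - start);
        std::vector<float> block(input.begin() + start, input.begin() + start + len);
        std::vector<float> blockOut = Convolve(block, ir);
 
        // each block's result is len + ir.size() - 1 samples long; the tails
        // overlap the next block's output and simply add together
        for (size_t i = 0; i < blockOut.size(); ++i)
            out[start + i] += blockOut[i];
    }
    return out;
}

Because convolution is linear, summing the per-block results gives exactly the same output as convolving the whole sound at once, but you only ever need one block of input at a time.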

I’ll make a post in the future that addresses both those issues to allow for fast, real time convolution reverb via windowed FFT multiplication.

Also, I said in the last post that this technique is what the pros do. Well, I should add that this is what they do SOMETIMES. Other times they use DSP algorithms involving comb filters and multi-tap delays (and other things) to make fast reverb that sounds pretty decent without needing to do convolution or FFT/IFFT. Check the links section for a link to the details of one such algorithm.

Regarding IR samples (impulse response recordings), if you record your own, it should work as is. However, a common thing people do is remove the loud source sound from the recording (via deconvolution, I think?) to get ONLY the impulse response. It basically makes the IR behave as if it really were the response of the room to a single sample of a “1.0” echoing in the area. Without this step, your reverb convolution will still work, but you’ll get “smearing” of the sound, due to the source not being a true 1.0 sample (or in other words, not a true Dirac delta).

Luckily there are IRs available for download online as well. Some are free, some are paid. Check the links section for a few links I found for that (:

Lastly, this post basically teaches you how to do 1 dimensional convolution and shows you an application for it. You can do convolution in higher dimensions as well though. In fact, if you have used Photoshop and know about things like gaussian blur or the sharpen mask, those work by 2d convolution. Bokeh is apparently convolution too, as is a “Minkowski sum” if you have done work in game physics.

I’ll make a graphics related 2d convolution post in the future as well to delve into that stuff a bit deeper.

As one final sound sample, check out this CROSS CORRELATION (not convolution) of SoundBite.wav and ReverbLarge. Cross correlation is the same as convolution except you don’t reverse one of the samples. So, in effect, it’s like I convolved SoundBite.wav with ReverbLarge.wav played backwards.

c_ReverseRL_SBite.wav

More info on cross correlation coming soon. It has uses in audio, but it’s more often used for finding time delays between sounds, or for recognizing patterns in sound data even in the presence of noise / distortion, than for actual audible effects (although there are some of those too!).

Links

A good read explaining 1d and 2d convolution
More info on convolution reverb
Even more info on convolution reverb
Explains discrete convolution and has an interactive demo
Khan Academy: Continuous Convolution
Info on faked reverb
More info on faked reverb
A reverb blog!
Some free IRs you can download