DIY Synth 2: Common Wave Forms

This is a part of the DIY Synthesizer series of posts where each post is roughly built upon the knowledge of the previous posts. If you are lost, check the earlier posts!

This is the second chapter in a series of tutorials about programming your own synthesizer.

In this chapter we’ll talk about oscillators, and some common basic wave forms: Sine, Square, Saw, Triangle and Noise.

By the end, you should have enough knowledge to make some basic electronic melodies.

You can download the full source for this chapter here:  DIY Synthesizer: Chapter 2 Source Code

The Sine Wave

The sine wave is the basis of lots of things in audio synthesis. It can be used on its own to make sound, multiple sine waves can be combined to make other more complex wave forms (as we’ll see in the next chapter), and it’s also the basis of a lot of DSP theory and audio analysis. For instance, Fourier analysis lets you analyze some audio data and find out which audio frequencies are in it, and how strong each one is (useful for advanced synthesis and digital signal processing aka DSP). The math of how to get that information is based on some simple properties of sine waves. More info can be found here: http://en.wikipedia.org/wiki/Fourier_analysis.

If we want to use a sine wave in our audio data, the first problem we hit is that sine has a value from -1 to 1, but our audio data from the last chapter is stored in a 32 bit int, which has a range of -2,147,483,648 to 2,147,483,647, and is unable to store fractional numbers.

The solution is to just map -1 to -2,147,483,648, and 1 to 2,147,483,647 and all the numbers in between represent fractional numbers between -1 and 1.  0.25 for instance would become 536,870,911.

If instead of 32 bits, we wanted to store the data in 16 bits, or 8 bits, we could do that as well.  After generating our floating point audio data, we just convert it differently to get to those 16 bits and 8 bits.  16 bits have a range of -32,768 to 32,767 so 0.25 would convert to 8191.  In 8 bits, wave files want UNSIGNED 8 bit numbers, so the range is 0 to 255.   In that case,  0.25 would become 158.
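To make the mapping concrete, here’s a small sketch of the float to int16 conversion (the helper name is made up; in the chapter’s code, WriteWaveFile does this internally):

```cpp
#include <cstdint>

// Hypothetical helper sketching the float -> 16 bit sample mapping described
// above. Positive and negative samples scale by slightly different amounts
// because the int16 range is asymmetric (-32,768 to 32,767).
int16_t FloatToInt16(float fSample)
{
    if (fSample >= 0.0f)
        return (int16_t)(fSample * 32767.0f);
    else
        return (int16_t)(fSample * 32768.0f);
}
```

With this mapping, 0.25 comes out as 8191 and -1.0 as -32,768, matching the numbers above.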

Note: in the code for this chapter, I modified WriteWaveFile to do this conversion for us, so going forward we can work with floating point numbers only and not worry about bits per sample until we want to write the wave file. When you call the function, you give it a template parameter specifying what TYPE you want to use for your samples. The three supported types are uint8, int16 and int32. For simple wave forms like those we are working with today, there is no audible difference between the three, so all the samples just make 16 bit wave files.

So, we bust out some math and figure out how to generate a sine wave, respecting the sample rate and frequency we want to use:

//make a naive sine wave
for(int nIndex = 0; nIndex < nNumSamples; ++nIndex)
{
    pData[nIndex] = sin((float)nIndex * 2 * (float)M_PI * fFrequency / (float)nSampleRate);
}
WriteWaveFile<int16>("sinenaive.wav",pData,nNumSamples,nNumChannels,nSampleRate);

That does work, and if you listen to the wave file, it does sound correct:
Naive Sine Wave Generation

It even looks correct:
Naive Sine Wave

There is a subtle problem with generating the sine wave that way though, which we will talk about next.

Popping aka Discontinuity

The problem with how we generated the wave file only becomes apparent when we try to play two tones right next to each other, like in the following code segment:

//make a discontinuous (popping) sine wave
for(int nIndex = 0; nIndex < nNumSamples; ++nIndex)
{
    if(nIndex < nNumSamples / 2)
    {
        float fCurrentFrequency = CalcFrequency(3,3);
        pData[nIndex] = sin((float)nIndex * 2 * (float)M_PI * fCurrentFrequency / (float)nSampleRate);
    }
    else
    {
        float fCurrentFrequency = CalcFrequency(3,4);
        pData[nIndex] = sin((float)nIndex * 2 * (float)M_PI * fCurrentFrequency / (float)nSampleRate);
    }
}
WriteWaveFile<int16>("sinediscon.wav",pData,nNumSamples,nNumChannels,nSampleRate);

Quick note about a new function shown here, called CalcFrequency. I made that function so that you pass it the note you want and the octave you want, and it returns the frequency for that note. For instance, to get middle C aka C4 (the tone all these samples use), you use CalcFrequency(3,3), which returns approximately 261.626.
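If you’re curious, here’s one plausible way such a function could be implemented (a sketch, not necessarily identical to the one in the sample code). It assumes notes are counted in semitones starting at A, and uses the standard equal temperament formula: frequency doubles every octave, i.e. every 12 semitones, anchored to A4 = 440hz.

```cpp
#include <cmath>

// Sketch of a CalcFrequency style function. The exponent counts how many
// semitones we are away from A4 (440hz); each semitone is a factor of 2^(1/12).
float CalcFrequency(float fOctave, float fNote)
{
    return (float)(440.0 * pow(2.0, ((fOctave - 4.0) * 12.0 + fNote) / 12.0));
}
```

With this convention, CalcFrequency(3,3) is 3 semitones above A3 (220hz), which works out to about 261.626hz – middle C.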

Listen to the wave file generated and you can hear a popping noise where the tone changes from one frequency to the next: Discontinuous Sine Wave

So why is this? The reason is that the way we are generating our sine waves creates a discontinuity at the point where the two waves meet.

Here you can see the point where the frequencies change, and how a pretty small discontinuity can make a pretty big impact on your sound! The sound you are hearing has an official name: a “pop” (DSP / synth / other audio people will talk about popping in their audio, and discontinuity is the reason for it).

Sine Wave Popping

So how do we fix it? Instead of making the sine wave be rigidly based on time, where for each point, we calculate the sine value with no regard to previous values, we use a “Free Spinning Oscillator”.

That is a fancy way of saying we just have a variable keep track of the current PHASE (angle) that we are at in the sine wave for the current sample, and to get the next sample, we advance our phase based on the frequency at the time. Basically our oscillator is a wheel that spins freely, and our current frequency just says how fast to turn the wheel (from wherever it is now) to get the value for the next sample.

Here’s what that looks like in code:


//make a continuous sine wave that changes frequencies
for(int nIndex = 0; nIndex < nNumSamples; ++nIndex)
{
    if(nIndex < nNumSamples / 2)
    {
        float fCurrentFrequency = CalcFrequency(3,3);
        fPhase += 2 * (float)M_PI * fCurrentFrequency/(float)nSampleRate;

        while(fPhase >= 2 * (float)M_PI)
            fPhase -= 2 * (float)M_PI;

        while(fPhase < 0)
            fPhase += 2 * (float)M_PI;

        pData[nIndex] = sin(fPhase);
    }
    else
    {
        float fCurrentFrequency = CalcFrequency(3,4);
        fPhase += 2 * (float)M_PI * fCurrentFrequency/(float)nSampleRate;

        while(fPhase >= 2 * (float)M_PI)
            fPhase -= 2 * (float)M_PI;

        while(fPhase < 0)
            fPhase += 2 * (float)M_PI;

        pData[nIndex] = sin(fPhase);
    }
}
WriteWaveFile<int16>("sinecon.wav",pData,nNumSamples,nNumChannels,nSampleRate);

Note that we keep the phase between 0 and 2 * PI. There’s no mathematical reason for needing to do this, but in floating point math, if you let a value get too large, it starts to lose precision. That means that if you made a wave file that lasted a long time, the audio would start to degrade the longer it played. I also use a while loop instead of a regular if statement, because if someone uses very large frequencies, you can pass 2 * PI a couple of times in a single sample. And I check that the phase is above zero, because it is valid to use negative frequency values! All stuff to be mindful of when making your own synth programs (:

Here’s what the generated wave file sounds like, notice the smooth transition between the two notes:
Continuous Sine Wave

And here’s what it looks like visually where the wave changes frequency, which you can see is nice and smooth (the bottom wave). The top wave is the popping sine wave image again at the same point in time for reference. On the smooth wave it isn’t even visually noticeable that the frequency has changed.

Continuous Frequency Change

One last word on this… popping is actually sometimes desired and can help make up a part of a good sound. For instance, some percussion sounds can make use of popping to sound more appropriate!

Sine Wave Oscillator

For our final incarnation of a sine wave oscillator, here’s a nice simple helper function:

float AdvanceOscilator_Sine(float &fPhase, float fFrequency, float fSampleRate)
{
    fPhase += 2 * (float)M_PI * fFrequency/fSampleRate;

    while(fPhase >= 2 * (float)M_PI)
        fPhase -= 2 * (float)M_PI;

    while(fPhase < 0)
        fPhase += 2 * (float)M_PI;

    return sin(fPhase);
}

You pass that function your current phase, the frequency you want, and the sample rate, and it will advance your phase, and return the value for your next audio sample.

Here’s an example of how to use it:

//make a sine wave
for(int nIndex = 0; nIndex < nNumSamples; ++nIndex)
{
    pData[nIndex] = AdvanceOscilator_Sine(fPhase,fFrequency,(float)nSampleRate);
}
WriteWaveFile<int16>("sine.wav",pData,nNumSamples,nNumChannels,nSampleRate);

Here’s what it sounds like (nothing new at this point!):
Vanilla Sine Wave

Wave Amplitude, Volume and Clipping

You can adjust the AMPLITUDE of any wave form by multiplying each sample by a value. Values greater than one increase the amplitude, making it louder, values less than one decrease the amplitude, making it quieter, and negative values flip the wave over, but also have the ability to make it quieter or louder.

One place people use negative amplitudes (volumes) is noise cancellation. If you have a complex sound that has some noise in it, but you know the source of the noise, you can take that noise, multiply it by -1 to get a volume of -1, and ADD IT (or MIX IT) into the more complex sound, effectively removing the noise from the sound. There are other uses too, but this is one concrete, real world example.
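Here’s a tiny sketch of that idea (the Mix helper is made up for illustration). Mixing is just adding, and a volume is just a multiplier, so mixing the same noise in twice – once at a volume of 1 and once at a volume of -1 – leaves the original signal untouched.

```cpp
// Illustrative helper: mix (add) a source signal into a destination buffer
// at some volume (amplitude multiplier).
void Mix(float *pDest, const float *pSrc, int nNumSamples, float fVolume)
{
    for (int nIndex = 0; nIndex < nNumSamples; ++nIndex)
        pDest[nIndex] += pSrc[nIndex] * fVolume;
}
```

Mixing noise in at volume 1.0 and then again at volume -1.0 cancels it exactly, leaving the original samples.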

This code sample generates a quieter wave file:

//make a quieter sine wave
for(int nIndex = 0; nIndex < nNumSamples; ++nIndex)
{
    pData[nIndex] = AdvanceOscilator_Sine(fPhase,fFrequency,(float)nSampleRate) * 0.4f;
}
WriteWaveFile<int16>("sinequiet.wav",pData,nNumSamples,nNumChannels,nSampleRate);

And here’s what that sounds like:
Vanilla Sine Wave – Quiet

And here’s what that looks like:
Sine Quiet

If you recall though, when we write a wave file, we map -1 to the smallest int number we can store, and 1 to the highest int number we can store. What happens if we make something too loud, so that it goes above 1.0 or below -1.0?

One way to fix this would be to “Normalize” the sound data.  To normalize it, you would loop through each sample in the stream and find the highest absolute value sample.  For instance if you had 3 samples: 1.0, -1.2, 0.8,  the highest absolute sample value would be 1.2.

Once you have this value, you loop through the samples in the stream and divide by this number.  After you do this, every sample in the stream will be within the range -1 to 1.  Note that if you had any data that would be clipping, this process has the side effect of making your entire stream quieter since it reduces the amplitude of every sample.  If you didn’t have any clipping data, this process has the side effect of making your entire stream louder because it increases the amplitude of every sample.
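Here’s what that normalization pass might look like in code (a sketch; the function name is made up):

```cpp
#include <cmath>

// Normalize a stream of float samples so everything fits in [-1, 1]:
// find the largest absolute sample value, then divide every sample by it.
void Normalize(float *pData, int nNumSamples)
{
    float fMax = 0.0f;
    for (int nIndex = 0; nIndex < nNumSamples; ++nIndex)
    {
        float fAbs = fabsf(pData[nIndex]);
        if (fAbs > fMax)
            fMax = fAbs;
    }

    if (fMax > 0.0f)
    {
        for (int nIndex = 0; nIndex < nNumSamples; ++nIndex)
            pData[nIndex] /= fMax;
    }
}
```

With the example from above – samples 1.0, -1.2 and 0.8 – the whole stream gets divided by 1.2, so the loudest sample becomes exactly -1.0.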

Another way to deal with it is to just clamp the values to the -1, 1 range.  In the case of a sine wave, that means we chop off the top and/or the bottom of the wave and there’s just a flat plateau where the numbers went out of range.
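In code, that clamping is as simple as this (again, just a sketch):

```cpp
// Clamp a sample into the valid [-1, 1] range. Anything outside the range
// gets flattened, which is what produces the clipped plateau.
float ClampSample(float fSample)
{
    if (fSample > 1.0f)
        return 1.0f;
    if (fSample < -1.0f)
        return -1.0f;
    return fSample;
}
```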

This is called clipping, and along with popping it is one of the main causes of audio quality degradation. Aliasing is a third, and is something we address in the next chapter by the way! (http://en.wikipedia.org/wiki/Aliasing)

Here’s some code for generating a clipping sine wave:

//make a clipping sine wave
for(int nIndex = 0; nIndex < nNumSamples; ++nIndex)
{
    pData[nIndex] = AdvanceOscilator_Sine(fPhase,fFrequency,(float)nSampleRate) * 1.4f;
}
WriteWaveFile<int16>("sineclip.wav",pData,nNumSamples,nNumChannels,nSampleRate);

And here’s what it sounds like:
Vanilla Sine Wave – Clipping

Also, here’s what it looks like:
Clipping Sine Wave

Note that in this case, it doesn’t necessarily sound BAD compared to a regular, non clipping sine wave, but it does sound different. That might be a good thing, or a bad thing, depending on your intentions. With more complex sounds, like voice, or acoustic music, this will usually make it sound terrible. Audio engineers have to carefully control the levels (volumes) of the channels being mixed (added) together to make sure the resulting output doesn’t go outside of the valid range and cause clipping. Also, in analog hardware, going out of range can cause damage to the devices if they aren’t built to protect themselves from it!

In the case of real time synthesis, as you might imagine, normalizing wave data is impossible to do because it requires that you know all the sound data up front to be able to normalize the data.  In real time applications, besides just making sure the levels keep everything in range, you also have the option of using a compressor which sort of dynamically normalizes on the fly.  Check this out for more information: http://en.wikipedia.org/wiki/Dynamic_range_compression

Square Wave Oscillator

Here’s the code for the square wave oscillator:

float AdvanceOscilator_Square(float &fPhase, float fFrequency, float fSampleRate)
{
    fPhase += fFrequency/fSampleRate;

    while(fPhase > 1.0f)
        fPhase -= 1.0f;

    while(fPhase < 0.0f)
        fPhase += 1.0f;

    if(fPhase <= 0.5f)
        return -1.0f;
    else
        return 1.0f;
}

Note that we are using the phase as if it’s a percentage of a full cycle, instead of an angle. Since we are using it differently, switching from a sine wave to a square wave will cause a discontinuity (a pop). In practice this happens almost all the time anyway, because unless you change from sine to square at the very top or bottom of the sine wave, there will be a discontinuity regardless. In reality this really doesn’t matter, but you could “fix” it by switching only on those boundaries, or you could use “cross fading” or “blending”: fade one wave out (decrease its amplitude from 1 to 0) while bringing the new wave in (increase its amplitude from 0 to 1), adding them together to get the output. Doing so makes a smooth transition but adds some complexity, and square waves constantly pop by nature anyway – it’s what gives them their sound!
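If you did want to try cross fading, the blend itself is simple. Here’s a sketch (the function name is made up) that assumes you’ve already generated a buffer of the outgoing wave and a buffer of the incoming wave over the fade region:

```cpp
// Blend from the old wave to the new one: the old wave's amplitude goes from
// 1 to 0 while the new wave's goes from 0 to 1, and the two are added together.
void CrossFade(float *pOut, const float *pOldWave, const float *pNewWave, int nNumSamples)
{
    for (int nIndex = 0; nIndex < nNumSamples; ++nIndex)
    {
        // fade amount goes 0 -> 1 across the buffer (assumes nNumSamples > 1)
        float fFade = (float)nIndex / (float)(nNumSamples - 1);
        pOut[nIndex] = pOldWave[nIndex] * (1.0f - fFade) + pNewWave[nIndex] * fFade;
    }
}
```

A fade of even a few hundred samples (several milliseconds at 44100hz) is usually enough to hide the transition.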

Here’s what a square wave sounds like and looks like:
Square Wave
Square Wave

Saw Wave Oscillator

We used the saw wave in chapter one. Here’s the code for a saw wave oscillator:

float AdvanceOscilator_Saw(float &fPhase, float fFrequency, float fSampleRate)
{
    fPhase += fFrequency/fSampleRate;

    while(fPhase > 1.0f)
        fPhase -= 1.0f;

    while(fPhase < 0.0f)
        fPhase += 1.0f;

    return (fPhase * 2.0f) - 1.0f;
}

Here’s what a saw wave looks and sounds like:
Saw Wave
Saw Wave

Note that sometimes saw waves point the other direction, with the “drop off” on the left instead of the right and the rest of the wave descending instead of rising, but as far as I have seen, there is no audible or practical difference.

Triangle Wave Oscillator

A lot of synths don’t even bother with a triangle wave, and those that do often use it just as an approximation of a sine wave. A triangle wave sounds a lot like a sine wave, and looks a bit like one too.

Here’s the code for a triangle wave oscillator:

float AdvanceOscilator_Triangle(float &fPhase, float fFrequency, float fSampleRate)
{
    fPhase += fFrequency/fSampleRate;

    while(fPhase > 1.0f)
        fPhase -= 1.0f;

    while(fPhase < 0.0f)
        fPhase += 1.0f;

    float fRet;
    if(fPhase <= 0.5f)
        fRet = fPhase * 2;
    else
        fRet = (1.0f - fPhase) * 2;

    return (fRet * 2.0f) - 1.0f;
}

Here’s what it looks and sounds like:
Triangle Wave
Triangle Wave

Noise Oscillator

Believe it or not, even static has its place. It’s sometimes used for percussion (put an envelope around some static to make a “clap” sound), it can be used as a low frequency oscillator aka LFO (the old “sample and hold” type stuff), and for other things as well. Static is just random audio samples.

The code for a noise oscillator is slightly different from the others. You have to pass it the last sample generated (you can pass 0 if it’s the first sample) and it will keep returning that last value until it’s time to generate a new random number. It determines when it’s time based on the frequency you pass in. A higher frequency means more random numbers will be chosen in the same amount of audio data, while a lower frequency means fewer random numbers will be chosen.

At lower frequencies (like in the sample), it kind of sounds like an explosion or rocket ship sound effect from the 80s which is fun 😛

Here’s the code:

float AdvanceOscilator_Noise(float &fPhase, float fFrequency, float fSampleRate, float fLastValue)
{
    unsigned int nLastSeed = (unsigned int)fPhase;
    fPhase += fFrequency/fSampleRate;
    unsigned int nSeed = (unsigned int)fPhase;

    while(fPhase > 2.0f)
        fPhase -= 1.0f;

    if(nSeed != nLastSeed)
    {
        float fValue = ((float)rand()) / ((float)RAND_MAX);
        fValue = (fValue * 2.0f) - 1.0f;

        //uncomment the below to make it slightly more intense
        /*
        if(fValue < 0)
            fValue = -1.0f;
        else
            fValue = 1.0f;
        */

        return fValue;
    }
    else
    {
        return fLastValue;
    }
}

Here’s what it looks and sounds like:
Noise
Noise Audio

I think it kind of looks like the Arizona desert 😛

As a quick aside, I have the random numbers as random floating point numbers (they can be anything between -1.0 and 1.0). Another way to generate noise is to make it choose only EITHER -1 or 1 and nothing in between, which gives a slightly harsher sound. The code to do that is in the oscillator if you want to try it out, it’s just commented out. There are other ways to generate noise too (check out “pink noise” http://en.wikipedia.org/wiki/Pink_noise) but this ought to be good enough for our immediate needs!

More Exotic Wave Forms

Two other oscillators I’ve used on occasion are the squared sine wave and the rectangle wave.

To create a “squared sine wave” all you need to do is multiply each sample by itself (square the audio sample). This makes a wave form that is similar to sine waves, but a little bit different, and sounds a bit different too.

A rectangle wave is created by making the wave spend more or less time in the “up” or “down” part of each cycle. Instead of spending 50% of the time up and 50% down, you can make it spend, say, 80% of the time up and 20% down. That makes it sound quite a bit different, and the more lopsided the percentages are, the “brighter” it sounds.
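Here’s a sketch of a rectangle wave oscillator in the same style as the others in this chapter (the function and its duty cycle parameter are my own additions, not from the chapter’s source code):

```cpp
// Rectangle (pulse) wave oscillator sketch. fDutyCycle is the fraction of each
// cycle spent "down"; 0.5 gives the plain square wave, while 0.2 gives the
// 80/20 split described above.
float AdvanceOscilator_Rectangle(float &fPhase, float fFrequency, float fSampleRate, float fDutyCycle)
{
    fPhase += fFrequency / fSampleRate;

    while (fPhase > 1.0f)
        fPhase -= 1.0f;

    while (fPhase < 0.0f)
        fPhase += 1.0f;

    if (fPhase <= fDutyCycle)
        return -1.0f;
    else
        return 1.0f;
}
```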

Also, you can add multiple wave form samples together to get more interesting wave forms (like adding a triangle and a square wave of the same frequency together, and reducing the amplitude to avoid clipping). That’s called additive synthesis and we’ll talk more about that next chapter, including how to make more correct wave forms using sine waves to avoid aliasing.

You can also multiply wave forms together to create other, more interesting waves. Strictly speaking this is called AM synthesis (amplitude modulation synthesis) which is also sometimes known as ring modulation when done a certain way.
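As a sketch, multiplying is the same loop as mixing but with * instead of + (and note that multiplying a sine wave by itself gives you the squared sine wave from above). The helper name here is made up:

```cpp
// Illustrative helper: multiply two wave forms together sample by sample,
// one simple flavor of amplitude modulation.
void MultiplyWaves(float *pOut, const float *pWaveA, const float *pWaveB, int nNumSamples)
{
    for (int nIndex = 0; nIndex < nNumSamples; ++nIndex)
        pOut[nIndex] = pWaveA[nIndex] * pWaveB[nIndex];
}
```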

As you can see, there are a lot of different ways to create oscillators, and the wave forms are just limited by your imagination. Play around and try to make your own oscillators and experiment!

Final Samples

Now we have the simple basics down for being able to create music. Here’s a small “song” that is generated in the sample code:
Simple Song

And just to reinforce how important keeping your wave data continuous is, here’s the same wave file, but about 0.75 seconds in I put a SINGLE -1.0 sample where it doesn’t belong. That’s one wrong sample out of 44,100 samples per second – and look how much it affects the audio.
Simple Song With Pop

Until Next Time…

Next up we will talk about “aliasing” and how to avoid it, making much better sounding saw, square and triangle waves that are less harsh on the ears.

DIY Synth 1: Sound Output


This is the first in a series of tutorials on how to make your own software synthesizer.

These tutorials are aimed at C++ programmers, and the example code is meant to be as easy to understand as possible and have as few dependencies as possible. The code ought to compile and run for you no matter what system or compiler you are using with minimal if any changes required.

You can download the full source for this chapter here: DIY Synthesizer Chapter 1 Source Code

Wave File Format

Since making sound come out of computer speakers varies a lot between different systems, we’ll start out just writing a .wave file.

If you want to jump into doing real time audio, I recommend portaudio (http://www.portaudio.com/), and I also recommend libsndfile for reading and writing other audio file formats (http://www.mega-nerd.com/libsndfile/).

I found a couple of links really helpful in understanding the wave file format.

There’s a lot of optional parts of a wave file header, but we are only going to focus on the bare minimum required to get the job done. Here’s what our wave file header struct looks like:

//this struct is the minimal required header data for a wav file
struct SMinimalWaveFileHeader
{
    //the main chunk
    unsigned char m_szChunkID[4];
    uint32 m_nChunkSize;
    unsigned char m_szFormat[4];

    //sub chunk 1 "fmt "
    unsigned char m_szSubChunk1ID[4];
    uint32 m_nSubChunk1Size;
    uint16 m_nAudioFormat;
    uint16 m_nNumChannels;
    uint32 m_nSampleRate;
    uint32 m_nByteRate;
    uint16 m_nBlockAlign;
    uint16 m_nBitsPerSample;

    //sub chunk 2 "data"
    unsigned char m_szSubChunk2ID[4];
    uint32 m_nSubChunk2Size;

    //then comes the data!
};

And boringly, here’s the function that fills out the struct and writes it to disk:

bool WriteWaveFile(const char *szFileName, void *pData, int32 nDataSize, int16 nNumChannels, int32 nSampleRate, int32 nBitsPerSample)
{
    //open the file if we can
    FILE *File = fopen(szFileName,"w+b");
    if(!File)
    {
        return false;
    }

    SMinimalWaveFileHeader waveHeader;

    //fill out the main chunk
    memcpy(waveHeader.m_szChunkID,"RIFF",4);
    waveHeader.m_nChunkSize = nDataSize + 36;
    memcpy(waveHeader.m_szFormat,"WAVE",4);

    //fill out sub chunk 1 "fmt "
    memcpy(waveHeader.m_szSubChunk1ID,"fmt ",4);
    waveHeader.m_nSubChunk1Size = 16;
    waveHeader.m_nAudioFormat = 1;
    waveHeader.m_nNumChannels = nNumChannels;
    waveHeader.m_nSampleRate = nSampleRate;
    waveHeader.m_nByteRate = nSampleRate * nNumChannels * nBitsPerSample / 8;
    waveHeader.m_nBlockAlign = nNumChannels * nBitsPerSample / 8;
    waveHeader.m_nBitsPerSample = nBitsPerSample;

    //fill out sub chunk 2 "data"
    memcpy(waveHeader.m_szSubChunk2ID,"data",4);
    waveHeader.m_nSubChunk2Size = nDataSize;

    //write the header
    fwrite(&waveHeader,sizeof(SMinimalWaveFileHeader),1,File);

    //write the wave data itself
    fwrite(pData,nDataSize,1,File);

    //close the file and return success
    fclose(File);
    return true;
}

Nothing too crazy or all that interesting, but it gets the job done. Again, check out those links above if you are interested in the details of why things are written the way they are, or what other options there are.

Generating a Mono Wave File

Now, finally something interesting, we are going to generate some audio data and make a real wave file!

Since they are easy to generate, we’ll use a sawtooth wave for our sound. For more information about sawtooth waves, check out this wikipedia page: http://en.wikipedia.org/wiki/Sawtooth_wave.

int nSampleRate = 44100;
int nNumSeconds = 4;
int nNumChannels = 1;

The sample rate defines how many samples of audio data there are per second. A stream of audio data is nothing more than a stream of numbers, and each number is a single audio sample, so the sample rate is just how many numbers there are per second of audio data. The fewer numbers you use, the less “horizontal resolution” your sound file has – that is, the fewer times the wave data can change in amplitude per second.

The sample rate also defines the maximum frequency you can store in the audio stream. The maximum frequency you can store is half of the sample rate. In other words, with a 44100 sample rate, the maximum frequency you can store is 22,050hz. The maximum audible frequency for the human ear is about 20,000hz so using a sample rate of 44100 ought to be pretty good for most needs (you might need to go higher, for complex technical reasons, but this is info enough for now!). Here’s some interesting info about audio frequencies: http://en.wikipedia.org/wiki/Audio_frequency

The number of seconds is how long (in seconds) the wave goes on for, and the number of channels is how many audio channels there are. Since this is a mono sound, there is only one audio channel.

int nNumSamples = nSampleRate * nNumChannels * nNumSeconds;
int32 *pData = new int32[nNumSamples];

Here we calculate how many actual audio samples there are and then allocate space to hold the audio data. We are using 32 bit integers, but you could also use 16 bit integers. The number of bits in your audio samples indicates the vertical resolution of your audio data, or how many unique values there are. In 16 bit ints there are 65,536 different values, and in 32 bits there are about 4.2 billion different values. If you think about your data as plots on a graph (essentially what it is, where X is time and Y is wave amplitude), the more bits per sample and the higher the sample rate, the closer your graph can be to whatever real values you are trying to model (such as a sine wave). Fewer bits and a lower sample rate mean it’s farther away from the real data you are trying to model, which will cause the audio to sound less correct.

int32 nValue = 0;
for(int nIndex = 0; nIndex < nNumSamples; ++nIndex)
{
    nValue += 8000000;
    pData[nIndex] = nValue;
}

Here we are actually creating our wave data. We are using the fact that if you have an int near the maximum value it can store and then add some more, it will wrap around to the minimum value the int can store. If you look at this on a graph, it looks like a saw tooth wave – ie we are creating a saw tooth wave! Normally you wouldn’t create one this way, because doing it like this is harsh on the ear and introduces something called aliasing (http://en.wikipedia.org/wiki/Aliasing). In a later tutorial we’ll see how to create a band limited saw tooth wave to make higher quality sound, but for now this will work fine!

You can change how much is added to nValue to change the frequency of the resulting wave. Add a smaller number to make a lower frequency, a larger number to make a higher frequency. We’ll get into the math of more finely controlling frequency in another chapter so you can actually match your waves to the notes you want to hit.
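In fact you can work out the resulting frequency directly: the 32 bit int wraps around once every 2^32 / increment samples, and the frequency is the sample rate divided by that period. A quick sketch (the helper name is made up):

```cpp
// Frequency of the wrap-around saw: samples per cycle is 2^32 / increment,
// and cycles per second (hz) is sample rate / samples per cycle.
double SawFrequency(double fIncrement, double fSampleRate)
{
    double fSamplesPerCycle = 4294967296.0 / fIncrement; // 2^32 counts per wrap
    return fSampleRate / fSamplesPerCycle;
}
```

With the 8,000,000 increment above and a 44100 sample rate, that works out to about 82.1hz.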

WriteWaveFile("outmono.wav",pData,nNumSamples * sizeof(pData[0]),nNumChannels,nSampleRate,sizeof(pData[0])*8);

delete[] pData;

Lastly we write our wave file and free our memory.

Tada! All done, we have a sawtooth mono wave file written out, give it a listen!

DIY Synthesizer Chapter 1: outmono.wav

Writing a Stereo File

The only thing that has really changed in the stereo file is that there are 2 channels instead of 1, and how we generate the audio data is slightly different.  Since there are 2 channels, one for left, one for right, there is actually double the audio data for the same sample rate and time length wave file, since it needs a full set of data for each channel.

The audio data itself is interleaved, meaning that the first audio sample is for the left channel, the second sample is for the right channel, the third sample is for the left channel, and so on.

Here’s how the audio is generated:

int32 nValue1 = 0;
int32 nValue2 = 0;
for(int nIndex = 0; nIndex < nNumSamples; nIndex += 2)
{
    nValue1 += 8000000;
    nValue2 += 12000000;
    pData[nIndex] = nValue1; //left channel
    pData[nIndex+1] = nValue2; //right channel
}

Note that for the right channel we write a different frequency wave. I did this so that you can tell the difference between this and the mono file. Play around with the values and try muting one channel or the other to convince yourself that it really is a stereo file!

DIY Synthesizer Chapter 1: outstereo.wav

Until Next Time…

That’s all for chapter 1, thanks for reading.

Next up we’ll talk about the basic wave forms – sine, square, saw, triangle and noise – and we’ll talk more about frequency and oscillators.

MoriRT: Pixel and Geometry Caching to Aid Real Time Raytracing

About half a year ago, some really intriguing ideas came to me out of the blue dealing with ways to speed up raytracing.  My plan was to create a couple games, showing off these techniques, and after some curiosity was piqued, write an article up talking about how it worked to share it with others in case they found it useful.

For those of you who don’t know what raytracing is, check out this wikipedia article:

http://en.wikipedia.org/wiki/Ray_tracing_(graphics)

Due to some distractions and technical setbacks unrelated to the raytracing itself, I’ve only gotten one small game working with these techniques.  One of the distractions is that I implemented the game using google’s native client (NaCl) and for some reason so far unknown to me, it doesn’t work on some people’s machines which is very annoying!  I plan to get a mac mini in the next couple months to try and repro / solve the problem.

Check out the game if you want to.  Open this link in google chrome to give it a play:

https://chrome.google.com/webstore/detail/kknobginfngkgcagcfhfpnnlflhkdmol?utm_source=chrome-ntp-icon

The sections of this article are:

  1. Limitations
  2. Geometry Caching
  3. Pixel Caching
  4. Future Work
  5. Compatible Game Ideas

Limitations

Admittedly, these techniques have some pretty big limitations.  I’ve explained my techniques to quite a few fellow game devs and when I mention the limitations, the first reaction people give me is the same one you are probably going to have, which is “oh… well THAT’S dumb!”.  Usually after explaining it a bit more, people perk up again, realizing that you can still work within the limitations to make some neat things.   So please hold off judgement until checking out the rest of the article! (:

The big fat, unashamed limitations are:

  • The camera can’t move (*)
  • Objects in the scene shouldn’t move too much, at least not all at the same time

(* there is a possible exception to the camera not moving limitation in the “Future Work” section)

So…. what the heck kind of games can you make with that?  We’ll get to that later on, but here’s some things these techniques are good at:

  • Changing light color and intensity is relatively inexpensive
  • Changing object color and animating textures is inexpensive
  • These techniques don’t break the parallel-izable nature of raytracing.  Use all those CPU and GPU cores to your heart’s content!

Seems a bit dodgy I’m sure, but read on.

Geometry Caching

The first technique is geometry caching.

The idea behind geometry caching is:  If no objects have moved since the last frame, why should we test which objects each ray hits?  It’s a costly part of the ray tracing, and we already KNOW that we are going to get the same results as last frame, so why even bother?  Let’s just use the info we calculated last frame instead.

Also, if some objects HAVE moved, but we know that the moving objects don’t affect all rays, we can just recalculate the rays that have been affected, without needing to recalculate all rays.

Just because we know the collision points for rays doesn’t mean that we can just skip rendering all together though.  Several things that can make us still need to re-render a ray include:  Animating textures, objects changing colors, lights dimming, lights changing color.  When these things happen, we can re-render a ray much less expensively than normal (just recalculate lighting and shading and such), so they are comparatively inexpensive operations compared to objects actually moving around.

How I handle geometry caching is to give each ray (primary and otherwise) a unique ID, and I have a dynamic array that holds the collision info for each ID.

In the part of the code that actually casts a single ray, I pass the ID and a flag saying whether it’s allowed to use the geometry cache. If it isn’t allowed to use the geometry cache, or there is no entry in the cache for the ID, the code calculates intersection info and puts it into the geometry cache.
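As a sketch (the struct and field names here are illustrative, not the article’s actual code), the cache might look something like this:

```cpp
#include <vector>

// One cache entry per unique ray ID, holding whatever intersection info the
// shading step needs later.
struct SRayCollisionInfo
{
    bool  m_bValid = false;   // has this ray's intersection been calculated?
    int   m_nObjectId = -1;   // which object was hit (-1 for a miss)
    float m_fDistance = 0.0f; // distance along the ray to the hit point
};

struct SGeometryCache
{
    std::vector<SRayCollisionInfo> m_Entries; // indexed by unique ray ID

    // fetch the entry for a ray, growing the array if this ID is new
    SRayCollisionInfo& GetEntry(unsigned int nRayId)
    {
        if (nRayId >= m_Entries.size())
            m_Entries.resize(nRayId + 1);
        return m_Entries[nRayId];
    }
};
```

A fresh entry comes back with m_bValid set to false, which is the “no entry in the cache for this ID” case that forces a recalculation.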

It then uses the geometry cache information (whether it was re-calculated, or was usable as is) and applies phong shading, does texture lookups, recurses for ray refraction and reflection, and does the other things to figure out the color of the pixel.
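To make the cache lookup concrete, here’s a minimal sketch of that logic. The names (`GeometryCache`, `get_intersection`, `scene.intersect`) are my own illustrative inventions, not the post’s actual code; the point is just the control flow: reuse last frame’s hit when allowed, otherwise recalculate and store.

```python
# Hypothetical sketch of a per-ray-ID geometry cache (names are illustrative).
class GeometryCache:
    def __init__(self):
        self.entries = {}  # ray ID -> cached intersection info from last frame

    def get_intersection(self, ray_id, ray, scene, allow_cache):
        # If caching is allowed and we have last frame's result, reuse it.
        if allow_cache and ray_id in self.entries:
            return self.entries[ray_id]
        # Otherwise do the expensive scene intersection and cache it.
        hit = scene.intersect(ray)
        self.entries[ray_id] = hit
        return hit
```

Shading, texture lookups, and recursion for reflection/refraction would then run on whatever `get_intersection` returns, whether it came from the cache or was freshly computed.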

In my first implementation of the geometry cache, it was very fast to render with once it was filled in, but it was really expensive to invalidate individual cache items.  If an object moved and a couple hundred geometry cache items needed to be marked as dirty, it was a really computationally expensive operation!

A better option, the one i use now, involves both a 2d grid (for the screen pixels) and a 3d grid (to hold the geometry of the world).

By breaking the screen into a grid, when each ray is cast into the world, I’m able to tell the ray which screen cell it belongs to.  This way, as a ray traverses the 3d grid holding the world geometry, it can add its screen cell to a list that each world cell maintains, keeping track of which rays pass through that 3d cell (storing just the unique values of course!).  Child rays know which screen cell they are in by getting that value from their parent.

If an object moves in the world, you can make a union of which world cells it occupied before it moved, and which world cells it occupies after the move.  From there, you can make a union of which screen cells sent rays into that world cell.  The last step is to mark all those screen cells as “geometry dirty” so that next frame, the rays in those cells are disallowed from using the geometry cache data, and instead will re-calculate new intersection info.

This method potentially makes a lot of rays re-calculate intersection data that don’t really need to, but by tuning the size of the screen and world grids, you can find a good happy medium for your use cases.
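The invalidation step above boils down to a couple of set unions. Here’s a small sketch of that idea (the function name and the `rays_through` mapping from world cell to screen cells are assumptions of mine, just to illustrate the bookkeeping):

```python
# Hypothetical sketch: find which screen cells to mark "geometry dirty"
# when an object moves (names and data layout are illustrative).
def dirty_screen_cells(world_cells_before, world_cells_after, rays_through):
    # Union of world cells the object occupied before and after the move.
    touched = set(world_cells_before) | set(world_cells_after)
    # Union of screen cells that sent rays through any of those world cells.
    dirty = set()
    for cell in touched:
        dirty |= rays_through.get(cell, set())
    return dirty
```

Next frame, rays belonging to the returned screen cells would be disallowed from using the geometry cache and would recompute their intersections.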

If you have an idea to better maintain the geometry cache, feel free to post a comment about it!

Pixel Caching

The second technique is pixel caching, which is a fancy way of saying “don’t redraw pixels that we don’t have to”.  The fewer rays you have to cast, the faster your scene will render.

The first challenge to tackle here is: how do you know which pixels will be affected when an object changes color?  That is solved by the same mechanism that tells us when geometry cache data is invalidated.

When an object changes color (or has some other non-geometry change), you just get the list of world cells the object resides in, and then get the union of screen cells that sent rays through those world cells.

When you have that list, instead of marking the screen cell “geometry dirty”, you mark it as “pixel dirty”.

When rendering the screen cells, any screen cell that isn’t marked as dirty in either way can be completely skipped.  Rendering it is a no-op because it would be the same pixels as last time! (:
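Putting the two dirty states together, the per-frame render loop can look something like this sketch. The `Cell`/`Ray` shapes and the `trace`/`shade` callables are hypothetical stand-ins for the real renderer; the point is which cells get skipped and which get the geometry cache:

```python
# Hypothetical sketch of the screen-cell render loop (names are illustrative).
CLEAN, PIXEL_DIRTY, GEOMETRY_DIRTY = 0, 1, 2

def render(screen_cells, framebuffer, shade, trace):
    for cell in screen_cells:
        if cell.state == CLEAN:
            continue  # no-op: same pixels as last frame
        # "Pixel dirty" cells may reuse cached intersections;
        # "geometry dirty" cells must recompute them.
        allow_geometry_cache = (cell.state == PIXEL_DIRTY)
        for ray in cell.rays:
            framebuffer[ray.pixel] = shade(trace(ray, allow_geometry_cache))
        cell.state = CLEAN  # cell is up to date until something dirties it
```

Only the geometry-dirty path pays the full intersection cost, which is why keeping changes to colors and lighting (pixel dirty) is so much cheaper than moving objects (geometry dirty).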

This is the reason why you want to minimize geometry changes (objects moving, rotating, resizing, etc) and if you have to,  rely instead on animating textures, object colors, and lighting colors / intensities.

Future Work

Here’s a smattering of ideas for future work that I think ought to bear fruit:

  • Replace the screen and/or world grid with better performing data structures
  • Pre-compute (a pack time process) the primary rays and subsequent rays of static geometry and don’t store static geometry in the world grid, but store it in something else instead like perhaps a BSP tree.  This way, at run time, if a ray misses all objects in the dynamic geometry world grid, it can just use the info from the pre-computed static geometry, no matter how complex the static geometry is.  If something DOES hit a dynamic object however, you’ll have to test subsequent rays against both the dynamic object world grid, and the data structure holding the info about the static geometry but hopefully it’ll be a net win in general.
  • Investigate to see how to integrate this with photon mapping techniques and data structures.  Photon mapping is essentially ray tracing from the opposite direction (from light to camera, instead of from camera to light).  Going the opposite direction, there are some things it’s really good at – like caustics – which ray tracing alone just isn’t suited for: http://en.wikipedia.org/wiki/Photon_mapping
  • In a real game, some things in the world will be obscured by the UI overlays.  There might be an opportunity in some places to “early out” when rendering a single ray if it was obscured by UI.  It would complicate caching though, since an individual ray could remain dirty while the screen cell itself was marked as clean.
  • Orthographic camera:  If the camera is orthographic, that means you could pan the camera without invalidating the pixel and geometry cache.  This would allow the techniques to be used for a side scrolling game, overhead view game, and things of that nature – so long as orthographic projection looks good enough for the needs of the game.  I think if you got creative, it could end up looking pretty nice.
  • Screen space effects: enhance the raytracing visuals with screen space particles and such.  Could also keep a “Z-Buffer” by having a buffer that holds the time each ray took to hit the first object.  This would allow more advanced effects.
  • Interlaced rendering: to halve the rendering time, every frame could render every other horizontal line.  Un-dirtying a screen cell would take 2 frames but this ought to be a pretty straight forward and decent win if losing a little bit of quality is ok.
  • red/blue 3d glasses mode:  This is actually a feature of my snake game, but I figured I’d call it out.  It works by rendering the scene twice, which is costly (each “camera” has its own geometry and pixel cache, at least).  If keeping the “Z-Buffer” as mentioned above, there might be a way to fake it more cheaply, but I’m not sure.

Compatible Game Ideas

Despite the limitations, I’ve been keeping a list of games that would be compatible with these ideas.  Here’s the highlights of that list!

  • Pinball:  Only flippers, and the area around the ball would actually have geometry changes, limiting geometry cache invalidating.  Could do periodic, cycling color / lighting animations on other parts of the board to spice the board up in the “non active” areas.
  • Marble Madness Clone: Using an orthographic camera to allow camera panning, a player could control a glass or mirrored ball through a maze with dangerous traps and time limits.  Marble Madness had very few moving objects and was more about the static geometry, so there’d probably be a lot of mileage here.  You could also have animated textures for pools of acid so that they didn’t impact the geometry cache.
  • Zelda 1 and other overhead view type games: Using ortho camera to allow panning, or in the case of Zelda 1, have each “room” be a static camera.  You’d have to keep re-rendering down somehow by minimizing enemy count, while still making it challenging.  Could be difficult.
  • Metroidvania: side scroller with ortho camera to allow panning.  Could walk behind glass pillars and waterfalls for cool refractive effects.
  • Monkey Island type game: LOTS of static geometry in a game like that which would be nice.
  • Arkanoid type game: static camera, make use of screen space effects for brick-breaking particles, etc.
  • Mystery game: Static scenes where you can use a magnifying glass to LITERALLY view things better (magnification due to refraction, just like in real life) to find clues and solve the mystery.  Move from screen to screen to find new clues and find people to talk to, progress the storyline etc.
  • Puzzle Game: could possibly do a traditional “block based” puzzle game like Puzzle Fighter, Tetris, etc.
  • Physics based puzzle game: You set up pieces on a board (only one object moves at once! your active piece!) then press “play”.  Hopefully it’d be something like a ball goes through your contraption which remains mostly motionless and you beat the level if you get the ball in the hole or something.
  • Somehow work optics into gameplay… maybe a puzzle game based on lasers and lights or something
  • Pool and board games: as always, gotta have a chess game with insane, state of the art graphics hehe
  • mini golf: A fixed camera when you are taking your shot, with a minimum of moving objects (windmills, the player, etc).  When you hit the ball, it rolls, and when it stops, the camera teleports to the new location.
  • Security guard game:  Have several raytraced viewports which are played up to be security camera feeds.  Could have scenes unfold in one feed at a time to keep screen pixel redraw low.
  • Turn based overhead view game:  Ortho camera for panning, and since it’s turn based, you can probably keep object movement down to one at a time.

Lastly, here’s a video describing this stuff in action.  When you view the video, the orange squares are screen tiles that are completely clean (no rendering required, they are a no-op).  Purple squares are screen tiles that were able to use the geometry cache.   Where you don’t see any squares at all, it had to re-render the screen tile from scratch and wasn’t able to make use of either caching feature.

Feedback and comments welcomed!  I’d also be really interested in hearing if anyone actually uses this for anything or tries to implement it on their own.