If you want to make a sound shorter, you can play it faster. Doing this also makes it higher pitch unfortunately.
If you want to make a sound longer, you can play it more slowly. This also makes it lower pitch though.
Sound length and pitch are tied together and there’s no way to change one without changing the other.
… Actually that’s a lie. Granular synthesis can be used to change playback speed and pitch independently!
This post talks about how granular synthesis works, gives some examples you can listen to, and also supplies simple standalone C++ code that does it. (680 lines of code, one source file, only standard C++ includes, no libraries used.)
By the end of this post, you should even be able to program your own “autotune” effect.
The code and audio samples are available on GitHub here: https://github.com/Atrix256/GranularSynth
Granular Synthesis Basics
Granular synthesis is conceptually pretty simple. The first step is to break a sound file up into small sections of sounds called “grains” that are typically between 10 and 100 milliseconds long.
You don’t have to do anything special to make grains, you just literally cut the sound up into a bunch of pieces.
To make a sound twice as long, you then make a new sound where each grain is repeated twice. When you play it back, it will sound mostly the same and be the same pitch, but will be twice as long.
To make a sound half as long, you would just throw away every other grain. The result is a sound that mostly sounds the same, and is the same pitch, but is half as long.
You aren’t restricted to integers though. You could easily throw out every 3rd grain to make it 2/3 as long or repeat every 5th grain to make it 20% longer.
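The repeat-or-drop idea can be sketched in a few lines. This is a minimal sketch and not the code from the repository: the name `StretchByGrains` and its parameters are made up for illustration, the grain size is given in samples rather than milliseconds, and no cross fading is done, so the output will click at grain boundaries.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the post's source file): changes the length of
// `input` by repeating or skipping whole grains. No cross fading is done here.
std::vector<float> StretchByGrains(const std::vector<float>& input,
                                   std::size_t grainSizeSamples,
                                   double timeMultiplier)
{
    assert(grainSizeSamples > 0 && timeMultiplier > 0.0);

    // Number of grains in the input, counting a partial grain at the end.
    const std::size_t numInGrains =
        (input.size() + grainSizeSamples - 1) / grainSizeSamples;
    const std::size_t numOutGrains =
        static_cast<std::size_t>(numInGrains * timeMultiplier);

    std::vector<float> output;
    for (std::size_t outGrain = 0; outGrain < numOutGrains; ++outGrain)
    {
        // Map each output grain back to a source grain. A multiplier of 2 maps
        // output grains 0,1,2,3 to source grains 0,0,1,1 (each grain repeated).
        // A multiplier of 0.5 maps output grains 0,1 to source grains 0,2.
        const std::size_t srcGrain = std::min(
            static_cast<std::size_t>(outGrain / timeMultiplier), numInGrains - 1);

        const std::size_t begin = srcGrain * grainSizeSamples;
        const std::size_t end = std::min(begin + grainSizeSamples, input.size());
        output.insert(output.end(),
                      input.begin() + static_cast<std::ptrdiff_t>(begin),
                      input.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return output;
}
```

Calling `StretchByGrains(samples, grainSize, 2.0)` repeats every grain twice, while a multiplier of 2.0 / 3.0 ends up skipping roughly every third grain.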
You can also adjust pitch instead of length.
To make a sound that is the same length but twice as high in pitch, you would double each grain, then play the result back twice as fast. Doubling the grains makes the sound twice as long, and playing it twice as fast brings it back to the original length while doubling the frequencies.
To make a sound that is the same length but half as high in pitch, you would throw out every other grain, then play the ones you kept at half speed. Dropping the grains makes the sound half as long, and playing it at half speed brings it back to the original length while halving the frequencies.
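The "play it faster or slower" half of those recipes is plain resampling. Here is a rough sketch of it, assuming the grains have already been doubled or dropped into a buffer. `PlayAtSpeed` is a hypothetical name, and it reads the nearest earlier sample instead of interpolating, which is the crudest possible choice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: plays `input` back `speed` times faster. The result is
// 1/speed times as long and its frequencies are `speed` times as high.
// Fractional read positions are truncated to the sample before them.
std::vector<float> PlayAtSpeed(const std::vector<float>& input, double speed)
{
    assert(speed > 0.0);
    if (input.empty())
        return {};

    const std::size_t outCount = static_cast<std::size_t>(input.size() / speed);
    std::vector<float> output(outCount);
    for (std::size_t i = 0; i < outCount; ++i)
    {
        // Output sample i comes from source position i * speed.
        const std::size_t src =
            std::min(static_cast<std::size_t>(i * speed), input.size() - 1);
        output[i] = input[src];
    }
    return output;
}
```

If the grain-doubled buffer is twice as long as the original, `PlayAtSpeed(doubled, 2.0)` returns something the original length at twice the pitch.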
Congratulations, you now understand granular synthesis!
There are other usage cases of granular synthesis, but they are a lot more exotic – like making crazy cool sounds.
There are also variations of granular synthesis where, when a sound is being made shorter, grains overlap instead of being omitted, or where grains are played a non-integer number of times.
Check out this youtube video to see an insane usage case for real time granular synthesis. WTF!
Youtube: Drum Sound Experiment – Dynamic Granular Synthesis
Some Granular Synthesis Gotchas
If you just do the above, you are going to have some issues with clicking and popping. When you put grains next to each other that weren’t next to each other before, there is going to be a discontinuity in the audio wave form, which translates into very short, very high frequencies that make a popping noise.
What’s more, if your grain size is 20 milliseconds, you will get a pop 50 times a second (1000 ms / 20 ms), which means you’ll hear a 50 Hz tone from the popping.
So, how do you fix this?
One way is to use envelopes to do cross fading.
If you wanted to put down grain A and then grain C, that would make a pop because A and C expect B to be between them.
Using an envelope to fix this problem, you’d put down grain A, and then next to it you’d put down grain B but have its volume go from 1 to 0 over some length of time (like 2 milliseconds). You then ADD grain C on top of grain B, but have its volume go from 0 to 1 over the same length of time.
The result is that there is no immediate “pop” from discontinuous wave forms. Instead, it gently fades from one grain to the next over the length of the cross fade.
Note that the above is equivalent to linearly interpolating from grain B to grain C over the length of the envelope, it’s just done in two passes.
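Here is a small sketch of that cross fade, written as the single-pass linear interpolation rather than the two-pass fade out plus fade in. The function name and the idea of passing in two equal-length slices are my own assumptions for illustration: `outgoing` holds the samples that would have followed the previous grain (the start of grain B), and `incoming` holds the start of the grain being placed (grain C).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: blends from `outgoing` to `incoming` over their length.
// outgoing's volume goes from 1 to 0 while incoming's goes from 0 to 1.
std::vector<float> CrossFade(const std::vector<float>& outgoing,
                             const std::vector<float>& incoming)
{
    assert(outgoing.size() == incoming.size());

    const std::size_t count = outgoing.size();
    std::vector<float> result(count);
    for (std::size_t i = 0; i < count; ++i)
    {
        // t goes from 0 at the first sample to 1 at the last sample.
        const float t = (count > 1)
            ? static_cast<float>(i) / static_cast<float>(count - 1)
            : 1.0f;
        result[i] = outgoing[i] * (1.0f - t) + incoming[i] * t;
    }
    return result;
}
```

The slice length is the envelope size in samples. Assuming a 44.1 kHz sample rate, a 2 millisecond cross fade works out to about 88 samples.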
I’ve heard that another way to handle this problem is that instead of cutting grains perfectly at the time / length they should be at, you make the cuts at zero crossings that are closest to the desired cut position.
What this does is make it so you can put any grain next to any other grain, and they should fit together pretty decently. This gives you C0 continuity by the way, but higher order discontinuities still affect the quality of the result. So, while this method is fast, it isn’t the highest quality. I didn’t try it personally, so am unsure how it affects the quality in practice.
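Since I haven't tried the zero crossing approach, take this as a guess at what the search could look like rather than a tested technique. `NearestZeroCrossing` is a hypothetical helper that looks outward from the desired cut position for the closest place where the signal changes sign.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the index closest to `desired` where a grain
// could start right after a sign change. Returns `desired` if none is found.
std::size_t NearestZeroCrossing(const std::vector<float>& samples,
                                std::size_t desired)
{
    assert(!samples.empty() && desired < samples.size());

    // A cut just before sample i sits on a zero crossing when the sign
    // differs between sample i-1 and sample i.
    auto isCrossing = [&samples](std::size_t i) {
        return i > 0 && ((samples[i - 1] <= 0.0f) != (samples[i] <= 0.0f));
    };

    // Search outward from the desired position, one sample at a time.
    for (std::size_t offset = 0; offset < samples.size(); ++offset)
    {
        if (desired + offset < samples.size() && isCrossing(desired + offset))
            return desired + offset;
        if (offset <= desired && isCrossing(desired - offset))
            return desired - offset;
    }
    return desired;
}
```

Grains cut this way are no longer all the same length, so whatever places them has to track each grain's actual start and end.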
Another issue you will encounter when doing granular synthesis is wanting to play back a grain at a non integer playback speed. This means you may want to sample index 0, then index 1.1, then index 2.2 and so on. How do you sample a fractional index?
A quick and easy way is to use the fractional part of the index to linearly interpolate between the two samples it’s in between. That means that sampling index 2.2 would be a linear interpolation between index 2 and index 3, and would be 20% of the way from index 2 to index 3. AKA it would be sample[2] * 0.8 + sample[3] * 0.2.
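As a sketch, fractional sampling with linear interpolation could look like this. `SampleLinear` is a hypothetical name, and clamping reads past the end to the last sample is just one way to handle the edge.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: reads `samples` at a fractional index by blending the
// two neighboring samples. Reads past the end return the last sample.
float SampleLinear(const std::vector<float>& samples, double index)
{
    assert(!samples.empty() && index >= 0.0);

    const std::size_t last = samples.size() - 1;
    const std::size_t i0 = std::min(static_cast<std::size_t>(index), last);
    const std::size_t i1 = std::min(i0 + 1, last);

    // The fractional part says how far we are from sample i0 toward sample i1.
    const float t = static_cast<float>(index - std::floor(index));
    return samples[i0] * (1.0f - t) + samples[i1] * t;
}
```

With samples of 0, 10, 20, 30, reading index 2.2 gives 20 * 0.8 + 30 * 0.2, which is 22.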
Another way to sample fractionally is to use cubic hermite interpolation. It’s more expensive to compute but interpolates more smoothly, and preserves first-order derivatives.
You can read more about that on my blog post here: Cubic Hermite Interpolation
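For reference, here is a sketch of cubic hermite sampling. I'm assuming the Catmull-Rom form here, where the tangents come from the neighboring samples, so the curve needs the sample before and the two samples after the read position. `SampleCubic` is a hypothetical wrapper that clamps out-of-range neighbors to the ends of the buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Interpolates between B (t = 0) and C (t = 1), using A and D to shape the
// curve. This is the Catmull-Rom form of cubic hermite interpolation.
float CubicHermite(float A, float B, float C, float D, float t)
{
    const float a = -A / 2.0f + (3.0f * B) / 2.0f - (3.0f * C) / 2.0f + D / 2.0f;
    const float b = A - (5.0f * B) / 2.0f + 2.0f * C - D / 2.0f;
    const float c = -A / 2.0f + C / 2.0f;
    const float d = B;
    return a * t * t * t + b * t * t + c * t + d;
}

// Hypothetical helper: reads `samples` at a fractional index using the four
// samples around it. Neighbors outside the buffer are clamped to the ends.
float SampleCubic(const std::vector<float>& samples, double index)
{
    assert(!samples.empty() && index >= 0.0);

    const std::ptrdiff_t last = static_cast<std::ptrdiff_t>(samples.size()) - 1;
    const std::ptrdiff_t i = static_cast<std::ptrdiff_t>(index);
    const float t = static_cast<float>(index - std::floor(index));

    auto at = [&samples, last](std::ptrdiff_t k) {
        const std::ptrdiff_t clamped = std::max<std::ptrdiff_t>(0, std::min(k, last));
        return samples[static_cast<std::size_t>(clamped)];
    };
    return CubicHermite(at(i - 1), at(i), at(i + 1), at(i + 2), t);
}
```

On perfectly linear data this gives the same answer as linear interpolation. The difference shows up on curved data, where it avoids the sharp corners that linear interpolation leaves at every sample.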
Lastly, there are a couple parameters to these techniques that have to be hand tuned for the situation they are used in:
- Grain Size – Some usage cases want larger grain sizes, while others want smaller grain sizes. You’ll have to play with it and see what’s best for your usage case. Again, I’ve heard that typical grain sizes can range between 10 and 100 milliseconds.
- Envelope Size – The size of the envelope used for cross fading can affect the result as well. Too short an envelope will start to make popping happen, but too long an envelope will muddy your sound.
Drums and other percussion instruments have a particularly hard time with granular synthesis because they are usually made up of very short but noticeable sounds. If these sounds get (partially?) repeated, it will sound weird and wrong.
Check the links at the end of the post for deeper reads into these topics and more!
Experiments & Results
For all experiments, I used a sound clip from one of my favorite movies “Legend” where Tim Curry plays the devil, and Tom Cruise and Mia Sara are the main characters fighting him.
The experiments use cubic hermite interpolation to sample fractionally, and they use crossfading to fight popping between grains. Everything uses a grain size of 20 milliseconds, and a cross fade of 2 milliseconds.
Naive Pitch / Length Adjustment
To start out, we can play the sound faster and slower to naively adjust pitch and length.
Here’s a fast / high pitched version (70% of time):
Here’s an even faster / higher pitched version (40% of time):
Here’s a slow / low pitched version (130% of time):
And here’s an even slower / lower pitched version (210% of time):
Granular Synth Length Adjustment
Here is a sound made shorter (70% of time) using granular synthesis, so it is the same length as the fast/high version, but has the same pitch as the original. Pretty cool, right?
Here is the sound made even shorter (40% of time), so it is the same length as the faster/higher version, but again has the same pitch as the original.
Here is the sound made longer (130% of time), but again it has the same pitch as the original.
And here is the even longer version (210% of time).
Granular Synth Pitch Adjustment
If we want to adjust the pitch but leave the length alone, there are two ways you could do that.
The first way is to use granular synthesis to change the length of the sound (longer or shorter), keeping the pitch the same, then use the regular “naive” method to make that resulting sound be the original sound length again.
If you made the sound shorter to start with, this process would decrease the pitch. If you made the sound longer to start with, this process would increase the pitch.
Here is a sound where that process is used to make the pitch about 1.43 times higher (1.0/0.7), but keeps the same sound length.
Another way to get a very similar result is to just change the playback rate of the grains themselves – INDEPENDENTLY of how many times you repeat the grains (0 to N times) which changes sound length.
Here is a sound made doing that. It plays each grain back ~1.43 times faster, but makes a sound that is the same length in the end. My ears can’t tell the difference, even though I do know there is one. This is the technique we’ll be using for the rest of the experiments.
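To show how the two controls stay independent, here is a stripped-down sketch with both a time multiplier and a pitch multiplier. It is not the function from the repository: the name `GranularSketch` is made up, it uses nearest-sample reads with no cross fading, and letting a sped-up grain read past its own end into the following source samples is my assumption about how to fill the grain.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: timeMultiplier picks WHICH source grain each output
// grain starts from, pitchMultiplier picks HOW FAST that grain is read.
std::vector<float> GranularSketch(const std::vector<float>& input,
                                  std::size_t grainSizeSamples,
                                  double timeMultiplier,
                                  double pitchMultiplier)
{
    assert(grainSizeSamples > 0 && timeMultiplier > 0.0 && pitchMultiplier > 0.0);
    if (input.empty())
        return {};

    const std::size_t outCount =
        static_cast<std::size_t>(input.size() * timeMultiplier);
    std::vector<float> output(outCount);
    for (std::size_t o = 0; o < outCount; ++o)
    {
        const std::size_t outGrain = o / grainSizeSamples;
        const std::size_t offsetInGrain = o % grainSizeSamples;

        // Length control: repeat or skip source grains.
        const std::size_t srcGrain =
            static_cast<std::size_t>(outGrain / timeMultiplier);
        // Pitch control: step through the source faster or slower.
        const std::size_t srcOffset =
            static_cast<std::size_t>(offsetInGrain * pitchMultiplier);

        const std::size_t src = std::min(
            srcGrain * grainSizeSamples + srcOffset, input.size() - 1);
        output[o] = input[src];
    }
    return output;
}
```

A call like `GranularSketch(samples, grainSize, 1.0, 1.0 / 0.7)` keeps the length the same while reading each grain about 1.43 times faster.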
Here is a higher pitched sound that plays back each grain 2.5 times faster (1.0 / 0.4).
You can use this to make the sound lower pitched as well. Here is a sound where we play the grains back at 0.77 speed (1.0 / 1.3).
Here they play back at 0.48 speed (1.0 / 2.1).
Granular Synth Pitch and Length Adjustment
To really drive home how pitch and length adjustments can be made independent with granular synthesis, here is a sound where it plays back more slowly (130% as much time), but it plays back at a higher pitch (~1.43 times as high).
And the opposite… here is the sound played back more quickly (70% as much time), but it plays back at a lower pitch (~0.77 times as high).
Something else fun is that the parameters don’t have to stay fixed over the length of the sound; they can change from grain to grain.
In the example code, I have a version of the granular synthesis function that calls back to a lambda for every grain to see what time and pitch multiplier it should use.
Here the pitch is on a 10 cycle sine wave going between 0.75 and 1.25.
Here the sound length is on a 13 cycle sine wave going between 0.5 and 2.5.
And lastly, here it combines the pitch and sound length parameters described above.
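A per-grain callback for those last three sounds could look something like the sketch below. The struct, the callback signature, and the `percent` parameter (how far through the sound the grain is, from 0 to 1) are all assumptions for illustration and may not match the example code. I'm also reading "10 cycle sine wave" as 10 full cycles over the length of the sound.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Hypothetical per-grain settings returned by the callback.
struct GrainSettings
{
    float timeMultiplier;
    float pitchMultiplier;
};

// Hypothetical callback type: given how far through the sound we are (0 to 1),
// returns the multipliers to use for the current grain.
using GrainSettingsCallback = std::function<GrainSettings(float percent)>;

// Sine wave with `cycles` full cycles over percent 0..1, remapped from -1..1
// to the range low..high.
float SineBetween(float percent, float cycles, float low, float high)
{
    const float pi = 3.14159265358979f;
    const float s = std::sin(percent * cycles * 2.0f * pi);
    return low + (high - low) * (s * 0.5f + 0.5f);
}

// Length on a 13 cycle sine between 0.5 and 2.5, pitch on a 10 cycle sine
// between 0.75 and 1.25.
GrainSettingsCallback MakeWobbleCallback()
{
    return [](float percent) {
        return GrainSettings{SineBetween(percent, 13.0f, 0.5f, 2.5f),
                             SineBetween(percent, 10.0f, 0.75f, 1.25f)};
    };
}
```

The synthesis loop would call the callback once per grain and use the returned multipliers in place of fixed ones.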
Below are some great links for more information about granular synthesis. I also recommend searching youtube for “granular synthesis examples” to hear some really out there stuff.
I also want to mention that the basics of this technique were kindly described to me at lunch by a co-worker. His web page is at http://antonte.com/
Very nice write-up, just as I was considering writing some audio synthesis code. Not knowing about granular synthesis, my first idea to avoid a frequency shift was to double duration in frequency domain. In theory reverse FFT transform would then return a non-frequency shifted result. Doing this would also require splitting the source into “grains” for FFT transform though, and might require a lerp to avoid popping too. So it’s nice to see that a brain-dead split and lerp in intensity domain works as well. My intuition tells me that if you do some math, the two approaches might be equivalent too.
I’ve tried this! They’re not actually equivalent. For one thing: in order to get gaps between each harmonic (so that each harmonic ends up in its own “grain”, which you need otherwise) you need to take the FFT over quite a few waveform cycles. However, granular synthesis can go down to grains the size of one wavelength, so you can get great time-resolution. It does do some things well: the time-profile of each harmonic is essentially encoded in the amplitudes and phases of the “smudge” around each harmonic in the FFT, so keeping that smudge together in a single grain preserves this. However, it doesn’t do as well for frequency changes (e.g. vibrato) and the harmonics can end up not being in line the way they should.
interesting… can one in similar way do granular comparison over 2 audio-signals? of different time-axis-consistency, i.e. tape vs LP..