## Cryptography 101: Encryption – Asymmetric Keys

This is the fourth article in a series on the basics of cryptography:

DISCLAIMER: These articles are meant for educational purposes only. The methods explained here are meant only to illustrate the basic concepts of cryptography and may or may not be suitable in real world applications. For serious applications such as financial transactions, I recommend hiring security professionals and also getting a lawyer involved. Use this info and code at your own risk, I claim no responsibility!

If you want more in depth information about cryptography than these introductory articles provide, I highly recommend a book called Applied Cryptography by Bruce Schneier. That book literally almost didnt get published because the NSA didnt want the info getting out into the public. Yay for the 1st amendment!

## Asymmetric Key Encryption (Public and Private Keys)

Unlike symmetric key encryption, which uses the same key for encryption and decryption, Asymmetric key encryption uses one key for encryption and a different key for decryption.

This probably sounds strange why you would want to have two passwords, but the reason is that you keep one for yourself, and give the other one out to another individual or a group.

Because you keep one to yourself (private) and give the other out (public) these are called public and private keys, and this technique is called Public Key Cryptography.

Depending on which key you keep private (the encryption or decryption key), you can get different effects.

## Usage Pattern 1 – Private Encryption, Public Decryption

If you keep the encryption key secret, but publish the decryption key out to the public (or to a group of people, or to another individual), what that means is that you can encrypt data which can be read by anyone. What is useful about this is that they have to use your public key to decrypt the data, so they know it was encrypted with your private key, which means they can be reasonably sure that you were the one that wrote the message. You have effectively cryptographically signed your message so that people know it was in fact you that sent that message.

People use this technique all the time in computers, this is how you can verify that something is from a legitamate source, regardless of if we are talking about a web page (HTTPS), a valid device driver (digitally signed device drivers), or other things of that nature.

Another neat thing about this usage pattern is that getting creative, you can also be ensured that the message or data hasn’t been tampered with.

For instance, let’s say you were making a computer operating system where you only allowed the computer to run trusted (signed executables).

Re-visiting a technique mentioned in the first article in this series on hashing, a “signed executable” might look like the below:

• [Cryptographic Hash of Unencrypted Executable Data]
• [Encrypted Executable Data]

So, you as the “central signing authority” for the operating system would receive programs from people wanting to release software on your operating system.

First, you would put the software through it’s paces via analysis and testing to make sure  the program worked as intended, was up to the level of quality you wanted software on your OS to be, followed any specific rules about how the software should behave and interact with the rest of the operating system, and also you would make sure the software wasn’t malicious.  Also, you would have to make sure the software wasn’t insecure in any ways that could compromise the rest of your security (for instance, if it had a Buffer Overflow, that could let attackers run arbitrary, unsigned code on your operating system, causing viruses to spread and other malicious things).

Once the program is verified safe, next up you would make the hash of the unencrypted program, write that to a file, then  encrypt the program with your private key and write that to the file after the hash.

You now have a trusted / signed executable to distribute.

When a user downloads this executable from your application store and tries to run it, the operating system could take the following measures to verify that the executable was trusted and unaltered from the time of it’s signing:

1. Unencrypt the executable using the public key.
2. Hash the unencrypted data and ensure that it matches the hash at the beginning of the file.

If the hashes match, you know that the executable was indeed signed by the central authority, and that it has not been altered in any way since it’s signing. Therefore, it is safe to run!

I am pretty sure variations of this sort of algorithm are used by things such as the xbox, playstation and iphone / ipad devices.

## Usage Pattern 2 – Public Encryption, Private Decryption

The other way to use asymmetric key encryption is to publicize the encryption key, but keep the decryption key private.

What this allows is for anyone to encrypt a message that only you can read.

One thing you could do with this is would be to be able to communicate securely with people if all you had was public communication.

For instance you could post to a public forum saying “This message is for Jesse”, and then put the encrypted data after that.

Since only Jesse knows his private key, and thus only Jesse can decrypt the data, only Jesse will be able to read your message, even though it is visible to everyone.

Despite this, there are still several unknowns in this particular communication, including:

1. Jesse doesn’t know that you really are who you say you are
2. You don’t know that Jesse got the message
3. Jesse doesn’t really know that the message wasn’t tampered with (well… if it’s a text message you are sending, and jesse unencrypts it and it’s garbage, he knows that the message was tampered with, but if the expected data was not so obvious when it was wrong, he may not be able to know that the message hadnt been tampered with).

But those problems, and others, are solvable, which leads to our next point…

## Cryptographic Protocols

A neat thing about cryptographic techniques like this one, symmetric key cryptography, and hashing is that they are basically just building blocks that you can stack together in different ways to be able to do useful and interesting things.

Once you learn some of the basic building blocks of cryptography (what this cryptography 101 series of articles is supposed to be all about), you can then learn more about how to put those building blocks together to preform useful tasks.  The recipes for preforming these useful tasks are called Cryptographic Protocols and they can (and often should) contain more than just cryptographic techniques.

In the first usage pattern, I showed how combining asymmetric key encryption with hashing can provide you with a system for creating and verifying trusted executables.  That series of steps for creating and using trusted executables was a cryptographic protocol that contained important steps even beyond just encryption and hashing – such as verifying that the executable was not malicious or insecure.  Leaving those steps out creates a big security hole, so they are very important to the overall protocol.

For the second usage pattern, here’s some cryptographic protocols to solve the problems i called out:

1. To solve the issue of Jesse not being sure that you are who you say you are, you could take the encrypted message you created, and sign it with your own private key (of which the decryption key is public… this is usage pattern 1).  This way, when Jesse gets the encrypted message from you, he first unencrypts it with your public key, and then unencrypts it with his own private key.  If the message comes out as garbage in the end, he knows that one of the two steps failed.  Specifically, either it wasn’t YOU who sent the message, OR, you used the wrong public key when signing a message to send to him.  Jesse doesn’t know which step went wrong, but he does know the message is invalid one way or another.
2. To solve the problem of you not knowing that Jesse got the message, you could tell Jesse in the encrypted message “Jesse, if you get this message, respond by sending me back an encrypted message that says ‘the password is forty two'”.   Then, if Jesse got the message, he could encrypt a message saying “the password is forty two” using your public key, and then post it on the board again for you to unencrypt with your private key and see that he got receipt of your message.  While it’s true that anyone is able to encrypt messages meant for you, and so anyone could have written that message, there is some level of security there because the specific message you said to send was encrypted in such a way that only Jesse could have read it.  This way, you can be reasonably sure that Jesse got the note.
3. To solve the issue of Jesse not knowing if the message was tampered with at all or not (in the case that it’s hard to tell if you got the right data out or not), one way would be to just put a hash of the unencrypted data on the front of the message.  You’d have to agree with Jesse in advance on the protocol, but using the hash again, it would let Jesse know that the data hadn’t been tampered with.

## Generation of Key Pairs and Algorithm

By the very nature that these keys work in tandem means that they are somehow linked together mathematically.

I was trying to think of a really simple way to show how public and private keys work together and how they are linked, with a minimal piece of sample code. I thought i had figured out a simplified way, but unfortunately it turned out I was mistaken and my method didn’t work at all.

So, I have to refer you to this page which is pretty darn helpful for understanding how the real thing works with RSA, but unfortunately it doesn’t explain the full nitty gritty of WHY it works to my liking.  Still a very good read though: http://sergematovic.tripod.com/rsa1.html

## Common Algorithms

Some commonly used Public Key Encryption algorithms are SSH, IKE and apparently even Bitcoins use it!.

## Oops!

After I wrote up this article, my friend Patrick corrected me saying that the process i described is not the usual process for digitally signing data. He said:

You got signing a little mixed up for asymmetric. Traditionally the process is:
1. Alice creates a public and private key pair.
2. Alice shares her public key with the world.
3. Alice never shares her private key.
4. Bob can now encrypt messages using Alice’s public key and only Alice can unencrypt them using her private key.
5. Alice can take a hash of something she wants people to verify as coming from her. Alice then signs that hash with her private key. Now Bob can verify the item coming from Alice by taking the hash of the data and comparing it against the hash in the signature using Alice’s public key.

RSA Labs Digital Signing Explanation

There are two reasons that I can think of why that process is better that the one I described:

1. You can sign data without obfuscating it via encryption.
2. Public key encryption takes a lot of processing power apparently, so you want to minimize how much data you encrypt with it.  This method encrypts a far smaller (and constant) amount of data.

Thanks for the correction Patrick!

## Cryptography 101: Encryption – Symmetric Keys

This is the third article in a series on the basics of cryptography:

DISCLAIMER: These articles are meant for educational purposes only. The methods explained here are meant only to illustrate the basic concepts of cryptography and may or may not be suitable in real world applications. For serious applications such as financial transactions, I recommend hiring security professionals and also getting a lawyer involved. Use this info and code at your own risk, I claim no responsibility!

If you want more in depth information about cryptography than these introductory articles provide, I highly recommend a book called Applied Cryptography by Bruce Schneier. That book literally almost didnt get published because the NSA didnt want the info getting out into the public. Yay for the 1st amendment!

## Symmetric Key Encryption

Symmetric key encryption is a fancy name for the type of encryption you are probably most familiar with, which is using a password to scramble and unscramble data to make sure only certain people can see it.

This is in contrast to asymmetric key encryption, where you have two passwords; one for encrypting and one for decrypting (The next article is going to be on asymmetric key encryption).

## Security

There are numerous symmetric key encryption algorithms out there but they all have one thing in common: their security relies on only the right people having the password, and the assumption that the best way attackers have for getting the plaintext from the ciphertext is to guess the password via brute force.

In good (modern) algorithms, people say things like “on average it will take geological or astronomical amounts of time to guess a password with the computing technology of today and the projected future” so they are reasonably sure people won’t be able to brute force the password in any useful amount of time.

Quantum computers give some forms of cryptography a scare though, because there is something called Simon’s Algorithm which is a quantum computing algorithm that can brute force search ANYTHING with exponentially fewer operations than classical computing.  This means it can brute force guess passwords of an encryption algorithm a lot faster than a normal computer.  At the time of writing this, I think the record for quantum computing power is something like having 4 cubits work together to do some simple math operation (like multiplication).  We could be on the precipice of disaster regarding cryptography, but luckily there are encryption algorithms that take the same amount of time, or longer, for quantum computing to solve, so it isn’t all doom and gloom.

When decrypting data with either symetric or asymetric key encryption, there is no built in way to know if you had the right password or not. You can know by looking at the recovered plaintext and seeing if you got junk out, or meaningful data, but if you don’t know what the data out is supposed to be exactly, or what it’s supposed to look like, there’s no way to know if decrypted it correctly. This makes it so sometimes it can be difficult for attackers to even KNOW if they have guessed the right password or not, which is good for us folk trying to protect data.

Just like a good hashing algorithm, small changes in input should ideally yield large changes in output, which makes it a Chaotic Function and makes it so the cipher text gives as little information about the plaintext as possible.

Sometimes people will use multiple encryption algorithms on a piece of data in the hopes of making it harder to crack, which sometimes works, but can also be fairly dangerous.

To understand the danger, consider how every program, no matter how complex, is essentially a traditional algebraic function (with perhaps lots and lots and lots of terms).  For encryption, the input is the plain text and key, and the output is the cipher text.

Now, just like in junior high and high school, sometimes when you plug one function into another like f(g(x)) and preform algebraic substitution, terms from f and g maybe cancel out.  You may end up with a function that is less complex than either f(x) or g(x), or it just may be less complex for certain values of x.  An attacker could exploit these attacks to their advantage and it might be easier for them to recover some or all of the plaintext because you used two encryption algorithms instead of one.

On the other hand, using multiple algorithms, or the same algorithm multiple times (perhaps with different keys) can also make it a lot more secure.  It’s just something to be mindful of.

Clever programmers and mathematicians sometime come up with encryption techniques where attacking the algorithm itself is the literal equivalent of having to solve famous unsolved math problems from the ages.  These often seem really secure because for some of these problems, the best and brightest minds in all of history have been fighting with the problems for hundreds or thousands of years and making no progress.

Every now and then, some smarty figures one of these out though, and suddenly, encryption algorithms based on it become essentially worthless.

Another common way that people attack ciphertext is via something called a “known plaintext attack”.  What this means is that if the attacker knows any part of the plaintext before it became ciphertext, they can sometimes leverage that knowledge to know a bit more about the key or algorithm used to encrypt the data.  That simplifies their work and makes it more likely that they can get the plaintext back without having to revert to brute force.

One really common way this comes up is if people do something like compress their data before encrypting, or they encrypt known file types like executables, word processing documents, image files etc.

The reason for this is because in all of those file types, there is a standard, well known header that those files have, which allow other programs to use them.  That header data is known plaintext and can be used by an attacker to get more information how to recover the plaintext.

For all the clever people out there trying to make encryption based on super advanced mathematics, in the end, some of the very most secure algorithms out there are based on very simple computing operations such as addition, subtraction, bit rotation, and XOR.

As an example, there is an algorithm called RC5 which only uses those basic operations (you can find the source code for it easily!) and yet is extremely secure. The makers of RC5 published their source code, and encrypted some data with various key sizes (7 byte, 8 byte and 9 byte) in 1994, and it took something like 5 years for the first one to be cracked (via brute force), 10 years for the second, and they project that cracking the third will take 200 more years. More information available here: RC5

## Algorithm Components

A symmetric key algorithm is any deterministic algorithm where given a key, has the ability to obfuscate (hide / scramble) data, and then later given the same key, has the ability to undo the operations that it did to get the original data back.

Since all operations have to be reversible, that limits you to non destructive operations.  XOR isn’t destructive, because A XOR B XOR B = A.  Addition and subtraction isn’t destructive, because A + B – B = A (even true when you wrap around the max size of your integer).  Division is destructive however, because when you divide on a computer, you have finite precision (even with floating point numbers) which means you can never fully recover the origional data when trying to undo a division with a multiplication.  Bit rotation is another operation that isn’t destructive.  NOT isn’t destructive, but AND and OR are destructive.  Another operation that isn’t destructive is moving bytes around, since you could just do the moves again in reverse order to get the original data back.

As simple as all this sounds, these are essentially the building blocks of all encryption algorithms.

## Example Algorithm

Here’s an example algorithm that you could use to encrypt and unencrypt data.  I don’t do any byte swapping (moving bytes around), or bit rotation, but those would be some good ways to improve it.
``` //Takes a pointer and length so you can encrypt binary data as well as text //the pOutData parameter should point to memory that is the same size as pData //If bEncrypt is true, it will encrypt data. If bEncrypt is false, it will decrypt data. void EncryptData(const unsigned char *pData, int nDataLength, unsigned char *pOutData, const unsigned char *pKey, int nKeyLength, bool bEncrypt) {   int nKeyIndex = 0;   unsigned char nRunningSum = 0;   for(int nDataIndex = 0; nDataIndex < nDataLength; ++nDataIndex)   {     //update our running sum     nRunningSum += pKey[nKeyIndex % nKeyLength];     //get our current byte of plaintext or ciphertext     unsigned char nDataByte = pData[nDataIndex];     //to decrypt, it subtracts a running sum of the key then xors against the current key byte     if(!bEncrypt)     {       nDataByte -= nRunningSum;     }     //do our xor, whether we are encrypting or decrypting     nDataByte = nDataByte ^ pKey[nKeyIndex % nKeyLength];     //to encrypt, it xors against the current key byte and then adds a running sum of the key     if(bEncrypt)     {       nDataByte += nRunningSum;     }     //set the output data byte     pOutData[nDataIndex] = nDataByte;     //move to the next byte in the key     nKeyIndex++;   }   } ```
Also, here's some example code of how to use this function:
``` void DemoEncryption() {   //our key and plain text   const char *pKey = "MyKeyIsFairlyLongButThatIsJustFine!124351 seven";   const char *pPlainText = "This is some plaintext, how do you do?";```

```   //allocate space for our cipher text and recovered plain text   unsigned char *pCipherText = new unsigned char[strlen(pPlainText)];   unsigned char *pRecoveredPlainText = new unsigned char [strlen(pPlainText)+1];   //print out our plain text   printf("%s\n\n",pPlainText);   //encrypt the plain text   EncryptData((unsigned char *)pPlainText,strlen(pPlainText),pCipherText,(unsigned char *)pKey,strlen(pKey),true); ```

```  //print out the cipher text as hex digits   for(int nIndex = 0; nIndex < strlen(pPlainText); ++nIndex)   {     printf("%0.2x",pCipherText[nIndex]);   }   printf("\n\n");      //decrypt the cipher text to recover the plain text   EncryptData(pCipherText,strlen(pPlainText),pRecoveredPlainText,(unsigned char *)pKey,strlen(pKey),false);   //print out the recovered plain text after we null terminate it   pRecoveredPlainText[strlen(pPlainText)]=0;   printf("%s\n\n",pRecoveredPlainText);   //free the memory we allocated   delete[] pCipherText;   delete[] pRecoveredPlainText; } ```

## Common Algorithms

Some commonly used symmetric key encryption algorithms in use today are AES, Blowfish and 3DES.

## Until Next Time!

That's it for symmetric key algorithms, next up I'll be talking about asymmetric key algorithms, which have some pretty interesting uses.

## Cryptography 101: Encryption – One Time Pad

This is the second article in a series on the basics of cryptography:

DISCLAIMER: These articles are meant for educational purposes only. The methods explained here are meant only to illustrate the basic concepts of cryptography and may or may not be suitable in real world applications. For serious applications such as financial transactions, I recommend hiring security professionals and also getting a lawyer involved. Use this info and code at your own risk, I claim no responsibility!

If you want more in depth information about cryptography than these introductory articles provide, I highly recommend a book called Applied Cryptography by Bruce Schneier. That book literally almost didnt get published because the NSA didnt want the info getting out into the public. Yay for the 1st amendment!

## Plaintext, Ciphertext and Keys

When talking about encryption, you’ll often hear two terms: Plaintext and Ciphertext.

The plaintext is the unencrypted data and may be either text or binary data.

Ciphertext is the encrypted data.

Ideally, the ciphertext will give no information about the nature of the plaintext that created it, other than perhaps the size of the plaintext itself.  Good ciphertext will look indistinguishable from random numbers, both by the human eye and mathematically.

This is because the point of encryption is to hide any patterns in the data, and good encryption will hide all discernible patterns. The only possible exception to this would be if the encryption process made ciphertext with misleading patterns that didn’t give any information about the plaintext.  I’m not sure if this comes up in practice, but it definitely could.

Another term you’ll hear often is “Keys”. A key is just the data that you encrypt or unencrypt data with. You can think of it as the password.

The one time pad is an extremely simple, yet secure way of encrypting data.

It is so simple that it only uses the xor operation, and is so secure that the ciphertext is literally uncrackable if done correctly.

The downside is that it requires a lot of pre-shared data which gets used up as you encrypt data. When you run out, you have to share more of this data if you want to keep communicating with that person.

This pre-shared data is the key used for encryption and unencryption.

## Implementing

To use a one time pad, you first gather a large amount of random data and share that with the person you want to communicate securely with. This is the one time pad itself and you’ll want one byte of random data for each byte of information you want to send to that person.  This step is also the crux of the security.  You need to make sure that nobody else is able to get the one time pad except your intended target, and you also need to ensure that you have high quality random data (more on that later on).

To encrypt data, you take one byte from the one time pad for each byte of data you want to encrypt and XOR them together. When you are done, you throw away the used bytes of the one time pad and never use them again.

Then, you send the ciphertext to the person you already pre-shared the one time pad data with.

To decrypt the data, that person xors each byte of the encrypted data with a byte of the one time pad, and they also throw away each byte used of the one time pad just like you did.

When they are done decrypting, they will have the plaintext data, and their one time pad will be in the same state that yours is in (ie their next number will be your next number).

From here you can rinse and repeat until you run out of one time pad data.

Super simple, and as long as nobody else has your one time pad data, and your one time pad data is truly random, nobody will be able to crack your ciphertext and get the plaintext.

## The Importance of Randomness

Besides securely transmitting the random data,the other crux of the security i mentioned was the quality of the random numbers in your one time pad.

The reason this is important is because if the numbers aren’t truly random, there will be patterns in the data. If there are patterns in the data, people can possibly discover those patterns, thus being able to separate the plaintext from the key and unencrypting some or all of your data.

Randomness comes up EVERYWHERE in cryptography, both in input and output to cryptographic algorithms. Because of this, truly random data is often somewhat of a commodity to cryptographers. Since re-using random data means that its slightly less secure (would be attackers have a pattern to gain knowledge with if you re-use your random numbers!), it’s also a consumable commodity!

In fact, there are famous books that are nothing but hundreds and hundreds of pages of random numbers generated from various real world sources – such as taking the wind speed over time in Juneau, Alaska and multiplying it by static gathered from a radio antenna which is tuned to dead air. Using real world data like that, people can be relatively sure that the data doesn’t have any discernible patterns. They just have to watch out for those pesky physicists unlocking the nature of the universe and finding the patterns in the background radiation 😛

I’m not even joking about these books by the way, check this out, here’s one such book!
A Million Random Digits with 100,000 Normal Deviates

Using random numbers from a published book makes your random numbers slightly less random (since other people have the book too, and attackers may notice it on your bookshelf or something), but so long as you don’t just use the numbers of the first or last pages (or anything else predictable), and the book actually contains high quality random numbers, it ought to be fine.

you can also BUY large amounts of high quality random data online from places like random.org.

The astute reader might ask “Why don’t i just use a pseudo random number generator on each side and never run out of one time pad data?”.

Well, if someone knows the PRNG you are using, and your seed, they would be able to unencrypt your data just like your intended target can.

## Specific Attack Against Randomness

As an extreme example, lets say that instead of random numbers, your one time pad data is all 0xFFFFFFFF and that you are using it to encrypt a text file (say, this article for instance).

When you encrypted your data by XORing each byte against 255 (0xFF), all the bits of each byte would be flipped from 0 to 1 or 1 to 0.

While it’s true that it would make the data un readable, and seemingly random, garbage data to the human eye, mathematically it’s a very different story.

If someone were analyzing your ciphertext, they would first notice that the byte value 154 (which looks like Ü and has a binary value of 10011010) occurs in the ciphertext roughly the same amount that the letter ‘e’ appears in the typical english language text document. This would be astute because that value of 154 is just the flipped bits of ‘e’ which has a byte value of 105 and a binary value of 1100101 (the binary bits are just flipped due to the XOR against 0xFF).

Then, they may notice the same for other letters… that some other value occurs as often as you’d expect an ‘o’ to appear in english, or an ‘m’ etc.

Pretty soon they have a clear picture that this is english plaintext, and they can start replacing letters with what they seem like they should be statistically (for the statistically significant letters).

After that, they have some of your plain text, and figuring out the rest is similar to playing sudoku… figuring out which letters fit where, based on how words are spelled, and then doing a find / replace in the entire document for each letter you figure out.

In the end, they have your plaintext and your encryption failed you.

This is an extreme case that is really simple to break, but hopefully you can see that if you even use slightly lower quality random numbers (such as the built in rand() function of C++, whether or not you use srand(time(0)) or not!) that you open yourself up to attack and it can compromise your whole communication stream.

## Requiring Less Pre-Shared Data

You can modify the one time pad algorithm to use less pre-shared data if you are ok with the changes in your security profile (your data may be weaker against some attacks, stronger against others).

There are many ways to skin a cat but I’ll just talk about a couple.

One way would be to generate more random data from the random data you do have. For instance, if you and the person you are pre-sharing data with agree on a protocol of MD5 hashing every 100 bytes of one time pad data to generate more random bytes that you can interleave with your one time pad data, you would have a way of generating 16% more one time pad data than what you gathered or shared with the other person. (16% more because MD5 hashes of 100 byte blocks spit out 16 byte hashes of seemingly random numbers – see the previous article on hashing for more information!).

However, doing this obviously makes the “random” data *somewhat* lower quality since there is a pattern to some of the random data. As non obvious as that pattern may be, if someone were to do fancy mathematical analysis of the data, this sort of technique may cause patterns to crop up which lead to a “chink in the armor” giving the attacker a foothold in recovering all or some of the plaintext.

Another way of making your one time pad go farther is instead of XORing the one time pad data against the plaintext and ciphertext to encrypt and unencrypt, you can use the one time pad to give you the keys (passwords) to encrypt / decrypt each communication.

For instance, if you and the person you are communicating with agree in advance on a symmetric key encryption algorithm (more on this topic in the next article!) that takes a 16 byte encryption key, you could use every 16 byte block of one time pad data for an entire single message no matter how large the message is.

For instance, you could encrypt 2GB of data using the first 16 bytes of a one time pad, send that to the person, then you encrypt 500MB with the next 16 bytes and send that to the person.

You’ve effectively used 32 bytes of your one time pad to encrypt 2.5GB of data, which is a crazy good ratio compared to the traditional one time pad protocol which would have required 2.5GB of pre-shared one time pad random data.

If you go this route, your ciphertext now becomes vulnerable to whatever attacks your symmetric key encryption algorithm are vulnerable to though. If the algorithm you are using turns out to have a serious flaw that mathematicians find out about (such as there’s a really easy way to recover the plaintext – this happens fairly often believe it or not!), your whole communication channel is screwed, whereas with the one time pad, it’s just the quality of your random numbers, and the security of your pre-shared data that define the security. So, there are definitely pros and cons to weigh.

## Other Weaknesses

There are a lot of ways to attack each cryptographic technique, and if you are serious about cryptography you really need to read up on a lot of things and be extremely clever, thinking of every possible situation that anyone else might think of.

Security is hard because often times you have a limited amount of time to implement your security (because you need to ship your software or open your service to the public SOME DAY), and there are most certainly more attackers than there are security professionals on your team, and they have all the time in the world to search for what you’ve missed! Just as there is no rest for the wicked, the same too is true for security professionals.

I mentioned that the quality of your random numbers and the security of your pre-shared data was the lynchpin of protecting against people getting your plaintext from your cyphertext, but there is another way to attack the communication channel as well.

Namely, if someone were to intercept a message between you and your target person, they may not be able to get your plaintext out, but if they can keep that message from getting to your target, and do so in a way that you aren’t aware of this, they can completely break your communication channel.

The reason for this is that doing this makes the one time pads of you and your target person get out of sync when you throw away one time pad data that the target person did not throw away. This means that the random numbers you are using to encrypt your data is not the same numbers your target person is decrypting data with, so they will get garbage, random data as output and not be able to recover the plaintext.

A malicious person in the middle was able to thwart your ability to communicate securely!

Also, if a person was able to modify the ORDER that the target person got the encrypted messages in, they would be able to break the channel that way as well (at least temporarily) since it would make the recieving person unencrypt the messages with the wrong pieces of data. The next message the person got would be unencryptable in this case though, since the same number of bytes were used up by the out of order messages as if they had come in the right order.

This is not the traditional man in the middle attack, but it is definitely *A* man in the middle attack.

As with so many things, there are often strange, non obvious connections between different subjects.  Case in point, one way to protect against these sort of attacks of lost or re-ordered messages would be to implement the sorts of algorithms used in network programming (like those used in TCP/IP) that ensure “guaranteed” and “in order” communication between two computers or individuals.

Going this route, just like how computers on the internet can know when they got message B but haven’t received message A yet, or that when they sent a message to another person that it never got there, you too would be able to know if a message got to the target, and they would be able to know if they have received messages out of order or not.

## Until Next Time!

That’s the essence of the one time pad and I hope you found it interesting!

Next Up I’ll be talking about symmetric key algorithms which are the more traditional way of encrypting where you use a password to protect data.

For those interested in cracking encrypted data (which technically is against the DMCA these days, but used to be a common academic activity, and a way of weeding out insecure algorithms), here’s a nice morsel for you. It’s hexadecimal encoded encrypted data.  Every 2 hex characters equals one byte of encrypted data. If you use the information from the article, you ought to be able to crack it (there’s an easy way and a hard way).

And no, cracking the encrypted data below is not even technically against the law, I’m giving you explicit permission to crack it if you can (:

d19c9092d08a8c9a8dd08c96878b868c86929d90938cdfdfbc979a92968c8b8d86c5978b8b8fc5d0

## Cryptography 101: Hashing

Welcome to the first article in a series aimed to teach the basics of cryptography:

In this digital age, cryptography is more important than ever.  It’s used to protect financial transactions, ensure the anonymity of political dissidents, protect private conversations, help prevent cheating in video games and many other things as well.

DISCLAIMER: These articles are meant for educational purposes only.  The methods explained here are meant only to illustrate the basic concepts of cryptography and may or may not be suitable in real world applications.  For serious applications such as financial transactions, I recommend hiring security professionals and also getting a lawyer involved.  Use this info and code at your own risk, I claim no responsibility!

If you want more in depth information about cryptography than these introductory articles provide, I highly recommend a book called Applied Cryptography by Bruce Schneier. That book literally almost didn’t get published because the NSA didn’t want the info getting out into the public. Yay for the 1st amendment!

## Hashing in Computer Science

You may be familiar with the term “hashing” from computer science. In computer science, a hash function is a function which takes input data, preforms some operations on it and spits out some (usually) smaller output data that can be used more or less as a unique identifier for the input data.

Common uses of hashing include:

• Hashing the contents of large files to be able to compare the hashes to quickly know if they are the same or different, instead of having to compare the files byte by byte.  This is especially useful when comparing files over a network connection.
• Hashing peices of data which are difficult or time consuming to compare (such as strings) and using the hashed value as a “look up key” within a database or array or list, so that you can look up items very quickly by their hash, instead of having to do more expensive string compares (or whatever other more complex comparison and lookup methods).

## Hashing in Cryptography

As you can probably guess, since hashing makes large pieces of data (such as entire files) into small pieces of data (often only a handful of bytes large), there are many pieces of larger source data that can result in the same smaller hashed data.  When this happens, it’s called a hash collision and a good hash function will do it’s best to minimize collisions for optimal performance.  The more output bits you have, the more “space” you have before a collision is unavoidable.

Often times, a good hashing algorithm will have 2 properties to minimize collisions…

1. Small changes in input give large changes in output.  In other words, it’s very sensitive to initial conditions and so is a chaotic function (http://en.wikipedia.org/wiki/Chaos_theory)
2. If you give it a set of well distributed random inputs, it should give a well distributed set of random outputs.  Heck, if you give it any set of (varying) inputs, it should give a well distributed set of random outputs ideally.  By random i mean no discernible patterns.

If these things aren’t true, the hashed output can give clues as to the nature of the input, or, it can make it easier to provide input that hashes to the same output (which is the main way to attack hash based security).

For instance, if your hashing algorithm made an 8 bit hash (very small!) that always (or often) set the 7th and 8th bits to 1, that means effectively you really have a 6 bit hash, because 2 of the bits are almost always the same. In general, more bits means more security, since it’s harder to get a hash collision on purpose.

Quick aside, can you think of something else with these properties?  Some deterministic algorithm that spits out chaotic, well distributed, seemingly random numbers based on (perhaps) non random input?  How about a pseudo random number generator?  There is a lot of crossover between these two types of algorithms and I find it pretty neat that they are working towards almost the same goals, but that they are used for such different things.

Also as you might guess, hashes are one way.  If you are given hashed data, it’s difficult or impossible to work backwards and get the source data back again.  In fact, you often hear hash functions referred to as “one way hash functions” because of this.  This is important because in cryptographical uses you want a hash to reveal as little information about the source data as possible.

## Example Uses of Cryptographic Hashing

Here’s two examples of places where hashing comes in handy. One is for protecting passwords, and the other is for protecting save game data of video games.

For protecting passwords, many times when you have a large online service such as facebook, youtube, etc, there will be a central database (cluster) storing everyone’s account information, including their passwords.

If someone were to hack a server and get access to the user database table, they would have everyone’s username and password and the users would be screwed.

#### Protecting Save Game Data

For protecting save game data, hashes are used in conjunction with encryption to prevent both read and write access to the save game data.

To write this protected save game data, you first hash the unencrypted save game data, and write that to the front of the file.  Next, you encrypt the save game data and write that after the hash.

When reading save game data, you read in both the hash and the encrypted save game data.  Next you unencrypt the save game data and hash it.  Then, you can compare the hash you made with the hash stored in the file and if they don’t match, you know that someone tried to tamper with the file and you can consider it invalid / corrupt.

Also, since the save game data is encrypted, it’s difficult for a user to read the data in your save game data.  Thus you protect the file from both reading and writing.

It’s possible that the person could modify the data in such a way that it will unencrypt and then hash to the same hash value stored in the beginning of the file, but it’s extremely unlikely, and also even less unlikely that doing so will result in something favorable for the attacker.  They can’t even be sure they are increasing a value thanks to the encryption function scrambling the data completely.

## Hashing Algorithm Overview

In a nutshell, besides all the stuff we talked about above, a hashing algorithm is just a deterministic algorithm (meaning it acts the same way every time, no randomness) that takes some input, chews it up, and spits out some (often) smaller piece of data to represent it. When chewing it up, it can do things that are destructive to the data (such as integer division, which loses precision) and isn’t just limited to non destructive operations like encryption algorithms are (non destruction operations can be reversed, such as XOR, addition, subtraction, bit rotations).

As an extra piece of security, people often “SALT” their hashes which means they hash some constant before hashing whatever data they want to hash. This constant is called the salt and you can think of it kind of like a password. This way, even if someone knows what algorithm you are using to hash data (such as the popular MD5 or SHA-1 hash functions), you’d also have to know the salt used to more effectively attack the system.  It’s a little extra bit of security, which is always nice.

## Example Hash Function

Here’s an example hash function in C++.   Again, note that this is not really fit for real world use or important situations, it’s just for educational purposes.  You’d want to do more “chewing” and use different operations, bit rotations to make sure all the bits got “hit” by the xor’s, etc.  Check out some more complex, real world hashing algorithms for more info!
``` //assuming sizeof(int) == 4 typedef unsigned int uint32;```

``` //Takes a pointer and length so you can hash binary data as well as text //note that this function as is won't give the same answers on machines with different endian-ness uint32 Calculate4ByteHash(const unsigned char *pData, int nDataLength, const unsigned char *pSalt, int nSaltLength) {   //setup some variables   uint32 nHash = 0;   unsigned char *pHashPointer = (unsigned char *)&nHash; ```

```  //salt the hash   for(int nIndex = 0; nIndex < nSaltLength; ++nIndex)   {     pHashPointer[nIndex%4] = pHashPointer[nIndex%4] ^ pSalt[nIndex];   }   //hash the data   for(int nIndex = 0; nIndex < nDataLength; ++nIndex)   {     pHashPointer[nIndex%4] = pHashPointer[nIndex%4] ^ pData[nIndex];   }   return nHash; } ```

## Rainbow Tables

Assuming the algorithm meets the critera above, the only real way to attack something secured by hashing (besides asking the nice receptionist for the secret info while batting your eyelashes) is to brute force hash a bunch of values until you find something that gives the same hash as what you are looking for.

Unfortunately, there are something called "Rainbow Tables" where people have gone through and created tables of unique hashes and "source data" that results in those hash values for common algorithms.  This way, if for instance, an attacker saw that a hashed value was "3", and he knew you were using the MD5 algorithm, he could look at an MD5 rainbow table to find the value he could put into your system to result in a hash value of "3" and thus he'd gain some ground at attacking your security.

Of course, if you salt your hash, he would have to find out your salt value too and perhaps salting would invalidate the rainbow table entirely (depending on the algorithms used).  Also, the more bits your output hash contains, the larger a rainbow table would have to be, so if you really wanna screw with would be attackers, make your output bit count larger - it makes their job exponentially harder! (:

## Popular Hashing Algorithms

Two common hashing algorithms used for various real world applications are MD5 and SHA-1.  You've probably seen them around, especially if you've used open sourced software.

## Recording lagless demo videos of a laggy game

Often times when developing a game, you’ll want to record a demo video to show to a publisher, show at E3, post on kickstarter, youtube, or other places to help generate interest or gain funding to keep your project going.

Unfortunately, the point in time that you need a video is often in the beginning of the project, when your game probably doesn’t run very fast, or might have performance spikes, making it difficult to get a high quality video capture.

Many times, developers will have to have a performance push to get the game up to speed for a demo video, spending time on “demo hacks”, which are often just throw away code for after the video is made.  I’ve been through a couple of these myself and they are not fun, but they are an unfortunate necessity.

This article will explain a fairly simple technique for getting a full speed recording of your game engine with perfectly synchronized sound, no matter what speed your game actually runs at, saving you time and effort, not having to waste time on demo hacks just to get a presentable video.

Playable demos are a whole other beast,  and you are on your own there, but if a video will suit your needs, you’ve come to the right place!

I’ve used this technique myself in a couple different games during development, and in fact included it as a feature of one PC game I shipped in the past, called “Line Rider 2:Unbound”, so this is also a technique for adding video recording to any game you might want to add it to.

## Out of the box solutions

There are various “out of the box” ways to record a video of your game, but they have some downsides which make them not so attractive.

For instance, you can get fraps which will record any application’s audio and video and you could use that to record a video of your game. The downside here is that if your game lags, so does the video, so we still have that problem. Also, the act of recording competes with your game for resources, causing your game to run at an even lower FPS and making an even worse video.  Fraps is also limited to specific platforms, and you may be working on an unsupported platform.

lastly, if you want to include this feature of video recording in a shipped product, you will have to license fraps for that use, which may be prohibitive to your project’s budget.

Other video recording software has the same or similar issues.

## Rolling your own – Video

Making your own video recorder built into your game has some real easy to hit pitfalls that I want to talk about.

When considering only the video portion (not audio yet), our aim is to write all the frames to disk as individual image files (such as png, or raw uncompressed image files), and then after recording is done, use something like ffmpeg to combine the frames into a video. Writing a compressed image file (such as png or jpg) for each frame may save disk space, but may take longer for your computer to be able to process and write to disk. You will likely find that a raw file format is more performant, at the cost of increase disk space usage, but hard drives are cheap these days and everyone has huge ones. Also, at this point you probably want to use lossless image compression (such as png, or a raw image file) for your screen captures so that you don’t have compression artifacts in your screen captures. When you make a final video, you may choose a more highly compressed video format, and it may introduce it’s own artifacts, but you want to keep your source files as clean as possible so that you don’t introduce UNNECESSARY artifacts too early in the process.

If you dump each rendered frame to disk, the disk i/o can drag your game’s frame rate down to a crawl.  You might think about having the disk write happen on another thread so the game isn’t limited by the disk i/o, but then you’ll have to keep a buffer of previous frames which will grow and grow and grow (since you are making frames faster than it can write to disk) until you run out of memory.  It’s a losing battle for longer videos.

You could get a faster drive, configure a striped raid array, use a ram disk, or things like that, but why fix with hardware what you can fix in software?  Save yourself and your company some cash.

Similarly to the fraps problem, when you record video, that will likely affect the frame rate of your game as well, making it run slower, making a lower quality video because frames will be skipped – assuming you are using variable frame rate logic – making it so that you either have to have a “laggy” looking video as output, or your video will actually appear to speed up in the places that you encountered lag while recording, which is very odd looking and definitely not demoable.

The solution (which might be really obvious to the astute reader) is to make your game run your game’s logic at a fixed rate, instead of making it be based on frame time.  For instance, instead of measuring the time between frames and using that delta to control logic (making things move farther when more time has passed etc), you just make your game act as if the same amount of time has always passed between your frames, such as ~16ms for a 60fps recording, or ~33ms for a 30fps recording.  IMPORTANT: make it only behave this way when in “recording mode”.  You don’t need to sacrifice variable frame rate logic just to get the ability to record nice videos.  Also, 30fps is fine for a video.  The more FPS your video has, the larger the video file will be.  Movie and TVs are something like 24 fps, so you don’t need a 60 fps video for your game demo, 30 or less is just fine.

This way, it doesn’t matter how long it took to render each frame, the game will generate a sequence of frames at whatever frame rate you would like your video to be in.  While recording the demo video, the game may run slowly, and be difficult to control if it’s REALLY laggy, but at least the output video will be smooth and perfectly lagless.   Later in this article I present a possible solution to the problem of difficulty playing the game while recording.

Are we done at this point?  NO!  We haven’t talked at all about audio, and as it turns out our audio is in a very odd state going this route.

## Rolling your own – Audio

From the section above, we have a nice lagless video stream, but if we just recorded audio as it went, the audio would be out of sync with the frames. This is because we recorded audio in real time, but we recorded the frames in variable time.

You could try to sync the audio in the right places with each frame, but then you’d have to speed up and slow down portions of your audio to hit the right frame numbers, which would make your audio sound really weird as it sped up and slowed down and changed pitch.

Definitely not demoable! So what’s the solution?

The solution is that while you are recording your video frames, you also make an audio timeline of what audio was triggered at which frame numbers.

For instance, if on frame 20, the player swung his sword and on frame 25 hit an exploding barrel, causing it to explode, your timeline would say “at frame 20, play the sword swing sound effect. at frame 25, play the exploding barrel sound effect”.

I’ve found it really easy to capture an audio timeline by hooking into your game or engine’s audio system itself, capturing all sound events.  It usually is not very difficult to implement this part.

After you have recorded all of your video frames, and have an audio timeline, the next step is to re-create the audio from the timeline, which means you need a way of doing “offline” audio mixing.

If you are using an audio library, check the documentation to see if it has an offline mode, many of them do, including the ever popular fmod.  If your audio library can’t do it for you, there are various command line tools and audio libraries out there that can do this for you.  I believe portaudio (port mixer?) can do this for you, and also another open sourced program called sox.

What you need to do is render each item in the audio timeline onto a cumulative audio stream.  If your video were a 30fps video, and a 500ms sound effect happened at frame 93, that means that you know this sound effect started at 3.1 seconds in (frame 93 * 33.33 miliseconds per frame) and lasts until 3.6 seconds (since it’s 500ms long).   So, you’d mix that into the output audio stream at the appropriate point in time, and then rinse and repeat with the rest of the audio timeline items until you had the full audio stream for the video.

When you are done with this stage, you have your video frames and your audio stream.  With your video creation software (such as ffmpeg) you can combine these into a single video file which shows your game running perfectly at whatever frame rate you specified, and with perfectly synchronized audio.  It’s a beautiful thing and definitely ready to demo to get some funding.

## Recap

To recap, the steps for creating a perfect video recording of your game are:

1. When in recording mode, make your game run at a fixed frame rate – no matter how long it really was between frames, lie to your game and tell it that 33.33ms have passed each frame for 30fps video or 16ms for 60fps video (or whatever other frame rate you want to run at)
2. Write each rendered frame to disk as an uncompressed or lossless compression graphics file.
3. While rendering each frame, build up a timeline of audio events that you can use to re-create the audio later.
4. After all the frames are captured, render your audio timeline into an audio stream.
5. After you have your audio stream and each video frame, use software such as ffmpeg to combine them into a perfect, lagless video.
6. BLOW THE SOCKS OFF OF INVESTORS AND SECURE SOME FUNDING!

## Bonus Points – Or making this feature a shippable feature of your game for players to use

At this point, the final product (the video) is as nice as it can possibly be.  However, the process of actually recording the video can be cumbersome because even though you are making a nice and smooth 30fps video, during recording it may be running at 2fps (depending on your machine) making it very difficult to control the game.  Also, in the final video it will appear that the user is traversing menus, inputting commands, and reacting at superhuman speeds.

A good way to handle this is instead of recording during play, what you do is record all the input that happens during the recording process.  This way you have an input timeline that is tied to frame numbers, the same way the audio timeline is tied to frame numbers.

When the recording process is done, you then put up a nice dialog for the end user saying something like “Rendering video please wait….” with a progress bar, and then re-simulate the user input that occurred during the recording phase, and render all those frames to disk (well, screen capture them as image files just like usual, just dont display them to the end user).

Since building an input timeline is relatively cheap computationally, you should have no slow down during the “recording” phase of the video while you (or the end user) is actually playing the game.

The “Gotcha” here is that your game needs to be deterministic for fixed rate time steps (or at least everything that really matters needs to be deterministic, maybe not particles or something) which can potentially be a bit tricky, but the upside is if you actually make this happen, you can record light weight playbacks as “videos” and have users share these feaux-videos with each other to watch playbacks of gameplay that other players had.  When you want to export these playbacks as real videos, you can put it through the regular video recording steps and spit out a full mpeg, suitable for sharing, uploading to youtube (from within the app perhaps even?) just like normal.  But, until you need to use the video outside of your application, you have very small files users can share with each other to view “videos” of in game gameplay.

Final tip: if doing this in windows, I’ve found that in recent versions of windows, doing  the screen capture using GDI functions instead of DirectX is actually WAY faster so use that if you can.  I’m thinking this must be because windows already has a screen cap in memory to show those little icons when you mouse over the minimized application or something.

## That’s all folks!

That’s all there is to it. With luck this will save some fellow engineers from having to crunch up some “demo hacks” to get performance up for an E3 demo video or the like. If you have any questions or comments, drop me a line (:

## DIY Synth 3: Sampling, Mixing, and Band Limited Wave Forms

This is a part of the DIY Synthesizer series of posts where each post is roughly built upon the knowledge of the previous posts. If you are lost, check the earlier posts!

This it the third installment of a series of tutorials on how to program your own synthesizer.

In this chapter we’ll continue on from the last chapter, and talk about a way to generate simple wave forms that don’t have aliasing problems. We’ll also talk about sampling, mixing and end with a somewhat realistic song made with samples and our very own platform independent synthesizer code.

You can download the full source code and source wave files from the link below.  The code got a bit more complex so there’s a zip file instead of a stand alone main.cpp.   Also, it’s not the cleanest, best organized code in the world – sorry about that! – but hopefully it’ll be ok for the purposes of this tutorial (:

DIY Synthesizer: Chapter 3 Source Code

If you don’t want to wait til the end of the chapter to hear the sample song, check it out here:

The Lament Of Tim Curry

## Aliasing

As mentioned in the previous tutorial, the wave forms we were generating have aliasing problems. Aliasing is an audio artifact where unintended audio frequencies appear in audio data due to trying to encode frequencies that are too high for the sample rate. Wikipedia describes Aliasing pretty well, check it out for more info: Aliasing.

Sound is pressure waves conducted in the air, and at the core, audio engineers and mathematicians like to think of all sound as being made up of sine waves at different frequencies and amplitudes (volumes).

If you have a smooth / bumpy wave form, you could picture building it up with sine waves pretty easily.

If on the other hand, you have something with sharp corners, like a saw wave, a triangle wave or a square wave, it gets more difficult.

In fact, to make a “perfect corner” out of sine waves, it would take an infinite amount of sine waves of ever diminishing frequency and amplitude to get the perfectly sharp corner.

In chapter one I briefly mentioned that the maximum frequency you can store in audio data is half the sample rate. This frequency is called the Nyquist frequency and you can read more about it here: Nyquist Frequency and here: Nyquist-Shannon sampling theorem.

Aliasing occurs whenever you try to store a frequency higher than the nyquist frequency. When you do that, your audio data is not what it ought to be (a higher frequency actually appears to be a lower frequency), causing audio artifacts. If you’ve ever seen a car’s wheels spinning too slowly or backwards in a tv commercial, that is the exact same problem.

So, when making a “perfect corner” on a saw, triangle, or square wave, and having to use infinitely high frequencies to make that corner, you can bet that an infinite frequency is above Nyquist, and that it will cause some aliasing.

So, to make band limited wave forms for saw, square, and triangle, we just add together the sine waves UP TO nyquist, and then stop, instead of continuing on to infinity (which would also take far too long to calculate hehe).  That makes a much cleaner, smoother sound, that is a lot easier on the ears.

A friend of mine who wishes to remain nameless has been a good sport in listening to my audio tracks over the years and for a long, long time she would complain that my songs hurt her ears.  I tried putting reverb and flange on my songs to try to mellow them out, and that helped a little, but even then, it still hurt her ears.  After I started using band limited wave forms, my songs stopped hurting her ears and my tones started sounding a lot smoother and richer, and more “professional”.

So, if you don’t want people’s ears to bleed when they hear your tunes, I recommend band limited wave forms!

## Band Limited Sine Wave

The sine wave does not have a band limited form, since since itself IS bandlimited by definition. So, a band limited sine wave is just the sine wave itself.

Onto the next!

## Band Limited Saw Wave

Wikipedia has a great article Sawtooth wave which says:

A sawtooth wave’s sound is harsh and clear and its spectrum contains both even and odd harmonics of the fundamental frequency. Because it contains all the integer harmonics, it is one of the best waveforms to use for synthesizing musical sounds, particularly bowed string instruments like violins and cellos, using subtractive synthesis.

What they mean by that (and what the heavy math formulas on that page say) is that if you have a saw wave of frequency 100, that means it contains a sine wave of frequency 100 (1 * fundamental frequency), another of frequency 200 (2 * fundamental frequency), another of 300 (3 * fundamental frequency) and so on into infinity.

The amplitude (volume) of each sine wave (harmonic) is 1 over the harmonic number. So in our example, the sine wave at frequency 100 has an amplitude of 1 (1/1). The sine wave at frequency 200 has an amplitude of 0.5 (1/2), the sine wave at frequency 300 has an amplitude of 0.333 (1/3) and so on into infinity.

After that you’ll need to multiply your sample by 2 / PI to get back to a normalized amplitude.

There’s a function in the sample code called AdvanceOscilator_Saw_BandLimited() that you can use to generate a band limited saw wave sample. It has an optional parameter where you can tell it how many harmonics to use, but if you omit that parameter, it’ll use as many as it can without going over Nyquist.

Here’s how a band limited saw wave looks and sounds compared to a non band limited saw wave, like the ones we created in the last chapter.

Chapter 3 Saw

Chapter 3 Saw Band Limited

## Band Limited Square Wave

Wikipedia has a good article on square wave’s too here: Square Wave which says:

Note that the square wave contains only odd-integer harmonic frequencies (of the form 2π(2k-1)f), in contrast to the sawtooth wave and real-world signals, which contain all integer harmonics.

What this means is that if you were trying to make a square wave at frequency 100, unlike a saw wave which has sine waves at frequencies 100, 200, 300, 400 and so on, a square wave is made up of sine waves of frequencies 100, 300, 500 and 700.

Like the saw wave, however, the amplitude of each frequency is the reciprocal of the multiple of the frequency. So, the sine wave at frequency 100 has an amplitude of 1/1, the sine wave at frequency 300 has an amplitude of 1/3, the sine wave at frequency 500 has an amplitude of 1/5.

After that, you need to multiply by 4/PI to get back to a normalized amplitude.

The function to generate this wave form in the sample code is called AdvanceOscilator_Square_BandLimited().

Here’s how a band limited square wave looks and sounds compared to a non band limited square wave, like the ones we created in the last chapter.

Chapter 3 Square

Chapter 3 Square Band Limited

## Band Limited Triangle Wave

The triangle wave is often used as a cheap approximation of a sine wave so it’s kind of funny making a more expensive (computationally) version of a triangle wave out of sine waves.

The wikipedia article for the triangle wave is here: Triangle Wave and it says:

It is possible to approximate a triangle wave with additive synthesis by adding odd harmonics of the fundamental, multiplying every (4n−1)th harmonic by −1 (or changing its phase by π), and rolling off the harmonics by the inverse square of their relative frequency to the fundamental.

Ok so in English what that means is that a triangle wave is a lot like a square wave, but every other harmonic, we subtract, instead of adding it. Also, instead of the amplitude (volume) of a sine wave being the reciprocal of the multiple of the frequency, the amplitude is the reciprocal of the SQUARE of the multiple of the frequency.

So that means for a 100hz frequency triangle wave, we would…

make a sine wave of 100hz at 1/1 amplitude
Subtract a sine wave of 300hz at 1/9 amplitude
Add a sine wave of 500hz at 1/25 amplitude
Subtract a sine wave of 700hz at 1/49 amplitude

and so on til infinity (or Nyquist frequency)

After that you multiply by 8 / PI*PI to get back to a normalized amplitude.

The function to generate this wave form in the sample code is called AdvanceOscilator_Triangle_BandLimited().

Here’s how a band limited triangle wave looks and sounds compared to a non band limited triangle wave, like the ones we created in the last chapter.

Chapter 3 Triangle

Chapter 3 Triangle Band Limited

## Band Limited Noise

In the last chapter we also talked about the “noise” wave form and I briefly mentioned that it had it’s uses – such as in percussion sounds.

Is it possible to make a band limited version? It is, but I’m not sure if it’s really useful for anything, other than a strange sound (but then again, strange sounds is what synth is all about right?)

A quick aside – In this chapter so far, we’ve actually been talking about “Additive Synthesis” which is the process of adding multiple noises together to get an interesting result. Specifically, we’ve been adding sine waves together to get band limited forms of a saw wave, a square wave and a triangle wave. There is something else called “Subtractive Synthesis” where you carve away sounds with filters (such as a low pass filter, a high pass filter, a band pass filter, etc) to get your sound. Another way to generate band limited wave forms is to make a pure, non band limited wave form, and then use a high pass filter to cut out the high frequencies of the sound (the ones generating the aliasing sounds).

In practice, it sounds the same either way you generate it.  Subtractive synthesis is just another way to approach the problem of aliasing and synth in general. In fact, when you down sample a sound file (take it from a higher sample rate to a lower sample rate), you should apply a low pass filter first to get rid of any frequencies that would cause aliasing in the lower sample rate.

Anyways, to generate band limited noise, I figured I’d just make a sine wave that changes it’s frequency once every 4000 samples (at a sample rate of 44,100, that means it changes it’s frequency 10 times a second).

Here’s what that looks and sounds like:

Chapter 3 Random Beeps

Interesting audio, and band limited, but not quite noise, so here is the same thing, switching frequency every 40 samples instead of every 4000 samples. That’s about 1000 times a second .

Chapter 3 Noise Wave

It is technically noise, and it is band limited, but it sounds weird.  Like a tape player on fast forward or water flowing quickly or something.

I didn’t make a function to generate that wave form, but the sample code does it “manually” if you want to make your own function.

## Chapter 3 Song

So this chapter has a somewhat passable song as a culmination of the info from the tutorials so far. You can check it out at the bottom of this article, but I wanted to give a quick overview of some other things that went into making it.

The song loads some sound files to use as samples. It loads 3 percussion sounds for the drum parts, and two sound clips from a favorite movie of mine called “Legend” – starring Tim Curry as the devil, Mia Sara as a princess and Tom Cruise as a naturalist wildman who is friends with fairies and elves. It’s a really great movie i really recommend checking it out!

Anyways, it MIXES these sound effects with our generated synth tones by just adding the various sound sources together. Mixing sounds is literally just adding them together.

When it loads up the wave files, it RESAMPLES them if necessary, meaning that if the sound file has a lower sample rate than the sound we want to render, it interpolates samples to make a higher sample rate. If the sound file loaded has a higher sample rate than the sound we want to render, it drops samples to make a lower sample rate. Check out the code for the details of how it does this, but it’s really simple and pretty much works how you’d expect it to. Note that if you down sample audio, you normally want to put it through a low pass filter to cut out any frequencies which would be above Nyquist, but my resampling code doesn’t handle that. It just aliases if there’s a frequency that is too high for the sake of simplicity.

Another thing that happens when it loads each wave file is that it converts it to mono or stereo if needed, to match the format of the sound we want to render. To convert from mono to stereo, it just duplicates the mono channel for the left and right channels, and to convert from stereo to mono, it just mixes (adds!) the left and right channel data together to get the mono channel data.  Intuition might tell you that adding the left and right channels together would make it louder, even maybe twice as loud, but in practice that doesn’t happen.  Sounds mix together pretty darn well without getting way loud, especially if they are “real life” sounds (not synthesized wave forms) and not played at the exact same time.  Basically, the peaks (positive numbers) and valleys (negative numbers) in sound sources tend to cancel each other out and keep things in normal range.

Lastly, when loading a wave file, it normalizes the audio data so that our synth and the audio samples are all working in the same amplitude ranges of -1 to 1. When normalizing, it also “re-centers” the audio data. That is to say, if audio data was really quiet, but was always above the zero axis, it would move the data down to be centered on the zero axis before normalizing to make sure and maximize loudness.

In reality, we’d want to re-center the left and right channels individually, but I just do them together. Also, you might want to normalize individual sections of the audio data at a time instead of normalizing the entire thing as one big chunk at the end. There are a lot of good techniques and algorithms out there to do this, but this functionality is often called a compressor (to give you a place to start your research).

Note, you could easily play sound files backwards to see if they sync up with the wizard of oz, or give you instructions for some tasty brownies, but I didn’t do that in this example code, I leave that up to you!

If you want to be able to read and write other sound formats besides wav files, you might check out libsndfile. I use it in my own projects and it works pretty nicely! You can find it at: libsndfile

## The Lament Of Tim Curry

Without further ado, here’s this chapter’s sample song. The full source code and source wave files is in this chapter’s source code zip file. Check it out with headphones for a neat effect, the bass line floats between the left and right channels. Enjoy!  And go watch Legend if you haven’t seen it before!

The Lament Of Tim Curry

## DIY Synth 2: Common Wave Forms

This is a part of the DIY Synthesizer series of posts where each post is roughly built upon the knowledge of the previous posts. If you are lost, check the earlier posts!

This is the second chapter in a series of tutorials about programming your own synthesizer

In this chapter we’ll talk about oscillators, and some common basic wave forms: Sine, Square, Saw, Triangle and Noise.

By the end, you should have enough knowledge to make some basic electronic melodies.

You can download the full source for this chapter here:  DIY Synthesizer: Chapter 2 Source Code

## The Sine Wave

The sine wave is the basis of lots of things in audio synthesis. It can be used on it’s own to make sound, multiple sine waves can be combined to make other more complex wave forms (as we’ll see in the next chapter) and it’s also the basis of a lot of DSP theory and audio analysis. For instance, there is something called Fourier Analysis where you can analyze some audio data and it will tell you what audio frequencies are in that sound data, and how strong each is (useful for advanced synthesis and digital signal processing aka DSP). The math of how to get that information is based on some simple properties of sine waves. More info can be found here: http://en.wikipedia.org/wiki/Fourier_analysis.

If we want to use a sine wave in our audio data, the first problem we hit is that sine has a value from -1 to 1, but our audio data from the last chapter is stored in a 32 bit int, which has a range of -2,147,483,648 to 2,147,483,647, and is unable to store fractional numbers.

The solution is to just map -1 to -2,147,483,648, and 1 to 2,147,483,647 and all the numbers in between represent fractional numbers between -1 and 1.  0.25 for instance would become 536,870,911.

If instead of 32 bits, we wanted to store the data in 16 bits, or 8 bits, we could do that as well.  After generating our floating point audio data, we just convert it differently to get to those 16 bits and 8 bits.  16 bits have a range of -32,768 to 32,767 so 0.25 would convert to 8191.  In 8 bits, wave files want UNSIGNED 8 bit numbers, so the range is 0 to 255.   In that case,  0.25 would become 158.

Note, in the code for this chapter, i modified WriteWaveFile to do this conversion for us so going forward we can work with floating point numbers only and not worry about bits per sample until we want to write the wave file. When you call the function, you have to give it a template parameter specifying what TYPE you want to use for your samples. The three supported types are uint8, int16 and int32. For simple wave forms like those we are working with today, there is no audible difference between the 3, so all the samples just make 16 bit wave files.

So, we bust out some math and figure out here’s how to generate a sine wave, respecting the sample rate and frequency we want to use:
``` //make a naive sine wave for(int nIndex = 0; nIndex < nNumSamples; ++nIndex) { pData[nIndex] = sin((float)nIndex * 2 * (float)M_PI * fFrequency / (float)nSampleRate); } WriteWaveFile<int16>("sinenaive.wav",pData,nNumSamples,nNumChannels,nSampleRate); ```

That does work, and if you listen to the wave file, it does sound correct:
Naive Sine Wave Generation

It even looks correct:

There is a subtle problem when generating the sine wave that way though which we will talk about next.

## Popping aka Discontinuity

The problem with how we generated the wave file only becomes apparent when we try to play two tones right next to each other, like in the following code segment:
``` //make a discontinuitous (popping) sine wave for(int nIndex = 0; nIndex < nNumSamples; ++nIndex) { if(nIndex < nNumSamples / 2) { float fCurrentFrequency = CalcFrequency(3,3); pData[nIndex] = sin((float)nIndex * 2 * (float)M_PI * fCurrentFrequency / (float)nSampleRate); } else { float fCurrentFrequency = CalcFrequency(3,4); pData[nIndex] = sin((float)nIndex * 2 * (float)M_PI * fCurrentFrequency / (float)nSampleRate); } } WriteWaveFile<int16>("sinediscon.wav",pData,nNumSamples,nNumChannels,nSampleRate); ```

Quick note about a new function shown here, called CalcFrequency.  I made that function so that you pass the note you want, and the octave you want, and it will return the frequency for that note.  For instance, to get middle C aka C4 (the tone all these samples use), you use CalcFrequency(3,3), which returns approximately 261.626.

Listen to the wave file generated and you can hear a popping noise where the tone changes from one frequency to the next: Discontinuous Sine Wave

So why is this? The reason is because how we are generating our sine waves makes a discontinuity where the 2 wave files change.

Here you can see the point that the frequencies change and how a pretty small discontinuity can make a pretty big impact on your sound! The sound you are hearing has an official name, called a “pop” (DSP / synth / other audio people will talk about popping in their audio, and discontinuity is the reason for it)

So how do we fix it? Instead of making the sine wave be rigidly based on time, where for each point, we calculate the sine value with no regard to previous values, we use a “Free Spinning Oscillator”.

That is a fancy way of saying we just have a variable keep track of the current PHASE (angle) that we are at in the sine wave for the current sample, and to get the next sample, we advance our phase based on the frequency at the time. Basically our oscillator is a wheel that spins freely, and our current frequency just says how fast to turn the wheel (from wherever it is now) to get the value for the next sample.

Here’s what the looks like in code:

``` //make a continuous sine wave that changes frequencies for(int nIndex = 0; nIndex < nNumSamples; ++nIndex) { if(nIndex < nNumSamples / 2) { float fCurrentFrequency = CalcFrequency(3,3); fPhase += 2 * (float)M_PI * fCurrentFrequency/(float)nSampleRate;```

``` while(fPhase >= 2 * (float)M_PI) fPhase -= 2 * (float)M_PI;```

``` while(fPhase < 0) fPhase += 2 * (float)M_PI;```

``` pData[nIndex] = sin(fPhase); } else { float fCurrentFrequency = CalcFrequency(3,4); fPhase += 2 * (float)M_PI * fCurrentFrequency/(float)nSampleRate;```

``` while(fPhase >= 2 * (float)M_PI) fPhase -= 2 * (float)M_PI;```

``` while(fPhase < 0) fPhase += 2 * (float)M_PI;```

``` pData[nIndex] = sin(fPhase); } } WriteWaveFile<int16>("sinecon.wav",pData,nNumSamples,nNumChannels,nSampleRate); ```

Note that we keep the phase between 0 and 2 * PI. There’s no mathematical reason for needing to do this, but in floating point math, if you let a value get too large, it starts to lose precision. That means, that if you made a wave file that lasted a long time, the audio would start to degrade the longer it played. I also use a while loop instead of a regular if statement, because if someone uses very large frequencies, you can pass 2 * PI a couple of times in a single sample. Also, i check that it’s above zero, because it is valid to use negative frequency values! All stuff to be mindful of when making your own synth programs (:

Here’s what the generated wave file sounds like, notice the smooth transition between the two notes:
Continuous Sine Wave

And here’s what it looks like visually where the wave changes frequency, which you can see is nice and smooth (the bottom wave). The top wave is the popping sine wave image again at the same point in time for reference. On the smooth wave it isn’t even visually noticeable that the frequency has changed.

One last word on this… popping is actually sometimes desired and can help make up a part of a good sound. For instance, some percussion sounds can make use of popping to sound more appropriate!

## Sine Wave Oscillator

For our final incarnation of a sine wave oscillator, here’s a nice simple helper function:
``` float AdvanceOscilator_Sine(float &fPhase, float fFrequency, float fSampleRate) { fPhase += 2 * (float)M_PI * fFrequency/fSampleRate;```

``` while(fPhase >= 2 * (float)M_PI) fPhase -= 2 * (float)M_PI;```

``` while(fPhase < 0) fPhase += 2 * (float)M_PI;```

``` return sin(fPhase); } ```

You pass that function your current phase, the frequency you want, and the sample rate, and it will advance your phase, and return the value for your next audio sample.

Here’s an example of how to use it:
``` //make a sine wave for(int nIndex = 0; nIndex < nNumSamples; ++nIndex) { pData[nIndex] = AdvanceOscilator_Sine(fPhase,fFrequency,(float)nSampleRate); } WriteWaveFile<int16>("sine.wav",pData,nNumSamples,nNumChannels,nSampleRate); ```

Here’s what it sounds like (nothing new at this point!):
Vanilla Sine Wave

## Wave Amplitude, Volume and Clipping

You can adjust the AMPLITUDE of any wave form by multiplying each sample by a value. Values greater than one increase the amplitude, making it louder, values less than one decrease the amplitude, making it quieter, and negative values flip the wave over, but also have the ability to make it quieter or louder.

One place people use negative amplitudes (volumes) is for noise cancellation. If you have a complex sound that has some noise in it, but you know the source of the noise, you can take that noice, multiply it by -1 to get a volume of -1, and ADD IT (or MIX IT) into the more complex sound, effectively removing the noise from the sound. There are other uses too but this is one concrete, real world example.

This code sample generates a quieter wave file:
``` //make a quieter sine wave for(int nIndex = 0; nIndex < nNumSamples; ++nIndex) { pData[nIndex] = AdvanceOscilator_Sine(fPhase,fFrequency,(float)nSampleRate) * 0.4f; } WriteWaveFile<int16>("sinequiet.wav",pData,nNumSamples,nNumChannels,nSampleRate); ```

And here’s what that sounds like:
Vanilla Sine Wave – Quiet

And here’s what that looks like:

If you recall though, when we write a wave file, we map -1 to the smallest int number we can store, and 1 to the highest int number we can store. What happens if we make something too loud, so that it goes above 1.0 or below -1.0?

One way to fix this would be to “Normalize” the sound data.  To normalize it, you would loop through each sample in the stream and find the highest absolute value sample.  For instance if you had 3 samples: 1.0, -1.2, 0.8,  the highest absolute sample value would be 1.2.

Once you have this value, you loop through the samples in the stream and divide by this number.  After you do this, every sample in the stream will be within the range -1 to 1.  Note that if you had any data that would be clipping, this process has the side effect of making your entire stream quieter since it reduces the amplitude of every sample.  If you didn’t have any clipping data, this process has the side effect of making your entire stream louder because it increases the amplitude of every sample.

Another way to deal with it is to just clamp the values to the -1, 1 range.  In the case of a sine wave, that means we chop off the top and/or the bottom of the wave and there’s just a flat plateau where the numbers went out of range.

This is called clipping, and along with popping are 2 of the main problems people have with audio quality degradation.  Aliasing is a third, and is something we address in the next chapter by the way! (http://en.wikipedia.org/wiki/Aliasing)

Here’s some code for generating a clipping sine wave:
``` //make a clipping sine wave for(int nIndex = 0; nIndex < nNumSamples; ++nIndex) { pData[nIndex] = AdvanceOscilator_Sine(fPhase,fFrequency,(float)nSampleRate) * 1.4f; } WriteWaveFile<int16>("sineclip.wav",pData,nNumSamples,nNumChannels,nSampleRate); ```

And here’s what it sounds like:
Vanilla Sine Wave – Clipping

Also, here’s what it looks like:

Note that in this case, it doesn’t necessarily sound BAD compared to a regular, non clipping sine wave, but it does sound different. That might be a good thing, or a bad thing, depending on your intentions. With more complex sounds, like voice, or acoustic music, this will usually make it sound terrible. Audio engineers have to carefully control the levels (volumes) of the channels being mixed (added) together to make sure the resulting output doesn’t go outside of the valid range and cause clipping. Also, in analog hardware, going out of range can cause damage to the devices if they aren’t built to protect themselves from it!

In the case of real time synthesis, as you might imagine, normalizing wave data is impossible to do because it requires that you know all the sound data up front to be able to normalize the data.  In real time applications, besides just making sure the levels keep everything in range, you also have the option of using a compressor which sort of dynamically normalizes on the fly.  Check this out for more information: http://en.wikipedia.org/wiki/Dynamic_range_compression

## Square Wave Oscillator

Here’s the code for the square wave oscillator:
``` float AdvanceOscilator_Square(float &fPhase, float fFrequency, float fSampleRate) { fPhase += fFrequency/fSampleRate;```

``` while(fPhase > 1.0f) fPhase -= 1.0f;```

``` while(fPhase < 0.0f) fPhase += 1.0f;```

``` if(fPhase <= 0.5f) return -1.0f; else return 1.0f; } ```

Note that we are using the phase as if it’s a percentage, instead of an angle. Since we are using it differently, that means if you switch from sine wave to square wave, there will be a discontinuity (a pop). However, in practice this happens anyways almost all the time because unless you change from sine to square at the very top or bottom of the sine wave, there will be discontinuity anyways. In reality, this really doesn’t matter, but you could “fix” it to switch only on those boundaries, or you could use “cross fading” or “blending” to fade one wave out (decrease amplitude from 1 to 0), while bringing the new wave in (increase amplitude from 0 to 1), adding them together to get the output. Doing so will make a smooth transition but adds some complexity, and square waves by nature constantly pop anyways – it’s what gives them their sound!

Here’s what a square wave sounds like and looks like:
Square Wave

## Saw Wave Oscillator

We used the saw wave in chapter one. Here’s the code for a saw wave oscillator:
``` float AdvanceOscilator_Saw(float &fPhase, float fFrequency, float fSampleRate) { fPhase += fFrequency/fSampleRate;```

``` while(fPhase > 1.0f) fPhase -= 1.0f;```

``` while(fPhase < 0.0f) fPhase += 1.0f;```

``` return (fPhase * 2.0f) - 1.0f; } ```

Here’s what a saw wave looks and sounds like:
Saw Wave

Note that sometimes saw waves point the other direction and the “drop off” is on the left instead of on the right, and the rest of the way descends instead of rises but as far as I have seen, there is no audible or practical difference.

## Triangle Wave Oscillator

A lot of synths don’t even bother with a triangle wave, and those that do, are just for approximations of a sine wave. A triangle wave sounds a lot like a sine wave and looks a bit like it too.

Here’s the code for a triangle wave oscillator:
``` float AdvanceOscilator_Triangle(float &fPhase, float fFrequency, float fSampleRate) { fPhase += fFrequency/fSampleRate;```

``` while(fPhase > 1.0f) fPhase -= 1.0f;```

``` while(fPhase < 0.0f) fPhase += 1.0f;```

``` float fRet; if(fPhase <= 0.5f) fRet=fPhase*2; else fRet=(1.0f - fPhase)*2;```

``` return (fRet * 2.0f) - 1.0f; } ```

Here’s what it looks and sounds like:
Triangle Wave

## Noise Oscillator

Believe it or not, even static has it’s place too. It’s used sometimes for percussion (put an envelope around some static to make a “clap” sound), it can be used as a low frequency oscillator aka LFO (the old “hold and sample” type stuff) and other things as well. Static is just random audio samples.

The code for a noise oscillator is slightly different than the others. You have to pass it the last sample generated (you can pass 0 if it’s the first sample) and it will continue returning that last value until it’s time to generate a new random number. It determines when it’s time based on the frequency you pass in. A higher frequency mean more random numbers will be chosen in the same amount of audio data while a lower frequency means that fewer random numbers will be chosen.

At lower frequencies (like in the sample), it kind of sounds like an explosion or rocket ship sound effect from the 80s which is fun 😛

Here’s the code:
``` float AdvanceOscilator_Noise(float &fPhase, float fFrequency, float fSampleRate, float fLastValue) { unsigned int nLastSeed = (unsigned int)fPhase; fPhase += fFrequency/fSampleRate; unsigned int nSeed = (unsigned int)fPhase;```

``` while(fPhase > 2.0f) fPhase -= 1.0f;```

``` if(nSeed != nLastSeed) { float fValue = ((float)rand()) / ((float)RAND_MAX); fValue = (fValue * 2.0f) - 1.0f;```

``` //uncomment the below to make it slightly more intense /* if(fValue < 0) fValue = -1.0f; else fValue = 1.0f; */```

``` return fValue; } else { return fLastValue; } } ```

Here’s what it looks and sounds like:
Noise

I think it kind of looks like the Arizona desert 😛

As a quick aside, i have the random numbers as random floating point numbers (they can be anything between -1.0 and 1.0). Another way to generate noise is to make it so it will choose only EITHER -1 or 1 and nothing in between. It gives a slightly harsher sound. The code to do that is in the oscillator if you want to try it out, it’s just commented out. There are other ways to generate noise too (check out “pink noise” http://en.wikipedia.org/wiki/Pink_noise) but this ought to be good enough for our immediate needs!

## More Exotic Wave Forms

Two other oscillators I’ve used on occasion is the squared sine wave and the rectangle wave.

To create a “squared sine wave” all you need to do is multiply each sample by itself (square the audio sample). This makes a wave form that is similar to sine waves, but a little bit different, and sounds a bit different too.

A rectangle wave is created by making it so the wave spends either more or less time in the “up” or “down” part of the wave. Instead of it being 50% of the time in “up”, and 50% of the time in “down” you can make it so it spends 80% of the time in up, and 20% of the time in down. It makes it sound quite a bit different, and the more different the percentages are, the “brighter” it sounds.

Also, you can add multiple wave form samples together to get more interesting wave forms (like adding a triangle and a square wave of the same frequency together, and reducing the amplitude to avoid clipping). That’s called additive synthesis and we’ll talk more about that next chapter, including how to make more correct wave forms using sine waves to avoid aliasing.

You can also multiply wave forms together to create other, more interesting waves. Strictly speaking this is called AM synthesis (amplitude modulation synthesis) which is also sometimes known as ring modulation when done a certain way.

As you can see, there are a lot of different ways to create oscillators, and the wave forms are just limited by your imagination. Play around and try to make your own oscillators and experiment!

## Final Samples

Now we have the simple basics down for being able to create music. here’s a small “song” that is generated in the sample code:
Simple Song

And just to re-inforce how important keeping your wave data continuous is, here’s the same wave file, but about 0.75 seconds in a put a SINGLE -1.0 sample where it doesn’t belong. a single sample wrong when there’s 44100 samples per second and look how much it affects the audio.
Simple Song With Pop

## Until Next Time…

Next up we will talk about “aliasing” and how to avoid it, making much better sounding saw, square and triangle waves that are less harsh on the ears.

## DIY Synth 1: Sound Output

This is a part of the DIY Synthesizer series of posts where each post is roughly built upon the knowledge of the previous posts. If you are lost, check the earlier posts!

This is the first in a series of tutorials on how to make your own software synthesizer.

These tutorials are aimed at C++ programmers, and the example code is meant to be as easy to understand as possible and have as few dependencies as possible. The code ought to compile and run for you no matter what system or compiler you are using with minimal if any changes required.

You can download the full source for this chapter here: DIY Synthesizer Chapter 1 Source Code

## Wave File Format

Since making sound come out of computer speakers varies a lot between different systems, we’ll start out just writing a .wave file.

If you want to jump into doing real time audio, i recommend portaudio (http://www.portaudio.com/) , and i also recomend libsndfile for reading and writing other audio file formats(http://www.mega-nerd.com/libsndfile/).

I found these 2 links really helpful in understanding the wave file format:

There’s a lot of optional parts of a wave file header, but we are only going to focus on the bare minimum required to get the job done. Here’s what our wave file header struct looks like:

```//this struct is the minimal required header data for a wav file struct SMinimalWaveFileHeader { //the main chunk unsigned char m_szChunkID[4]; uint32 m_nChunkSize; unsigned char m_szFormat[4];```

``` //sub chunk 1 "fmt " unsigned char m_szSubChunk1ID[4]; uint32 m_nSubChunk1Size; uint16 m_nAudioFormat; uint16 m_nNumChannels; uint32 m_nSampleRate; uint32 m_nByteRate; uint16 m_nBlockAlign; uint16 m_nBitsPerSample;```

``` //sub chunk 2 "data" unsigned char m_szSubChunk2ID[4]; uint32 m_nSubChunk2Size;```

` ```` //then comes the data! }; ```

And boringly, here’s the function that fills out the struct and writes it to disk:
``` bool WriteWaveFile(const char *szFileName, void *pData, int32 nDataSize, int16 nNumChannels, int32 nSampleRate, int32 nBitsPerSample) { //open the file if we can FILE *File = fopen(szFileName,"w+b"); if(!File) { return false; }```

` SMinimalWaveFileHeader waveHeader;`

``` //fill out the main chunk memcpy(waveHeader.m_szChunkID,"RIFF",4); waveHeader.m_nChunkSize = nDataSize + 36; memcpy(waveHeader.m_szFormat,"WAVE",4);```

``` //fill out sub chunk 1 "fmt " memcpy(waveHeader.m_szSubChunk1ID,"fmt ",4); waveHeader.m_nSubChunk1Size = 16; waveHeader.m_nAudioFormat = 1; waveHeader.m_nNumChannels = nNumChannels; waveHeader.m_nSampleRate = nSampleRate; waveHeader.m_nByteRate = nSampleRate * nNumChannels * nBitsPerSample / 8; waveHeader.m_nBlockAlign = nNumChannels * nBitsPerSample / 8; waveHeader.m_nBitsPerSample = nBitsPerSample;```

``` //fill out sub chunk 2 "data" memcpy(waveHeader.m_szSubChunk2ID,"data",4); waveHeader.m_nSubChunk2Size = nDataSize;```

``` //write the header fwrite(&waveHeader,sizeof(SMinimalWaveFileHeader),1,File);```

``` //write the wave data itself fwrite(pData,nDataSize,1,File);```

` ```` //close the file and return success fclose(File); return true; } ```

Nothing too crazy or all that interesting, but it gets the job done. Again, check out those links above if you are interested in the details of why things are written the way they are, or what other options there are.

## Generating a Mono Wave File

Now, finally something interesting, we are going to generate some audio data and make a real wave file!

```int nSampleRate = 44100; int nNumSeconds = 4; int nNumChannels = 1; ```

The sample rate defines how many samples of audio data there are per second. A stream of audio data is nothing more than a stream of numbers, and each number is a single audio sample, so the sample rate is just how many numbers there are per second of audio data. The less numbers you use, the less “horizontal resolution” your sound file has, or, the less times the wave data can change in amplitude per second.

The sample rate also defines the maximum frequency you can store in the audio stream. The maximum frequency you can store is half of the sample rate. In other words, with a 44100 sample rate, the maximum frequency you can store is 22,050hz. The maximum audible frequency for the human ear is about 20,000hz so using a sample rate of 44100 ought to be pretty good for most needs (you might need to go higher, for complex technical reasons, but this is info enough for now!). Here’s some interesting info about audio frequencies: http://en.wikipedia.org/wiki/Audio_frequency

The number of seconds is how long (in seconds) the wave goes on for, and the number of channels is how many audio channels there are. Since this is a mono sound, there is only one audio channel.
``` int nNumSamples = nSampleRate * nNumChannels * nNumSeconds; int32 *pData = new int32[nNumSamples]; ```

Here we calculate how many actual audio samples there are and then allocate space to hold the audio data. We are using 32 bit integers, but you could also use 16 bit integers. The number of bits in your audio samples indicates the vertical resolution of your audio data, or how many unique values there are. in 16 bit ints, there are 65536 different values, and in 32 bits there are 4.2 billion different values. If you think about your data as plots on a graph (essentially, what it is, where X is time and Y is wave amplitude) the more bits per sample, and the higher the sample rate, the closer your graph can be to whatever real values you are trying to use (such as a sine wave). Less bits and a lower sample rate mean it’s farther away from the real data you are trying to model, which will cause the audio to sound less correct.

``` int32 nValue = 0; for(int nIndex = 0; nIndex < nNumSamples; ++nIndex) { nValue += 8000000; pData[nIndex] = nValue; } ```

Here we are actually creating our wave data. We are using the fact that if you have an int near the maximum value you can store, and then add some more, it will wrap around to the minimum value the int can store. If you look at this on a graph, it looks like a saw tooth wave, ie we are creating a saw tooth wave!  Normally you wouldn’t create them this way because the way we are doing it is harsh on the ear, and introduces something called aliasing (http://en.wikipedia.org/wiki/Aliasing).  In a later tutorial we’ll see how to create a band limited saw tooth wave to make higher quality sound, but for now this will work file!

you can change how much is added to nValue to change the frequency of the resulting wave. Add a smaller number to make it a lower frequency, add a higher number to make it a higher frequency. We’ll get into the math of more finely controlling frequency in another chapter so you can actually match your waves to notes you watch to hit.

` WriteWaveFile("outmono.wav",pData,nNumSamples * sizeof(pData[0]),nNumChannels,nSampleRate,sizeof(pData[0])*8);`

delete[] pData;
Lastly we write our wave file and free our memory.

Tada! All done, we have a sawtooth mono wave file written out, give it a listen!

DIY Synthesizer Chapter 1: outmono.wav

## Writing a Stereo File

The only thing that has really changed in the stereo file is that there are 2 channels instead of 1, and how we generate the audio data is slightly different.  Since there are 2 channels, one for left, one for right, there is actually double the audio data for the same sample rate and time length wave file, since it needs a full set of data for each channel.

The audio data itself is interleaved, meaning that the first audio sample is for the left channel, the second sample is for the right channel, the third sample is for the left channel, and so on.

Here’s how the audio is generated:

``` int32 nValue1 = 0; int32 nValue2 = 0; for(int nIndex = 0; nIndex < nNumSamples; nIndex += 2) { nValue1 += 8000000; nValue2 += 12000000; pData[nIndex] = nValue1; //left channel pData[nIndex+1] = nValue2; //right channel } ```

Note that for the right channel we write a different frequency wave. I did this so that you can tell the difference between this and the mono file. Play around with the values and try muting one channel or the other to convince yourself that it really is a stereo file!

DIY Synthesizer Chapter 1: outstereo.wav

## Until Next Time…

That’s all for chapter 1, thanks for reading.

Next up we’ll talk about the basic wave forms – sine, square, saw, square, and noise – and we’ll talk more about frequency and oscillators.

## MoriRT: Pixel and Geometry Caching to Aid Real Time Raytracing

About half a year ago, some really intriguing ideas came to me out of the blue dealing with ways to speed up raytracing.  My plan was to create a couple games, showing off these techniques, and after some curiosity was piqued, write an article up talking about how it worked to share it with others in case they found it useful.

For those of you who don’t know what raytracing is, check out this wikipedia article:

http://en.wikipedia.org/wiki/Ray_tracing_(graphics)

Due to some distractions and technical setbacks unrelated to the raytracing itself, I’ve only gotten one small game working with these techniques.  One of the distractions is that I implemented the game using google’s native client (NaCl) and for some reason so far unknown to me, it doesn’t work on some people’s machines which is very annoying!  I plan to get a mac mini in the next couple months to try and repro / solve the problem.

Check out the game if you want to.  Open this link in google chrome to give it a play:

1. Limitations
2. Geometry Caching
3. Pixel Caching
4. Future Work
5. Compatible Game Ideas

## Limitations

Admittedly, these techniques have some pretty big limitations.  I’ve explained my techniques to quite a few fellow game devs and when I mention the limitations, the first reaction people give me is the same one you are probably going to have, which is “oh… well THAT’S dumb!”.  Usually after explaining it a bit more, people perk up again, realizing that you can still work within the limitations to make some neat things.   So please hold off judgement until checking out the rest of the article! (:

The big fat, unashamed limitations are:

• The camera can’t move (*)
• Objects in the scene shouldn’t move too much, at least not all at the same time

(* there is a possible exception to the camera not moving limitation in the “Future Work” section)

So…. what the heck kind of games can you make with that?  We’ll get to that later on, but here’s some things these techniques are good at:

• Changing light color and intensity is relatively inexpensive
• Changing object color and animating textures is inexpensive
• These techniques don’t break the parallel-izable nature of raytracing.  Use all those CPU and GPU cores to your heart’s content!

Seems a bit dodgy I’m sure, but read on.

## Geometry Caching

The first technique is geometry caching.

The idea behind geometry caching is:  If no objects have moved since the last frame, why should we test which objects each ray hits?  It’s a costly part of the ray tracing, and we already KNOW that we are going to get the same results as last frame, so why even bother?  Let’s just use the info we calculated last frame instead.

Also, if some objects HAVE moved, but we know that the moving objects don’t affect all rays, we can just recalculate the rays that have been affected, without needing to recalculate all rays.

Just because we know the collision points for rays doesn’t mean that we can just skip rendering all together though.  Several things that can make us still need to re-render a ray include:  Animating textures, objects changing colors, lights dimming, lights changing color.  When these things happen, we can re-render a ray much less expensively than normal (just recalculate lighting and shading and such), so they are comparatively inexpensive operations compared to objects actually moving around.

How I handle geometry caching is give each ray (primary and otherwise) a unique ID, and I have a dynamic array that holds the collision info for each ID.

In the part of the code that actually casts a single ray, i pass the ID and a flag saying whether it’s allowed to use the geometry cache.  If it isn’t allowed to use the geometry cache, or there is no entry in the cache for the ID, the code calculates intersection info and puts it into the geometry cache.

It then uses the geometry cache information (whether it was re-calculated, or was usable as is) and applies phong shading, does texture lookups, recurses for ray refraction and reflection, and does the other things to figure out the color of the pixel.

In my first implementation of the geometry cache, it was very fast to render with once it was filled in, but it was really expensive to invalidate individual cache items.  If an object moved and a couple hundred geometry cache items needed to be marked as dirty, it was a really computationally expensive operation!

A better option, the one i use now, involves both a 2d grid (for the screen pixels) and a 3d grid (to hold the geometry of the world).

Breaking the screen into a grid, when each ray is cast into the world, I’m able to tell the ray what screen cell it belongs to.   This way, as a ray traverses the 3d grid holding the world geometry, it’s able to add itself each world grid maintains to keep track of which rays pass through that 3d cell (keeping just the unique values of course!).  Child rays know what screen cell they are in by getting that value from their parent.

If an object moves in the world, you can make a union of which world cells it occupied before it moved, and which world cells it occupies after the move.  From there, you can make a union of which screen cells sent rays into that world cell.  The last step is to mark all those screen cells as “geometry dirty” so that next frame, the rays in those cells are disallowed from using the geometry cache data, and instead will re-calculate new intersection info.

This method makes it so potentially a lot of rays re-calcuate their intersection data that don’t really need to, but by tuning the size of the screen and world grids, you can find a good happy medium for your use cases.

## Pixel Caching

The second technique is pixel caching which is a fancy way of saying “don’t redraw pixels that we don’t have to”.  The less rays you have to cast, the faster your scene will render.

The first challenge to tackle in this problem is how do you know which pixels will be affected when an object changes color?  That is solved by the same mechanism that tells us when geometry cache data is invalidated.

When an object changes color (or has some other non-geometry change), you just get the list of world cells the object resides in, and then get the union of screen cells that sent rays through those world cells.

When you have that list, instead of marking the screen cell “geometry dirty”, you mark it as “pixel dirty”.

When rendering the screen cells, any screen cell that isn’t marked as dirty in either way can be completely skipped.  Rendering it is a no-op because it would be the same pixels as last time! (:

This is the reason why you want to minimize geometry changes (objects moving, rotating, resizing, etc) and if you have to,  rely instead on animating textures, object colors, and lighting colors / intensities.

## Future Work

Here’s a smattering of ideas for future work that I think ought to bear fruit:

• Replace the screen and/or world grid with better performing data structures
• Pre-compute (a pack time process) the primary rays and subsequent rays of static geometry and don’t store static geometry in the world grid, but store it in something else instead like perhaps a BSP tree.  This way, at run time, if a ray misses all objects in the dynamic geometry world grid, it can just use the info from the pre-computed static geometry, no matter how complex the static geometry is.  If something DOES hit a dynamic object however, you’ll have to test subsequent rays against both the dynamic object world grid, and the data structure holding the info about the static geometry but hopefully it’ll be a net win in general.
• Investigate to see how to integrate this with photon mapping techniques and data structures.  Photon mapping is essentially ray tracing from the opposite direction (from light to camera, instead of from camera to light).  Going the opposite direction, there are some things it’s really good at – like caustics – which ray tracing alone just isn’t suited for: http://en.wikipedia.org/wiki/Photon_mapping
• In a real game, some things in the world will be obscured by the UI overlays.  There might be an oportunity in some places to “early out” when rendering a single ray if it was obscured by UI.  It would complicate caching those since an individual ray could remain dirty while the screen cell itself was marked as clean.
• Orthographic camera:  If the camera is orthographic, that means you could pan the camera without invalidating the pixel and geometry cache.  This would allow the techniques to be used for a side scrolling game, overhead view game, and things of that nature – so long as orthographic projection looks good enough for the needs of the game.  I think if you got creative, it could end up looking pretty nice.
• Screen space effects: enhance the raytracing visuals with screen space particles and such.  Could also keep a “Z-Buffer” by having a buffer that holds the time each ray took to hit the first object.  This would allow more advanced effects.
• Interlaced rendering: to halve the rendering time, every frame could render every other horizontal line.  Un-dirtying a screen cell would take 2 frames but this ought to be a pretty straight forward and decent win if losing a little bit of quality is ok.
• red/blue 3d glasses mode:  This is actually a feature of my snake game but figured i’d call it out.  It works by rendering the scene twice which is costly (each “camera” has it’s own geometry and pixel cache at least).  If keeping the “Z-Buffer” as mentioned above, there might be a way to fake it more cheaply but not sure.

## Compatible Game Ideas

Despite the limitations, I’ve been keeping a list of games that would be compatible with these ideas.  Here’s the highlights of that list!

• Pinball:  Only flippers, and the area around the ball would actually have geometry changes, limiting geometry cache invalidating.  Could do periodic, cycling color / lighting animations on other parts of the board to spice the board up in the “non active” areas.
• Marble Madness Clone: Using an orthographic camera, to allow camera paning, a player could control a glass or mirrored ball through a maze with dangerous traps and time limits.  Marble Madness had very few moving objects and was more about the static geometry so there’d probably be a lot of mileage here.  You could also have animated textures for pools of acid so that they didn’t impact the geometry cache.
• Zelda 1 and other overhead view type games: Using ortho camera to allow panning, or in the case of Zelda 1, have each “room” be a static camera.  You’d have to keep re-rendering down somehow by minimizing enemy count, while still making it challenging.  Could be difficult.
• Metroidvania: side scroller with ortho camera to allow panning.  Could walk behind glass pillars and waterfalls for cool refractive effects.
• Monkey Island type game: LOTS of static geometry in a game like that which would be nice.
• Arkanoid type game: static camera, make use of screen space effects for break bricking particles etc
• Mystery game: Static scenes where you can use a magnifying glass to LITERALLY view things better (magnification due to refraction, just like in real life) to find clues and solve the mystery.  Move from screen to screen to find new clues and find people to talk to, progress the storyline etc.
• Puzzle Game: could possibly do a traditional “block based” puzzle game like puzzle fighters, tetris, etc.
• Physics based puzzle game: You set up pieces on a board (only one object moves at once! your active piece!) then press “play”.  Hopefully it’d be something like a ball goes through your contraption which remains mostly motionless and you beat the level if you get the ball in the hole or something.
• Somehow work optics into gameplay… maybe a puzzle game based on lasers and lights or something
• Pool and board games: as always, gotta have a chess game with insane, state of the art graphics hehe
• mini golf: A fixed camera when you are taking your shot, with a minimum of moving objects (windmills, the player, etc).  When you hit the ball, it rolls, and when it stops, the camera teleports to the new location.
• Security gaurd game:  Have several raytraced viewports which are played up to be security camera feeds.  Could have scenes unfold in one feed at a time to keep screen pixel redraw low.
• Turn based overhead view game:  Ortho camera for panning, and since it’s turn based, you can probably keep object movement down to one at a time.

Lastly, here’s a video describing this stuff in action.  When you view the video, the orange squares are screen tiles that are completely clean (no rendering required, they are a no-op).  Purple squares are screen tiles that were able to use the geometry cache.   Where you don’t see any squares at all, it had to re-render the screen tile from scratch and wasn’t able to make use of either caching feature.

Feedback and comments welcomed!  I’d be really interested too in hearing if anyone actually uses this for anything or tries to implement in on their own.