Anatomy of a Skeletal Animation System Part 2

This is part two of “Anatomy of a Skeletal Animation System”

Animation Controller v3 – Bone Groups

In part 1, we talked about how to make a skeletal animation system that could play smooth, non-popping animations on a model, communicate back to the engine to play sound effects, spawn objects in specific spots, and do many other things as well.  What it could not do, however, was play different animations on the upper body and lower body at the same time.

To solve this, instead of having a single animation controller for our model, we need multiple animation controllers, where each controller controls a specific set of bones.  Note that multiple controllers should be able to affect the same bones; in the end result, a bone’s position is made by blending the data from all animation controllers that affect it.

Each animation controller should have a blend weight so that it can be blended in and out to keep animation motion smooth and continuous, and also the blend weighting allows you to turn on and off specific animation controllers as needed.

Some great example uses for this are…

  • Having a separate animation controller for the upper and lower body so that they can work independently (the lower body can look like it’s jumping, without having to care if the upper body is firing a gun or not).
  • Having a separate full body animation controller that affects all bones.  In most situations, this animation controller would be off, but in the rare cases that you want to play a full body animation, you turn this one on and play an animation on it.
  • Having a facial animation controller that only turns on if the camera is close enough to a character’s face.  This way, if you look closely at another player, you can see their face moving, but if you are far away from them, the game engine doesn’t bother animating the facial bones since you can’t see them very well anyways.

The order that these animation controllers are evaluated should be explicit (instead of left up to load order or things like that).  You want to be very clear about which animation controllers over-ride which other animation controllers for the case of having multiple on at the same time, affecting the same bones.

For the sake of efficiency, when blending together the animation data from each animation controller that affects a bone, you should start at the last fully weighted (100% weight) anim controller in the anim controller list.  This way, you don’t bother evaluating animations for anim controllers that are just going to be completely masked out by other animation controllers.

If there is no full weight anim controller in the list that affects the specific bone, initialize the bone data to the “T-Pose” animation position before blending the other anim controller bone data on top of it.
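The evaluation order described above can be sketched roughly like this, for a single bone.  This is a minimal sketch, not a definitive implementation: the names (Controller, evaluate_bone, lerp) are made up for illustration, and a bone "pose" is simplified to a position that is blended linearly, where a real system would also blend rotations.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

Pose = Tuple[float, float, float]  # simplification: a bone position only


@dataclass
class Controller:
    name: str
    bones: Set[str]                # bone group this controller affects
    weight: float                  # 0.0 = off, 1.0 = fully on
    sample: Callable[[str], Pose]  # returns this controller's pose for a bone


def lerp(a: Pose, b: Pose, t: float) -> Pose:
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def evaluate_bone(bone: str, controllers: List[Controller], t_pose: Pose) -> Pose:
    # `controllers` is in explicit priority order: later entries override earlier ones.
    active = [c for c in controllers if bone in c.bones and c.weight > 0.0]

    # Start at the last fully weighted controller; everything before it
    # would be completely masked out, so it is never sampled.
    start = 0
    for i in range(len(active) - 1, -1, -1):
        if active[i].weight >= 1.0:
            start = i
            break

    # If no controller is fully weighted, the T-pose is the base layer.
    result = t_pose
    for c in active[start:]:
        result = lerp(result, c.sample(bone), min(c.weight, 1.0))
    return result
```

With a fully weighted upper body controller and a half weighted full body controller on top of it, only those two get sampled for a shared bone; the lower body controller underneath is skipped entirely.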

We now have a very robust animation system, but it isn’t quite there yet.  Interacting with this animation system from game code means you have to tell specific animation controllers when to play specific animations.   This is quite cumbersome and not very maintainable.  Ideally, the animation logic would be separated from the game play logic. Besides making the code more maintainable, this means that non animation programmers will be able to write game play code that interacts with the animation system, which is a big win for everyone: fewer development bottlenecks.

Animation Selection

There are two good techniques I’ve seen for separating the logic and performing animation selection for you.

The first way is via “animation properties” and the second way is by using an animation state machine. There are pros and cons to each.

Animation Properties

For the animation properties method, you essentially have a list of enums that describe the player’s state.  These enums include things such as being able to say whether the player is crouched or standing, whether the player is unarmed, holding a pistol, or holding a rifle, or even how injured the player is (not injured, somewhat injured, or near death).

The game play code would be in charge of making sure these enums were set to the right values, and the animation controller(s) would use these values to determine the appropriate animations to play.

For instance, the game code may set the enum values to this:

  • WeaponType = Rifle (vs Unarmed, Pistol, etc)
  • WeaponAction = Idle (vs Firing, Reloading, etc)
  • PlayerHealth = NearDeath (vs healthy, injured, etc)
  • MovementType = WalkForward (vs Idle, Running, LungeRight, etc)

From here, the animation system takes over.

The lower body animation controller perhaps only cares about “MovementType” and “PlayerHealth”.  It notices that the player is walking forward (WalkForward) and that they have very low health (NearDeath).  From this, it uses a table that animators created in advance that says for this combination of animation properties, the lower body animation controller should play the “WalkNearDeathFwd” animation.  So, the lower body animation controller obliges and plays that animation for the lower body bones.

The upper body animation controller perhaps just cares about WeaponAction, WeaponType and PlayerHealth.  It notices that the player has a rifle, they aren’t shooting it, and they have very low health.  From this, the upper body animation controller looks into its animation properties table and sees that it should play the “RifleIdleInjured” animation, so it plays that animation on the upper body bones.

The logic of game play and animation are completely separate, and the animators have a lot of control over what animations to play in which situations.

Once again, you’d want an editor of some sort for animators to set up these animation properties tables so that it’s easier for them to work with, it verifies the data to reduce the bug count, and everyone wins.

Your tool also ought to pack each animation properties table (upper body, lower body, facial animation, full body animation, etc) into some run-time friendly structure, such as perhaps a balanced decision tree to facilitate quick lookups based on animation properties.
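As a rough sketch of what a lookup against one of these tables might look like at run time, here is a small example.  It assumes a plain hash table keyed by the property values instead of the decision tree mentioned above, just to keep it short.  The PropertyTable class, the "WalkFwd" row and the "Idle" fallback are hypothetical; the enum values and "WalkNearDeathFwd" come from the example earlier.

```python
from enum import Enum


class MovementType(Enum):
    IDLE = 0
    WALK_FORWARD = 1


class PlayerHealth(Enum):
    HEALTHY = 0
    NEAR_DEATH = 1


class PropertyTable:
    """Maps a combination of animation properties to an animation name."""

    def __init__(self, keys, rows, default):
        self.keys = keys        # which properties this controller cares about
        self.rows = rows        # {(value, value, ...): animation name}
        self.default = default  # fallback when no row matches

    def select(self, properties):
        key = tuple(properties[k] for k in self.keys)
        return self.rows.get(key, self.default)


# Table an animator would author in the editor (second row is hypothetical).
lower_body = PropertyTable(
    keys=("MovementType", "PlayerHealth"),
    rows={
        (MovementType.WALK_FORWARD, PlayerHealth.NEAR_DEATH): "WalkNearDeathFwd",
        (MovementType.WALK_FORWARD, PlayerHealth.HEALTHY): "WalkFwd",
    },
    default="Idle",  # hypothetical fallback animation
)

# Game code only sets properties; it never names an animation.
game_state = {
    "MovementType": MovementType.WALK_FORWARD,
    "PlayerHealth": PlayerHealth.NEAR_DEATH,
    "WeaponType": "Rifle",  # ignored by the lower body table
}
selected = lower_body.select(game_state)
```

Each controller gets its own table and its own list of properties it cares about, so adding a new property for the upper body doesn’t touch the lower body data at all.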

Animation State Machine

Another way to handle animation selection is to have the animation controllers run animation state machines, having the game code send animation events to the state machines. Each state of the state machine corresponds to a specific animation.

When the player presses the crouch button for instance, it could send an event to all of the animation controllers saying so, maybe ACTION_BEGINCROUCH.

Depending on the logic of the state that each anim controller state machine is in, it may respond to that event, or ignore it.

The upper body anim controller may be in the “Idle” state. The logic for the idle state says that it doesn’t do anything if it receives the ACTION_BEGINCROUCH event, so it does nothing and keeps doing the animation it was doing before.

The lower body anim controller may also be in a state named “Idle”. The logic for the lower body idle state says that if it receives the ACTION_BEGINCROUCH event, it should transition to the “StartCrouch” state. So, it transitions to that state, which says to play the “CrouchBegin” animation (and perhaps to ignore all incoming events). When that animation is done, it automatically transitions to the “CrouchIdle” state, which says to play the “Crouching” animation. It does that while waiting for various events to happen, including an ACTION_ENDCROUCH event sent from game code when the player lets go of the crouch button.
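The lower body example above could be sketched as a small data driven state machine like the one below.  The AnimStateMachine class and its dictionary layout are made up for illustration.  The transition from “CrouchIdle” back to “Idle” on ACTION_ENDCROUCH and the “Idle” animation name are assumptions, since the text doesn’t say what happens there.

```python
class AnimStateMachine:
    def __init__(self, states, initial):
        # states: {state name: {"anim": animation name,
        #                       "on_event": {event: next state},   (optional)
        #                       "on_done": next state}}            (optional)
        self.states = states
        self.current = initial

    @property
    def animation(self):
        return self.states[self.current]["anim"]

    def send_event(self, event):
        # Events that the current state does not list are simply ignored.
        nxt = self.states[self.current].get("on_event", {}).get(event)
        if nxt is not None:
            self.current = nxt

    def animation_finished(self):
        # Called by the controller when a non looping animation completes.
        nxt = self.states[self.current].get("on_done")
        if nxt is not None:
            self.current = nxt


lower_body = AnimStateMachine(
    states={
        "Idle": {"anim": "Idle",
                 "on_event": {"ACTION_BEGINCROUCH": "StartCrouch"}},
        # No "on_event" table here, so StartCrouch ignores all incoming events.
        "StartCrouch": {"anim": "CrouchBegin", "on_done": "CrouchIdle"},
        "CrouchIdle": {"anim": "Crouching",
                       "on_event": {"ACTION_ENDCROUCH": "Idle"}},  # assumed
    },
    initial="Idle",
)
```

Game code just calls send_event("ACTION_BEGINCROUCH") on every controller’s state machine and lets each one decide whether it cares.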

The interesting thing about the anim state machine is that it gives content creators a lot more control over the actual control of the player himself (they can say when the player is allowed to crouch for instance!) which can be either a good or bad thing, depending on your needs, use cases and skill sets of your content creators.

Going this route, you are going to want a full on state machine editor for content people to be able to set up states, the rules for state switching, and they should be able to see a model and simulate state switches to see how things look. If you DO make such an editor, it’s also a great place to allow them to define and edit bone groups. You might even be able to combine it with the key string editor and make a one stop shop editor for animation (and beyond).

Animation Controller v4 – Animation Blend Trees

At this point, our animation system is in pretty good shape, but we can do a bit better before calling it shippable.

The thing we can do to really spruce it up is to replace individual animations (for blending, animation selection, etc) with animation blend trees like the one below:


In the animation blend tree above, you can see that it’s playing two animations (FireGun and GunSight) and blending them together to create the final bone data.

As you can imagine, you might have different nodes that performed different functionality which would result in lots of different kinds of animations using the same animation blend tree.

You will be in good shape if you make a nice animation blend tree editor where a content creator can create an animation blend tree, set parameters on animation blend tree nodes, and preview their work within that editor to be able to quickly iterate on their changes.  Again, without this tool, everyone’s lives will be quite a bit harder, and a little less happy so it’s in your interest to invest the effort!

Some really useful animation nodes for use in the blend trees might include:

  • PlayAnimation – definitely needed!
  • AnimationSequence – This node has N number of “children” and will play each child in order from 1 to N in a sequence.  You may optionally specify (in the editor) that you want the children chosen at random and you specify a weighting to each child for the random choosing.  This is useful for “idle animations” so that periodically an idle character will do silly things.
  • AimGrid – this animation node uses the player data to see yaw and pitch of the player’s aim.  It uses this information to figure out how to blend between a grid of 9 animations of the player pointing in the main directions to give a proper resulting aim.  This node has 9 children, which specify the following aiming animations: Up Left, Up, Up Right, Left, Forward, Right, Down Left, Down, Down Right.  Note that since this is a generalized anim blend tree, these child nodes can be ANY type of animation node, they aren’t required to be a “PlayAnimation” node.  This in essence is the basis of parametric animation (which I mentioned at the beginning of part 1), so this is a way to get some parametric animation into your system without having to go full bore on it.
  • IK / FK Nodes – get full or partial ragdoll on your model.  Also get it to do IK solving to position hands correctly for specified targets and such.
  • BlendBySpeed – You give N number of children, and movement speeds for each child.  This animation node will choose the correct animation, or blend between the correct animations, based on the current traveling speed of the player.  This way you get a smooth blend between walk, run and sprint animations and the player can move at whatever speed they ought to (perhaps the speed is defined by the pathing system, or the player’s input).  To solve the problem of feet “dancing” as they blend, you need to make sure the footfalls happen at the same time (in %) in each animation that will blend together.  This way, the animations don’t fight each other, and the feet will appear to move properly.
  • BlendByHealth – if you want the player to walk differently when they are injured, this node could be used to specify various walk animations with matching health levels so that it will blend between them (for upper or lower body or whatever else) as is appropriate for the player’s current health level.
  • Additive Blending – to get gun recoils and such
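To make one of these nodes a bit more concrete, here is a minimal sketch of how a BlendBySpeed node might pick its children and their weights.  The function name, the speed values and the animation names are all hypothetical, and it only computes weights; sampling the children and blending the bone data with those weights is left out.

```python
from bisect import bisect_right


def blend_by_speed(children, speed):
    """children: list of (speed, animation name), sorted by ascending speed.

    Returns a list of (animation name, weight) whose weights sum to 1.
    """
    speeds = [s for s, _ in children]
    # Clamp to the slowest / fastest child outside the authored range.
    if speed <= speeds[0]:
        return [(children[0][1], 1.0)]
    if speed >= speeds[-1]:
        return [(children[-1][1], 1.0)]
    # Find the two neighboring children and blend linearly between them.
    hi = bisect_right(speeds, speed)
    lo = hi - 1
    t = (speed - speeds[lo]) / (speeds[hi] - speeds[lo])
    return [(children[lo][1], 1.0 - t), (children[hi][1], t)]


# Hypothetical child setup: movement speeds in meters per second.
children = [(1.0, "Walk"), (3.0, "Run"), (6.0, "Sprint")]
weights = blend_by_speed(children, 2.0)  # halfway between Walk and Run
```

Since the footfalls need to line up, the two chosen children would typically be sampled at the same normalized time (the same % through each animation) before being blended.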

As you can see, animation blend trees have quite a bit of power.  They are also very technical which means engineers may need to help out content folk in making good trees to resolve some edge case bugs.  In my experience, animators are often very technical folk themselves, so can do quite a bit on their own generally.

Combine anim blend trees with the animation selection systems (FSM or anim properties) and the ability to smoothly blend an animation controller between the internal animations (or anim trees) it’s playing, and you have a really robust, high quality animation system.

Often with this work flow, an animator will just say “hey I need an anim node which can do X”, so an animation engineer creates the node and the animators start using it to do interesting things.  No need for an engineer to be deeply involved in the process of making the animation work like the animator wants, or having to worry about triggering it in the right situations etc.

Sure there will be bugs, and some things will be more complex than this, but by and large, it’s a very low hassle system that very much empowers content creators, and removes engineers from needing to be involved in most changes – which is a beautiful thing.

End of Part 2

This is the end of part 2. In the next and final part, we’ll talk about a few other miscellaneous features and optimizations.

Anatomy of a Skeletal Animation System Part 1

This is part one of “Anatomy of a Skeletal Animation System”

There is quite a bit of information out there on the basics of skeletal animation, including how to export and read animation and model data, how to animate bones and thus transform a mesh, how to blend bone data together and other related animation topics.

However, there is a lot less information out there about how to set up a system to use these techniques in a realistic way, such as you might find in your average modern 3d video game.

I myself have been an animation programmer on a few games including an open world Unreal Engine game called “This is Vegas” (unfortunately cancelled due to Midway going bankrupt) and also a multiplayer only first person shooter called “Gotham City Impostors” which was released earlier this year for PC, 360 and PS3.  The info I’m presenting is based on experience developing those games, as well as info I gathered from other developers or read about in books or online.

In this article I’m going to assume you already know how to get animation bone data into memory, how to use that animation data to animate models (meshes), and also how to blend animation bone data together.  I’m going to start off with the most simple animation system possible and slowly introduce features until we end up at something that would be fully featured for a typical modern game.

The “next generation” of skeletal animation seems like it’s going to be heavily based on parametric animation, and while we will TOUCH on the basics of parametric animation, we won’t dig into it very much beyond that.   If you are making a next gen AAA title, parametric animation may possibly be for you (and maybe not), but with the rise of 3d in flash, the rise of mobile games, and also indie game development, I think traditional pose driven skeletal animation is here to stay at least for a while.

Depending on the needs of your project, and how high a quality bar you want vs how much CPU time you want to spend on animation, some of these features may not be appropriate.  Feel free to take what is useful to you, and leave what isn’t.  Every game is different.

Animation Controller v1 – Super Simple

The simplest point we will start out is that if you have a mesh with an animation controller on it (to control what animations should play on it and such), it has these features:

  • If you tell it to play a looping animation, it will continue playing that looping animation forever.
  • If you tell it to play a non looping animation, it will play the animation and have some way of notifying you when the animation is done.  This is either by having it call a callback when it’s done, or by setting some flag on itself saying that the animation is done (won’t ever get set on a looping animation)
  • You should be able to tell it a playback multiplier to play the animation at, such as if you tell it to play at 3.0, it will play 3 times as fast, or if you tell it to play at 0.5, it will play half as fast and look like slow motion.
  • If you tell it to play an animation while another animation is playing, it will instantly stop the animation it’s playing and start playing the new animation.
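Those four features fit in a few lines.  Here is a minimal sketch of such a controller; it only tracks playback time and doesn’t sample any bone data, and the class and parameter names are hypothetical.

```python
class SimpleAnimController:
    def __init__(self):
        self.anim = None      # (name, duration in seconds) or None
        self.time = 0.0       # current playback position in seconds
        self.looping = False
        self.rate = 1.0       # playback multiplier
        self.on_done = None   # optional callback for non looping animations
        self.done = False     # never set for looping animations

    def play(self, name, duration, looping=False, rate=1.0, on_done=None):
        # Instantly replaces whatever was playing before.
        self.anim = (name, duration)
        self.time = 0.0
        self.looping = looping
        self.rate = rate
        self.on_done = on_done
        self.done = False

    def update(self, dt):
        if self.anim is None or self.done:
            return
        name, duration = self.anim
        self.time += dt * self.rate
        if self.looping:
            self.time %= duration      # wrap around forever
        elif self.time >= duration:
            self.time = duration       # hold the last frame
            self.done = True
            if self.on_done:
                self.on_done(name)
```

This version supports both notification styles from the list: the on_done callback and the done flag.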

With this simple animation system, we could conceivably make a game that has animated characters.

That being said, the animation system is lacking in a few ways:

  1.  You can only play full body animations, meaning if you want the lower body to look like it’s jumping, and the upper body to look like it’s firing a rifle, you have to make an animation that looks like that.  If you want the same thing, but you want the lower body to look like it’s standing around while the upper body is firing a rifle, you have to make an entirely different animation that looks like that!  The permutations of actions can get quite large and you have to decide in advance which animation you want to use.  That is, when the player is jumping, they can’t change their mind that they suddenly want to start shooting.
  2. When you switch animations, there is visible “popping”.  Popping is when a bone goes from doing one thing to doing something else instantly.  It looks like the bone teleported and is very visible to players.  It looks buggy and unpolished.
  3. If you are doing something like having the player throw a grenade, you have no way of knowing when to actually spawn the grenade model, and where to spawn it.  You could “hard code” it to spawn at the same place relative to the player each time, when the animation stops playing, but that is pretty hackish and not very maintainable.

Let’s start off by working on solving problem #3 of not being able to specify where to spawn a grenade or when to spawn it.

Keyframe Strings

To solve the problem of WHEN to spawn it, a feature common to nearly all animation systems is the ability to put game engine events on animation key frames.

This way, when the arm is at the correct position in the throw animation, someone would be able to put an event like “throw grenade” on that animation key.  When the animation reaches that animation frame, it sends the message to the game engine, which can then create a grenade (with any specified parameters to the event).

Often times I’ve seen this implemented as an actual string that is associated with an animation key frame.  The strings might be things like:

Playsound Laugh.wav   (to play a sound to go along with the animation)

SpawnPhysicsProjectile  Grenade.mdl 0 0 5   (to spawn a projectile with the specified mesh and velocity vector)

FootFallSound (This would tell the engine to play a footstep sound, based on the material the player was standing on, such as a metallic sound if on metal, or a duller thud if walking on dirt)

You could also use it to hide and show attachments or a myriad of other things.  Basically you can use it for anything that you want to be tied to an animation.
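As a rough sketch of the string based version, the animation update could fire every key string whose time was crossed since the previous update.  The function below is hypothetical and ignores looping wrap around; the key string contents are the examples from above, and the key times are made up.

```python
def fire_key_strings(keys, prev_time, new_time, dispatch):
    """Fire every key string whose time falls in (prev_time, new_time].

    keys: list of (time in seconds, key string), sorted by time.
    dispatch: callable taking (command name, list of string arguments).
    Start playback with prev_time below zero (such as -1.0) so that a key
    placed at time 0 is not skipped.
    """
    for t, key_string in keys:
        if prev_time < t <= new_time:
            name, *args = key_string.split()
            dispatch(name, args)


# Hypothetical key times for a laugh-and-throw animation.
keys = [
    (0.2, "Playsound Laugh.wav"),
    (0.5, "SpawnPhysicsProjectile Grenade.mdl 0 0 5"),
]

fired = []
fire_key_strings(keys, 0.0, 0.3, lambda name, args: fired.append((name, args)))
```

Using the half open interval means a key is fired exactly once, no matter how the frame times happen to fall around it.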

Usually you’ll want some kind of editor for animators and other content creators to be able to associate these key strings with specific key frames.   If they have to work with a text file where they have to hand enter times and key strings associated with those times, it’s going to be really tedious and they are going to be sad.  Also, it will be very error prone which makes everyone sad when it generates more bugs than it needs to, slowing down dev time.

On the topic of creating unnecessary bugs, while I’ve often seen keystrings implemented as actual strings, it’s actually a lot less error prone if you have some kind of structured input system in your key string editor.

For instance, instead of them typing a command name and supplying any required parameters, it would be a lot better for them to have to choose a key string command from a drop down list.  When they choose one, it should display any parameters that might be needed, and have some way of validating that their input is valid.

This editor should be tightly coupled with your game engine.  Example ways of doing this include having a shared header file that defines all key string commands and what parameters they require, or having the key string editor load a game dll to get at the data that way.

If you have to manually maintain the tool to match game code, it will often get out of sync and cause you pain you don’t need.  Avoiding that pain means you can work on developing more features instead of fighting recurring bugs, and means QA can focus on finding harder to find bugs.  In the end it means a better product which is great for the company, your continued paycheck, and the player’s experience.

Some other potential bugs can come up with key frames that I don’t have a good answer for; it’s just something you have to be mindful of.

One of these bugs is that when an animation is interrupted, a key frame might not get hit when you expect a key frame to get hit.  For instance if an animation attaches something to a player’s hand, and at the end of the animation hides that attached object, if you interrupt the animation midway through, it won’t get hidden and the attachment will be stuck to the hand as the player does other things – which looks very weird.  Your best bet is to design things in such a way that if key strings are missed, it isn’t a problem.  Not always possible with all features unfortunately though…

Another problem that comes up when you have more advanced anim systems is that you may be blending out an animation which is no longer relevant, but while it is blending out, it hits a key frame.  For instance if a player is holstering a weapon, but blending out a fire animation that got interrupted, you may get a “firegun” key string command, when you really don’t want it because it’s not relevant anymore.  Sometimes you would want a key string to fire in that case though, so there is no real global solution to the problem that I’m aware of.


Now that we have a way of knowing WHEN to spawn a grenade in a grenade throw animation, we don’t know WHERE to spawn it.  This is where sockets come in – no I’m not talking about TCP/IP or UDP sockets!

A seemingly obvious solution is probably to say which bone to spawn the grenade on in the “throw grenade” animation key string.    An issue here though is that maybe if you spawn it right on the “rhand” bone, it might clip through the hand (inter-penetrate the hand) and look sloppy.  Also, for other use cases, you might want to attach something where there isn’t a bone nearby.

Another seemingly obvious solution might be to add extra bones to the animation data that aren’t tied to any real geometry.  This way, you can use the bones to attach things to, or spawn things at, but they aren’t tied to any real model geometry so you can make them move however you want.

The problem with this solution is that you are paying the cost of animating those bones even if you aren’t using them for anything.  Enter sockets!

Sockets are a transformation (translation and rotation) away from a specified bone.  They are usually only calculated on demand so that when you aren’t using them, you don’t pay a price for having them.

This way, sockets act as very cheap attachment / reference points on a model during animations to attach other models to (such as capes, helmets, guns, grenades).

When a key string command takes a socket or bone as a parameter, you should have it accept either a bone or a socket.  They should be usable interchangeably, because sometimes you really do want to attach something to a bone, and you shouldn’t make an animator make an extra socket just to make it match a bone.
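Here is a minimal sketch of on demand sockets, including the bone or socket lookup.  To keep it short it works in 2D (a position plus one rotation angle); a real system would use 3D transforms, but the idea is the same.  The Transform and Skeleton classes and the attach_point name are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Transform:
    x: float
    y: float
    angle: float  # rotation in radians

    def compose(self, local: "Transform") -> "Transform":
        # Express `local` (given relative to `self`) in the space `self` lives in.
        c, s = math.cos(self.angle), math.sin(self.angle)
        return Transform(
            self.x + c * local.x - s * local.y,
            self.y + s * local.x + c * local.y,
            self.angle + local.angle,
        )


class Skeleton:
    def __init__(self, bone_world, sockets):
        self.bone_world = bone_world  # {bone name: Transform}, animated every frame
        self.sockets = sockets        # {socket name: (bone name, offset Transform)}

    def attach_point(self, name):
        # Accept a bone name or a socket name interchangeably.
        if name in self.bone_world:
            return self.bone_world[name]
        bone, offset = self.sockets[name]
        # Sockets cost nothing until someone asks for one.
        return self.bone_world[bone].compose(offset)


skeleton = Skeleton(
    bone_world={"rhand": Transform(1.0, 2.0, math.pi / 2)},
    sockets={"rhand_grenade": ("rhand", Transform(1.0, 0.0, 0.0))},
)
spawn_at = skeleton.attach_point("rhand_grenade")
```

Because the socket is just an offset stored next to the skeleton, adding a hundred of them to a model costs memory but no per frame animation time.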

We now have a way of specifying WHEN to spawn a grenade (via a key string), and also WHERE to spawn it (specifying a socket to spawn it at as a parameter to the key string command).

Animation Controller v2 – Blending

I mentioned popping earlier and said it was caused by a bone changing where it is or how it’s moving by a drastic amount in a single frame.  If you’ve read my DIY Synth articles, you probably remember how important in audio programming it is to make sure that your sound data stays continuous.   The same is true of animation data, you have to make sure that bone motion / position stays continuous always, or else you’ll get popping.

Just like in audio programming, you use envelopes to help keep things continuous when you add a new animation into the mix, or remove an old animation.

For instance, if a model is playing one animation and you tell it to play another, the new animation should start at a blend weight of 0.0 and slowly increase while the old animation decreases from a blend weight of 1.0 down to 0.0.  This gives you a nice smooth blend between two animations and works for MOST animations (more on that in a second).

Typically, when crossfading from one animation to another, the magic number is to blend over 0.2 seconds, but certain uses may warrant a longer or shorter blend time.  You might also blend out the old animation at a different rate than you blend in the new animation.  Give your animators the option to choose so they can do whatever they need.  They will be happy that they have the control, and you will be happy that you don’t have to one off program things all the time for them.  Everyone wins!
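The simplest version of that crossfade is a linear ramp, sketched below.  It assumes a single blend time for both the old and the new animation (so the old weight is just 1 minus the new weight), and it treats a pose as a flat list of numbers blended linearly.  Both function names are hypothetical.

```python
def crossfade_weight(elapsed, blend_time=0.2):
    """Blend weight of the NEW animation, `elapsed` seconds after the switch."""
    if blend_time <= 0.0:
        return 1.0  # no blend requested: switch instantly
    return min(max(elapsed / blend_time, 0.0), 1.0)


def blend_pose(old_pose, new_pose, new_weight):
    # The old animation implicitly has a weight of (1 - new_weight).
    return [a + (b - a) * new_weight for a, b in zip(old_pose, new_pose)]


# Halfway through a 0.2 second crossfade, each animation contributes 50%.
w = crossfade_weight(0.1)
pose = blend_pose([0.0, 10.0], [2.0, 20.0], w)
```

If you let animators pick separate blend in and blend out times, the two weights no longer sum to 1 on their own, so you would need to normalize them before blending.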

What happens if you want to play an animation while an animation blend is in progress already?  0.2 seconds of blend time sounds like a short amount of time, but this actually comes up ALL THE TIME.

There are two ways to deal with this issue that I’m going to talk about.

The first way to deal with this problem is to keep a list of all the animations that are currently playing, so that if you tell the animation controller to play a bunch of different animations really quickly, it will end up sampling a bunch of different animations as various ones blend out, and the final one is blending in.  This can result in A LOT of animation sampling which can take a serious toll on your game’s performance.  I encountered a bug on a game I worked on once that caused around 100 animations to be getting sampled on a single model for several frames due to this problem and it made the game tank HARD.

The second way to deal with this, and how I like to implement it usually, is to make it so only two animations can play at once (a main animation and a blend animation) and you have another field on the animation controller which says what the next animation to blend in is.

Going this route, when you say to play a new animation while a blend is in progress, it goes into the “next animation” field.  When the current blend is done, that next animation will blend in and the last one will blend out.

If there is already another animation in the “next animation field”, it’s replaced and it’s never seen.
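A minimal sketch of this second approach is below.  It only tracks which animations are active and their weights, with a linear blend; the BlendingController name and its fields are hypothetical.

```python
class BlendingController:
    def __init__(self, blend_time=0.2):
        self.blend_time = blend_time
        self.main = None    # animation playing alone, or being blended out
        self.blend = None   # animation being blended in
        self.next = None    # the "next animation" field, holds at most one
        self.elapsed = 0.0  # time since the current blend started

    def play(self, anim):
        if self.main is None:
            self.main = anim
        elif self.blend is None:
            self.blend, self.elapsed = anim, 0.0
        else:
            # A blend is in progress: queue it, replacing any earlier request.
            self.next = anim

    def update(self, dt):
        if self.blend is None:
            return
        self.elapsed += dt
        if self.elapsed >= self.blend_time:
            # Blend finished: the blended in animation becomes the main one.
            self.main, self.blend = self.blend, None
            if self.next is not None:
                self.blend, self.next, self.elapsed = self.next, None, 0.0

    def weights(self):
        # Never more than two animations to sample.
        if self.main is None:
            return {}
        if self.blend is None:
            return {self.main: 1.0}
        w = min(self.elapsed / self.blend_time, 1.0)
        return {self.main: 1.0 - w, self.blend: w}
```

For example, if game code plays “Idle”, “Walk”, “Run” and “Jump” in quick succession, “Run” gets overwritten in the next animation field and is never sampled.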

This way, only two animations will be sampled / blended at a time maximum, yet you will get a perfectly smooth blending between animations, and the controls will still feel fairly snappy, although there may be a noticeable delay in control response if animations change a lot really often.  You’ll have to make a judgement call about the needs of your game.

Lastly, I said blending works nicely for most animations but not all.  One exception to this rule is when you try to blend different lower body animations together, such as trying to blend a walk animation and a run animation together.  Often times, the feet will be in different places and when you blend them, it makes the feet look like they are doing a little stuttering dance and it looks ugly.  I’ll talk about getting around this specific problem in the next part, but as a preview, the short version of the solution is to make sure the feet are in the same positions at the same time for the two animations.

End of Part 1

At this point we have a fairly nice animation system but it isn’t quite ready yet. The most glaring problem we have is that we can only play full body animations still, which is not acceptable.  A real animation system NEEDS to be able to play different animations on different sets of bones independently.

We’ll tackle that problem, and others, in part 2.

Encryption 101: Realistic Security

This is the fifth article in a series on the basics of cryptography:

DISCLAIMER: These articles are meant for educational purposes only. The methods explained here are meant only to illustrate the basic concepts of cryptography and may or may not be suitable in real world applications. For serious applications such as financial transactions, I recommend hiring security professionals and also getting a lawyer involved. Use this info and code at your own risk, I claim no responsibility!

If you want more in depth information about cryptography than these introductory articles provide, I highly recommend a book called Applied Cryptography by Bruce Schneier. That book literally almost didn’t get published because the NSA didn’t want the info getting out into the public. Yay for the 1st amendment!

Realistic Security

“Everyone has a plan ’till they get punched in the mouth.” — Mike Tyson

Cryptography is really awesome, and as a friend of mine said today  (BOOOOOORIIIIIIIS your grandma is calling you!) , there’s a certain mathematical purity to it that’s really appealing.

However, in most security systems, cryptography is not the weak point. There are often way easier things to attack and often you just need to defeat the weakest link in the chain to break open the whole thing.

A popular and successful method of attacking secure systems is something called Social Engineering which you see a lot of in movies like “Mission: Impossible” and “Sneakers”.

Social engineering is when you chat up the receptionist and get her to give you info she really ought not to give out, or when you call a company claiming to be maintenance and asking for the door code to get in after hours. Often much easier than trying to factor the product of two gigantic primes or the like 😛

Beyond social engineering, there is also physical security to watch out for. I attended DEF CON in Vegas for a few years with my good buddy LagGod and learned some really interesting things.  DEF CON has gotten pretty packed in recent years but I highly recommend going if you are at all interested in security. Lots of really talented people on both sides of the fence (attackers aka black hats, and defenders aka white hats) and even some feds and random technophiles thrown in. Here are two really memorable security lessons I learned at those conferences that really put security into perspective for me.

Hacking Into a Wifi Network the Easy Way

Note: this no longer works as advertised, thanks to advancements in wifi security technology, but the principles are still interesting and could work in other situations you may find yourself in.  Also, it’s good to know the weaknesses of systems past and present to better protect other systems.  Otherwise, only the criminals have guns and we are all screwed 😛

Ok so let’s say that you want to hack into a company’s network, and let’s say that they have a wireless router where when you first try to access it, you are presented with a web browser login screen to type in your username and password.

How wifi networks used to work is that if you were trying to connect to a wifi network, your device would pick the router that had the strongest signal that was broadcasting the network id that you wanted to connect to.

What this means is that you as a hacker could drive into the parking lot of a company and broadcast their network id with a really strong signal.  Then, when people tried to use their network, the traffic would be directed to your machine.

If you saved the html of their login page before turning on your fake network, you would be able to present a web page to the people hitting your network that looked exactly like the login page they were used to seeing, except that you could take all those usernames and passwords they entered, and log them to a text file!

After you’ve harvested a few logins, you turn off your network and then log into theirs. Thanks for the logins d00ds!

I’m not sure how they solved this problem, but you could probably do something with public key encryption to make sure that everyone who is broadcasting a network id is actually legitimately part of that wireless network.

Defeating Biometrics

Again this is somewhat dated info, but it’s still pretty interesting, and possibly useful for other situations.

It used to be that fingerprint scanners were a lot simpler (some cheap ones might still be). If you mashed a gummy bear onto a fingerprint scanner, the scanner would pick up the oily fingerprint of the last person that used it, which surely belonged to a valid user, and so the door would open, the laptop would unlock, or whatever else.

They fixed that problem by having it detect heat, listen for a heartbeat, and probably lots of other secret or publicized ways, but it used to work pretty regularly!

Something else to say about biometrics is that despite the complexity of the actions they perform, I’ve been told that often times there is just a single wire going into them, and a single wire going out of them. For all those fancy actions, all the thing does in the end is complete a circuit of two wires. If you really need to get in somewhere, you are likely able to smash open the box and connect the wires, circumventing the “infallible” biometrics reader.

Final Notes on Security

Here are some final words on security.

  • There is no such thing as perfect security, there is only good enough security.  The only way to get perfect security is to lock your computer in a safe and drop it into the Mariana Trench (although I hear you have to watch out for James Cameron these days).
  • Good enough security often means just making sure you aren’t the low hanging fruit.  If you are more difficult to attack than your peers, you are safer than they are.  If you and someone else are running from a lion, you don’t need to outrun the lion, you just need to outrun the other guy!
  • If your security is based on the fact that your algorithm is secret, that is called “Security Through Obscurity” and is really weak security.  You should assume your attacker knows the details of everything for better security.  Also, secret algorithms don’t get peer reviewed, so weak techniques don’t get weeded out.  Don’t forget that people STILL haven’t cracked the 72 bit RC5 message.  A single message with a 9 byte key, published in the mid 90s, attacked by distributed computer networks, and still, it hasn’t been cracked despite the algorithm being publicly available.  That is some good security right there.

I went to a talk at either DEF CON, or San Diego’s Toorcon (sorry, can’t remember which) where the author Bruce Schneier (who is mentioned in the disclaimer / header of these articles) gave a talk after he had just published a book as a sequel to Applied Cryptography.  He said something like “Throw away the other book… physical security is the only thing you really need to be worried about.”

BTW Bruce, if you are reading this, thanks for that first book anyways man, you rock (:

… and let me know if I misquoted you 😛

Thanks for reading!  Now go forth and cryptophy.  HACK THE PLANET!

Cryptography 101: Encryption – Asymmetric Keys

This is the fourth article in a series on the basics of cryptography:

DISCLAIMER: These articles are meant for educational purposes only. The methods explained here are meant only to illustrate the basic concepts of cryptography and may or may not be suitable in real world applications. For serious applications such as financial transactions, I recommend hiring security professionals and also getting a lawyer involved. Use this info and code at your own risk, I claim no responsibility!

If you want more in depth information about cryptography than these introductory articles provide, I highly recommend a book called Applied Cryptography by Bruce Schneier. That book literally almost didn’t get published because the NSA didn’t want the info getting out into the public. Yay for the 1st amendment!

Asymmetric Key Encryption (Public and Private Keys)

Unlike symmetric key encryption, which uses the same key for encryption and decryption, asymmetric key encryption uses one key for encryption and a different key for decryption.

It probably sounds strange to want two passwords, but the reason is that you keep one for yourself, and give the other one out to another individual or a group.

Because you keep one to yourself (private) and give the other out (public) these are called public and private keys, and this technique is called Public Key Cryptography.

Depending on which key you keep private (the encryption or decryption key), you can get different effects.

Usage Pattern 1 – Private Encryption, Public Decryption

If you keep the encryption key secret, but publish the decryption key out to the public (or to a group of people, or to another individual), what that means is that you can encrypt data which can be read by anyone. What is useful about this is that they have to use your public key to decrypt the data, so they know it was encrypted with your private key, which means they can be reasonably sure that you were the one that wrote the message. You have effectively cryptographically signed your message so that people know it was in fact you that sent that message.

People use this technique all the time in computers. This is how you can verify that something is from a legitimate source, regardless of whether we are talking about a web page (HTTPS), a valid device driver (digitally signed device drivers), or other things of that nature.

Another neat thing about this usage pattern is that, by getting creative, you can also be assured that the message or data hasn’t been tampered with.

For instance, let’s say you were making a computer operating system where you only allowed the computer to run trusted (signed) executables.

Re-visiting a technique mentioned in the first article in this series on hashing, a “signed executable” might look like the below:

  • [Cryptographic Hash of Unencrypted Executable Data]
  • [Encrypted Executable Data]

So, you as the “central signing authority” for the operating system would receive programs from people wanting to release software on your operating system.

First, you would put the software through its paces via analysis and testing to make sure the program worked as intended, was up to the level of quality you wanted software on your OS to be, followed any specific rules about how the software should behave and interact with the rest of the operating system, and also you would make sure the software wasn’t malicious.  Also, you would have to make sure the software wasn’t insecure in any ways that could compromise the rest of your security (for instance, if it had a Buffer Overflow, that could let attackers run arbitrary, unsigned code on your operating system, causing viruses to spread and other malicious things).

Once the program is verified safe, you would next make the hash of the unencrypted program, write that to a file, then encrypt the program with your private key and write that to the file after the hash.

You now have a trusted / signed executable to distribute.

When a user downloads this executable from your application store and tries to run it, the operating system could take the following measures to verify that the executable was trusted and unaltered from the time of its signing:

  1. Unencrypt the executable using the public key.
  2. Hash the unencrypted data and ensure that it matches the hash at the beginning of the file.

If the hashes match, you know that the executable was indeed signed by the central authority, and that it has not been altered in any way since its signing. Therefore, it is safe to run!

I am pretty sure variations of this sort of algorithm are used by things such as the Xbox, PlayStation and iPhone / iPad devices.

Usage Pattern 2 – Public Encryption, Private Decryption

The other way to use asymmetric key encryption is to publicize the encryption key, but keep the decryption key private.

What this allows is for anyone to encrypt a message that only you can read.

One thing you could do with this is communicate securely with people even if all you had was a public communication channel.

For instance you could post to a public forum saying “This message is for Jesse”, and then put the encrypted data after that.

Since only Jesse knows his private key, and thus only Jesse can decrypt the data, only Jesse will be able to read your message, even though it is visible to everyone.

Despite this, there are still several unknowns in this particular communication, including:

  1. Jesse doesn’t know that you really are who you say you are
  2. You don’t know that Jesse got the message
  3. Jesse doesn’t really know that the message wasn’t tampered with (well… if it’s a text message you are sending, and Jesse unencrypts it and it’s garbage, he knows that the message was tampered with, but if it isn’t so obvious when the expected data is wrong, he may not be able to tell whether the message had been tampered with).

But those problems, and others, are solvable, which leads to our next point…

Cryptographic Protocols

A neat thing about cryptographic techniques like this one, symmetric key cryptography, and hashing is that they are basically just building blocks that you can stack together in different ways to be able to do useful and interesting things.

Once you learn some of the basic building blocks of cryptography (what this cryptography 101 series of articles is supposed to be all about), you can then learn more about how to put those building blocks together to perform useful tasks.  The recipes for performing these useful tasks are called Cryptographic Protocols and they can (and often should) contain more than just cryptographic techniques.

In the first usage pattern, I showed how combining asymmetric key encryption with hashing can provide you with a system for creating and verifying trusted executables.  That series of steps for creating and using trusted executables was a cryptographic protocol that contained important steps even beyond just encryption and hashing – such as verifying that the executable was not malicious or insecure.  Leaving those steps out creates a big security hole, so they are very important to the overall protocol.

For the second usage pattern, here are some cryptographic protocols to solve the problems I called out:

  1. To solve the issue of Jesse not being sure that you are who you say you are, you could take the encrypted message you created, and sign it with your own private key (of which the decryption key is public… this is usage pattern 1).  This way, when Jesse gets the encrypted message from you, he first unencrypts it with your public key, and then unencrypts it with his own private key.  If the message comes out as garbage in the end, he knows that one of the two steps failed.  Specifically, either it wasn’t YOU who sent the message, OR, you used the wrong public key when encrypting the message to send to him.  Jesse doesn’t know which step went wrong, but he does know the message is invalid one way or another.
  2. To solve the problem of you not knowing that Jesse got the message, you could tell Jesse in the encrypted message “Jesse, if you get this message, respond by sending me back an encrypted message that says ‘the password is forty two'”.   Then, if Jesse got the message, he could encrypt a message saying “the password is forty two” using your public key, and then post it on the board again for you to unencrypt with your private key and see that he got receipt of your message.  While it’s true that anyone is able to encrypt messages meant for you, and so anyone could have written that message, there is some level of security there because the specific message you said to send was encrypted in such a way that only Jesse could have read it.  This way, you can be reasonably sure that Jesse got the note.
  3. To solve the issue of Jesse not knowing if the message was tampered with at all or not (in the case that it’s hard to tell if you got the right data out or not), one way would be to just put a hash of the unencrypted data on the front of the message.  You’d have to agree with Jesse in advance on the protocol, but using the hash again, it would let Jesse know that the data hadn’t been tampered with.
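
As a rough illustration of item 3, here is a minimal sketch of putting a hash on the front of a message and checking it on the other side.  The function names are made up for this example, and the hash is FNV-1a, which is NOT a cryptographic hash; it is only a small stand-in so the sketch fits on a page.  The encryption step is left out entirely: in the protocol described above, the hash and the message would be encrypted together before being posted.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// FNV-1a: a simple NON-cryptographic hash, used here only as a stand-in
// for a real cryptographic hash function.
std::uint32_t ToyHash(const std::string &data)
{
  std::uint32_t hash = 2166136261u;
  for (unsigned char c : data)
  {
    hash ^= c;
    hash *= 16777619u;
  }
  return hash;
}

// Sender: put the 4 byte hash of the message on the front of the message.
std::string AddHashPrefix(const std::string &message)
{
  const std::uint32_t hash = ToyHash(message);
  std::string out;
  for (int shift = 24; shift >= 0; shift -= 8)
    out += static_cast<char>((hash >> shift) & 0xFF);
  return out + message;
}

// Receiver: split off the hash, recompute it from the message, and compare.
bool CheckHashPrefix(const std::string &received, std::string &messageOut)
{
  if (received.size() < 4)
    return false;
  std::uint32_t claimed = 0;
  for (int i = 0; i < 4; ++i)
    claimed = (claimed << 8) | static_cast<unsigned char>(received[i]);
  messageOut = received.substr(4);
  return claimed == ToyHash(messageOut);
}

// Usage: a clean message passes the check, a modified one does not.
void DemoHashPrefix()
{
  std::string framed = AddHashPrefix("hello Jesse");
  std::string message;
  assert(CheckHashPrefix(framed, message) && message == "hello Jesse");
  framed[6] ^= 0x01; // flip one bit of the message body
  assert(!CheckHashPrefix(framed, message));
}
```

With a real cryptographic hash the idea is the same, it just becomes impractical for an attacker to find a different message with a matching hash.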

Generation of Key Pairs and Algorithm

The very fact that these keys work in tandem means that they are somehow linked together mathematically.

I was trying to think of a really simple way to show how public and private keys work together and how they are linked, with a minimal piece of sample code. I thought I had figured out a simplified way, but unfortunately it turned out I was mistaken and my method didn’t work at all.

So, I have to refer you to this page which is pretty darn helpful for understanding how the real thing works with RSA, but unfortunately it doesn’t explain the full nitty gritty of WHY it works to my liking.  Still a very good read though:
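
To go along with that reading, here is a toy sketch of textbook RSA using the tiny primes 61 and 53.  This is not the method I was attempting above, and it is nowhere near secure (real keys are hundreds of digits long and real implementations add padding), but it does show the two keys undoing each other in both directions.  The function and constant names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Computes (base ^ exponent) mod modulus by repeated squaring.
// The intermediate products cannot overflow because the modulus is kept below 2^32.
std::uint64_t ModPow(std::uint64_t base, std::uint64_t exponent, std::uint64_t modulus)
{
  assert(modulus > 0 && modulus < (1ull << 32));
  std::uint64_t result = 1 % modulus;
  base %= modulus;
  while (exponent > 0)
  {
    if (exponent & 1)
      result = (result * base) % modulus;
    base = (base * base) % modulus;
    exponent >>= 1;
  }
  return result;
}

// Toy key pair built from the primes p = 61 and q = 53.
// n = p * q = 3233 and (p - 1) * (q - 1) = 3120.
// The exponents are linked by 17 * 2753 = 46801 = 15 * 3120 + 1.
const std::uint64_t kModulus = 3233;
const std::uint64_t kPublicExponent = 17;
const std::uint64_t kPrivateExponent = 2753;

// "Messages" here are just numbers smaller than kModulus.
std::uint64_t ApplyPublicKey(std::uint64_t value)
{
  return ModPow(value, kPublicExponent, kModulus);
}

std::uint64_t ApplyPrivateKey(std::uint64_t value)
{
  return ModPow(value, kPrivateExponent, kModulus);
}
```

Encrypting the number 65 with the public key gives 2790, and applying the private key to 2790 gives 65 back.  Going the other way (private key first, then public key) also round trips, which is what makes the signing usage pattern possible.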

Common Algorithms

Some commonly used protocols that are built on public key encryption are SSH and IKE, and apparently even Bitcoin uses it!  RSA, mentioned above, is one of the best known public key algorithms.


After I wrote up this article, my friend Patrick corrected me saying that the process I described is not the usual process for digitally signing data. He said:

You got signing a little mixed up for asymmetric. Traditionally the process is:
1. Alice creates a public and private key pair.
2. Alice shares her public key with the world.
3. Alice never shares her private key.
4. Bob can now encrypt messages using Alice’s public key and only Alice can unencrypt them using her private key.
5. Alice can take a hash of something she wants people to verify as coming from her. Alice then signs that hash with her private key. Now Bob can verify the item coming from Alice by taking the hash of the data and comparing it against the hash in the signature using Alice’s public key.

Some additional reference:
RSA Labs Digital Signing Explanation

There are two reasons that I can think of why that process is better than the one I described:

  1. You can sign data without obfuscating it via encryption.
  2. Public key encryption takes a lot of processing power apparently, so you want to minimize how much data you encrypt with it.  This method encrypts a far smaller (and constant) amount of data.
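
Here is a minimal sketch of the hash-then-sign process Patrick describes, under some loud assumptions: the key pair is a toy textbook RSA pair (primes 61 and 53), the hash is FNV-1a (NOT a cryptographic hash), and all of the names are made up for this example.  With a modulus this small, different data will collide on the same reduced hash fairly often, so treat it purely as an illustration of the flow.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// (base ^ exponent) mod modulus by repeated squaring; fine for tiny moduli.
std::uint64_t ModPow(std::uint64_t base, std::uint64_t exponent, std::uint64_t modulus)
{
  assert(modulus > 0 && modulus < (1ull << 32));
  std::uint64_t result = 1 % modulus;
  base %= modulus;
  while (exponent > 0)
  {
    if (exponent & 1)
      result = (result * base) % modulus;
    base = (base * base) % modulus;
    exponent >>= 1;
  }
  return result;
}

// FNV-1a, a NON-cryptographic hash standing in for a real one.
std::uint32_t ToyHash(const std::string &data)
{
  std::uint32_t hash = 2166136261u;
  for (unsigned char c : data)
  {
    hash ^= c;
    hash *= 16777619u;
  }
  return hash;
}

// Toy RSA numbers (p = 61, q = 53). Far too small for real use.
const std::uint64_t kModulus = 3233;
const std::uint64_t kPublicExponent = 17;
const std::uint64_t kPrivateExponent = 2753;

// Alice: hash the data, then transform only the (reduced) hash with her private key.
// The data itself stays readable.
std::uint64_t SignData(const std::string &data)
{
  return ModPow(ToyHash(data) % kModulus, kPrivateExponent, kModulus);
}

// Bob: undo the signature with Alice's public key and compare the result
// against his own hash of the data he received.
bool VerifyData(const std::string &data, std::uint64_t signature)
{
  return ModPow(signature, kPublicExponent, kModulus) == ToyHash(data) % kModulus;
}
```

Note how only one small number goes through the expensive public key math no matter how large the data is, which is the second advantage listed above.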

Thanks for the correction Patrick!

Cryptography 101: Encryption – Symmetric Keys

This is the third article in a series on the basics of cryptography:

Symmetric Key Encryption

Symmetric key encryption is a fancy name for the type of encryption you are probably most familiar with, which is using a password to scramble and unscramble data to make sure only certain people can see it.

This is in contrast to asymmetric key encryption, where you have two passwords; one for encrypting and one for decrypting (The next article is going to be on asymmetric key encryption).


There are numerous symmetric key encryption algorithms out there but they all have one thing in common: their security relies on only the right people having the password, and the assumption that the best way attackers have for getting the plaintext from the ciphertext is to guess the password via brute force.

In good (modern) algorithms, people say things like “on average it will take geological or astronomical amounts of time to guess a password with the computing technology of today and the projected future” so they are reasonably sure people won’t be able to brute force the password in any useful amount of time.

Quantum computers give some forms of cryptography a scare though, because there is something called Grover’s Algorithm, which is a quantum computing algorithm that can brute force search ANYTHING with quadratically fewer operations than classical computing (roughly the square root of the number of guesses a classical computer would need).  This means it can brute force guess passwords of an encryption algorithm a lot faster than a normal computer, although doubling the key length is enough to cancel out that advantage.  At the time of writing this, I think the record for quantum computing power is something like having 4 qubits work together to do some simple math operation (like multiplication).  We could be on the precipice of disaster regarding cryptography, but luckily there are encryption algorithms that take the same amount of time, or longer, for quantum computing to solve, so it isn’t all doom and gloom.

When decrypting data with either symmetric or asymmetric key encryption, there is no built in way to know if you had the right password or not. You can know by looking at the recovered plaintext and seeing if you got junk out, or meaningful data, but if you don’t know what the data out is supposed to be exactly, or what it’s supposed to look like, there’s no way to know if you decrypted it correctly. This means it can sometimes be difficult for attackers to even KNOW if they have guessed the right password or not, which is good for us folk trying to protect data.

Just like a good hashing algorithm, small changes in input should ideally yield large changes in output (this property is often called the avalanche effect), which makes it a Chaotic Function and makes it so the cipher text gives as little information about the plaintext as possible.

Sometimes people will use multiple encryption algorithms on a piece of data in the hopes of making it harder to crack, which sometimes works, but can also be fairly dangerous.

To understand the danger, consider how every program, no matter how complex, is essentially a traditional algebraic function (with perhaps lots and lots and lots of terms).  For encryption, the input is the plain text and key, and the output is the cipher text.

Now, just like in junior high and high school, sometimes when you plug one function into another like f(g(x)) and perform algebraic substitution, terms from f and g may cancel out.  You may end up with a function that is less complex than either f(x) or g(x), or it just may be less complex for certain values of x.  An attacker could exploit these weaknesses to their advantage and it might be easier for them to recover some or all of the plaintext because you used two encryption algorithms instead of one.

On the other hand, using multiple algorithms, or the same algorithm multiple times (perhaps with different keys) can also make it a lot more secure.  It’s just something to be mindful of.
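
A tiny sketch of that cancellation, using a deliberately weak toy cipher (XOR against a repeating key) whose name and shape are just assumptions for this example.  Applying it twice with the same key cancels out completely, and applying it with two different keys of the same length is no stronger than applying it once with a single combined key.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy cipher: XOR every byte with a repeating key. Not secure, illustration only.
std::string XorCipher(const std::string &data, const std::string &key)
{
  assert(!key.empty());
  std::string out = data;
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = static_cast<char>(out[i] ^ key[i % key.size()]);
  return out;
}

void DemoCancellation()
{
  const std::string plainText = "attack at dawn";

  // f(f(x)) = x: "double encrypting" with the same key gives the plaintext back.
  assert(XorCipher(XorCipher(plainText, "key"), "key") == plainText);

  // f(g(x)) collapses into one XOR with the combined key "abc" XOR "xyz",
  // so two passes are exactly as weak as one.
  const std::string combinedKey = XorCipher("abc", "xyz");
  assert(XorCipher(XorCipher(plainText, "abc"), "xyz") == XorCipher(plainText, combinedKey));
}
```

Real algorithms are far less obvious about it than XOR is, but the risk is the same in spirit.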

Clever programmers and mathematicians sometimes come up with encryption techniques where attacking the algorithm itself is the literal equivalent of having to solve famous unsolved math problems from the ages.  These often seem really secure because for some of these problems, the best and brightest minds in all of history have been fighting with the problems for hundreds or thousands of years and making no progress.

Every now and then, some smarty figures one of these out though, and suddenly, encryption algorithms based on it become essentially worthless.

Another common way that people attack ciphertext is via something called a “known plaintext attack”.  What this means is that if the attacker knows any part of the plaintext before it became ciphertext, they can sometimes leverage that knowledge to know a bit more about the key or algorithm used to encrypt the data.  That simplifies their work and makes it more likely that they can get the plaintext back without having to resort to brute force.

One really common way this comes up is if people do something like compress their data before encrypting, or they encrypt known file types like executables, word processing documents, image files etc.

The reason for this is that all of those file types have a standard, well known header, which allows other programs to use them.  That header data is known plaintext and can be used by an attacker to get more information about how to recover the plaintext.
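
Here is a minimal sketch of a known plaintext attack against a deliberately weak toy cipher (XOR with a repeating key).  The cipher, the function names and the sample data are all assumptions for illustration, and the attacker is assumed to know a header at least as long as the key.  PDF files start with the bytes “%PDF-”, which is the kind of known header being discussed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy cipher: XOR every byte with a repeating key. Not secure, illustration only.
std::string RepeatingXor(const std::string &data, const std::string &key)
{
  assert(!key.empty());
  std::string out = data;
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = static_cast<char>(out[i] ^ key[i % key.size()]);
  return out;
}

// If the attacker knows the first bytes of the plaintext (for example a standard
// file header), XORing them against the ciphertext reveals that many key bytes.
std::string RecoverKeyPrefix(const std::string &cipherText, const std::string &knownHeader)
{
  assert(knownHeader.size() <= cipherText.size());
  std::string keyBytes;
  for (std::size_t i = 0; i < knownHeader.size(); ++i)
    keyBytes += static_cast<char>(cipherText[i] ^ knownHeader[i]);
  return keyBytes;
}

void DemoKnownPlaintext()
{
  const std::string plainText = "%PDF-1.4 secret quarterly numbers";
  const std::string cipherText = RepeatingXor(plainText, "hunter");

  // The attacker never saw the key, only the ciphertext and the expected header.
  const std::string recoveredKey = RecoverKeyPrefix(cipherText, "%PDF-1");
  assert(recoveredKey == "hunter");
  assert(RepeatingXor(cipherText, recoveredKey) == plainText);
}
```

Good modern algorithms are designed so that knowing some plaintext does not hand over the key like this, but it is still information you would rather not give away.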

For all the clever people out there trying to make encryption based on super advanced mathematics, in the end, some of the very most secure algorithms out there are based on very simple computing operations such as addition, subtraction, bit rotation, and XOR.

As an example, there is an algorithm called RC5 which only uses those basic operations (you can find the source code for it easily!) and yet is extremely secure. RC5 was published in 1994, and in 1997 RSA Labs put out challenge messages encrypted with various key sizes (including 7 byte, 8 byte and 9 byte keys). The 7 byte one was cracked (via brute force) in under a year, the 8 byte one took roughly 5 more years, falling in 2002, and cracking the 9 byte one is projected to take something like 200 more years. More information available here: RC5

Algorithm Components

A symmetric key algorithm is any deterministic algorithm that, given a key, can obfuscate (hide / scramble) data, and then later, given the same key, can undo the operations that it did to get the original data back.

Since all operations have to be reversible, that limits you to non destructive operations.  XOR isn’t destructive, because A XOR B XOR B = A.  Addition and subtraction aren’t destructive, because A + B – B = A (even true when you wrap around the max size of your integer).  Division is destructive however, because when you divide on a computer, you have finite precision (even with floating point numbers) which means you can never fully recover the original data when trying to undo a division with a multiplication.  Bit rotation is another operation that isn’t destructive.  NOT isn’t destructive, but AND and OR are destructive.  Another operation that isn’t destructive is moving bytes around, since you could just do the moves again in reverse order to get the original data back.
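
A quick sketch of those round trips on a single byte.  The function names are made up for this example, and the particular mix of operations (XOR, then add, then rotate by 3) is arbitrary.  Undoing them means doing the inverse operations in the reverse order.

```cpp
#include <cassert>

// Rotate an 8 bit value left by nBits (taken modulo 8).
unsigned char RotateLeft(unsigned char nValue, unsigned nBits)
{
  nBits %= 8;
  return static_cast<unsigned char>((nValue << nBits) | (nValue >> ((8 - nBits) % 8)));
}

// Rotate an 8 bit value right by nBits (taken modulo 8).
unsigned char RotateRight(unsigned char nValue, unsigned nBits)
{
  nBits %= 8;
  return static_cast<unsigned char>((nValue >> nBits) | (nValue << ((8 - nBits) % 8)));
}

// Scramble one byte with XOR, wrapping addition and a rotate. All reversible.
unsigned char ScrambleByte(unsigned char nData, unsigned char nKey)
{
  nData ^= nKey;                                      // XOR
  nData = static_cast<unsigned char>(nData + nKey);   // addition wraps modulo 256
  return RotateLeft(nData, 3);                        // bit rotation
}

// Undo the steps of ScrambleByte in reverse order.
unsigned char UnscrambleByte(unsigned char nData, unsigned char nKey)
{
  nData = RotateRight(nData, 3);
  nData = static_cast<unsigned char>(nData - nKey);   // subtraction wraps too
  nData ^= nKey;
  return nData;
}

// AND is destructive: different inputs can give the same output,
// so there is no way to tell afterwards which input you started with.
unsigned char MaskByte(unsigned char nData, unsigned char nMask)
{
  return static_cast<unsigned char>(nData & nMask);
}

void DemoReversibility()
{
  for (int nData = 0; nData < 256; ++nData)
  {
    const unsigned char nScrambled = ScrambleByte(static_cast<unsigned char>(nData), 0x5A);
    assert(UnscrambleByte(nScrambled, 0x5A) == nData);
  }
  assert(MaskByte(0x12, 0x0F) == MaskByte(0x32, 0x0F)); // both give 0x02
}
```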

As simple as all this sounds, these are essentially the building blocks of all encryption algorithms.

Example Algorithm

Here’s an example algorithm that you could use to encrypt and unencrypt data.  I don’t do any byte swapping (moving bytes around), or bit rotation, but those would be some good ways to improve it.

//Takes a pointer and length so you can encrypt binary data as well as text
//the pOutData parameter should point to memory that is the same size as pData
//If bEncrypt is true, it will encrypt data. If bEncrypt is false, it will decrypt data.
void EncryptData(const unsigned char *pData, int nDataLength, unsigned char *pOutData, const unsigned char *pKey, int nKeyLength, bool bEncrypt)
{
  int nKeyIndex = 0;
  unsigned char nRunningSum = 0;
  for(int nDataIndex = 0; nDataIndex < nDataLength; ++nDataIndex)
  {
    //update our running sum
    nRunningSum += pKey[nKeyIndex % nKeyLength];

    //get our current byte of plaintext or ciphertext
    unsigned char nDataByte = pData[nDataIndex];

    //to decrypt, it subtracts a running sum of the key then xors against the current key byte
    if(!bEncrypt)
      nDataByte -= nRunningSum;

    //do our xor, whether we are encrypting or decrypting
    nDataByte = nDataByte ^ pKey[nKeyIndex % nKeyLength];

    //to encrypt, it xors against the current key byte and then adds a running sum of the key
    if(bEncrypt)
      nDataByte += nRunningSum;

    //set the output data byte
    pOutData[nDataIndex] = nDataByte;

    //move to the next byte in the key
    nKeyIndex++;
  }
}

Also, here’s some example code of how to use this function:

//needed for printf and strlen
#include <cstdio>
#include <cstring>

void DemoEncryption()
{
  //our key and plain text
  const char *pKey = "MyKeyIsFairlyLongButThatIsJustFine!124351 seven";
  const char *pPlainText = "This is some plaintext, how do you do?";
  const int nPlainTextLength = (int)strlen(pPlainText);

  //allocate space for our cipher text and recovered plain text
  unsigned char *pCipherText = new unsigned char[nPlainTextLength];
  unsigned char *pRecoveredPlainText = new unsigned char[nPlainTextLength+1];

  //print out our plain text
  printf("Plain Text = \"%s\"\n", pPlainText);

  //encrypt the plain text
  EncryptData((const unsigned char *)pPlainText,nPlainTextLength,pCipherText,(const unsigned char *)pKey,(int)strlen(pKey),true);

  //print out the cipher text as hex digits
  printf("Cipher Text = 0x");
  for(int nIndex = 0; nIndex < nPlainTextLength; ++nIndex)
    printf("%02x", pCipherText[nIndex]);
  printf("\n");

  //decrypt the cipher text to recover the plain text
  EncryptData(pCipherText,nPlainTextLength,pRecoveredPlainText,(const unsigned char *)pKey,(int)strlen(pKey),false);

  //print out the recovered plain text after we null terminate it
  pRecoveredPlainText[nPlainTextLength] = 0;
  printf("Recovered Plain Text = \"%s\"\n", (const char *)pRecoveredPlainText);

  //free the memory we allocated
  delete[] pCipherText;
  delete[] pRecoveredPlainText;
}

Common Algorithms

Some commonly used symmetric key encryption algorithms in use today are AES, Blowfish and 3DES.

Until Next Time!

That’s it for symmetric key algorithms, next up I’ll be talking about asymmetric key algorithms, which have some pretty interesting uses.

Cryptography 101: Encryption – One Time Pad

This is the second article in a series on the basics of cryptography:

Plaintext, Ciphertext and Keys

When talking about encryption, you’ll often hear two terms: Plaintext and Ciphertext.

The plaintext is the unencrypted data and may be either text or binary data.

Ciphertext is the encrypted data.

Ideally, the ciphertext will give no information about the nature of the plaintext that created it, other than perhaps the size of the plaintext itself.  Good ciphertext will look indistinguishable from random numbers, both by the human eye and mathematically.

This is because the point of encryption is to hide any patterns in the data, and good encryption will hide all discernible patterns. The only possible exception to this would be if the encryption process made ciphertext with misleading patterns that didn’t give any information about the plaintext.  I’m not sure if this comes up in practice, but it definitely could.

Another term you’ll hear often is “Keys”. A key is just the data that you encrypt or unencrypt data with. You can think of it as the password.

The One Time Pad

The one time pad is an extremely simple, yet secure way of encrypting data.

It is so simple that it only uses the xor operation, and is so secure that the ciphertext is literally uncrackable if done correctly.

The downside is that it requires a lot of pre-shared data which gets used up as you encrypt data. When you run out, you have to share more of this data if you want to keep communicating with that person.

This pre-shared data is the key used for encryption and unencryption.


To use a one time pad, you first gather a large amount of random data and share that with the person you want to communicate securely with. This is the one time pad itself and you’ll want one byte of random data for each byte of information you want to send to that person.  This step is also the crux of the security.  You need to make sure that nobody else is able to get the one time pad except your intended target, and you also need to ensure that you have high quality random data (more on that later on).

To encrypt data, you take one byte from the one time pad for each byte of data you want to encrypt and XOR them together. When you are done, you throw away the used bytes of the one time pad and never use them again.

Then, you send the ciphertext to the person you already pre-shared the one time pad data with.

To decrypt the data, that person xors each byte of the encrypted data with a byte of the one time pad, and they also throw away each byte used of the one time pad just like you did.

When they are done decrypting, they will have the plaintext data, and their one time pad will be in the same state that yours is in (ie their next number will be your next number).

From here you can rinse and repeat until you run out of one time pad data.

Super simple, and as long as nobody else has your one time pad data, and your one time pad data is truly random, nobody will be able to crack your ciphertext and get the plaintext.
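
The steps above can be sketched in a few lines of C++.  The OneTimePad struct and ApplyPad function are hypothetical names for this example, and the pad bytes in the demo are just made up sample values; a real pad needs truly random data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: the pre-shared pad plus a cursor marking how much is used up.
struct OneTimePad
{
  std::vector<unsigned char> bytes; // pre-shared random data
  std::size_t nextUnused = 0;       // everything before this index is spent
};

// XORs data against the next unused pad bytes and marks those bytes as used.
// The same call both encrypts and decrypts, since A XOR B XOR B = A.
std::vector<unsigned char> ApplyPad(OneTimePad &pad, const std::vector<unsigned char> &data)
{
  assert(pad.bytes.size() - pad.nextUnused >= data.size()); // never reuse or run past the pad
  std::vector<unsigned char> out(data.size());
  for (std::size_t i = 0; i < data.size(); ++i)
    out[i] = static_cast<unsigned char>(data[i] ^ pad.bytes[pad.nextUnused++]);
  return out;
}

void DemoOneTimePad()
{
  // Both sides start with identical copies of the pad (sample values, NOT random).
  OneTimePad senderPad;
  senderPad.bytes = {0x3A, 0x91, 0x5C, 0x07, 0xE2, 0x48};
  OneTimePad receiverPad = senderPad;

  const std::vector<unsigned char> message = {'h', 'i'};
  const std::vector<unsigned char> cipherText = ApplyPad(senderPad, message);

  // The receiver consumes the same pad bytes and ends up in the same pad state.
  assert(ApplyPad(receiverPad, cipherText) == message);
  assert(senderPad.nextUnused == receiverPad.nextUnused);
}
```

The cursor is what keeps the two sides in step: as long as both only ever move forward through the pad, the next byte on one side is the next byte on the other.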

The Importance of Randomness

Besides securely transmitting the random data, the other crux of the security I mentioned was the quality of the random numbers in your one time pad.

The reason this is important is because if the numbers aren’t truly random, there will be patterns in the data. If there are patterns in the data, people can possibly discover those patterns, and so be able to separate the plaintext from the key and unencrypt some or all of your data.

Randomness comes up EVERYWHERE in cryptography, both in input and output to cryptographic algorithms. Because of this, truly random data is often somewhat of a commodity to cryptographers. Since re-using random data means that it’s slightly less secure (would-be attackers have a pattern to gain knowledge with if you re-use your random numbers!), it’s also a consumable commodity!

In fact, there are famous books that are nothing but hundreds and hundreds of pages of random numbers generated from various real world sources – such as taking the wind speed over time in Juneau, Alaska and multiplying it by static gathered from a radio antenna which is tuned to dead air. Using real world data like that, people can be relatively sure that the data doesn’t have any discernible patterns. They just have to watch out for those pesky physicists unlocking the nature of the universe and finding the patterns in the background radiation 😛

I’m not even joking about these books by the way, check this out, here’s one such book!
A Million Random Digits with 100,000 Normal Deviates

Using random numbers from a published book makes your random numbers slightly less random (since other people have the book too, and attackers may notice it on your bookshelf or something), but so long as you don’t just use the numbers of the first or last pages (or anything else predictable), and the book actually contains high quality random numbers, it ought to be fine.

You can also BUY large amounts of high quality random data online from places like

The astute reader might ask “Why don’t I just use a pseudo random number generator on each side and never run out of one time pad data?”.

Well, if someone knows the PRNG you are using, and your seed, they would be able to unencrypt your data just like your intended target can.
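
To make that concrete, here is a small sketch where the “pad” comes from a PRNG instead of pre-shared random data.  It uses std::mt19937 from the C++ standard library, which is NOT cryptographically secure; it is only here to show that the seed becomes the entire secret.  The function names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>

// Encrypts or decrypts by XORing each byte against the low 8 bits of the
// next std::mt19937 output. Same seed in, same stream out.
std::string PrngStreamXor(const std::string &data, std::uint32_t seed)
{
  std::mt19937 generator(seed);
  std::string out = data;
  for (std::size_t i = 0; i < out.size(); ++i)
  {
    const unsigned char padByte = static_cast<unsigned char>(generator() & 0xFF);
    out[i] = static_cast<char>(static_cast<unsigned char>(out[i]) ^ padByte);
  }
  return out;
}

void DemoSeedIsTheSecret()
{
  const std::string plainText = "meet at noon";
  const std::string cipherText = PrngStreamXor(plainText, 12345u);

  // Anyone who learns the PRNG and the seed, not just the intended target,
  // can regenerate the same stream and recover the text.
  assert(PrngStreamXor(cipherText, 12345u) == plainText);
}
```

A 32 bit seed like this one could also simply be brute forced, which is one more reason a plain PRNG makes a poor pad.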

HOWEVER, this kind of setup can be appropriate sometimes if you know the risks and are OK with them. Check out this Wikipedia page for more information:
Cryptographically Secure Pseudorandom Number Generator

Specific Attack Against Randomness

As an extreme example, let’s say that instead of random numbers, your one time pad data is all 0xFFFFFFFF and that you are using it to encrypt a text file (say, this article for instance).

When you encrypted your data by XORing each byte against 255 (0xFF), all the bits of each byte would be flipped from 0 to 1 or 1 to 0.

While it’s true that it would make the data unreadable, and seemingly random, garbage data to the human eye, mathematically it’s a very different story.

If someone were analyzing your ciphertext, they would first notice that the byte value 154 (which looks like Ü and has a binary value of 10011010) occurs in the ciphertext roughly as often as the letter ‘e’ appears in a typical English language text document. That would be an astute observation, because 154 is just the flipped bits of ‘e’, which has a byte value of 101 and a binary value of 01100101 (the bits are flipped due to the XOR against 0xFF).
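To make the bit flipping concrete, here is a minimal sketch of that degenerate "all 0xFF" pad. The function names XorWithFF and CountByte are made up for this illustration. It shows that ‘e’ (byte value 101) always becomes 154, so the letter frequencies of the plaintext survive untouched in the ciphertext.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// "Encrypts" text with a degenerate one time pad whose every byte is 0xFF.
// XORing against 0xFF simply flips every bit of every byte.
std::vector<std::uint8_t> XorWithFF(const std::string& text)
{
    std::vector<std::uint8_t> out;
    out.reserve(text.size());
    for (unsigned char c : text)
        out.push_back(static_cast<std::uint8_t>(c ^ 0xFF));
    return out;
}

// Counts how many times a given byte value occurs in a buffer.
// An attacker would build a histogram like this for all 256 byte values.
std::size_t CountByte(const std::vector<std::uint8_t>& data, std::uint8_t value)
{
    std::size_t count = 0;
    for (std::uint8_t b : data)
        if (b == value)
            ++count;
    return count;
}
```

For example, the text "meet me here" contains five ‘e’ characters, and CountByte(XorWithFF("meet me here"), 154) also comes out as five.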

Then, they may notice the same for other letters… that some other value occurs as often as you’d expect an ‘o’ to appear in english, or an ‘m’ etc.

Pretty soon they have a clear picture that this is english plaintext, and they can start replacing letters with what they seem like they should be statistically (for the statistically significant letters).

After that, they have some of your plain text, and figuring out the rest is similar to playing sudoku… figuring out which letters fit where, based on how words are spelled, and then doing a find / replace in the entire document for each letter you figure out.

In the end, they have your plaintext and your encryption failed you.

This is an extreme case that is really simple to break, but hopefully you can see that if you use even slightly lower quality random numbers (such as the built in rand() function of C++, whether or not you seed it with srand(time(0))!) you open yourself up to attack and it can compromise your whole communication stream.

Requiring Less Pre-Shared Data

You can modify the one time pad algorithm to use less pre-shared data if you are ok with the changes in your security profile (your data may be weaker against some attacks, stronger against others).

There are many ways to skin a cat but I’ll just talk about a couple.

One way would be to generate more random data from the random data you do have. For instance, if you and the person you are pre-sharing data with agree on a protocol of MD5 hashing every 100 bytes of one time pad data to generate more random bytes that you can interleave with your one time pad data, you would have a way of generating 16% more one time pad data than what you gathered or shared with the other person. (16% more because MD5 hashes of 100 byte blocks spit out 16 byte hashes of seemingly random numbers – see the previous article on hashing for more information!).

However, doing this obviously makes the “random” data *somewhat* lower quality since there is a pattern to some of the random data. As non obvious as that pattern may be, if someone were to do fancy mathematical analysis of the data, this sort of technique may cause patterns to crop up which lead to a “chink in the armor” giving the attacker a foothold in recovering all or some of the plaintext.

Another way of making your one time pad go farther is instead of XORing the one time pad data against the plaintext and ciphertext to encrypt and unencrypt, you can use the one time pad to give you the keys (passwords) to encrypt / decrypt each communication.

For instance, if you and the person you are communicating with agree in advance on a symmetric key encryption algorithm (more on this topic in the next article!) that takes a 16 byte encryption key, you could use every 16 byte block of one time pad data for an entire single message no matter how large the message is.

For instance, you could encrypt 2GB of data using the first 16 bytes of a one time pad, send that to the person, then you encrypt 500MB with the next 16 bytes and send that to the person.

You’ve effectively used 32 bytes of your one time pad to encrypt 2.5GB of data, which is a crazy good ratio compared to the traditional one time pad protocol which would have required 2.5GB of pre-shared one time pad random data.
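Here is a rough sketch of the bookkeeping for that scheme, assuming 16 byte keys as in the example above. PadKeySource and Key128 are hypothetical names, and the symmetric key algorithm that would actually consume each key is deliberately left out.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

using Key128 = std::array<std::uint8_t, 16>;

// Hands out successive 16 byte keys from pre-shared one time pad data.
// Each key is meant to encrypt exactly one message and is never reused.
class PadKeySource
{
public:
    explicit PadKeySource(std::vector<std::uint8_t> pad) : m_pad(std::move(pad)) {}

    // Returns the next unused 16 byte block of the pad and marks it as used.
    Key128 NextKey()
    {
        if (m_offset + 16 > m_pad.size())
            throw std::runtime_error("one time pad exhausted");

        Key128 key;
        for (std::size_t i = 0; i < 16; ++i)
            key[i] = m_pad[m_offset + i];
        m_offset += 16;
        return key;
    }

    // How much pad data has been consumed so far.
    std::size_t BytesUsed() const { return m_offset; }

private:
    std::vector<std::uint8_t> m_pad;
    std::size_t m_offset = 0;
};
```

Both sides would need to call NextKey() once per message, in the same order, so that they stay on the same 16 byte block.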

If you go this route, your ciphertext now becomes vulnerable to whatever attacks your symmetric key encryption algorithm is vulnerable to though. If the algorithm you are using turns out to have a serious flaw that mathematicians find out about (such as there’s a really easy way to recover the plaintext – this happens fairly often believe it or not!), your whole communication channel is screwed, whereas with the one time pad, it’s just the quality of your random numbers, and the security of your pre-shared data that define the security. So, there are definitely pros and cons to weigh.

Other Weaknesses

There are a lot of ways to attack each cryptographic technique, and if you are serious about cryptography you really need to read up on a lot of things and be extremely clever, thinking of every possible situation that anyone else might think of.

Security is hard because often times you have a limited amount of time to implement your security (because you need to ship your software or open your service to the public SOME DAY), and there are most certainly more attackers than there are security professionals on your team, and they have all the time in the world to search for what you’ve missed! Just as there is no rest for the wicked, the same too is true for security professionals.

I mentioned that the quality of your random numbers and the security of your pre-shared data were the lynchpin of protecting against people getting your plaintext from your ciphertext, but there is another way to attack the communication channel as well.

Namely, if someone were to intercept a message between you and your target person, they may not be able to get your plaintext out, but if they can keep that message from getting to your target, and do so in a way that you aren’t aware of this, they can completely break your communication channel.

The reason for this is that doing this makes the one time pads of you and your target person get out of sync when you throw away one time pad data that the target person did not throw away. This means that the random numbers you are using to encrypt your data are not the same numbers your target person is decrypting data with, so they will get garbage, random data as output and not be able to recover the plaintext.

A malicious person in the middle was able to thwart your ability to communicate securely!

Also, if a person was able to modify the ORDER that the target person got the encrypted messages in, they would be able to break the channel that way as well (at least temporarily) since it would make the receiving person decrypt the messages with the wrong pieces of data. The next message the person got would decrypt correctly in this case though, since the same number of bytes were used up by the out of order messages as if they had come in the right order.

This is not the traditional man in the middle attack, but it is definitely *A* man in the middle attack.

As with so many things, there are often strange, non obvious connections between different subjects.  Case in point, one way to protect against these sort of attacks of lost or re-ordered messages would be to implement the sorts of algorithms used in network programming (like those used in TCP/IP) that ensure “guaranteed” and “in order” communication between two computers or individuals.

Going this route, just like how computers on the internet can know when they got message B but haven’t received message A yet, or that when they sent a message to another person that it never got there, you too would be able to know if a message got to the target, and they would be able to know if they have received messages out of order or not.
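As a very small sketch of the receiving side of that idea, assume each message carries a sequence number alongside the ciphertext. SequenceTracker and Arrival are hypothetical names, and this ignores acknowledgements, resends and number wraparound, which real protocols like TCP also handle.

```cpp
#include <cstdint>

// How an incoming message relates to what the receiver expected next.
enum class Arrival
{
    InOrder,     // exactly the message we were waiting for
    AlreadySeen, // a repeat of something we already processed
    Early        // at least one earlier message is missing or delayed
};

// Tracks sequence numbers so the receiver can tell when a message was
// lost or re-ordered, instead of silently decrypting with the wrong pad data.
class SequenceTracker
{
public:
    Arrival OnReceive(std::uint32_t sequence)
    {
        if (sequence == m_expected)
        {
            ++m_expected;
            return Arrival::InOrder;
        }
        if (sequence < m_expected)
            return Arrival::AlreadySeen;
        return Arrival::Early;
    }

    // The sequence number of the next message we expect to get.
    std::uint32_t Expected() const { return m_expected; }

private:
    std::uint32_t m_expected = 0;
};
```

When a message arrives Early, the receiver could hold on to it (and leave its one time pad position alone) until the missing messages show up.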

Until Next Time!

That’s the essence of the one time pad and I hope you found it interesting!

Next up I’ll be talking about symmetric key algorithms, which are the more traditional way of encrypting where you use a password to protect data.

For those interested in cracking encrypted data (which technically is against the DMCA these days, but used to be a common academic activity, and a way of weeding out insecure algorithms), here’s a nice morsel for you. It’s hexadecimal encoded encrypted data.  Every 2 hex characters equals one byte of encrypted data. If you use the information from the article, you ought to be able to crack it (there’s an easy way and a hard way).

And no, cracking the encrypted data below is not even technically against the law, I’m giving you explicit permission to crack it if you can (:


Cryptography 101: Hashing

Welcome to the first article in a series aimed to teach the basics of cryptography:

In this digital age, cryptography is more important than ever.  It’s used to protect financial transactions, ensure the anonymity of political dissidents, protect private conversations, help prevent cheating in video games and many other things as well.

DISCLAIMER: These articles are meant for educational purposes only.  The methods explained here are meant only to illustrate the basic concepts of cryptography and may or may not be suitable in real world applications.  For serious applications such as financial transactions, I recommend hiring security professionals and also getting a lawyer involved.  Use this info and code at your own risk, I claim no responsibility!

If you want more in depth information about cryptography than these introductory articles provide, I highly recommend a book called Applied Cryptography by Bruce Schneier. That book literally almost didn’t get published because the NSA didn’t want the info getting out into the public. Yay for the 1st amendment!

Hashing in Computer Science

You may be familiar with the term “hashing” from computer science. In computer science, a hash function is a function which takes input data, performs some operations on it and spits out some (usually) smaller output data that can be used more or less as a unique identifier for the input data.

Common uses of hashing include:

  • Hashing the contents of large files to be able to compare the hashes to quickly know if they are the same or different, instead of having to compare the files byte by byte.  This is especially useful when comparing files over a network connection.
  • Hashing pieces of data which are difficult or time consuming to compare (such as strings) and using the hashed value as a “look up key” within a database or array or list, so that you can look up items very quickly by their hash, instead of having to do more expensive string compares (or whatever other more complex comparison and lookup methods).

More info is available here:

Hashing in Cryptography

As you can probably guess, since hashing makes large pieces of data (such as entire files) into small pieces of data (often only a handful of bytes large), there are many pieces of larger source data that can result in the same smaller hashed data.  When this happens, it’s called a hash collision and a good hash function will do its best to minimize collisions for optimal performance.  The more output bits you have, the more “space” you have before a collision is unavoidable.

Often times, a good hashing algorithm will have 2 properties to minimize collisions…

  1. Small changes in input give large changes in output.  In other words, it’s very sensitive to initial conditions and so is a chaotic function.
  2. If you give it a set of well distributed random inputs, it should give a well distributed set of random outputs.  Heck, if you give it any set of (varying) inputs, it should give a well distributed set of random outputs ideally.  By random I mean no discernible patterns.

If these things aren’t true, the hashed output can give clues as to the nature of the input, or, it can make it easier to provide input that hashes to the same output (which is the main way to attack hash based security).

For instance, if your hashing algorithm made an 8 bit hash (very small!) that always (or often) set the 7th and 8th bits to 1, that means effectively you really have a 6 bit hash, because 2 of the bits are almost always the same. In general, more bits means more security, since it’s harder to get a hash collision on purpose.

Quick aside, can you think of something else with these properties?  Some deterministic algorithm that spits out chaotic, well distributed, seemingly random numbers based on (perhaps) non random input?  How about a pseudo random number generator?  There is a lot of crossover between these two types of algorithms and I find it pretty neat that they are working towards almost the same goals, but that they are used for such different things.

Also as you might guess, hashes are one way.  If you are given hashed data, it’s difficult or impossible to work backwards and get the source data back again.  In fact, you often hear hash functions referred to as “one way hash functions” because of this.  This is important because in cryptographic uses you want a hash to reveal as little information about the source data as possible.

Example Uses of Cryptographic Hashing

Here are two examples of places where hashing comes in handy. One is for protecting passwords, and the other is for protecting save game data of video games.

Protecting Passwords

For protecting passwords, many times when you have a large online service such as facebook, youtube, etc, there will be a central database (cluster) storing everyone’s account information, including their passwords.

If someone were to hack a server and get access to the user database table, they would have everyone’s username and password and the users would be screwed.

A way that people address this is to store the HASH of each password in the database table instead of the password itself. When people log in, the server takes the password that it received from the user, puts it through the hash algorithm, and compares it to the hash stored in the database. If the hashes match, they know (can assume with a good level of certainty) that the user is who they claim to be. However, if this server gets hacked and the database is compromised, the attacker won’t have the passwords, they will only have the hashed passwords.  The attacker will have to try and brute force the hashes, which is essentially the same as having to brute force the password – except that they can do it on their own computer in their own time of course, which makes it easier and untraceable unfortunately.  Hopefully if this happens, the service can tell their users to change their passwords before the attacker is able to crack many of the logins.
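The login flow described above might look something like this minimal sketch. ToyHash is FNV-1a, a simple NON-cryptographic hash that is only standing in for a real password hashing algorithm here, and UserRecord, CreateUser and CheckLogin are made up names for illustration.

```cpp
#include <cstdint>
#include <string>

// FNV-1a: a simple, NON-cryptographic hash used purely as a stand-in.
// A real service would use a hash designed for protecting passwords.
std::uint64_t ToyHash(const std::string& data)
{
    std::uint64_t hash = 0xcbf29ce484222325ull; // FNV offset basis
    for (unsigned char c : data)
    {
        hash ^= c;
        hash *= 0x100000001b3ull;               // FNV prime
    }
    return hash;
}

// What gets stored in the user database table.
struct UserRecord
{
    std::string username;
    std::uint64_t passwordHash; // the password itself is never stored
};

// Called when an account is created or a password is changed.
UserRecord CreateUser(const std::string& username, const std::string& password)
{
    return UserRecord{username, ToyHash(password)};
}

// Called at login: hash what the user typed and compare against the stored hash.
bool CheckLogin(const UserRecord& record, const std::string& attemptedPassword)
{
    return ToyHash(attemptedPassword) == record.passwordHash;
}
```

If the database leaks, the attacker gets passwordHash values rather than passwords, which is the whole point of the scheme.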

Protecting Save Game Data

For protecting save game data, hashes are used in conjunction with encryption to prevent both read and write access to the save game data.

To write this protected save game data, you first hash the unencrypted save game data, and write that to the front of the file.  Next, you encrypt the save game data and write that after the hash.

When reading save game data, you read in both the hash and the encrypted save game data.  Next you unencrypt the save game data and hash it.  Then, you can compare the hash you made with the hash stored in the file and if they don’t match, you know that someone tried to tamper with the file and you can consider it invalid / corrupt.

Also, since the save game data is encrypted, it’s difficult for a user to read the data in your save game data.  Thus you protect the file from both reading and writing.

It’s possible that the person could modify the data in such a way that it will unencrypt and then hash to the same hash value stored in the beginning of the file, but it’s extremely unlikely, and it’s even less likely that doing so will result in something favorable for the attacker.  They can’t even be sure they are increasing a value thanks to the encryption function scrambling the data completely.
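Here is a rough sketch of the write and read steps described above. Both ToyHash (FNV-1a) and ToyCrypt (XOR against a repeating key) are weak stand-ins chosen only to keep the example short; they are assumptions for illustration, not what a shipping game should use, and the 8 byte hash header layout is likewise made up.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// FNV-1a, a NON-cryptographic stand-in for a real hash function.
std::uint64_t ToyHash(const Bytes& data)
{
    std::uint64_t hash = 0xcbf29ce484222325ull;
    for (std::uint8_t b : data)
    {
        hash ^= b;
        hash *= 0x100000001b3ull;
    }
    return hash;
}

// Stand-in "encryption": XOR against a repeating, non-empty key.
// Running it a second time with the same key restores the original data.
Bytes ToyCrypt(const Bytes& data, const Bytes& key)
{
    Bytes out(data.size());
    for (std::size_t i = 0; i < data.size(); ++i)
        out[i] = static_cast<std::uint8_t>(data[i] ^ key[i % key.size()]);
    return out;
}

// File layout: 8 byte hash of the UNENCRYPTED data, then the encrypted data.
Bytes WriteSave(const Bytes& saveData, const Bytes& key)
{
    const std::uint64_t hash = ToyHash(saveData);
    Bytes file;
    for (int i = 0; i < 8; ++i)
        file.push_back(static_cast<std::uint8_t>(hash >> (8 * i)));
    const Bytes encrypted = ToyCrypt(saveData, key);
    file.insert(file.end(), encrypted.begin(), encrypted.end());
    return file;
}

// Returns true and fills saveData only if the stored hash matches the
// hash of the decrypted data. Otherwise the file is treated as tampered.
bool ReadSave(const Bytes& file, const Bytes& key, Bytes& saveData)
{
    if (file.size() < 8)
        return false;
    std::uint64_t stored = 0;
    for (int i = 0; i < 8; ++i)
        stored |= static_cast<std::uint64_t>(file[i]) << (8 * i);
    const Bytes decrypted = ToyCrypt(Bytes(file.begin() + 8, file.end()), key);
    if (ToyHash(decrypted) != stored)
        return false;
    saveData = decrypted;
    return true;
}
```

Note that the XOR stand-in does not scramble the data the way a real encryption function would, so it only demonstrates the file layout and the hash check.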

Hashing Algorithm Overview

In a nutshell, besides all the stuff we talked about above, a hashing algorithm is just a deterministic algorithm (meaning it acts the same way every time, no randomness) that takes some input, chews it up, and spits out some (often) smaller piece of data to represent it. When chewing it up, it can do things that are destructive to the data (such as integer division, which loses precision) and isn’t just limited to non destructive operations like encryption algorithms are (non destructive operations can be reversed, such as XOR, addition, subtraction, bit rotations).

As an extra piece of security, people often “SALT” their hashes which means they hash some constant before hashing whatever data they want to hash. This constant is called the salt and you can think of it kind of like a password. This way, even if someone knows what algorithm you are using to hash data (such as the popular MD5 or SHA-1 hash functions), they’d also have to know the salt used to more effectively attack the system.  It’s a little extra bit of security, which is always nice.

Example Hash Function

Here’s an example hash function in C++.   Again, note that this is not really fit for real world use or important situations, it’s just for educational purposes.  You’d want to do more “chewing” and use different operations, bit rotations to make sure all the bits got “hit” by the xor’s, etc.  Check out some more complex, real world hashing algorithms for more info!

//assuming sizeof(int) == 4
typedef unsigned int uint32;

//Takes a pointer and length so you can hash binary data as well as text
//note that this function as is won't give the same answers on machines with different endian-ness
uint32 Calculate4ByteHash(const unsigned char *pData, int nDataLength, const unsigned char *pSalt, int nSaltLength)
{
  //setup some variables
  uint32 nHash = 0;
  unsigned char *pHashPointer = (unsigned char *)&nHash;

  //salt the hash: xor each salt byte into one of the 4 hash bytes
  for(int nIndex = 0; nIndex < nSaltLength; ++nIndex)
    pHashPointer[nIndex%4] = pHashPointer[nIndex%4] ^ pSalt[nIndex];

  //hash the data: xor each data byte into one of the 4 hash bytes
  for(int nIndex = 0; nIndex < nDataLength; ++nIndex)
    pHashPointer[nIndex%4] = pHashPointer[nIndex%4] ^ pData[nIndex];

  return nHash;
}
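Here is a small usage sketch for the function above, with the function repeated so the snippet stands on its own. HashText is a hypothetical helper added only for convenience. It also shows why more “chewing” is needed: since bytes are only XORed into place, swapping two 4 byte blocks of the input gives the exact same hash.

```cpp
#include <cstring>

//assuming sizeof(int) == 4
typedef unsigned int uint32;

// The example hash from the article: xors salt bytes, then data bytes,
// into the 4 bytes of the hash.
uint32 Calculate4ByteHash(const unsigned char *pData, int nDataLength, const unsigned char *pSalt, int nSaltLength)
{
  uint32 nHash = 0;
  unsigned char *pHashPointer = (unsigned char *)&nHash;

  for(int nIndex = 0; nIndex < nSaltLength; ++nIndex)
    pHashPointer[nIndex%4] = pHashPointer[nIndex%4] ^ pSalt[nIndex];

  for(int nIndex = 0; nIndex < nDataLength; ++nIndex)
    pHashPointer[nIndex%4] = pHashPointer[nIndex%4] ^ pData[nIndex];

  return nHash;
}

// Hypothetical convenience wrapper for hashing null terminated text with a text salt.
uint32 HashText(const char *pText, const char *pSalt)
{
  return Calculate4ByteHash((const unsigned char *)pText, (int)strlen(pText),
                            (const unsigned char *)pSalt, (int)strlen(pSalt));
}
```

With this, HashText("hello", "") and HashText("hello", "salt") give different values, but HashText("abcdefgh", "salt") and HashText("efghabcd", "salt") collide, which is an easy collision for an attacker to exploit.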

Rainbow Tables

Assuming the algorithm meets the criteria above, the only real way to attack something secured by hashing (besides asking the nice receptionist for the secret info while batting your eyelashes) is to brute force hash a bunch of values until you find something that gives the same hash as what you are looking for.

Unfortunately, there is something called “Rainbow Tables” where people have gone through and created tables of unique hashes and “source data” that results in those hash values for common algorithms.  This way, if for instance, an attacker saw that a hashed value was “3”, and he knew you were using the MD5 algorithm, he could look at an MD5 rainbow table to find the value he could put into your system to result in a hash value of “3” and thus he’d gain some ground at attacking your security.

Of course, if you salt your hash, he would have to find out your salt value too and perhaps salting would invalidate the rainbow table entirely (depending on the algorithms used).  Also, the more bits your output hash contains, the larger a rainbow table would have to be, so if you really wanna screw with would be attackers, make your output bit count larger – it makes their job exponentially harder! (:

Popular Hashing Algorithms

Two common hashing algorithms used for various real world applications are MD5 and SHA-1.  You’ve probably seen them around, especially if you’ve used open sourced software.

Recording lagless demo videos of a laggy game

Often times when developing a game, you’ll want to record a demo video to show to a publisher, show at E3, post on kickstarter, youtube, or other places to help generate interest or gain funding to keep your project going.

Unfortunately, the point in time that you need a video is often in the beginning of the project, when your game probably doesn’t run very fast, or might have performance spikes, making it difficult to get a high quality video capture.

Many times, developers will have to have a performance push to get the game up to speed for a demo video, spending time on “demo hacks”, which are often just throw away code for after the video is made.  I’ve been through a couple of these myself and they are not fun, but they are an unfortunate necessity.

This article will explain a fairly simple technique for getting a full speed recording of your game engine with perfectly synchronized sound, no matter what speed your game actually runs at, saving you time and effort, not having to waste time on demo hacks just to get a presentable video.

Playable demos are a whole other beast,  and you are on your own there, but if a video will suit your needs, you’ve come to the right place!

I’ve used this technique myself in a couple different games during development, and in fact included it as a feature of one PC game I shipped in the past, called “Line Rider 2:Unbound”, so this is also a technique for adding video recording to any game you might want to add it to.

Out of the box solutions

There are various “out of the box” ways to record a video of your game, but they have some downsides which make them not so attractive.

For instance, you can get fraps which will record any application’s audio and video and you could use that to record a video of your game. The downside here is that if your game lags, so does the video, so we still have that problem. Also, the act of recording competes with your game for resources, causing your game to run at an even lower FPS and making an even worse video.  Fraps is also limited to specific platforms, and you may be working on an unsupported platform.

Lastly, if you want to include this feature of video recording in a shipped product, you will have to license fraps for that use, which may be prohibitive to your project’s budget.

Other video recording software has the same or similar issues.

Rolling your own – Video

Making your own video recorder built into your game has some real easy to hit pitfalls that I want to talk about.

When considering only the video portion (not audio yet), our aim is to write all the frames to disk as individual image files (such as png, or raw uncompressed image files), and then after recording is done, use something like ffmpeg to combine the frames into a video. Writing a compressed image file (such as png or jpg) for each frame may save disk space, but may take longer for your computer to be able to process and write to disk. You will likely find that a raw file format is more performant, at the cost of increased disk space usage, but hard drives are cheap these days and everyone has huge ones. Also, at this point you probably want to use lossless image compression (such as png, or a raw image file) for your screen captures so that you don’t have compression artifacts in your screen captures. When you make a final video, you may choose a more highly compressed video format, and it may introduce its own artifacts, but you want to keep your source files as clean as possible so that you don’t introduce UNNECESSARY artifacts too early in the process.

If you dump each rendered frame to disk, the disk i/o can drag your game’s frame rate down to a crawl.  You might think about having the disk write happen on another thread so the game isn’t limited by the disk i/o, but then you’ll have to keep a buffer of previous frames which will grow and grow and grow (since you are making frames faster than it can write to disk) until you run out of memory.  It’s a losing battle for longer videos.

You could get a faster drive, configure a striped raid array, use a ram disk, or things like that, but why fix with hardware what you can fix in software?  Save yourself and your company some cash.

Similarly to the fraps problem, recording video will likely lower your game’s frame rate as well. Assuming you are using variable frame rate logic, frames will be skipped, so you either get a “laggy” looking video as output, or a video that appears to speed up in the places where you encountered lag while recording, which is very odd looking and definitely not demoable.

The solution (which might be really obvious to the astute reader) is to make your game run your game’s logic at a fixed rate, instead of making it be based on frame time.  For instance, instead of measuring the time between frames and using that delta to control logic (making things move farther when more time has passed etc), you just make your game act as if the same amount of time has always passed between your frames, such as ~16ms for a 60fps recording, or ~33ms for a 30fps recording.  IMPORTANT: make it only behave this way when in “recording mode”.  You don’t need to sacrifice variable frame rate logic just to get the ability to record nice videos.  Also, 30fps is fine for a video.  The more FPS your video has, the larger the video file will be.  Movies and TV are something like 24 fps, so you don’t need a 60 fps video for your game demo, 30 or less is just fine.

This way, it doesn’t matter how long it took to render each frame, the game will generate a sequence of frames at whatever frame rate you would like your video to be in.  While recording the demo video, the game may run slowly, and be difficult to control if it’s REALLY laggy, but at least the output video will be smooth and perfectly lagless.   Later in this article I present a possible solution to the problem of difficulty playing the game while recording.
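The core of that idea fits in a few lines. This is just a sketch, and GetLogicDeltaSeconds and its parameters are hypothetical names rather than anything from a particular engine.

```cpp
// Returns the amount of time (in seconds) the game logic should be stepped by this frame.
//   recordingMode        - true while capturing frames for a video
//   measuredDeltaSeconds - the real time that passed since the last frame
//   recordingFps         - the frame rate of the video being recorded (e.g. 30)
double GetLogicDeltaSeconds(bool recordingMode, double measuredDeltaSeconds, double recordingFps)
{
    // While recording, lie to the game: pretend exactly one video frame of
    // time has passed, no matter how long the frame really took to render.
    if (recordingMode)
        return 1.0 / recordingFps;

    // Otherwise keep the normal variable frame rate logic.
    return measuredDeltaSeconds;
}
```

In the main loop you would measure the frame time as usual, pass it through this function, and hand the result to your update code, so a frame that took half a second to render still only advances the game by 1/30th of a second while recording.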

Are we done at this point?  NO!  We haven’t talked at all about audio, and as it turns out our audio is in a very odd state going this route.

Rolling your own – Audio

From the section above, we have a nice lagless video stream, but if we just recorded audio as it went, the audio would be out of sync with the frames. This is because we recorded audio in real time, but we recorded the frames in variable time.

You could try to sync the audio in the right places with each frame, but then you’d have to speed up and slow down portions of your audio to hit the right frame numbers, which would make your audio sound really weird as it sped up and slowed down and changed pitch.

Definitely not demoable! So what’s the solution?

The solution is that while you are recording your video frames, you also make an audio timeline of what audio was triggered at which frame numbers.

For instance, if on frame 20, the player swung his sword and on frame 25 hit an exploding barrel, causing it to explode, your timeline would say “at frame 20, play the sword swing sound effect. at frame 25, play the exploding barrel sound effect”.

I’ve found it really easy to capture an audio timeline by hooking into your game or engine’s audio system itself, capturing all sound events.  It usually is not very difficult to implement this part.
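A minimal sketch of such a timeline could look like the following. AudioEvent and AudioTimeline are made up names, and the assumption is that your audio system has some central “play sound” function you can hook.

```cpp
#include <string>
#include <vector>

// One entry in the audio timeline: which sound started on which video frame.
struct AudioEvent
{
    int frame;
    std::string soundName;
};

// Records what audio was triggered at which frame number while recording.
class AudioTimeline
{
public:
    // Call once per rendered frame while in recording mode.
    void AdvanceFrame() { ++m_currentFrame; }

    // Call from the audio system's "play sound" hook.
    void OnSoundTriggered(const std::string& soundName)
    {
        m_events.push_back(AudioEvent{m_currentFrame, soundName});
    }

    const std::vector<AudioEvent>& Events() const { return m_events; }

private:
    int m_currentFrame = 0;
    std::vector<AudioEvent> m_events;
};
```

For the sword and barrel example, the timeline would end up holding two events, one at frame 20 and one at frame 25. A real version would probably also store things like volume and panning for each event.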

After you have recorded all of your video frames, and have an audio timeline, the next step is to re-create the audio from the timeline, which means you need a way of doing “offline” audio mixing.

If you are using an audio library, check the documentation to see if it has an offline mode, many of them do, including the ever popular fmod.  If your audio library can’t do it for you, there are various command line tools and audio libraries out there that can do this for you.  I believe portaudio (port mixer?) can do this for you, and also another open sourced program called sox.

What you need to do is render each item in the audio timeline onto a cumulative audio stream.  If your video were a 30fps video, and a 500ms sound effect happened at frame 93, that means that you know this sound effect started at 3.1 seconds in (frame 93 * 33.33 milliseconds per frame) and lasts until 3.6 seconds (since it’s 500ms long).   So, you’d mix that into the output audio stream at the appropriate point in time, and then rinse and repeat with the rest of the audio timeline items until you had the full audio stream for the video.
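If you end up doing the mixing yourself, the heart of it is just adding samples together at the right offset. Here is a bare bones sketch assuming mono floating point samples that are already at the output sample rate. TimelineSound, FrameToSampleIndex and MixTimeline are hypothetical names, and clipping, volume and resampling are all ignored.

```cpp
#include <cstddef>
#include <vector>

// A timeline entry with its sound data already loaded (mono, output sample rate).
struct TimelineSound
{
    int frame;                  // video frame the sound started on
    std::vector<float> samples; // the sound effect's audio samples
};

// Converts a video frame number into a position in the output audio stream.
std::size_t FrameToSampleIndex(int frame, double videoFps, int sampleRate)
{
    // frame / fps is the time in seconds. Round to the nearest whole sample.
    return static_cast<std::size_t>(frame / videoFps * sampleRate + 0.5);
}

// Renders every timeline item onto one cumulative audio stream.
std::vector<float> MixTimeline(const std::vector<TimelineSound>& timeline, double videoFps, int sampleRate)
{
    std::vector<float> output;
    for (const TimelineSound& sound : timeline)
    {
        const std::size_t start = FrameToSampleIndex(sound.frame, videoFps, sampleRate);

        // Grow the output (with silence) if this sound runs past the current end.
        if (output.size() < start + sound.samples.size())
            output.resize(start + sound.samples.size(), 0.0f);

        // Mixing is just adding the samples together.
        for (std::size_t i = 0; i < sound.samples.size(); ++i)
            output[start + i] += sound.samples[i];
    }
    return output;
}
```

For the example above, frame 93 of a 30fps video at a 44100 Hz sample rate starts at sample 136710, which is 3.1 seconds in.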

When you are done with this stage, you have your video frames and your audio stream.  With your video creation software (such as ffmpeg) you can combine these into a single video file which shows your game running perfectly at whatever frame rate you specified, and with perfectly synchronized audio.  It’s a beautiful thing and definitely ready to demo to get some funding.


To recap, the steps for creating a perfect video recording of your game are:

  1. When in recording mode, make your game run at a fixed frame rate – no matter how long it really was between frames, lie to your game and tell it that 33.33ms have passed each frame for 30fps video or 16.67ms for 60fps video (or whatever other frame rate you want to run at)
  2. Write each rendered frame to disk as an uncompressed or lossless compression graphics file.
  3. While rendering each frame, build up a timeline of audio events that you can use to re-create the audio later.
  4. After all the frames are captured, render your audio timeline into an audio stream.
  5. After you have your audio stream and each video frame, use software such as ffmpeg to combine them into a perfect, lagless video.

Bonus Points – Or making this feature a shippable feature of your game for players to use

At this point, the final product (the video) is as nice as it can possibly be.  However, the process of actually recording the video can be cumbersome because even though you are making a nice and smooth 30fps video, during recording it may be running at 2fps (depending on your machine) making it very difficult to control the game.  Also, in the final video it will appear that the user is traversing menus, inputting commands, and reacting at superhuman speeds.

A good way to handle this is instead of recording during play, what you do is record all the input that happens during the recording process.  This way you have an input timeline that is tied to frame numbers, the same way the audio timeline is tied to frame numbers.

When the recording process is done, you then put up a nice dialog for the end user saying something like “Rendering video please wait….” with a progress bar, and then re-simulate the user input that occurred during the recording phase, and render all those frames to disk (well, screen capture them as image files just like usual, just don’t display them to the end user).

Since building an input timeline is relatively cheap computationally, you should have no slow down during the “recording” phase of the video while you (or the end user) are actually playing the game.
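As a sketch of what an input timeline could look like, assume the input for a frame can be boiled down to a bit mask of held buttons (a real game likely has analog sticks, mouse positions and so on as well). InputEvent and InputTimeline are hypothetical names, and Record is assumed to be called with increasing frame numbers.

```cpp
#include <cstdint>
#include <vector>

// The input state starting at a given frame, as a bit mask of held buttons.
struct InputEvent
{
    int frame;
    std::uint32_t buttonMask;
};

// Records input against frame numbers during play, and plays it back later
// when re-simulating the game to render the video frames.
class InputTimeline
{
public:
    // Call every frame while the player is playing. Only changes are stored,
    // which keeps the timeline small and cheap to build.
    void Record(int frame, std::uint32_t buttonMask)
    {
        if (m_events.empty() || m_events.back().buttonMask != buttonMask)
            m_events.push_back(InputEvent{frame, buttonMask});
    }

    // Call every frame while re-simulating to get the input for that frame.
    std::uint32_t StateAtFrame(int frame) const
    {
        std::uint32_t state = 0;
        for (const InputEvent& e : m_events)
        {
            if (e.frame > frame)
                break;
            state = e.buttonMask;
        }
        return state;
    }

private:
    std::vector<InputEvent> m_events;
};
```

During the “Rendering video please wait” phase, the game would feed StateAtFrame(frame) into its normal input handling instead of reading the real controller.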

The “Gotcha” here is that your game needs to be deterministic for fixed rate time steps (or at least everything that really matters needs to be deterministic, maybe not particles or something) which can potentially be a bit tricky, but the upside is if you actually make this happen, you can record light weight playbacks as “videos” and have users share these faux-videos with each other to watch playbacks of gameplay that other players had.  When you want to export these playbacks as real videos, you can put it through the regular video recording steps and spit out a full mpeg, suitable for sharing, uploading to youtube (from within the app perhaps even?) just like normal.  But, until you need to use the video outside of your application, you have very small files users can share with each other to view “videos” of in game gameplay.

Final tip: if doing this in Windows, I’ve found that in recent versions of Windows, doing the screen capture using GDI functions instead of DirectX is actually WAY faster so use that if you can.  I’m thinking this must be because Windows already has a screen cap in memory to show those little icons when you mouse over the minimized application or something.

That’s all folks!

That’s all there is to it. With luck this will save some fellow engineers from having to crunch up some “demo hacks” to get performance up for an E3 demo video or the like. If you have any questions or comments, drop me a line (:

DIY Synth 3: Sampling, Mixing, and Band Limited Wave Forms

This is a part of the DIY Synthesizer series of posts where each post is roughly built upon the knowledge of the previous posts. If you are lost, check the earlier posts!

This is the third installment of a series of tutorials on how to program your own synthesizer.

In this chapter we’ll continue on from the last chapter, and talk about a way to generate simple wave forms that don’t have aliasing problems. We’ll also talk about sampling, mixing and end with a somewhat realistic song made with samples and our very own platform independent synthesizer code.

You can download the full source code and source wave files from the link below.  The code got a bit more complex so there’s a zip file instead of a stand alone main.cpp.   Also, it’s not the cleanest, best organized code in the world – sorry about that! – but hopefully it’ll be ok for the purposes of this tutorial (:

DIY Synthesizer: Chapter 3 Source Code

If you don’t want to wait til the end of the chapter to hear the sample song, check it out here:

The Lament Of Tim Curry


As mentioned in the previous tutorial, the wave forms we were generating have aliasing problems. Aliasing is an audio artifact where unintended audio frequencies appear in audio data due to trying to encode frequencies that are too high for the sample rate. Wikipedia describes Aliasing pretty well, check it out for more info: Aliasing.

Sound is pressure waves conducted in the air, and at the core, audio engineers and mathematicians like to think of all sound as being made up of sine waves at different frequencies and amplitudes (volumes).

If you have a smooth / bumpy wave form, you could picture building it up with sine waves pretty easily.

If on the other hand, you have something with sharp corners, like a saw wave, a triangle wave or a square wave, it gets more difficult.

In fact, to make a “perfect corner” out of sine waves, it would take an infinite number of sine waves of ever increasing frequency and ever diminishing amplitude to get the perfectly sharp corner.

In chapter one I briefly mentioned that the maximum frequency you can store in audio data is half the sample rate. This frequency is called the Nyquist frequency and you can read more about it here: Nyquist Frequency and here: Nyquist-Shannon sampling theorem.

Aliasing occurs whenever you try to store a frequency higher than the nyquist frequency. When you do that, your audio data is not what it ought to be (a higher frequency actually appears to be a lower frequency), causing audio artifacts. If you’ve ever seen a car’s wheels spinning too slowly or backwards in a tv commercial, that is the exact same problem.

So, when making a “perfect corner” on a saw, triangle, or square wave, and having to use infinitely high frequencies to make that corner, you can bet that an infinite frequency is above Nyquist, and that it will cause some aliasing.

So, to make band limited wave forms for saw, square, and triangle, we just add together the sine waves UP TO nyquist, and then stop, instead of continuing on to infinity (which would also take far too long to calculate hehe).  That makes a much cleaner, smoother sound, that is a lot easier on the ears.

A friend of mine who wishes to remain nameless has been a good sport in listening to my audio tracks over the years and for a long, long time she would complain that my songs hurt her ears.  I tried putting reverb and flange on my songs to try to mellow them out, and that helped a little, but even then, it still hurt her ears.  After I started using band limited wave forms, my songs stopped hurting her ears and my tones started sounding a lot smoother and richer, and more “professional”.

So, if you don’t want people’s ears to bleed when they hear your tunes, I recommend band limited wave forms!

Band Limited Sine Wave

The sine wave does not have a separate band limited form, since sine itself IS band limited by definition. So, a band limited sine wave is just the sine wave itself.

Onto the next!

Band Limited Saw Wave

Wikipedia has a great article Sawtooth wave which says:

A sawtooth wave’s sound is harsh and clear and its spectrum contains both even and odd harmonics of the fundamental frequency. Because it contains all the integer harmonics, it is one of the best waveforms to use for synthesizing musical sounds, particularly bowed string instruments like violins and cellos, using subtractive synthesis.

What they mean by that (and what the heavy math formulas on that page say) is that if you have a saw wave of frequency 100, that means it contains a sine wave of frequency 100 (1 * fundamental frequency), another of frequency 200 (2 * fundamental frequency), another of 300 (3 * fundamental frequency) and so on into infinity.

The amplitude (volume) of each sine wave (harmonic) is 1 over the harmonic number. So in our example, the sine wave at frequency 100 has an amplitude of 1 (1/1). The sine wave at frequency 200 has an amplitude of 0.5 (1/2), the sine wave at frequency 300 has an amplitude of 0.333 (1/3) and so on into infinity.

After that you’ll need to multiply your sample by 2 / PI to get back to a normalized amplitude.

There’s a function in the sample code called AdvanceOscilator_Saw_BandLimited() that you can use to generate a band limited saw wave sample. It has an optional parameter where you can tell it how many harmonics to use, but if you omit that parameter, it’ll use as many as it can without going over Nyquist.
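To make the recipe above concrete, here is a minimal sketch of the harmonic sum. This is not the sample code’s AdvanceOscilator_Saw_BandLimited(); the function name and the choice to take the phase in radians are my own assumptions for illustration.

```cpp
#include <cmath>

// Hypothetical helper (not from the sample code): evaluates a band limited
// saw at the given phase, in radians, by summing harmonics below Nyquist.
float SawBandLimitedSketch(float fPhase, float fFrequency, float fSampleRate)
{
    if (fFrequency <= 0.0f)
        return 0.0f; // avoid looping forever on a zero or negative frequency

    const float fNyquist = fSampleRate * 0.5f;
    float fSum = 0.0f;

    // harmonic k sits at k * fundamental and has an amplitude of 1 / k
    for (int k = 1; (float)k * fFrequency < fNyquist; ++k)
        fSum += std::sin((float)k * fPhase) / (float)k;

    // scale by 2 / PI to get back to a (roughly) normalized amplitude
    return fSum * 2.0f / 3.14159265f;
}
```

Summed this way the saw ramps downward over each cycle. Negate the result if you want the rising version.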

Here’s how a band limited saw wave looks and sounds compared to a non band limited saw wave, like the ones we created in the last chapter.

Chapter 3 Saw

Chapter 3 Saw Band Limited

Chapter 3 Saw Wave

Band Limited Square Wave

Wikipedia has a good article on square waves too here: Square Wave which says:

Note that the square wave contains only odd-integer harmonic frequencies (of the form 2π(2k-1)f), in contrast to the sawtooth wave and real-world signals, which contain all integer harmonics.

What this means is that if you were trying to make a square wave at frequency 100, unlike a saw wave which has sine waves at frequencies 100, 200, 300, 400 and so on, a square wave is made up of sine waves of frequencies 100, 300, 500, 700 and so on.

Like the saw wave, however, the amplitude of each frequency is the reciprocal of the multiple of the frequency. So, the sine wave at frequency 100 has an amplitude of 1/1, the sine wave at frequency 300 has an amplitude of 1/3, the sine wave at frequency 500 has an amplitude of 1/5.

After that, you need to multiply by 4/PI to get back to a normalized amplitude.

The function to generate this wave form in the sample code is called AdvanceOscilator_Square_BandLimited().
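The same kind of sketch works for the square wave, just skipping the even harmonics. As before, the function name and the radians phase are assumptions of mine, not the signature of the sample code’s function.

```cpp
#include <cmath>

// Hypothetical helper (not from the sample code): band limited square wave,
// built from odd harmonics only, each at an amplitude of 1 / k.
float SquareBandLimitedSketch(float fPhase, float fFrequency, float fSampleRate)
{
    if (fFrequency <= 0.0f)
        return 0.0f; // avoid looping forever on a zero or negative frequency

    const float fNyquist = fSampleRate * 0.5f;
    float fSum = 0.0f;

    // k = 1, 3, 5, ... stopping before we reach Nyquist
    for (int k = 1; (float)k * fFrequency < fNyquist; k += 2)
        fSum += std::sin((float)k * fPhase) / (float)k;

    // scale by 4 / PI to get back to a (roughly) normalized amplitude
    return fSum * 4.0f / 3.14159265f;
}
```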

Here’s how a band limited square wave looks and sounds compared to a non band limited square wave, like the ones we created in the last chapter.

Chapter 3 Square

Chapter 3 Square Band Limited

Chapter 3 square Wave 

Band Limited Triangle Wave

The triangle wave is often used as a cheap approximation of a sine wave so it’s kind of funny making a more expensive (computationally) version of a triangle wave out of sine waves.

The wikipedia article for the triangle wave is here: Triangle Wave and it says:

It is possible to approximate a triangle wave with additive synthesis by adding odd harmonics of the fundamental, multiplying every (4n−1)th harmonic by −1 (or changing its phase by π), and rolling off the harmonics by the inverse square of their relative frequency to the fundamental.

Ok so in English what that means is that a triangle wave is a lot like a square wave, but every other harmonic, we subtract, instead of adding it. Also, instead of the amplitude (volume) of a sine wave being the reciprocal of the multiple of the frequency, the amplitude is the reciprocal of the SQUARE of the multiple of the frequency.

So that means for a 100hz frequency triangle wave, we would…

make a sine wave of 100hz at 1/1 amplitude
Subtract a sine wave of 300hz at 1/9 amplitude
Add a sine wave of 500hz at 1/25 amplitude
Subtract a sine wave of 700hz at 1/49 amplitude

and so on til infinity (or Nyquist frequency)

After that you multiply by 8 / (PI*PI) to get back to a normalized amplitude.

The function to generate this wave form in the sample code is called AdvanceOscilator_Triangle_BandLimited().
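Here is the triangle version of the harmonic sum as a sketch, with the alternating sign and the 1 / (k*k) roll off. The function name and the radians phase are again my own assumptions rather than the sample code’s actual interface.

```cpp
#include <cmath>

// Hypothetical helper (not from the sample code): band limited triangle wave,
// built from odd harmonics with alternating sign and 1 / (k*k) amplitude.
float TriangleBandLimitedSketch(float fPhase, float fFrequency, float fSampleRate)
{
    if (fFrequency <= 0.0f)
        return 0.0f; // avoid looping forever on a zero or negative frequency

    const float fNyquist = fSampleRate * 0.5f;
    float fSum = 0.0f;
    float fSign = 1.0f; // add, subtract, add, subtract...

    for (int k = 1; (float)k * fFrequency < fNyquist; k += 2)
    {
        fSum += fSign * std::sin((float)k * fPhase) / (float)(k * k);
        fSign = -fSign;
    }

    // scale by 8 / (PI * PI) to get back to a normalized amplitude
    return fSum * 8.0f / (3.14159265f * 3.14159265f);
}
```

Because the amplitudes fall off with the square of the harmonic number, the high harmonics contribute very little, which is part of why a triangle wave sounds so close to a sine wave.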

Here’s how a band limited triangle wave looks and sounds compared to a non band limited triangle wave, like the ones we created in the last chapter.

Chapter 3 Triangle

Chapter 3 Triangle Band Limited

Chapter 3 Triangle Wave

Band Limited Noise

In the last chapter we also talked about the “noise” wave form and I briefly mentioned that it had its uses – such as in percussion sounds.

Is it possible to make a band limited version? It is, but I’m not sure if it’s really useful for anything, other than a strange sound (but then again, strange sounds is what synth is all about right?)

A quick aside – In this chapter so far, we’ve actually been talking about “Additive Synthesis” which is the process of adding multiple sounds together to get an interesting result. Specifically, we’ve been adding sine waves together to get band limited forms of a saw wave, a square wave and a triangle wave. There is something else called “Subtractive Synthesis” where you carve away sounds with filters (such as a low pass filter, a high pass filter, a band pass filter, etc) to get your sound. Another way to generate band limited wave forms is to make a pure, non band limited wave form, and then use a low pass filter to cut out the high frequencies of the sound (the ones generating the aliasing sounds).

In practice, it sounds the same either way you generate it.  Subtractive synthesis is just another way to approach the problem of aliasing and synth in general. In fact, when you down sample a sound file (take it from a higher sample rate to a lower sample rate), you should apply a low pass filter first to get rid of any frequencies that would cause aliasing in the lower sample rate.

Anyways, to generate band limited noise, I figured I’d just make a sine wave that changes its frequency once every 4000 samples (at a sample rate of 44,100, that means it changes its frequency about 11 times a second).

Here’s what that looks and sounds like:

Chapter 3 Random Beeps

Chapter 3 Beeps

Interesting audio, and band limited, but not quite noise, so here is the same thing, switching frequency every 40 samples instead of every 4000 samples. That’s about 1,100 times a second.

Chapter 3 Noise Wave

Chapter 3 Noise

It is technically noise, and it is band limited, but it sounds weird.  Like a tape player on fast forward or water flowing quickly or something.

I didn’t make a function to generate that wave form, but the sample code does it “manually” if you want to make your own function.
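If you do want to wrap it in a function, a sketch could look like the code below. The function name, the parameters and the choice of picking frequencies uniformly between 0 and a maximum are all my own assumptions. The sample code may well pick its frequencies differently.

```cpp
#include <cmath>
#include <cstdlib>
#include <vector>

// Hypothetical helper: a sine wave that jumps to a new random frequency every
// nHoldSamples samples. Keep fMaxFrequency below Nyquist (half the sample rate).
std::vector<float> MakeRandomSineNoise(int nNumSamples, int nHoldSamples,
                                       float fSampleRate, float fMaxFrequency)
{
    std::vector<float> samples;
    if (nNumSamples <= 0 || nHoldSamples <= 0 || fSampleRate <= 0.0f)
        return samples;

    const float fTwoPi = 6.2831853f;
    float fPhase = 0.0f;
    float fFrequency = 0.0f;

    samples.reserve((std::size_t)nNumSamples);
    for (int nIndex = 0; nIndex < nNumSamples; ++nIndex)
    {
        // time to choose a new frequency?
        if (nIndex % nHoldSamples == 0)
            fFrequency = fMaxFrequency * (float)std::rand() / (float)RAND_MAX;

        // free spinning oscillator, so the phase stays continuous across changes
        fPhase = std::fmod(fPhase + fTwoPi * fFrequency / fSampleRate, fTwoPi);
        samples.push_back(std::sin(fPhase));
    }
    return samples;
}
```

Passing 4000 for nHoldSamples gives the “random beeps” style of sound, and 40 gives something closer to the noise.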

Chapter 3 Song

So this chapter has a somewhat passable song as a culmination of the info from the tutorials so far. You can check it out at the bottom of this article, but I wanted to give a quick overview of some other things that went into making it.

The song loads some sound files to use as samples. It loads 3 percussion sounds for the drum parts, and two sound clips from a favorite movie of mine called “Legend” – starring Tim Curry as the devil, Mia Sara as a princess and Tom Cruise as a naturalist wildman who is friends with fairies and elves. It’s a really great movie, I really recommend checking it out!

Anyways, it MIXES these sound effects with our generated synth tones by just adding the various sound sources together. Mixing sounds is literally just adding them together.

When it loads up the wave files, it RESAMPLES them if necessary, meaning that if the sound file has a lower sample rate than the sound we want to render, it interpolates samples to make a higher sample rate. If the sound file loaded has a higher sample rate than the sound we want to render, it drops samples to make a lower sample rate. Check out the code for the details of how it does this, but it’s really simple and pretty much works how you’d expect it to. Note that if you down sample audio, you normally want to put it through a low pass filter to cut out any frequencies which would be above Nyquist, but my resampling code doesn’t handle that. It just aliases if there’s a frequency that is too high for the sake of simplicity.
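For a feel of what resampling involves, here is a small sketch that uses linear interpolation in both directions. That is a slight simplification of what is described above (the sample code drops samples when going down in rate), the function name is made up, and just like the sample code it does no low pass filtering before down sampling.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: resample mono audio from fSrcRate to fDstRate by
// linearly interpolating between the two nearest source samples.
std::vector<float> ResampleLinearSketch(const std::vector<float>& in,
                                        float fSrcRate, float fDstRate)
{
    std::vector<float> out;
    if (in.empty() || fSrcRate <= 0.0f || fDstRate <= 0.0f)
        return out;

    // same duration, different number of samples
    const std::size_t nOut = (std::size_t)((double)in.size() * fDstRate / fSrcRate);
    out.reserve(nOut);

    for (std::size_t i = 0; i < nOut; ++i)
    {
        // where this output sample falls in the source data
        const double fSrcPos = (double)i * fSrcRate / fDstRate;
        const std::size_t i0 = (std::size_t)fSrcPos;
        const std::size_t i1 = (i0 + 1 < in.size()) ? i0 + 1 : i0;
        const float t = (float)(fSrcPos - (double)i0);

        out.push_back(in[i0] + (in[i1] - in[i0]) * t);
    }
    return out;
}
```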

Another thing that happens when it loads each wave file is that it converts it to mono or stereo if needed, to match the format of the sound we want to render. To convert from mono to stereo, it just duplicates the mono channel for the left and right channels, and to convert from stereo to mono, it just mixes (adds!) the left and right channel data together to get the mono channel data.  Intuition might tell you that adding the left and right channels together would make it louder, even maybe twice as loud, but in practice that doesn’t happen.  Sounds mix together pretty darn well without getting way loud, especially if they are “real life” sounds (not synthesized wave forms) and not played at the exact same time.  Basically, the peaks (positive numbers) and valleys (negative numbers) in sound sources tend to cancel each other out and keep things in normal range.
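Both conversions are only a few lines each. This sketch assumes interleaved stereo data (left, right, left, right…) stored as floats, and the function names are made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: duplicate each mono sample into the left and right channels.
std::vector<float> MonoToStereoSketch(const std::vector<float>& mono)
{
    std::vector<float> stereo;
    stereo.reserve(mono.size() * 2);
    for (float fSample : mono)
    {
        stereo.push_back(fSample); // left
        stereo.push_back(fSample); // right
    }
    return stereo;
}

// Hypothetical helper: mix (add!) the left and right channels into one channel.
// Assumes interleaved input. A trailing unpaired sample is ignored.
std::vector<float> StereoToMonoSketch(const std::vector<float>& stereo)
{
    std::vector<float> mono;
    mono.reserve(stereo.size() / 2);
    for (std::size_t i = 0; i + 1 < stereo.size(); i += 2)
        mono.push_back(stereo[i] + stereo[i + 1]);
    return mono;
}
```

Since the channels are simply added, the result can go outside of the -1 to 1 range, which is one reason the normalizing step matters.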

Lastly, when loading a wave file, it normalizes the audio data so that our synth and the audio samples are all working in the same amplitude ranges of -1 to 1. When normalizing, it also “re-centers” the audio data. That is to say, if audio data was really quiet, but was always above the zero axis, it would move the data down to be centered on the zero axis before normalizing to make sure and maximize loudness.
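One way to do the re-centering and normalizing in a single pass is sketched below. Using the midpoint between the lowest and highest sample as the “center” is my own assumption (the sample code may use something else, such as the average), and the function name is made up.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: shift the data so it is centered on the zero axis,
// then scale it so the loudest sample reaches -1 or 1.
void RecenterAndNormalizeSketch(std::vector<float>& samples)
{
    if (samples.empty())
        return;

    const auto minMax = std::minmax_element(samples.begin(), samples.end());
    const float fMin = *minMax.first;
    const float fMax = *minMax.second;

    const float fCenter = (fMin + fMax) * 0.5f;    // how far off the zero axis we are
    const float fHalfRange = (fMax - fMin) * 0.5f; // peak amplitude after re-centering

    for (float& fSample : samples)
    {
        fSample -= fCenter;
        if (fHalfRange > 0.0f)
            fSample /= fHalfRange;
    }
}
```

For example, the quiet and off center data 0.1, 0.2, 0.3 comes out as -1, 0, 1.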

In reality, we’d want to re-center the left and right channels individually, but I just do them together. Also, you might want to normalize individual sections of the audio data at a time instead of normalizing the entire thing as one big chunk at the end. There are a lot of good techniques and algorithms out there to do this, but this functionality is often called a compressor (to give you a place to start your research).

Note, you could easily play sound files backwards to see if they sync up with the wizard of oz, or give you instructions for some tasty brownies, but I didn’t do that in this example code, I leave that up to you!

If you want to be able to read and write other sound formats besides wav files, you might check out libsndfile. I use it in my own projects and it works pretty nicely! You can find it at: libsndfile

The Lament Of Tim Curry

Without further ado, here’s this chapter’s sample song. The full source code and source wave files are in this chapter’s source code zip file. Check it out with headphones for a neat effect, the bass line floats between the left and right channels. Enjoy!  And go watch Legend if you haven’t seen it before!

The Lament Of Tim Curry

DIY Synth 2: Common Wave Forms

This is a part of the DIY Synthesizer series of posts where each post is roughly built upon the knowledge of the previous posts. If you are lost, check the earlier posts!

This is the second chapter in a series of tutorials about programming your own synthesizer.

In this chapter we’ll talk about oscillators, and some common basic wave forms: Sine, Square, Saw, Triangle and Noise.

By the end, you should have enough knowledge to make some basic electronic melodies.

You can download the full source for this chapter here:  DIY Synthesizer: Chapter 2 Source Code

The Sine Wave

The sine wave is the basis of lots of things in audio synthesis. It can be used on its own to make sound, multiple sine waves can be combined to make other more complex wave forms (as we’ll see in the next chapter) and it’s also the basis of a lot of DSP theory and audio analysis. For instance, there is something called Fourier Analysis where you can analyze some audio data and it will tell you what audio frequencies are in that sound data, and how strong each is (useful for advanced synthesis and digital signal processing aka DSP). The math of how to get that information is based on some simple properties of sine waves. More info can be found here:

If we want to use a sine wave in our audio data, the first problem we hit is that sine has a value from -1 to 1, but our audio data from the last chapter is stored in a 32 bit int, which has a range of -2,147,483,648 to 2,147,483,647, and is unable to store fractional numbers.

The solution is to just map -1 to -2,147,483,648, and 1 to 2,147,483,647 and all the numbers in between represent fractional numbers between -1 and 1.  0.25 for instance would become 536,870,911.

If instead of 32 bits, we wanted to store the data in 16 bits, or 8 bits, we could do that as well.  After generating our floating point audio data, we just convert it differently to get to those 16 bits and 8 bits.  16 bits have a range of -32,768 to 32,767 so 0.25 would convert to 8191.  In 8 bits, wave files want UNSIGNED 8 bit numbers, so the range is 0 to 255.   In that case,  0.25 would become 158.
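Here is a sketch of that conversion for the 16 bit and 32 bit cases. The function names are made up, and scaling by the largest positive value (so that -1 lands one step short of the most negative integer) is an assumption on my part, not necessarily what WriteWaveFile does.

```cpp
#include <cstdint>

// Keep a sample inside the -1 to 1 range before converting it.
float ClampToUnitRange(float fSample)
{
    if (fSample < -1.0f) return -1.0f;
    if (fSample > 1.0f) return 1.0f;
    return fSample;
}

// Hypothetical helper: -1..1 float to a signed 16 bit sample.
int16_t FloatToInt16Sketch(float fSample)
{
    return (int16_t)(ClampToUnitRange(fSample) * 32767.0f);
}

// Hypothetical helper: -1..1 float to a signed 32 bit sample.
int32_t FloatToInt32Sketch(float fSample)
{
    // do the multiply as a double, since a float can't store 2,147,483,647 exactly
    return (int32_t)((double)ClampToUnitRange(fSample) * 2147483647.0);
}
```

With this mapping 0.25 becomes 8191 in 16 bits and 536,870,911 in 32 bits, matching the numbers above.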

Note, in the code for this chapter, I modified WriteWaveFile to do this conversion for us so going forward we can work with floating point numbers only and not worry about bits per sample until we want to write the wave file. When you call the function, you have to give it a template parameter specifying what TYPE you want to use for your samples. The three supported types are uint8, int16 and int32. For simple wave forms like those we are working with today, there is no audible difference between the 3, so all the samples just make 16 bit wave files.

So, we bust out some math and figure out how to generate a sine wave, respecting the sample rate and frequency we want to use:

//make a naive sine wave
for(int nIndex = 0; nIndex < nNumSamples; ++nIndex)
{
    pData[nIndex] = sin((float)nIndex * 2 * (float)M_PI * fFrequency / (float)nSampleRate);
}

That does work, and if you listen to the wave file, it does sound correct:
Naive Sine Wave Generation

It even looks correct:
Naive Sine Wave

There is a subtle problem when generating the sine wave that way though which we will talk about next.

Popping aka Discontinuity

The problem with how we generated the wave file only becomes apparent when we try to play two tones right next to each other, like in the following code segment:

//make a discontinuous (popping) sine wave
for(int nIndex = 0; nIndex < nNumSamples; ++nIndex)
{
    if(nIndex < nNumSamples / 2)
    {
        float fCurrentFrequency = CalcFrequency(3,3);
        pData[nIndex] = sin((float)nIndex * 2 * (float)M_PI * fCurrentFrequency / (float)nSampleRate);
    }
    else
    {
        float fCurrentFrequency = CalcFrequency(3,4);
        pData[nIndex] = sin((float)nIndex * 2 * (float)M_PI * fCurrentFrequency / (float)nSampleRate);
    }
}

Quick note about a new function shown here, called CalcFrequency.  I made that function so that you pass the note you want, and the octave you want, and it will return the frequency for that note.  For instance, to get middle C aka C4 (the tone all these samples use), you use CalcFrequency(3,3), which returns approximately 261.626.
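If you are curious how a function like that can work, here is a sketch based on the usual equal temperament formula, where every semitone multiplies the frequency by the twelfth root of 2. The function name, the argument order (octave first, then note) and the numbering (note 0 is A, and A in octave 4 is 440hz) are assumptions I picked so that (3,3) comes out as middle C. The real CalcFrequency in the sample code may differ.

```cpp
#include <cmath>

// Hypothetical stand-in for CalcFrequency. Assumes note 0 = A, 1 = A#, 2 = B,
// 3 = C and so on, and that octave 4, note 0 is A at 440hz.
float NoteToFrequencySketch(float fOctave, float fNote)
{
    // how many semitones we are away from A 440
    const float fSemitones = (fOctave - 4.0f) * 12.0f + fNote;

    // each semitone is a factor of 2^(1/12), so 12 of them doubles the frequency
    return 440.0f * std::pow(2.0f, fSemitones / 12.0f);
}
```

Under those assumptions, NoteToFrequencySketch(3,3) is 9 semitones below A 440, which works out to approximately 261.626.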

Listen to the wave file generated and you can hear a popping noise where the tone changes from one frequency to the next: Discontinuous Sine Wave

So why is this? The reason is that the way we are generating our sine waves makes a discontinuity at the point where the frequency changes.

Here you can see the point that the frequencies change and how a pretty small discontinuity can make a pretty big impact on your sound! The sound you are hearing has an official name, called a “pop” (DSP / synth / other audio people will talk about popping in their audio, and discontinuity is the reason for it)

Sine Wave Popping

So how do we fix it? Instead of making the sine wave be rigidly based on time, where for each point, we calculate the sine value with no regard to previous values, we use a “Free Spinning Oscillator”.

That is a fancy way of saying we just have a variable keep track of the current PHASE (angle) that we are at in the sine wave for the current sample, and to get the next sample, we advance our phase based on the frequency at the time. Basically our oscillator is a wheel that spins freely, and our current frequency just says how fast to turn the wheel (from wherever it is now) to get the value for the next sample.

Here’s what that looks like in code:

//make a continuous sine wave that changes frequencies
float fPhase = 0.0f; //our free spinning oscillator starts at angle 0
for(int nIndex = 0; nIndex < nNumSamples; ++nIndex)
{
    if(nIndex < nNumSamples / 2)
    {
        float fCurrentFrequency = CalcFrequency(3,3);
        fPhase += 2 * (float)M_PI * fCurrentFrequency/(float)nSampleRate;

        while(fPhase >= 2 * (float)M_PI)
            fPhase -= 2 * (float)M_PI;

        while(fPhase < 0)
            fPhase += 2 * (float)M_PI;

        pData[nIndex] = sin(fPhase);
    }
    else
    {
        float fCurrentFrequency = CalcFrequency(3,4);
        fPhase += 2 * (float)M_PI * fCurrentFrequency/(float)nSampleRate;

        while(fPhase >= 2 * (float)M_PI)
            fPhase -= 2 * (float)M_PI;

        while(fPhase < 0)
            fPhase += 2 * (float)M_PI;

        pData[nIndex] = sin(fPhase);
    }
}

Note that we keep the phase between 0 and 2 * PI. There’s no mathematical reason for needing to do this, but in floating point math, if you let a value get too large, it starts to lose precision. That means that if you made a wave file that lasted a long time, the audio would start to degrade the longer it played. I also use a while loop instead of a regular if statement, because if someone uses very large frequencies, you can pass 2 * PI a couple of times in a single sample. Also, I check that the phase hasn’t gone below zero, because it is valid to use negative frequency values! All stuff to be mindful of when making your own synth programs (:

Here’s what the generated wave file sounds like, notice the smooth transition between the two notes:
Continuous Sine Wave

And here’s what it looks like visually where the wave changes frequency, which you can see is nice and smooth (the bottom wave). The top wave is the popping sine wave image again at the same point in time for reference. On the smooth wave it isn’t even visually noticeable that the frequency has changed.

Continuous Frequency Change

One last word on this… popping is actually sometimes desired and can help make up a part of a good sound. For instance, some percussion sounds can make use of popping to sound more appropriate!

Sine Wave Oscillator

For our final incarnation of a sine wave oscillator, here’s a nice simple helper function:

float AdvanceOscilator_Sine(float &fPhase, float fFrequency, float fSampleRate)
{
    //advance the phase (angle) based on the frequency
    fPhase += 2 * (float)M_PI * fFrequency/fSampleRate;

    //keep the phase between 0 and 2 * PI
    while(fPhase >= 2 * (float)M_PI)
        fPhase -= 2 * (float)M_PI;

    while(fPhase < 0)
        fPhase += 2 * (float)M_PI;

    return sin(fPhase);
}

You pass that function your current phase, the frequency you want, and the sample rate, and it will advance your phase, and return the value for your next audio sample.

Here’s an example of how to use it:

//make a sine wave
for(int nIndex = 0; nIndex < nNumSamples; ++nIndex)
{
    pData[nIndex] = AdvanceOscilator_Sine(fPhase,fFrequency,(float)nSampleRate);
}

Here’s what it sounds like (nothing new at this point!):
Vanilla Sine Wave

Wave Amplitude, Volume and Clipping

You can adjust the AMPLITUDE of any wave form by multiplying each sample by a value. Values greater than one increase the amplitude, making it louder, values less than one decrease the amplitude, making it quieter, and negative values flip the wave over, but also have the ability to make it quieter or louder.

One place people use negative amplitudes (volumes) is for noise cancellation. If you have a complex sound that has some noise in it, but you know the source of the noise, you can take that noise, multiply it by -1 to get a volume of -1, and ADD IT (or MIX IT) into the more complex sound, effectively removing the noise from the sound. There are other uses too but this is one concrete, real world example.

This code sample generates a quieter wave file:

//make a quieter sine wave
for(int nIndex = 0; nIndex < nNumSamples; ++nIndex)
{
    pData[nIndex] = AdvanceOscilator_Sine(fPhase,fFrequency,(float)nSampleRate) * 0.4f;
}

And here’s what that sounds like:
Vanilla Sine Wave – Quiet

And here’s what that looks like:
Sine Quiet

If you recall though, when we write a wave file, we map -1 to the smallest int number we can store, and 1 to the highest int number we can store. What happens if we make something too loud, so that it goes above 1.0 or below -1.0?

One way to fix this would be to “Normalize” the sound data.  To normalize it, you would loop through each sample in the stream and find the highest absolute value sample.  For instance if you had 3 samples: 1.0, -1.2, 0.8,  the highest absolute sample value would be 1.2.

Once you have this value, you loop through the samples in the stream and divide by this number.  After you do this, every sample in the stream will be within the range -1 to 1.  Note that if you had any data that would be clipping, this process has the side effect of making your entire stream quieter since it reduces the amplitude of every sample.  If you didn’t have any clipping data, this process has the side effect of making your entire stream louder because it increases the amplitude of every sample.
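Those two loops can be sketched like so. The function name is made up, and this is just the plain “divide by the loudest sample” approach described above.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helper: scale the samples so the loudest one is exactly -1 or 1.
void NormalizeSketch(std::vector<float>& samples)
{
    // first loop: find the highest absolute sample value
    float fPeak = 0.0f;
    for (float fSample : samples)
        fPeak = std::max(fPeak, std::fabs(fSample));

    // pure silence has nothing to scale (and we'd divide by zero)
    if (fPeak == 0.0f)
        return;

    // second loop: divide every sample by that value
    for (float& fSample : samples)
        fSample /= fPeak;
}
```

With the three samples from the example, 1.0, -1.2 and 0.8, the peak is 1.2 and the result is roughly 0.833, -1.0 and 0.667.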

Another way to deal with it is to just clamp the values to the -1, 1 range.  In the case of a sine wave, that means we chop off the top and/or the bottom of the wave and there’s just a flat plateau where the numbers went out of range.

This is called clipping, and along with popping it is one of the 2 main problems people have with audio quality degradation.  Aliasing is a third, and is something we address in the next chapter by the way! (:

Here’s some code for generating a clipping sine wave:

//make a clipping sine wave
for(int nIndex = 0; nIndex < nNumSamples; ++nIndex)
{
    pData[nIndex] = AdvanceOscilator_Sine(fPhase,fFrequency,(float)nSampleRate) * 1.4f;
}

And here’s what it sounds like:
Vanilla Sine Wave – Clipping

Also, here’s what it looks like:
Clipping Sine Wave

Note that in this case, it doesn’t necessarily sound BAD compared to a regular, non clipping sine wave, but it does sound different. That might be a good thing, or a bad thing, depending on your intentions. With more complex sounds, like voice, or acoustic music, this will usually make it sound terrible. Audio engineers have to carefully control the levels (volumes) of the channels being mixed (added) together to make sure the resulting output doesn’t go outside of the valid range and cause clipping. Also, in analog hardware, going out of range can cause damage to the devices if they aren’t built to protect themselves from it!

In the case of real time synthesis, as you might imagine, normalizing wave data is impossible to do because it requires that you know all the sound data up front to be able to normalize the data.  In real time applications, besides just making sure the levels keep everything in range, you also have the option of using a compressor which sort of dynamically normalizes on the fly.  Check this out for more information:

Square Wave Oscillator

Here’s the code for the square wave oscillator:

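float AdvanceOscilator_Square(float &fPhase, float fFrequency, float fSampleRate)
{
    //advance the phase, which here is a percentage (0 to 1) through the wave
    fPhase += fFrequency/fSampleRate;

    while(fPhase > 1.0f)
        fPhase -= 1.0f;

    while(fPhase < 0.0f)
        fPhase += 1.0f;

    //first half of the wave is down, second half is up
    if(fPhase <= 0.5f)
        return -1.0f;
    else
        return 1.0f;
}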

Note that we are using the phase as if it’s a percentage, instead of an angle. Since we are using it differently, that means if you switch from sine wave to square wave, there will be a discontinuity (a pop). However, in practice this happens anyways almost all the time because unless you change from sine to square at the very top or bottom of the sine wave, there will be discontinuity anyways. In reality, this really doesn’t matter, but you could “fix” it to switch only on those boundaries, or you could use “cross fading” or “blending” to fade one wave out (decrease amplitude from 1 to 0), while bringing the new wave in (increase amplitude from 0 to 1), adding them together to get the output. Doing so will make a smooth transition but adds some complexity, and square waves by nature constantly pop anyways – it’s what gives them their sound!
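The cross fading mentioned above is a small amount of code per sample. This is only a sketch: the function name is made up, and it uses a simple linear fade where fT goes from 0 (all old wave) to 1 (all new wave) over however many samples you want the transition to last.

```cpp
// Hypothetical helper: blend from the old wave's sample to the new wave's sample.
// fT = 0 gives only the old wave, fT = 1 gives only the new wave.
float CrossFadeSketch(float fOldSample, float fNewSample, float fT)
{
    // keep the blend amount in the 0 to 1 range
    if (fT < 0.0f) fT = 0.0f;
    if (fT > 1.0f) fT = 1.0f;

    // old wave fades out (amplitude 1 to 0) while new wave fades in (0 to 1)
    return fOldSample * (1.0f - fT) + fNewSample * fT;
}
```

During the transition you would advance both oscillators every sample, and increase fT a little each sample until it reaches 1.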

Here’s what a square wave sounds like and looks like:
Square Wave
Square Wave

Saw Wave Oscillator

We used the saw wave in chapter one. Here’s the code for a saw wave oscillator:

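float AdvanceOscilator_Saw(float &fPhase, float fFrequency, float fSampleRate)
{
    //advance the phase, which here is a percentage (0 to 1) through the wave
    fPhase += fFrequency/fSampleRate;

    while(fPhase > 1.0f)
        fPhase -= 1.0f;

    while(fPhase < 0.0f)
        fPhase += 1.0f;

    //ramp from -1 up to 1 over the course of the wave
    return (fPhase * 2.0f) - 1.0f;
}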

Here’s what a saw wave looks and sounds like:
Saw Wave
Saw Wave

Note that sometimes saw waves point the other direction and the “drop off” is on the left instead of on the right, and the rest of the way descends instead of rises but as far as I have seen, there is no audible or practical difference.

Triangle Wave Oscillator

A lot of synths don’t even bother with a triangle wave, and those that do mostly use it as an approximation of a sine wave. A triangle wave sounds a lot like a sine wave and looks a bit like it too.

Here’s the code for a triangle wave oscillator:

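float AdvanceOscilator_Triangle(float &fPhase, float fFrequency, float fSampleRate)
{
    //advance the phase, which here is a percentage (0 to 1) through the wave
    fPhase += fFrequency/fSampleRate;

    while(fPhase > 1.0f)
        fPhase -= 1.0f;

    while(fPhase < 0.0f)
        fPhase += 1.0f;

    //rise from 0 to 1 over the first half, fall from 1 to 0 over the second half
    float fRet;
    if(fPhase <= 0.5f)
        fRet=fPhase*2;
    else
        fRet=(1.0f - fPhase)*2;

    return (fRet * 2.0f) - 1.0f;
}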

Here’s what it looks and sounds like:
Triangle Wave
Triangle Wave

Noise Oscillator

Believe it or not, even static has its place too. It’s used sometimes for percussion (put an envelope around some static to make a “clap” sound), it can be used as a low frequency oscillator aka LFO (the old “sample and hold” type stuff) and other things as well. Static is just random audio samples.

The code for a noise oscillator is slightly different than the others. You have to pass it the last sample generated (you can pass 0 if it’s the first sample) and it will continue returning that last value until it’s time to generate a new random number. It determines when it’s time based on the frequency you pass in. A higher frequency means more random numbers will be chosen in the same amount of audio data while a lower frequency means that fewer random numbers will be chosen.

At lower frequencies (like in the sample), it kind of sounds like an explosion or rocket ship sound effect from the 80s which is fun 😛

Here’s the code:

float AdvanceOscilator_Noise(float &fPhase, float fFrequency, float fSampleRate, float fLastValue)
{
    unsigned int nLastSeed = (unsigned int)fPhase;
    fPhase += fFrequency/fSampleRate;
    unsigned int nSeed = (unsigned int)fPhase;

    while(fPhase > 2.0f)
        fPhase -= 1.0f;

    //if the whole number part of the phase changed, it's time for a new random value
    if(nSeed != nLastSeed)
    {
        float fValue = ((float)rand()) / ((float)RAND_MAX);
        fValue = (fValue * 2.0f) - 1.0f;

        //uncomment the below to make it slightly more intense
        /*
        if(fValue < 0)
            fValue = -1.0f;
        else
            fValue = 1.0f;
        */

        return fValue;
    }

    //otherwise keep returning the last value
    return fLastValue;
}

Here’s what it looks and sounds like:
Noise Audio

I think it kind of looks like the Arizona desert 😛

As a quick aside, I have the random numbers as random floating point numbers (they can be anything between -1.0 and 1.0). Another way to generate noise is to make it so it will choose only EITHER -1 or 1 and nothing in between. It gives a slightly harsher sound. The code to do that is in the oscillator if you want to try it out, it’s just commented out. There are other ways to generate noise too (check out “pink noise”) but this ought to be good enough for our immediate needs!

More Exotic Wave Forms

Two other oscillators I’ve used on occasion are the squared sine wave and the rectangle wave.

To create a “squared sine wave” all you need to do is multiply each sample by itself (square the audio sample). This makes a wave form that is similar to sine waves, but a little bit different, and sounds a bit different too.

A rectangle wave is created by making it so the wave spends either more or less time in the “up” or “down” part of the wave. Instead of it being 50% of the time in “up”, and 50% of the time in “down” you can make it so it spends 80% of the time in up, and 20% of the time in down. It makes it sound quite a bit different, and the more different the percentages are, the “brighter” it sounds.
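A rectangle wave oscillator can be sketched as a small variation on the square wave oscillator. The function name and the extra fDutyCycle parameter (the fraction of each cycle spent in the “up” part) are my own additions for illustration.

```cpp
// Hypothetical oscillator: like the square wave, but fDutyCycle (0 to 1) says
// how much of each cycle is spent "up". 0.5 gives a regular square wave.
float AdvanceOscillator_RectangleSketch(float &fPhase, float fFrequency,
                                        float fSampleRate, float fDutyCycle)
{
    // advance the phase, which is a percentage (0 to 1) through the wave
    fPhase += fFrequency / fSampleRate;

    while (fPhase > 1.0f)
        fPhase -= 1.0f;

    while (fPhase < 0.0f)
        fPhase += 1.0f;

    // up for the first fDutyCycle portion of the wave, down for the rest
    return (fPhase < fDutyCycle) ? 1.0f : -1.0f;
}
```

Passing 0.8 for fDutyCycle gives the 80% up, 20% down wave described above.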

Also, you can add multiple wave form samples together to get more interesting wave forms (like adding a triangle and a square wave of the same frequency together, and reducing the amplitude to avoid clipping). That’s called additive synthesis and we’ll talk more about that next chapter, including how to make more correct wave forms using sine waves to avoid aliasing.

You can also multiply wave forms together to create other, more interesting waves. Strictly speaking this is called AM synthesis (amplitude modulation synthesis) which is also sometimes known as ring modulation when done a certain way.

As you can see, there are a lot of different ways to create oscillators, and the wave forms are just limited by your imagination. Play around and try to make your own oscillators and experiment!

Final Samples

Now we have the simple basics down for being able to create music. Here’s a small “song” that is generated in the sample code:
Simple Song

And just to re-inforce how important keeping your wave data continuous is, here’s the same wave file, but about 0.75 seconds in I put a SINGLE -1.0 sample where it doesn’t belong. That’s a single wrong sample when there are 44100 samples per second, and notice how much it affects the audio.
Simple Song With Pop

Until Next Time…

Next up we will talk about “aliasing” and how to avoid it, making much better sounding saw, square and triangle waves that are less harsh on the ears.