Mechanical Computer Quest Part I

For years I’ve been wanting to make mechanical logic gates that i could put together into mechanical logic circuits. I want to do this mostly for novelty purposes, but I also think it could be leveraged into either an entertaining learning device, or some sort of game or puzzle.

Since I’m a software engineer, one of the main problems I’ve had is my inability to manifest my ideas in a physical way. Thanks to some modern technology, I think I might be able to partner up with my good buddy Eric and finally make this idea into a reality!

I’m sure some of the things I’m trying to figure out are solved problems, but I want to try and figure it out as much as I can before researching the right way to do it hehe.

Here is some basic background knowledge relating to what I’m trying to achieve, as well as my plans for moving forward.

Logic Gate Basics

Logic gates are the basic building blocks of logic circuits and computers. You can put them together in various ways to make lots of different things happen such as adding or multiplying numbers together, or comparing two numbers to see which is bigger.

The basic logic gates themselves take in either 1 or 2 inputs, and give one output. The inputs and outputs are binary where each one is either a one or a zero.

The 3 basic textbook logic gates are AND, OR and NOT.

AND: this takes in two inputs and gives one output. If both inputs are 1, it gives a 1 as output, else it gives a 0 as output.

0 AND 0 = 0
0 AND 1 = 0
1 AND 0 = 0
1 AND 1 = 1

OR: this takes in two inputs and gives one output. If either input is 1, it gives a 1 as output. If the inputs are both 0, it gives a 0 as output.

0 OR 0 = 0
0 OR 1 = 1
1 OR 0 = 1
1 OR 1 = 1

NOT: this takes in one input and gives one output. If you give it a 1, it gives out a 0. If you give it a 0, it gives out a 1.

NOT 0 = 1
NOT 1 = 0

If you are able to do these 3 things, and combine them together arbitrarily, that is considered “Turing Complete”. That basically means it’s capable of doing anything a computer can do, but check out this wikipedia page for more information http://en.wikipedia.org/wiki/Turing_completeness.

Mechanical Logic Gates

To make mechanical versions of these things I was thinking of having each logic gate a self contained object that had sticks for input and output. If a stick was pushed “up” it would be a 1, if a stick wasn’t pushed up, it would be a 0.

I then set out to make a wooden prototype of each gate to make sure I had the basic designs worked out.

NOT

To make a NOT gate, you would have a box with an input stick going into it and an output stick coming out of it. If you pushed the input stick up, the output stick would go down.

If you pulled the input stick back down, the output stick would go up again.

I made one of these out of wood by making a box with two sticks going into it, that inside were attached with a cross peice that could rotate but not move.

This way, when you push up the input stick, the crosspeice rotates and the output stick goes down. When you pull down the input stick, the reverse happens.

It worked fairly decently.

NotGate

OR

To make an OR gate, I made a box with two input sticks going in and one output stick coming out. I had the output stick attached to a cross peice that rested above the input sticks. This way, when you pushed either input stick up, the output stick would be pushed up.

One thing I didn’t like about this design was that it relied on gravity to reset the output after changing the input pins. I’d prefer the mechanical logic gates made as few assumptions about their environment as possible so that there are less points of failure. That way, the mechanical computer would work in space, or on it’s side, etc. A spring could be used, but springs wear out and that’s just another moving part that can fail.

It wasn’t 100% ideal but it worked “ok”.

OrGate

AND

To make an AND gate, if you push up one input pin, nothing should happen to the output pin and it should remain at zero. If you push up both input pins, the output pin should then push up.

The best way i could think of how to do this was to have the output pin rest on a slack string. The string should be slack just enough such that if you push up one input pin, it takes the slack out of the string and rests against the bottom of the output pin. If you push up the second pin, it should make the output pin raise up to the full output pin level.

Unfortunately I never got this working correctly – I’m really bad at making things, I told you! haha.

I wasn’t a fan of the string since it was a finicky, error prone part of the design that could easily fail.

Also, you would need to attach the string ends to the side of the box or something to make sure that as you raised both pins, you didn’t just re-introduce the slack in the string between the sticks. It sounds in my head like it would work, but again, unfortunately I’m a SOFTWARE engineer and this is not really my forte so I’m not sure hehe.

AndGate

Connecting Logic Gates, Metrics and Physical Requirements

So, assuming the mechanical logic gates actually work like they should, there are still some problems to solve, as well as some in general design requirements about how these gates need to work.

  1. Gate pins have to be able to connect to eachother somehow. An input pin needs to be able to connect to an output pin in a rigid way such that if one moves, the other one moves as well.  It would be nice if there was a way for the peices to easily snap together and easily be taken apart again. During normal use they should stay snapped though obviously!
  2. The difference between zero and one needs to be standardized. For instance, the difference between a pin being pushed out and pushed in could be one inch, and all logic gates and all logic gate configurations need to obey this measurement in both their input and output pins.  Some metric needs to be decided on when prototypes start getting designed.
  3. The different logic gates should be the same dimensions (or compatible dimensions) so that you dont end up with a situation where you are trying to connect two logic gates together but can’t because there is a gap between them.  We’ll have to figure out the metrics when prototypes start getting designed.
  4. If i have a gate which takes two inputs, changing one input value should not change the other input value. For example, if in the OR gate, the input pins were rigidly attached to the cross bar, pushing up a pin on either side would cause the other input pin to also change from 0 to 1.  Gates will have to be designed so that this is true.
  5. It would be nice if the gates were set up such that you could see their internals working. This is just for fun to be able to see the computations at work.  We’ll see if we can make this happen when making prototypes.
  6. The gates should be as simple as possible, and manufacturing should be simple. each gate should be made of similar basic building blocks with as simple design and as few moving parts as possible.  We’ll have to keep the design simple when making prototypes. I want to stay away from strings, springs, wheels etc. Nothing that wears out quickly or has any significant fail rate.
  7. When you push a pin up that results in a cascading change across several other gates, friction / resistance shouldn’t be so hard that it gets unreasonably hard to push a pin up, or cause damage to the gates themselves when they are used reasonably.  Hopefully this will be true if we use a light but strong material.
  8. It would be nice if gates liked to rest at whatever setting they were at (1 or 0), as opposed to easily sliding between values (0.8 for instance), or some gates having a tendancy to want to rest at zero instead of one.  I’m not sure if we are good enough engineers to make this happen haha… we may just have to say that these mechanical gates have to operate on their sides or something unfortunately…We’ll have to see!
  9. Changing the input pins should be action enough to change the output pins of each gate. What i mean is that if you had one of your OR inputs as a 1, making it so the output is 1, when you pull down the input pin back to 0, the output pin should also move back to 0, and not require any extra action, such as pushing down the output pin manually.  Just something to be thinking about in the design.

NOR and NAND Logic Gates

There are actually a couple other ways to make logic circuits besides using AND, OR and NOT.

There is something called a “Universal Gate”, which is just a logic gate that can be combined with itself in various ways to make the functionality of AND, OR and NOT (and thus others such as XOR, XNOR, and everything else).

The two universal gates are NOR and NAND. Here are the truth tables:

0 NOR 0 = 1
0 NOR 1 = 0
1 NOR 0 = 0
1 NOR 1 = 0

0 NAND 0 = 1
0 NAND 1 = 1
1 NAND 0 = 1
1 NAND 1 = 0

For more information on how these gates are universal, check out these Wikipedia pages!

http://en.wikipedia.org/wiki/NOR_logic

http://en.wikipedia.org/wiki/NAND_logic

While figuring out how to make a NOR or NAND mechanically would solve my problem of not having a working design for the AND gate, it goes against one of my design goals, which is to be able to give someone AND, OR and NOT building blocks to put together and so learn logic circuits physically the same way they are shown in text books and online.

One possibility would be to take NANDs or NORs and put them together to make AND, OR and NOT gates that came pre-assembled, but that goes against my design goal of keeping things as simple as possible with as few moving parts as possible. It would also add needless visual complexity to the computations, which might LOOK NEAT, but would just add confusion to what was going on.

I haven’t yet decided on a solution to these issues.

Manufacture

For manufacturing, I wanted to work with my friend Eric to get some 3d models made up and get them printed up into 3d objects by http://www.shapeways.com/.

I’m sure it’ll take multiple iterations to get it right, but this seems like a pretty nice solution.

After we have proven our idea on a small scale, maybe in the future we’ll be able to do something like get a kick starter project together to make some kind of mechanical logic gate set you can buy, or some sort of board or puzzle game.

Next Steps

For the next step, I want to think about our logic gate options a bit more, talk to Eric about them, and try and get some prototypes modeled and printed out.

I’ll post an update when we make some progress (:

B.A.M. Neural Networks

Neural networks (officially “Artificial Neural Networks”) are computer simulations of neurons.  Simulating neurons in software allows programs to do things that you would normally need a human brain to do, such as recognizing patterns, learning over time, or making non-obvious decisions based on complex data.

Simulating neurons is not enough to create human levels of intelligence however.  Last I heard, someone could make toddler level intelligence via neural networks, but even that is somewhat misleading since toddlers can walk, use vocal chords, swim and understand complex emotions, while the neural network could do none of those things.

Despite the limitations of neural networks, there are quite a number of practical uses of artificial neural networks in the real world.  These uses include…

  • Helping missiles identify enemy tanks or combatants on the battlefield
  • Helping to predict stock market trends (It’s rumored that several top traders have proprietary neural networks which help them preform better)
  • OCR (turning scanned images into text documents based on the text in the image)
  • General machine vision (like, for robots or security systems)
  • Controlling complex machinery at speeds a human wouldn’t be able to keep up with
  • Facial recognition in computers
  • Diagnosing medical conditions

I was reading an article in Scientific American recently about how a girl in high school trained a neural network to recognize certain types of cancer with 99% accuracy.   She trained a neural network to analyze the results of a non-invasive cancer test which up til then had been too unreliable to use in any realistic situation.  Her neural network learned some hidden pattern in the data that we have not yet discovered or understood.

Just like in that example, you can feed a network complex data for it to look for patterns in, but unfortunately it won’t be able to explain to you what it learned, or what it looks for when trying to recognize patterns.  It can learn, but it can’t tell you what it learned.

My friend Doug often tells a funny story where this didn’t work out so well.  I’m not sure if it’s true exactly as told or not, but it definitely is plausible.  Apparently the US army for whatever reason was training a neural network to recognize tanks in the battlefield (surely for a missile or perhaps some kind of recon drone).

They fed the network hundreds or thousands of photos which either contained a tank or did not.  While they fed each picture to the neural network, they also told it “yes this photo contains a tank” or “no, this photo does not contain a tank” so that it was able to infer what it was that people were trying to teach it.

The network learned well, and with their sample photos, it was getting a very high rate of correct answers about whether a tank was there or not.

However, when they deployed this to the battle field (or perhaps, just a realistic live test run), it failed miserably and was getting near 0% accuracy.

Since the network wasn’t able to explain it’s thought process, the people involved were forced to try and deduce what the problem was and eventually they noticed a pattern….

The photoshoots apparently happened on 2 seperate days… on the first day they took pictures of tanks, and the second day they took pictures without tanks.  The unfortunate truth is that it was overcast on the first day, but clear skies and sunny on the second, which means…. the neural network learned to distinguish between a sunny day and an overcast day, not whether nor not there were tanks present!

A pretty funny story, but it shows the importance of giving good, realistic data to your network to learn from, or else you can run into silly problems like that!

Types of Neural Networks

There are a lot of different types of neural networks that have different properties and so thus are useful in different situations.

Some of the main types of flavors are…

  • Supervised learning – Just like in the tank example above, you give information to the neural network to learn, and you also tell it the information it should learn from each peice of data (ie… here’s a picture, it DOES contain a tank).
  • Unsupervised learning  – These work by finding natural groupings of input data.  You can then look at the groupings (or ask it to group more data) and can gain information about the nature of the data itself.  This is often used for data mining, having the neural network pick out interesting correlations in the data that a human might not figure out.
  • Some networks are static once they are created and are unable to learn further.
  • Other networks are able to continuously learn as they get more and more data.

Local Minima vs Global Minimum

Just like a human brain, a neural network is not infallible.  It can often think that is has found an answer, or something of interest, when in fact it hasn’t.

Similarly, we as humans can sometimes think we’ve found a solution to a problem, and then someone comes along and says “You forgot to consider this part of it!” and suddenly you realize your solution is not the right answer and it’s back to the drawing board.

A neural network can have the same issue believe it or not.  This can come from the fact that it wasn’t provided with good enough data to learn from, or, just because!  Just like humans, it can either just learn something wrong, or be incorrect about an answer.

If you think of a problem space as a graph, you can think of the lowest point on the graph being the optimal solution.  How neural networks work is by starting at some point on the graph (often chosen randomly) and then traveling downhill til it finds the bottom of a dip.

This works great if you happen to find the lowest dip in the graph (also called the global minimum), but if you find a dip that isn’t the lowest, you have effectively become trapped in a local minima, and end up with the wrong answer, or an imperfect understanding of the problem space.

minima

There are ways of dealing with this problem luckily.  One way is that if you find a minima, remember where it was at, but then choose another random point on the graph and try again to see if you find a deeper minima.  Rinse and repeat until you are reasonably satisfied with the results.

Have you ever been stuck trying to figure something out and forgotten about it (or went to sleep), only to come back to it later and find the answer.  I personally attribute that phenomenon to this “randomization” effect.  I don’t know the results of studies to this effect, but if you stop thinking about something for a while, then come back to it, you often see it from a different angle (essentially starting at a different spot on the problem space graph) and can sometimes figure out a better solution, or a deeper understanding of the problem (pun intended!)

Anyways, for normal problem spaces, they probably aren’t going to be just 2d like the above, but are perhaps 3d, 4d, or even higher dimensions.  In the end though, the neural network still is just trying to find the deepest valley it can find by essentially traveling downhill.

Now that you know the basics of neural networks, let’s get onto the implementation of one type of neural network!

Bidirectional Associative Memory (AKA BAM)

Bidrectional associative memory is perhaps the easiest useful neural network to create.  All you need is the ability to multiply vectors by other vectors, multiply vectors by matrices, and add matrices together.  If you know how to do those 3 things, you will be able to program your own neural network very quickly and easily.

In fact, I learned about this guy when I was 14 or so, and was able to implement a simple OCR system using microsoft excel (seriously!) 😛

The main point of BAM is to act as memory, where you can teach it to associate several patterns together.  This way, if you teach it to associate a pattern A with a pattern B, when you give it A again, it will spit out B.  Since it’s bidirectional, you can also give it pattern B and it will spit out pattern A in response.  You can teach it several pattern pairs to associate, and can even corrupt some of the data, and it will give you what it thinks is the best match for the data you gave it.  Just like a human brain, it sort of uses it’s best judgement and can say “umm… i think you meant pattern D but I’m not quite sure”.

Besides associating different patterns, you can also associate patterns with themselves (such as associating pattern A with pattern A and pattern B with pattern B).  If you do this, you are able to put in possibly corrupted data and it will give you what it thinks the data really is.  This way, if you have data that is noisy because it came over the radio, or because you scanned a document with a low quality scanner, it will be able to see through the noise and pick out the correct data (hopefully) in the same way a human could.  There are limits to this of course though, just like sometimes we can’t make out messy handwriting sometimes.

While BAM is useful, even in some realistic uses of neural networks, it also is a little bit limited compared to more sophisticated neural network implementations.

  • BAM is fairly limited in how many patterns you can teach it
  • You have to teach it in advance via supervised learning.  No further learning happens after it’s created.
  • It’s fairly strict in it’s mapping from input to output.  This means if you use it to recognize written or typed letters, it will be thrown off by variations in handwriting or different fonts.

That being said, it’s still pretty cool, and lots of fun to play with.

Creating a BAM Network

Creating a BAM network is pretty straight forward.  It has M input bits (you decide how many that is) and N output bits (again, you decide how many that is).

Once you have all of your input / output data pairs, the first step is to convert all the zeros in your pattern pairs to -1’s.   Where 0’s and 1’s is called binary, this form of -1’s and 1’s is called “unipolar”.

Then, for each pattern you multiply the input pattern vector, by the transpose of the output pattern vector (turn it on it’s side) so that when you multiply them together, you get a matrix that is MxN in size.

Continue this for each data pair so that you end up with one matrix per data pair.

Then, add all the matrices together so that you end up with a final matrix.  This is your trained neural network!

In the BAM neural network, the neural topology is that there are M input neurons and N output neurons, with no neurons in between.    The more neurons you have in your network, the more data the neural network is able to store, and the more distinctions between different types of data it’s able to make.  The fact that there are only 2 layers of neurons in BAM is part of the reason for it’s limitations.

For more information about why 2 layers of neurons are limited in their learning, I recommend searching for information on the perceptron xor problem and linear separability.  A 2 layer network is inherently incapable of preforming (and perhaps “understanding”) xor!

Using a BAM Network

To use a neural network, you take an input vector (in binary) of size M and multiply it by the matrix. This will result in a vector of size N that it made up numbers which may be positive, negative, or zero. Convert all positive numbers to 1 and all negative numbers to 0, and you’ll end up with the N sized output pattern that the neural network associated with.

Dealing with zeros is sort of up to your own discretion unfortunately. I’ve seen some people say that zeros should be treated as 1’s, other people say that zeros should be treated as 0’s, and other people have other rules such as “if it’s zero, set it to whatever that bit output the last time you had it output something” which IMO is a pretty odd way to deal with it.  I think this is an unfortunate flaw in how the BAM network works, but you can also chalk this up to the network being uncertain of the result, which it basically is.

Since BAM networks are bidirectional, you can also take the transpose of your matrix (turn it on it’s side) and then multiply it by a vector sized N to get a vector of size M as output, which is the vector associated with your N sized input vector.  So, it works both ways; you can put in an input pattern and get an output pattern out, or you can put in an output pattern and get an input pattern back.

Don’t forget… you can also associate patterns with themselves if you want it to do “pattern recognition” instead of “pattern association”.

Example

Let’s say we want to have an input size of 6 bits and an output size of 4 bits and that we have these 2 data pairs that we want to associate in the neural network:

  1. 101011 <-> 0010
  2. 110010 <-> 0101

The first step is to convert all zeros to -1’s.  Doing so our data pairs become this:

  1. 1 -1 1 -1 1 1 <-> -1 -1 1 -1
  2. 1 1 -1 -1 1 -1 <-> -1 1 -1 1

The next step is to multiply the input patterns with the output patterns to make a 6×3 matrix for each pattern pair.

First Pair:
1 * [-1 -1 1 -1]
-1
1
-1
1
1

=

-1 -1 1 -1
1 1 -1 1
-1 -1 1 -1
1 1 -1 1
-1 -1 1 -1
-1 -1 1 -1

Second Pair:
1 * [-1 1 -1 1]
1
-1
-1
1
-1

=

-1 1 -1 1
-1 1 -1 1
1 -1 1 -1
1 -1 1 -1
-1 1 -1 1
1 -1 1 -1

Next up, you add all the matrices together to get the final trained neural network:

-1 -1 1 -1
1 1 -1 1
-1 -1 1 -1
1 1 -1 1
-1 -1 1 -1
-1 -1 1 -1

+

-1 1 -1 1
-1 1 -1 1
1 -1 1 -1
1 -1 1 -1
-1 1 -1 1
1 -1 1 -1

=

-2 0 0 0
0 2 -2 2
0 -2 2 -2
2 0 0 0
-2 0 0 0
0 -2 2 -2

Now that we have our trained neural network, let’s plug in our first input pattern to make sure we get our first output pattern

-2 0 0 0
0 2 -2 2
0 -2 2 -2
2 0 0 0
-2 0 0 0
0 -2 2 -2

*

1
0
1
0
1
1

=

-4 -4 4 -4

When we convert negatives to zeros, and positives to ones we get:

0 0 1 0

Which is the first output pattern.  It recalled our pattern correctly!

Next, let’s put the second output pattern into the transposed matrix to see if we can go the opposite direction and recall the second input pattern.

-2 0 0 2 -2 0
0 2 -2 0 0 -2
0 -2 2 0 0 2
0 2 -2 0 0 2

*

0
1
0
1

=

0 4 -4 0 0 0

Converting that to binary by turning negatives into zeros, and non negatives into ones, and zeros into question marks we get:

? 1 0 ? ? ?

The pattern it was supposed to recall is:

1 1 0 0 1 0

The two bits that it did recall are correct, but as you can see it only recalled 2 of the 6 bits. Not very good!

With just two patterns, the network was unable to recall some of the info it was trained with.

Normally BAM isn’t this bad, it looks like I just chose some unfortunate input / output pairs. If you encounter problems with a network recalling data, sometimes adding more neurons (larger input or output patterns) can help, but sometimes that will be ineffective too. Like i mentioned earlier, a neural network that has only 2 neuron layers – like BAM does – is incapable of learning XOR, no matter how many input or output neurons you have, so these types of networks are somewhat limited.

Using for OCR

If you wanted to use a BAM network for being able to recognize drawn or written letters, one way to do so would be to say “we are going to store our letters in an 8×8 black and white grid”.

That means that you have a grid of binary (black / white) that is 8×8.  Another way to represent the grid of binary would be just to have 64 bits in a row.

So, for the letters you want to train the network to recognize, you would just draw out your letters in an 8×8 grid, take each letter as it’s 64 bits, and associate each letter with itself.

Your network matrix will be 64×64 but will be able to do simple OCR on 8×8 black and white images.

Often times, images will come to you in color, or not in 8×8 resolution, but what neural network engineers often do in this situation is they will process the images in advance to make them black and white, and 8×8, before feeding them into the neural network.

Now, you are able to feed characters into your neural network and it will attempt to correct any corruption in the image, and return to you an 8×8 image of what it think you entered.

Instead of associating a letter’s image with itself, you can also associate it with a number (say, the ascii code?) so that when you put in the image of a character, it will spit out the number corresponding to the closest match it can find instead of the raw character image itself.

That’s All!

BAM is a nice introductory neural network that nearly anyone can implement.  It may be limited in some ways, but it actually is used in the real world for some applications.

In the future I’ll write about some more advanced neural networks, but until then, I hope you found this informative, or at least interesting! (:

How to Render the Mandelbrot Set

mandelbrot set

The Mandelbrot set is a beautiful creation of mathematics discovered by a French-American mathematician named Benoit Mandelbrot. it is also a fractal, meaning that it’s infinitely detailed and that it’s self-similar and made up of smaller versions of itself.

Wikipedia does a better job of explaining the history and the background more than I do so please check out this link for more info!

Mandelbrot Set

I also wrote an HTML5 powered mandelbrot set viewer that you can use to explore the fractal and manipulate colors to create your own mathematical works of art.

HTML5 Mandelbrot Explorer

In this article I’m going to explain how to render a Mandelbrot set yourself.  It’s going to be from a programming slant more than a mathematical slant so if you want the raw unadulterated math, I recommend checking out Wikipedia or other sources.

Rendering the Mandelbrot Set

The first step is to have some way to draw individual pixels on a canvas of some sort. It doesn’t matter what method you go with, it just matters that you are able to draw pixels somehow.

Various ways include:

  • direct pixel accesss in HTML5 (what my Mandelbrot Explorer uses)
  • drawing pixels into an image file
  • using DirectX or OpenGL to render it to a 2d screen buffer.
  • using graph paper, a calculator and some colored pencils to create it by hand (Possible, but ouch! Send me a picture if you actually do this! hehe)

Viewport

Now that you have a rectangle that you are able to render pixels to, we need to define a viewport.

In the Mandelbrot Set image at the top of this article, my rectangle is about 500×500 pixels big, and the x axis ranges from -2.5 to 2.5 and the y axis also ranges from -2.5 to 2.5.

I like to define my viewport in terms of the center point, and the width and height, so our viewports parameters are a center point of (0,0) and width and height of 5.

We have now established the parameters of our viewport!

We now need to iterate through each pixel in our rectangle and do the following steps for each…

Pixel Space to Viewport Space

The first thing we need to do is convert from pixel coordinates to viewport coordinates.

How we do that is like this:
ViewportX = ViewportMinX + (PixelX / PixelWidth) * ViewportWidth
ViewportY = ViewportMinY + (PixelY / PixelHeight) * ViewportHeight

After we have converted our pixel’s location from pixel space to viewport space, we are ready to do some math.

The Magical Mandelbrot Function

Like I mentioned earlier, the Mandelbrot set is a work of mathematical art.  The function itself isn’t very complex but it involves imaginary numbers.  To calculate the Mandelbrot set itself, you plug the viewport location of the pixel into the function.  After that, you take the output of the function and plug it back into the input of the function.  You continue this until the output of the function goes above some value (the common value to use is 2.  You’ll see me compare against 4 because I’m comparing squared numbers).  If the output goes above the threshold, it has essentially “escaped”.

There are pixels which may take a very very very large amount of iterations to escape, and as far as I know, they haven’t proven that all values will escape (I could be wrong though), so besides waiting for the function to “escape”, you should also set a maximum iteration count to keep it from iterating forever (or for a very long time).

The number of iterations it took to escape is what you use to set the pixel color for that pixel.

Here is some code (pseudo javascript) to show you the details of that process:

var g_maxIterations = 255; //TODO: set this to however many iterations you want to allow maximum
var currentX;  //TODO: need to set this to the viewport X location of the current pixel
var currentY;  //TODO: need to set this to the viewport Y location of the current pixel

var z = 0;
var zi = 0;
var inset = true;
var numInterations = 0;

var newz;
var newzi;

for(indexIter=0; indexIter 4)
	{
		inset = false;
		numInterations = indexIter;
		indexIter = g_maxIterations;
	}
}

if (inset)
{
	//we never escaped, this pixel is the default color
	//TODO: render a default color pixel
}
else
{ 
	//we escaped!  numInterations is how many iterations it took
	//TODO: convert iterations to a color and render the pixel
}

Colorizing The Pixel

There are several ways you could turn an iteration count into a color for a pixel. There are some ways listed below, but this is definitely not an exhaustive list! Play around with your own techniques and see what sort of interesting things you can create!

  • Make a maximum iteration time of 255.  Make your output image be an 8bit greyscale (or color palleted) image, taking the iteration count and writing that out raw as the output color.
  • Make several ranges of iteration values (for instance… 0-255, 256-511, 512-767, etc) where you define a full RGB color at each edge of the value ranges.  From there, figure out where your iteration count falls within the value ranges, and do a lerp between the color to the left and the color to the right based on your distance in the specific value range.  This way, you have smoothly blending color gradients and can go well beyond 255 maximum iterations.  I use a variation of this in my HTML5 Mandelbrot Explorer.
  • Use arbitrary math functions to figure out the RGB of each pixel.  Such as R = Iterations % 256, G = (Iterations * 3) % 256, B = (Iterations * 7 + 39) % 256.

After you have the color for your pixel, you are done. Render that pixel, then move to the next until you have rendered them all.

Zooming and Panning (Scrolling)

By virtue of setting up a viewport as a centerpoint and a width and height, and making the code use that information to convert from pixel space to viewport space, we have made it really simple to implement zooming and panning.

To zoom in, just set your viewport width and height to be smaller (like divide them by 2 for instance).   To zoom out, set your viewport width and height to be larger.  You want to make sure and keep the same aspect ratio (width / height) in your viewport as your rendering rectangle to avoid distortion though, so be careful.   OR, you may want the distortion… it’s up to you (:

To pan the screen, or scroll it left, right, up or down, you just change the centerpoint to be more to the left, the right, higher, or lower.

Very simple, but it is really fun to scroll around and zoom in and out on the fractal to discover new and interesting features to share with your friends.

If you zoom in far enough, you might notice that at some point you have vertical or horizontal lines instead of the fractal shape, and that if you zoom in a little bit more, you’ll get a solid color.

You might ask yourself “Hey, I thought he said fractals were infinitely detailed?”

Well they are infinitely detailed, and in theory you could zoom in FOREVER and always see more and more things, but computers themselves don’t have infinite precision (it would take a computer infinitely large to let you zoom in infinitely), and you are just seeing the edge of the precision of your computer.

If you are using a language like C++, you can change your code to use doubles instead of floats to get a little more breathing room, or another option is to use a scientific mathematics library that is capable of a lot more precision than floats or doubles.

That’s All!

That’s really all there is to it, not that complex is it?

As I keep mentioning, I made something that allows you to explore the Mandelbrot Set in your browser.  You can find that here: Mandelbrot Explorer

Anyways, if you have any questions or comments or want to share some screenshots of creations you made, drop me a line or leave a comment in the comments section with a link to your creation for other people to check out too!

Bias And Gain Are Your Friend

Often times in game development, you have situations where you want an object to move from one place to another, you want something to grow or shrink from one size to another, you want a color to change from A to B, or any other one of the myriad tasks where you want to do something from A to B over time (or over distance).

That’s pretty abstract but let’s take some examples:

  1. You want to move a camera along a straight line from A to B
  2. You want to raise the lighting from dark to bright in a room
  3. When the player clicks an icon, you want to grow a window from small to big
  4. You want to cross fade one skeletal animation to another via the blend weights of the animations (an example from my Anatomy of a Skeletal Animation System articles)
  5. You want to use a gradient in a shader for some effect.

When you are doing these things, it’s real easy to take a percent based on time or distance and just use that percent raw to make a linear effect.   Often times a linear effect just isn’t good enough though because it looks or feels mechanical instead of organic, and unpolished.

Often times the way these things are softened and made more organic is by giving a content creator a curve editor so that they can soften the edges, speed up or slow down the processes over time or distance.

Many game engines don’t come with curve editors that can be easily used for these purposes, and other times you just want to deal with it in code for one reason or another, so don’t have the luxury of giving a content creator carte blanche with a curve editor.

There are a couple techniques for handling these situations but I want to talk to you about 2 of my favorite techniques, which are Ken Perlin’s bias and gain functions.  I actually use Christophe Schlick’s faster approximation functions (as seen in game programming gems 2), but the end result is the same thing.

If you want to skip ahead and see these things in action, I made an interactive demonstration about these functions, check em out! HTML5 Bias and gain

Bias – Not as in bigotry

The bias function takes in a number between 0 and 1 as input (I like to think of this as the percent) and also takes a number between 0 and 1 as the “tuning parameter” which defines how the function will bend your curve.

With a value of 0.5, the percent you put in is the percent you get out (so is linear), but if you put in a number > 0.5 or < 0.5, that's when the interesting things happen.

Shown here are graphs of the bias function with parameters of 0.5, 0.25, 0.75 and 0.97:

Bias 0.5 Bias 0.25Bias 0.75Bias 0.97

In javascript, the code for bias looks like this:

function GetBias(time,bias)
{
  return (time / ((((1.0/bias) - 2.0)*(1.0 - time))+1.0));
}

Gain – Not as in my weight during the holidays

The gain function is like bias in that it takes in both a 0 to 1 input (I think of this as the percent as well) and also takes a number between 0 and 1 as the “tuning parameter”.

Again, with a value of 0.5, the percent you put in is the percent you get out (again, this makes it linear) but if you put in other numbers, you get interesting curves.

Here are graphs of the gain function with the same parameters of 0.5, 0.25, 0.75 and 0.97:

gain 0.5Gain 0.25Gain 0.75Gain 0.97

In javascript, the code for gain looks like the below. You might notice it makes use of the GetBias function. Gain is just bias and reflected bias.

function GetGain(time,gain)
{
  if(time < 0.5)
    return GetBias(time * 2.0,gain)/2.0;
  else
    return GetBias(time * 2.0 - 1.0,1.0 - gain)/2.0 + 0.5;
}

That’s It!

Well that’s about it, pretty straightforward stuff. Wherever you find yourself using a percent in your code, you can try passing it through a bias / gain function (and optionally exposing the tuning parameter to content creators) and see if you can make things feel a little more organic and polished.

Sometimes its the little things that make the biggest difference!

One again, the link to the interactive example of these things is at:
HTML5 Bias and Gain

Anatomy of a Skeletal Animation System Part 3

This is part three of “Anatomy of a Skeletal Animation System”

Animation System Optimizations and Features

Here are some various animation system optimizations and techniques that you might find useful…

Multithreaded Animation Blending

If you are even mildly comfortable writing multithreaded code, this one is fairly easy to implement.

Basically every animated model that needs an update goes into a queue every frame.  (Things that haven’t been on screen for a little while could be exempt from the list so you don’t waste time on things that aren’t being rendered)

At some point in your main loop, you do the animation sampling / anim blend tree blending / etc work to come up with the final bone group. You do this by grabbing the first model in the queue, processing it, then moving to the next model.

Your main loop doesn’t continue until all of the models have been processed.

Now, imagine that you had other worker threads also grabbing models from the queue and processing them, and that the main thread will wait to continue the main loop until the queue was empty and all models had been processed.

TA-DA! You are done and have multithreaded animation blending. It can help A LOT, depending on how many hardware threads you have available for helping work.

Bias / Gain Curves in Anim Blends

With normal animation blending, it’s a linear crossfade from one animation to another.

Sometimes, an animator can make things look nicer if they have the option of doing non linear crossfading.   One nice option for doing this is exposing a bias and gain parameter to the blend in / out parameters.

Bias and gain are great ways of letting content creators create non linear curves for a variety of uses.  Ken Perlin did a lot of work in this area, but in “Game Programming Gems 2”, a guy named Cristophe Schlick presented some simplified, quick equations to calculate approximations of bias and gain.

I highly recommend checking that out and using them for this, and everything else in your game. Using bias and gain you can do things like have your camera move from point A to point B, but start out fast and slow down as it gets closer to B, giving it a nice organic feel to it, instead of a rigid lerp.  With bias and gain you pass in a % and get out a different %.  Real simple to use and extremely useful in every part of your game just about.

Here’s an interactive demonstration of the bias/gain functions I made. The source code for the functions are there too:
HTML5 Bias and Gain

Round Robin Anim Evaluation

There are some situations when you don’t need every model to have perfectly up to date animation data every single frame. One example of this is if you are simulating the game world on a server, where skeletal animation data doesn’t need to be perfectly up to date since network latency already makes it somewhat innacurate.

In these cases, one thing you could do is split the list of models you need to update into perhaps 4 different lists. Then, each frame, you only process one of the 4 lists, thus reducing your animation CPU load down to 25% of what it was. Quick and easy way to save some real CPU time quickly if you don’t need the most up to date animation data all the time.

Pose Sharing

Sometimes you have a lot of different models where many of the models are preforming the same animations – such as if you have a crowd of people in a crowded area.

One way to deal with this is to let some of the people doing the same animations SHARE their computed animation data.

If you are in a crowd, and there’s lots of different looking people walking all sorts of different directions, you aren’t going to easily notice that there are people who are using the exact same bone data, but facing different directions.

Going this route, if you have a group of 4 let’s say that all share the same bone data, you only need to calculate it for one person, and the rest of the group uses the data already calculated.

Less animations to sample and blend so you gain some CPU back.

Skeleton LODing

As things get farther away, or smaller, the smaller details are less noticeable. Because of this, you can “remove” bones from a skeleton as a model is farther away. I mentioned this briefly with facial animations, but the same is true of arm bones, leg bones, hand bones, etc.

You just have to make sure your anim system is able to handle LODing out bones gracefully (no popping) and efficiently (no excessive processing to get a lower LOD skeleton, it should just be a flag on the bones or something).

Runtime Debugging Essentials

Here are some debugging tools that I’ve found essential in debugging day to day animation bugs (popping, twitching, incorrect animations, etc).

Real Time Info On Screen

You really need the ability to show some kind of status on screen for a specified model. The info should show what animations are playing on which animation controllers, the current time of the animation controller, the playback rate of the controller, the state of the state machine, etc.

Using this, when you see a pop, you might see that for a fraction of a second, that an animation switches from one animation to another, then back to the first. From there you can go on debugging it further.

Timeline Log

Sometimes it’s useful to be able to turn on animation logging for a specified model. This way, you can generally log more info than you can on the screen in real time, and can also take your sweet time looking at very small intervals of time to see what went wrong and why.

Very useful.

Show the Bones

Sometimes you really just need to be able to look at the skeleton to see an issue more clearly, or be able to determine if the problem is with a model or the animation data.

Having a way to turn on bone rendering such that it draws 2d (unprojected) lines on the screen showing the bones of a specified model is very useful. Also sometimes it’s nice to be able to see the bones of all the animation data that went into the final blended pose, instead of just seeing the final blended pose.

Control Time Itself!

Lastly, sometimes it’s really useful to be able to slow down time to see a problem in greater detail. Rarely, it’s also useful to be able to speed up time. Having the ability to do both while the game running can be a really big help.

That’s All She Wrote

That, and MURDER I mean.

I hope you enjoyed these articles on the anatomy of a skeletal animation system. Drop me a line or post a comment if you have any questions or comments (:

Anatomy of a Skeletal Animation System Part 2

This is part two of “Anatomy of a Skeletal Animation System”

Animation Controller v3 – Bone Groups

In part 1, we talked about how to make a skeletal animation system that was able to play smooth, non popping animations on a model, it could communicate back to the engine to play sound effects, spawn objects in specific spots, and many other things as well.  What it could not do however, was play a different animation on the upper body and lower body.

To solve this, instead of having a single animation controller for our model, we need to have multiple animation controllers, where each controller controls a specific set of bones.  Note that multiple controllers should be able to affect the same set of bones, and in the end result, a bone’s position is made up by blending the data from all animation controllers that affect it.

Each animation controller should have a blend weight so that it can be blended in and out to keep animation motion smooth and continuous, and also the blend weighting allows you to turn on and off specific animation controllers as needed.

Some great example uses for this are…

  • Having a seperate animation controller for the upper and lower body so that they can work independently (the lower body can look like it’s jumping, without having to care if the upper body is firing a gun or not).
  • Having a seperate full body animation controller that affects all bones.  In most situations, this animation controller would be off, but in the rare cases that you want to play a full body animation, you turn this one on and play an animation on it.
  • Having a facial animation anim controller that only turns on if the camera is close enough to a characters’s face.  This way, if you look closely at another player, you can see their face moving, but if you are far away from them, the game engine doesn’t bother animating the facial bones since you can’t see them very well anyways.

The order that these animation controllers are evaluated should be explicit (instead of left up to load order or things like that).  You want to be very clear about which animation controllers over-ride which other animation controllers for the case of having multiple on at the same time, affecting the same bones.

For the sake of efficiency, when trying to blend the animation data together from each animation controller that affects that bone, you should start at the last fully weight (100% weight) anim controller in the anim controller list.  This way, you don’t bother evaluating animations for anim controllers that are just going to be completely masked out by other animation controllers.

If there is no full weight anim controller in the list that affects the specific bone, initialize the bone data to the “T-Pose” animation position before blending the other anim controller bone data on top of it.

We now have a very robust animation system, but it isn’t quite there yet.  Interacting with this animation system from game code means you having to tell specific game controllers when to play specific animations.   This is quite cumbersome and not very maintainable.  Ideally, the animation logic would be separated from the game play logic. Besides making the code more maintainable, this means that non animation programmers will be able to write game play code that interacts with the animation system which is a big win for everyone. Fewer development bottlenecks.

Animation Selection

There are two good techniques i’ve seen for separating the logic and preforming animation selection for you.

The first way is via “animation properties” and the second way is by using an animation state machine. There are pros and cons to each.

Animation Properties

For the animation properties method, you essentially have a list of enums that describe the player’s state.  These enums include things such as being able to say whether the player is crouched or standing, whether the player is unarmed, holding a pistol, or holding a rifle, or even how injured the player is (not injured, somewhat injured, or near death).

The game play code would be in charge of making sure these enums were set to the right values, and the animation controller(s) would use these values to determine the appropriate animations to play.

For instance, the game code may set the enum values to this:

  • WeaponType = Rifle (vs Unarmed, Pistol, etc)
  • WeaponAction = Idle (vs Firing, Reloading, etc)
  • PlayerHealth = NearDeath (vs healthy, injured, etc)
  • MovementType = WalkForward (vs Idle, Running, LungeRight, etc)

From here, the animation system takes over.

The lower body animation controller perhaps only cares about “MovementType” and “PlayerHealth”.  It notices that the player is walking forward (WalkForward) and that they have very low health (NearDeath).  From this, it uses a table that animators created in advance that says for this combination of animation properties, the lower body animation controller should play the “WalkNearDeathFwd” animation.  So, the lower body animation controller obliges and plays that animation for the lower body bones.

The upper body animation controller perhaps just cares about WeaponAction, WeaponType and PlayerHealth.  It notices that the player has a rifle, they aren’t shooting it, and they have very low health.  From this, the upper body animation controller looks into it’s animation properties table and sees that it should play the “RifleIdleInjured” animation, so it plays that animation on the upper body bones.

The logic of game play and animation are completely seperate, and the animators have a lot of control over what animations to play in which situations.

Once again, you’d want an editor of some sort for animators to set up these animation properties tables so that it’s easier for them to work with, it verifies the data to reduce the bug count, and everyone wins.

Your tool also ought to pack each animation properties table (upper body, lower body, facial animation, full body animation, etc) into some run-time friendly structure, such as perhaps a balanced decision tree to facilitate quick lookups based on animation properties.

Animation State Machine

Another way to handle animation selection is to have the animation controllers run animation state machines, having the game code send animation events to the state machines. Each state of the state machine corresponds to a specific animation.

When the player presses the crouch button for instance, it could send an event to all of the animation controllers saying so, maybe ACTION_BEGINCROUCH.

Depending on the logic of the state that each anim controller state machine is in, it may respond to that event, or ignore it.

The upper body anim controller may be in the “Idle” state. The logic for the idle state says that it doesn’t do anything if it recieves the ACTION_BEGINCROUCH event, so it does nothing and keeps doing the animation it was doing before.

The lower body anim controller may also be in a state named “Idle”. The logic for the lower body idle state says that if it recieves the ACTION_BEGINCROUCH event, that it should transition to the “StartCrouch” state. So, it transitions to that state which says to play the “CrouchBegin” animation (also says to ignore all incoming events perhaps), and when that animation is done, it should automatically transition to the “CrouchIdle” state, which it does, and that state says to play the “Crouching” animation, so it does that, waiting for various events to happen, including an ACTION_ENDCROUCH event to be sent from game code when the player lets go of the crouch button.

The interesting thing about the anim state machine is that it gives content creators a lot more control over the actual control of the player himself (they can say when the player is allowed to crouch for instance!) which can be either a good or bad thing, depending on your needs, use cases and skill sets of your content creators.

Going this route, you are going to want a full on state machine editor for content people to be able to set up states, the rules for state switching, and they should be able to see a model and simulate state switches to see how things look. If you DO make such an editor, it’s also a great place to allow them to define and edit bone groups. You might even be able to combine it with the key string editor and make a one stop shop editor for animation (and beyond).

Animation Controller v4 – Animation Blend Trees

At this point, our animation system is in pretty good shape, but we can do a bit better before calling it shippable.

The thing we can do to really spruce it up is instead of dealing with individual animations (for blending, animation selection, etc), is to replace them with animation blend trees like the below:

Animtree

In the animation blend tree above, you can see that it’s playing two animations (FireGun and GunSight) and blending them together to create the final bone data.

As you can imagine, you might have different nodes that preformed different functionality which would result in lots of different kinds of animations using the same animation blend tree.

You will be in good shape if you make a nice animation blend tree editor where a content creator can create an animation blend tree, set parameters on animation blend tree nodes, and preview their work within that editor to be able to quickly iterate on their changes.  Again, without this tool, everyone’s lives will be quite a bit harder, and a little less happy so it’s in your interest to invest the effort!

Some really useful animation nodes for use in the blend trees might include:

  • PlayAnimation – definitely needed!
  • AnimationSequence – This node has N number of “children” and will play each child in order from 1 to N in a sequence.  You may optionally specify (in the editor) that you want the children chosen at random and you specify a weighting to each child for the random choosing.  This is useful for “idle animations” so that periodically an idle character will do silly things.
  • AimGrid – this animation node uses the player data to see yaw and pitch of the player’s aim.  It uses this information to figure out how to blend between a grid of 9 animations of the player pointing in the main directions to give a proper resulting aim.  This node has 9 children, which specify the animations that specify the following aiming animations: Up Left, Up, Up Right, Left, Forward, Right, Down Left, Down, Down Right.  Note that since this is a generalized anim blend tree, these child nodes can be ANY type of animation node, they aren’t required to be a “PlayAnimation” node.  This in essence is the basis of parametric animation (which i mentioned at the beginning of part 1), so this is a way to get some parametric animation into your system without having to go full bore on it.
  • IK / FK Nodes – get full or partial ragdoll on your model.  Also get it to do IK solving to position hands correctly for specified targets and such.
  • BlendBySpeed – You give N number of children, and movement speeds for each child.  This animation node will choose the correct animation, or blend between the correct animations, based on the current traveling speed of the player.  This way you get a smooth blend between walk, run and sprint animations and the player can move at whatever speed they ought to (perhaps the speed is defined by the pathing system, or the player’s input).  To solve the problem of feet “dancing” as they blend, you need to make sure the footfalls happen on the same time (in %) on each animation that will blend together.  This way, the animations don’t fight eachother, and the feet will appear to move properly.
  • BlendByHealth – if you want the player to walk differently when they are injured, this node could be used to specify various walk animations with matching health levels so that it will blend between them (for upper or lower body or whatever else) as is appropriate for the player’s current health level.
  • Additive Blending – to get gun recoils and such

As you can see, animation blend trees have quite a bit of power.  They are also very technical which means engineers may need to help out content folk in making good trees to resolve some edge case bugs.  In my experience, animators are often very technical folk themselves, so can do quite a bit on their own generally.

Combine anim blend trees with the animation selection systems (FSM or anim properties) and the ability to smoothly blend an animation controller between it’s internal animations (or anim trees) it’s playing and you have a really robust, high quality animation system.

Often time with this work flow, an animator will just say “hey i need an anim node which can do X”, so an animation engineer creates the node and the animators start using it to do interesting things.  No need for an engineer to be deeply involved in the process of making the animation work like the animator wants, or having to worry about triggering it in the right situations etc.

Sure there will be bugs, and some things will be more complex than this, but by and large, it’s a very low hassle system that very much empowers content creators, and removes engineers from needing to be involved in most changes – which is a beautiful thing.

End of Part 2

This is the end of part 2. In the next and final part, we’ll talk about a few other miscellaneous features and optimizations.

Anatomy of a Skeletal Animation System Part 1

This is part one of “Anatomy of a Skeletal Animation System”

There is quite a bit of information out there on the basics of skeletal animation, including how to export and read animation and model data, how to animate bones and thus transform a mesh, how to blend bone data together and other related animation topics.

However, there is a lot less information out there about how to set up a system to use these techniques in a realistic way, such as you might find in your average modern 3d video game.

I myself have been an animation programmer on a few games including an open world unreal engine game called “This is Vegas” (unfortunately cancelled due to Midway going bankrupt) and also a multiplayer only first person shooter called “Gotham City Impostors” which was released earlier this year for PC, 360 and PS3.  The info I’m presenting is based on experience developing those games, as well as info i gathered from other developers or read about in books or online.

In this article I’m going to assume you already know how to get animation bone data into memory, how to use that animation data to animate models (meshes), and also how to blend animation bone data together.  I’m going to start off with the most simple animation system possible and slowly introduce features until we end up at something that would be fully featured for a typical modern game.

The “next generation” of skeletal animation seems like it’s going to be heavily based on parametric animation, and while we will TOUCH on the basics of parametric animation, we won’t dig into it very much beyond that.   If you are making a next gen AAA title, parametric animation may possibly be for you (and maybe not), but with the rise of 3d in flash, the rise of mobile games, and also indie game development, I think traditional pose driven skeletal animation is here to stay at least for a while.

Depending on the needs of your project, and how high a quality bar you want vs how much CPU time you want to spend on animation, some of these features may not be appropriate.  Feel free to take what is useful to you, and leave what isn’t.  Every game is different.

Animation Controller v1 – Super Simple

The simplest point we will start out is that if you have a mesh with an animation controller on it (to control what animations should play on it and such), it has these features:

  • If you tell it to play a looping animation, it will continue playing that looping animation forever.
  • If you tell it to play a non looping animation, it will play the animation and have some way of notifying you when the animation is done.  This is either by having it call a callback when it’s done, or by setting some flag on itself saying that the animation is done (won’t ever get set on a looping animation)
  • You should be able to tell it a playback multiplier to play the animation at, such as if you tell it to play at 3.0, it will play 3 times as fast, or if you tell it to play at 0.5, it will play half as fast and look like slow motion.
  • If you tell it to play an animation while another animation is playing, it will instantly stop the animation it’s playing and start playing the new animation.

With this simple animation system, we could conceivably make a game that has animated characters.

That being said, the animation system is lacking in a few ways:

  1.  You can only play full body animations, meaning if you want the lower body to look like it’s jumping, and the upper body to look like it’s firing a rifle, you have to make an animation that looks like that.  If you want the same thing, but you want the lower body to look like it’s standing around while the upper body is firing a rifle, you have to make an entirely different animation that looks like that!  The permutations of actions can get quite large and you have to decide in advance which animation you want to use.  That is, when the player is jumping, they cant change their mind that they suddenly want to start shooting.
  2. When you switch animations, there is visible “popping”.  Popping is when a bone goes from doing one thing to doing something else instantly.  It looks like the bone teleported and is very visible to players.  It looks buggy and unpolished.
  3. If you are doing something like having the player throw a grenade, you have no way of knowing when to actually spawn the grenade model, and where to spawn it.  You could “hard code” it to spawn at the same place relative to the player each time, when the animation stops playing, but that is pretty hackish and not very maintainable.

Lets start off by working on solving problem #3 of not being able to specify where to spawn a grenade or when to spawn it.

Keyframe Strings

To solve the problem of WHEN to spawn it, a feature common to nearly all animation systems is the ability to put game engine events on animation key frames.

This way, when the arm is at the correct position in the throw animation, someone would be able to put an event like “throw grenade” on that animation key.  When the animation reaches that animation frame, it sends the message to the game engine, which can then create a grenade (with any specified parameters to the event).

Often times I’ve seen this implemented as an actual string that is associated with an animation key frame.  The strings might be things like:

Playsound Laugh.wav   (to play a sound to go along with the animation)

SpawnPhysicsProjectile  Grenade.mdl 0 0 5   (to spawn a projectile with the specified mesh and velocity vector)

FootFallSound (This would tell the engine to play a footstep sound, based on the material the player was standing on, such as a metalic sound if on metal, or a duller thud if walking on dirt)

You could also use it to hide and show attachments or a myriad of other things.  Basically you can use it for anything that you want to be tied to an animation.

Usually you’ll want some kind of editor for animators and other content creators to be able to associate these key strings with specific key frames.   If they have to work with a text file where they have to hand enter times and key strings associated with those times, it’s going to be really tedious and they are going to be sad.  Also, it will be very error prone which makes everyone sad when it generates more bugs than it needs to, slowing down dev time.

On the topic of creating unnecessary bugs, while i’ve often seen keystrings implemented as actual strings, it’s actually a lot less error prone if you have some kind of structured input system in your key string editor.

For instance, instead of them typing a command name and supplying any required parameters, it would be a lot better for them to have to choose a key string command from a drop down list.  When they choose one, it should display any parameters that might be needed, and have some way of validating that their input is valid.

This editor should be tightly coupled with your game engine.  Example ways for doing this including having a shared header file that defines all key string commands and what parameters they require, or having the key string editor load a game dll to get at the data that way.

If you have to manually maintain the tool to match game code, it will often get out of sync and cause you pain you don’t need.  Avoiding that pain means you can work on developing more features instead of fighting reoccurring bugs, and means QA can focus on finding harder to find bugs.  In the end it means a better product which is great for the company, your continued paycheck, and the player’s experience.

Some other potential bugs can come up with key frames that I don’t have a good answer for, it’s just something you have to mindful of.

One of these bugs is that when an animation is interrupted, a key frame might not get hit when you expect a key frame to get hit.  For instance if an animation attaches something to a players hand, and at the end of the animation hides that attached object, if you interrupt the animation midway through, it won’t get hidden and the attachment will be stuck to the hand as the player does other things – which looks very weird.  Your best bet is to design things in such a way that if key strings are missed, it isn’t a problem.  Not always possible with all features unfortunately though…

Another problem that comes up when you have more advanced anim systems is that you may be blending out an animation which is no longer relevant, but while it is blending out, it hits a key frame.  For instance if you a player is holstering a weapon, but blending out a fire animation that got interupted, you may get a “firegun” key string command, when you really don’t want it because it’s not relevant anymore.  Sometimes you would want a key string to fire in that case though, so there is no real global solution to the problem that I’m aware of.

Sockets

Now that we have a way of knowing WHEN to spawn a grenade in a grenade throw animation, we don’t know WHERE to spawn it.  This is where sockets come in – no I’m not talking about TCP/IP or UDP sockets!

A seemingly obvious solution is probably to say which bone to spawn the grenade on in the “throw grenade” animation key string.    An issue here though is that maybe if you spawn it right on the “rhand” bone, it might clip through the hand (inter-penetrate the hand) and look sloppy.  Also, for other use cases, you might want to attach something where there isn’t a bone nearby.

Another seemingly obvious solution might be to add extra bones to the animation data that aren’t tied to any real geometry.  This way, you can use the bones to attach things to, or spawn things at, but they aren’t tied to any real model geometry so you can make them move however you want.

The problem with this solution is that you are paying the cost of animating those bones even if you aren’t using them for anything.  Enter sockets!

Sockets are a transformation (translation and rotation) away from a specified bone.  They are usually only calculated on demand so that when you aren’t using them, you don’t pay a price for having them.

This way, sockets act as very cheap attachment / reference points on a model during animations to attach other models to (such as capes, helmets, guns, grenades).

When a key string command takes a socket or bone as a parameter, you should have it accept either a bone or a socket.  They should be usable interchangeably, because sometimes you really do want to attach something to a bone, and you shouldn’t make an animator make an extra socket just to make it match a bone.

We now have a way of specifying WHEN to spawn a grenade (via a key string), and also WHERE to spawn it (specifying a socket to spawn it at as a parameter to the key string command).

Animation Controller v2 – Blending

I mentioned popping earlier and said it was caused by a bone changing where it is or how it’s moving by a drastic amount in a single frame.  If you’ve read my DIY Synth articles, you probably remember how important in audio programming it is to make sure that your sound data stays continuous.   The same is true of animation data, you have to make sure that bone motion / position stays continuous always, or else you’ll get popping.

Just like in audio programming, you use envelopes to help keep things continuous when you add a new animation into the mix, or remove an old animation.

For instance, If a model is playing one animation and you tell it to play another, the new animation should start at a blend weight of 0.0 and slowly increase while the old animation decreases from a blend weight of 1.0 down to 0.0.  This gives you a nice smooth blend between two animations and works for MOST animations (more on that in a second).

Typically, when crossfading from one animation to another, the magic number is to blend over 0.2 seconds, but certain uses may warrant a longer or shorter blend time.  You might also blend out the old animation at a different rate than you blend in the new animation.  Give your animators the option to choose so they can do whatever they need.  They will be happy that they have the control, and you will be happy that you don’t have to one off program things all the time for them.  Everyone wins!

What happens if you want to play an animation while an animation blend is in progress already?  0.2 seconds of blend time sounds like a short amount of time, but this actually comes up ALL THE TIME.

There are two ways to deal with this issue that I’m going to talk about.

The first way to deal with this problem is to keep a list of all the animations that are currently playing, so that if you tell the animation controller to play a bunch of different animations really quickly, it will end up sampling a bunch of different animations as various  ones blend out, and the final one is blending in.  This can result in A LOT of animation sampling which can take a serious toll on your game’s performance.  I encountered a bug on a game I worked on once that caused around 100 animations to be getting sampled on a single model for several frames due to this problem and it made the game tank HARD.

The second way to deal with this, and how I like to implement it usually, is to make it so only two animations can play at once (a main animation and a blend animation) and you have another field on the animation controller which says what the next animation  to blend in is.

Going this route, when you say to play a new animation while a blend is in progress, it goes into the “next animation” field.  When the current blend is done, that next animation will blend in and the last one will blend out.

If there is already another animation in the “next animation field”, it’s replaced and it’s never seen.

This way, only two animations will be sampled / blended at a time maximum, yet you will get a perfectly smooth blending between animations, and the controls will still feel fairly snappy, although there may be a noticeable delay in control response if animations change a lot really often.  You’ll have to make a judgement call about the needs of your game.

Lastly, I said blending works nicely for most animations but not all.  One exception to this rule is when you try to blend different lower body animations together, such as trying to blend a walk animation and a run animation together.  Often times, the feet will be in different places and when you blend them, it makes the feet look like they are doing a little stuttering dance and it looks ugly.  I’ll talk about getting around this specific problem in the next part, but as a preview, the short version of the solution is to make sure the feet are in the same positions at the same time for the two animations.

End of Part 1

At this point we have a fairly nice animation system but it isn’t quite ready yet. The most glaring problem we have is that we can only play full body animations still, which is not acceptable.  A real animation system NEEDS to be able to play different animations on different sets of bones independently.

We’ll tackle that problem, and others, in part 2.

Encryption 101: Realistic Security

This is the fifth article in a series on the basics of cryptography:

DISCLAIMER: These articles are meant for educational purposes only. The methods explained here are meant only to illustrate the basic concepts of cryptography and may or may not be suitable in real world applications. For serious applications such as financial transactions, I recommend hiring security professionals and also getting a lawyer involved. Use this info and code at your own risk, I claim no responsibility!

If you want more in depth information about cryptography than these introductory articles provide, I highly recommend a book called Applied Cryptography by Bruce Schneier. That book literally almost didn’t get published because the NSA didn’t want the info getting out into the public. Yay for the 1st amendment!

Realistic Security

“Everyone has a plan ’till they get punched in the mouth.” — Mike Tyson

Cryptography is really awesome, and as a friend of mine said today  (BOOOOOORIIIIIIIS your grandma is calling you!) , there’s a certain mathematical purity to it that’s really appealing.

However, in most security systems, cryptography is not the bottleneck. There’s often way easier things to attack and often you just need to defeat the weakest link in the chain to break open the whole thing.

A popular and successful method of attacking secure systems is something called Social Engineering which you see a lot of in movies like “mission impossible” and “sneakers”.

Social engineering is when you chat up the receptionist and get her to give you info she really ought not to give out, or when you call a company claiming to be maintenance and asking for the door code to get in after hours. Often much easier than trying to factor gigantic primes or the like 😛

Beyond social engineering, there is also physical security to watch out for. I attended DEF CON in vegas for a few years with my good buddy LagGod and learned some really interesting things.  DEF CON has gotten pretty packed in recent years but i highly recommend going if you are at all interested in security. Lots of really talented people on both sides of the fence (attackers aka black hats, and defenders aka white hats) and even some feds and random technophiles thrown in. Here’s two really memorable security lessons I learned at those conferences that really put security into perspective for me.

Hacking Into a Wifi Network the Easy Way

Note: this no longer works as advertised, thanks to advancements in wifi security technology, but the principles are still interesting and could work in other situations you may find yourself in.  Also, it’s good to know weakness of systems past and present to better protect other systems.  Otherwise, only the criminals have guns and we are all screwed 😛

Ok so lets say that you want to hack into a company’s network, and lets say that they have a wireless router where when you first try to access it, you are presented with a web browser login screen to type in your username and password.

How wifi networks used to work is that if you were trying to connect to a wifi network, it would pick the router that had the strongest signal that was broadcasting the network id that you wanted to connect to.

What this means is that you as a hacker could drive into the parking lot of a company and broadcast their network id with a really strong signal.  Then, when people tried to use their network, the traffic would be directed to your machine.

If you saved the html of their login page before turning on your fake network, you would be able to present a web page to the people hitting your network that looked exactly like the login page they were used to seeing, except that you could take all those usernames and passwords they entered, and log them to a text file!

After you’ve harvested a few logins, you turn off your network and then log into theirs. Thanks for the logins d00ds!

I’m not sure how they solved this problem, but you could probably do something with public key encryption to make sure that everyone who is broadcasting a network id is actually legitimately part of that wireless network.

Defeating Biometrics

Again this is somewhat dated info, but it’s still pretty interesting, and possibly useful for other situations.

It used to be that finger print scanners were a lot simpler (some cheap ones might still be). It used to be that if you mashed a gummy bear onto a finger print scanner, that the scanner would pick up the oily fingerprint of the last person that used it, which surely is a valid user, and so, the door would open, the laptop would unlock, or whatever else.

They fixed that problem by having it detect heat, listen for a heartbeat, and probably lots of other secret or publicized ways, but it used to work pretty regularly!

Something else to say about biometrics is that despite the complexity of the actions they preform, I’ve been told that often times there is just a single wire going into them, and a single wire going out of them. For all those fancy actions, all the thing does in the end is complete a circuit of two wires. If you really need to get in somewhere, you are likely able to smash open the box and connect the wires, circumventing the “infallible” biometrics reader.

Final Notes on Security

Here are some final words on security.

  •  There is no such thing as perfect security, there is only good enough security.  The only way to get perfect security is to lock your computer in a safe and drop it into the Marianas Trench (although I hear you have to watch out for James Cameron these days).
  • Good enough security often means just making sure you aren’t the low hanging fruit.  If you are more difficult to attack than your peers, you are safer than they are.  if you and someone else are running from a lion, you don’t need to outrun the lion, you just need to ourtun the other guy!
  • If your security is based on the fact that your algorithm is secret, that is called “Security Through Obscurity” and is really weak security.  You should assume your attacker knows the details of everything for better security.  Also, secret algorithms don’t get peer reviewed, so weak techniques don’t get weeded out.  Don’t forget that people STILL haven’t cracked the 72 bit RC5 message.  A single message with a 9 byte key, published in the mid 90s, attacked by distributed computer networks, and still, it hasn’t been cracked despite the algorithm being publicly available.  That is some good security right there.

I went to a talk at either DEF CON, or San Diego’s Toorcon (sorry, can’t remember which) where the author Bruce Schneier (who is mentioned in the disclaimer / header of these articles) gave a talk after he had just published a book as a sequel to Applied Cryptography.  He said something like “Throw away the other book… physical security is the only thing you really need to be worried about.”

BTW Bruce, if you are reading this, thanks for that first book anyways man, you rock (:

… and let me know if i misquoted you 😛

Thanks for reading!  Now go forth and cryptophy.  HACK THE PLANET!

Cryptography 101: Encryption – Asymmetric Keys

This is the fourth article in a series on the basics of cryptography:

DISCLAIMER: These articles are meant for educational purposes only. The methods explained here are meant only to illustrate the basic concepts of cryptography and may or may not be suitable in real world applications. For serious applications such as financial transactions, I recommend hiring security professionals and also getting a lawyer involved. Use this info and code at your own risk, I claim no responsibility!

If you want more in depth information about cryptography than these introductory articles provide, I highly recommend a book called Applied Cryptography by Bruce Schneier. That book literally almost didn’t get published because the NSA didn’t want the info getting out into the public. Yay for the 1st amendment!

Asymmetric Key Encryption (Public and Private Keys)

Unlike symmetric key encryption, which uses the same key for encryption and decryption, Asymmetric key encryption uses one key for encryption and a different key for decryption.

This probably sounds strange why you would want to have two passwords, but the reason is that you keep one for yourself, and give the other one out to another individual or a group.

Because you keep one to yourself (private) and give the other out (public) these are called public and private keys, and this technique is called Public Key Cryptography.

Depending on which key you keep private (the encryption or decryption key), you can get different effects.

Usage Pattern 1 – Private Encryption, Public Decryption

If you keep the encryption key secret, but publish the decryption key out to the public (or to a group of people, or to another individual), what that means is that you can encrypt data which can be read by anyone. What is useful about this is that they have to use your public key to decrypt the data, so they know it was encrypted with your private key, which means they can be reasonably sure that you were the one that wrote the message. You have effectively cryptographically signed your message so that people know it was in fact you that sent that message.

People use this technique all the time in computers, this is how you can verify that something is from a legitamate source, regardless of if we are talking about a web page (HTTPS), a valid device driver (digitally signed device drivers), or other things of that nature.

Another neat thing about this usage pattern is that getting creative, you can also be ensured that the message or data hasn’t been tampered with.

For instance, let’s say you were making a computer operating system where you only allowed the computer to run trusted (signed executables).

Re-visiting a technique mentioned in the first article in this series on hashing, a “signed executable” might look like the below:

  • [Cryptographic Hash of Unencrypted Executable Data]
  • [Encrypted Executable Data]

So, you as the “central signing authority” for the operating system would receive programs from people wanting to release software on your operating system.

First, you would put the software through it’s paces via analysis and testing to make sure  the program worked as intended, was up to the level of quality you wanted software on your OS to be, followed any specific rules about how the software should behave and interact with the rest of the operating system, and also you would make sure the software wasn’t malicious.  Also, you would have to make sure the software wasn’t insecure in any ways that could compromise the rest of your security (for instance, if it had a Buffer Overflow, that could let attackers run arbitrary, unsigned code on your operating system, causing viruses to spread and other malicious things).

Once the program is verified safe, next up you would make the hash of the unencrypted program, write that to a file, then  encrypt the program with your private key and write that to the file after the hash.

You now have a trusted / signed executable to distribute.

When a user downloads this executable from your application store and tries to run it, the operating system could take the following measures to verify that the executable was trusted and unaltered from the time of it’s signing:

  1. Unencrypt the executable using the public key.
  2. Hash the unencrypted data and ensure that it matches the hash at the beginning of the file.

If the hashes match, you know that the executable was indeed signed by the central authority, and that it has not been altered in any way since it’s signing. Therefore, it is safe to run!

I am pretty sure variations of this sort of algorithm are used by things such as the xbox, playstation and iphone / ipad devices.

Usage Pattern 2 – Public Encryption, Private Decryption

The other way to use asymmetric key encryption is to publicize the encryption key, but keep the decryption key private.

What this allows is for anyone to encrypt a message that only you can read.

One thing you could do with this is would be to be able to communicate securely with people if all you had was public communication.

For instance you could post to a public forum saying “This message is for Jesse”, and then put the encrypted data after that.

Since only Jesse knows his private key, and thus only Jesse can decrypt the data, only Jesse will be able to read your message, even though it is visible to everyone.

Despite this, there are still several unknowns in this particular communication, including:

  1. Jesse doesn’t know that you really are who you say you are
  2. You don’t know that Jesse got the message
  3. Jesse doesn’t really know that the message wasn’t tampered with (well… if it’s a text message you are sending, and jesse unencrypts it and it’s garbage, he knows that the message was tampered with, but if the expected data was not so obvious when it was wrong, he may not be able to know that the message hadnt been tampered with).

But those problems, and others, are solvable, which leads to our next point…

Cryptographic Protocols

A neat thing about cryptographic techniques like this one, symmetric key cryptography, and hashing is that they are basically just building blocks that you can stack together in different ways to be able to do useful and interesting things.

Once you learn some of the basic building blocks of cryptography (what this cryptography 101 series of articles is supposed to be all about), you can then learn more about how to put those building blocks together to preform useful tasks.  The recipes for preforming these useful tasks are called Cryptographic Protocols and they can (and often should) contain more than just cryptographic techniques.

In the first usage pattern, I showed how combining asymmetric key encryption with hashing can provide you with a system for creating and verifying trusted executables.  That series of steps for creating and using trusted executables was a cryptographic protocol that contained important steps even beyond just encryption and hashing – such as verifying that the executable was not malicious or insecure.  Leaving those steps out creates a big security hole, so they are very important to the overall protocol.

For the second usage pattern, here’s some cryptographic protocols to solve the problems i called out:

  1. To solve the issue of Jesse not being sure that you are who you say you are, you could take the encrypted message you created, and sign it with your own private key (of which the decryption key is public… this is usage pattern 1).  This way, when Jesse gets the encrypted message from you, he first unencrypts it with your public key, and then unencrypts it with his own private key.  If the message comes out as garbage in the end, he knows that one of the two steps failed.  Specifically, either it wasn’t YOU who sent the message, OR, you used the wrong public key when signing a message to send to him.  Jesse doesn’t know which step went wrong, but he does know the message is invalid one way or another.
  2. To solve the problem of you not knowing that Jesse got the message, you could tell Jesse in the encrypted message “Jesse, if you get this message, respond by sending me back an encrypted message that says ‘the password is forty two'”.   Then, if Jesse got the message, he could encrypt a message saying “the password is forty two” using your public key, and then post it on the board again for you to unencrypt with your private key and see that he got receipt of your message.  While it’s true that anyone is able to encrypt messages meant for you, and so anyone could have written that message, there is some level of security there because the specific message you said to send was encrypted in such a way that only Jesse could have read it.  This way, you can be reasonably sure that Jesse got the note.
  3. To solve the issue of Jesse not knowing if the message was tampered with at all or not (in the case that it’s hard to tell if you got the right data out or not), one way would be to just put a hash of the unencrypted data on the front of the message.  You’d have to agree with Jesse in advance on the protocol, but using the hash again, it would let Jesse know that the data hadn’t been tampered with.

Generation of Key Pairs and Algorithm

By the very nature that these keys work in tandem means that they are somehow linked together mathematically.

I was trying to think of a really simple way to show how public and private keys work together and how they are linked, with a minimal piece of sample code. I thought i had figured out a simplified way, but unfortunately it turned out I was mistaken and my method didn’t work at all.

So, I have to refer you to this page which is pretty darn helpful for understanding how the real thing works with RSA, but unfortunately it doesn’t explain the full nitty gritty of WHY it works to my liking.  Still a very good read though: http://sergematovic.tripod.com/rsa1.html

Common Algorithms

Some commonly used Public Key Encryption algorithms are SSH, IKE and apparently even Bitcoins use it!.

Oops!

After I wrote up this article, my friend Patrick corrected me saying that the process i described is not the usual process for digitally signing data. He said:

You got signing a little mixed up for asymmetric. Traditionally the process is:
1. Alice creates a public and private key pair.
2. Alice shares her public key with the world.
3. Alice never shares her private key.
4. Bob can now encrypt messages using Alice’s public key and only Alice can unencrypt them using her private key.
5. Alice can take a hash of something she wants people to verify as coming from her. Alice then signs that hash with her private key. Now Bob can verify the item coming from Alice by taking the hash of the data and comparing it against the hash in the signature using Alice’s public key.

Some additional reference:
RSA Labs Digital Signing Explanation

There are two reasons that I can think of why that process is better that the one I described:

  1. You can sign data without obfuscating it via encryption.
  2. Public key encryption takes a lot of processing power apparently, so you want to minimize how much data you encrypt with it.  This method encrypts a far smaller (and constant) amount of data.

Thanks for the correction Patrick!

Cryptography 101: Encryption – Symmetric Keys

This is the third article in a series on the basics of cryptography:

DISCLAIMER: These articles are meant for educational purposes only. The methods explained here are meant only to illustrate the basic concepts of cryptography and may or may not be suitable in real world applications. For serious applications such as financial transactions, I recommend hiring security professionals and also getting a lawyer involved. Use this info and code at your own risk, I claim no responsibility!

If you want more in depth information about cryptography than these introductory articles provide, I highly recommend a book called Applied Cryptography by Bruce Schneier. That book literally almost didn’t get published because the NSA didn’t want the info getting out into the public. Yay for the 1st amendment!

Symmetric Key Encryption

Symmetric key encryption is a fancy name for the type of encryption you are probably most familiar with, which is using a password to scramble and unscramble data to make sure only certain people can see it.

This is in contrast to asymmetric key encryption, where you have two passwords; one for encrypting and one for decrypting (The next article is going to be on asymmetric key encryption).

Security

There are numerous symmetric key encryption algorithms out there but they all have one thing in common: their security relies on only the right people having the password, and the assumption that the best way attackers have for getting the plaintext from the ciphertext is to guess the password via brute force.

In good (modern) algorithms, people say things like “on average it will take geological or astronomical amounts of time to guess a password with the computing technology of today and the projected future” so they are reasonably sure people won’t be able to brute force the password in any useful amount of time.

Quantum computers give some forms of cryptography a scare though, because there is something called Simon’s Algorithm which is a quantum computing algorithm that can brute force search ANYTHING with exponentially fewer operations than classical computing.  This means it can brute force guess passwords of an encryption algorithm a lot faster than a normal computer.  At the time of writing this, I think the record for quantum computing power is something like having 4 cubits work together to do some simple math operation (like multiplication).  We could be on the precipice of disaster regarding cryptography, but luckily there are encryption algorithms that take the same amount of time, or longer, for quantum computing to solve, so it isn’t all doom and gloom.

When decrypting data with either symetric or asymetric key encryption, there is no built in way to know if you had the right password or not. You can know by looking at the recovered plaintext and seeing if you got junk out, or meaningful data, but if you don’t know what the data out is supposed to be exactly, or what it’s supposed to look like, there’s no way to know if decrypted it correctly. This makes it so sometimes it can be difficult for attackers to even KNOW if they have guessed the right password or not, which is good for us folk trying to protect data.

Just like a good hashing algorithm, small changes in input should ideally yield large changes in output, which makes it a Chaotic Function and makes it so the cipher text gives as little information about the plaintext as possible.

Sometimes people will use multiple encryption algorithms on a piece of data in the hopes of making it harder to crack, which sometimes works, but can also be fairly dangerous.

To understand the danger, consider how every program, no matter how complex, is essentially a traditional algebraic function (with perhaps lots and lots and lots of terms).  For encryption, the input is the plain text and key, and the output is the cipher text.

Now, just like in junior high and high school, sometimes when you plug one function into another like f(g(x)) and preform algebraic substitution, terms from f and g maybe cancel out.  You may end up with a function that is less complex than either f(x) or g(x), or it just may be less complex for certain values of x.  An attacker could exploit these attacks to their advantage and it might be easier for them to recover some or all of the plaintext because you used two encryption algorithms instead of one.

On the other hand, using multiple algorithms, or the same algorithm multiple times (perhaps with different keys) can also make it a lot more secure.  It’s just something to be mindful of.

Clever programmers and mathematicians sometime come up with encryption techniques where attacking the algorithm itself is the literal equivalent of having to solve famous unsolved math problems from the ages.  These often seem really secure because for some of these problems, the best and brightest minds in all of history have been fighting with the problems for hundreds or thousands of years and making no progress.

Every now and then, some smarty figures one of these out though, and suddenly, encryption algorithms based on it become essentially worthless.

Another common way that people attack ciphertext is via something called a “known plaintext attack”.  What this means is that if the attacker knows any part of the plaintext before it became ciphertext, they can sometimes leverage that knowledge to know a bit more about the key or algorithm used to encrypt the data.  That simplifies their work and makes it more likely that they can get the plaintext back without having to revert to brute force.

One really common way this comes up is if people do something like compress their data before encrypting, or they encrypt known file types like executables, word processing documents, image files etc.

The reason for this is because in all of those file types, there is a standard, well known header that those files have, which allow other programs to use them.  That header data is known plaintext and can be used by an attacker to get more information how to recover the plaintext.

For all the clever people out there trying to make encryption based on super advanced mathematics, in the end, some of the very most secure algorithms out there are based on very simple computing operations such as addition, subtraction, bit rotation, and XOR.

As an example, there is an algorithm called RC5 which only uses those basic operations (you can find the source code for it easily!) and yet is extremely secure. The makers of RC5 published their source code, and encrypted some data with various key sizes (7 byte, 8 byte and 9 byte) in 1994, and it took something like 5 years for the first one to be cracked (via brute force), 10 years for the second, and they project that cracking the third will take 200 more years. More information available here: RC5

Algorithm Components

A symmetric key algorithm is any deterministic algorithm where given a key, has the ability to obfuscate (hide / scramble) data, and then later given the same key, has the ability to undo the operations that it did to get the original data back.

Since all operations have to be reversible, that limits you to non destructive operations.  XOR isn’t destructive, because A XOR B XOR B = A.  Addition and subtraction isn’t destructive, because A + B – B = A (even true when you wrap around the max size of your integer).  Division is destructive however, because when you divide on a computer, you have finite precision (even with floating point numbers) which means you can never fully recover the origional data when trying to undo a division with a multiplication.  Bit rotation is another operation that isn’t destructive.  NOT isn’t destructive, but AND and OR are destructive.  Another operation that isn’t destructive is moving bytes around, since you could just do the moves again in reverse order to get the original data back.

As simple as all this sounds, these are essentially the building blocks of all encryption algorithms.

Example Algorithm

Here’s an example algorithm that you could use to encrypt and unencrypt data.  I don’t do any byte swapping (moving bytes around), or bit rotation, but those would be some good ways to improve it.

//Takes a pointer and length so you can encrypt binary data as well as text
//the pOutData parameter should point to memory that is the same size as pData
//If bEncrypt is true, it will encrypt data. If bEncrypt is false, it will decrypt data.
void EncryptData(const unsigned char *pData, int nDataLength, unsigned char *pOutData, const unsigned char *pKey, int nKeyLength, bool bEncrypt)
{
  int nKeyIndex = 0;
  unsigned char nRunningSum = 0;
  for(int nDataIndex = 0; nDataIndex < nDataLength; ++nDataIndex)
  {
    //update our running sum
    nRunningSum += pKey[nKeyIndex % nKeyLength];

    //get our current byte of plaintext or ciphertext
    unsigned char nDataByte = pData[nDataIndex];

    //to decrypt, it subtracts a running sum of the key then xors against the current key byte
    if(!bEncrypt)
    {
      nDataByte -= nRunningSum;
    }

    //do our xor, whether we are encrypting or decrypting
    nDataByte = nDataByte ^ pKey[nKeyIndex % nKeyLength];

    //to encrypt, it xors against the current key byte and then adds a running sum of the key
    if(bEncrypt)
    {
      nDataByte += nRunningSum;
    }

    //set the output data byte
    pOutData[nDataIndex] = nDataByte;

    //move to the next byte in the key
    nKeyIndex++;
  }  
}

Also, here’s some example code of how to use this function:

void DemoEncryption()
{
  //our key and plain text
  const char *pKey = "MyKeyIsFairlyLongButThatIsJustFine!124351 seven";
  const char *pPlainText = "This is some plaintext, how do you do?";

  //allocate space for our cipher text and recovered plain text
  unsigned char *pCipherText = new unsigned char[strlen(pPlainText)];
  unsigned char *pRecoveredPlainText = new unsigned char [strlen(pPlainText)+1];

  //print out our plain text
  printf("%snn",pPlainText);

  //encrypt the plain text
  EncryptData((unsigned char *)pPlainText,strlen(pPlainText),pCipherText,(unsigned char *)pKey,strlen(pKey),true);

  //print out the cipher text as hex digits
  for(int nIndex = 0; nIndex < strlen(pPlainText); ++nIndex)
  {
    printf("%0.2x",pCipherText[nIndex]);
  }
  printf("nn");
  
  //decrypt the cipher text to recover the plain text
  EncryptData(pCipherText,strlen(pPlainText),pRecoveredPlainText,(unsigned char *)pKey,strlen(pKey),false);

  //print out the recovered plain text after we null terminate it
  pRecoveredPlainText[strlen(pPlainText)]=0;
  printf("%snn",pRecoveredPlainText);

  //free the memory we allocated
  delete[] pCipherText;
  delete[] pRecoveredPlainText;
}

Common Algorithms

Some commonly used symmetric key encryption algorithms in use today are AES, Blowfish and 3DES.

Until Next Time!

That’s it for symmetric key algorithms, next up I’ll be talking about asymmetric key algorithms, which have some pretty interesting uses.