B.A.M. Neural Networks

Neural networks (officially “Artificial Neural Networks”) are computer simulations of neurons.  Simulating neurons in software allows programs to do things that you would normally need a human brain to do, such as recognizing patterns, learning over time, or making non-obvious decisions based on complex data.

Simulating neurons is not enough to create human levels of intelligence however.  Last I heard, someone could make toddler level intelligence via neural networks, but even that is somewhat misleading since toddlers can walk, use vocal chords, swim and understand complex emotions, while the neural network could do none of those things.

Despite the limitations of neural networks, there are quite a number of practical uses of artificial neural networks in the real world.  These uses include…

  • Helping missiles identify enemy tanks or combatants on the battlefield
  • Helping to predict stock market trends (It’s rumored that several top traders have proprietary neural networks which help them preform better)
  • OCR (turning scanned images into text documents based on the text in the image)
  • General machine vision (like, for robots or security systems)
  • Controlling complex machinery at speeds a human wouldn’t be able to keep up with
  • Facial recognition in computers
  • Diagnosing medical conditions

I was reading an article in Scientific American recently about how a girl in high school trained a neural network to recognize certain types of cancer with 99% accuracy.   She trained a neural network to analyze the results of a non-invasive cancer test which up til then had been too unreliable to use in any realistic situation.  Her neural network learned some hidden pattern in the data that we have not yet discovered or understood.

Just like in that example, you can feed a network complex data for it to look for patterns in, but unfortunately it won’t be able to explain to you what it learned, or what it looks for when trying to recognize patterns.  It can learn, but it can’t tell you what it learned.

My friend Doug often tells a funny story where this didn’t work out so well.  I’m not sure if it’s true exactly as told or not, but it definitely is plausible.  Apparently the US army for whatever reason was training a neural network to recognize tanks in the battlefield (surely for a missile or perhaps some kind of recon drone).

They fed the network hundreds or thousands of photos which either contained a tank or did not.  While they fed each picture to the neural network, they also told it “yes this photo contains a tank” or “no, this photo does not contain a tank” so that it was able to infer what it was that people were trying to teach it.

The network learned well, and with their sample photos, it was getting a very high rate of correct answers about whether a tank was there or not.

However, when they deployed this to the battle field (or perhaps, just a realistic live test run), it failed miserably and was getting near 0% accuracy.

Since the network wasn’t able to explain it’s thought process, the people involved were forced to try and deduce what the problem was and eventually they noticed a pattern….

The photoshoots apparently happened on 2 seperate days… on the first day they took pictures of tanks, and the second day they took pictures without tanks.  The unfortunate truth is that it was overcast on the first day, but clear skies and sunny on the second, which means…. the neural network learned to distinguish between a sunny day and an overcast day, not whether nor not there were tanks present!

A pretty funny story, but it shows the importance of giving good, realistic data to your network to learn from, or else you can run into silly problems like that!

Types of Neural Networks

There are a lot of different types of neural networks that have different properties and so thus are useful in different situations.

Some of the main types of flavors are…

  • Supervised learning – Just like in the tank example above, you give information to the neural network to learn, and you also tell it the information it should learn from each peice of data (ie… here’s a picture, it DOES contain a tank).
  • Unsupervised learning  – These work by finding natural groupings of input data.  You can then look at the groupings (or ask it to group more data) and can gain information about the nature of the data itself.  This is often used for data mining, having the neural network pick out interesting correlations in the data that a human might not figure out.
  • Some networks are static once they are created and are unable to learn further.
  • Other networks are able to continuously learn as they get more and more data.

Local Minima vs Global Minimum

Just like a human brain, a neural network is not infallible.  It can often think that is has found an answer, or something of interest, when in fact it hasn’t.

Similarly, we as humans can sometimes think we’ve found a solution to a problem, and then someone comes along and says “You forgot to consider this part of it!” and suddenly you realize your solution is not the right answer and it’s back to the drawing board.

A neural network can have the same issue believe it or not.  This can come from the fact that it wasn’t provided with good enough data to learn from, or, just because!  Just like humans, it can either just learn something wrong, or be incorrect about an answer.

If you think of a problem space as a graph, you can think of the lowest point on the graph being the optimal solution.  How neural networks work is by starting at some point on the graph (often chosen randomly) and then traveling downhill til it finds the bottom of a dip.

This works great if you happen to find the lowest dip in the graph (also called the global minimum), but if you find a dip that isn’t the lowest, you have effectively become trapped in a local minima, and end up with the wrong answer, or an imperfect understanding of the problem space.

minima

There are ways of dealing with this problem luckily.  One way is that if you find a minima, remember where it was at, but then choose another random point on the graph and try again to see if you find a deeper minima.  Rinse and repeat until you are reasonably satisfied with the results.

Have you ever been stuck trying to figure something out and forgotten about it (or went to sleep), only to come back to it later and find the answer.  I personally attribute that phenomenon to this “randomization” effect.  I don’t know the results of studies to this effect, but if you stop thinking about something for a while, then come back to it, you often see it from a different angle (essentially starting at a different spot on the problem space graph) and can sometimes figure out a better solution, or a deeper understanding of the problem (pun intended!)

Anyways, for normal problem spaces, they probably aren’t going to be just 2d like the above, but are perhaps 3d, 4d, or even higher dimensions.  In the end though, the neural network still is just trying to find the deepest valley it can find by essentially traveling downhill.

Now that you know the basics of neural networks, let’s get onto the implementation of one type of neural network!

Bidirectional Associative Memory (AKA BAM)

Bidrectional associative memory is perhaps the easiest useful neural network to create.  All you need is the ability to multiply vectors by other vectors, multiply vectors by matrices, and add matrices together.  If you know how to do those 3 things, you will be able to program your own neural network very quickly and easily.

In fact, I learned about this guy when I was 14 or so, and was able to implement a simple OCR system using microsoft excel (seriously!) 😛

The main point of BAM is to act as memory, where you can teach it to associate several patterns together.  This way, if you teach it to associate a pattern A with a pattern B, when you give it A again, it will spit out B.  Since it’s bidirectional, you can also give it pattern B and it will spit out pattern A in response.  You can teach it several pattern pairs to associate, and can even corrupt some of the data, and it will give you what it thinks is the best match for the data you gave it.  Just like a human brain, it sort of uses it’s best judgement and can say “umm… i think you meant pattern D but I’m not quite sure”.

Besides associating different patterns, you can also associate patterns with themselves (such as associating pattern A with pattern A and pattern B with pattern B).  If you do this, you are able to put in possibly corrupted data and it will give you what it thinks the data really is.  This way, if you have data that is noisy because it came over the radio, or because you scanned a document with a low quality scanner, it will be able to see through the noise and pick out the correct data (hopefully) in the same way a human could.  There are limits to this of course though, just like sometimes we can’t make out messy handwriting sometimes.

While BAM is useful, even in some realistic uses of neural networks, it also is a little bit limited compared to more sophisticated neural network implementations.

  • BAM is fairly limited in how many patterns you can teach it
  • You have to teach it in advance via supervised learning.  No further learning happens after it’s created.
  • It’s fairly strict in it’s mapping from input to output.  This means if you use it to recognize written or typed letters, it will be thrown off by variations in handwriting or different fonts.

That being said, it’s still pretty cool, and lots of fun to play with.

Creating a BAM Network

Creating a BAM network is pretty straight forward.  It has M input bits (you decide how many that is) and N output bits (again, you decide how many that is).

Once you have all of your input / output data pairs, the first step is to convert all the zeros in your pattern pairs to -1’s.   Where 0’s and 1’s is called binary, this form of -1’s and 1’s is called “unipolar”.

Then, for each pattern you multiply the input pattern vector, by the transpose of the output pattern vector (turn it on it’s side) so that when you multiply them together, you get a matrix that is MxN in size.

Continue this for each data pair so that you end up with one matrix per data pair.

Then, add all the matrices together so that you end up with a final matrix.  This is your trained neural network!

In the BAM neural network, the neural topology is that there are M input neurons and N output neurons, with no neurons in between.    The more neurons you have in your network, the more data the neural network is able to store, and the more distinctions between different types of data it’s able to make.  The fact that there are only 2 layers of neurons in BAM is part of the reason for it’s limitations.

For more information about why 2 layers of neurons are limited in their learning, I recommend searching for information on the perceptron xor problem and linear separability.  A 2 layer network is inherently incapable of preforming (and perhaps “understanding”) xor!

Using a BAM Network

To use a neural network, you take an input vector (in binary) of size M and multiply it by the matrix. This will result in a vector of size N that it made up numbers which may be positive, negative, or zero. Convert all positive numbers to 1 and all negative numbers to 0, and you’ll end up with the N sized output pattern that the neural network associated with.

Dealing with zeros is sort of up to your own discretion unfortunately. I’ve seen some people say that zeros should be treated as 1’s, other people say that zeros should be treated as 0’s, and other people have other rules such as “if it’s zero, set it to whatever that bit output the last time you had it output something” which IMO is a pretty odd way to deal with it.  I think this is an unfortunate flaw in how the BAM network works, but you can also chalk this up to the network being uncertain of the result, which it basically is.

Since BAM networks are bidirectional, you can also take the transpose of your matrix (turn it on it’s side) and then multiply it by a vector sized N to get a vector of size M as output, which is the vector associated with your N sized input vector.  So, it works both ways; you can put in an input pattern and get an output pattern out, or you can put in an output pattern and get an input pattern back.

Don’t forget… you can also associate patterns with themselves if you want it to do “pattern recognition” instead of “pattern association”.

Example

Let’s say we want to have an input size of 6 bits and an output size of 4 bits and that we have these 2 data pairs that we want to associate in the neural network:

  1. 101011 <-> 0010
  2. 110010 <-> 0101

The first step is to convert all zeros to -1’s.  Doing so our data pairs become this:

  1. 1 -1 1 -1 1 1 <-> -1 -1 1 -1
  2. 1 1 -1 -1 1 -1 <-> -1 1 -1 1

The next step is to multiply the input patterns with the output patterns to make a 6×3 matrix for each pattern pair.

First Pair:
1 * [-1 -1 1 -1]
-1
1
-1
1
1

=

-1 -1 1 -1
1 1 -1 1
-1 -1 1 -1
1 1 -1 1
-1 -1 1 -1
-1 -1 1 -1

Second Pair:
1 * [-1 1 -1 1]
1
-1
-1
1
-1

=

-1 1 -1 1
-1 1 -1 1
1 -1 1 -1
1 -1 1 -1
-1 1 -1 1
1 -1 1 -1

Next up, you add all the matrices together to get the final trained neural network:

-1 -1 1 -1
1 1 -1 1
-1 -1 1 -1
1 1 -1 1
-1 -1 1 -1
-1 -1 1 -1

+

-1 1 -1 1
-1 1 -1 1
1 -1 1 -1
1 -1 1 -1
-1 1 -1 1
1 -1 1 -1

=

-2 0 0 0
0 2 -2 2
0 -2 2 -2
2 0 0 0
-2 0 0 0
0 -2 2 -2

Now that we have our trained neural network, let’s plug in our first input pattern to make sure we get our first output pattern

-2 0 0 0
0 2 -2 2
0 -2 2 -2
2 0 0 0
-2 0 0 0
0 -2 2 -2

*

1
0
1
0
1
1

=

-4 -4 4 -4

When we convert negatives to zeros, and positives to ones we get:

0 0 1 0

Which is the first output pattern.  It recalled our pattern correctly!

Next, let’s put the second output pattern into the transposed matrix to see if we can go the opposite direction and recall the second input pattern.

-2 0 0 2 -2 0
0 2 -2 0 0 -2
0 -2 2 0 0 2
0 2 -2 0 0 2

*

0
1
0
1

=

0 4 -4 0 0 0

Converting that to binary by turning negatives into zeros, and non negatives into ones, and zeros into question marks we get:

? 1 0 ? ? ?

The pattern it was supposed to recall is:

1 1 0 0 1 0

The two bits that it did recall are correct, but as you can see it only recalled 2 of the 6 bits. Not very good!

With just two patterns, the network was unable to recall some of the info it was trained with.

Normally BAM isn’t this bad, it looks like I just chose some unfortunate input / output pairs. If you encounter problems with a network recalling data, sometimes adding more neurons (larger input or output patterns) can help, but sometimes that will be ineffective too. Like i mentioned earlier, a neural network that has only 2 neuron layers – like BAM does – is incapable of learning XOR, no matter how many input or output neurons you have, so these types of networks are somewhat limited.

Using for OCR

If you wanted to use a BAM network for being able to recognize drawn or written letters, one way to do so would be to say “we are going to store our letters in an 8×8 black and white grid”.

That means that you have a grid of binary (black / white) that is 8×8.  Another way to represent the grid of binary would be just to have 64 bits in a row.

So, for the letters you want to train the network to recognize, you would just draw out your letters in an 8×8 grid, take each letter as it’s 64 bits, and associate each letter with itself.

Your network matrix will be 64×64 but will be able to do simple OCR on 8×8 black and white images.

Often times, images will come to you in color, or not in 8×8 resolution, but what neural network engineers often do in this situation is they will process the images in advance to make them black and white, and 8×8, before feeding them into the neural network.

Now, you are able to feed characters into your neural network and it will attempt to correct any corruption in the image, and return to you an 8×8 image of what it think you entered.

Instead of associating a letter’s image with itself, you can also associate it with a number (say, the ascii code?) so that when you put in the image of a character, it will spit out the number corresponding to the closest match it can find instead of the raw character image itself.

That’s All!

BAM is a nice introductory neural network that nearly anyone can implement.  It may be limited in some ways, but it actually is used in the real world for some applications.

In the future I’ll write about some more advanced neural networks, but until then, I hope you found this informative, or at least interesting! (:


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s