Encryption 101: Realistic Security

This is the fifth article in a series on the basics of cryptography:

DISCLAIMER: These articles are meant for educational purposes only. The methods explained here are meant only to illustrate the basic concepts of cryptography and may or may not be suitable in real world applications. For serious applications such as financial transactions, I recommend hiring security professionals and also getting a lawyer involved. Use this info and code at your own risk, I claim no responsibility!

If you want more in depth information about cryptography than these introductory articles provide, I highly recommend a book called Applied Cryptography by Bruce Schneier. That book literally almost didn’t get published because the NSA didn’t want the info getting out into the public. Yay for the 1st amendment!

Realistic Security

“Everyone has a plan ’till they get punched in the mouth.” — Mike Tyson

Cryptography is really awesome, and as a friend of mine said today  (BOOOOOORIIIIIIIS your grandma is calling you!) , there’s a certain mathematical purity to it that’s really appealing.

However, in most security systems, cryptography is not the bottleneck. There’s often way easier things to attack and often you just need to defeat the weakest link in the chain to break open the whole thing.

A popular and successful method of attacking secure systems is something called Social Engineering which you see a lot of in movies like “mission impossible” and “sneakers”.

Social engineering is when you chat up the receptionist and get her to give you info she really ought not to give out, or when you call a company claiming to be maintenance and asking for the door code to get in after hours. Often much easier than trying to factor gigantic primes or the like 😛

Beyond social engineering, there is also physical security to watch out for. I attended DEF CON in vegas for a few years with my good buddy LagGod and learned some really interesting things.  DEF CON has gotten pretty packed in recent years but i highly recommend going if you are at all interested in security. Lots of really talented people on both sides of the fence (attackers aka black hats, and defenders aka white hats) and even some feds and random technophiles thrown in. Here’s two really memorable security lessons I learned at those conferences that really put security into perspective for me.

Hacking Into a Wifi Network the Easy Way

Note: this no longer works as advertised, thanks to advancements in wifi security technology, but the principles are still interesting and could work in other situations you may find yourself in.  Also, it’s good to know weakness of systems past and present to better protect other systems.  Otherwise, only the criminals have guns and we are all screwed 😛

Ok so lets say that you want to hack into a company’s network, and lets say that they have a wireless router where when you first try to access it, you are presented with a web browser login screen to type in your username and password.

How wifi networks used to work is that if you were trying to connect to a wifi network, it would pick the router that had the strongest signal that was broadcasting the network id that you wanted to connect to.

What this means is that you as a hacker could drive into the parking lot of a company and broadcast their network id with a really strong signal.  Then, when people tried to use their network, the traffic would be directed to your machine.

If you saved the html of their login page before turning on your fake network, you would be able to present a web page to the people hitting your network that looked exactly like the login page they were used to seeing, except that you could take all those usernames and passwords they entered, and log them to a text file!

After you’ve harvested a few logins, you turn off your network and then log into theirs. Thanks for the logins d00ds!

I’m not sure how they solved this problem, but you could probably do something with public key encryption to make sure that everyone who is broadcasting a network id is actually legitimately part of that wireless network.

Defeating Biometrics

Again this is somewhat dated info, but it’s still pretty interesting, and possibly useful for other situations.

It used to be that finger print scanners were a lot simpler (some cheap ones might still be). It used to be that if you mashed a gummy bear onto a finger print scanner, that the scanner would pick up the oily fingerprint of the last person that used it, which surely is a valid user, and so, the door would open, the laptop would unlock, or whatever else.

They fixed that problem by having it detect heat, listen for a heartbeat, and probably lots of other secret or publicized ways, but it used to work pretty regularly!

Something else to say about biometrics is that despite the complexity of the actions they preform, I’ve been told that often times there is just a single wire going into them, and a single wire going out of them. For all those fancy actions, all the thing does in the end is complete a circuit of two wires. If you really need to get in somewhere, you are likely able to smash open the box and connect the wires, circumventing the “infallible” biometrics reader.

Final Notes on Security

Here are some final words on security.

  •  There is no such thing as perfect security, there is only good enough security.  The only way to get perfect security is to lock your computer in a safe and drop it into the Marianas Trench (although I hear you have to watch out for James Cameron these days).
  • Good enough security often means just making sure you aren’t the low hanging fruit.  If you are more difficult to attack than your peers, you are safer than they are.  if you and someone else are running from a lion, you don’t need to outrun the lion, you just need to ourtun the other guy!
  • If your security is based on the fact that your algorithm is secret, that is called “Security Through Obscurity” and is really weak security.  You should assume your attacker knows the details of everything for better security.  Also, secret algorithms don’t get peer reviewed, so weak techniques don’t get weeded out.  Don’t forget that people STILL haven’t cracked the 72 bit RC5 message.  A single message with a 9 byte key, published in the mid 90s, attacked by distributed computer networks, and still, it hasn’t been cracked despite the algorithm being publicly available.  That is some good security right there.

I went to a talk at either DEF CON, or San Diego’s Toorcon (sorry, can’t remember which) where the author Bruce Schneier (who is mentioned in the disclaimer / header of these articles) gave a talk after he had just published a book as a sequel to Applied Cryptography.  He said something like “Throw away the other book… physical security is the only thing you really need to be worried about.”

BTW Bruce, if you are reading this, thanks for that first book anyways man, you rock (:

… and let me know if i misquoted you 😛

Thanks for reading!  Now go forth and cryptophy.  HACK THE PLANET!

Cryptography 101: Encryption – Asymmetric Keys

This is the fourth article in a series on the basics of cryptography:

DISCLAIMER: These articles are meant for educational purposes only. The methods explained here are meant only to illustrate the basic concepts of cryptography and may or may not be suitable in real world applications. For serious applications such as financial transactions, I recommend hiring security professionals and also getting a lawyer involved. Use this info and code at your own risk, I claim no responsibility!

If you want more in depth information about cryptography than these introductory articles provide, I highly recommend a book called Applied Cryptography by Bruce Schneier. That book literally almost didn’t get published because the NSA didn’t want the info getting out into the public. Yay for the 1st amendment!

Asymmetric Key Encryption (Public and Private Keys)

Unlike symmetric key encryption, which uses the same key for encryption and decryption, Asymmetric key encryption uses one key for encryption and a different key for decryption.

This probably sounds strange why you would want to have two passwords, but the reason is that you keep one for yourself, and give the other one out to another individual or a group.

Because you keep one to yourself (private) and give the other out (public) these are called public and private keys, and this technique is called Public Key Cryptography.

Depending on which key you keep private (the encryption or decryption key), you can get different effects.

Usage Pattern 1 – Private Encryption, Public Decryption

If you keep the encryption key secret, but publish the decryption key out to the public (or to a group of people, or to another individual), what that means is that you can encrypt data which can be read by anyone. What is useful about this is that they have to use your public key to decrypt the data, so they know it was encrypted with your private key, which means they can be reasonably sure that you were the one that wrote the message. You have effectively cryptographically signed your message so that people know it was in fact you that sent that message.

People use this technique all the time in computers, this is how you can verify that something is from a legitamate source, regardless of if we are talking about a web page (HTTPS), a valid device driver (digitally signed device drivers), or other things of that nature.

Another neat thing about this usage pattern is that getting creative, you can also be ensured that the message or data hasn’t been tampered with.

For instance, let’s say you were making a computer operating system where you only allowed the computer to run trusted (signed executables).

Re-visiting a technique mentioned in the first article in this series on hashing, a “signed executable” might look like the below:

  • [Cryptographic Hash of Unencrypted Executable Data]
  • [Encrypted Executable Data]

So, you as the “central signing authority” for the operating system would receive programs from people wanting to release software on your operating system.

First, you would put the software through it’s paces via analysis and testing to make sure  the program worked as intended, was up to the level of quality you wanted software on your OS to be, followed any specific rules about how the software should behave and interact with the rest of the operating system, and also you would make sure the software wasn’t malicious.  Also, you would have to make sure the software wasn’t insecure in any ways that could compromise the rest of your security (for instance, if it had a Buffer Overflow, that could let attackers run arbitrary, unsigned code on your operating system, causing viruses to spread and other malicious things).

Once the program is verified safe, next up you would make the hash of the unencrypted program, write that to a file, then  encrypt the program with your private key and write that to the file after the hash.

You now have a trusted / signed executable to distribute.

When a user downloads this executable from your application store and tries to run it, the operating system could take the following measures to verify that the executable was trusted and unaltered from the time of it’s signing:

  1. Unencrypt the executable using the public key.
  2. Hash the unencrypted data and ensure that it matches the hash at the beginning of the file.

If the hashes match, you know that the executable was indeed signed by the central authority, and that it has not been altered in any way since it’s signing. Therefore, it is safe to run!

I am pretty sure variations of this sort of algorithm are used by things such as the xbox, playstation and iphone / ipad devices.

Usage Pattern 2 – Public Encryption, Private Decryption

The other way to use asymmetric key encryption is to publicize the encryption key, but keep the decryption key private.

What this allows is for anyone to encrypt a message that only you can read.

One thing you could do with this is would be to be able to communicate securely with people if all you had was public communication.

For instance you could post to a public forum saying “This message is for Jesse”, and then put the encrypted data after that.

Since only Jesse knows his private key, and thus only Jesse can decrypt the data, only Jesse will be able to read your message, even though it is visible to everyone.

Despite this, there are still several unknowns in this particular communication, including:

  1. Jesse doesn’t know that you really are who you say you are
  2. You don’t know that Jesse got the message
  3. Jesse doesn’t really know that the message wasn’t tampered with (well… if it’s a text message you are sending, and jesse unencrypts it and it’s garbage, he knows that the message was tampered with, but if the expected data was not so obvious when it was wrong, he may not be able to know that the message hadnt been tampered with).

But those problems, and others, are solvable, which leads to our next point…

Cryptographic Protocols

A neat thing about cryptographic techniques like this one, symmetric key cryptography, and hashing is that they are basically just building blocks that you can stack together in different ways to be able to do useful and interesting things.

Once you learn some of the basic building blocks of cryptography (what this cryptography 101 series of articles is supposed to be all about), you can then learn more about how to put those building blocks together to preform useful tasks.  The recipes for preforming these useful tasks are called Cryptographic Protocols and they can (and often should) contain more than just cryptographic techniques.

In the first usage pattern, I showed how combining asymmetric key encryption with hashing can provide you with a system for creating and verifying trusted executables.  That series of steps for creating and using trusted executables was a cryptographic protocol that contained important steps even beyond just encryption and hashing – such as verifying that the executable was not malicious or insecure.  Leaving those steps out creates a big security hole, so they are very important to the overall protocol.

For the second usage pattern, here’s some cryptographic protocols to solve the problems i called out:

  1. To solve the issue of Jesse not being sure that you are who you say you are, you could take the encrypted message you created, and sign it with your own private key (of which the decryption key is public… this is usage pattern 1).  This way, when Jesse gets the encrypted message from you, he first unencrypts it with your public key, and then unencrypts it with his own private key.  If the message comes out as garbage in the end, he knows that one of the two steps failed.  Specifically, either it wasn’t YOU who sent the message, OR, you used the wrong public key when signing a message to send to him.  Jesse doesn’t know which step went wrong, but he does know the message is invalid one way or another.
  2. To solve the problem of you not knowing that Jesse got the message, you could tell Jesse in the encrypted message “Jesse, if you get this message, respond by sending me back an encrypted message that says ‘the password is forty two'”.   Then, if Jesse got the message, he could encrypt a message saying “the password is forty two” using your public key, and then post it on the board again for you to unencrypt with your private key and see that he got receipt of your message.  While it’s true that anyone is able to encrypt messages meant for you, and so anyone could have written that message, there is some level of security there because the specific message you said to send was encrypted in such a way that only Jesse could have read it.  This way, you can be reasonably sure that Jesse got the note.
  3. To solve the issue of Jesse not knowing if the message was tampered with at all or not (in the case that it’s hard to tell if you got the right data out or not), one way would be to just put a hash of the unencrypted data on the front of the message.  You’d have to agree with Jesse in advance on the protocol, but using the hash again, it would let Jesse know that the data hadn’t been tampered with.

Generation of Key Pairs and Algorithm

By the very nature that these keys work in tandem means that they are somehow linked together mathematically.

I was trying to think of a really simple way to show how public and private keys work together and how they are linked, with a minimal piece of sample code. I thought i had figured out a simplified way, but unfortunately it turned out I was mistaken and my method didn’t work at all.

So, I have to refer you to this page which is pretty darn helpful for understanding how the real thing works with RSA, but unfortunately it doesn’t explain the full nitty gritty of WHY it works to my liking.  Still a very good read though: http://sergematovic.tripod.com/rsa1.html

Common Algorithms

Some commonly used Public Key Encryption algorithms are SSH, IKE and apparently even Bitcoins use it!.

Oops!

After I wrote up this article, my friend Patrick corrected me saying that the process i described is not the usual process for digitally signing data. He said:

You got signing a little mixed up for asymmetric. Traditionally the process is:
1. Alice creates a public and private key pair.
2. Alice shares her public key with the world.
3. Alice never shares her private key.
4. Bob can now encrypt messages using Alice’s public key and only Alice can unencrypt them using her private key.
5. Alice can take a hash of something she wants people to verify as coming from her. Alice then signs that hash with her private key. Now Bob can verify the item coming from Alice by taking the hash of the data and comparing it against the hash in the signature using Alice’s public key.

Some additional reference:
RSA Labs Digital Signing Explanation

There are two reasons that I can think of why that process is better that the one I described:

  1. You can sign data without obfuscating it via encryption.
  2. Public key encryption takes a lot of processing power apparently, so you want to minimize how much data you encrypt with it.  This method encrypts a far smaller (and constant) amount of data.

Thanks for the correction Patrick!

Cryptography 101: Encryption – Symmetric Keys

This is the third article in a series on the basics of cryptography:

DISCLAIMER: These articles are meant for educational purposes only. The methods explained here are meant only to illustrate the basic concepts of cryptography and may or may not be suitable in real world applications. For serious applications such as financial transactions, I recommend hiring security professionals and also getting a lawyer involved. Use this info and code at your own risk, I claim no responsibility!

If you want more in depth information about cryptography than these introductory articles provide, I highly recommend a book called Applied Cryptography by Bruce Schneier. That book literally almost didn’t get published because the NSA didn’t want the info getting out into the public. Yay for the 1st amendment!

Symmetric Key Encryption

Symmetric key encryption is a fancy name for the type of encryption you are probably most familiar with, which is using a password to scramble and unscramble data to make sure only certain people can see it.

This is in contrast to asymmetric key encryption, where you have two passwords; one for encrypting and one for decrypting (The next article is going to be on asymmetric key encryption).

Security

There are numerous symmetric key encryption algorithms out there but they all have one thing in common: their security relies on only the right people having the password, and the assumption that the best way attackers have for getting the plaintext from the ciphertext is to guess the password via brute force.

In good (modern) algorithms, people say things like “on average it will take geological or astronomical amounts of time to guess a password with the computing technology of today and the projected future” so they are reasonably sure people won’t be able to brute force the password in any useful amount of time.

Quantum computers give some forms of cryptography a scare though, because there is something called Simon’s Algorithm which is a quantum computing algorithm that can brute force search ANYTHING with exponentially fewer operations than classical computing.  This means it can brute force guess passwords of an encryption algorithm a lot faster than a normal computer.  At the time of writing this, I think the record for quantum computing power is something like having 4 cubits work together to do some simple math operation (like multiplication).  We could be on the precipice of disaster regarding cryptography, but luckily there are encryption algorithms that take the same amount of time, or longer, for quantum computing to solve, so it isn’t all doom and gloom.

When decrypting data with either symetric or asymetric key encryption, there is no built in way to know if you had the right password or not. You can know by looking at the recovered plaintext and seeing if you got junk out, or meaningful data, but if you don’t know what the data out is supposed to be exactly, or what it’s supposed to look like, there’s no way to know if decrypted it correctly. This makes it so sometimes it can be difficult for attackers to even KNOW if they have guessed the right password or not, which is good for us folk trying to protect data.

Just like a good hashing algorithm, small changes in input should ideally yield large changes in output, which makes it a Chaotic Function and makes it so the cipher text gives as little information about the plaintext as possible.

Sometimes people will use multiple encryption algorithms on a piece of data in the hopes of making it harder to crack, which sometimes works, but can also be fairly dangerous.

To understand the danger, consider how every program, no matter how complex, is essentially a traditional algebraic function (with perhaps lots and lots and lots of terms).  For encryption, the input is the plain text and key, and the output is the cipher text.

Now, just like in junior high and high school, sometimes when you plug one function into another like f(g(x)) and preform algebraic substitution, terms from f and g maybe cancel out.  You may end up with a function that is less complex than either f(x) or g(x), or it just may be less complex for certain values of x.  An attacker could exploit these attacks to their advantage and it might be easier for them to recover some or all of the plaintext because you used two encryption algorithms instead of one.

On the other hand, using multiple algorithms, or the same algorithm multiple times (perhaps with different keys) can also make it a lot more secure.  It’s just something to be mindful of.

Clever programmers and mathematicians sometime come up with encryption techniques where attacking the algorithm itself is the literal equivalent of having to solve famous unsolved math problems from the ages.  These often seem really secure because for some of these problems, the best and brightest minds in all of history have been fighting with the problems for hundreds or thousands of years and making no progress.

Every now and then, some smarty figures one of these out though, and suddenly, encryption algorithms based on it become essentially worthless.

Another common way that people attack ciphertext is via something called a “known plaintext attack”.  What this means is that if the attacker knows any part of the plaintext before it became ciphertext, they can sometimes leverage that knowledge to know a bit more about the key or algorithm used to encrypt the data.  That simplifies their work and makes it more likely that they can get the plaintext back without having to revert to brute force.

One really common way this comes up is if people do something like compress their data before encrypting, or they encrypt known file types like executables, word processing documents, image files etc.

The reason for this is because in all of those file types, there is a standard, well known header that those files have, which allow other programs to use them.  That header data is known plaintext and can be used by an attacker to get more information how to recover the plaintext.

For all the clever people out there trying to make encryption based on super advanced mathematics, in the end, some of the very most secure algorithms out there are based on very simple computing operations such as addition, subtraction, bit rotation, and XOR.

As an example, there is an algorithm called RC5 which only uses those basic operations (you can find the source code for it easily!) and yet is extremely secure. The makers of RC5 published their source code, and encrypted some data with various key sizes (7 byte, 8 byte and 9 byte) in 1994, and it took something like 5 years for the first one to be cracked (via brute force), 10 years for the second, and they project that cracking the third will take 200 more years. More information available here: RC5

Algorithm Components

A symmetric key algorithm is any deterministic algorithm where given a key, has the ability to obfuscate (hide / scramble) data, and then later given the same key, has the ability to undo the operations that it did to get the original data back.

Since all operations have to be reversible, that limits you to non destructive operations.  XOR isn’t destructive, because A XOR B XOR B = A.  Addition and subtraction isn’t destructive, because A + B – B = A (even true when you wrap around the max size of your integer).  Division is destructive however, because when you divide on a computer, you have finite precision (even with floating point numbers) which means you can never fully recover the origional data when trying to undo a division with a multiplication.  Bit rotation is another operation that isn’t destructive.  NOT isn’t destructive, but AND and OR are destructive.  Another operation that isn’t destructive is moving bytes around, since you could just do the moves again in reverse order to get the original data back.

As simple as all this sounds, these are essentially the building blocks of all encryption algorithms.

Example Algorithm

Here’s an example algorithm that you could use to encrypt and unencrypt data.  I don’t do any byte swapping (moving bytes around), or bit rotation, but those would be some good ways to improve it.

//Takes a pointer and length so you can encrypt binary data as well as text
//the pOutData parameter should point to memory that is the same size as pData
//If bEncrypt is true, it will encrypt data. If bEncrypt is false, it will decrypt data.
void EncryptData(const unsigned char *pData, int nDataLength, unsigned char *pOutData, const unsigned char *pKey, int nKeyLength, bool bEncrypt)
{
  int nKeyIndex = 0;
  unsigned char nRunningSum = 0;
  for(int nDataIndex = 0; nDataIndex < nDataLength; ++nDataIndex)
  {
    //update our running sum
    nRunningSum += pKey[nKeyIndex % nKeyLength];

    //get our current byte of plaintext or ciphertext
    unsigned char nDataByte = pData[nDataIndex];

    //to decrypt, it subtracts a running sum of the key then xors against the current key byte
    if(!bEncrypt)
    {
      nDataByte -= nRunningSum;
    }

    //do our xor, whether we are encrypting or decrypting
    nDataByte = nDataByte ^ pKey[nKeyIndex % nKeyLength];

    //to encrypt, it xors against the current key byte and then adds a running sum of the key
    if(bEncrypt)
    {
      nDataByte += nRunningSum;
    }

    //set the output data byte
    pOutData[nDataIndex] = nDataByte;

    //move to the next byte in the key
    nKeyIndex++;
  }  
}

Also, here’s some example code of how to use this function:

void DemoEncryption()
{
  //our key and plain text
  const char *pKey = "MyKeyIsFairlyLongButThatIsJustFine!124351 seven";
  const char *pPlainText = "This is some plaintext, how do you do?";

  //allocate space for our cipher text and recovered plain text
  unsigned char *pCipherText = new unsigned char[strlen(pPlainText)];
  unsigned char *pRecoveredPlainText = new unsigned char [strlen(pPlainText)+1];

  //print out our plain text
  printf("%snn",pPlainText);

  //encrypt the plain text
  EncryptData((unsigned char *)pPlainText,strlen(pPlainText),pCipherText,(unsigned char *)pKey,strlen(pKey),true);

  //print out the cipher text as hex digits
  for(int nIndex = 0; nIndex < strlen(pPlainText); ++nIndex)
  {
    printf("%0.2x",pCipherText[nIndex]);
  }
  printf("nn");
  
  //decrypt the cipher text to recover the plain text
  EncryptData(pCipherText,strlen(pPlainText),pRecoveredPlainText,(unsigned char *)pKey,strlen(pKey),false);

  //print out the recovered plain text after we null terminate it
  pRecoveredPlainText[strlen(pPlainText)]=0;
  printf("%snn",pRecoveredPlainText);

  //free the memory we allocated
  delete[] pCipherText;
  delete[] pRecoveredPlainText;
}

Common Algorithms

Some commonly used symmetric key encryption algorithms in use today are AES, Blowfish and 3DES.

Until Next Time!

That’s it for symmetric key algorithms, next up I’ll be talking about asymmetric key algorithms, which have some pretty interesting uses.

Cryptography 101: Encryption – One Time Pad

This is the second article in a series on the basics of cryptography:

DISCLAIMER: These articles are meant for educational purposes only. The methods explained here are meant only to illustrate the basic concepts of cryptography and may or may not be suitable in real world applications. For serious applications such as financial transactions, I recommend hiring security professionals and also getting a lawyer involved. Use this info and code at your own risk, I claim no responsibility!

If you want more in depth information about cryptography than these introductory articles provide, I highly recommend a book called Applied Cryptography by Bruce Schneier. That book literally almost didn’t get published because the NSA didn’t want the info getting out into the public. Yay for the 1st amendment!

Plaintext, Ciphertext and Keys

When talking about encryption, you’ll often hear two terms: Plaintext and Ciphertext.

The plaintext is the unencrypted data and may be either text or binary data.

Ciphertext is the encrypted data.

Ideally, the ciphertext will give no information about the nature of the plaintext that created it, other than perhaps the size of the plaintext itself.  Good ciphertext will look indistinguishable from random numbers, both by the human eye and mathematically.

This is because the point of encryption is to hide any patterns in the data, and good encryption will hide all discernible patterns. The only possible exception to this would be if the encryption process made ciphertext with misleading patterns that didn’t give any information about the plaintext.  I’m not sure if this comes up in practice, but it definitely could.

Another term you’ll hear often is “Keys”. A key is just the data that you encrypt or unencrypt data with. You can think of it as the password.

The One Time Pad

The one time pad is an extremely simple, yet secure way of encrypting data.

It is so simple that it only uses the xor operation, and is so secure that the ciphertext is literally uncrackable if done correctly.

The downside is that it requires a lot of pre-shared data which gets used up as you encrypt data. When you run out, you have to share more of this data if you want to keep communicating with that person.

This pre-shared data is the key used for encryption and unencryption.

Implementing

To use a one time pad, you first gather a large amount of random data and share that with the person you want to communicate securely with. This is the one time pad itself and you’ll want one byte of random data for each byte of information you want to send to that person.  This step is also the crux of the security.  You need to make sure that nobody else is able to get the one time pad except your intended target, and you also need to ensure that you have high quality random data (more on that later on).

To encrypt data, you take one byte from the one time pad for each byte of data you want to encrypt and XOR them together. When you are done, you throw away the used bytes of the one time pad and never use them again.

Then, you send the ciphertext to the person you already pre-shared the one time pad data with.

To decrypt the data, that person xors each byte of the encrypted data with a byte of the one time pad, and they also throw away each byte used of the one time pad just like you did.

When they are done decrypting, they will have the plaintext data, and their one time pad will be in the same state that yours is in (ie their next number will be your next number).

From here you can rinse and repeat until you run out of one time pad data.

Super simple, and as long as nobody else has your one time pad data, and your one time pad data is truly random, nobody will be able to crack your ciphertext and get the plaintext.

The Importance of Randomness

Besides securely transmitting the random data,the other crux of the security i mentioned was the quality of the random numbers in your one time pad.

The reason this is important is because if the numbers aren’t truly random, there will be patterns in the data. If there are patterns in the data, people can possibly discover those patterns, thus being able to separate the plaintext from the key and unencrypting some or all of your data.

Randomness comes up EVERYWHERE in cryptography, both in input and output to cryptographic algorithms. Because of this, truly random data is often somewhat of a commodity to cryptographers. Since re-using random data means that its slightly less secure (would be attackers have a pattern to gain knowledge with if you re-use your random numbers!), it’s also a consumable commodity!

In fact, there are famous books that are nothing but hundreds and hundreds of pages of random numbers generated from various real world sources – such as taking the wind speed over time in Juneau, Alaska and multiplying it by static gathered from a radio antenna which is tuned to dead air. Using real world data like that, people can be relatively sure that the data doesn’t have any discernible patterns. They just have to watch out for those pesky physicists unlocking the nature of the universe and finding the patterns in the background radiation 😛

I’m not even joking about these books by the way, check this out, here’s one such book!
A Million Random Digits with 100,000 Normal Deviates

Using random numbers from a published book makes your random numbers slightly less random (since other people have the book too, and attackers may notice it on your bookshelf or something), but so long as you don’t just use the numbers of the first or last pages (or anything else predictable), and the book actually contains high quality random numbers, it ought to be fine.

you can also BUY large amounts of high quality random data online from places like random.org.

The astute reader might ask “Why don’t i just use a pseudo random number generator on each side and never run out of one time pad data?”.

Well, if someone knows the PRNG you are using, and your seed, they would be able to unencrypt your data just like your intended target can.

HOWEVER, this kind of setup can be appropriate sometimes if you know the risks and are ok with them. Check out this wikipedia page for more information:
Cryptographically Secure Pseudorandom Number Generator

Specific Attack Against Randomness

As an extreme example, lets say that instead of random numbers, your one time pad data is all 0xFFFFFFFF and that you are using it to encrypt a text file (say, this article for instance).

When you encrypted your data by XORing each byte against 255 (0xFF), all the bits of each byte would be flipped from 0 to 1 or 1 to 0.

While it’s true that it would make the data un readable, and seemingly random, garbage data to the human eye, mathematically it’s a very different story.

If someone were analyzing your ciphertext, they would first notice that the byte value 154 (which looks like Ü and has a binary value of 10011010) occurs in the ciphertext roughly the same amount that the letter ‘e’ appears in the typical english language text document. This would be astute because that value of 154 is just the flipped bits of ‘e’ which has a byte value of 105 and a binary value of 1100101 (the binary bits are just flipped due to the XOR against 0xFF).

Then, they may notice the same for other letters… that some other value occurs as often as you’d expect an ‘o’ to appear in english, or an ‘m’ etc.

Pretty soon they have a clear picture that this is english plaintext, and they can start replacing letters with what they seem like they should be statistically (for the statistically significant letters).

After that, they have some of your plain text, and figuring out the rest is similar to playing sudoku… figuring out which letters fit where, based on how words are spelled, and then doing a find / replace in the entire document for each letter you figure out.

In the end, they have your plaintext and your encryption failed you.

This is an extreme case that is really simple to break, but hopefully you can see that if you even use slightly lower quality random numbers (such as the built in rand() function of C++, whether or not you use srand(time(0)) or not!) that you open yourself up to attack and it can compromise your whole communication stream.

Requiring Less Pre-Shared Data

You can modify the one time pad algorithm to use less pre-shared data if you are ok with the changes in your security profile (your data may be weaker against some attacks, stronger against others).

There are many ways to skin a cat but I’ll just talk about a couple.

One way would be to generate more random data from the random data you do have. For instance, if you and the person you are pre-sharing data with agree on a protocol of MD5 hashing every 100 bytes of one time pad data to generate more random bytes that you can interleave with your one time pad data, you would have a way of generating 16% more one time pad data than what you gathered or shared with the other person. (16% more because MD5 hashes of 100 byte blocks spit out 16 byte hashes of seemingly random numbers – see the previous article on hashing for more information!).

However, doing this obviously makes the “random” data *somewhat* lower quality since there is a pattern to some of the random data. As non obvious as that pattern may be, if someone were to do fancy mathematical analysis of the data, this sort of technique may cause patterns to crop up which lead to a “chink in the armor” giving the attacker a foothold in recovering all or some of the plaintext.

Another way of making your one time pad go farther is instead of XORing the one time pad data against the plaintext and ciphertext to encrypt and unencrypt, you can use the one time pad to give you the keys (passwords) to encrypt / decrypt each communication.

For instance, if you and the person you are communicating with agree in advance on a symmetric key encryption algorithm (more on this topic in the next article!) that takes a 16 byte encryption key, you could use every 16 byte block of one time pad data for an entire single message no matter how large the message is.

For instance, you could encrypt 2GB of data using the first 16 bytes of a one time pad, send that to the person, then you encrypt 500MB with the next 16 bytes and send that to the person.

You’ve effectively used 32 bytes of your one time pad to encrypt 2.5GB of data, which is a crazy good ratio compared to the traditional one time pad protocol which would have required 2.5GB of pre-shared one time pad random data.

If you go this route, your ciphertext now becomes vulnerable to whatever attacks your symmetric key encryption algorithm are vulnerable to though. If the algorithm you are using turns out to have a serious flaw that mathematicians find out about (such as there’s a really easy way to recover the plaintext – this happens fairly often believe it or not!), your whole communication channel is screwed, whereas with the one time pad, it’s just the quality of your random numbers, and the security of your pre-shared data that define the security. So, there are definitely pros and cons to weigh.

Other Weaknesses

There are a lot of ways to attack each cryptographic technique, and if you are serious about cryptography you really need to read up on a lot of things and be extremely clever, thinking of every possible situation that anyone else might think of.

Security is hard because often times you have a limited amount of time to implement your security (because you need to ship your software or open your service to the public SOME DAY), and there are most certainly more attackers than there are security professionals on your team, and they have all the time in the world to search for what you’ve missed! Just as there is no rest for the wicked, the same too is true for security professionals.

I mentioned that the quality of your random numbers and the security of your pre-shared data was the lynchpin of protecting against people getting your plaintext from your cyphertext, but there is another way to attack the communication channel as well.

Namely, if someone were to intercept a message between you and your target person, they may not be able to get your plaintext out, but if they can keep that message from getting to your target, and do so in a way that you aren’t aware of this, they can completely break your communication channel.

The reason for this is that doing this makes the one time pads of you and your target person get out of sync when you throw away one time pad data that the target person did not throw away. This means that the random numbers you are using to encrypt your data is not the same numbers your target person is decrypting data with, so they will get garbage, random data as output and not be able to recover the plaintext.

A malicious person in the middle was able to thwart your ability to communicate securely!

Also, if a person was able to modify the ORDER that the target person got the encrypted messages in, they would be able to break the channel that way as well (at least temporarily) since it would make the recieving person unencrypt the messages with the wrong pieces of data. The next message the person got would be unencryptable in this case though, since the same number of bytes were used up by the out of order messages as if they had come in the right order.

This is not the traditional man in the middle attack, but it is definitely *A* man in the middle attack.

As with so many things, there are often strange, non obvious connections between different subjects.  Case in point, one way to protect against these sort of attacks of lost or re-ordered messages would be to implement the sorts of algorithms used in network programming (like those used in TCP/IP) that ensure “guaranteed” and “in order” communication between two computers or individuals.

Going this route, just like how computers on the internet can know when they got message B but haven’t received message A yet, or that when they sent a message to another person that it never got there, you too would be able to know if a message got to the target, and they would be able to know if they have received messages out of order or not.

Until Next Time!

That’s the essence of the one time pad and I hope you found it interesting!

Next Up I’ll be talking about symmetric key algorithms which are the more traditional way of encrypting where you use a password to protect data.

For those interested in cracking encrypted data (which technically is against the DMCA these days, but used to be a common academic activity, and a way of weeding out insecure algorithms), here’s a nice morsel for you. It’s hexadecimal encoded encrypted data.  Every 2 hex characters equals one byte of encrypted data. If you use the information from the article, you ought to be able to crack it (there’s an easy way and a hard way).

And no, cracking the encrypted data below is not even technically against the law, I’m giving you explicit permission to crack it if you can (:

b699df86908adf9c9e91df8d9a9e9bdf8b97968cd3df889e86df8b90df9890dedfdfb79e899adf86
908adf8b97908a98978bdf9e9d908a8bdf8b9e94969198df8a8fdf9c8d868f8b90988d9e8f9786df
8f8d90999a8c8c9690919e939386c0c0dfdfb79a8d9ad88cdf8c90929adf9c909093df8c9c969a91
9c9adfd0df929e8b97df86908a8b8a9d9adf9c979e91919a938cdf99908ddf86908a8ddf9a919590
86929a918bdedfdfb29e8b97c5978b8b8fc5d0d0888888d186908a8b8a9d9ad19c9092d08a8c9a8d
d0918a929d9a8d8f9796939adfdfaf97868c969c8cc5978b8b8fc5d0d0888888d186908a8b8a9d9a
d19c9092d08a8c9a8dd08c96878b868c86929d90938cdfdfbc979a92968c8b8d86c5978b8b8fc5d0
d0888888d186908a8b8a9d9ad19c9092d08a8c9a8dd08f9a8d96909b969c89969b9a908cd1dfdfbb
8d908fdf929adf9edf9396919adf9699df86908adf889e918bdf8b90df939a8bdf929adf94919088
df8b979e8bdf86908adf9b9a9c968f979a8d9a9bdf9286df8b9a878bdedfdfb6d892df9c8a8d9690
8a8cdf88979e8bdf929a8b97909b8cdf86908adf9a919b9a9bdf8a8fdf8a8c969198df8b90df9996
988a8d9adf968bdf908a8bdedf9e939e91d1889093999abf98929e9693d19c9092d1dfdfab979adf
92908d9adf8b9a878bdf8b979a8d9adf968cdf979a8d9ad3df8b979adf9a9e8c969a8ddf968bdf88
969393df9d9adf8b90df9b9a9c8d868f8bd3df9e8cdf8b979a8d9adf88969393df9d9adf92908d9a
df939a8b8b9a8d8cdf9e919bdf8b979a86df88969393df9d9adf9c93908c9a8ddf8b90df8b979adf
9e899a8d9e989adf9b968c8b8d969d8a8b969091df9099df8b979adf9a919893968c97df939e9198
8a9e989ad3df8c90df979a8d9ad88cdf8c90929adf92908d9adf93909198df8b9a878bdf8b90df97
9a938fdf8f9a908f939adf908a8bd1dfdfbc9091988d9e8b8a939e8b9690918cdf9e989e9691df90
91df94969c94969198df8c90929adf929e95908ddf9d8a8b8bde

Cryptography 101: Hashing

Welcome to the first article in a series aimed to teach the basics of cryptography:

In this digital age, cryptography is more important than ever.  It’s used to protect financial transactions, ensure the anonymity of political dissidents, protect private conversations, help prevent cheating in video games and many other things as well.

DISCLAIMER: These articles are meant for educational purposes only.  The methods explained here are meant only to illustrate the basic concepts of cryptography and may or may not be suitable in real world applications.  For serious applications such as financial transactions, I recommend hiring security professionals and also getting a lawyer involved.  Use this info and code at your own risk, I claim no responsibility!

If you want more in depth information about cryptography than these introductory articles provide, I highly recommend a book called Applied Cryptography by Bruce Schneier. That book literally almost didn’t get published because the NSA didn’t want the info getting out into the public. Yay for the 1st amendment!

Hashing in Computer Science

You may be familiar with the term “hashing” from computer science. In computer science, a hash function is a function which takes input data, preforms some operations on it and spits out some (usually) smaller output data that can be used more or less as a unique identifier for the input data.

Common uses of hashing include:

  • Hashing the contents of large files to be able to compare the hashes to quickly know if they are the same or different, instead of having to compare the files byte by byte.  This is especially useful when comparing files over a network connection.
  • Hashing peices of data which are difficult or time consuming to compare (such as strings) and using the hashed value as a “look up key” within a database or array or list, so that you can look up items very quickly by their hash, instead of having to do more expensive string compares (or whatever other more complex comparison and lookup methods).

More info is available here: http://en.wikipedia.org/wiki/Hash_function

Hashing in Cryptography

As you can probably guess, since hashing makes large pieces of data (such as entire files) into small pieces of data (often only a handful of bytes large), there are many pieces of larger source data that can result in the same smaller hashed data.  When this happens, it’s called a hash collision and a good hash function will do it’s best to minimize collisions for optimal performance.  The more output bits you have, the more “space” you have before a collision is unavoidable.

Often times, a good hashing algorithm will have 2 properties to minimize collisions…

  1. Small changes in input give large changes in output.  In other words, it’s very sensitive to initial conditions and so is a chaotic function (http://en.wikipedia.org/wiki/Chaos_theory)
  2. If you give it a set of well distributed random inputs, it should give a well distributed set of random outputs.  Heck, if you give it any set of (varying) inputs, it should give a well distributed set of random outputs ideally.  By random i mean no discernible patterns.

If these things aren’t true, the hashed output can give clues as to the nature of the input, or, it can make it easier to provide input that hashes to the same output (which is the main way to attack hash based security).

For instance, if your hashing algorithm made an 8 bit hash (very small!) that always (or often) set the 7th and 8th bits to 1, that means effectively you really have a 6 bit hash, because 2 of the bits are almost always the same. In general, more bits means more security, since it’s harder to get a hash collision on purpose.

Quick aside, can you think of something else with these properties?  Some deterministic algorithm that spits out chaotic, well distributed, seemingly random numbers based on (perhaps) non random input?  How about a pseudo random number generator?  There is a lot of crossover between these two types of algorithms and I find it pretty neat that they are working towards almost the same goals, but that they are used for such different things.

Also as you might guess, hashes are one way.  If you are given hashed data, it’s difficult or impossible to work backwards and get the source data back again.  In fact, you often hear hash functions referred to as “one way hash functions” because of this.  This is important because in cryptographical uses you want a hash to reveal as little information about the source data as possible.

Example Uses of Cryptographic Hashing

Here’s two examples of places where hashing comes in handy. One is for protecting passwords, and the other is for protecting save game data of video games.

Protecting Passwords

For protecting passwords, many times when you have a large online service such as facebook, youtube, etc, there will be a central database (cluster) storing everyone’s account information, including their passwords.

If someone were to hack a server and get access to the user database table, they would have everyone’s username and password and the users would be screwed.

A way that people address this is to store the HASH of a each password in the database table instead of the password itself. When people log in, the server takes the password that it received from the user, puts it through the hash algorithm, and compares it to the hash stored in the database. If the hashes match, they know (can assume with a good level of certainty) that the user is who they claim to be. However, if this server gets hacked and the database is compromised, the attacker won’t have the passwords, they will only have the hashed passwords.  The attacker will have to try and brute force the hashes, which is essentially the same as having to brute force the password – except that they can do it on their own computer in their own time of course, which makes it easier and untraceable unfortunately.  Hopefully if this happens, the service can tell their users to change their passwords before the attacker is able to crack many of the logins.

Protecting Save Game Data

For protecting save game data, hashes are used in conjunction with encryption to prevent both read and write access to the save game data.

To write this protected save game data, you first hash the unencrypted save game data, and write that to the front of the file.  Next, you encrypt the save game data and write that after the hash.

When reading save game data, you read in both the hash and the encrypted save game data.  Next you unencrypt the save game data and hash it.  Then, you can compare the hash you made with the hash stored in the file and if they don’t match, you know that someone tried to tamper with the file and you can consider it invalid / corrupt.

Also, since the save game data is encrypted, it’s difficult for a user to read the data in your save game data.  Thus you protect the file from both reading and writing.

It’s possible that the person could modify the data in such a way that it will unencrypt and then hash to the same hash value stored in the beginning of the file, but it’s extremely unlikely, and also even less unlikely that doing so will result in something favorable for the attacker.  They can’t even be sure they are increasing a value thanks to the encryption function scrambling the data completely.

Hashing Algorithm Overview

In a nutshell, besides all the stuff we talked about above, a hashing algorithm is just a deterministic algorithm (meaning it acts the same way every time, no randomness) that takes some input, chews it up, and spits out some (often) smaller piece of data to represent it. When chewing it up, it can do things that are destructive to the data (such as integer division, which loses precision) and isn’t just limited to non destructive operations like encryption algorithms are (non destruction operations can be reversed, such as XOR, addition, subtraction, bit rotations).

As an extra piece of security, people often “SALT” their hashes which means they hash some constant before hashing whatever data they want to hash. This constant is called the salt and you can think of it kind of like a password. This way, even if someone knows what algorithm you are using to hash data (such as the popular MD5 or SHA-1 hash functions), you’d also have to know the salt used to more effectively attack the system.  It’s a little extra bit of security, which is always nice.

Example Hash Function

Here’s an example hash function in C++.   Again, note that this is not really fit for real world use or important situations, it’s just for educational purposes.  You’d want to do more “chewing” and use different operations, bit rotations to make sure all the bits got “hit” by the xor’s, etc.  Check out some more complex, real world hashing algorithms for more info!

//assuming sizeof(int) == 4
typedef unsigned int uint32;

//Takes a pointer and length so you can hash binary data as well as text
//note that this function as is won't give the same answers on machines with different endian-ness
uint32 Calculate4ByteHash(const unsigned char *pData, int nDataLength, const unsigned char *pSalt, int nSaltLength)
{
  //setup some variables
  uint32 nHash = 0;
  unsigned char *pHashPointer = (unsigned char *)&nHash;

  //salt the hash
  for(int nIndex = 0; nIndex < nSaltLength; ++nIndex)
  {
    pHashPointer[nIndex%4] = pHashPointer[nIndex%4] ^ pSalt[nIndex];
  }

  //hash the data
  for(int nIndex = 0; nIndex < nDataLength; ++nIndex)
  {
    pHashPointer[nIndex%4] = pHashPointer[nIndex%4] ^ pData[nIndex];
  }

  return nHash;
}

Rainbow Tables

Assuming the algorithm meets the critera above, the only real way to attack something secured by hashing (besides asking the nice receptionist for the secret info while batting your eyelashes) is to brute force hash a bunch of values until you find something that gives the same hash as what you are looking for.

Unfortunately, there are something called “Rainbow Tables” where people have gone through and created tables of unique hashes and “source data” that results in those hash values for common algorithms.  This way, if for instance, an attacker saw that a hashed value was “3”, and he knew you were using the MD5 algorithm, he could look at an MD5 rainbow table to find the value he could put into your system to result in a hash value of “3” and thus he’d gain some ground at attacking your security.

Of course, if you salt your hash, he would have to find out your salt value too and perhaps salting would invalidate the rainbow table entirely (depending on the algorithms used).  Also, the more bits your output hash contains, the larger a rainbow table would have to be, so if you really wanna screw with would be attackers, make your output bit count larger – it makes their job exponentially harder! (:

Popular Hashing Algorithms

Two common hashing algorithms used for various real world applications are MD5 and SHA-1.  You’ve probably seen them around, especially if you’ve used open sourced software.