What Happens When you Mix Hash Tables and Binary Searching?

While not the most cache friendly operation, binary searching a sorted data set is a pretty good way to search a data set for a specific value because for N items, you have an average case and worst case of O(log N) (Wikipedia: Binary Search Algorithm).

Hashing is also a decent way to do data searching, because it has a best case for search of O(1) with the average case being able to be near that when tuned well, but in practice due to collisions it can get get up to O(n) (Wikipedia: Hash Table).

Note that if you know the keys in advance, you can ALWAYS get O(1) lookup by using the information from the last post: Minimal Perfect Hashing.

One way to deal with hash collisions is to store all the collisions for a specific hash value in a list or array.

For instance, if you had a 4 bit hash, you could have 16 arrays, where the [0]th array stored all the items that hashed to 0, the [1]th array stored all items that hashed to 1 and so on, going up to the [15]th array which stored all items that hashed to 15.

What would happen if we stored the arrays in sorted order and then did a binary search within the buckets? What would that mean to search times?

Interestingly, assuming a good hash function, the answer is that every bit of hash you use subtracts one from the number of tests you’d have to do with binary searching (See footnote 1).

Examples

For instance, let’s say you had 1024 items sorted in an array, you would have to do up to 10 tests to search for a value in that array using a binary search since log2(1024)=10 (See footnote 2).

If we use a 1 bit hash, assuming a good hash function, that would split the array into two arrays each having 512 items in it. Those each can take up to 9 tests to search using a binary search since log2(512)=9. Doing the hash to choose which of the two lists to search cuts our search from up to 10 tests, down to 9 tests.

If instead we used a 2 bit hash, we would divide the 1024 item list into four lists each having 256 items in them. Each of these lists can be searched with up to 8 tests since log2(256) = 8. So using a 2 bit hash, we can cut our search down to 8 tests max.

Let’s say that we used an 8 bit hash. That would cut our list up into 256 lists, each of which only has 4 items in it. Each list can be searched with up to 2 tests since log2(4) = 2. Using an 8 bit hash cuts our max number of searches from 10 down to 2!

Let’s say we used a 16 bit hash, what would happen then?

Well, that would mean we had 65536 hash buckets, but only 1024 items to store in the buckets. If we had a best case collision free hash, that means only 1024 buckets had an item in them, and the rest were empty.

You’d have to hash the value you were searching for to get the hash bucket index, and if there was an item there, you’d have to test that item. If there was no item there, you could return that the value wasn’t found.

The hash isn’t free though, so this is basically O(1) or doing 1 test.

So, while each bit of hash subtracts one test from the binary search, it of course can’t go negative, or become free, so it basically has a minimum of 1.

Footnotes

  1. Technically it only approximately subtracts one from the number of tests you’d have to do, even if the hash function divides the list as evenly as possible, due to footnote 2 and not being able to split an odd number of items exactly in half.
  2. technically 1024 items in an array could take up to 11 tests! You can see why with 4 items with indices 0,1,2,3. First you’d test index 2. If the value we were looking for was greater, we’d test 3 and then be done with either a found or not found result. That’s just 2 searches total. But, if the value we were looking for was less than index 2, we’d have to test index 1, and then optionally test index 0 depending on how the search value compared to index 1. With only 3 items, indicies 0,1,2, we test index 1, then either index 0 or 2 and be done, and so only have to test twice. A binary search takes log2(N+1) tests, where N is the number of items you are looking through.

Quick Blurb

The last post i wrote on minimal perfect hashing was insanely popular (by my standards) on both reddit and hacker news. The previous highest number of visitors I had ever had to my blog in one day was about 230, but on the 16th I got 4187 visitors, and on the 17th I got 7277 visitors. That means there were over 10,000 visitors over those two days.

That is nuts!!

As a data point for those who might find it interesting, the bulk of the traffic from the first day was reddit, and the bulk of the traffic from the second day was hacker news.

I also found it interesting to see what all those people looked at after the minimal perfect hashing algorithm.

I’ve learned as the years have gone on so some of my older stuff probably contains more errors than my newer stuff (which is also not error free I’m sure).

Anyways, thanks for reading, and I hope you guys find my writings interesting and useful. Please feel free to comment and correct any misinformation, or let us know about better alternatives (:

O(1) Data Lookups With Minimal Perfect Hashing

Hash tables are great in that you can hash a key and then use that hash as an index into an array to get the information associated with that key. That is very fast, so long as you use a fast hash function.

The story doesn’t end there though because hash functions can have collisions – multiple things can hash to the same value even though they are different. This means that in practice you have to do things like store a list of objects per hash value, and when doing a lookup, you need to do a slower full comparison against each item in the bucket to look for a match, instead of being able to only rely on the hash value comparisons.

What if our hash function didn’t have collisions though? What if we could take in N items, hash them, and get 0 to N-1 as output, with no collisions? This post will talk about how to make that very thing happen, with simple sample C++ code as well, believe it or not!

Minimal Perfect Hashing

Perfect hashing is a hash function which has no collisions. You can hash N items, and you get out N different hash values with no collisions. The number of items being hashed has to be smaller than or equal to the possible values your hash can give as output though. For instance if you have a 4 bit hash which gives 16 possible different hash values, and you hash 17 different items, you are going to have at least one collision, it’s unavoidable. But, if you hash 16 items or fewer, it’s possible in that situation that you could have a hash which had no collisions for the items you hashed.

Minimal perfect hashing takes this a step further. Not only are there no collisions, but when you hash N items, you get 0 to N-1 as output.

You might imagine that this is possible – because you could craft limitless numbers of hashing algorithms, and could pass any different salt value to it to change the output values it gives for inputs – but finding the specific hashing algorithm, and the specific salt value(s) to use sounds like a super hard brute force operation.

The method we are going to talk about today is indeed brute force, but it cleverly breaks the problem apart into smaller, easier to solve problems, which is pretty awesome. It’s actually a pretty simple algorithm too.

Here is how you create a minimal perfect hash table:

  1. Hash the items into buckets – there will be collisions at this point.
  2. Sort the buckets from most items to fewest items.
  3. Find a salt value for each bucket such that when all items in that bucket are hashed, they claim only unclaimed slots. Store this array of salt values for later. it will be needed when doing a data lookup.
  4. If a bucket only has one item, instead of searching for a salt value, you can just find an unclaimed index and store -(index+1) into the salt array.

Once you have your minimal perfect hash calculated, here is how you do a lookup:

  1. Hash the key, and use that hash to find what salt to use.
  2. If the salt is positive (or zero), hash the key again using the salt. The result of that hash will be an index into the data table.
  3. If the salt is negative, take the absolute value and subtract one to get the index in the data table.
  4. Since it’s possible the key being looked for isn’t in the table, compare the key with the key stored at that index in the table. If they match, return the data at that index. If they don’t match, the key was not in the table.

Pretty simple isn’t it?

Algorithm Characteristics

This perfect minimal hash algorithm is set up to be slow to generate, but fast to query. This makes it great for situations where the keys are static and unchanging, but you want fast lookup times – like for instance loading a data file for use in a game.

Interestingly though, while you can’t add new keys, you could make it so you could delete keys. You would just remove the key / value pair from the data set, and then when doing a lookup you’d find an empty slot.

Also, there is nothing about this algorithm that says you can’t modify the data associated with the keys, at runtime. Modifying the data associated with a key doesn’t affect where the key/value pair is stored in the table, so you can modify the data all you want.

If you wanted to be able to visit the items in a sorted order, when searching for the perfect minimal hash, you could also make the constraint that when looking for the salt values, that not only did the items in the bucket map to an unclaimed slot, you could make sure they mapped to the correct slot that they should be in to be in sorted order. That would increase the time it took to generate the table, and increase the chances that there was no valid solution for any salt values used, but it is possible if you desire being able to know the items in some sorted order.

Interestingly, the generation time of the minimal perfect hash apparently grows linearly with the number of items it acts on. That makes it scale well. On my own computer for instance, I am able to generate the table for 100,000 items in about 4.5 seconds.

Also, in my implementation, if you have N items, it has the next lower power of two number of salt values. You could decrease the number of salt values used if you wanted to use less memory, but that would again come at the cost of increased time to generate the table, as well as increase the chances that there was no valid solution for any salt values used.

Example Code

Below is a simple implementation of the algorithm described.

The main point of the code (besides functionality) is readability so it isn’t optimized as well as it could be, but still runs very fast (100,000 items processed in about 4.5 seconds on my machine). Debug is quite a bit slower than release for me though – I gave up on those same 100,000 items after a few minutes running in debug.

The code below uses MurmurHash2, but you could drop in whatever hashing algorithm you wanted.

The data file for this code is words.txt and comes to us courtesy of English Wordlists.

#include <vector>
#include <algorithm>
#include <assert.h>
#include <fstream>
#include <string>

unsigned int MurmurHash2(const void * key, int len, unsigned int seed);

//=================================================================================
template <typename VALUE>
class CPerfectHashTable {
public:

    typedef std::pair<std::string, VALUE> TKeyValuePair;
    typedef std::vector<TKeyValuePair> TKeyValueVector;
    struct SKeyValueVectorBucket {
        TKeyValueVector m_vector;
        size_t          m_bucketIndex;
    };
    typedef std::vector<struct SKeyValueVectorBucket> TKeyValueVectorBuckets;

    // Create the perfect hash table for the specified data items
    void Calculate (const TKeyValueVector& data) {

        // ----- Step 1: hash each data item and collect them into buckets.
        m_numItems = data.size();
        m_numBuckets = NumBucketsForSize(m_numItems);
        m_bucketMask = m_numBuckets - 1;
        m_salts.resize(m_numBuckets);
        m_data.resize(m_numItems);
        TKeyValueVectorBuckets buckets;
        buckets.resize(m_numBuckets);

        for (size_t i = 0; i < m_numBuckets; ++i)
            buckets[i].m_bucketIndex = i;

        for (const TKeyValuePair& pair : data) {
            size_t bucket = FirstHash(pair.first.c_str());
            buckets[bucket].m_vector.push_back(pair);
        }

        // ----- Step 2: sort buckets from most items to fewest items
        std::sort(
            buckets.begin(),
            buckets.end(),
            [](const SKeyValueVectorBucket& a, const SKeyValueVectorBucket& b) {
                return a.m_vector.size() > b.m_vector.size();
            }
        );

        // ----- Step 3: find salt values for each bucket such that when all items
        // are hashed with their bucket's salt value, that there are no collisions.
        // Note that we can stop when we hit a zero sized bucket since they are sorted
        // by length descending.
        std::vector<bool> slotsClaimed;
        slotsClaimed.resize(m_numItems);
        for (size_t bucketIndex = 0, bucketCount = buckets.size(); bucketIndex < bucketCount; ++bucketIndex) {
            if (buckets[bucketIndex].m_vector.size() == 0)
                break;
            FindSaltForBucket(buckets[bucketIndex], slotsClaimed);
        }
    }

    // Look up a value by key.  Get a pointer back.  null means not found.
    // You can modify data if you want, but you can't add/remove/modify keys without recalculating.
    VALUE* GetValue (const char* key) {

        // do the first hash lookup and get the salt to use
        size_t bucket = FirstHash(key);
        int salt = m_salts[bucket];

        // if the salt is negative, it's absolute value minus 1 is the index to use.
        size_t dataIndex;
        if (salt < 0)
            dataIndex = (size_t)((salt * -1) - 1);
        // else do the second hash lookup to get where the key value pair should be
        else
            dataIndex = MurmurHash2(key, strlen(key), (unsigned int)salt) % m_data.size();

        // if the keys match, we found it, else it doesn't exist in the table
        if (m_data[dataIndex].first.compare(key) == 0)
            return &m_data[dataIndex].second;
        return nullptr;
    }

private:

    unsigned int FirstHash (const char* key) {
        return MurmurHash2(key, strlen(key), 435) & m_bucketMask;
    }

    void FindSaltForBucket (const SKeyValueVectorBucket& bucket, std::vector<bool>& slotsClaimed) {

        // if the bucket size is 1, instead of looking for a salt, just encode the index to use in the salt.
        // store it as (index+1)*-1 so that we can use index 0 too.
        if (bucket.m_vector.size() == 1) {
            for (size_t i = 0, c = slotsClaimed.size(); i < c; ++i)
            {
                if (!slotsClaimed[i])
                {
                    slotsClaimed[i] = true;
                    m_salts[bucket.m_bucketIndex] = (i + 1)*-1;
                    m_data[i] = bucket.m_vector[0];
                    return;
                }
            }
            // we shouldn't ever get here
            assert(false);
        }

        // find the salt value for the items in this bucket that cause these items to take
        // only unclaimed slots
        for (int salt = 0; ; ++salt) {
            // if salt ever overflows to a negative number, that's a problem.
            assert(salt >= 0);
            std::vector<size_t> slotsThisBucket;
            bool success = std::all_of(
                bucket.m_vector.begin(),
                bucket.m_vector.end(),
                [this, &slotsThisBucket, salt, &slotsClaimed](const TKeyValuePair& keyValuePair) -> bool {
                    const char* key = keyValuePair.first.c_str();
                    unsigned int slotWanted = MurmurHash2(key, strlen(key), (unsigned int)salt) % m_numItems;
                    if (slotsClaimed[slotWanted])
                        return false;
                    if (std::find(slotsThisBucket.begin(), slotsThisBucket.end(), slotWanted) != slotsThisBucket.end())
                        return false;
                    slotsThisBucket.push_back(slotWanted);
                    return true;
                }
            );

            // When we find a salt value that fits our constraints, remember the salt
            // value and also claim all the buckets.
            if (success)
            {
                m_salts[bucket.m_bucketIndex] = salt;
                for (size_t i = 0, c = bucket.m_vector.size(); i < c; ++i)
                {
                    m_data[slotsThisBucket[i]] = bucket.m_vector[i];
                    slotsClaimed[slotsThisBucket[i]] = true;
                }
                return;
            }
        }
    }

    static size_t NumBucketsForSize (size_t size) {
        // returns how many buckets should be used for a specific number of elements.
        // Just uses the power of 2 lower than the size, or 1, whichever is bigger.
        if (!size)
            return 1;

        size_t ret = 1;
        size = size >> 1;
        while (size) {
            ret = ret << 1;
            size = size >> 1;
        }
        return ret;
    }

    // When doing a lookup, a first hash is done to find what salt to use
    // for the second hash.  This table stores the salt values for that second
    // hash.
    std::vector<int> m_salts;

    // NOTE: this stores both the key and the value.  This is to handle the
    // situation where a key is searched for which doesn't exist in the table.
    // That key will still hash to some valid index, so we need to detect that
    // it isn't the right key for that index.  If you are never going to look for
    // nonexistant keys, then you can "throw away the keys" and only store the
    // values.  That can be a nice memory savings.
    std::vector<TKeyValuePair> m_data;

    // useful values
    size_t m_numItems;
    size_t m_numBuckets;
    size_t m_bucketMask;
};

// MurmurHash code was taken from https://sites.google.com/site/murmurhash/
//=================================================================================
// MurmurHash2, by Austin Appleby
 
// Note - This code makes a few assumptions about how your machine behaves -
 
// 1. We can read a 4-byte value from any address without crashing
// 2. sizeof(int) == 4
 
// And it has a few limitations -
 
// 1. It will not work incrementally.
// 2. It will not produce the same results on little-endian and big-endian
//    machines.
 
unsigned int MurmurHash2 ( const void * key, int len, unsigned int seed )
{
    // 'm' and 'r' are mixing constants generated offline.
    // They're not really 'magic', they just happen to work well.
 
    const unsigned int m = 0x5bd1e995;
    const int r = 24;
 
    // Initialize the hash to a 'random' value
 
    unsigned int h = seed ^ len;
 
    // Mix 4 bytes at a time into the hash
 
    const unsigned char * data = (const unsigned char *)key;
 
    while(len >= 4)
    {
        unsigned int k = *(unsigned int *)data;
 
        k *= m; 
        k ^= k >> r; 
        k *= m; 
         
        h *= m; 
        h ^= k;
 
        data += 4;
        len -= 4;
    }
     
    // Handle the last few bytes of the input array
 
    switch(len)
    {
    case 3: h ^= data[2] << 16;
    case 2: h ^= data[1] << 8;
    case 1: h ^= data[0];
            h *= m;
    };
 
    // Do a few final mixes of the hash to ensure the last few
    // bytes are well-incorporated.
 
    h ^= h >> 13;
    h *= m;
    h ^= h >> 15;
 
    return h;
}


//=================================================================================
void WaitForEnter ()
{
    printf("Press Enter to quit");
    fflush(stdin);
    getchar();
}

//=================================================================================
int main (int argc, char **argv)
{
    // Read the data entries from a file.  Use the line as the key, and the line
    // number as the data.  Limit it to 100,000 entries.
    printf("Loading Data...");
    CPerfectHashTable<int> table;
    decltype(table)::TKeyValueVector data;
    std::ifstream file("words.txt");
    std::string str;
    int i = 0;
    while (std::getline(file, str) && i < 100000)
    {
        data.push_back(std::make_pair(str, i));
        ++i;
    }
    printf("Donenn");

    // make the perfect hash table
    printf("Generating Minimal Perfect Hash Table...");
    table.Calculate(data);
    printf("Donenn");

    // Verify results
    printf("Verifying results...");
    for (decltype(table)::TKeyValuePair& keyValuePair : data) {
        int* value = table.GetValue(keyValuePair.first.c_str());
        assert(value != nullptr);
        if (value == nullptr)
        {
            printf("Error, could not find data for key "%s"n", keyValuePair.first.c_str());
            WaitForEnter();
            return 0;
        }
        assert(*value == keyValuePair.second);
        if (*value != keyValuePair.second)
        {
            printf("  table["%s"] = %in", keyValuePair.first.c_str(), *value);
            printf("Incorrect value detected, should have gotten %i!n", keyValuePair.second);
            WaitForEnter();
            return 0;
        }
    }
    printf("Donenn");

    WaitForEnter();
    return 0;
}

Links & More

I learned the details of this algorithm from this page: Steve Hanov’s Blog: Throw away the keys: Easy, Minimal Perfect Hashing, after hearing about the technique mentioned occasionally by colleagues.

There are other ways to do minimal perfect hashing however. For instance give this a read: Minimal Perfect Hashing

One place that method is better than this one, is that in this one, when doing a lookup you have to hash the key twice. In the technique described by the last technique, you only have to hash the key once, and use that hash to combine the results of two lookups. The two lookups are not dependant so can be re-ordered or happen concurrently, which makes it faster on modern CPUs.

Apparently there are also some techniques for generating the minimal perfect hash on a large number of items by breaking them apart into smaller sets, which can then be paralelized across threads.

I also wanted to mention that a large part of the in memory representation of this data structure can come from storing the keys along with the data, to verify that you have indeed found the item you are looking for after doing the O(1) lookup. If you are in a situation where you are only ever going to be searching for keys that you know are in the table, you can actually forgo storing the keys, to bring the memory requirements down a significant amount.

Also, The example code implements this as a hash table, but you could also use this as a set, if you wanted fast membership tests. You could either store the keys, to be able to resolve when things were not part of the set, or, you could make a hash table of string to bool, where the bool specified whether something was in the set. Which one is better depends on your specific needs and whether you plan to search for unknown keys or not.

Lastly, as a byproduct of using a data structure like this, you can get a unique ID per key, which is the index that it appears in the data array. This can be super helpful if you have something like a list of filenames, where you would rather work with integers instead of specific file names, since integers are quicker and easier to compare, copy around, etc.

You could even make this data structure support lookups by either key or by unique id. This way, you could do a by key lookup the first time you needed to find something, and then could store off the ID to get it by ID from then on for a faster lookup. Heck, you could even do all your lookups “offline” (at tool time, not when the app is running) and for instance convert all file references in data files to be the each file’s unique ID instead. Then, you could toss out the keys of your data structure, only storing an array of data, and using the unique file ID to look into that table whenever you wanted to read or write meta data associated with that file. That would make lookups faster, and also decrease the memory requirements of memory resident data files.

It’s pretty cool stuff, and a pretty useful algorithm. Hopefully you find it useful as well (:

Update – 12/22/15

Interestingly, this post has had something like 15,000 readers since it was posted. That is by far the most read post on this blog 😛

Anyways, I wanted to add some more info I’ve found recently.

Here are three tools for doing minimal perfect hashing that are very likely to give you better results than the algorithm I describe above:

Here’s a conversation talking about gperf and the alternative applications, and pros and cons for each:
Stack Overflow: Making a perfect hash (all consecutive buckets full), gperf or alternatives?

Here’s a research paper on gperf by Doug Schmidt: GPERF – A Perfect Hash Function Generator

I had a thought that maybe there was some play here by using “logical synthesis” to come up with some algorithm to map the inputs (the keys of the hash table) to the outputs (collision free output indices).

I started looking into Karnaugh maps and then the Quine–McCluskey algorithm, and then espresso and espresso-exact (mincov). Where the first two things are decent at solving multi bit input to single bit output, the second two things are decent at solving multi bit input to multi bit output, allowing operations to be shared among bits.

While I haven’t found anyone using those specific algorithms to solve the problem, people have, and definitely are still, trying to also look into the ability to generate code without lookups. From what I’ve read so far, it sounds like finding such a function takes a lot longer to find and also that it runs more slowly in practice than a less perfect solution which has lookups.

Either way, this is still an active area of research, and plenty of folks are working on it so I’m going to leave it to them.

I also sort of have the feeling that if you are in need of minimal perfect hashing, you may be “doing it wrong”. For instance, if you are at all able to, you probably are likely to be better off having a pre-agreed on set of unique IDs per “thing” you want to look up. These unique IDs can be used directly as array indices for the magical always O(1) lookup that minimal perfect hashing is going for, and is actually a quicker lookup in practice since you don’t need to jump through special hoops to calculate the array index.

The only exceptions I can think of are:

  1. Real world requirements and not being able to reach the ideal situation – this is the boat I’m in right now. Ideally, systems would be asking about things by an integer ID, but I can’t get there in a single step, so the perfect hashing is a temporary bridge til I can get there.
  2. Even with IDs, sparse arrays are still problematic. You may have an ID per file that could be asked about, but say that you have 1,000,000 files, but you want to make an array of data for only 10,000 of them. How do you take the unique ID of a file and do a lookup into that smaller table? Minimal perfect hashing seems to be useful in this situation, but there may be other decent or comparable alternatives if you are already using integer IDs.

Hiding a Lookup Table in a Modulus Operation

Lookup tables are a tool found in every programmer’s tool belt.

Lookup tables let you pre-calculate a complex calculation in advance, store the results in a table (an array), and then during performance critical parts of your program, you access that table to get quick answers to the calculations, without having to do the complex calculation on the fly.

In this post I’ll show a way to embed a lookup table inside of a single (large) number, where you extract values from that lookup table by taking a modulus of that number with different, specific values.

This technique is slower and takes more memory than an actual lookup table, but it’s conceptually interesting, so I wanted to share.

Also, I stumbled on this known technique while working on my current paper. The paper will make this technique a bit more practical, and I’ll share more info as soon as I am able, but for now you can regard this as a curiosity 😛

Onto the details!

1 Bit Input, 1 Bit Output: Pass Through

Let’s learn by example and start with a calculation that takes in an input bit, and gives that same value for an output bit. It’s just a 1 bit pass through lookup table.

\begin{array}{c|c} \text{Input} & \text{Output} \\ \hline 0 & 0 \\ 1 & 1 \\ \end{array}

To be able to convert that to something we can decode with modulus we have to solve the following equations:

x \% k_0 = 0 \\ x \% k_1 = 1

x is the number that represents our lookup table. k_0 and k_1 are the values that we modulus x against to get our desired outputs out.

It looks as if we have two equations and three unknowns – which would be unsolvable – but in reality, x is the only unknown. The k values can be whatever values it takes to make the equations true.

I wrote a blog post on how to solve equations like these in a previous post: Solving Simultaneous Congruences (Chinese Remainder Theorem).

You can also use this chinese remainder theorem calculator, which is handy: Chinese Remainder Theorem Calculator

The short answer here is that the k values can be ANY numbers, so long as they are pairwise co-prime to each other – AKA they have a greatest common divisor of 1.

If we pick 3 and 4 for k0 and k1, then using the chinese remainder theorem we find that x can equal 9 and the equations are true. Technically the answer is 9 mod 12, so 21, 33, 45 and many other numbers are also valid values of x, but we are going to use the smallest answer to keep things smaller, and more manageable.

So, in this case, the value representing the lookup table would be 9. If you wanted to know what value it gave as output when you plugged in the value 0, you would modulus the lookup table (9) against k0 (3) to get the output. If you wanted to know what value it gave as output when you plugged in the value 1, you would modulus the lookup table (9) against k1 (4) to get the output. The table below shows that it passes through the value in both cases like it should:

\begin{array}{c|c|c|c} \text{Input} & \text{Symbolic} & \text{Numeric} & \text{Output} \\ \hline 0 & x \% k_0 & 9 \% 3 & 0\\ 1 & x \% k_1 & 9 \% 4 & 1\\ \end{array}

1 Bit Input, 1 Bit Output: Not Gate

Let’s do something a little more interesting. Let’s make the output bit be the reverse of the input bit. The equations we’ll want to solve are this:

x \% k_0 = 1 \\ x \% k_1 = 0

We can use 3 and 4 for k0 and k1 again if we want to. Using the Chinese remainder theorem to solve the equations gives us a value of 4 for x. Check the truth table below to see how this works:

\begin{array}{c|c|c|c} \text{Input} & \text{Symbolic} & \text{Numeric} & \text{Output} \\ \hline 0 & x \% k_0 & 4 \% 3 & 1\\ 1 & x \% k_1 & 4 \% 4 & 0\\ \end{array}

1 Bit Input, 1 Bit Output: Output Always 1

What if we wanted the output bit to always be 1 regardless of input bit?

x \% k_0 = 1 \\ x \% k_1 = 1

Using 3 and 4 for our k values again, we solve and get a value of 1 for x. Check the truth table to see it working below:

\begin{array}{c|c|c|c} \text{Input} & \text{Symbolic} & \text{Numeric} & \text{Output} \\ \hline 0 & x \% k_0 & 1 \% 3 & 1\\ 1 & x \% k_1 & 1 \% 4 & 1\\ \end{array}

Hopefully one bit input to one bit output makes sense now. Let’s move on (:

2 Bit Input, 1 Bit Output: XOR Gate

Things get a little more interesting when we bump the number of input bits up to 2. If we want to make a number which represents XOR, we now have 4 equations to solve.

x \% k_{00} = 0 \\ x \% k_{01} = 1 \\ x \% k_{10} = 1 \\ x \% k_{11} = 0

In general we will have 2^N equations, where N is the number of input bits.

You might have noticed that I use subscripts for k corresponding to the input bits that the key represents. This is a convention I’ve found useful when working with this stuff. Makes it much easier to see what’s going on.

Now with four equations, we need 4 pairwise coprime numbers – no number has a common factor with another number besides 1.

Let’s pull them out of the air. Umm… 3, 4, 5, 7

Not too hard with only two bits of input, but you can see how adding input bits makes things a bit more complex. If you wanted to make something that took in two 16 bit numbers as input for example, you would need 2^32 co-prime numbers, since there was a total of 32 bits of input!

When we solve those four equations, we get a value of 21 for x.

Notice how x is larger now that we have more input bits? That is another added complexity as you add more input bits. The number representing your program can get very, very large, and require you to use “multi precision integer” math libraries to store and decode the programs, when the numbers get larger than what can be held in a 64 bit int.

Boost has a decent library for this, check out boost::multiprecision::cpp_int, it’s what I use. You can download boost from here: http://www.boost.org/doc/libs/1_59_0/more/getting_started/windows.html

Anyhow, let’s check the truth table to see if our values work:

\begin{array}{c|c|c|c} \text{Input} & \text{Symbolic} & \text{Numeric} & \text{Output} \\ \hline 00 & x \% k_{00} & 21 \% 3 & 0 \\ 01 & x \% k_{01} & 21 \% 4 & 1 \\ 10 & x \% k_{10} & 21 \% 5 & 1 \\ 11 & x \% k_{11} & 21 \% 7 & 0 \end{array}

Woot, it worked.

2 Bit Input, 2 Bit Output: OR, AND

What happens when we add another bit of output? Basically we just treat each output bit as it’s own lookup table. This means that if we have two output bits, we will have two numbers representing our program (one for each bit), and that this is true regardless of how many input bits we have.

Let’s make the left output bit (x_0 ) be the OR of the input bits and the right output bit (x_1 ) be the AND of the input bits.

That give us these two sets of equations to solve:

x_0 \% k_{00} = 0 \\ x_0 \% k_{01} = 1 \\ x_0 \% k_{10} = 1 \\ x_0 \% k_{11} = 1 \\ \\ x_1 \% k_{00} = 0 \\ x_1 \% k_{01} = 0 \\ x_1 \% k_{10} = 0 \\ x_1 \% k_{11} = 1 \\

We can use the same coprime numbers for our k values as we used in the last section (3,4,5,7). Note that we use the same k values in each set of equations. This is intentional and required for things to work out!

If we solve each set of equations we get 141 for x0, and 120 for x1.

Let’s see if that worked:

\begin{array}{c|c|c|c} \text{Input} & \text{Symbolic} & \text{Numeric} & \text{Output} \\ \hline 00 & x_0 \% k_{00}, x_1 \% k_{00} & 141 \% 3, 120 \% 3 & 00 \\ 01 & x_0 \% k_{01}, x_1 \% k_{01} & 141 \% 4, 120 \% 4 & 10 \\ 10 & x_0 \% k_{10}, x_1 \% k_{10} & 141 \% 5, 120 \% 5 & 10 \\ 11 & x_0 \% k_{11}, x_1 \% k_{11} & 141 \% 7, 120 \% 7 & 11 \end{array}

Hey, it worked again. Neat!

Example Code

Now that we have the basics worked out, here is some sample code.

The lookup table takes in 8 bits as input, mapping 0..255 to 0…2pi and gives the sine of that value as output in a float. So it has 8 bits of input and 32 bits of output.

#include <vector>
#include <boost/multiprecision/cpp_int.hpp>
#include <stdint.h>
#include <string.h>
#include <memory>

typedef boost::multiprecision::cpp_int TINT;
typedef std::vector<TINT> TINTVec;

const float c_pi = 3.14159265359f;

//=================================================================================
void WaitForEnter ()
{
    printf("nPress Enter to quit");
    fflush(stdin);
    getchar();
}

//=================================================================================
static TINT ExtendedEuclidianAlgorithm (TINT smaller, TINT larger, TINT &s, TINT &t)
{
    // make sure A <= B before starting
    bool swapped = false;
    if (larger < smaller)
    {
        swapped = true;
        std::swap(smaller, larger);
    }

    // set up our storage for the loop.  We only need the last two values so will
    // just use a 2 entry circular buffer for each data item
    std::array<TINT, 2> remainders = { larger, smaller };
    std::array<TINT, 2> ss = { 1, 0 };
    std::array<TINT, 2> ts = { 0, 1 };
    size_t indexNeg2 = 0;
    size_t indexNeg1 = 1;

    // loop
    while (1)
    {
        // calculate our new quotient and remainder
        TINT newQuotient = remainders[indexNeg2] / remainders[indexNeg1];
        TINT newRemainder = remainders[indexNeg2] - newQuotient * remainders[indexNeg1];

        // if our remainder is zero we are done.
        if (newRemainder == 0)
        {
            // return our s and t values as well as the quotient as the GCD
            s = ss[indexNeg1];
            t = ts[indexNeg1];
            if (swapped)
                std::swap(s, t);

            // if t < 0, add the modulus divisor to it, to make it positive
            if (t < 0)
                t += smaller;
            return remainders[indexNeg1];
        }

        // calculate this round's s and t
        TINT newS = ss[indexNeg2] - newQuotient * ss[indexNeg1];
        TINT newT = ts[indexNeg2] - newQuotient * ts[indexNeg1];

        // store our values for the next iteration
        remainders[indexNeg2] = newRemainder;
        ss[indexNeg2] = newS;
        ts[indexNeg2] = newT;

        // move to the next iteration
        std::swap(indexNeg1, indexNeg2);
    }
}

//=================================================================================
void MakeKey (TINTVec &keys, TINT &keysLCM, size_t index)
{
    // if this is the first key, use 3
    if (index == 0)
    {
        keys[index] = 3;
        keysLCM = keys[index];
        return;
    }

    // Else start at the last number and keep checking odd numbers beyond that
    // until you find one that is co-prime.
    TINT nextNumber = keys[index - 1];
    while (1)
    {
        nextNumber += 2;
        if (std::all_of(
            keys.begin(),
            keys.begin() + index,
            [&nextNumber] (const TINT& v) -> bool
            {
                TINT s, t;
                return ExtendedEuclidianAlgorithm(v, nextNumber, s, t) == 1;
            }))
        {
            keys[index] = nextNumber;
            keysLCM *= nextNumber;
            return;
        }
    }
}

//=================================================================================
void CalculateLookupTable (
    TINT &lut,
    const std::vector<uint64_t> &output,
    const TINTVec &keys,
    const TINT &keysLCM,
    const TINTVec &coefficients,
    size_t bitMask
)
{
    // figure out how much to multiply each coefficient by to make it have the specified modulus residue (remainder)
    lut = 0;
    for (size_t i = 0, c = keys.size(); i < c; ++i)
    {
        // we either want this term to be 0 or 1 mod the key.  if zero, we can multiply by zero, and
        // not add anything into the bit value!
        if ((output[i] & bitMask) == 0)
            continue;

        // if 1, use chinese remainder theorem
        TINT s, t;
        ExtendedEuclidianAlgorithm(coefficients[i], keys[i], s, t);
        lut = (lut + ((coefficients[i] * t) % keysLCM)) % keysLCM;
    }
}

//=================================================================================
template <typename TINPUT, typename TOUTPUT, typename LAMBDA>
void MakeModulus (TINTVec &luts, TINTVec &keys, LAMBDA &lambda)
{
    // to keep things simple, input sizes are being constrained.
    // Do this in x64 instead of win32 to make size_t 8 bytes instead of 4
    static_assert(sizeof(TINPUT) < sizeof(size_t), "Input too large");
    static_assert(sizeof(TOUTPUT) < sizeof(uint64_t), "Output too large");

    // calculate some constants
    const size_t c_numInputBits = sizeof(TINPUT) * 8;
    const size_t c_numInputValues = 1 << c_numInputBits;
    const size_t c_numOutputBits = sizeof(TOUTPUT) * 8;

    // Generate the keys (coprimes)
    TINT keysLCM;
    keys.resize(c_numInputValues);
    for (size_t index = 0; index < c_numInputValues; ++index)
        MakeKey(keys, keysLCM, index);

    // calculate co-efficients for use in the chinese remainder theorem
    TINTVec coefficients;
    coefficients.resize(c_numInputValues);
    fill(coefficients.begin(), coefficients.end(), 1);
    for (size_t i = 0; i < c_numInputValues; ++i)
    {
        for (size_t j = 0; j < c_numInputValues; ++j)
        {
            if (i != j)
                coefficients[i] *= keys[j];
        }
    }

    // gather all the input to output mappings by permuting the input space
    // and storing the output for each input index
    std::vector<uint64_t> output;
    output.resize(c_numInputValues);
    union
    {
        TINPUT value;
        size_t index;
    } input;
    union
    {
        TOUTPUT value;
        size_t index;
    } outputConverter;

    for (input.index = 0; input.index < c_numInputValues; ++input.index)
    {
        outputConverter.value = lambda(input.value);
        output[input.index] = outputConverter.index;
    }

    // iterate through each possible output bit, since each bit is it's own lut
    luts.resize(c_numOutputBits);
    for (size_t i = 0; i < c_numOutputBits; ++i)
    {
        const size_t bitMask = 1 << i;
        CalculateLookupTable(
            luts[i],
            output,
            keys,
            keysLCM,
            coefficients,
            bitMask
        );
    }
}

//=================================================================================
int main (int argc, char **argv)
{
    // Look up tables encodes each bit, keys is used to decode each bit for specific
    // input values.
    TINTVec luts;
    TINTVec keys;

    // this is the function that it turns into modulus work
    typedef uint8_t TINPUT;
    typedef float TOUTPUT;
    auto lambda = [] (TINPUT input) -> TOUTPUT
    {
        return sin(((TOUTPUT)input) / 255.0f * 2.0f * c_pi);
    };

    MakeModulus<TINPUT, TOUTPUT>(luts, keys, lambda);

    // show last lut and key to show what kind of numbers they are
    std::cout << "Last Lut: " << *luts.rbegin() << "n";
    std::cout << "Last Key: " << *keys.rbegin() << "n";

    // Decode all input values
    std::cout << "n" << sizeof(TINPUT) << " bytes input, " << sizeof(TOUTPUT) << " bytes outputn";
    for (size_t keyIndex = 0, keyCount = keys.size(); keyIndex < keyCount; ++keyIndex)
    {
        union
        {
            TOUTPUT value;
            size_t index;
        } result;

        result.index = 0;

        for (size_t lutIndex = 0, lutCount = luts.size(); lutIndex < lutCount; ++lutIndex)
        {
            TINT remainder = luts[lutIndex] % keys[keyIndex];
            size_t remainderSizeT = size_t(remainder);
            result.index += (remainderSizeT << lutIndex);
        }

        TINT remainder = luts[0] % keys[keyIndex];
        std::cout << "i:" << keyIndex << " o:" << result.value << "n";
    }

    WaitForEnter();
    return 0;
}

Here is some output from the program. The first is to show what the last (largest) look up table and key look like. Notice how large the look up table number is!

Here it shows some sine values output from the program, using modulus against the large numbers calculated, to get the bits of the result out:

How to Get Lots of Pairwise Co-Prime Numbers?

You can generate a list of pairwise coprimes using brute force. Have an integer that you increment, and check if it’s pairwise co-prime to the existing items in the list. If it is, add it to the list! Rinse and repeat until you have as many as you want.

That is the most practical way to do it, but there are two other interesting ways I wanted to mention.

The first way is using Fermat numbers. The Fermat numbers are an infinite list of pairwise co-prime numbers and are calculated as 2^{2^n}+1 where n is an integer. Fermat numbers also have the benefit that you can get the nth item in the list without calculating the numbers that came before it. The only problem is that the numbers grow super huge very fast. The first 7 values are: 3, 5, 17, 257, 65537, 4294967297, 18446744073709551617. If Fermat numbers didn’t grow so quickly, they sure would be useful for things like this technique.

The second way is using something called Sylvester’s sequence. It too is an infinite list of pairwise co-prime numbers, and it too grows very large very quickly unfortunately. I also don’t believe there is a way to calculate the Nth item in the list directly. Every number is based on previous numbers, so you have to calculate them all from the beginning. No random access!

Beyond Binary

In this post I showed how to work in binary digits, but there is no reason why you have to encode single bits in the lookup tables.

Instead of encoding 0 or 1 in each modulus “lookup table”, you could also perhaps store base 10 numbers in the tables and have 0-9. Or, maybe you encode a byte per lookup table.

Encoding more than one bit effectively makes both your input and your output smaller, which helps the algorithm do more with less.

Your keys will need to be larger though, since the keys have to be larger than the value you plan to store, and your resulting lookup table will be a larger number as well. It might make the technique more worth while though.

I’ll leave that as an exercise for you. If try it and find neat stuff, post a comment and let us know, or drop me an email or something. It’d be neat to hear if people find any practical usage cases of this technique 😛

The End, For Now!

I want to point out that increasing the number of input bits in this technique is a pretty expensive thing to do, but increasing the number of output bits is a lot cheaper. It kind of makes sense in a way if you think about it. Input bits add information from the outside world that must be dealt with, while output bits are just fluff that can easily be diluted or concentrated by adding or removing bits that are associated with, and calculated from, the input bits.

Another problem you may have noticed with this technique is that if you have a really expensive calculation that you are trying to “flatten” into modulus math like this, that you have to run that calculation many, many times to know what values a lookup table would give you. You have to run it once per possible input to get every possible output. That is expected when making a lookup table, since you are paying a cost up front to make things faster later.

The paper I’m working on changes things a bit though. One of the things it does is it makes it so doing this technique only requires that you evaluate the function once, and it calculates all values simultaneously to give the end result that you can then do modulus against. It’s pretty cool IMO and I will share more details here as soon as I am able – and yes, i have actual working code that does that, believe it or not! I’m looking forward to being able to share it later on. Maybe someone will find some really cool usage case for it.

Quantum Computing For Programmers Part 2: Multiple Qubits

In part 1 (Quantum Computing For Programmers Part I: One Qubit) we looked at the basics of quantum computing and how to deal with a single qubit. Here we’ll talk about the interesting things that happen when you involve multiple qubits!

Multiple Qubit Probabilities & Possibilities

In the last post we used the analogy of a coin to describe a qubit. The coin can be either heads or tails, or flipping in the air, waiting to become either a heads or tails when it lands. The same is true of how a qubit works, where it can either be a 0 or a 1 or some superposition of both, only deciding which it is when observed. Furthermore, just like a real coin, there can be a bias towards becoming one value or the other.

We represented a coin by having a probability for each possible state (heads or tails), but represented a qubit by having an AMPLITUDE for each possible state (0 or 1), where you square an amplitude to get the probability.

Going back to the coin analogy, how would probabilities work if we had two coins?

The four possible outcomes with two coins are:
Heads/Heads
Heads/Tails
Tails/Heads
Tails/Tails

Each coin individually could either be sitting on the table heads or tails, or up in the air, waiting to become either a heads or a tails, with some probability of becoming a heads or a tails.

To figure out the probability of each possible outcome, you just multiply the probability of each coin’s outcome together.

For instance, let’s say that the first coin (coin A) is already sitting on the table as a heads, and the second coin (coin B) is flipping in the air, with a 1/3 chance of becoming heads and a 2/3 chance of becoming tails. Here is how you would calculate the probability of each possible state:

\begin{array}{c|c|c|c} \text{Outcome A/B} & \text{Coin A Probability} & \text{Coin B Probability} & \text{Outcome Probability (A*B)}\\ \hline heads / heads & 100\% & 33\% & 33\% \\ heads / tails & 100\% & 67\% & 67\% \\ tails / heads & 0\% & 33\% & 0\% \\ tails / tails & 0\% & 67\% & 0\% \\ \end{array}

Qubits actually work the same way! Using the same values as the coins, let’s say qubit A is a 0, and qubit B has a 1/3 chance of becoming a 0, and a 2/3 chance of becoming a 1. Converting those probabilities to amplitudes you could get A=[1,0], B=[\frac{1}{\sqrt{3}}, \frac{\sqrt{2}}{\sqrt{3}}].

\begin{array}{c|c|c|c|c} \text{Outcome AB} & \text{Qubit A Amplitude} & \text{Qubit B Amplitude} & \text{Outcome Amplitude(A*B)} & \text{Outcome Probability}\\ \hline 00 & 1 & 1/\sqrt{3} & 1/\sqrt{3} & 33\% \\ 01 & 1 & \sqrt{2}/\sqrt{3} & \sqrt{2}/\sqrt{3} & 67\% \\ 10 & 0 & 1/\sqrt{3} & 0 & 0\%\\ 11 & 0 & \sqrt{2}/\sqrt{3} & 0 & 0\% \\ \end{array}

Note that in both cases, the probabilities add up to 100% like you’d expect.

In the case of qubits, the resulting vector is [\frac{1}{\sqrt{3}}, \frac{\sqrt{2}}{\sqrt{3}}, 0, 0] , which is still normalized, and represents the amplitudes for the 4 possible states of those two qubits: 00, 01, 10 and 11.

When working with one qubit (or coin), there are 2 possible outcomes 0 or 1 (heads or tails). When working with two qubits (or coins), there are four possible outcomes 00, 01, 10, 11 (heads/heads, heads/tails, tails/heads, tails/tails). When working with three qubits or coins, there are eight possible outcomes: 000,001,010 … 111.

With both coins and qubits there are 2^N possibilities, where N is the number of qubits or coins you have.

If you had 8 qubits, you’d have to use a 256 dimensional vector to describe the possibilities, since 2^8 is 256. When performing quantum computing with 8 qubits, you only have to deal with the 8 qubits. When simulating quantum computing on a regular, classical computer, you have to deal with the 256 amplitudes. This kind of gives a glimpse at how quantum computers can be faster than regular computers at certain things. There is an economy of scale working against us on regular computers simulating quantum computing.

The method we used to combine the probabilities of single qubits into an amplitude vector representing multiple qubits is called the Kronecker product. That’s just a fancy way of saying we have to multiply everything from the first vector by everything from the second vector, to get a third vector that is bigger than the first two. You’ll see it represented like this: A \otimes B and while it’s similar to the “outer product” and even uses the same symbol (!), it gives a slightly different result versus if you did an outer product and then vectorized the matrix.

The Kronecker product of vectors works like this:

\begin{bmatrix} A_1 \\ A_2 \\ ... \\ A_M \\ \end{bmatrix} \otimes \begin{bmatrix} B_1 \\ B_2 \\ ... \\ B_N \\ \end{bmatrix} = \begin{bmatrix} A_1 B_1 \\ A_1 B_2 \\ ... \\ A_1 B_N \\ A_2 B_1 \\ A_2 B_2 \\ ... \\ A_2 B_N \\ ... \\ A_M B_1 \\ A_M B_2 \\ ... \\ A_M B_N \\ \end{bmatrix}

Let’s show an example involving two qubits.

The first qubit has a 75% chance of being heads so it’s amplitude is [\sqrt{3}/2,1/2]. The second qubit has a 2/3 chance of being heads so has an amplitude of [\sqrt{2}/\sqrt{3}, 1/\sqrt{3}].

To calculate the amplitude vector representing these two qubits, we do a kronecker product:
\begin{bmatrix} \cfrac{\sqrt{3}}{2} \\ \cfrac{1}{2} \\ \end{bmatrix} \otimes \begin{bmatrix} \cfrac{\sqrt{2}}{\sqrt{3}} \\ \cfrac{1}{\sqrt{3}} \\ \end{bmatrix} = \begin{bmatrix} \cfrac{\sqrt{3}}{2}\cfrac{\sqrt{2}}{\sqrt{3}} \\ \cfrac{\sqrt{3}}{2}\cfrac{1}{\sqrt{3}} \\ \cfrac{1}{2}\cfrac{\sqrt{2}}{\sqrt{3}} \\ \cfrac{1}{2}\cfrac{1}{\sqrt{3}} \\ \end{bmatrix}

If you simplify that, you get:
\begin{bmatrix} \cfrac{1}{\sqrt{2}} \\ \cfrac{1}{2} \\ \cfrac{1}{\sqrt{6}} \\ \cfrac{1}{2\sqrt{3}} \\ \end{bmatrix}

or horizontally for easier reading:
\begin{bmatrix} \cfrac{1}{\sqrt{2}} & \cfrac{1}{2} & \cfrac{1}{\sqrt{6}} & \cfrac{1}{2\sqrt{3}} & \end{bmatrix}

Squaring those values to get the probabilities of each state, we get the below, which you might notice adds up to 100%, meaning the vector is still normalized!
\begin{bmatrix} 50\% & 25\% & 17\% & 8\% \end{bmatrix}

This generalizes for larger numbers of qubits, so you could use the Kronecker product to combine a vector describing 4 qubits with a vector describing 3 qubits, to come up with a vector that describes 7 qubits (which would have 128 amplitudes in it, since 2^7 = 128!).

The kronecker product also generalizes to matrices, which we will talk about shortly.

To be able to work with multiple qubits in a quantum circuit, you need to represent all qubits involved in a single vector. Doing this, while being the correct thing to do, also allows entanglement to happen, which we will talk about next!

Entanglement – Simpler Than You Might Think!

Time to demystify quantum entanglement!

Quantum entanglement is the property whereby two qubits – which can be separated by any distance, such as several light years – can be observed and come up with the same value (0 or 1). Whether they are 0s or 1s is random, but they will both agree, when entangled in the way that causes them to agree (you could alternately entangle them to always disagree).

Interestingly, in 1965 John Bell proved that the mechanism that makes this happen is NOT that the qubits share information in advance. You can read more about that here – it’s near the end: Ars Technica – A tale of two qubits: how quantum computers work.

A common misconception is that this means that we can have faster than light (instantaneous) communication across any distance. That turns out not to be true, due to the No-Communication-Theorem (Wikipedia).

Interestingly though, you can use separated entangled qubits in separated quantum circuits to do something called Quantum Pseudo Telepathy (Wikipedia), which lets you do some things that would otherwise be impossible – like reliably winning a specially designed game that would otherwise be a game of chance.

I don’t yet understand enough about Quantum Pseudo Telepathy to see why it isn’t considered communication. I also have no idea how entanglement is actually “implemented” in the universe, but nobody seems to (or if they do, they aren’t sharing!). How can it be that two coins flipped on opposite sides of the galaxy can be guaranteed to both land with the same side facing up?

Despite those mysteries, the math is DEAD SIMPLE, so let me share it with you, it’ll take like one short paragraph. Are you ready?

Quantum computing works by manipulating the probabilities of the states of qubits. If you have two qubits, there are four possible sets of values when you observe them: 00, 01, 10, 11. If you use quantum computing to set the probabilities of the 01 and 10 states to a 0% chance, and set the probabilities of the 00 and 11 states to a 50% chance each, you’ve now entangled the qubits such that when you observe them, they will always agree, because you’ve gotten rid of the chances that they could ever DISAGREE. Similarly, if instead of setting the 01 and 10 states to 0%, you set the probability of the 00 and 11 states to 0%, you’d have entangled qubits which would always disagree when you observed them, because you’ve gotten rid of the chances that they could ever AGREE.

That’s all there is to it. Strange how simple it is, isn’t it? The table below shows how the only possible outcomes are that the qubits agree, but it is a 50/50 chance whether they are a 0 or a 1:

\begin{array}{c|c|c} \text{Outcome} & \text{Amplitude} & \text{Probability}\\ \hline 00 & 1/\sqrt{2} & 50\% \\ 01 & 0 & 0\% \\ 10 & 0 & 0\% \\ 11 & 1/\sqrt{2} & 50\% \\ \end{array}

Entanglement isn’t limited to just two qubits, you can entangle any number of qubits together.

Entanglement has a special mathematical meaning. If you can represent a state of a group of qubits by a kronecker product, they are not entangled. If you CAN’T represent a state of a group of qubits by a kronecker product, they ARE entangled. These two things – entanglement and lack of kronecker product factorability (made that term up) – are the same thing.

As an example, what two vectors could you use the kronecker product on to get the entangled two qubit state 1/\sqrt{2}(|00\rangle+|11\rangle) (or in vector form [1/\sqrt{2}, 0, 0, 1/\sqrt{2}])? You’ll find there aren’t any! That state is the entangled state where the two qubits will always have the same value when you observe them.

Entangled qubits are nothing special in quantum circuits. You don’t have to take any special precautions when working with them. They are an interesting byproduct of quantum computing, and so basically are something neat that comes out of your quantum circuit, but they don’t require any extra thought when using them within your circuits. Don’t worry about them too much (:

Multi Qubit Gates

Let’s have a look at some multi qubit gates! (These are again from Wikipedia: Quantum Gate)

Swap Gate

Given two qubits, with possible states |00\rangle, |01\rangle, |10\rangle, |11\rangle, this gate swaps the amplitudes (probabilities) of |01\rangle,  |10\rangle and leaves the probabilities of the states |00\rangle and |11\rangle alone.

That might seem weird, but the probability of the |00\rangle state is the probability of the first qubit being 0 added to the probability of the second qubit being 0. If those probabilities swap, they still add to the same value, so this probability is unaffected. It’s the same situation for the |11\rangle state.

Here’s the matrix for the swap gate:
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}

Square Root of Swap Gate

I don’t really understand a usage case for this gate myself, but it apparently does a half way swap between two qubits. That once again modifies the probabilities of the |01\rangle,  |10\rangle states, but leaves state |00\rangle and |11\rangle alone again, for the same reason as the swap gate.

Wikipedia says that if you have this gate, and the single qubit gates, that you can do universal quantum computation. In other words, you can build a fully functional quantum computer using just this and the single qubit gates.

Here’s the matrix:
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \frac{1}{2}(1+i) & \frac{1}{2}(1-i) & 0 \\ 0 & \frac{1}{2}(1-i) & \frac{1}{2}(1+i) & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}

Controlled Not Gate

The controlled not (CNOT) gate flips the value of the second qubit if the first qubit is true. This is a logic / flow control type of gate.

Interestingly, this gate is also useful for creating entangled qubits, which you’ll be able to see lower down!

Here is the matrix:
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ \end{bmatrix}

The controlled not gate is also referred to as the quantum XOR gate. To see why, check out the truth table below for this gate:
\begin{array}{c|c} \text{Input} & \text{Output} \\ \hline 00 & 00 \\ 01 & 01 \\ 10 & 11 \\ 11 & 10 \\ \end{array}

If you look at the right most bit of the outputs, you’ll notice that it’s the XOR of the two input bits!

All quantum gates need to be reversible, and having these two qubits be the output of a hypothetical quantum XOR gate allows that to happen. If you look at the right most bit of the inputs, you’ll notice that it is also the XOR of the two output bits. It’s bidirectional, which is kind of weird and kind of interesting 😛

Generalized Control Gate

You can actually convert any single qubit gate into a controlled gate. That makes a 2 qubit gate which only does work on the second qubit if the first qubit is true.

How you do that is you make a 4×4 identity matrix, and make the lower 2×2 sub-matrix into the single qubit matrix you want to use.

In other words, if your single qubit matrix is this:
\begin{bmatrix} U_{00} & U_{01} \\ U_{10} & U_{11} \\ \end{bmatrix}

Then the controlled version of the matrix would be this:
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & U_{00} & U_{01} \\ 0 & 0 & U_{10} & U_{11} \\ \end{bmatrix}

Pretty simple right?

Toffoli Gate

The Tofolli gate is a 3 qubit gate that is also known as the CCNOT gate or controlled controlled not gate. It flips the third qubit if the first two qubits are true.

It’s matrix looks pretty uninteresting:
\begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ \end{bmatrix}

This is also known as the quantum AND gate. If you look at the truth table below, you’ll see that when the input has the right most qubit of 0, that in the output, the right most qubit will be the AND value of the two left qubits. We need the three bits of input mapping to three bits of output like this so that the gate is reversible.

\begin{array}{c|c} \text{Input} & \text{Output} \\ \hline 000 & 000 \\ 001 & 001 \\ 010 & 010 \\ 011 & 011 \\ 100 & 100 \\ 101 & 101 \\ 110 & 111 \\ 111 & 110 \\ \end{array}

Fredkin Gate

The Fredkin gate is a controlled swap gate. Here’s the matrix:
\begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ \end{bmatrix}

Just like we talked about how to make a generalized controlled gate acting on two qubits, you should be able to notice that this gate is just a specific version of a generalized controlled 3 qubit gate. You take an 8×8 identity matrix, and make the lower right 4×4 matrix be the 2 qubit gate that you want to add a control on. In this case, the lower right 4×4 matrix is just the swap gate we mentioned earlier in the post (:

Circuit To Entangle Qubits

The circuit to entangle qubits is pretty simple, so let’s start with that as the first quantum circuit we look at. It takes in two qubits. The first qubit is put through a Hadamard gate, and then both qubits are put through a controlled not gate. That circuit looks like this (image courtesy of Quantum Circuit Simulator):

The value you plug in for the second qubit determines what type of entanglement you get out. Setting the second qubit to 0, you will get entangled qubits which always agree when they are observed. Setting it to 1, you will get entangled qubits which always disagree when they are observed. The first qubit controls whether the phase of each state matches or mismatches. Note that these are the four Bell States (Wikipedia: Bell State).

\begin{array}{c|c|c} \text{Input} & \text{Output In Ket Notation} & \text{Output As Vector} \\ \hline 00 & \frac{1}{\sqrt{2}}(|00\rangle+|11\rangle) & [1/\sqrt{2},0,0,1/\sqrt{2}] \\ 01 & \frac{1}{\sqrt{2}}(|01\rangle+|10\rangle) & [0,1/\sqrt{2},1/\sqrt{2},0] \\ 10 & \frac{1}{\sqrt{2}}(|00\rangle-|11\rangle) & [1/\sqrt{2},0,0,-1/\sqrt{2}] \\ 11 & \frac{1}{\sqrt{2}}(|01\rangle-|10\rangle) & [0,1/\sqrt{2},-1/\sqrt{2},0]\\ \end{array}

In a quantum circuit, you can’t just apply a gate to an individual quabit at a time though. You have to make the matrix of your gate such that it applies to all qubits, applying the “identity” matrix to the qubits you don’t want to be affected by the gate.

So, how do we make a matrix that applies the Hadamard gate to qubit 1, and identity to qubit 2? You use the kronecker product!

Since we want to apply the Hadamard matrix to qubit 1 and identity to qubit 2, we are going to calculate H \otimes I (If we wanted to apply the Hadamard gate to qubit 2 and identity to qubit 1 we would calculate I \otimes H instead).

H \otimes I =  1/\sqrt{2}* \begin{bmatrix} 1 & 1 \\ 1 & -1 \\ \end{bmatrix} \otimes \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ \end{bmatrix} = 1/\sqrt{2}* \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \\ \end{bmatrix}

If you notice, the result of the kronecker product is just every value in the left matrix, multiplied by every value in the right matrix. Basically, the result is a 2×2 grid of identity matrices, where each of those identity matrices is multiplied by the corresponding value from the same cell in the left matrix. Since the left matrix has a 1 in all cells except the lower right, the same is true of the result… it’s a positive identity matrix in each cell, except the lower right one, which is a negative identity matrix. Hopefully that makes sense, it’s a lot easier than it sounds…

The second gate in the quantum circuit is the CNOT gate. We can actually multiply the gate we just made by the CNOT gate to represent the full quantum circuit as a single matrix. This uses regular matrix multiplication.

1/\sqrt{2}* \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \\ \end{bmatrix} * \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ \end{bmatrix} = 1/\sqrt{2}* \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & -1 \\ 0 & 1 & -1 & 0 \\ \end{bmatrix} = \begin{bmatrix} 1/\sqrt{2} & 0 & 0 & 1/\sqrt{2} \\ 0 & 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 1/\sqrt{2} & 0 & 0 & -1/\sqrt{2} \\ 0 & 1/\sqrt{2} & -1/\sqrt{2} & 0 \\ \end{bmatrix}

Lets plug some values into the circuit and see what comes out!

Lets start by plugging in 00. Our input qubits are [1, 0] and [1, 0]. The kronecker product of those two qubit vectors is [1, 0, 0, 0]. Now let’s multiply that vector by the matrix of our quantum circuit.

\begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix} * \begin{bmatrix} 1/\sqrt{2} & 0 & 0 & 1/\sqrt{2} \\ 0 & 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 1/\sqrt{2} & 0 & 0 & -1/\sqrt{2} \\ 0 & 1/\sqrt{2} & -1/\sqrt{2} & 0 \\ \end{bmatrix} = \begin{bmatrix} 1/\sqrt{2} & 0 & 0 & 1/\sqrt{2} \end{bmatrix}

Comparing this to the table showing how our input maps to output, we got the right answer! If you plugged in the other values listed, you would get the rest of the bell state entanglements.

Unentangling Qubits

All gates are reversible, so all circuits are reversible. To get the reverse circuit for a given matrix, you just get the inverse of the matrix. Since quantum gates (and circuits) are unitary matrices, taking the inverse of one of these matrices just means taking the conjugate transpose of the matrix. In other words, you take the transpose of the matrix, and then just negate the imaginary component of any complex numbers. In this example, there are no imaginary numbers, so you just take the transpose of the matrix.

Since our circuit matrix is this:

\begin{bmatrix} 1/\sqrt{2} & 0 & 0 & 1/\sqrt{2} \\ 0 & 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 1/\sqrt{2} & 0 & 0 & -1/\sqrt{2} \\ 0 & 1/\sqrt{2} & -1/\sqrt{2} & 0 \\ \end{bmatrix}

That means that the inverse must be this, since this is the (conjugate) transpose.

\begin{bmatrix} 1/\sqrt{2} & 0 & 1/\sqrt{2} & 0 \\ 0 & 1/\sqrt{2} & 0 & 1/\sqrt{2} \\ 0 & 1/\sqrt{2} & 0 & -1/\sqrt{2} \\ 1/\sqrt{2} & 0 & -1/\sqrt{2} & 0 \\ \end{bmatrix}

Let’s try it out by putting the output from last section in and seeing what comes out the other side:

\begin{bmatrix} 1/\sqrt{2} & 0 & 0 & 1/\sqrt{2} \end{bmatrix} * \begin{bmatrix} 1/\sqrt{2} & 0 & 1/\sqrt{2} & 0 \\ 0 & 1/\sqrt{2} & 0 & 1/\sqrt{2} \\ 0 & 1/\sqrt{2} & 0 & -1/\sqrt{2} \\ 1/\sqrt{2} & 0 & -1/\sqrt{2} & 0 \\ \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix}

It worked! We got our original input back.

Making Matrices for More Complex Circuits

We talked about how to make single qubit gates apply to specific qubits by using the kronecker product with identity to move the gate to the right location.

If you have 4 qubits and you want to apply some single qubit gate (say Hadamard) to the 3rd qubit, you would just do this to get the matrix:
I \otimes I \otimes H \otimes I

What if you want to do complex things with multi qubit gates though? Like what if you want to do a controlled not gate where you want the 2nd qubit to flip it’s value if the 4th qubit was true?

The answer is actually pretty simple… you use swaps gates to flip the values of the qubits so that your qubit values line up with the inputs of the gate, then you do the gate, and finally undo all the swaps to get the qubit values back to the right positions.

We need to swap qubits so that the 4th qubit value is in the 1st qubit slot, and the 2nd qubit value is in the 2nd qubit slot. We can do that with the following swaps:

Once we have done those swaps, we can do our controlled not, then do the swaps in reverse order to return the qubits to their original positions.

Here’s what the circuit looks like:

You’ll see the simpler version in circuit diagrams, but at least now you’ll know how to make things that look like this:

Below is the mathematical way that you would get the matrix representing this gate.

I is the identity matrix, S is the swap gate, and C is the controlled not gate.

M = \\ (I \otimes I \otimes S) * \\ (I \otimes S \otimes I) * \\ (S \otimes I \otimes I) * \\ (I \otimes S \otimes I) * \\ (C \otimes I \otimes I) * \\ (I \otimes S \otimes I) * \\ (S \otimes I \otimes I) * \\ (I \otimes S \otimes I) * \\ (I \otimes I \otimes S) \\

Code

Here is some simple C++ to show both the entanglement circuit and the more complicated controlled not gate we described.

#include <stdio.h>
#include <vector>
#include <complex>
#include <assert.h>

typedef std::complex<float> TComplex;

//=================================================================================
struct SComplexVector
{
public:
    TComplex& Get (size_t i) { return v[i]; }

    const TComplex& Get (size_t i) const { return v[i]; }

    size_t Size() const
    {
        return v.size();
    }

    void Resize (size_t s)
    {
        v.resize(s);
    }

    void Print () const
    {
        printf("[");
        for (size_t i = 0, c = v.size(); i < c; ++i)
        {
            if (i > 0)
                printf(", ");
            const TComplex& val = Get(i);
            if (val.imag() == 0.0f)
            {
                if (val.real() == 0.0f)
                    printf("0");
                else if (val.real() == 1.0f)
                    printf("1");
                else
                    printf("%0.2f", val.real());
            }
            else
                printf("%0.2f + %0.2fi", val.real(), val.imag());
        }
        printf("]n");
    }

    std::vector<TComplex> v;
};

//=================================================================================
struct SComplexMatrix
{
public:
    TComplex& Get (size_t x, size_t y)
    {
        return v[y*Size() + x];
    }

    const TComplex& Get (size_t x, size_t y) const
    {
        return v[y*Size() + x];
    }

    // For an MxM matrix, this returns M
    size_t Size () const
    {
        size_t ret = (size_t)sqrt(v.size());
        assert(ret*ret == v.size());
        return ret;
    }

    // For an MxM matrix, this sets M
    void Resize(size_t s)
    {
        v.resize(s*s);
    }

    void Print() const
    {
        const size_t size = Size();

        for (size_t y = 0; y < size; ++y)
        {
            printf("[");
            for (size_t x = 0; x < size; ++x)
            {
                if (x > 0)
                    printf(", ");
                
                const TComplex& val = Get(x, y);
                if (val.imag() == 0.0f)
                {
                    if (val.real() == 0.0f)
                        printf("0");
                    else if (val.real() == 1.0f)
                        printf("1");
                    else
                        printf("%0.2f", val.real());
                }
                else
                    printf("%0.2f + %0.2fi", val.real(), val.imag());
            }
            printf("]n");
        }
    }

    std::vector<TComplex> v;
};

//=================================================================================
static const SComplexVector c_qubit0 = { { 1.0f, 0.0f } };  // false aka |0>
static const SComplexVector c_qubit1 = { { 0.0f, 1.0f } };  // true aka |1>

// 2x2 identity matrix
static const SComplexMatrix c_identity2x2 =
{
    {
        1.0f, 0.0f,
        0.0f, 1.0f,
    }
};

// Given the states |00>, |01>, |10>, |11>, swaps the |01> and |10> state
// If swapping the probabilities of two qubits, it won't affect the probabilities
// of them both being on or off since those add together.  It will swap the odds of
// only one of them being on.
static const SComplexMatrix c_swapGate =
{
    {
        1.0f, 0.0f, 0.0f, 0.0f,
        0.0f, 0.0f, 1.0f, 0.0f,
        0.0f, 1.0f, 0.0f, 0.0f,
        0.0f, 0.0f, 0.0f, 1.0f
    }
};

// Controlled not gate
// If the first qubit is true, flips the value of the second qubit
static const SComplexMatrix c_controlledNotGate =
{
    {
        1.0f, 0.0f, 0.0f, 0.0f,
        0.0f, 1.0f, 0.0f, 0.0f,
        0.0f, 0.0f, 0.0f, 1.0f,
        0.0f, 0.0f, 1.0f, 0.0f
    }
};

// Hadamard gate
// Takes a pure |0> or |1> state and makes a 50/50 superposition between |0> and |1>.
// Put a 50/50 superposition through and get the pure |0> or |1> back.
// Encodes the origional value in the phase information as either matching or
// mismatching phase.
static const SComplexMatrix c_hadamardGate =
{
    {
        1.0f / std::sqrt(2.0f), 1.0f / std::sqrt(2.0f),
        1.0f / std::sqrt(2.0f), 1.0f / -std::sqrt(2.0f)
    }
};

//=================================================================================
void WaitForEnter ()
{
    printf("nPress Enter to quit");
    fflush(stdin);
    getchar();
}

//=================================================================================
SComplexVector KroneckerProduct (const SComplexVector& a, const SComplexVector& b)
{
    const size_t aSize = a.Size();
    const size_t bSize = b.Size();

    SComplexVector ret;
    ret.Resize(aSize*bSize);

    for (size_t i = 0, ic = aSize; i < ic; ++i)
    {
        for (size_t j = 0, jc = bSize; j < jc; ++j)
        {
            size_t n = i * bSize + j;
            ret.Get(n) = a.Get(i)*b.Get(j);
        }
    }
    return ret;
}

//=================================================================================
SComplexMatrix KroneckerProduct (const SComplexMatrix& a, const SComplexMatrix& b)
{
    const size_t aSize = a.Size();
    const size_t bSize = b.Size();

    SComplexMatrix ret;
    ret.Resize(aSize*bSize);

    for (size_t ax = 0; ax < aSize; ++ax)
    {
        for (size_t ay = 0; ay < aSize; ++ay)
        {
            const TComplex& aValue = a.Get(ax, ay);

            for (size_t bx = 0; bx < bSize; ++bx)
            {
                for (size_t by = 0; by < bSize; ++by)
                {
                    const TComplex& bValue = b.Get(bx, by);

                    size_t nx = ax*bSize + bx;
                    size_t ny = ay*bSize + by;

                    ret.Get(nx,ny) = aValue * bValue;
                }
            }
        }
    }

    return ret;
}

//=================================================================================
SComplexMatrix operator* (const SComplexMatrix& a, const SComplexMatrix& b)
{
    assert(a.Size() == b.Size());
    const size_t size = a.Size();

    SComplexMatrix ret;
    ret.Resize(size);

    for (size_t nx = 0; nx < size; ++nx)
    {
        for (size_t ny = 0; ny < size; ++ny)
        {
            TComplex& val = ret.Get(nx, ny);
            val = 0.0f;
            for (size_t i = 0; i < size; ++i)
                val += a.Get(i, ny) * b.Get(nx, i);
        }
    }

    return ret;
}

//=================================================================================
SComplexVector operator* (const SComplexVector& a, const SComplexMatrix& b)
{
    assert(a.Size() == b.Size());
    const size_t size = a.Size();

    SComplexVector ret;
    ret.Resize(size);

    for (size_t i = 0; i < size; ++i)
    {
        TComplex& val = ret.Get(i);
        val = 0;
        for (size_t j = 0; j < size; ++j)
            val += a.Get(j) * b.Get(i, j);
    }

    return ret;
}

//=================================================================================
int main (int argc, char **argv) {

    // 2 qubit entanglement circuit demo
    {
        // make the circuit
        const SComplexMatrix H1 = KroneckerProduct(c_hadamardGate, c_identity2x2);
        const SComplexMatrix circuit = H1 * c_controlledNotGate;

        // display the circuit
        printf("Entanglement circuit:n");
        circuit.Print();

        // permute the inputs and see what comes out when we pass them through the circuit!
        SComplexVector input = KroneckerProduct(c_qubit0, c_qubit0);
        SComplexVector output = input * circuit;
        printf("ninput:");
        input.Print();
        printf("output:");
        output.Print();

        input = KroneckerProduct(c_qubit0, c_qubit1);
        output = input * circuit;
        printf("ninput:");
        input.Print();
        printf("output:");
        output.Print();

        input = KroneckerProduct(c_qubit1, c_qubit0);
        output = input * circuit;
        printf("ninput:");
        input.Print();
        printf("output:");
        output.Print();

        input = KroneckerProduct(c_qubit1, c_qubit1);
        output = input * circuit;
        printf("ninput:");
        input.Print();
        printf("output:");
        output.Print();
    }

    // 4 qubit demo: flip the second qubit if the fourth qubit is true
    {
        // make the circuit
        const SComplexMatrix cnot4qubit = KroneckerProduct(KroneckerProduct(c_controlledNotGate, c_identity2x2), c_identity2x2);
        const SComplexMatrix swap12 = KroneckerProduct(KroneckerProduct(c_swapGate, c_identity2x2), c_identity2x2);
        const SComplexMatrix swap23 = KroneckerProduct(KroneckerProduct(c_identity2x2, c_swapGate), c_identity2x2);
        const SComplexMatrix swap34 = KroneckerProduct(KroneckerProduct(c_identity2x2, c_identity2x2), c_swapGate);
        const SComplexMatrix circuit = 
            swap34 *
            swap23 *
            swap12 *
            swap23 *
            cnot4qubit *
            swap23 *
            swap12 *
            swap23 *
            swap34;

        // display the circuit
        printf("nFlip 2nd qubit if 4th qubit true circuit:n");
        circuit.Print();

        // permute the inputs and see what comes out when we pass them through the circuit!
        SComplexVector input = KroneckerProduct(KroneckerProduct(KroneckerProduct(c_qubit0, c_qubit0), c_qubit0), c_qubit0);
        SComplexVector output = input * circuit;
        printf("ninput:");
        input.Print();
        printf("output:");
        output.Print();

        input = KroneckerProduct(KroneckerProduct(KroneckerProduct(c_qubit0, c_qubit0), c_qubit0), c_qubit1);
        output = input * circuit;
        printf("ninput:");
        input.Print();
        printf("output:");
        output.Print();

        input = KroneckerProduct(KroneckerProduct(KroneckerProduct(c_qubit0, c_qubit0), c_qubit1), c_qubit0);
        output = input * circuit;
        printf("ninput:");
        input.Print();
        printf("output:");
        output.Print();

        input = KroneckerProduct(KroneckerProduct(KroneckerProduct(c_qubit0, c_qubit0), c_qubit1), c_qubit1);
        output = input * circuit;
        printf("ninput:");
        input.Print();
        printf("output:");
        output.Print();

        input = KroneckerProduct(KroneckerProduct(KroneckerProduct(c_qubit0, c_qubit1), c_qubit0), c_qubit0);
        output = input * circuit;
        printf("ninput:");
        input.Print();
        printf("output:");
        output.Print();

        input = KroneckerProduct(KroneckerProduct(KroneckerProduct(c_qubit0, c_qubit1), c_qubit0), c_qubit1);
        output = input * circuit;
        printf("ninput:");
        input.Print();
        printf("output:");
        output.Print();

        input = KroneckerProduct(KroneckerProduct(KroneckerProduct(c_qubit0, c_qubit1), c_qubit1), c_qubit0);
        output = input * circuit;
        printf("ninput:");
        input.Print();
        printf("output:");
        output.Print();

        input = KroneckerProduct(KroneckerProduct(KroneckerProduct(c_qubit0, c_qubit1), c_qubit1), c_qubit1);
        output = input * circuit;
        printf("ninput:");
        input.Print();
        printf("output:");
        output.Print();

        input = KroneckerProduct(KroneckerProduct(KroneckerProduct(c_qubit1, c_qubit0), c_qubit0), c_qubit0);
        output = input * circuit;
        printf("ninput:");
        input.Print();
        printf("output:");
        output.Print();

        input = KroneckerProduct(KroneckerProduct(KroneckerProduct(c_qubit1, c_qubit0), c_qubit0), c_qubit1);
        output = input * circuit;
        printf("ninput:");
        input.Print();
        printf("output:");
        output.Print();

        input = KroneckerProduct(KroneckerProduct(KroneckerProduct(c_qubit1, c_qubit0), c_qubit1), c_qubit0);
        output = input * circuit;
        printf("ninput:");
        input.Print();
        printf("output:");
        output.Print();

        input = KroneckerProduct(KroneckerProduct(KroneckerProduct(c_qubit1, c_qubit0), c_qubit1), c_qubit1);
        output = input * circuit;
        printf("ninput:");
        input.Print();
        printf("output:");
        output.Print();

        input = KroneckerProduct(KroneckerProduct(KroneckerProduct(c_qubit1, c_qubit1), c_qubit0), c_qubit0);
        output = input * circuit;
        printf("ninput:");
        input.Print();
        printf("output:");
        output.Print();

        input = KroneckerProduct(KroneckerProduct(KroneckerProduct(c_qubit1, c_qubit1), c_qubit0), c_qubit1);
        output = input * circuit;
        printf("ninput:");
        input.Print();
        printf("output:");
        output.Print();

        input = KroneckerProduct(KroneckerProduct(KroneckerProduct(c_qubit1, c_qubit1), c_qubit1), c_qubit0);
        output = input * circuit;
        printf("ninput:");
        input.Print();
        printf("output:");
        output.Print();

        input = KroneckerProduct(KroneckerProduct(KroneckerProduct(c_qubit1, c_qubit1), c_qubit1), c_qubit1);
        output = input * circuit;
        printf("ninput:");
        input.Print();
        printf("output:");
        output.Print();
    }


    WaitForEnter();

    return 0;
}

Here is the first part of the output, which shows the results of the entangling circuit. You can see that the input gave the expected output Bell states:

Below is the second part of the output, which is the circuit that flips the 2nd qubit if the 4th qubit is true.

Each entry in the input and output vectors is the amplitude (probability) that the state it represents is true. The states start at the left with 0000, then 0001, then 0010 and continue until the very right most value which represents 1111.

If you look at the second input/output pair the input is [0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0], which means it has a 100% chance of the 4 qubits being 0001 (aka the qubits represent the number 1). The output vector is [0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0], which means that it has a 100% chance of being 0101 (aka the qubits represent the number 5). Since the input did have the 4th bit set (4th from the left), it flipped the 2nd bit. So, you can see that it worked!

If you check all the input/output pairs, you’ll see that they all follow this rule.

We used only whole, real numbers, no using fractional probabilities, or imaginary amplitudes. What it does in those situations is a little bit harder to get intuition for, but rest assured that it does “the right thing” in those situations as well.

Next Up

Now that we have the basics of quantum computing down pretty well, it’s time to analyze a couple quantum algorithms to see how they work!

Quantum Computing For Programmers Part Ib: Bloch Sphere

If you read anything on quantum computing you are extremely likely to see the Bloch sphere, so it’s probably important to explain what it is and how it works.

Image from wikipedia: Wikipedia: Qubit

The Bloch sphere is a way of visually representing a qubit.

The north pole is the |0\rangle state, and the south pole is the |1\rangle state. The Z axis is vertical and runs from the |0\rangle to the |1\rangle state. A qubit’s state is represented by a point on the sphere.

If you read the last post, you might remember that there was a Pauli-Z gate which changed the phase of a qubit without changing it’s probability, as well as a more generalized version of the phase changing gate which also just rotated around the Z axis. Looking at the bloch sphere, you can see why that is true! Rotating a point around the Z axis moves it around the sphere horizontally, but it doesn’t make it any closer or farther to either the north or south pole. Since the distance to the poles is preserved, the probability of being one or the other stays the same, even though the phase changes.

The not gate, aka the Pauli-X gate, rotated a point around the X axis 180 degrees. Looking at the sphere, you could see how that would change a |1\rangle to a |0\rangle, or otherwise flip the probabilities & amplitudes of the states.

The Pauli-Y gate is a little more complex though (pun intended!) as it mapped |0\rangle to i|1\rangle and |1\rangle to -i|0\rangle by rotating 180 degrees around the Y axis. You may wonder how |1\rangle and i|1\rangle can both refer to the south pole, but the situation there is that every point EXCEPT the poles have a unique representation. So, they really do both refer to the south pole.

To find where a qubit sits on the sphere, the first step is to figure out the spherical coordinates theta (\theta) and phi (\phi) values, which you can then convert to a 3d point.

Start with whatever qubit you have in this form, where you know what the values for alpha (\alpha) and beta (\beta) are:
|\psi\rangle = \alpha|0\rangle + \beta|1\rangle

You can also write a qubit as:
|\psi\rangle = \cos\left(\tfrac{\theta}{2}\right) |0 \rangle \, + \, e^{i \phi}  \sin\left(\tfrac{\theta}{2}\right) |1 \rangle

Or like this, with less euler, and more trig:
|\psi\rangle = \cos\left(\tfrac{\theta}{2}\right) |0 \rangle \, +  \, ( \cos \phi + i \sin \phi) \, \sin\left(\tfrac{\theta}{2}\right) |1 \rangle

You can then set your known alpha and beta values to the trig (or euler) based equations:
\alpha = \cos\left(\tfrac{\theta}{2}\right)
\beta = \, ( \cos \phi + i \sin \phi) \, \sin\left(\tfrac{\theta}{2}\right)

and then solve for theta and phi (I wave my hands a bit here).

Once you have the theta and phi values figured out, you can convert that to a unit distance 3d point representing a point on the sphere by using the below, which describes the X,Y,Z components of the point:
(\sin \theta \cos \phi, \; \sin \theta  \sin \phi, \; \cos \theta)

Hopefully this explanation makes enough sense that if you encounter the bloch sphere, or things talking about rotations on the bloch sphere, you won’t be too lost 😛

Ok… time to move on to multiple qubit quantum circuits so we can do more interesting things and perhaps analyze the most well known quantum algorithms!

Quantum Computing For Programmers Part I: One Qubit

Are you a programmer? Do you have interest in learning how quantum computing works? Does hard core math and crazy abstract “philosophical” type physics questions about the nature of reality make you feel like you could never understand quantum computing?

If so, me too, welcome to the club! We are programmers, not mathematicians or physicists, but we are masters of our realm – computation and logic. With that, I say that quantum computing is ultimately meant for us, since it is computing and programming, so lets figure it out and do some interesting stuff 😛

Before we start, I want to mention that these posts should be able to be read stand alone, but if you want to get into the deeper details, you probably want to give this list of references I’m using a read at some point. Perhaps before reading my posts, or after, or both!
Quantum Computing References

Lastly, to be clear, quantum computers seem like they are a long way off from being common place objects. These posts will show how quantum computers work, and allow you to simulate quantum computers on your own computer with some simple C++. The code won’t run as fast as it would on a real quantum computer (obviously!), but it should let you better understand how Quantum Computing (QC) works and let you explore and experiment on your own machine.

Probabilities (aka SuperPositons)

At the core of QC is the qubit. A qubit is like a regular bit, in that it can have the value of 0 or 1, but unlike a regular bit, it can also be somewhere in the middle where it has a probability of being in each state.

That might sound strange, but think of a qubit like a coin. It can be heads up, or it can be heads down, or it can be in the air, waiting to land heads up or heads down.

When the coin is in the air, if it is perfectly balanced, it has a 50/50 chance of becoming heads or tails.

If the coin is not perfectly balanced (no real coin can be perfectly balanced!), it will have some bias towards heads or towards tails so will not be a pure 50/50 chance of heads or tails.

While the coin is on the ground, it is either a heads (0) or a tails (1), but while it’s in the air, you can imagine that it is a superposition of 0 and 1. When it lands, that superposition will collapse into a 0 or 1 value, which is also what happens when you “observe” or measure a qubit.

One interesting thing about quantum computing, and where part of it’s power comes from is that you can perform logic on superpositional values in a way that when you have your final superpositional result, you can measure it and get the result from your logic circuit as if the values had been fixed all the way through the system, instead of being in superpositions.

The bummer about this though is that since it’s all based on probabilities, your answer has a chance of being not the value you wanted to get. Quantum circuits / quantum algorithms will try to combat this by doing operations iteratively to make the probability of the desired answer go up, and the probability of the undesired answers go down.

They can also run the quantum algorithm multiple times, to take multiple samples and be more sure about their results.

You still with me? Pretty simple so far right?

Before moving on I also wanted to mention that regular computers – like the one you are using right now – also have a greater than zero chance that they will give you the wrong answer to a calculation. The error rate can increase when the components are too hot, but even otherwise, a stray particle of background radiation could hit your cpu and flip a bit, giving you the wrong answer. You could fight this by doing every calculation 3 times and going with majority rules, but even then, if two bits get flipped, the majority will be wrong. Nothing you can do can completely eliminate the random chance that your calculations will come up with the wrong answer, but the chances are low enough that we rely on computers to do the right things, even in the most critical of situations (like, not starting a thermonuclear war).

With quantum computers you could get that same level of assurance. You can never completely eliminate the chance that a calculation is wrong, but you can definitely be sure with as much accuracy as you can with a standard computer, so that shouldn’t really be an issue in practice, for the cases where you need “certainty”.

Probability Vectors (aka Amplitude Vectors)

Using the coin flip scenario above, we could describe the probability of a coin landing heads or tails as a vector where the first element is the chance that it will land heads, and the second element is the chance that it will land tails.

A perfectly balanced coin would look like this for instance: [0.5,0.5]

If the coin was 75% likely to land heads, and only 25% likely to land tails the vector would look like this: [0.75,0.25]

You might notice that the elements of the vector have to add up to 1. This makes sense because there are only two options and one of the two MUST happen (unless it lands perfectly on edge), so they better add up to 1! The technical term for this is that the vector must have an L1 Norm of 1.

Qubits describe their probabilities a little bit differently though. Instead of storing the probability of them being 0 or 1, they store what’s called the amplitude, which when squared gives the probability. Doing this lets them do something really important that I’ll talk about lower down, but for now, you can hopefully just accept that this is how it works.

Consider a qubit having a 50% chance that it’s a 0, and a 50% chance that it’s a 1. We need some number that when squared gives 0.5 aka 1/2. That value is 1/\sqrt{2}.

So, for a qubit that has an even chance of being a 0 or 1, a vector to represent it is [1/\sqrt{2}, 1/\sqrt{2}].

Then, a vector describing a qubit that has a 75% chance being a 0 and a 25% chance of being a 1 would be [\sqrt{3}/2, 1/2]. If you square the first number you can see that you get 3/4 (75% chance) and squaring the second number gives you 1/4 (25% chance).

I got those vectors by solving the simple equation:
x^2 = P

Where P is the desired probability from 0 to 1.

You might now notice that the length of a qubit amplitude vector has to be 1. In other words, it’s a normalized vector. The technical term for this is that the vector must have an L2 norm of 1.

Still nothing too crazy going on…

Ket Notation

There is some notation you’ll come across a lot when reading about quantum computing / mechanics / etc that looks like this:
1/\sqrt{2}(|0\rangle + |1\rangle)

That is “Ket Notation” and comes from “Bra-Ket Notation”. It’s bark is worse than it’s bite.

If you multiply it out, you get this:
1/\sqrt{2} * |0\rangle + 1/\sqrt{2} * |1\rangle

All that means is that the “0” state of the vector has an amplitude of 1/\sqrt{2} and so does the “1” state. Writing that as a vector, we get what we already saw above for the qubit which had 50/50 odds of being a 0 or a 1:
[1/\sqrt{2}, 1/\sqrt{2}]

Here is the ket notation for the qubit which had a 75% chance of being a 0 and a 25% chance of being a 1:
1/2(\sqrt{3}|0\rangle + |1\rangle)

You can multiply that out to get this:
\sqrt{3}/2*|0\rangle + 1/2*|1\rangle

Which translates to the vector we saw before:
[\sqrt{3}/2, 1/2]

Lastly, you might see it shown like this:
|0\rangle

All that means is that the “0” state is 1. Since the “1” state isn’t listed, it has a value of 0. That is the same as this vector:
[1, 0]

Ket notation is used because you only have to list the states that actually have values, and not the ones that have zeros, making a more compact representation. That will be more useful when we move into a larger number of qubits.

Note that the mapping of which state belongs to which spot in the vector is completely arbitrary. I chose to make the left value be the “0” state and the right value be the “1” state, but it really doesn’t matter how you define it, so long as you are consistent.

Phase

Phase is the reason that qubits store amplitude, which square to probabilities, instead of storing probabilities themselves.

I said that the amplitude vector for a qubit which has a 50/50 chance of being a 0 or a 1 was this:
[1/\sqrt{2}, 1/\sqrt{2}]

I lied a bit though, and that is only one possible answer. Here’s another that has a 50/50 chance:
[-1/\sqrt{2}, 1/\sqrt{2}]

This gets interesting when you add two vectors together. First lets add two amplitude vectors which have the same phases for each state:
[1/\sqrt{2}, 1/\sqrt{2}] + [1/\sqrt{2}, 1/\sqrt{2}] = [\sqrt{2}, \sqrt{2}]

Each component added together.

Now, let’s add two amplitude vectors where the |1\rangle state has opposite phase in one of the vectors:
[1/\sqrt{2}, 1/\sqrt{2}] + [1/\sqrt{2}, -1/\sqrt{2}] = [\sqrt{2}, 0]

You can see that the |0\rangle states added together while the |1\rangle states subtracted and resulted in zero.

This is called destructive interference, and is one of the things that makes quantum computing powerful. We can change a qubit’s phase without affecting it’s value (or probability of values).

I left out another important amplitude vector for describing a qubit that has a 50/50 chance of being a 0 or a 1:
[1/\sqrt{2}, 1/\sqrt{-2}]

Which can also be written as:
[1/\sqrt{2}, 1/(\sqrt{2}*i)]

Yep, that is an imaginary number in the 1 state!

Since amplitude squared is probability, it means we can also use imaginary numbers for amplitude vector values. If you know that the term phase relates to angles and rotations, you might want to have a look at these two things to follow the rabbit hole a bit deeper. Totally optional (;
Using Imaginary Numbers To Rotate 2D Vectors
Wikipedia: Bloch Sphere

However, since imaginary numbers can be involved, squaring amplitudes to get probabilities isn’t enough. Like for example with the vector [0,i], if you square the |1\rangle state to get the probability, you get -1, or -100%. That isn’t right!

The real operation to getting probability from amplitude is you multiply the amplitude by it’s complex conjugate. In other words, if you have a complex number, you multiply the imaginary part by -1 to get the conjugate, and multiply by that.

As an example, if you have 3+5i, the complex conjugate is 3-5i.

In the example of the vector [0,i], you get the probability of the |1\rangle state by multiplying i by -i to get 1, or 100%.

Complex conjugate sounds scary, but hopefully you can see it isn’t as scary as it sounds. You just flip the sign of the imaginary component of the complex number.

Common One Qubit Logic Gates

It’s finally time to dive into some logic gates so we can actually do some quantum programming!

Real quickly I want to mention that we are going to be simulating quantum computing by multiplying the qubit amplitude vectors by matrices. The matrices themselves are unitary matrices, which preserves the length of the vectors (keeping them normalized).

One requirement of quantum gates is that they must be reversible, and unitary matrices are also reversible.

The matrices are square matrices and their dimensions will always be 2^N by 2^N where N is the number of qubits involved in the quantum circuit. With a single qubit, that means we will be working with 2×2 matrices which is fairly small.

When working with more qubits the size of the matrix grows very quickly though. Quantum computing is very fast at doing these “matrix multiplies” and is part of where their power comes from. Large matrix multiplies will be slow for us on regular (classical) computers, but the quantum computers would be able to do them very quickly. That’s where the difference is in our simulations versus the real deal.

Anyways, let’s get onto some specific logic gates! (These are from Wikipedia: Quantum Gate)

Not Gate (Pauli-X Gate)

This is a not gate, but is also described as a 180 degree rotation around the x axis of the bloch sphere.

\begin{bmatrix} 0 & 1 \\ 1 & 0 \\ \end{bmatrix}

You might notice that it looks like a backwards identity matrix. That is exactly how it works too… if you multiply a 2 element vector by that matrix, it will flip the elements in that vector!

In the case of our qubit amplitude vectors, this has the result of swapping the probability that the qubit will be 0, with the probability that the qubit will be 1.

If the qubit is not in a superposition, and instead is actually just a 0 or a 1, it will act exactly like a traditional NOT gate and flip it from one value to the other.

If the qubit is in a superposition, it will just flip the probabilities of each state.

This maps the |0\rangle state to |1\rangle and vice versa.

Pauli-Y Gate

This is described as a 180 degree rotation around the y axis of the bloch sphere, and maps |0\rangle to i|1\rangle and |1\rangle to -i|0\rangle.

\begin{bmatrix} 0 & -i \\ i & 0 \\ \end{bmatrix}

It’s like a NOT but also adjusts phase.

Pauli-Z Gate

This is described as a 180 degree rotation around the z axis of the bloch sphere. All it does in practice is negate the phase of the |1\rangle state.

\begin{bmatrix} 1 & 0 \\ 0 & -1 \\ \end{bmatrix}

General Phase Adjustment Gates

You can adjust the phase of individual gates by using a matrix like this one, which is a more general version of the Pauli-Z gate, and adjusts the phase of of the |1\rangle state to whatever value is desired, without affecting the probabilities of the qubit values.

\begin{bmatrix} 1 & 0 \\ 0 & e^{i\theta} \\ \end{bmatrix}

The value e^{i\theta} ends up being a complex number, that can also be written like this:
cos(\theta)+i sin(\theta)

where \theta is any angle in radians.

You’ll find that circuits commonly only adjust the |1\rangle state. This is because the absolute phase of the individual states don’t matter (since they don’t affect probabilities), but what does matter is the difference in phase between the |0\rangle state and the |1\rangle state, since that can cause constructive or destructive interference. For this reason, you’ll often see the |0\rangle state without a phase value, and only the |1\rangle state will get it’s phase modified during calculations.

Hadamard Gate (Interesting Gate!)

Ok so the NOT gate is somewhat useful. You are probably thinking the others might come in handy if you need to adjust the phase of a qubit, but so far, nothing has been that spectacular for single qubit gates.

Well, here’s the more interesting gate.

What this gate does is if you pass it a pure 0 or 1, it will output a value that has a 50% chance of being a 0 or a 1.

The interesting part happens when you put that output through a Hadamard gate again – you get the original value of 0 or 1 out!

It can do this because it stores the info about the original value as phase information. When the input value is |0\rangle, the Hadamard gate outputs [1/\sqrt{2}, 1/\sqrt{2}] which has matching phases for each state. When the input value is |1\rangle, the gate outputs [1/\sqrt{2}, -1/\sqrt{2}] which has opposite phases for each state.

1/\sqrt{2} \begin{bmatrix} 1 & 1 \\ 1 & -1 \\ \end{bmatrix}

This gate is a crucial part in both Grover’s algorithm – which can search an unsorted list in O(\sqrt{N}), as well as in Shor’s algorithm, which can factor large numbers much faster than classical computers, putting certain types of cryptography at risk for being cracked by quantum computers.

It has some other uses too, which you can check out here:
Does a Hadamard Gate have uses outside of pure and evenly mixed states?

Code

Here is some example code to show some single qubit circuits in action.

#include <stdio.h>
#include <array>
#include <complex>

typedef std::array<std::complex<float>, 2> TQubit;
typedef std::array<std::complex<float>, 4> TComplexMatrix;
const float c_pi = 3.14159265359f;

//=================================================================================
static const TQubit c_qubit0 = { 1.0f, 0.0f };  // false aka |0>
static const TQubit c_qubit1 = { 0.0f, 1.0f };  // true aka |1>
static const TQubit c_qubit01_0deg = { 1.0f / std::sqrt(2.0f), 1.0f / std::sqrt(2.0f) }; // 50% true. 0 degree phase
static const TQubit c_qubit01_180deg = { 1.0f / std::sqrt(2.0f), -1.0f / std::sqrt(2.0f) }; // 50% true. 180 degree phase

// A not gate AKA Pauli-X gate
// Flips false and true probabilities (amplitudes)
// Maps |0> to |1>, and |1> to |0>
// Rotates PI radians around the x axis of the Bloch Sphere
static const TComplexMatrix c_notGate =
{
    {
        0.0f, 1.0f,
        1.0f, 0.0f
    }
};
static const TComplexMatrix c_pauliXGate = c_notGate;

// Pauli-Y gate
// Maps |0> to i|1>, and |1> to -i|0>
// Rotates PI radians around the y axis of the Bloch Sphere
static const TComplexMatrix c_pauliYGate =
{
    {
        { 0.0f, 0.0f }, { 0.0f, -1.0f },
        { 0.0f, 1.0f }, { 0.0f, 0.0f }
    }
};

// Pauli-Z gate
// Negates the phase of the |1> state
// Rotates PI radians around the z axis of the Bloch Sphere
static const TComplexMatrix c_pauliZGate =
{
    {
        1.0f, 0.0f,
        0.0f, -1.0f
    }
};

// Hadamard gate
// Takes a pure |0> or |1> state and makes a 50/50 superposition between |0> and |1>.
// Put a 50/50 superposition through and get the pure |0> or |1> back.
// Encodes the origional value in the phase information as either matching or
// mismatching phase.
static const TComplexMatrix c_hadamardGate =
{
    {
        1.0f / std::sqrt(2.0f), 1.0f / std::sqrt(2.0f),
        1.0f / std::sqrt(2.0f), 1.0f / -std::sqrt(2.0f)
    }
};

//=================================================================================
void WaitForEnter ()
{
    printf("nPress Enter to quit");
    fflush(stdin);
    getchar();
}

//=================================================================================
TQubit ApplyGate (const TQubit& qubit, const TComplexMatrix& gate)
{
    // multiply qubit amplitude vector by unitary gate matrix
    return 
    {
        qubit[0] * gate[0] + qubit[1] * gate[1],
        qubit[0] * gate[2] + qubit[1] * gate[3]
    };
}

//=================================================================================
int ProbabilityOfBeingTrue (const TQubit& qubit)
{
    float prob = std::round((qubit[1] * std::conj(qubit[1])).real() * 100.0f);
    return int(prob);
}

//=================================================================================
TComplexMatrix MakePhaseAdjustmentGate (float radians)
{
    // This makes a gate like this:
    //
    // [ 1  0             ]
    // [ 0  e^(i*radians) ]
    //
    // The gate will adjust the phase of the |1> state by the specified amount.
    // A more general version of the pauli-z gate

    return
    {
        {
            1.0f, 0.0f,
            0.0f, std::exp(std::complex<float>(0.0f,1.0f) * radians)
        }
    };
}

//=================================================================================
void Print (const TQubit& qubit)
{
    printf("[(%0.2f, %0.2fi), (%0.2f, %0.2fi)] %i%% true",
        qubit[0].real(), qubit[0].imag(),
        qubit[1].real(), qubit[1].imag(),
        ProbabilityOfBeingTrue(qubit));
}

//=================================================================================
int main (int argc, char **argv)
{
    // Not Gate
    {
        printf("Not gate:n  ");

        // Qubit: false
        TQubit v = c_qubit0;
        Print(v);
        printf("n  ! = ");
        v = ApplyGate(v, c_notGate);
        Print(v);
        printf("nn  ");

        // Qubit: true
        v = c_qubit1;
        Print(v);
        printf("n  ! = ");
        v = ApplyGate(v, c_notGate);
        Print(v);
        printf("nn  ");

        // Qubit: 50% chance, reverse phase
        v = c_qubit01_180deg;
        Print(v);
        printf("n  ! = ");
        v = ApplyGate(v, c_notGate);
        Print(v);
        printf("nn");
    }

    // Pauli-y gate
    {
        printf("Pauli-y gate:n  ");

        // Qubit: false
        TQubit v = c_qubit0;
        Print(v);
        printf("n  Y = ");
        v = ApplyGate(v, c_pauliYGate);
        Print(v);
        printf("nn  ");

        // Qubit: true
        v = c_qubit1;
        Print(v);
        printf("n  Y = ");
        v = ApplyGate(v, c_pauliYGate);
        Print(v);
        printf("nn  ");

        // Qubit: 50% chance, reverse phase
        v = c_qubit01_180deg;
        Print(v);
        printf("n  Y = ");
        v = ApplyGate(v, c_pauliYGate);
        Print(v);
        printf("nn");
    }

    // Pauli-z gate
    {
        printf("Pauli-z gate:n  ");

        // Qubit: false
        TQubit v = c_qubit0;
        Print(v);
        printf("n  Z = ");
        v = ApplyGate(v, c_pauliZGate);
        Print(v);
        printf("nn  ");

        // Qubit: true
        v = c_qubit1;
        Print(v);
        printf("n  Z = ");
        v = ApplyGate(v, c_pauliZGate);
        Print(v);
        printf("nn  ");

        // Qubit: 50% chance, reverse phase
        v = c_qubit01_180deg;
        Print(v);
        printf("n  Z = ");
        v = ApplyGate(v, c_pauliZGate);
        Print(v);
        printf("nn");
    }

    // 45 degree phase adjustment gate
    {
        printf("45 degree phase gate:n  ");
        TComplexMatrix gate = MakePhaseAdjustmentGate(c_pi / 4.0f);

        // Qubit: false
        TQubit v = c_qubit0;
        Print(v);
        printf("n  M = ");
        v = ApplyGate(v, gate);
        Print(v);
        printf("nn  ");

        // Qubit: true
        v = c_qubit1;
        Print(v);
        printf("n  M = ");
        v = ApplyGate(v, gate);
        Print(v);
        printf("nn  ");

        // Qubit: 50% chance, reverse phase
        v = c_qubit01_180deg;
        Print(v);
        printf("n  M = ");
        v = ApplyGate(v, gate);
        Print(v);
        printf("nn");
    }

    // Hadamard gate
    {
        printf("Hadamard gate round trip:n  ");

        // Qubit: false
        TQubit v = c_qubit0;
        Print(v);
        printf("n  H = ");
        v = ApplyGate(v, c_hadamardGate);
        Print(v);
        printf("n  H = ");
        v = ApplyGate(v, c_hadamardGate);
        Print(v);
        printf("nn  ");

        // Qubit: true
        v = c_qubit1;
        Print(v);
        printf("n  H = ");
        v = ApplyGate(v, c_hadamardGate);
        Print(v);
        printf("n  H = ");
        v = ApplyGate(v, c_hadamardGate);
        Print(v);
        printf("nn  ");

        // Qubit: 50% chance, reverse phase
        v = c_qubit01_180deg;
        Print(v);
        printf("n  H = ");
        v = ApplyGate(v, c_hadamardGate);
        Print(v);
        printf("n  H = ");
        v = ApplyGate(v, c_hadamardGate);
        Print(v);
        printf("nn");
    }

    // 1 bit circuit
    // Hadamard -> Pauli-Z -> Hadamard
    {
        printf("Circuit Hadamard->Pauli-Z->Hadamard:n  ");

        // Qubit: false
        TQubit v = c_qubit0;
        Print(v);
        printf("n  H = ");
        v = ApplyGate(v, c_hadamardGate);
        Print(v);
        printf("n  Z = ");
        v = ApplyGate(v, c_pauliZGate);
        Print(v);
        printf("n  H = ");
        v = ApplyGate(v, c_hadamardGate);
        Print(v);
        printf("nn  ");

        // Qubit: true
        v = c_qubit1;
        Print(v);
        printf("n  H = ");
        v = ApplyGate(v, c_hadamardGate);
        Print(v);
        printf("n  Z = ");
        v = ApplyGate(v, c_pauliZGate);
        Print(v);
        printf("n  H = ");
        v = ApplyGate(v, c_hadamardGate);
        Print(v);
        printf("nn  ");

        // Qubit: 50% chance, reverse phase
        v = c_qubit01_180deg;
        Print(v);
        printf("n  H = ");
        v = ApplyGate(v, c_hadamardGate);
        Print(v);
        printf("n  Z = ");
        v = ApplyGate(v, c_pauliZGate);
        Print(v);
        printf("n  H = ");
        v = ApplyGate(v, c_hadamardGate);
        Print(v);
        printf("n");
    }

    WaitForEnter();

    return 0;
}

Here’s the output of the program, showing the qubit circuits in action:

Next Up

In the next part, we’ll look at how multiple qubits work together in quantum circuits so we can do more interesting things. We might need a quick post explaining the Bloch sphere before that though. It’s a fairly important thing, but this post is already enough to digest, so I didn’t include it.

Quantum Computing References

I’m in the middle of some research to better understand quantum computing so that I can write a short series of blog posts entitled “Quantum Computing for Programmers”.

These posts will be light on – and possibly completely absent of – hardcore math and physics, and instead speak more to programmers. It will aim to show the rules of quantum computing, explain which operations are fast and which aren’t, show the basic building blocks of quantum logic, combine those simple gates into more complicated quantum circuits, and most importantly it will have simple sample C++ code which shows this stuff in action, so you can do your own (simulated) quantum computing experimentation and exploration on your own computer.

This page is a list of the references I’ve found so far, and I’ll keep updating it as I find new, useful references.

Gentle Introduction

Ars Technica – A tale of two qubits: how quantum computers work

Technical Details, Very Well Explained

Twisted Oak Studios – What Quantum Computers Do Faster, with Caveats

Twisted Oak Studios – Grover’s Quantum Search Algorithm

A Bit More Advanced, Still Very Readable

Scott Aaronson – Lecture 9: Quantum

Scott Aaronson – Lecture 10: Quantum Computing

Good Info, More Mathy

Math ∩ Programming – A Motivation for Quantum Computing

Math ∩ Programming – The Quantum Bit

Math ∩ Programming – Multiple Qubits and the Quantum Circuit

Useful Wikipedia Pages

Wikipedia – Commonly Used Quantum Gates

Wikipedia – Quantum Circuit

Wikipedia – Deutsch–Jozsa algorithm

Quantum Algorithm Stuff

A Quantum Network Flow Puzzle

Building your own Quantum Fourier Transform

An interactive page that shows you how quantum pseudo telepathy is not communication

Twisted Oak Studios – Implementing Quantum Pseudo-Telepathy

Other

An article on some people who are working on emulating (not simulating) quantum computers using sound waves

A Photonic C-NOT Gate Breakthrough for Quantum Computing

Wolfram Alpha Demonstrations: Generating Entangled Qubits

Wikipedia: Simon’s problem

Quantum Circuit Simulator in a Web Page

Solving Nested Modulus Equations

x \% 2 = 1

The above equation is pretty easy to solve, it just means that x is any value that when you divide by 2, gives a remainder of 1. x is all odd numbers. The more formal answer is:

x \equiv 1 + 2\mathbb{Z}

That reads as “x is equivalent to 1 plus 2 times any integer”. x has infinitely many solutions, so long as they fit that constraint.

That’s easy enough to figure out, but how would you solve this equation?

(x \% 5) \%2 = 1

or this one?

((x \% 7) \% 5) \%2 = 1

One Level (Simple Case)

There is something called the quotient remainder theorem that lets us change an equation like this:

A \% B = C

Into one like this:

A \equiv C + B \mathbb{Z}

The Z means “all integers” which implies that there are infinitely many solutions to A. This reads as “A is equivelant to C plus B times any integer”.

Let’s work out a couple examples.

x \% 5 = 3

That transforms to:

x \equiv 3 + 5 \mathbb{Z}

If you try plugging any positive or negative integer in place of Z, you’ll see that you get a number that satisfies the first equation.

Here’s another one:

x \% 538 = 211

That transforms to:

x \equiv 211 + 538 \mathbb{Z}

Once again you can plug any positive or negative integer in for Z and get a value for x that satisfies the original equation.

Pretty easy and pretty handy right? Let’s get a little more complex.

Two Levels

Let’s say you want to solve this equation from the intro.

(x \% 5) \%2 = 1

The first thing you might do is transform it to the below.

x \% 5 \equiv 1 + 2 N_1 where N_1 \in \mathbb{Z}.

What now? Can we transform it again? If we do, we get this:

x \equiv 1 + 2 N_1+ 5 N_2 where N_i \in \mathbb{Z}.

plugging 1 in for the N1 and N2, we get 8, which does satisfy the original equation.

If we plug 3 in for N1 and 1 in for N2 though, we get 12 which DOESN’T satisfy the original equation. What gives??

Well it turns out there are subtle restrictions on our equations that got lost in the shuffle. Let’s start over…

(x \% 5) \%2 = 1

In effect, what that equation says is “Something divided by two gives a remainder of 1” where the something happens to be x % 5, but we don’t really care about what the something actually is right now.

The equation also implies that the right side of the equation is a value modulo two, and it’s only valid values are {0,1}. Another way to write this is to say that the right side \in \mathbb Z_2 which reads as “in all integers mod 2”.

1 is obviously in the set of valid values {0,1}, so we don’t need any special notation just yet, but let’s keep it in mind as we move onto the next step.

x \% 5 \equiv 1 + 2 \mathbb Z

The above says that x divided by 5 gives a remainder that is 1 plus two times all integers. However there is a catch. There right side of the equation is a mod 5 value, which means it’s only valid values are {0,1,2,3,4}. In other words, the right side of the equation is \in \mathbb Z_5.

This has an effect of implicitly limiting the valid values that we can plug in to Z. It limits it to values of Z where 1+2*Z is in {0,1,2,3,4}. Let’s write this out a little more formally.

x \% 5 \equiv a where a = 1 + 2 \mathbb Z and a \in \mathbb Z_5

Let’s transform the equation again to solve for x:

x \equiv a + 5\mathbb Z where a = 1 + 2 \mathbb Z and a \in \mathbb Z_5

Note that there is no modulus on the left side of the equation now, so the right side of the equation is unlimited into the valid values it can have.

How do we use this resulting formula though to find valid values of x?

First we figure out the valid values of a. a = 1 + 2 \mathbb Z where a \in \mathbb Z_5, so it’s only valid values have to be a subset of {0,1,2,3,4}.

Plugging a -1 in for Z, we get a value for a of -1, which isn’t a valid answer so we throw it away.

Plugging a 0 in for Z, we get a value for a of 1. That is a valid answer so we keep it!

Plugging a 1 in gives us 3 which is valid, so we keep that too.

Plugging a 2 in gives us a value of 5, which isn’t valid, so we throw it away and know that we are done.

Our values for a are {1,3}.

The solution we got was this:
x \equiv a + 5\mathbb Z

Plugging in each value of a means that we get two equations as a solution for x. Valid values of x are the union of these two lists – both equations give valid answers.

x \equiv 1 + 5\mathbb Z
x \equiv 3 + 5\mathbb Z

The first equation gives us values of {…, -4, 1, 6, 11, …}
The second equation gives us values of {…, -2, 3, 8, 13, …}

Taking the union of those, our solutions for x are:
{…, -4, -2, 1, 3, 6, 8, 11, 13, …}

If you plug those into the original equation (x \% 5) \%2 = 1, you’ll see that they are valid solutions! Those equations also represent ALL solutions, so they aren’t only valid, they are also complete.

Let’s step it up just one more notch.

Three Levels (Boss Mode)

Let’s solve the hardest equation from the intro:

((x \% 7) \% 5) \%2 = 1

This starts off just like the two leveled equation. Something on the left mod 2 is equation to 1. The 1 on the right side is \in \mathbb Z_2, but since that’s obviously true for this constant, we don’t need to do anything special. So, we transform it to the below:

(x \% 7) \% 5 \equiv a
a = 1 + 2 \mathbb Z
a \in \mathbb Z_5

Then we do another transformation:
x \% 7 \equiv b
b = a + 5 \mathbb Z
a = 1 + 2 \mathbb Z
a \in \mathbb Z_5
b \in \mathbb Z_7

Then for the last transformation, we get this:
x \equiv b + 7 \mathbb Z
b = a + 5 \mathbb Z
a = 1 + 2 \mathbb Z
a \in \mathbb Z_5
b \in \mathbb Z_7

Now that we have our equations worked out, we need to start finding out what the values actually are. We start with a because it’s what b and x are based on, and is made up of constants. We get {1,3} again as the valid values of a. That gives us:
x \equiv b + 7 \mathbb Z
b = a + 5 \mathbb Z
a = [1,3]
b \in \mathbb Z_7

Now we want to plug each value of a into b and find all the valid values for b. When we plug 1 into b for a, it becomes b = 1 + 5 \mathbb Z, b \in \mathbb Z_7. The valid values for that are {1,6}.

When we plug 3 into b for a, it becomes b = 3 + 5 \mathbb Z, b \in \mathbb Z_7. The only valid value there is {3}.

That means that our valid values for b are {1,3,6}. It’s the union of the valid values we found for each value of a.

That makes our equations into this:
x \equiv b + 7 \mathbb Z
b = [1,3,6]

If we plug those values of b into the equation for x, we get these three equations:

x \equiv 1 + 7 \mathbb Z
x \equiv 3 + 7 \mathbb Z
x \equiv 6 + 7 \mathbb Z

The first equation gives us some x values of {…, -6, 1, 8, 15, …}. The second equation gives us x values of {…, -4, 3, 10, 17, …}. The third equation gives us x values of {…, -1, 6, 13, 20, …}.

Taking the union of all of those, we get…
{…, -6, -4, -1, 1, 3, 6, 8, 10, 13, 15, 17, 20, …}

If you plug those numbers into the original equation ((x \% 7) \% 5) \%2 = 1, you can see that they are valid solutions!

When you are performing this algorithm, if you ever hit a case where there are no valid answers in one of the steps – like say, there were no valid answers for equation b in the above – that means that there is no solution to your equation.

Sample Code

Since solving these equations can be pretty manual and tedious, here is some code that can solve these equations for you. If you were confused at all by the explanation above, the code may also be able to show you how it works better, especially if you step through it and see what it’s doing.

#include <stdio.h>
#include <array>
#include <vector>

//=================================================================================
void WaitForEnter()
{
    printf("nPress Enter to quit");
    fflush(stdin);
    getchar();
}

//=================================================================================
void AddSolutions(std::vector<int>& solutions, int add, int multiply, int mod)
{
    int Z = 0;
    while (1)
    {
        int value = multiply * Z + add;
        if (value < mod)
            solutions.push_back(value);
        else
            return;
        ++Z;
    }
}

//=================================================================================
void AddSolutionsFromSolutions (std::vector<int>& solutions, const std::vector<int>& adds, int multiply, int mod)
{
    std::for_each(
        adds.begin(),
        adds.end(),
        [&solutions, multiply, mod] (int add)
        {
            AddSolutions(solutions, add, multiply, mod);
        }
    );
}

//=================================================================================
int main(int argc, char **argv)
{
    // a,b,c,d...
    // ((x % a) % b) % c = d
    // etc.
    //const int c_values[] = {7, 5, 2, 1};
    const int c_values[] = { 12, 9, 7, 5, 3, 2 };
    const size_t c_numValues = sizeof(c_values) / sizeof(c_values[0]);
    
    // print the equation
    printf("Solving for x:n");
    for (size_t i = 0; i < c_numValues - 1; ++i)
        printf("(");
    printf("x");
    for (size_t i = 0; i < c_numValues - 1; ++i)
        printf(" %% %i)", c_values[i]);
    printf(" = %inn", c_values[c_numValues-1]);

    // print the solution equations
    for (size_t i = 0; i < c_numValues - 2; ++i)
    {
        char eqn = i > 0 ? 'A' + i - 1 : 'x';
        char eq = i > 0 ? '=' : 0xF0;

        printf("%c %c %c + %i*Z", eqn, eq, 'B' + i - 1, c_values[i]);
        if (i > 0)
            printf(" (in Z_%i)n", c_values[i-1]);
        else
            printf(" (in Z)n");
    }
    printf("%c = %i + %i*Z (in Z_%i)nn", 'A' + c_numValues - 2 - 1, c_values[c_numValues - 1], c_values[c_numValues - 2], c_values[c_numValues - 3]);

    // gather up the permutation of solutions for each equation, starting with the lowest equation which has only constants
    std::array<std::vector<int>, c_numValues - 2> solutions;
    AddSolutions(solutions[c_numValues - 3], c_values[c_numValues - 1], c_values[c_numValues - 2], c_values[c_numValues - 3]);
    for (size_t i = c_numValues - 3; i > 0; --i)
        AddSolutionsFromSolutions(solutions[i-1], solutions[i], c_values[i], c_values[i-1]);

    // Detect empty set
    if (solutions[0].size() == 0)
    {
        printf("No solutions!n");
        WaitForEnter();
        return 0;
    }

    // Print the more specific solution equations
    printf("x = ");
    for (size_t i = 0, c = solutions[0].size(); i < c; ++i)
    {
        if (i < c - 1)
            printf("%c U ", 'a' + i);
        else
            printf("%cn", 'a' + i);
    }
    std::sort(solutions[0].begin(), solutions[0].end());
    for (size_t i = 0, c = solutions[0].size(); i < c; ++i)
        printf("%c = %i + %iZn", 'a' + i, solutions[0][i], c_values[0]);
  
    // Print specific examples of solutions (first few numbers in each)
    std::vector<int> xValues;
    printf("n");
    for (size_t i = 0, c = solutions[0].size(); i < c; ++i)
    {
        printf("%c = {..., ", 'a' + i);
        for (int z = 0; z < 3; ++z)
        {
            printf("%i, ", solutions[0][i] + z * c_values[0]);
            xValues.push_back(solutions[0][i] + z * c_values[0]);
        }
        printf("...}n");
    }

    // Show the list of specific values of X
    std::sort(xValues.begin(), xValues.end());
    printf("nx = {..., ");
    std::for_each(xValues.begin(), xValues.end(), [](int v) {printf("%i, ", v); });
    printf("...}n");

    // Test the solutions to verify that they are valid!
    bool valuesOK = true;
    for (size_t i = 0, c = xValues.size(); i < c; ++i)
    {
        int value = xValues[i];
        for (size_t j = 0; j < c_numValues - 1; ++j)
            value = value % c_values[j];

        if (value != c_values[c_numValues - 1])
        {
            printf("Solution %i is invalid!!n", xValues[i]);
            valuesOK = false;
        }
    }
    if (valuesOK)
        printf("nAll solutions shown tested valid!n");

    WaitForEnter();
    return 0;
}

Here’s an example run, where it solves a 5 level equation.

Links

If you know of a better way to solve this type of equation let me know. This is what I found when trying to figure this stuff out, but it doesn’t mean it’s the only way to do it.

The one thing I don’t like about this technique is that when you find solutions, it’s very manual, and very specific to the values used. Imagine if one of the modulus divisors was a variable instead of them all being constants. How would you solve it then? I’m not really sure…

Have some links!

Math Stack Exchange: How to solve nested congruences?

Khan Academy: The Quotient Remainder Theorem

Solving Simultaneous Congruences (Chinese Remainder Theorem)

x \equiv 2 \pmod{3}

The equation above is a congruence. What it says is that x % 3 is 2.

The equals sign with three bars means “is equivalent to”, so more literally what the equation says is “x is equivalent to 2, when we are looking at only the integers mod 3”.

5 is a solution, so is 8, so is 11 and so is -1!

The general solution for x is x = 2 +3*N where N is any integer, aka N \in \mathbb{Z}.

What if you were trying to find an integer value x that satisfied the multiple congruences below? How would you solve it?

x \equiv 2 \pmod{3}
x \equiv 2 \pmod{4}
x \equiv 1 \pmod{5}

You could use brute force, but you’d have to check every value from 0 to 60 (not including 60) since 60 is the least common multiple (In this case, you calculate LCM 3*4*5 = 60, since the numbers have a greatest common divisor of 1, aka they are co-prime).

As the number of equations grew, or the mod values were larger, brute force could grow to be a large problem very quickly.

Well luckily there is a better way called the Chinese Remainder Theorem.

BTW – the answer is 26. Or 26 mod 60 more correctly, or 26 + 60*N where N is any integer.

The Chinese Remainder Theorem

The CRT was first published sometime in the 3rd-5th centuries by Sun Tzu – but not the Sun Tzu that wrote “The Art of War”, that was a different person. It was later proven to be true by Gauss in 1801.

When trying to learn the CRT, I found this amazing video that explains it extremely well. You should give it a look: The Chinese Remainder Theorem made easy.

In that video, it talks about modular inversion. You can read about the details of that in the last post I made: Modular Multiplicative Inverse.

An Example

Let’s work through an example that you can follow along with just to make sure you understand it, since I pointed you elsewhere for explanations 😛

Here is our set of congruences:
x \equiv 5 \pmod{7}
x \equiv 8 \pmod{11}
x \equiv 2 \pmod{3}

First up, we want to break it into 3 sections, combined with addition.
x = \_\_\_\_\_ + \_\_\_\_\_ + \_\_\_\_\_

In the above, the first empty space will be the solution to the first equation, the second empty space will be the solution to the second equation, and the third empty space will be the solution to the third equation.

Each section is going to get a co-efficient which is the mods for every equation multiplied together, except the one we are trying to solve. We still have an unknown though per term that we are going to solve in a moment.

x = 11*3*N_1 + 7*3*N_2 + 7*11*N_3
or
x = 33*N_1 + 21*N_2 + 77*N_3

The reason we make those coefficients is because it isolates our terms so that we can solve each equation individually then combine them to get the combined solution.

This works because you can see that in the first term, no matter what we end up choosing for N_1, it will always be a perfect multiple of 3 and 11, which means that the value will be zero in the other terms / other equations, so won’t affect whatever value we come up for them.

Next up, we need to solve for the N’s.

The first term needs to have an N_1 such that 33*N_1 \pmod 7 = 5.

How do we solve that? We could use brute force, testing every number 0 through 6 (otherwise known as 7-1), but in the last post we showed a better way to do it using the extended euclidean algorithm. So, that link once again is: Modular Multiplicative Inverse.

The answer when solving using inversion comes out to be 15. Note that our answer is in “mod 7” space, so you could use any value 1+7*I where I \in \mathbb{Z}, but we can stick to using 15 to make it easier to follow.

That gives us this:
x = 33*15 + 21*N_2 + 77*N_3
or
x = 495 + 21*N_2 + 77*N_3

The value of N2 and N3 end up being -8 and -2 respectively. That gives us:
x = 33*15 + 21*-8 + 77*-2
or
x = 495 - 168 - 154
or
x = 173

Since this is just one of many solutions, the real answer is:
x = 173 \pmod{231}
or
x = 173 + 231*N where N \in \mathbb{Z}

Let’s check our result and see if we got it right…

173 % 7 is indeed 5.
173 % 11 is 8.
173 % 3 is 2.

Woot, it worked!

Example Code

#include <stdio.h>
#include <algorithm>
#include <array>

// x = a mod m
struct SEquation
{
    int a;
    int m;
};
 
//=================================================================================
unsigned int ExtendedEuclidianAlgorithm (int smaller, int larger, int &s, int &t)
{
    // make sure A <= B before starting
    bool swapped = false;
    if (larger < smaller)
    {
        swapped = true;
        std::swap(smaller, larger);
    }
 
    // set up our storage for the loop.  We only need the last two values so will
    // just use a 2 entry circular buffer for each data item
    std::array<int, 2> remainders = { larger, smaller };
    std::array<int, 2> ss = { 1, 0 };
    std::array<int, 2> ts = { 0, 1 };
    int indexNeg2 = 0;
    int indexNeg1 = 1;
 
    // loop
    while (1)
    {
        // calculate our new quotient and remainder
        int newQuotient = remainders[indexNeg2] / remainders[indexNeg1];
        int newRemainder = remainders[indexNeg2] - newQuotient * remainders[indexNeg1];
 
        // if our remainder is zero we are done.
        if (newRemainder == 0)
        {
            // return our s and t values as well as the quotient as the GCD
            s = ss[indexNeg1];
            t = ts[indexNeg1];
            if (swapped)
                std::swap(s, t);
            return remainders[indexNeg1];
        }
 
        // calculate this round's s and t
        int newS = ss[indexNeg2] - newQuotient * ss[indexNeg1];
        int newT = ts[indexNeg2] - newQuotient * ts[indexNeg1];
 
        // store our values for the next iteration
        remainders[indexNeg2] = newRemainder;
        ss[indexNeg2] = newS;
        ts[indexNeg2] = newT;
 
        // move to the next iteration
        std::swap(indexNeg1, indexNeg2);
    }
}
 
//=================================================================================
void WaitForEnter ()
{
    printf("nPress Enter to quit");
    fflush(stdin);
    getchar();
}
 
//=================================================================================
int main(int argc, char **argv)
{
    const SEquation equations[] =
    {
        { 2, 3 },
        { 2, 4 },
        { 1, 5 }
    };

    const int c_numEquations = sizeof(equations) / sizeof(equations[0]);

    // print out the equations
    printf("Solving for x:n");
    for (int i = 0; i < c_numEquations; ++i)
        printf("eq %i:  x = %i mod %in", i, equations[i].a, equations[i].m);
    printf("n");

    // make sure the m's are pairwise co-prime
    for (int i = 0; i < c_numEquations; ++i)
    {
        for (int j = i + 1; j < c_numEquations; ++j)
        {
            int s, t;
            int gcd = ExtendedEuclidianAlgorithm(equations[i].m, equations[j].m, s, t);
            if (gcd != 1)
            {
                printf("%i and %i are not co-prime (index %i and %i)n", equations[i].m, equations[j].m, i, j);
                WaitForEnter();
                return 0;
            }
        }
    }

    // calculate the coefficients for each term
    std::array < int, c_numEquations > coefficients;
    coefficients.fill(1);
    for (int i = 0; i < c_numEquations; ++i)
    {
        for (int j = 0; j < c_numEquations; ++j)
        {
            if (i != j)
                coefficients[i] *= equations[j].m;
        }
    }

    // now figure out how much to multiply each coefficient by to make it have the specified modulus residue (remainder)
    int result = 0;
    for (int i = 0; i < c_numEquations; ++i)
    {
        int s, t;
        ExtendedEuclidianAlgorithm(coefficients[i], equations[i].m, s, t);
        coefficients[i] *= t * equations[i].a;
    }

    // calculate result and simplify it to the smallest positive integer mod lcm
    // lcm is the product when they are pairwise coprime, as the gcd of any two is 1.
    int lcm = 1;
    for (int i = 0; i < c_numEquations; ++i)
    {
        lcm *= equations[i].m;
        result += coefficients[i];
    }
    result = result % lcm;
    if (result < 0)
        result += lcm;

    // print out the answer
    printf("x = %i mod %inor...n", result, lcm);
    printf("x = %i + %i*NnWhere N is any positive or negative integer.nn", result, lcm);

    // verify that our result is correct
    printf("Verifying Equations:n");
    for (int i = 0; i < c_numEquations; ++i)
        printf("eq %i:  %i mod %i = %i (%s)n", i, result, equations[i].m, result % equations[i].m, (result % equations[i].m == equations[i].a) ? "PASS" : "!!FAIL!!");

    WaitForEnter();
    return 0;
}

Here’s the output of the program:

Links

One final note. If the program can’t find an answer, it doesn’t necessarily mean that there is no answer. For instance, you could imagine multiplying everything by 2, which would make the modulus values not be co-prime (they would have 2 as a common denominator). That would make this algorithm fail, even though there was a valid answer.

You can try the method of successive substitution as an alternate method when that happens.

Wikipedia: The Chinese Remainder Theorem

Wikipedia: Method Of Successive Substitution

Also, it looks like Khan Academy has a good bit on modular math!

Khan Academy: What is Modular Arithmetic?

Modular Multiplicative Inverse

This post is a pre-requisite for the next thing I want to talk about so may not make a whole lot of sense or be all that interesting until shown in that context.

Say you have a function like this:

(a*x) \mod m = n

If you know the values of a, m and n, how do you solve for x? Note in this post we are only dealing with integers, so we are looking for the integer solution for x.

It might be hard to visualize with so many symbols, so here it is with some constants:

(5*x) \mod 7 = 3

How would you solve that for x? In other words, what do you need to multiply 5 by, so that when you divide the result by 7, that you get 3 as the remainder?

One way to solve for x would be brute force. We could try plugging every value from 0 to 6 into x (every value from 0 to n-1), and see if any gives us the result we are looking for.

Brute force can be a challenge if the numbers are really large, like in some cryptographic situations.

Interestingly, there might not even be a valid answer for x that satisfies the equation! The below has no answer for instance:

(2*x) \mod 8 = 5

Better Than Brute Force

There’s something called the “Modular Multiplicative Inverse” which looks eerily familiar:

(a*x) \mod m = 1

Where a and m are known, and the inverse itself is the value of x.

Using the same constants we did above, that gives us this:

(5*x) \mod 7 = 1

In this case, the inverse (x) is 3. You can verify that by seeing that (5*3) % 7 is 1.

Once you have the inverse, if you wanted to solve the original equation where the modulus end up being 3, you just multiply the inverse by the desired modulus amount. Since the inverse is 3 and the desired modulus value is 3, you multiply them together and get 9. Plugging the numbers in, we can see that (5*9) % 7 = 3.

Pretty cool, but how to calculate the inverse? You can calculate it by using something called the “Extended Euclidean Algorithm”.

The regular Euclidean algorithm is in a post here: Programmatically Calculating GCD and LCM.

The extended euclidean algorithm is explained really well on wikipedia: Wikipedia: Extended Euclidean Algorithm.

Sample Code

Here’s some sample code that asks the user for input and solves these style of equations for x. Below the code I’ll show some example runs and talk about a few more things.

#include <stdio.h>
#include <algorithm>
#include <array>

//=================================================================================
unsigned int ExtendedEuclidianAlgorithm (int smaller, int larger, int &s, int &t)
{
    // make sure A <= B before starting
    bool swapped = false;
    if (larger < smaller)
    {
        swapped = true;
        std::swap(smaller, larger);
    }

    // set up our storage for the loop.  We only need the last two values so will
    // just use a 2 entry circular buffer for each data item
    std::array<int, 2> remainders = { larger, smaller };
    std::array<int, 2> ss = { 1, 0 };
    std::array<int, 2> ts = { 0, 1 };
    int indexNeg2 = 0;
    int indexNeg1 = 1;

    // loop
    while (1)
    {
        // calculate our new quotient and remainder
        int newQuotient = remainders[indexNeg2] / remainders[indexNeg1];
        int newRemainder = remainders[indexNeg2] - newQuotient * remainders[indexNeg1];

        // if our remainder is zero we are done.
        if (newRemainder == 0)
        {
            // return our s and t values as well as the quotient as the GCD
            s = ss[indexNeg1];
            t = ts[indexNeg1];
            if (swapped)
                std::swap(s, t);
            return remainders[indexNeg1];
        }

        // calculate this round's s and t
        int newS = ss[indexNeg2] - newQuotient * ss[indexNeg1];
        int newT = ts[indexNeg2] - newQuotient * ts[indexNeg1];

        // store our values for the next iteration
        remainders[indexNeg2] = newRemainder;
        ss[indexNeg2] = newS;
        ts[indexNeg2] = newT;

        // move to the next iteration
        std::swap(indexNeg1, indexNeg2);
    }
}

//=================================================================================
void WaitForEnter ()
{
    printf("nPress Enter to quit");
    fflush(stdin);
    getchar();
}

//=================================================================================
int main(int argc, char **argv)
{
    // get user input
    int a, m, n;
    printf("Given a, m and n, solves for X.n(a * X) %% m = nnn");
    printf("a = ");
    scanf("%i", &a);
    printf("m = ");
    scanf("%i", &m);
    printf("n = ");
    scanf("%i", &n);

    // show details of what they entered
    printf("n(%i * X) mod %i = %in", a, m, n);

    // Attempt brute force
    printf("nBrute Force Testing X from 0 to %i:n", (m-1));
    for (int i = 0; i < m; ++i) {
        if ((a*i) % m == n)
        {
            printf("  X = %in", i);
            printf("  %i mod %i = %in", a*i, m, (a*i) % m);
            break;
        }
        else if (i == (m - 1))
        {
            printf("  No solution!n");
        }
    }

    // Attempt inverse via Extended Euclidean Algorithm
    printf("nExtended Euclidean Algorithm:n");
    int s, t;
    int GCD = ExtendedEuclidianAlgorithm(a, m, s, t);

    // report failure if we couldn't do inverse
    if (GCD != 1)
    {
        printf("  Values are not co-prime, cannot invert! GCD = %in", GCD);
    }
    // Else report details of inverse and show that it worked
    else
    {
        printf("  Inverse = %in", t);
        printf("  X = Inverse * n = %in", t*n);
        printf("  %i mod %i = %in", a*t*n, m, (a*t*n) % m);
    }

    WaitForEnter();
    return 0;
}

Example Runs

Here is a normal run that solves (7*x) \mod 9 = 2, to come up with a value of 8 for x.

Here is a run that solves (5*x) \mod 7 = 3. Brute force gives us a value of 2 for x, while the inverse gives us a value of 9. Both are valid, and in fact are equivalent since 9 % 7 = 2. This shows that getting the inverse and then multiplying it to get the desired answer doesn’t always give you the smallest possible value of x.

Here is a large number run that solves (7*x) \mod 1000001 = 538. Brute force gives a value of 571,506 for x, while using the inversion method gives us a value of 230,571,736.

Lastly, here is a run that solves (8*x) \mod 6 = 4. Brute force gives us a value of 2 for x, but interestingly, it isn’t invertible, so the inversion based solution can’t even find us an answer!

This happens when a and m are not co-prime. In other words, if they have a GCD that isn’t 1, they aren’t coprime, and the modulus can’t be inverted.

Links

You can read more about the modular multiplicative inverse here: Wikipedia: Modular Multiplicative Inverse.