Who Cares About Dynamic Array Growth Strategies?

Let’s say that you’ve allocated an array of 20 integers and have used them all. Now it’s time to allocate more, but you aren’t quite sure how many integers you will need in the end. What do you do?

Realloc is probably what you think of first for solving this problem, but let’s ignore that option for the moment. (If you haven’t used realloc before, give this a read! Alloca and Realloc – Useful Tools, Not Ancient Relics)

Without realloc you are left with allocating a new buffer of memory, copying the old buffer to the new buffer, and then freeing the old buffer.
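In code, that manual grow-and-copy looks something like this minimal sketch (the function name and error handling are made up for illustration):

#include <stdlib.h>
#include <string.h>

// A minimal sketch of growing a buffer without realloc: allocate a new
// buffer, copy the old contents over, then free the old buffer.
// Assumes newCount >= oldCount.
int* GrowBuffer (int* oldBuffer, size_t oldCount, size_t newCount)
{
    int* newBuffer = (int*)malloc(newCount * sizeof(int));
    if (!newBuffer)
        return nullptr; // allocation failed; the old buffer is untouched
    memcpy(newBuffer, oldBuffer, oldCount * sizeof(int));
    free(oldBuffer);
    return newBuffer;
}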

The question remains though, how much memory should you allocate for this new, larger buffer?

You could double your current buffer size whenever you ran out of space. This would mean that as the buffer grew over time, you would do fewer allocations but would have more wasted (allocated but unused) memory.

You could also go the other way and just add 10 more ints every time you ran out of space. This would mean that you would do a larger number of allocations (more CPU usage, possibly more fragmentation), but you’d end up with less wasted space.

Either way, the best choice seems to depend entirely on usage patterns; it’s all subjective and situational.

…Or is it?

A Surprising Reason For Caring

Believe it or not, growth strategies can make a huge difference. Below we will explore the difference between the seemingly arbitrary rules of making a buffer twice as big, or 1.5 times as big.

Let’s say that we have a bunch of free memory starting at address 0. Let’s analyze what happens as we resize arrays in each scenario.

2x Buffer Size

First let’s see what happens when we double a buffer’s size when it gets full.

We start by allocating 16 bytes. The allocator gives us address 0 for our pointer.

When the buffer gets full, we allocate 32 bytes (at address 16), copy the 16 bytes into it and then free our first 16 byte buffer.

When that buffer gets full, we allocate 64 bytes (at address 48), copy the 32 bytes into it and then free our 32 byte buffer.

Lastly, that buffer gets full, so we allocate 128 bytes (at address 112), copy the 64 bytes into it and then free our 64 byte buffer.

As you can see, doubling the buffer size causes our pointer to keep moving further down in address space, and a free piece of memory before it will never be large enough to hold a future allocation.
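Why will the free memory before it never be large enough? Just before the nth resize, the total space freed below the buffer is the sum of all the previous buffer sizes, and with a growth factor of 2 that sum is always one allocation short of what is being requested:

16 + 32 + \dots + 16 \cdot 2^{n-1} = 16(2^n - 1) < 16 \cdot 2^n

So even if the allocator coalesced all of that freed memory into a single block, the next allocation still wouldn’t fit, no matter how many resizes happen.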

1.5x Buffer Size

Let’s see what happens when we make a buffer 1.5x as large when it gets full.

We start by allocating 16 bytes. The allocator gives us address 0 for our pointer.

When the buffer gets full, we allocate 24 bytes (at address 16), copy the 16 bytes into it and then free our first 16 byte buffer.

When that buffer gets full, we allocate 36 bytes (at address 40), copy the 24 bytes into it and free the 24 byte buffer.

When that buffer gets full, we allocate 54 bytes (at address 76), copy the 36 bytes into it and free the 36 byte buffer.

When that buffer gets full, we allocate 81 bytes (at address 130), copy the 54 bytes into it and free the 54 byte buffer.

Lastly, when that buffer gets full, we need to allocate 122 bytes (we rounded it up). In this case, there are 130 bytes of unused memory starting at address 0, so we can just allocate 122 of those bytes, copy our 81 bytes into it and free the 81 byte buffer.

Our allocations have folded back into themselves. Our pattern of resizing hasn’t created an ever moving / ever growing memory fragmentation monster, unlike the buffer size doubling, which has!
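If you want to play with other growth factors, here is a small simulation (written for this post, not from any library) of a first-fit allocator where the growing buffer is the only live allocation. It places each new buffer in the hole below the old buffer when it fits, otherwise at the high water mark:

#include <stdio.h>
#include <math.h>

// Simulate repeatedly growing a single buffer with a first-fit allocator.
void SimulateGrowth (float growthFactor, int numResizes)
{
    float bufferStart = 0.0f, bufferSize = 16.0f, highWaterMark = 16.0f;
    for (int i = 0; i < numResizes; ++i)
    {
        const float newSize = ceilf(bufferSize * growthFactor);
        // first fit: reuse the hole below the old buffer if the new buffer fits
        const float newStart = (newSize <= bufferStart) ? 0.0f : highWaterMark;
        printf("allocate %.0f bytes at address %.0f%s\n", newSize, newStart,
            (newStart < bufferStart) ? "  <- folded back!" : "");
        bufferStart = newStart;
        bufferSize = newSize;
        if (bufferStart + bufferSize > highWaterMark)
            highWaterMark = bufferStart + bufferSize;
    }
}

Calling SimulateGrowth(1.5f, 5) reproduces the addresses from the walkthrough above (the fifth resize folds back to address 0), while SimulateGrowth(2.0f, 5) never folds back.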

Small Print

The above does decrease memory fragmentation by encouraging an allocation to stay in one spot in memory, but it comes at a cost: since each resize allocates less extra memory, you end up doing more allocations to reach the same level of memory usage.

That might be a benefit though, depending on your specific needs. Another way of looking at it is that you will end up with fewer bytes of wasted memory. By wasted memory I mean allocated bytes which are not actually used to store anything.

Realloc Makes This Moot Right?

You may be thinking “well if I use realloc, I don’t need to care about this right?”

That isn’t exactly true. If realloc is unable to give you more memory at the current pointer location, it will allocate a new buffer, copy the old data to the new buffer, free the old buffer, and return you the pointer to the new buffer. This is exactly the case that happens when you don’t use realloc.

Using the above growth strategy with realloc makes realloc work even better. It’s a good thing!
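Here is a minimal sketch of a 1.5x growth step built on realloc (the function name is made up for illustration):

#include <stdlib.h>

// Grow an int buffer to 1.5x its capacity using realloc.  When realloc can
// extend the allocation in place, no copy happens at all; when it can't, it
// does the allocate/copy/free dance for us.
int* GrowBuffer15 (int* buffer, size_t* capacity)
{
    const size_t newCapacity = *capacity + *capacity / 2;
    int* newBuffer = (int*)realloc(buffer, newCapacity * sizeof(int));
    if (!newBuffer)
        return nullptr; // on failure the old buffer is still valid
    *capacity = newCapacity;
    return newBuffer;
}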

Caveat: exotic allocator behavior may not actually benefit from using this strategy with realloc, so have a look for yourself if you are in doubt!

Links

Here’s a discussion on the topic:
What is the ideal growth rate for a dynamically allocated array?

From the link above, apparently the ideal factor to use when upsizing a buffer in general (when worrying about fragmentation like this) is the golden ratio 1.618. Weird, huh?

Have other information about why various growth factors matter? Leave a comment or drop me a line (:

Thanks to Tom for mentioning this concept. Pretty interesting and surprising IMO.

Shamir’s Quest: Collect Any 3 Keys To Unlock The Secret!

This post is on something called Shamir’s Secret Sharing. It’s a technique where you can break a secret number up into M different pieces, where if you have any N of those M pieces, you are able to figure out the secret.

Thinking of it in video game terms, imagine there are 10 keys hidden in a level, but you can escape the level whenever you find any 7 of them. This is what Shamir’s Secret Sharing enables you to set up cryptographically.

Interestingly in this case, the term sharing in “secret sharing” doesn’t mean sharing the secret with others. It means breaking the secret up into pieces, or SHARES. Secret sharing means that you make shares out of a secret, such that if you have enough of the shares, you can recover the secret.

How Do You Share (Split) The Secret?

The basic idea of how it works is actually really simple. This is good for us trying to learn the technique, but is also good for its security, since there are so few moving parts.

It relies on something called the Unisolvence Theorem which is a fancy label meaning these things:

  • If you have a linear equation, it takes two (x,y) points to uniquely identify that line. No matter how you write a linear equation, if it passes through those same two points, it’s mathematically equivalent.
  • If you have a quadratic equation, it takes three (x,y) points to uniquely identify that quadratic curve. Again, no matter how you write a quadratic equation, if it passes through those same three points, it’s mathematically equivalent.
  • The pattern continues for equations of any degree. Cubic equations require four points to be uniquely identified, quartic equations require five points, and so on.

At a high level, how this technique works is that the number of shares (keys) you want someone to have to collect (N) determines the degree of the equation: you use an equation of degree N-1, so that N points are needed to pin it down.

You use random numbers as the coefficients of the powers of x in that equation, but use your secret number as the constant term.

You then create M data points of the form (x,y) aka (x,f(x)) . Those are your shares. You then give individual shares to people, or go hide them in your dungeon or do whatever you are going to do with them.

As soon as any one person has N of those M shares (data points), they will be able to figure out the equation of the curve and thus get the secret.

The secret number is the constant term of the polynomial, which is also just f(0) .

The Wikipedia article has a great image for seeing how you may have two points of a quadratic curve, but without a third point you can’t be sure what the quadratic equation is. In fact, there are an infinite number of quadratic curves that pass through any two points! Because of that, it takes the full number of required shares for you to be able to unlock the secret.

Example: Sharing (Splitting) The Secret

First you decide how many shares you want it to take to unlock the secret. This determines the degree of your equation.

Let’s say you wanted a person to have to have four shares to unlock the secret. This means our equation will be a cubic equation, since it takes four points to uniquely define a cubic equation.

Our equation is:

f(x) = R_1x^3 + R_2x^2 + R_3x + S

Where the R_i values are random numbers, and S is the secret value.
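Note that the secret really is f(0), since every term except the constant has a factor of x in it:

f(0) = R_1 \cdot 0^3 + R_2 \cdot 0^2 + R_3 \cdot 0 + S = S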

Let’s say that our secret value is 435, and that we picked some random numbers for the equation, making the below:

f(x) = 28x^3 + 64x^2 + 9x + 435

We now have a function that is uniquely identifiable by any 4 points of data on its curve.

Next we decide how many pieces we are going to create total. We need at least 4 so that it is in fact solvable. Let’s make 6 shares.

To do this, you just plug in 6 different values of x and pair each x value with its y value. Let’s do that:

\begin{array}{c|c} x & f(x) \\ \hline 1 & 536 \\ 2 & 933 \\ 3 & 1794 \\ 4 & 3287 \\ 5 & 5580 \\ 6 & 8841 \\ \end{array}
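As a spot check of the first row: f(1) = 28 \cdot 1^3 + 64 \cdot 1^2 + 9 \cdot 1 + 435 = 28 + 64 + 9 + 435 = 536, which matches.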

When doing this part, remember that the secret number is f(0) , so make sure not to share what the value of the function is when x is 0!

You could then distribute the shares (data pairs) as you saw fit. Maybe some people are more important, so you give them more than one share, requiring a smaller amount of cooperation with them to unlock the secret.

Share distribution details are totally up to you, but we now have our shares, whereby if you have any 4 of the 6 total shares, you can unlock the secret.

How Do You Join The Secret?

Once you have the right number of shares and you know the degree of the polynomial (pre-shared “public” information), unlocking the secret is a pretty straightforward process too. To unlock the secret, you just need to use ANY method available for creating an equation of the correct degree from a set of data points.

This can be one of several different interpolation techniques, but the most common one to use seems to be Lagrange interpolation, which is something I previously wrote up that you can read about here: Lagrange Interpolation.

Once you have the equation, you can either evaluate f(0) , or you can write the equation in polynomial form and the constant term will be the secret value.
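If all you want is f(0), Lagrange interpolation collapses to a nice closed form. With N shares (x_j, y_j), the secret is:

f(0) = \sum_{j=1}^{N} y_j \prod_{m \neq j} \frac{x_m}{x_m - x_j}

This is exactly the sum and product that the JoinShares function in the source code below computes, just with every operation done modulo a prime.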

Example: Joining the Secret

Let’s say that we have these four shares and are ready to get the cubic function and then unlock the secret number:

\begin{array}{c|c} x & y \\ \hline 1 & 536 \\ 2 & 933 \\ 4 & 3287 \\ 6 & 8841 \\ \end{array}

We could bust out some Lagrange interpolation and figure this out, but let’s be lazy… err efficient I mean. Wolfram alpha can do this for us!

Wolfram Alpha: cubic fit (1, 536), (2, 933), (4, 3287), (6, 8841)

That gives us this equation, saying that it is a perfect fit (which it is!)
28x^3 + 64x^2 + 9x + 435

You can see that our constant term (and f(0) ) is the correct secret value of 435.

Daaaayummm Bru… that is lit AF! We just got hacked by wolfram alpha 😛

A Small Complication

Unfortunately, the above has a weakness. The weakness is that each share you get gives you a little bit more information about the secret value. You can read more about this in the links section at the end if you want to know more details.

Ideally, you wouldn’t have any information about the secret value until you had the full number of shares required to unlock the secret.

To address this problem, we are going to choose some prime number k and instead of shares being (x,y) data points on the curve, they are going to be (x,y \bmod k) . In technical terms we are going to be using points on a finite field, or a Galois field.

The value we choose for k needs to be larger than any of the coefficients of our terms (the random numbers), larger than our secret value, and larger than the number of shares we want to create. Beyond that, the larger the better, because a larger k value means a larger “brute force” space to search.
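As a quick example of what the modular version of a share looks like: with k = 439 (the prime the sample code below uses), the first share from earlier, (1, 536), would become (1, 536 \bmod 439) = (1, 97).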

If you want to use this technique in a situation which has real needs for security, please make sure and read more on this technique from more authoritative sources. I’m glossing over the details of security quite a bit, and just trying to give an intuitive understanding of this technique (:

Source Code

Below is some sample source code that implements Shamir’s Secret Sharing in C++.

I use 64 bit integers, but if you were going to be using this in a realistic situation you could very well overflow 64 bit ints and get the wrong answers. I hit this problem for instance when trying to require more than about 10 shares, using a prime of 257, and generating 50 shares. If you hit the limit of 64 bit ints you can use a multi precision math library instead to have virtually unlimited sized ints. The boost multiprecision header library is a decent choice for multi precision integers, specifically cpp_int.

#include <stdio.h>
#include <algorithm>
#include <array>
#include <functional>
#include <vector>
#include <math.h>
#include <random>
#include <assert.h>
#include <stdint.h>
#include <inttypes.h>

typedef int64_t TINT;
typedef std::array<TINT, 2> TShare;
typedef std::vector<TShare> TShares;

class CShamirSecretSharing
{
public:
    CShamirSecretSharing (size_t sharesNeeded, TINT prime)
        : c_sharesNeeded(sharesNeeded), c_prime(prime)
    {
        // There needs to be at least 1 share needed
        assert(sharesNeeded > 0);
    }

    // Generate N shares for a secretNumber
    TShares GenerateShares (TINT secretNumber, TINT numShares) const
    {
        // calculate our curve coefficients
        std::vector<TINT> coefficients;
        {
            // store the secret number as the first coefficient;
            coefficients.resize((size_t)c_sharesNeeded);
            coefficients[0] = secretNumber;

            // randomize the rest of the coefficients
            std::array<int, std::mt19937::state_size> seed_data;
            std::random_device r;
            std::generate_n(seed_data.data(), seed_data.size(), std::ref(r));
            std::seed_seq seq(std::begin(seed_data), std::end(seed_data));
            std::mt19937 gen(seq);
            std::uniform_int_distribution<TINT> dis(1, c_prime - 1);
            for (TINT i = 1; i < c_sharesNeeded; ++i)
                coefficients[(size_t)i] = dis(gen);
        }

        // generate the shares
        TShares shares;
        shares.resize((size_t)numShares);
        for (size_t i = 0; i < (size_t)numShares; ++i)
            shares[i] = GenerateShare(i + 1, coefficients);
        return shares;
    }

    // use lagrange polynomials to find f(0) of the curve, which is the secret number
    TINT JoinShares (const TShares& shares) const
    {
        // make sure there is at least the minimum number of shares
        assert(shares.size() >= size_t(c_sharesNeeded));

        // Sigma summation loop
        TINT sum = 0;
        for (TINT j = 0; j < c_sharesNeeded; ++j)
        {
            TINT y_j = shares[(size_t)j][1];

            TINT numerator = 1;
            TINT denominator = 1;

            // Pi product loop
            for (TINT m = 0; m < c_sharesNeeded; ++m)
            {
                if (m == j)
                    continue;

                numerator = (numerator * shares[(size_t)m][0]) % c_prime;
                denominator = (denominator * (shares[(size_t)m][0] - shares[(size_t)j][0])) % c_prime;
            }

            sum = (c_prime + sum + y_j * numerator * modInverse(denominator, c_prime)) % c_prime;
        }
        return sum;
    }

    const TINT GetPrime () const { return c_prime; }
    const TINT GetSharesNeeded () const { return c_sharesNeeded; }

private:

    // Generate a single share in the form of (x, f(x))
    TShare GenerateShare (TINT x, const std::vector<TINT>& coefficients) const
    {
        TINT xpow = x;
        TINT y = coefficients[0];
        for (TINT i = 1; i < c_sharesNeeded; ++i) {
            y += coefficients[(size_t)i] * xpow;
            xpow *= x;
        }
        return{ x, y % c_prime };
    }

    // Gives the decomposition of the gcd of a and b.  Returns [x,y,z] such that x = gcd(a,b) and y*a + z*b = x
    static const std::array<TINT, 3> gcdD (TINT a, TINT b) {
        if (b == 0)
            return{ a, 1, 0 };

        const TINT n = a / b;
        const TINT c = a % b;
        const std::array<TINT, 3> r = gcdD(b, c);

        return{ r[0], r[2], r[1] - r[2] * n };
    }

    // Gives the multiplicative inverse of k mod prime.  In other words (k * modInverse(k)) % prime = 1 for all prime > k >= 1 
    static TINT modInverse (TINT k, TINT prime) {
        k = k % prime;
        TINT r = (k < 0) ? -gcdD(prime, -k)[2] : gcdD(prime, k)[2];
        return (prime + r) % prime;
    }

private:
    
    // Publically known information
    const TINT          c_prime;
    const TINT          c_sharesNeeded;
};

void WaitForEnter ()
{
    printf("Press Enter to quit");
    fflush(stdin);
    getchar();
}

int main (int argc, char **argv)
{
    // Parameters
    const TINT c_secretNumber = 435;
    const TINT c_sharesNeeded = 7;
    const TINT c_sharesMade = 50;
    const TINT c_prime = 439;   // must be a prime number larger than the other three numbers above

    // set up a secret sharing object with the public information
    CShamirSecretSharing secretSharer(c_sharesNeeded, c_prime);

    // split a secret value into multiple shares
    TShares shares = secretSharer.GenerateShares(c_secretNumber, c_sharesMade);

    // shuffle the shares, so it's random which ones are used to join
    std::array<int, std::mt19937::state_size> seed_data;
    std::random_device r;
    std::generate_n(seed_data.data(), seed_data.size(), std::ref(r));
    std::seed_seq seq(std::begin(seed_data), std::end(seed_data));
    std::mt19937 gen(seq);
    std::shuffle(shares.begin(), shares.end(), gen);

    // join the shares
    TINT joinedSecret = secretSharer.JoinShares(shares);

    // show the public information and the secrets being joined
    printf("%" PRId64 " shares needed, %i shares maden", secretSharer.GetSharesNeeded(), shares.size());
    printf("Prime = %" PRId64 "nn", secretSharer.GetPrime());
    for (TINT i = 0, c = secretSharer.GetSharesNeeded(); i < c; ++i)
        printf("Share %" PRId64 " = (%" PRId64 ", %" PRId64 ")n", i+1, shares[i][0], shares[i][1]);

    // show the result
    printf("nJoined Secret = %" PRId64 "nActual Secret = %" PRId64 "nn", joinedSecret, c_secretNumber);
    assert(joinedSecret == c_secretNumber);
    WaitForEnter();
    return 0;
}

Links

Wikipedia: Shamir’s Secret Sharing (Note: for some reason the example javascript implementation there only worked when an odd number of keys was required)
Wikipedia: Finite Field
Cryptography.wikia.com: Shamir’s Secret Sharing
Java Implementation of Shamir’s Secret Sharing (Note: I don’t think this implementation is correct, and neither is the one that someone posted to correct them!)

When writing this post I wondered if maybe you could use the coefficients of the other terms as secrets as well. These two links talk about the details of that:
Cryptography Stack Exchange: Why only one secret value with Shamir’s secret sharing?
Cryptography Stack Exchange: Coefficients in Shamir’s Secret Sharing Scheme

Now that you understand this, you are probably ready to start reading up on elliptic curve cryptography. Give this link below a read if you are interested in a gentle introduction on that!
A (Relatively Easy To Understand) Primer on Elliptic Curve Cryptography

Turning a Truth Table Into A Digital Circuit (ANF)

In this post I’m going to show how you turn a truth table into a digital logic circuit that uses XOR and AND gates.

My Usage Case

My specific usage case for this is in my investigations into homomorphic encryption, which as you may recall is able to perform computation on encrypted data. This lets encrypted data be operated on by an untrusted source, given back to you, and then you can decrypt your data to get a result.

Lots of use cases if this can ever get fast enough to become practical, such as doing cloud computing with private data. However, when doing homomorphic encryption (at least currently, for the techniques I’m using), you only have XOR and AND logic operations.

So, I’m using the information in this post to be able to turn a lookup table, or a specific boolean function, into a logic circuit that I can feed into a homomorphic encryption based digital circuit.

Essentially I want to figure out how to do a homomorphic table lookup, making circuits that are as simple as possible, so that they will in turn be as fast and lean as possible.

If you want to know more about homomorphic encryption, here’s a post I wrote which explains a very simple algorithm: Super Simple Symmetric Leveled Homomorphic Encryption Implementation

Algebraic Normal Form

Algebraic normal form (ANF) is a way of writing a boolean function using only XOR and AND.

Since it’s a normal form, two functions that compute the same thing will have the same representation in ANF.

There are other forms for writing boolean logic, but ANF suits me best for my homomorphic encryption circuit needs!

An example of boolean logic in ANF is the below:

f(x_1, x_2, x_3, x_4) = x_1 x_2 \oplus x_1 x_3 \oplus x_1 x_4

It is essentially a boolean polynomial, where AND is like multiplication, and XOR is like addition. It even factors the same way. In fact, ANF is not always the smallest circuit possible; you’d have to factor out common ANDs to find the smallest way you could represent the circuit, like the below:

f(x_1, x_2, x_3, x_4) = x_1 (x_2 \oplus x_3 \oplus x_4)

That smaller form does 1 AND and 2 XORs, versus the ANF which does 3 ANDs and 2 XORs. In homomorphic encryption, since AND is so much more costly than XOR, minimizing the ANDs is a very nice win, and worth the effort.

Wikipedia has some more info about ANF here: Wikipedia: Algebraic normal form

Truth Tables and Lookup Tables

A truth table is just where you specify the inputs into a boolean function and the output of that boolean function for the given input:

\begin{array}{c|c|c|c} x_1 & x_2 & x_3 & f(x_1, x_2, x_3) \\ \hline 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ \end{array}

A lookup table is similar in functionality, except that it has multi bit output. When dealing with digital circuits, you can make a lookup table by making a truth table per output bit. For instance, the above truth table might just be the low bit of the lookup table below, which is just a truth table for addition of the input bits.

\begin{array}{c|c|c|c} x_1 & x_2 & x_3 & f(x_1, x_2, x_3) \\ \hline 0 & 0 & 0 & 00 \\ 0 & 0 & 1 & 01 \\ 0 & 1 & 0 & 01 \\ 0 & 1 & 1 & 10 \\ 1 & 0 & 0 & 01 \\ 1 & 0 & 1 & 10 \\ 1 & 1 & 0 & 10 \\ 1 & 1 & 1 & 11 \\ \end{array}

Converting Truth Table to ANF

When I first saw the explanation for converting a truth table to ANF, it looked pretty complicated, but luckily it turns out to be pretty easy.

The basic idea is that you make a term for each possible combination of x inputs, AND each term with an unknown constant, and then solve for those constants.

Let’s use the truth table from the last section:

\begin{array}{c|c|c|c} x_1 & x_2 & x_3 & f(x_1, x_2, x_3) \\ \hline 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ \end{array}

For three inputs, the starting equation looks like this:

f(x_1, x_2, x_3) = \\ a_0 \\ \oplus a_1 x_1 \oplus a_2 x_2 \oplus a_3 x_3 \\ \oplus a_{12} x_1 x_2 \oplus a_{13} x_1 x_3 \oplus a_{23} x_2 x_3 \\ \oplus a_{123} x_1 x_2 x_3

Now we have to solve for the a values.

To solve for a_{123} , we just look in the truth table for function f(x_1, x_2, x_3) to see if we have an odd or even number of ones in the output of the function. If there is an even number, it is 0, else it is a 1.
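Stated as a formula (this is a standard identity, sometimes called the Möbius transform of the function over GF(2)), the coefficient of each term is the XOR of the function over all inputs whose 1 bits only occur at the variables appearing in that term:

a_S = \bigoplus_{x \subseteq S} f(x)

where S is the set of variables in the term. The ANFHasTerm function in the sample code below implements exactly this parity test.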

Since we have an even number of ones, the value is 0, so our equation becomes this:

f(x_1, x_2, x_3) = \\ a_0 \\ \oplus a_1 x_1 \oplus a_2 x_2 \oplus a_3 x_3 \\ \oplus a_{12} x_1 x_2 \oplus a_{13} x_1 x_3 \oplus a_{23} x_2 x_3 \\ \oplus 0 \land x_1 x_2 x_3

Note that \land is the symbol for AND. I’m showing it explicitly because otherwise the equation looks weird, and a multiplication symbol isn’t correct.

Since 0 ANDed with anything else is 0, and also since n XOR 0 = n, that whole last term disappears, leaving us with this equation:

f(x_1, x_2, x_3) = \\ a_0 \\ \oplus a_1 x_1 \oplus a_2 x_2 \oplus a_3 x_3 \\ \oplus a_{12} x_1 x_2 \oplus a_{13} x_1 x_3 \oplus a_{23} x_2 x_3

Next up, to solve for a_{12} , we need to limit our truth table to f(x_1, x_2, 0) . That truth table is below, made from the original truth table, but throwing out any row where x_{3} is 1.

\begin{array}{c|c|c|c} x_1 & x_2 & x_3 & f(x_1, x_2, 0) \\ \hline 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 \\ \end{array}

We again just look at whether there are an odd or even number of ones in the function output, and use that to set a_{12} appropriately. In this case, there are an even number, so we set it to 0, which makes that term disappear again. Our function is now down to this:

f(x_1, x_2, x_3) = \\ a_0 \\ \oplus a_1 x_1 \oplus a_2 x_2 \oplus a_3 x_3 \\ \oplus a_{13} x_1 x_3 \oplus a_{23} x_2 x_3

If we look at f(x_1,0,x_3) , we find that it also has an even number of ones, making a_{13} become 0 and making that term disappear.

Looking at f(0,x_2,x_3) , it also has an even number of ones, making a_{23} become 0 and making that term disappear as well.

That leaves us with this equation:

f(x_1, x_2, x_3) = \\ a_0 \\ \oplus a_1 x_1 \oplus a_2 x_2 \oplus a_3 x_3

To solve for a_1 , we look at the truth table for f(x_1,0,0) , which is below:

\begin{array}{c|c|c|c} x_1 & x_2 & x_3 & f(x_1, 0, 0) \\ \hline 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ \end{array}

There are an odd number of ones in the output, so a_1 becomes 1. Finally, we get to keep a term! The equation is below:

f(x_1, x_2, x_3) = \\ a_0 \\ \oplus 1 \land x_1 \oplus a_2 x_2 \oplus a_3 x_3

Since 1 AND n = n, we can drop the explicit 1 to become this:

f(x_1, x_2, x_3) = \\ a_0 \\ \oplus x_1 \oplus a_2 x_2 \oplus a_3 x_3

If you do the same process for a_2 and a_3 , you’ll find that they also have odd numbers of ones in the output so also become ones. That puts our equation at:

f(x_1, x_2, x_3) = \\ a_0 \\ \oplus x_1 \oplus x_2 \oplus x_3

Solving for a_0 is just looking at whether there are an odd or even number of ones in the function f(0,0,0) , which you can look up directly in the truth table. It’s even, so a_0 becomes 0, which makes our full final equation into this:

f(x_1, x_2, x_3) = x_1 \oplus x_2 \oplus x_3

We are done! This truth table can be implemented with 3 XORs and 0 ANDs. A pretty efficient operation!

You can see this is true if you work it out with the truth table. Try it out and see!

\begin{array}{c|c|c|c} x_1 & x_2 & x_3 & f(x_1, x_2, x_3) \\ \hline 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ \end{array}
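If you’d rather let a computer do the verification, here is a tiny standalone loop (separate from the larger sample program below) that prints x_1 \oplus x_2 \oplus x_3 for all eight inputs so you can compare against the table:

#include <stdio.h>

int main ()
{
    // print x1 XOR x2 XOR x3 for all 8 inputs, to compare against the truth table
    for (int i = 0; i < 8; ++i)
    {
        const int x1 = (i >> 2) & 1;
        const int x2 = (i >> 1) & 1;
        const int x3 = i & 1;
        printf("%i %i %i | %i\n", x1, x2, x3, x1 ^ x2 ^ x3);
    }
    return 0;
}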

Sample Code

Here is some sample code that lets you define a lookup table by implementing an integer function, and it generates the ANF for each output bit for the truth table. It also verifies that the ANF gives the correct answer. It shows you how to use this to make various circuits: bit count, addition, multiplication, division and modulus.

#include <stdio.h>
#include <algorithm>
#include <array>
#include <vector>

#define PRINT_TRUTHTABLES() 0
#define PRINT_NUMOPS() 1
#define PRINT_ANF() 1

void WaitForEnter ()
{
    printf("Press Enter to quit");
    fflush(stdin);
    getchar();
}

template <size_t NUM_INPUT_BITS>
bool LookupEntryPassesMask (size_t entry, size_t mask)
{
    for (size_t i = 0; i < NUM_INPUT_BITS; ++i)
    {
        const size_t bitMask = 1 << i;
        const bool allowOnes = (mask & bitMask) != 0;
        const bool bitPassesMask = allowOnes || (entry & bitMask) == 0;
        if (!bitPassesMask)
            return false;
    }
    return true;
}

template <size_t NUM_INPUT_BITS>
bool ANFHasTerm (const std::array<size_t, 1 << NUM_INPUT_BITS> &lookupTable, size_t outputBitIndex, size_t termMask)
{
    const size_t c_inputValueCount = 1 << NUM_INPUT_BITS;

    int onesCount = 0;
    for (size_t i = 0; i < c_inputValueCount; ++i)
    {
        if (LookupEntryPassesMask<NUM_INPUT_BITS>(i, termMask) && ((lookupTable[i] >> outputBitIndex) & 1) != 0)
            onesCount++;
    }

    return (onesCount & 1) != 0;
}

template <size_t NUM_INPUT_BITS>
void MakeANFTruthTable (const std::array<size_t, 1 << NUM_INPUT_BITS> &lookupTable, std::array<size_t, 1 << NUM_INPUT_BITS> &reconstructedLookupTable, size_t outputBitIndex)
{
    const size_t c_inputValueCount = 1 << NUM_INPUT_BITS;
    printf("-----Output Bit %u-----rn", outputBitIndex);

    // print truth table if we should
    #if PRINT_TRUTHTABLES()
        for (size_t inputValue = 0; inputValue < c_inputValueCount; ++inputValue)
            printf("  [%u] = %urn", inputValue, ((lookupTable[inputValue] >> outputBitIndex) & 1) ? 1 : 0);
        printf("rn");
    #endif

    // find each ANF term
    std::vector<size_t> terms;
    for (size_t termMask = 0; termMask < c_inputValueCount; ++termMask)
    {
        if (ANFHasTerm<NUM_INPUT_BITS>(lookupTable, outputBitIndex, termMask))
            terms.push_back(termMask);
    }

    // print function params
    #if PRINT_ANF()
        printf("f(");
        for (size_t i = 0; i < NUM_INPUT_BITS; ++i)
        {
            if (i > 0)
                printf(",");
            printf("x%i",i+1);
        }
        printf(") = rn");
    #endif

    // print ANF and count XORs and ANDs
    size_t numXor = 0;
    size_t numAnd = 0;
    if (terms.size() == 0)
    {
        #if PRINT_ANF()
        printf("0rn");
        #endif
    }
    else
    {
        for (size_t termIndex = 0, termCount = terms.size(); termIndex < termCount; ++termIndex)
        {
            if (termIndex > 0) {
                #if PRINT_ANF()
                printf("XOR ");
                #endif
                ++numXor;
            }

            size_t term = terms[termIndex];
            if (term == 0)
            {
                #if PRINT_ANF()
                printf("1");
                #endif
            }
            else
            {
                bool firstProduct = true;
                for (size_t bitIndex = 0; bitIndex < NUM_INPUT_BITS; ++bitIndex)
                {
                    const size_t bitMask = 1 << bitIndex;
                    if ((term & bitMask) != 0)
                    {
                        #if PRINT_ANF()
                        printf("x%i ", bitIndex + 1);
                        #endif
                        if (firstProduct)
                            firstProduct = false;
                        else
                            ++numAnd;
                    }
                }
            }
            #if PRINT_ANF()
            printf("rn");
            #endif
        }
    }
    #if PRINT_ANF()
    printf("rn");
    #endif

    #if PRINT_NUMOPS()
    printf("%u XORs, %u ANDsrnrn", numXor, numAnd);
    #endif

    // reconstruct a bit of the reconstructedLookupTable for each entry to be able to verify correctness
    const size_t c_outputBitMask = 1 << outputBitIndex;
    for (size_t valueIndex = 0; valueIndex < c_inputValueCount; ++valueIndex)
    {
        bool xorSum = false;
        for (size_t termIndex = 0, termCount = terms.size(); termIndex < termCount; ++termIndex)
        {
            size_t term = terms[termIndex];
            if (term == 0)
            {
                xorSum = 1 ^ xorSum;
            }
            else
            {
                bool andProduct = true;
                for (size_t bitIndex = 0; bitIndex < NUM_INPUT_BITS; ++bitIndex)
                {
                    const size_t bitMask = 1 << bitIndex;
                    if ((term & bitMask) != 0)
                    {
                        if ((valueIndex & bitMask) == 0)
                            andProduct = false;
                    }
                }
                xorSum = andProduct ^ xorSum;
            }
        }
        if (xorSum)
            reconstructedLookupTable[valueIndex] |= c_outputBitMask;
    }
}

template <size_t NUM_INPUT_BITS, size_t NUM_OUTPUT_BITS, typename LAMBDA>
void MakeANFLookupTable (const LAMBDA& lambda)
{
    // make lookup table
    const size_t c_outputValueMask = (1 << NUM_OUTPUT_BITS) - 1;
    const size_t c_inputValueCount = 1 << NUM_INPUT_BITS;
    std::array<size_t, c_inputValueCount> lookupTable;
    for (size_t inputValue = 0; inputValue < c_inputValueCount; ++inputValue)
        lookupTable[inputValue] = lambda(inputValue, NUM_INPUT_BITS, NUM_OUTPUT_BITS) & c_outputValueMask;

    // make the anf for each truth table (each output bit of the lookup table)
    std::array<size_t, c_inputValueCount> reconstructedLookupTable;
    std::fill(reconstructedLookupTable.begin(), reconstructedLookupTable.end(), 0);
    for (size_t outputBitIndex = 0; outputBitIndex < NUM_OUTPUT_BITS; ++outputBitIndex)
        MakeANFTruthTable<NUM_INPUT_BITS>(lookupTable, reconstructedLookupTable, outputBitIndex);

    // verify that our anf expressions perfectly re-create the lookup table
    for (size_t inputValue = 0; inputValue < c_inputValueCount; ++inputValue)
    {
        if (lookupTable[inputValue] != reconstructedLookupTable[inputValue])
            printf("ERROR: expression / lookup mismatch for index %urn", inputValue);
    }
    printf("expression / lookup verification complete.rnrn");
}

size_t CountBits (size_t inputValue, size_t numInputBits, size_t numOutputBits)
{
    // Count how many bits there are
    int result = 0;
    while (inputValue)
    {
        if (inputValue & 1)
            result++;
        inputValue = inputValue >> 1;
    }
    return result;
}

size_t AddBits (size_t inputValue, size_t numInputBits, size_t numOutputBits)
{
    // break the input bits in half and add them
    const size_t bitsA = numInputBits / 2;
    const size_t mask = (1 << bitsA) - 1;

    size_t a = inputValue & mask;
    size_t b = inputValue >> bitsA;
    
    return a+b;
}

size_t MultiplyBits (size_t inputValue, size_t numInputBits, size_t numOutputBits)
{
    // break the input bits in half and multiply them
    const size_t bitsA = numInputBits / 2;
    const size_t mask = (1 << bitsA) - 1;

    size_t a = inputValue & mask;
    size_t b = inputValue >> bitsA;

    return a * b;
}

size_t DivideBits (size_t inputValue, size_t numInputBits, size_t numOutputBits)
{
    // break the input bits in half and divide them
    const size_t bitsA = numInputBits / 2;
    const size_t mask = (1 << bitsA) - 1;

    size_t a = inputValue & mask;
    size_t b = inputValue >> bitsA;

    // workaround for divide by zero
    if (b == 0)
        return 0;

    return a / b;
}

size_t ModulusBits (size_t inputValue, size_t numInputBits, size_t numOutputBits)
{
    // break the input bits in half and take the modulus
    const size_t bitsA = numInputBits / 2;
    const size_t mask = (1 << bitsA) - 1;

    size_t a = inputValue & mask;
    size_t b = inputValue >> bitsA;

    // workaround for divide by zero
    if (b == 0)
        return 0;

    return a % b;
}

int main (int argc, char **argv)
{
    //MakeANFLookupTable<3, 2>(CountBits);    // Output bits needs to be enough to store the number "input bits"
    //MakeANFLookupTable<4, 3>(AddBits);      // Output bits needs to be (InputBits / 2)+1
    //MakeANFLookupTable<4, 4>(MultiplyBits); // Output bits needs to be same as input bits
    //MakeANFLookupTable<4, 2>(DivideBits);   // Output bits needs to be half of input bits (rounded down)
    //MakeANFLookupTable<4, 2>(ModulusBits);  // Output bits needs to be half of input bits (rounded down)
    //MakeANFLookupTable<10, 5>(DivideBits);  // 5 bit vs 5 bit division is amazingly complex!
    MakeANFLookupTable<4, 2>(ModulusBits);  // Output bits needs to be half of input bits (rounded down)
    WaitForEnter();
    return 0;
}

Sample Code Runs

Here is the program output for a “bit count” circuit. It counts the number of bits that are 1, in the 3 bit input, and outputs the answer as 2 bit output. Note that the bit 0 output is the same functionality as the example we worked through by hand, and you can see that it comes up with the same answer.

Here is the program output for an adder circuit. It adds two 2 bit numbers, and outputs a 3 bit output.

Here is the program output for a multiplication circuit. It multiplies two 2 bit numbers, and outputs a 4 bit number.

Here is the program output for a division circuit. It divides a 2 bit number by another 2 bit number and outputs a 2 bit number. When higher bit counts are involved, the division circuit gets super complicated; it’s really crazy! 5 bit divided by 5 bit is several pages of output for instance. Note that it returns 0 whenever it would divide by 0.

Lastly, here is the program output for a modulus circuit. It divides a 2 bit number by another 2 bit number and outputs the remainder as a 2 bit number.

Closing and Links

While the above shows you how to turn a single bit truth table into ANF, extending this to a multi bit lookup table is super simple; you just do the same process for each output bit in the lookup table.

Here are a few links in case anything above is unclear, or you want more information.

Finding Boolean/Logical Expressions for truth tables in algebraic normal form(ANF)

Finding Boolean/Logical Expressions for truth tables

Simplifying Boolean Expressions With Karnaugh Maps

Karnaugh maps are a formalized way of turning a truth table into a fairly minimal logical expression.

It’s fairly minimal in that it’s the minimal “sum of products” representation, but that might not be the minimal representation of the logic circuit.

The sum of products representation just means that you OR (sum) multiple terms that are ANDed (product) together. For instance the below is a sum of products expression:

Out = \overline{A}B + AB

With multiplication being AND, addition being OR, and a line over a letter being a NOT, the above could be written in C++ code like this:

bool out = (!A && B) || (A && B);

It would be real easy to write code like that, especially if you were adding onto an existing condition, and you might not even notice that it isn’t the minimal Boolean expression to make the desired output.

Using Boolean algebra, you can do the following simplifications:
Out = \overline{A}B + AB\\ = B*(\overline{A}+A)\\ = B*1\\ = B

Which simplifies the C++ code to just this:

bool out = B;

Using Boolean algebra to simplify, you’d have to remember (or derive) the identity that A+\overline{A}=1 , and all the other identities to help you simplify equations.

Karnaugh maps make this easier because you will be able to see visually what can be combined (simplified) and what can’t.

Again though, while they give you the smallest possible sum of products representation of the logic circuit, that may not be the smallest possible representation of the circuit.

Let’s get to it!

Two Variable Karnaugh Map: Basics

Going with the example above, it takes two Boolean variables as input (A and B), and gives one Boolean variable as output. Having two input variables means we need a two variable Karnaugh map.

The first step to building the Karnaugh map is having a truth table for the input to output mappings. For our example we’ll use this truth table. This is one of many truth tables that satisfy our equation, so we are working backwards a bit, but hopefully it still makes sense. Usually you would start with the truth table and get a Boolean equation, not the other way around.
\begin{array}{c|c|c} A & B & Out\\ \hline 0 & 0 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 1 & 1 \\ \end{array}

The next thing we do is make our Karnaugh map by making a square 2×2 grid where one side of the square is the possible A values, the other side is the possible B values, and the contents of the grid cells are the values we want the formula to come out to be:

\begin{array}{ccc} & \overline{B} & B \\ \overline{A} & 0 & 1 \\ A & 0 & 1 \\ \end{array}

Next, what you do is circle all the 1’s that form groups whose sizes are powers of two. Doing that, we get this:

\begin{array}{ccc} & \overline{B} & B \\ \cline{3-3} \overline{A} & 0 & \multicolumn{1}{|c|}{1} \\ A & 0 & \multicolumn{1}{|c|}{1} \\ \cline{3-3} \end{array}

Looking at what we circled, we can see that both values of A are involved in our group, so A doesn’t matter to the output. We can also see that \overline{B} is not involved in any places where there is a 1, so we can ignore that too. All that leaves is B, which is our final, and most minimal answer.

That agrees with the Boolean algebra solution, but came without having to remember any identities. Hooray!

Two Variable Karnaugh Map: Overlaps

If there were multiple groups, you would combine each group with OR to get the final answer. Groups are also allowed to overlap! For instance, let’s look at this truth table to start out:

\begin{array}{c|c|c} A & B & Out\\ \hline 0 & 0 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \\ \end{array}

Turning that into a Karnaugh map, we get this:

\begin{array}{ccc} & \overline{B} & B \\ \overline{A} & 0 & 1 \\ A & 1 & 1 \\ \end{array}

Next when it’s time to circle our groups, we have two groups, and they overlap! Here is the first group, which is the same as before:

\begin{array}{ccc} & \overline{B} & B \\ \cline{3-3} \overline{A} & 0 & \multicolumn{1}{|c|}{1} \\ A & 1 & \multicolumn{1}{|c|}{1} \\ \cline{3-3} \end{array}

That group can be expressed as just B.

The other group is this:

\begin{array}{ccc} & \overline{B} & B \\ \overline{A} & 0 & 1 \\ \cline{2-3} A & \multicolumn{1}{|c}{1} & \multicolumn{1}{c|}{1} \\ \cline{2-3} \end{array}

That group can be expressed as just A.

Lastly, we OR our groups together, aka we sum our products, and we get A+B as an answer, which in other words is just A OR B. Check out the truth table and you can see that it is indeed a truth table for OR!

Two Variable Karnaugh Map: Single Sized Groups

What if we don’t have groups of two though, what if we only have groups of one? Let’s explore that real quick with the following truth table:

\begin{array}{c|c|c} A & B & Out\\ \hline 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \\ 1 & 1 & 1 \\ \end{array}

That becomes this Karnaugh map:

\begin{array}{ccc} & \overline{B} & B \\ \overline{A} & 1 & 0 \\ A & 0 & 1 \\ \end{array}

We once again have two groups, but they are each of size one, which is totally ok!

The upper left group is expressed as \overline{A}\,\overline{B}, while the lower right group is expressed as AB. We OR those two groups together to get the answer: \overline{A}\,\overline{B} + AB. That’s all there is to it.

Four Variable Karnaugh Map: Don’t Care Values

Let’s get a little more sophisticated and see how we would handle four input variables (We could go to three variables next but after learning two and four it’ll be easy to see how to do three). We will start with a truth table, but our truth table will only contain the input values we care about. We’ll omit the ones we don’t care about.

\begin{array}{c|c} ABCD & Out\\ \hline 0001 & 0 \\ 0011 & 1 \\ 0111 & 0 \\ 1111 & 1 \\ \end{array}

We’ll put 1’s and 0’s into the Karnaugh map to match our truth table, but put x’s where the output wasn’t listed. These are “don’t care” values, where they could either be 0’s or 1’s, depending on whichever is more convenient for us when simplifying. We are also going to change how we label the map a bit.

\begin{array}{cccccc} & & \multicolumn{4}{c}{CD} \\ & & 00 & 01 & 11 & 10 \\ & 00 & x & 0 & 1 & x \\ AB & 01 & x & x & 0 & x \\ & 11 & x & x & 1 & x \\ & 10 & x & x & x & x \\ \end{array}

In this case, even with the wild card don’t care values, we still just have two 1 item groups, which we OR together to get the answer:

\overline{A}\,\overline{B}CD + ABCD

Note that we could factor out the CD and make it into the below, but then it would no longer be in a sum of products form.

CD(\overline{A}\,\overline{B}+AB)

You might also have noticed the strange ordering of the values in the table: 00, 01, 11, 10. Normally doesn’t 2 (10) come before 3 (11)? It does, except in this case, we only want to change one variable at a time between neighboring cells. Going from 01 to 11 means that the first bit changed, while going from 01 to 10 means that two bits changed, so that ordering isn’t useful for us when finding groups. The order the numbers are in is actually called Gray code, named after Frank Gray (Wikipedia: Gray Code).
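As an aside, if you ever need to generate that header ordering programmatically, converting a binary number to its Gray code equivalent is a one liner (a small helper written for this post, not part of the maps themselves):

// binary to Gray code: each bit gets XORed with the bit above it.
// 0,1,2,3 map to 00,01,11,10 - the column ordering used in the maps above.
unsigned int BinaryToGray (unsigned int n)
{
    return n ^ (n >> 1);
}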

Four Variable Karnaugh Map: Larger Groups

When dealing with Karnaugh maps, like I said before, the groups have to have sizes that are powers of two, but interestingly the size can be a power of two on each axis. So valid groups include 4×1, 2×1, 2×2, 4×2 and others. Let’s take a look at one where we encounter a 2×2 group.

Let’s start with the truth table:

\begin{array}{c|c} ABCD & Out\\ \hline 0000 & 0 \\ 0001 & 0 \\ 0010 & 0 \\ 0011 & 0 \\ 0100 & 0 \\ 0101 & 1 \\ 0110 & 0 \\ 0111 & 1 \\ 1000 & 0 \\ 1001 & 0 \\ 1010 & 0 \\ 1011 & 1 \\ 1100 & 0 \\ 1101 & 1 \\ 1110 & 0 \\ 1111 & 1 \\ \end{array}

That gives us the Karnaugh map:

\begin{array}{cccccc} & & \multicolumn{4}{c}{CD} \\ & & 00 & 01 & 11 & 10 \\ & 00 & 0 & 0 & 0 & 0 \\ AB & 01 & 0 & 1 & 1 & 0 \\ & 11 & 0 & 1 & 1 & 0 \\ & 10 & 0 & 0 & 1 & 0 \\ \end{array}

There are two groups there. The first is the 2×2 group below and is the intersection of where B is 1, and D is 1, so can be represented as BD.

\begin{array}{cccccc} & & \multicolumn{4}{c}{CD} \\ & & 00 & 01 & 11 & 10 \\ & 00 & 0 & 0 & 0 & 0 \\ \cline{4-5} AB & 01 & 0 & \multicolumn{1}{|c}{1} & \multicolumn{1}{c|}{1} & 0 \\ & 11 & 0 & \multicolumn{1}{|c}{1} & \multicolumn{1}{c|}{1} & 0 \\ \cline{4-5} & 10 & 0 & 0 & 1 & 0 \\ \end{array}

The second group is a 1×2 group that overlaps the first, and is where A, C and D are 1, but B can be either 0 or 1. That makes it able to be represented as ACD.

\begin{array}{cccccc} & & \multicolumn{4}{c}{CD} \\ & & 00 & 01 & 11 & 10 \\ & 00 & 0 & 0 & 0 & 0 \\ AB & 01 & 0 & 1 & 1 & 0 \\ \cline{5-5} & 11 & 0 & 1 & \multicolumn{1}{|c|}{1} & 0 \\ & 10 & 0 & 0 & \multicolumn{1}{|c|}{1} & 0 \\ \cline{5-5} \end{array}

We combine those groups with OR to get the answer:

BD+ACD
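As a quick sanity check, take the row ABCD = 1011: BD = 0 \cdot 1 = 0 and ACD = 1 \cdot 1 \cdot 1 = 1, so the output is 0 + 1 = 1, which matches the truth table.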

Four Variable Karnaugh Map: Wrap Around

Interestingly, you can make groups by wrapping around the edges of the Karnaugh map, either horizontally or vertically. Let’s start with a truth table:

\begin{array}{c|c} ABCD & Out\\ \hline 0000 & 0 \\ 0001 & 0 \\ 0010 & 0 \\ 0011 & 1 \\ 0100 & 0 \\ 0101 & 0 \\ 0110 & 0 \\ 0111 & 0 \\ 1000 & 0 \\ 1001 & 0 \\ 1010 & 0 \\ 1011 & 1 \\ 1100 & 0 \\ 1101 & 0 \\ 1110 & 0 \\ 1111 & 0 \\ \end{array}

That gives us the Karnaugh map:

\begin{array}{cccccc} & & \multicolumn{4}{c}{CD} \\ & & 00 & 01 & 11 & 10 \\ & 00 & 0 & 0 & 1 & 0 \\ AB & 01 & 0 & 0 & 0 & 0 \\ & 11 & 0 & 0 & 0 & 0 \\ & 10 & 0 & 0 & 1 & 0 \\ \end{array}

Here is the group highlighted below, which is represented as \overline{B}CD, which is also the answer:

\begin{array}{cccccc} & & \multicolumn{4}{c}{CD} \\ & & 00 & 01 & 11 & 10 \\ & 00 & 0 & 0 & \multicolumn{1}{|c|}{1} & 0 \\ \cline{5-5} AB & 01 & 0 & 0 & 0 & 0 \\ & 11 & 0 & 0 & 0 & 0 \\ \cline{5-5} & 10 & 0 & 0 & \multicolumn{1}{|c|}{1} & 0 \\ \end{array}

Two Variable Karnaugh Map: Handling Redundant Info

If you are like me, you might be wondering – If Karnaugh maps can give you the minimal sum of products expression for a truth table, how does it deal with redundant information or solutions that are of equal size, so it’s ambiguous which to choose?

For instance, let’s go with the truth table below. All other inputs not listed are “don’t care” values.

\begin{array}{c|c} AB & Out\\ \hline 00 & 0 \\ 11 & 1 \\ \end{array}

It’s obvious that the output bit corresponds exactly to both A and B separately. Which one does it choose, or does it make some more complex expression that involves both?

Here is the Karnaugh map:

\begin{array}{ccc} & \overline{B} & B \\ \overline{A} & 0 & x \\ A & x & 1 \\ \end{array}

Well, the disambiguation comes down to you, the pattern finder in the Karnaugh map, choosing between the two possible groups.

One answer, which is perfectly valid is the below, which is just A.

\begin{array}{ccc} & \overline{B} & B \\ \overline{A} & 0 & x \\ \cline{2-3} A & \multicolumn{1}{|c}{x} & \multicolumn{1}{c|}{1} \\ \cline{2-3} \end{array}

The other answer, which is also perfectly valid, is the below, which is just B

\begin{array}{ccc} & \overline{B} & B \\ \cline{3-3} \overline{A} & 0 & \multicolumn{1}{|c|}{x} \\ A & x & \multicolumn{1}{|c|}{1} \\ \cline{3-3} \end{array}

So, the disambiguation / simplification is left up to the user to choose, but yes, it still comes up with a minimal sum of products answer, and doesn’t try to incorporate both bits into a more complex logic operation.

Other Notes

The act of turning a truth table into a logical expression is called logical synthesis, if you want to read more along those lines.

You might be wondering: since there is a sum of products form, is there also a product of sums form? There is in fact, and you can get that form from Karnaugh maps as well. It may result in a more optimal logical expression. More info on that here: Is Karnaugh Map possible for Maxterms?.

You might be tempted to bring a higher number of variables into the mix. Be warned… adding a 5th variable makes the Karnaugh map into a 4x4x2 3d shape. Adding a 6th variable makes it into a 4x4x4 cube. Adding a 7th variable makes it into a 4x4x4x2 hypercube, and the pattern continues.

For higher numbers of inputs, people will often use a different algorithm instead, that I hope to write a post on before too long. You can read about it here: Wikipedia: Quine–McCluskey algorithm

Lastly, you might be wondering: what do I do if I have M input bits, and N output bits? How can I make a circuit or a set of instructions to generate a minimal logical expression to encompass that?

Well, one simple way is to handle each bit separately and have N Karnaugh maps, each having M variables. A problem there though is that computers do operations on multiple bits at the same time with most operations, so having each bit do its calculations without considering sharing instructions with another bit leaves some efficiency on the table.

I’m not sure of any better algorithms currently, but I’ve asked on stack exchange so there may be some more info there by the time you read this:
Algorithms for logical synthesis of multiple output bits?

O(1) Data Lookups With Minimal Perfect Hashing

Hash tables are great in that you can hash a key and then use that hash as an index into an array to get the information associated with that key. That is very fast, so long as you use a fast hash function.

The story doesn’t end there though because hash functions can have collisions – multiple things can hash to the same value even though they are different. This means that in practice you have to do things like store a list of objects per hash value, and when doing a lookup, you need to do a slower full comparison against each item in the bucket to look for a match, instead of being able to only rely on the hash value comparisons.

What if our hash function didn’t have collisions though? What if we could take in N items, hash them, and get 0 to N-1 as output, with no collisions? This post will talk about how to make that very thing happen, with simple sample C++ code as well, believe it or not!

Minimal Perfect Hashing

Perfect hashing is a hash function which has no collisions. You can hash N items, and you get out N different hash values with no collisions. The number of items being hashed has to be smaller than or equal to the possible values your hash can give as output though. For instance if you have a 4 bit hash which gives 16 possible different hash values, and you hash 17 different items, you are going to have at least one collision, it’s unavoidable. But, if you hash 16 items or fewer, it’s possible in that situation that you could have a hash which had no collisions for the items you hashed.

Minimal perfect hashing takes this a step further. Not only are there no collisions, but when you hash N items, you get 0 to N-1 as output.

You might imagine that this is possible – because you could craft limitless numbers of hashing algorithms, and could pass any different salt value to it to change the output values it gives for inputs – but finding the specific hashing algorithm, and the specific salt value(s) to use sounds like a super hard brute force operation.

The method we are going to talk about today is indeed brute force, but it cleverly breaks the problem apart into smaller, easier to solve problems, which is pretty awesome. It’s actually a pretty simple algorithm too.

Here is how you create a minimal perfect hash table:

  1. Hash the items into buckets – there will be collisions at this point.
  2. Sort the buckets from most items to fewest items.
  3. Find a salt value for each bucket such that when all items in that bucket are hashed, they claim only unclaimed slots. Store this array of salt values for later; it will be needed when doing a data lookup.
  4. If a bucket only has one item, instead of searching for a salt value, you can just find an unclaimed index and store -(index+1) into the salt array.

Once you have your minimal perfect hash calculated, here is how you do a lookup:

  1. Hash the key, and use that hash to find what salt to use.
  2. If the salt is positive (or zero), hash the key again using the salt. The result of that hash will be an index into the data table.
  3. If the salt is negative, take the absolute value and subtract one to get the index in the data table.
  4. Since it’s possible the key being looked for isn’t in the table, compare the key with the key stored at that index in the table. If they match, return the data at that index. If they don’t match, the key was not in the table.

Pretty simple isn’t it?
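In terms of the sample code further below, using the table looks something like this (just a usage sketch; CPerfectHashTable and its Calculate and GetValue functions are defined in the Example Code section):

// usage sketch - relies on the CPerfectHashTable class defined below
CPerfectHashTable<int> table;
CPerfectHashTable<int>::TKeyValueVector data;
data.push_back(std::make_pair(std::string("apple"), 1));
data.push_back(std::make_pair(std::string("banana"), 2));
table.Calculate(data);                 // slow: brute force searches for salts
int* value = table.GetValue("apple");  // fast: at most two hash evaluations
if (value)
    printf("apple -> %i\n", *value);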

Algorithm Characteristics

This minimal perfect hash algorithm is set up to be slow to generate, but fast to query. This makes it great for situations where the keys are static and unchanging, but you want fast lookup times – like for instance loading a data file for use in a game.

Interestingly though, while you can’t add new keys, you could make it so you could delete keys. You would just remove the key / value pair from the data set, and then when doing a lookup you’d find an empty slot.

Also, there is nothing about this algorithm that says you can’t modify the data associated with the keys, at runtime. Modifying the data associated with a key doesn’t affect where the key/value pair is stored in the table, so you can modify the data all you want.

If you wanted to be able to visit the items in a sorted order, you could add a constraint when searching for the salt values: not only must the items in each bucket map to unclaimed slots, they must map to the specific slots that keep them in sorted order. That would increase the time it took to generate the table, and increase the chances that there was no valid solution for any salt values used, but it is possible if you desire being able to visit the items in some sorted order.

Interestingly, the generation time of the minimal perfect hash apparently grows linearly with the number of items it acts on. That makes it scale well. On my own computer for instance, I am able to generate the table for 100,000 items in about 4.5 seconds.

Also, in my implementation, if you have N items, it has the next lower power of two number of salt values. You could decrease the number of salt values used if you wanted to use less memory, but that would again come at the cost of increased time to generate the table, as well as increase the chances that there was no valid solution for any salt values used.

Example Code

Below is a simple implementation of the algorithm described.

The main point of the code (besides functionality) is readability so it isn’t optimized as well as it could be, but still runs very fast (100,000 items processed in about 4.5 seconds on my machine). Debug is quite a bit slower than release for me though – I gave up on those same 100,000 items after a few minutes running in debug.

The code below uses MurmurHash2, but you could drop in whatever hashing algorithm you wanted.

The data file for this code is words.txt and comes to us courtesy of English Wordlists.

#include <stdio.h>   // printf
#include <string.h>  // strlen
#include <vector>
#include <algorithm>
#include <assert.h>
#include <fstream>
#include <string>

unsigned int MurmurHash2(const void * key, int len, unsigned int seed);

//=================================================================================
template <typename VALUE>
class CPerfectHashTable {
public:

    typedef std::pair<std::string, VALUE> TKeyValuePair;
    typedef std::vector<TKeyValuePair> TKeyValueVector;
    struct SKeyValueVectorBucket {
        TKeyValueVector m_vector;
        size_t          m_bucketIndex;
    };
    typedef std::vector<struct SKeyValueVectorBucket> TKeyValueVectorBuckets;

    // Create the perfect hash table for the specified data items
    void Calculate (const TKeyValueVector& data) {

        // ----- Step 1: hash each data item and collect them into buckets.
        m_numItems = data.size();
        m_numBuckets = NumBucketsForSize(m_numItems);
        m_bucketMask = m_numBuckets - 1;
        m_salts.resize(m_numBuckets);
        m_data.resize(m_numItems);
        TKeyValueVectorBuckets buckets;
        buckets.resize(m_numBuckets);

        for (size_t i = 0; i < m_numBuckets; ++i)
            buckets[i].m_bucketIndex = i;

        for (const TKeyValuePair& pair : data) {
            size_t bucket = FirstHash(pair.first.c_str());
            buckets[bucket].m_vector.push_back(pair);
        }

        // ----- Step 2: sort buckets from most items to fewest items
        std::sort(
            buckets.begin(),
            buckets.end(),
            [](const SKeyValueVectorBucket& a, const SKeyValueVectorBucket& b) {
                return a.m_vector.size() > b.m_vector.size();
            }
        );

        // ----- Step 3: find salt values for each bucket such that when all items
        // are hashed with their bucket's salt value, that there are no collisions.
        // Note that we can stop when we hit a zero sized bucket since they are sorted
        // by length descending.
        std::vector<bool> slotsClaimed;
        slotsClaimed.resize(m_numItems);
        for (size_t bucketIndex = 0, bucketCount = buckets.size(); bucketIndex < bucketCount; ++bucketIndex) {
            if (buckets[bucketIndex].m_vector.size() == 0)
                break;
            FindSaltForBucket(buckets[bucketIndex], slotsClaimed);
        }
    }

    // Look up a value by key.  Get a pointer back.  null means not found.
    // You can modify data if you want, but you can't add/remove/modify keys without recalculating.
    VALUE* GetValue (const char* key) {

        // do the first hash lookup and get the salt to use
        size_t bucket = FirstHash(key);
        int salt = m_salts[bucket];

        // if the salt is negative, its absolute value minus 1 is the index to use.
        size_t dataIndex;
        if (salt < 0)
            dataIndex = (size_t)((salt * -1) - 1);
        // else do the second hash lookup to get where the key value pair should be
        else
            dataIndex = MurmurHash2(key, strlen(key), (unsigned int)salt) % m_data.size();

        // if the keys match, we found it, else it doesn't exist in the table
        if (m_data[dataIndex].first.compare(key) == 0)
            return &m_data[dataIndex].second;
        return nullptr;
    }

private:

    unsigned int FirstHash (const char* key) {
        return MurmurHash2(key, strlen(key), 435) & m_bucketMask;
    }

    void FindSaltForBucket (const SKeyValueVectorBucket& bucket, std::vector<bool>& slotsClaimed) {

        // if the bucket size is 1, instead of looking for a salt, just encode the index to use in the salt.
        // store it as (index+1)*-1 so that we can use index 0 too.
        if (bucket.m_vector.size() == 1) {
            for (size_t i = 0, c = slotsClaimed.size(); i < c; ++i)
            {
                if (!slotsClaimed[i])
                {
                    slotsClaimed[i] = true;
                    m_salts[bucket.m_bucketIndex] = (int)(i + 1) * -1;
                    m_data[i] = bucket.m_vector[0];
                    return;
                }
            }
            // we shouldn't ever get here
            assert(false);
        }

        // find the salt value for the items in this bucket that cause these items to take
        // only unclaimed slots
        for (int salt = 0; ; ++salt) {
            // if salt ever overflows to a negative number, that's a problem.
            assert(salt >= 0);
            std::vector<size_t> slotsThisBucket;
            bool success = std::all_of(
                bucket.m_vector.begin(),
                bucket.m_vector.end(),
                [this, &slotsThisBucket, salt, &slotsClaimed](const TKeyValuePair& keyValuePair) -> bool {
                    const char* key = keyValuePair.first.c_str();
                    unsigned int slotWanted = MurmurHash2(key, strlen(key), (unsigned int)salt) % m_numItems;
                    if (slotsClaimed[slotWanted])
                        return false;
                    if (std::find(slotsThisBucket.begin(), slotsThisBucket.end(), slotWanted) != slotsThisBucket.end())
                        return false;
                    slotsThisBucket.push_back(slotWanted);
                    return true;
                }
            );

            // When we find a salt value that fits our constraints, remember the salt
            // value and also claim all the buckets.
            if (success)
            {
                m_salts[bucket.m_bucketIndex] = salt;
                for (size_t i = 0, c = bucket.m_vector.size(); i < c; ++i)
                {
                    m_data[slotsThisBucket[i]] = bucket.m_vector[i];
                    slotsClaimed[slotsThisBucket[i]] = true;
                }
                return;
            }
        }
    }

    static size_t NumBucketsForSize (size_t size) {
        // returns how many buckets should be used for a specific number of elements.
        // Just uses the power of 2 lower than the size, or 1, whichever is bigger.
        if (!size)
            return 1;

        size_t ret = 1;
        size = size >> 1;
        while (size) {
            ret = ret << 1;
            size = size >> 1;
        }
        return ret;
    }

    // When doing a lookup, a first hash is done to find what salt to use
    // for the second hash.  This table stores the salt values for that second
    // hash.
    std::vector<int> m_salts;

    // NOTE: this stores both the key and the value.  This is to handle the
    // situation where a key is searched for which doesn't exist in the table.
    // That key will still hash to some valid index, so we need to detect that
    // it isn't the right key for that index.  If you are never going to look for
    // nonexistent keys, then you can "throw away the keys" and only store the
    // values.  That can be a nice memory savings.
    std::vector<TKeyValuePair> m_data;

    // useful values
    size_t m_numItems;
    size_t m_numBuckets;
    size_t m_bucketMask;
};

// MurmurHash code was taken from https://sites.google.com/site/murmurhash/
//=================================================================================
// MurmurHash2, by Austin Appleby
 
// Note - This code makes a few assumptions about how your machine behaves -
 
// 1. We can read a 4-byte value from any address without crashing
// 2. sizeof(int) == 4
 
// And it has a few limitations -
 
// 1. It will not work incrementally.
// 2. It will not produce the same results on little-endian and big-endian
//    machines.
 
unsigned int MurmurHash2 ( const void * key, int len, unsigned int seed )
{
    // 'm' and 'r' are mixing constants generated offline.
    // They're not really 'magic', they just happen to work well.
 
    const unsigned int m = 0x5bd1e995;
    const int r = 24;
 
    // Initialize the hash to a 'random' value
 
    unsigned int h = seed ^ len;
 
    // Mix 4 bytes at a time into the hash
 
    const unsigned char * data = (const unsigned char *)key;
 
    while(len >= 4)
    {
        unsigned int k = *(unsigned int *)data;
 
        k *= m; 
        k ^= k >> r; 
        k *= m; 
         
        h *= m; 
        h ^= k;
 
        data += 4;
        len -= 4;
    }
     
    // Handle the last few bytes of the input array
 
    switch(len)
    {
    case 3: h ^= data[2] << 16;
    case 2: h ^= data[1] << 8;
    case 1: h ^= data[0];
            h *= m;
    };
 
    // Do a few final mixes of the hash to ensure the last few
    // bytes are well-incorporated.
 
    h ^= h >> 13;
    h *= m;
    h ^= h >> 15;
 
    return h;
}


//=================================================================================
void WaitForEnter ()
{
    printf("Press Enter to quit");
    fflush(stdin);
    getchar();
}

//=================================================================================
int main (int argc, char **argv)
{
    // Read the data entries from a file.  Use the line as the key, and the line
    // number as the data.  Limit it to 100,000 entries.
    printf("Loading Data...");
    CPerfectHashTable<int> table;
    decltype(table)::TKeyValueVector data;
    std::ifstream file("words.txt");
    std::string str;
    int i = 0;
    while (std::getline(file, str) && i < 100000)
    {
        data.push_back(std::make_pair(str, i));
        ++i;
    }
    printf("Donenn");

    // make the perfect hash table
    printf("Generating Minimal Perfect Hash Table...");
    table.Calculate(data);
    printf("Donenn");

    // Verify results
    printf("Verifying results...");
    for (decltype(table)::TKeyValuePair& keyValuePair : data) {
        int* value = table.GetValue(keyValuePair.first.c_str());
        assert(value != nullptr);
        if (value == nullptr)
        {
            printf("Error, could not find data for key "%s"n", keyValuePair.first.c_str());
            WaitForEnter();
            return 0;
        }
        assert(*value == keyValuePair.second);
        if (*value != keyValuePair.second)
        {
            printf("  table["%s"] = %in", keyValuePair.first.c_str(), *value);
            printf("Incorrect value detected, should have gotten %i!n", keyValuePair.second);
            WaitForEnter();
            return 0;
        }
    }
    printf("Donenn");

    WaitForEnter();
    return 0;
}

Links & More

I learned the details of this algorithm from this page: Steve Hanov’s Blog: Throw away the keys: Easy, Minimal Perfect Hashing, after hearing about the technique mentioned occasionally by colleagues.

There are other ways to do minimal perfect hashing however. For instance give this a read: Minimal Perfect Hashing

One place that method is better than this one is that when doing a lookup, it only has to hash the key once instead of twice. That single hash is used to combine the results of two table lookups, and since the two lookups are not dependent on each other, they can be re-ordered or happen concurrently, which makes it faster on modern CPUs.

Apparently there are also some techniques for generating the minimal perfect hash on a large number of items by breaking them apart into smaller sets, which can then be parallelized across threads.

I also wanted to mention that a large part of the in memory representation of this data structure can come from storing the keys along with the data, to verify that you have indeed found the item you are looking for after doing the O(1) lookup. If you are in a situation where you are only ever going to be searching for keys that you know are in the table, you can actually forgo storing the keys, to bring the memory requirements down a significant amount.

Also, the example code implements this as a hash table, but you could also use this as a set, if you wanted fast membership tests. You could either store the keys, to be able to resolve when things were not part of the set, or you could make a hash table of string to bool, where the bool specifies whether something is in the set. Which one is better depends on your specific needs and whether you plan to search for unknown keys or not.

Lastly, as a byproduct of using a data structure like this, you can get a unique ID per key, which is the index that it appears in the data array. This can be super helpful if you have something like a list of filenames, where you would rather work with integers instead of specific file names, since integers are quicker and easier to compare, copy around, etc.

You could even make this data structure support lookups by either key or by unique id. This way, you could do a by key lookup the first time you needed to find something, and then could store off the ID to get it by ID from then on for a faster lookup. Heck, you could even do all your lookups “offline” (at tool time, not when the app is running) and for instance convert all file references in data files to be each file’s unique ID instead. Then, you could toss out the keys of your data structure, only storing an array of data, and using the unique file ID to look into that table whenever you wanted to read or write meta data associated with that file. That would make lookups faster, and also decrease the memory requirements of memory resident data files.

It’s pretty cool stuff, and a pretty useful algorithm. Hopefully you find it useful as well (:

Update – 12/22/15

Interestingly, this post has had something like 15,000 readers since it was posted. That is by far the most read post on this blog 😛

Anyways, I wanted to add some more info I’ve found recently.

Here are three tools for doing minimal perfect hashing that are very likely to give you better results than the algorithm I describe above:

Here’s a conversation talking about gperf and the alternative applications, and pros and cons for each:
Stack Overflow: Making a perfect hash (all consecutive buckets full), gperf or alternatives?

Here’s a research paper on gperf by Doug Schmidt: GPERF – A Perfect Hash Function Generator

I had a thought that maybe there was some play here by using “logical synthesis” to come up with some algorithm to map the inputs (the keys of the hash table) to the outputs (collision free output indices).

I started looking into Karnaugh maps and then the Quine–McCluskey algorithm, and then espresso and espresso-exact (mincov). Where the first two things are decent at solving multi bit input to single bit output, the second two things are decent at solving multi bit input to multi bit output, allowing operations to be shared among bits.

While I haven’t found anyone using those specific algorithms to solve the problem, people have tried, and definitely are still trying, to generate collision free code without lookups. From what I’ve read so far, it sounds like such a function takes a lot longer to find, and also runs more slowly in practice, than a less perfect solution which has lookups.

Either way, this is still an active area of research, and plenty of folks are working on it so I’m going to leave it to them.

I also sort of have the feeling that if you are in need of minimal perfect hashing, you may be “doing it wrong”. For instance, if you are at all able to, you are probably better off having a pre-agreed set of unique IDs per “thing” you want to look up. These unique IDs can be used directly as array indices for the magical always O(1) lookup that minimal perfect hashing is going for, and that is actually a quicker lookup in practice since you don’t need to jump through special hoops to calculate the array index.

The only exceptions I can think of are:

  1. Real world requirements and not being able to reach the ideal situation – this is the boat I’m in right now. Ideally, systems would be asking about things by an integer ID, but I can’t get there in a single step, so the perfect hashing is a temporary bridge until I can get there.
  2. Even with IDs, sparse arrays are still problematic. You may have an ID per file that could be asked about, but say that you have 1,000,000 files, but you want to make an array of data for only 10,000 of them. How do you take the unique ID of a file and do a lookup into that smaller table? Minimal perfect hashing seems to be useful in this situation, but there may be other decent or comparable alternatives if you are already using integer IDs.

Hiding a Lookup Table in a Modulus Operation

Lookup tables are a tool found in every programmer’s tool belt.

Lookup tables let you pre-calculate a complex calculation in advance, store the results in a table (an array), and then during performance critical parts of your program, you access that table to get quick answers to the calculations, without having to do the complex calculation on the fly.

In this post I’ll show a way to embed a lookup table inside of a single (large) number, where you extract values from that lookup table by taking a modulus of that number with different, specific values.

This technique is slower and takes more memory than an actual lookup table, but it’s conceptually interesting, so I wanted to share.

Also, I stumbled on this known technique while working on my current paper. The paper will make this technique a bit more practical, and I’ll share more info as soon as I am able, but for now you can regard this as a curiosity 😛

Onto the details!

1 Bit Input, 1 Bit Output: Pass Through

Let’s learn by example and start with a calculation that takes in an input bit, and gives that same value for an output bit. It’s just a 1 bit pass through lookup table.

\begin{array}{c|c} \text{Input} & \text{Output} \\ \hline 0 & 0 \\ 1 & 1 \\ \end{array}

To be able to convert that to something we can decode with modulus we have to solve the following equations:

x \% k_0 = 0 \\ x \% k_1 = 1

x is the number that represents our lookup table. k_0 and k_1 are the values that we modulus x against to get our desired outputs out.

It looks as if we have two equations and three unknowns – which would be unsolvable – but in reality, x is the only unknown. The k values can be whatever values it takes to make the equations true.

I wrote about how to solve equations like these in a previous post: Solving Simultaneous Congruences (Chinese Remainder Theorem).

You can also use this Chinese remainder theorem calculator, which is handy: Chinese Remainder Theorem Calculator

The short answer here is that the k values can be ANY numbers, so long as they are pairwise co-prime to each other – AKA each pair of them has a greatest common divisor of 1.

If we pick 3 and 4 for k0 and k1, then using the Chinese remainder theorem we find that x can equal 9 and the equations are true. Technically the answer is 9 mod 12, so 21, 33, 45 and many other numbers are also valid values of x, but we are going to use the smallest answer to keep things smaller, and more manageable.

So, in this case, the value representing the lookup table would be 9. If you wanted to know what value it gave as output when you plugged in the value 0, you would modulus the lookup table (9) against k0 (3) to get the output. If you wanted to know what value it gave as output when you plugged in the value 1, you would modulus the lookup table (9) against k1 (4) to get the output. The table below shows that it passes through the value in both cases like it should:

\begin{array}{c|c|c|c} \text{Input} & \text{Symbolic} & \text{Numeric} & \text{Output} \\ \hline 0 & x \% k_0 & 9 \% 3 & 0\\ 1 & x \% k_1 & 9 \% 4 & 1\\ \end{array}
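
Here is that decode as a tiny, runnable sketch, using the x and k values from above, just to show how cheap the lookup side of this technique is:

#include <stdio.h>

int main ()
{
    const int x = 9;            // the number encoding the lookup table
    const int k[2] = { 3, 4 };  // k0 and k1, pairwise co-prime
    for (int input = 0; input < 2; ++input)
        printf("input %i -> output %i\n", input, x % k[input]);
    return 0;
}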

1 Bit Input, 1 Bit Output: Not Gate

Let’s do something a little more interesting. Let’s make the output bit be the reverse of the input bit. The equations we’ll want to solve are this:

x \% k_0 = 1 \\ x \% k_1 = 0

We can use 3 and 4 for k0 and k1 again if we want to. Using the Chinese remainder theorem to solve the equations gives us a value of 4 for x. Check the truth table below to see how this works:

\begin{array}{c|c|c|c} \text{Input} & \text{Symbolic} & \text{Numeric} & \text{Output} \\ \hline 0 & x \% k_0 & 4 \% 3 & 1\\ 1 & x \% k_1 & 4 \% 4 & 0\\ \end{array}

1 Bit Input, 1 Bit Output: Output Always 1

What if we wanted the output bit to always be 1 regardless of input bit?

x \% k_0 = 1 \\ x \% k_1 = 1

Using 3 and 4 for our k values again, we solve and get a value of 1 for x. Check the truth table to see it working below:

\begin{array}{c|c|c|c} \text{Input} & \text{Symbolic} & \text{Numeric} & \text{Output} \\ \hline 0 & x \% k_0 & 1 \% 3 & 1\\ 1 & x \% k_1 & 1 \% 4 & 1\\ \end{array}

Hopefully one bit input to one bit output makes sense now. Let’s move on (:

2 Bit Input, 1 Bit Output: XOR Gate

Things get a little more interesting when we bump the number of input bits up to 2. If we want to make a number which represents XOR, we now have 4 equations to solve.

x \% k_{00} = 0 \\ x \% k_{01} = 1 \\ x \% k_{10} = 1 \\ x \% k_{11} = 0

In general we will have 2^N equations, where N is the number of input bits.

You might have noticed that I use subscripts for k corresponding to the input bits that the key represents. This is a convention I’ve found useful when working with this stuff. Makes it much easier to see what’s going on.

Now with four equations, we need 4 pairwise coprime numbers – no number has a common factor with another number besides 1.

Let’s pull them out of the air. Umm… 3, 4, 5, 7

Not too hard with only two bits of input, but you can see how adding input bits makes things a bit more complex. If you wanted to make something that took in two 16 bit numbers as input for example, you would need 2^32 co-prime numbers, since there is a total of 32 bits of input!

When we solve those four equations, we get a value of 21 for x.

Notice how x is larger now that we have more input bits? That is another added complexity as you add more input bits. The number representing your program can get very, very large, and require you to use “multi precision integer” math libraries to store and decode the programs, when the numbers get larger than what can be held in a 64 bit int.

Boost has a decent library for this, check out boost::multiprecision::cpp_int, it’s what I use. You can download boost from here: http://www.boost.org/doc/libs/1_59_0/more/getting_started/windows.html

Anyhow, let’s check the truth table to see if our values work:

\begin{array}{c|c|c|c} \text{Input} & \text{Symbolic} & \text{Numeric} & \text{Output} \\ \hline 00 & x \% k_{00} & 21 \% 3 & 0 \\ 01 & x \% k_{01} & 21 \% 4 & 1 \\ 10 & x \% k_{10} & 21 \% 5 & 1 \\ 11 & x \% k_{11} & 21 \% 7 & 0 \end{array}

Woot, it worked.

2 Bit Input, 2 Bit Output: OR, AND

What happens when we add another bit of output? Basically we just treat each output bit as its own lookup table. This means that if we have two output bits, we will have two numbers representing our program (one for each bit), and that this is true regardless of how many input bits we have.

Let’s make the left output bit (x_0 ) be the OR of the input bits and the right output bit (x_1 ) be the AND of the input bits.

That give us these two sets of equations to solve:

x_0 \% k_{00} = 0 \\ x_0 \% k_{01} = 1 \\ x_0 \% k_{10} = 1 \\ x_0 \% k_{11} = 1 \\ \\ x_1 \% k_{00} = 0 \\ x_1 \% k_{01} = 0 \\ x_1 \% k_{10} = 0 \\ x_1 \% k_{11} = 1 \\

We can use the same coprime numbers for our k values as we used in the last section (3,4,5,7). Note that we use the same k values in each set of equations. This is intentional and required for things to work out!

If we solve each set of equations we get 141 for x0, and 120 for x1.

Let’s see if that worked:

\begin{array}{c|c|c|c} \text{Input} & \text{Symbolic} & \text{Numeric} & \text{Output} \\ \hline 00 & x_0 \% k_{00}, x_1 \% k_{00} & 141 \% 3, 120 \% 3 & 00 \\ 01 & x_0 \% k_{01}, x_1 \% k_{01} & 141 \% 4, 120 \% 4 & 10 \\ 10 & x_0 \% k_{10}, x_1 \% k_{10} & 141 \% 5, 120 \% 5 & 10 \\ 11 & x_0 \% k_{11}, x_1 \% k_{11} & 141 \% 7, 120 \% 7 & 11 \end{array}

Hey, it worked again. Neat!
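
If you want to verify those numbers in code, here is a quick sketch that decodes both output bits for every input, using the x and k values from above:

#include <stdio.h>

int main ()
{
    const int x0 = 141;               // the encoded table for the OR bit
    const int x1 = 120;               // the encoded table for the AND bit
    const int k[4] = { 3, 4, 5, 7 };  // one co-prime key per input value
    for (int input = 0; input < 4; ++input)
        printf("input %i%i -> output %i%i\n",
            (input >> 1) & 1, input & 1,    // the two input bits
            x0 % k[input], x1 % k[input]);  // the OR bit and the AND bit
    return 0;
}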

Example Code

Now that we have the basics worked out, here is some sample code.

The lookup table takes in 8 bits as input, mapping 0..255 to 0..2pi, and gives the sine of that value as output in a float. So it has 8 bits of input and 32 bits of output.

#include <stdio.h>    // printf
#include <math.h>     // sin
#include <array>      // std::array
#include <algorithm>  // std::all_of, std::fill
#include <iostream>   // std::cout
#include <vector>
#include <boost/multiprecision/cpp_int.hpp>
#include <stdint.h>
#include <string.h>
#include <memory>

typedef boost::multiprecision::cpp_int TINT;
typedef std::vector<TINT> TINTVec;

const float c_pi = 3.14159265359f;

//=================================================================================
void WaitForEnter ()
{
    printf("nPress Enter to quit");
    fflush(stdin);
    getchar();
}

//=================================================================================
static TINT ExtendedEuclidianAlgorithm (TINT smaller, TINT larger, TINT &s, TINT &t)
{
    // make sure A <= B before starting
    bool swapped = false;
    if (larger < smaller)
    {
        swapped = true;
        std::swap(smaller, larger);
    }

    // set up our storage for the loop.  We only need the last two values so will
    // just use a 2 entry circular buffer for each data item
    std::array<TINT, 2> remainders = { larger, smaller };
    std::array<TINT, 2> ss = { 1, 0 };
    std::array<TINT, 2> ts = { 0, 1 };
    size_t indexNeg2 = 0;
    size_t indexNeg1 = 1;

    // loop
    while (1)
    {
        // calculate our new quotient and remainder
        TINT newQuotient = remainders[indexNeg2] / remainders[indexNeg1];
        TINT newRemainder = remainders[indexNeg2] - newQuotient * remainders[indexNeg1];

        // if our remainder is zero we are done.
        if (newRemainder == 0)
        {
            // return our s and t values as well as the quotient as the GCD
            s = ss[indexNeg1];
            t = ts[indexNeg1];
            if (swapped)
                std::swap(s, t);

            // if t < 0, add the modulus divisor to it, to make it positive
            if (t < 0)
                t += smaller;
            return remainders[indexNeg1];
        }

        // calculate this round's s and t
        TINT newS = ss[indexNeg2] - newQuotient * ss[indexNeg1];
        TINT newT = ts[indexNeg2] - newQuotient * ts[indexNeg1];

        // store our values for the next iteration
        remainders[indexNeg2] = newRemainder;
        ss[indexNeg2] = newS;
        ts[indexNeg2] = newT;

        // move to the next iteration
        std::swap(indexNeg1, indexNeg2);
    }
}

//=================================================================================
void MakeKey (TINTVec &keys, TINT &keysLCM, size_t index)
{
    // if this is the first key, use 3
    if (index == 0)
    {
        keys[index] = 3;
        keysLCM = keys[index];
        return;
    }

    // Else start at the last number and keep checking odd numbers beyond that
    // until you find one that is co-prime.
    TINT nextNumber = keys[index - 1];
    while (1)
    {
        nextNumber += 2;
        if (std::all_of(
            keys.begin(),
            keys.begin() + index,
            [&nextNumber] (const TINT& v) -> bool
            {
                TINT s, t;
                return ExtendedEuclidianAlgorithm(v, nextNumber, s, t) == 1;
            }))
        {
            keys[index] = nextNumber;
            keysLCM *= nextNumber;
            return;
        }
    }
}

//=================================================================================
void CalculateLookupTable (
    TINT &lut,
    const std::vector<uint64_t> &output,
    const TINTVec &keys,
    const TINT &keysLCM,
    const TINTVec &coefficients,
    size_t bitMask
)
{
    // figure out how much to multiply each coefficient by to make it have the specified modulus residue (remainder)
    lut = 0;
    for (size_t i = 0, c = keys.size(); i < c; ++i)
    {
        // we either want this term to be 0 or 1 mod the key.  if zero, we can multiply by zero, and
        // not add anything into the bit value!
        if ((output[i] & bitMask) == 0)
            continue;

        // if 1, use chinese remainder theorem
        TINT s, t;
        ExtendedEuclidianAlgorithm(coefficients[i], keys[i], s, t);
        lut = (lut + ((coefficients[i] * t) % keysLCM)) % keysLCM;
    }
}

//=================================================================================
template <typename TINPUT, typename TOUTPUT, typename LAMBDA>
void MakeModulus (TINTVec &luts, TINTVec &keys, LAMBDA &lambda)
{
    // to keep things simple, input sizes are being constrained.
    // Do this in x64 instead of win32 to make size_t 8 bytes instead of 4
    static_assert(sizeof(TINPUT) < sizeof(size_t), "Input too large");
    static_assert(sizeof(TOUTPUT) < sizeof(uint64_t), "Output too large");

    // calculate some constants
    const size_t c_numInputBits = sizeof(TINPUT) * 8;
    const size_t c_numInputValues = 1 << c_numInputBits;
    const size_t c_numOutputBits = sizeof(TOUTPUT) * 8;

    // Generate the keys (coprimes)
    TINT keysLCM;
    keys.resize(c_numInputValues);
    for (size_t index = 0; index < c_numInputValues; ++index)
        MakeKey(keys, keysLCM, index);

    // calculate co-efficients for use in the chinese remainder theorem
    TINTVec coefficients;
    coefficients.resize(c_numInputValues);
    fill(coefficients.begin(), coefficients.end(), 1);
    for (size_t i = 0; i < c_numInputValues; ++i)
    {
        for (size_t j = 0; j < c_numInputValues; ++j)
        {
            if (i != j)
                coefficients[i] *= keys[j];
        }
    }

    // gather all the input to output mappings by permuting the input space
    // and storing the output for each input index
    std::vector<uint64_t> output;
    output.resize(c_numInputValues);
    union
    {
        TINPUT value;
        size_t index;
    } input;
    union
    {
        TOUTPUT value;
        size_t index;
    } outputConverter;

    for (input.index = 0; input.index < c_numInputValues; ++input.index)
    {
        outputConverter.value = lambda(input.value);
        output[input.index] = outputConverter.index;
    }

    // iterate through each possible output bit, since each bit is it's own lut
    luts.resize(c_numOutputBits);
    for (size_t i = 0; i < c_numOutputBits; ++i)
    {
        const size_t bitMask = size_t(1) << i;
        CalculateLookupTable(
            luts[i],
            output,
            keys,
            keysLCM,
            coefficients,
            bitMask
        );
    }
}

//=================================================================================
int main (int argc, char **argv)
{
    // Look up tables encodes each bit, keys is used to decode each bit for specific
    // input values.
    TINTVec luts;
    TINTVec keys;

    // this is the function that it turns into modulus work
    typedef uint8_t TINPUT;
    typedef float TOUTPUT;
    auto lambda = [] (TINPUT input) -> TOUTPUT
    {
        return sin(((TOUTPUT)input) / 255.0f * 2.0f * c_pi);
    };

    MakeModulus<TINPUT, TOUTPUT>(luts, keys, lambda);

    // show last lut and key to show what kind of numbers they are
    std::cout << "Last Lut: " << *luts.rbegin() << "n";
    std::cout << "Last Key: " << *keys.rbegin() << "n";

    // Decode all input values
    std::cout << "n" << sizeof(TINPUT) << " bytes input, " << sizeof(TOUTPUT) << " bytes outputn";
    for (size_t keyIndex = 0, keyCount = keys.size(); keyIndex < keyCount; ++keyIndex)
    {
        union
        {
            TOUTPUT value;
            size_t index;
        } result;

        result.index = 0;

        for (size_t lutIndex = 0, lutCount = luts.size(); lutIndex < lutCount; ++lutIndex)
        {
            TINT remainder = luts[lutIndex] % keys[keyIndex];
            size_t remainderSizeT = size_t(remainder);
            result.index += (remainderSizeT << lutIndex);
        }

        std::cout << "i:" << keyIndex << " o:" << result.value << "\n";
    }

    WaitForEnter();
    return 0;
}

Here is some output from the program. The first is to show what the last (largest) look up table and key look like. Notice how large the look up table number is!

Here it shows some sine values output from the program, using modulus against the large numbers calculated, to get the bits of the result out:

How to Get Lots of Pairwise Co-Prime Numbers?

You can generate a list of pairwise coprimes using brute force. Have an integer that you increment, and check if it’s pairwise co-prime to the existing items in the list. If it is, add it to the list! Rinse and repeat until you have as many as you want.
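
Here is a minimal sketch of that brute force search. It uses std::gcd from C++17 for the co-primality test; the MakeKey function in the example code above does the same job with the extended Euclidean algorithm:

#include <numeric>  // std::gcd (C++17)
#include <vector>

// brute force a list of 'count' pairwise co-prime numbers
std::vector<long long> MakeCoprimes (size_t count)
{
    std::vector<long long> ret;
    for (long long candidate = 2; ret.size() < count; ++candidate)
    {
        bool coprime = true;
        for (long long v : ret)
            coprime = coprime && (std::gcd(v, candidate) == 1);
        if (coprime)
            ret.push_back(candidate);
    }
    return ret;
}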

That is the most practical way to do it, but there are two other interesting ways I wanted to mention.

The first way is using Fermat numbers. The Fermat numbers are an infinite list of pairwise co-prime numbers and are calculated as 2^{2^n}+1 where n is an integer. Fermat numbers also have the benefit that you can get the nth item in the list without calculating the numbers that came before it. The only problem is that the numbers grow super huge very fast. The first 7 values are: 3, 5, 17, 257, 65537, 4294967297, 18446744073709551617. If Fermat numbers didn’t grow so quickly, they sure would be useful for things like this technique.

The second way is using something called Sylvester’s sequence. It too is an infinite list of pairwise co-prime numbers, and it too grows very large very quickly unfortunately. I also don’t believe there is a way to calculate the Nth item in the list directly. Every number is based on previous numbers, so you have to calculate them all from the beginning. No random access!

Beyond Binary

In this post I showed how to work in binary digits, but there is no reason why you have to encode single bits in the lookup tables.

Instead of encoding 0 or 1 in each modulus “lookup table”, you could also perhaps store base 10 numbers in the tables and have 0-9. Or, maybe you encode a byte per lookup table.

Encoding more than one bit effectively makes both your input and your output smaller, which helps the algorithm do more with less.

Your keys will need to be larger though, since the keys have to be larger than the value you plan to store, and your resulting lookup table will be a larger number as well. It might make the technique more worthwhile though.

I’ll leave that as an exercise for you. If you try it and find neat stuff, post a comment and let us know, or drop me an email or something. It’d be neat to hear if people find any practical usage cases of this technique 😛

The End, For Now!

I want to point out that increasing the number of input bits in this technique is a pretty expensive thing to do, but increasing the number of output bits is a lot cheaper. It kind of makes sense in a way if you think about it. Input bits add information from the outside world that must be dealt with, while output bits are just fluff that can easily be diluted or concentrated by adding or removing bits that are associated with, and calculated from, the input bits.

Another problem you may have noticed with this technique is that if you have a really expensive calculation that you are trying to “flatten” into modulus math like this, you have to run that calculation many, many times to know what values a lookup table would give you. You have to run it once per possible input to get every possible output. That is expected when making a lookup table, since you are paying a cost up front to make things faster later.

The paper I’m working on changes things a bit though. One of the things it does is make it so doing this technique only requires that you evaluate the function once, calculating all values simultaneously to give the end result that you can then do modulus against. It’s pretty cool IMO and I will share more details here as soon as I am able – and yes, I have actual working code that does that, believe it or not! I’m looking forward to being able to share it later on. Maybe someone will find some really cool usage case for it.

Quantum Computing For Programmers Part 2: Multiple Qubits

In part 1 (Quantum Computing For Programmers Part I: One Qubit) we looked at the basics of quantum computing and how to deal with a single qubit. Here we’ll talk about the interesting things that happen when you involve multiple qubits!

Multiple Qubit Probabilities & Possibilities

In the last post we used the analogy of a coin to describe a qubit. The coin can be either heads or tails, or flipping in the air, waiting to become either a heads or tails when it lands. The same is true of how a qubit works, where it can either be a 0 or a 1 or some superposition of both, only deciding which it is when observed. Furthermore, just like a real coin, there can be a bias towards becoming one value or the other.

We represented a coin by having a probability for each possible state (heads or tails), but represented a qubit by having an AMPLITUDE for each possible state (0 or 1), where you square an amplitude to get the probability.

Going back to the coin analogy, how would probabilities work if we had two coins?

The four possible outcomes with two coins are:
Heads/Heads
Heads/Tails
Tails/Heads
Tails/Tails

Each coin individually could either be sitting on the table heads or tails, or up in the air, waiting to become either a heads or a tails, with some probability of becoming a heads or a tails.

To figure out the probability of each possible outcome, you just multiply the probability of each coin’s outcome together.

For instance, let’s say that the first coin (coin A) is already sitting on the table as a heads, and the second coin (coin B) is flipping in the air, with a 1/3 chance of becoming heads and a 2/3 chance of becoming tails. Here is how you would calculate the probability of each possible state:

\begin{array}{c|c|c|c} \text{Outcome A/B} & \text{Coin A Probability} & \text{Coin B Probability} & \text{Outcome Probability (A*B)}\\ \hline heads / heads & 100\% & 33\% & 33\% \\ heads / tails & 100\% & 67\% & 67\% \\ tails / heads & 0\% & 33\% & 0\% \\ tails / tails & 0\% & 67\% & 0\% \\ \end{array}

Qubits actually work the same way! Using the same values as the coins, let’s say qubit A is a 0, and qubit B has a 1/3 chance of becoming a 0, and a 2/3 chance of becoming a 1. Converting those probabilities to amplitudes you could get A=[1,0], B=[\frac{1}{\sqrt{3}}, \frac{\sqrt{2}}{\sqrt{3}}].

\begin{array}{c|c|c|c|c} \text{Outcome AB} & \text{Qubit A Amplitude} & \text{Qubit B Amplitude} & \text{Outcome Amplitude(A*B)} & \text{Outcome Probability}\\ \hline 00 & 1 & 1/\sqrt{3} & 1/\sqrt{3} & 33\% \\ 01 & 1 & \sqrt{2}/\sqrt{3} & \sqrt{2}/\sqrt{3} & 67\% \\ 10 & 0 & 1/\sqrt{3} & 0 & 0\%\\ 11 & 0 & \sqrt{2}/\sqrt{3} & 0 & 0\% \\ \end{array}

Note that in both cases, the probabilities add up to 100% like you’d expect.

In the case of qubits, the resulting vector is [\frac{1}{\sqrt{3}}, \frac{\sqrt{2}}{\sqrt{3}}, 0, 0] , which is still normalized, and represents the amplitudes for the 4 possible states of those two qubits: 00, 01, 10 and 11.

When working with one qubit (or coin), there are 2 possible outcomes 0 or 1 (heads or tails). When working with two qubits (or coins), there are four possible outcomes 00, 01, 10, 11 (heads/heads, heads/tails, tails/heads, tails/tails). When working with three qubits or coins, there are eight possible outcomes: 000,001,010 … 111.

With both coins and qubits there are 2^N possibilities, where N is the number of qubits or coins you have.

If you had 8 qubits, you’d have to use a 256 dimensional vector to describe the possibilities, since 2^8 is 256. When performing quantum computing with 8 qubits, you only have to deal with the 8 qubits. When simulating quantum computing on a regular, classical computer, you have to deal with the 256 amplitudes. This kind of gives a glimpse at how quantum computers can be faster than regular computers at certain things. There is an economy of scale working against us on regular computers simulating quantum computing.

The method we used to combine the probabilities of single qubits into an amplitude vector representing multiple qubits is called the Kronecker product. That’s just a fancy way of saying we have to multiply everything from the first vector by everything from the second vector, to get a third vector that is bigger than the first two. You’ll see it represented like this: A \otimes B and while it’s similar to the “outer product” and even uses the same symbol (!), it gives a slightly different result versus if you did an outer product and then vectorized the matrix.

The Kronecker product of vectors works like this:

\begin{bmatrix} A_1 \\ A_2 \\ ... \\ A_M \\ \end{bmatrix} \otimes \begin{bmatrix} B_1 \\ B_2 \\ ... \\ B_N \\ \end{bmatrix} = \begin{bmatrix} A_1 B_1 \\ A_1 B_2 \\ ... \\ A_1 B_N \\ A_2 B_1 \\ A_2 B_2 \\ ... \\ A_2 B_N \\ ... \\ A_M B_1 \\ A_M B_2 \\ ... \\ A_M B_N \\ \end{bmatrix}
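
In code, the Kronecker product of two state vectors is just a nested loop: every value of the first vector multiplied by every value of the second. Here is a sketch using std::complex amplitudes, in the same spirit as the example code at the end of this post:

#include <complex>
#include <vector>

typedef std::vector<std::complex<float>> TStateVector;

// Kronecker product of two state vectors.  The result has a.size() * b.size() amplitudes.
TStateVector KroneckerProduct (const TStateVector& a, const TStateVector& b)
{
    TStateVector ret;
    ret.reserve(a.size() * b.size());
    for (const std::complex<float>& av : a)
        for (const std::complex<float>& bv : b)
            ret.push_back(av * bv);
    return ret;
}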

Let’s show an example involving two qubits.

The first qubit has a 75% chance of being heads so its amplitude is [\sqrt{3}/2,1/2]. The second qubit has a 2/3 chance of being heads so has an amplitude of [\sqrt{2}/\sqrt{3}, 1/\sqrt{3}].

To calculate the amplitude vector representing these two qubits, we do a kronecker product:
\begin{bmatrix} \cfrac{\sqrt{3}}{2} \\ \cfrac{1}{2} \\ \end{bmatrix} \otimes \begin{bmatrix} \cfrac{\sqrt{2}}{\sqrt{3}} \\ \cfrac{1}{\sqrt{3}} \\ \end{bmatrix} = \begin{bmatrix} \cfrac{\sqrt{3}}{2}\cfrac{\sqrt{2}}{\sqrt{3}} \\ \cfrac{\sqrt{3}}{2}\cfrac{1}{\sqrt{3}} \\ \cfrac{1}{2}\cfrac{\sqrt{2}}{\sqrt{3}} \\ \cfrac{1}{2}\cfrac{1}{\sqrt{3}} \\ \end{bmatrix}

If you simplify that, you get:
\begin{bmatrix} \cfrac{1}{\sqrt{2}} \\ \cfrac{1}{2} \\ \cfrac{1}{\sqrt{6}} \\ \cfrac{1}{2\sqrt{3}} \\ \end{bmatrix}

or horizontally for easier reading:
\begin{bmatrix} \cfrac{1}{\sqrt{2}} & \cfrac{1}{2} & \cfrac{1}{\sqrt{6}} & \cfrac{1}{2\sqrt{3}} & \end{bmatrix}

Squaring those values to get the probabilities of each state, we get the below, which you might notice adds up to 100%, meaning the vector is still normalized!
\begin{bmatrix} 50\% & 25\% & 17\% & 8\% \end{bmatrix}

This generalizes for larger numbers of qubits, so you could use the Kronecker product to combine a vector describing 4 qubits with a vector describing 3 qubits, to come up with a vector that describes 7 qubits (which would have 128 amplitudes in it, since 2^7 = 128!).

The kronecker product also generalizes to matrices, which we will talk about shortly.

To be able to work with multiple qubits in a quantum circuit, you need to represent all qubits involved in a single vector. Doing this, while being the correct thing to do, also allows entanglement to happen, which we will talk about next!

Entanglement – Simpler Than You Might Think!

Time to demystify quantum entanglement!

Quantum entanglement is the property whereby two qubits – which can be separated by any distance, such as several light years – can be observed and come up with the same value (0 or 1). Whether they are 0s or 1s is random, but they will both agree, when entangled in the way that causes them to agree (you could alternately entangle them to always disagree).

Interestingly, in 1964 John Bell proved that the mechanism that makes this happen is NOT that the qubits share information in advance. You can read more about that here – it’s near the end: Ars Technica – A tale of two qubits: how quantum computers work.

A common misconception is that this means that we can have faster than light (instantaneous) communication across any distance. That turns out not to be true, due to the No-Communication-Theorem (Wikipedia).

Interestingly though, you can use separated entangled qubits in separated quantum circuits to do something called Quantum Pseudo Telepathy (Wikipedia), which lets you do some things that would otherwise be impossible – like reliably winning a specially designed game that would otherwise be a game of chance.

I don’t yet understand enough about Quantum Pseudo Telepathy to see why it isn’t considered communication. I also have no idea how entanglement is actually “implemented” in the universe, but nobody seems to (or if they do, they aren’t sharing!). How can it be that two coins flipped on opposite sides of the galaxy can be guaranteed to both land with the same side facing up?

Despite those mysteries, the math is DEAD SIMPLE, so let me share it with you, it’ll take like one short paragraph. Are you ready?

Quantum computing works by manipulating the probabilities of the states of qubits. If you have two qubits, there are four possible sets of values when you observe them: 00, 01, 10, 11. If you use quantum computing to set the probabilities of the 01 and 10 states to a 0% chance, and set the probabilities of the 00 and 11 states to a 50% chance each, you’ve now entangled the qubits such that when you observe them, they will always agree, because you’ve gotten rid of the chances that they could ever DISAGREE. Similarly, if instead of setting the 01 and 10 states to 0%, you set the probability of the 00 and 11 states to 0%, you’d have entangled qubits which would always disagree when you observed them, because you’ve gotten rid of the chances that they could ever AGREE.

That’s all there is to it. Strange how simple it is, isn’t it? The table below shows how the only possible outcomes are that the qubits agree, but it is a 50/50 chance whether they are a 0 or a 1:

\begin{array}{c|c|c} \text{Outcome} & \text{Amplitude} & \text{Probability}\\ \hline 00 & 1/\sqrt{2} & 50\% \\ 01 & 0 & 0\% \\ 10 & 0 & 0\% \\ 11 & 1/\sqrt{2} & 50\% \\ \end{array}

Entanglement isn’t limited to just two qubits, you can entangle any number of qubits together.

Entanglement has a special mathematical meaning. If you can represent a state of a group of qubits by a kronecker product, they are not entangled. If you CAN’T represent a state of a group of qubits by a kronecker product, they ARE entangled. These two things – entanglement and lack of kronecker product factorability (made that term up) – are the same thing.

As an example, what two vectors could you use the kronecker product on to get the entangled two qubit state 1/\sqrt{2}(|00\rangle+|11\rangle) (or in vector form [1/\sqrt{2}, 0, 0, 1/\sqrt{2}])? You’ll find there aren’t any! That state is the entangled state where the two qubits will always have the same value when you observe them.
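
For two qubits there is a handy test for this, if you want to check states yourself: a state [a, b, c, d] can be written as a Kronecker product of two single qubit states if and only if a \cdot d = b \cdot c (a product state (p,q) \otimes (r,s) gives [pr, ps, qr, qs], and pr \cdot qs = ps \cdot qr always). Here is that test as a sketch, with a small tolerance for floating point error:

#include <complex>
#include <vector>

typedef std::vector<std::complex<float>> TStateVector;

// a two qubit state [a, b, c, d] is a product (unentangled) state if and only
// if a*d == b*c, so a nonzero "determinant" means the state is entangled
bool IsEntangled (const TStateVector& state)
{
    std::complex<float> det = state[0] * state[3] - state[1] * state[2];
    return std::abs(det) > 0.0001f;
}

Plugging in [1/\sqrt{2}, 0, 0, 1/\sqrt{2}] gives a \cdot d = 1/2 but b \cdot c = 0, so that state is entangled, which agrees with the argument above.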

Entangled qubits are nothing special in quantum circuits. You don’t have to take any special precautions when working with them. They are an interesting byproduct of quantum computing, and so basically are something neat that comes out of your quantum circuit, but they don’t require any extra thought when using them within your circuits. Don’t worry about them too much (:

Multi Qubit Gates

Let’s have a look at some multi qubit gates! (These are again from Wikipedia: Quantum Gate)

Swap Gate

Given two qubits, with possible states |00\rangle, |01\rangle, |10\rangle, |11\rangle, this gate swaps the amplitudes (probabilities) of |01\rangle,  |10\rangle and leaves the probabilities of the states |00\rangle and |11\rangle alone.

That might seem weird, but for unentangled qubits the probability of the |00\rangle state is the probability of the first qubit being 0 multiplied by the probability of the second qubit being 0. If the two qubits swap, that product stays the same, so the probability of the state is unaffected. It’s the same situation for the |11\rangle state.

Here’s the matrix for the swap gate:
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}

Square Root of Swap Gate

I don’t really understand a usage case for this gate myself, but it apparently does a halfway swap between two qubits. That once again modifies the probabilities of the |01\rangle, |10\rangle states, but leaves states |00\rangle and |11\rangle alone again, for the same reason as the swap gate.

Wikipedia says that if you have this gate, and the single qubit gates, that you can do universal quantum computation. In other words, you can build a fully functional quantum computer using just this and the single qubit gates.

Here’s the matrix:
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \frac{1}{2}(1+i) & \frac{1}{2}(1-i) & 0 \\ 0 & \frac{1}{2}(1-i) & \frac{1}{2}(1+i) & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}

Controlled Not Gate

The controlled not (CNOT) gate flips the value of the second qubit if the first qubit is true. This is a logic / flow control type of gate.

Interestingly, this gate is also useful for creating entangled qubits, which you’ll be able to see lower down!

Here is the matrix:
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ \end{bmatrix}

The controlled not gate is also referred to as the quantum XOR gate. To see why, check out the truth table below for this gate:
\begin{array}{c|c} \text{Input} & \text{Output} \\ \hline 00 & 00 \\ 01 & 01 \\ 10 & 11 \\ 11 & 10 \\ \end{array}

If you look at the right most bit of the outputs, you’ll notice that it’s the XOR of the two input bits!

All quantum gates need to be reversible, and having these two qubits be the output of a hypothetical quantum XOR gate allows that to happen. If you look at the right most bit of the inputs, you’ll notice that it is also the XOR of the two output bits. It’s bidirectional, which is kind of weird and kind of interesting 😛

Generalized Control Gate

You can actually convert any single qubit gate into a controlled gate. That makes a 2 qubit gate which only does work on the second qubit if the first qubit is true.

How you do that is you make a 4×4 identity matrix, and make the lower 2×2 sub-matrix into the single qubit matrix you want to use.

In other words, if your single qubit matrix is this:
\begin{bmatrix} U_{00} & U_{01} \\ U_{10} & U_{11} \\ \end{bmatrix}

Then the controlled version of the matrix would be this:
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & U_{00} & U_{01} \\ 0 & 0 & U_{10} & U_{11} \\ \end{bmatrix}

Pretty simple right?
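
Here is that construction in code, as a sketch that uses a plain row major array of complex values (the example code at the end of this post wraps matrices in a struct instead, but the idea is the same):

#include <array>
#include <complex>

typedef std::complex<float> TComplex;

// embed a single qubit gate U (2x2, row major) into the lower right of a 4x4
// identity matrix, making the controlled version of that gate
std::array<TComplex, 16> MakeControlledGate (const std::array<TComplex, 4>& u)
{
    std::array<TComplex, 16> ret = {};  // starts as all zeros
    ret[0]  = 1.0f;                     // row 0: 1 0 0 0
    ret[5]  = 1.0f;                     // row 1: 0 1 0 0
    ret[10] = u[0];  ret[11] = u[1];    // row 2: 0 0 U00 U01
    ret[14] = u[2];  ret[15] = u[3];    // row 3: 0 0 U10 U11
    return ret;
}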

Toffoli Gate

The Toffoli gate is a 3 qubit gate that is also known as the CCNOT gate or controlled controlled not gate. It flips the third qubit if the first two qubits are true.

Its matrix looks pretty uninteresting:
\begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ \end{bmatrix}

This is also known as the quantum AND gate. If you look at the truth table below, you’ll see that when the right most qubit of the input is 0, the right most qubit of the output is the AND of the two left qubits. We need the three bits of input mapping to three bits of output like this so that the gate is reversible.

\begin{array}{c|c} \text{Input} & \text{Output} \\ \hline 000 & 000 \\ 001 & 001 \\ 010 & 010 \\ 011 & 011 \\ 100 & 100 \\ 101 & 101 \\ 110 & 111 \\ 111 & 110 \\ \end{array}

Fredkin Gate

The Fredkin gate is a controlled swap gate. Here’s the matrix:
\begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ \end{bmatrix}

Just like we talked about how to make a generalized controlled gate acting on two qubits, you should be able to notice that this gate is just a specific version of a generalized controlled 3 qubit gate. You take an 8×8 identity matrix, and make the lower right 4×4 matrix be the 2 qubit gate that you want to add a control on. In this case, the lower right 4×4 matrix is just the swap gate we mentioned earlier in the post (:

Circuit To Entangle Qubits

The circuit to entangle qubits is pretty simple, so let’s start with that as the first quantum circuit we look at. It takes in two qubits. The first qubit is put through a Hadamard gate, and then both qubits are put through a controlled not gate. That circuit looks like this (image courtesy of Quantum Circuit Simulator):

The value you plug in for the second qubit determines what type of entanglement you get out. Setting the second qubit to 0, you will get entangled qubits which always agree when they are observed. Setting it to 1, you will get entangled qubits which always disagree when they are observed. The first qubit controls whether the phase of each state matches or mismatches. Note that these are the four Bell States (Wikipedia: Bell State).

\begin{array}{c|c|c} \text{Input} & \text{Output In Ket Notation} & \text{Output As Vector} \\ \hline 00 & \frac{1}{\sqrt{2}}(|00\rangle+|11\rangle) & [1/\sqrt{2},0,0,1/\sqrt{2}] \\ 01 & \frac{1}{\sqrt{2}}(|01\rangle+|10\rangle) & [0,1/\sqrt{2},1/\sqrt{2},0] \\ 10 & \frac{1}{\sqrt{2}}(|00\rangle-|11\rangle) & [1/\sqrt{2},0,0,-1/\sqrt{2}] \\ 11 & \frac{1}{\sqrt{2}}(|01\rangle-|10\rangle) & [0,1/\sqrt{2},-1/\sqrt{2},0]\\ \end{array}

In a quantum circuit, you can’t just apply a gate to an individual qubit though. You have to make the matrix of your gate such that it applies to all qubits, applying the “identity” matrix to the qubits you don’t want to be affected by the gate.

So, how do we make a matrix that applies the Hadamard gate to qubit 1, and identity to qubit 2? You use the kronecker product!

Since we want to apply the Hadamard matrix to qubit 1 and identity to qubit 2, we are going to calculate H \otimes I (If we wanted to apply the Hadamard gate to qubit 2 and identity to qubit 1 we would calculate I \otimes H instead).

H \otimes I =  1/\sqrt{2}* \begin{bmatrix} 1 & 1 \\ 1 & -1 \\ \end{bmatrix} \otimes \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ \end{bmatrix} = 1/\sqrt{2}* \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \\ \end{bmatrix}

If you notice, the result of the kronecker product is just every value in the left matrix, multiplied by every value in the right matrix. Basically, the result is a 2×2 grid of identity matrices, where each of those identity matrices is multiplied by the corresponding value from the same cell in the left matrix. Since the left matrix has a 1 in all cells except the lower right, the same is true of the result… it’s a positive identity matrix in each cell, except the lower right one, which is a negative identity matrix. Hopefully that makes sense, it’s a lot easier than it sounds…
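
Here is the Kronecker product for square matrices as a sketch, since that is the operation you need to build these circuit matrices (plain floats for brevity; using complex values works the same way):

#include <vector>

typedef std::vector<float> TMatrix;  // square, row major

// Kronecker product of an m x m matrix with an n x n matrix, giving an (m*n) x (m*n) matrix
TMatrix KroneckerProduct (const TMatrix& a, size_t m, const TMatrix& b, size_t n)
{
    TMatrix ret(m * n * m * n);
    for (size_t ay = 0; ay < m; ++ay)
    for (size_t ax = 0; ax < m; ++ax)
    for (size_t by = 0; by < n; ++by)
    for (size_t bx = 0; bx < n; ++bx)
        ret[(ay * n + by) * (m * n) + (ax * n + bx)] = a[ay * m + ax] * b[by * n + bx];
    return ret;
}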

The second gate in the quantum circuit is the CNOT gate. We can actually multiply the gate we just made by the CNOT gate to represent the full quantum circuit as a single matrix. This uses regular matrix multiplication.

1/\sqrt{2}* \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \\ \end{bmatrix} * \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ \end{bmatrix} = 1/\sqrt{2}* \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & -1 \\ 0 & 1 & -1 & 0 \\ \end{bmatrix} = \begin{bmatrix} 1/\sqrt{2} & 0 & 0 & 1/\sqrt{2} \\ 0 & 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 1/\sqrt{2} & 0 & 0 & -1/\sqrt{2} \\ 0 & 1/\sqrt{2} & -1/\sqrt{2} & 0 \\ \end{bmatrix}

Let’s plug some values into the circuit and see what comes out!

Let’s start by plugging in 00. Our input qubits are [1, 0] and [1, 0]. The kronecker product of those two qubit vectors is [1, 0, 0, 0]. Now let’s multiply that vector by the matrix of our quantum circuit.

\begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix} * \begin{bmatrix} 1/\sqrt{2} & 0 & 0 & 1/\sqrt{2} \\ 0 & 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 1/\sqrt{2} & 0 & 0 & -1/\sqrt{2} \\ 0 & 1/\sqrt{2} & -1/\sqrt{2} & 0 \\ \end{bmatrix} = \begin{bmatrix} 1/\sqrt{2} & 0 & 0 & 1/\sqrt{2} \end{bmatrix}

Comparing this to the table showing how our input maps to output, we got the right answer! If you plugged in the other values listed, you would get the rest of the bell state entanglements.

Unentangling Qubits

All gates are reversible, so all circuits are reversible. To get the reverse circuit for a given matrix, you just get the inverse of the matrix. Since quantum gates (and circuits) are unitary matrices, taking the inverse of one of these matrices just means taking the conjugate transpose of the matrix. In other words, you take the transpose of the matrix, and then just negate the imaginary component of any complex numbers. In this example, there are no imaginary numbers, so you just take the transpose of the matrix.
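
As a sketch, inverting a unitary circuit matrix looks like this, using the same row major layout as the Kronecker sketch above, but with complex values so the conjugate actually does something:

#include <complex>
#include <vector>

typedef std::vector<std::complex<float>> TCMatrix;  // square, row major

// the inverse of a unitary matrix is its conjugate transpose
TCMatrix ConjugateTranspose (const TCMatrix& m, size_t dim)
{
    TCMatrix ret(dim * dim);
    for (size_t y = 0; y < dim; ++y)
        for (size_t x = 0; x < dim; ++x)
            ret[x * dim + y] = std::conj(m[y * dim + x]);
    return ret;
}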

Since our circuit matrix is this:

\begin{bmatrix} 1/\sqrt{2} & 0 & 0 & 1/\sqrt{2} \\ 0 & 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 1/\sqrt{2} & 0 & 0 & -1/\sqrt{2} \\ 0 & 1/\sqrt{2} & -1/\sqrt{2} & 0 \\ \end{bmatrix}

That means that the inverse must be this, since this is the (conjugate) transpose.

\begin{bmatrix} 1/\sqrt{2} & 0 & 1/\sqrt{2} & 0 \\ 0 & 1/\sqrt{2} & 0 & 1/\sqrt{2} \\ 0 & 1/\sqrt{2} & 0 & -1/\sqrt{2} \\ 1/\sqrt{2} & 0 & -1/\sqrt{2} & 0 \\ \end{bmatrix}

Let’s try it out by putting the output from last section in and seeing what comes out the other side:

\begin{bmatrix} 1/\sqrt{2} & 0 & 0 & 1/\sqrt{2} \end{bmatrix} * \begin{bmatrix} 1/\sqrt{2} & 0 & 1/\sqrt{2} & 0 \\ 0 & 1/\sqrt{2} & 0 & 1/\sqrt{2} \\ 0 & 1/\sqrt{2} & 0 & -1/\sqrt{2} \\ 1/\sqrt{2} & 0 & -1/\sqrt{2} & 0 \\ \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix}

It worked! We got our original input back.

Making Matrices for More Complex Circuits

We talked about how to make single qubit gates apply to specific qubits by using the Kronecker product with identity to move the gate to the right location.

If you have 4 qubits and you want to apply some single qubit gate (say Hadamard) to the 3rd qubit, you would just do this to get the matrix:
I \otimes I \otimes H \otimes I
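Using the KroneckerProduct function and gate constants from the code listing later in this post, that matrix would be built like this (the Kronecker product is associative, so the nesting order doesn’t matter):

// I (x) I (x) H (x) I : apply Hadamard to the 3rd of 4 qubits
const SComplexMatrix hadamard3of4 =
    KroneckerProduct(
        KroneckerProduct(
            KroneckerProduct(c_identity2x2, c_identity2x2),
            c_hadamardGate),
        c_identity2x2);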

What if you want to do complex things with multi-qubit gates though? Like what if you want to do a controlled not gate where you want the 2nd qubit to flip its value if the 4th qubit is true?

The answer is actually pretty simple… you use swap gates to move the qubit values around so that they line up with the inputs of the gate, then you apply the gate, and finally undo all the swaps to get the qubit values back to their original positions.

We need to swap qubits so that the 4th qubit value is in the 1st qubit slot, and the 2nd qubit value is in the 2nd qubit slot. We can do that with the following swaps: swap qubits 3 and 4, then qubits 2 and 3, then qubits 1 and 2, and finally qubits 2 and 3 again.

Once we have done those swaps, we can do our controlled not, then do the swaps in reverse order to return the qubits to their original positions.

With those swaps in place, the controlled not’s control value is in qubit slot 1 and its target is in qubit slot 2, which is exactly what our 2 qubit CNOT gate expects. You’ll see the simpler version in circuit diagrams, with the control and target drawn directly on the wires they belong to, but the expanded version with explicit swaps is what you build when you need an actual matrix.

Below is the mathematical way that you would get the matrix representing this gate.

I is the identity matrix, S is the swap gate, and C is the controlled not gate.

M = \\ (I \otimes I \otimes S) * \\ (I \otimes S \otimes I) * \\ (S \otimes I \otimes I) * \\ (I \otimes S \otimes I) * \\ (C \otimes I \otimes I) * \\ (I \otimes S \otimes I) * \\ (S \otimes I \otimes I) * \\ (I \otimes S \otimes I) * \\ (I \otimes I \otimes S) \\

Code

Here is some simple C++ to show both the entanglement circuit and the more complicated controlled not gate we described.

#include <stdio.h>
#include <vector>
#include <complex>
#include <cmath>
#include <assert.h>

typedef std::complex<float> TComplex;

//=================================================================================
struct SComplexVector
{
public:
    TComplex& Get (size_t i) { return v[i]; }

    const TComplex& Get (size_t i) const { return v[i]; }

    size_t Size() const
    {
        return v.size();
    }

    void Resize (size_t s)
    {
        v.resize(s);
    }

    void Print () const
    {
        printf("[");
        for (size_t i = 0, c = v.size(); i < c; ++i)
        {
            if (i > 0)
                printf(", ");
            const TComplex& val = Get(i);
            if (val.imag() == 0.0f)
            {
                if (val.real() == 0.0f)
                    printf("0");
                else if (val.real() == 1.0f)
                    printf("1");
                else
                    printf("%0.2f", val.real());
            }
            else
                printf("%0.2f + %0.2fi", val.real(), val.imag());
        }
        printf("]n");
    }

    std::vector<TComplex> v;
};

//=================================================================================
struct SComplexMatrix
{
public:
    TComplex& Get (size_t x, size_t y)
    {
        return v[y*Size() + x];
    }

    const TComplex& Get (size_t x, size_t y) const
    {
        return v[y*Size() + x];
    }

    // For an MxM matrix, this returns M
    size_t Size () const
    {
        size_t ret = (size_t)std::sqrt((double)v.size());
        assert(ret*ret == v.size());
        return ret;
    }

    // For an MxM matrix, this sets M
    void Resize(size_t s)
    {
        v.resize(s*s);
    }

    void Print() const
    {
        const size_t size = Size();

        for (size_t y = 0; y < size; ++y)
        {
            printf("[");
            for (size_t x = 0; x < size; ++x)
            {
                if (x > 0)
                    printf(", ");
                
                const TComplex& val = Get(x, y);
                if (val.imag() == 0.0f)
                {
                    if (val.real() == 0.0f)
                        printf("0");
                    else if (val.real() == 1.0f)
                        printf("1");
                    else
                        printf("%0.2f", val.real());
                }
                else
                    printf("%0.2f + %0.2fi", val.real(), val.imag());
            }
            printf("]n");
        }
    }

    std::vector<TComplex> v;
};

//=================================================================================
static const SComplexVector c_qubit0 = { { 1.0f, 0.0f } };  // false aka |0>
static const SComplexVector c_qubit1 = { { 0.0f, 1.0f } };  // true aka |1>

// 2x2 identity matrix
static const SComplexMatrix c_identity2x2 =
{
    {
        1.0f, 0.0f,
        0.0f, 1.0f,
    }
};

// Given the states |00>, |01>, |10>, |11>, swaps the |01> and |10> state
// If swapping the probabilities of two qubits, it won't affect the probabilities
// of them both being on or off since those add together.  It will swap the odds of
// only one of them being on.
static const SComplexMatrix c_swapGate =
{
    {
        1.0f, 0.0f, 0.0f, 0.0f,
        0.0f, 0.0f, 1.0f, 0.0f,
        0.0f, 1.0f, 0.0f, 0.0f,
        0.0f, 0.0f, 0.0f, 1.0f
    }
};

// Controlled not gate
// If the first qubit is true, flips the value of the second qubit
static const SComplexMatrix c_controlledNotGate =
{
    {
        1.0f, 0.0f, 0.0f, 0.0f,
        0.0f, 1.0f, 0.0f, 0.0f,
        0.0f, 0.0f, 0.0f, 1.0f,
        0.0f, 0.0f, 1.0f, 0.0f
    }
};

// Hadamard gate
// Takes a pure |0> or |1> state and makes a 50/50 superposition between |0> and |1>.
// Put a 50/50 superposition through and get the pure |0> or |1> back.
// Encodes the original value in the phase information as either matching or
// mismatching phase.
static const SComplexMatrix c_hadamardGate =
{
    {
        1.0f / std::sqrt(2.0f), 1.0f / std::sqrt(2.0f),
        1.0f / std::sqrt(2.0f), 1.0f / -std::sqrt(2.0f)
    }
};

//=================================================================================
void WaitForEnter ()
{
    printf("nPress Enter to quit");
    fflush(stdin);
    getchar();
}

//=================================================================================
SComplexVector KroneckerProduct (const SComplexVector& a, const SComplexVector& b)
{
    const size_t aSize = a.Size();
    const size_t bSize = b.Size();

    SComplexVector ret;
    ret.Resize(aSize*bSize);

    for (size_t i = 0, ic = aSize; i < ic; ++i)
    {
        for (size_t j = 0, jc = bSize; j < jc; ++j)
        {
            size_t n = i * bSize + j;
            ret.Get(n) = a.Get(i)*b.Get(j);
        }
    }
    return ret;
}

//=================================================================================
SComplexMatrix KroneckerProduct (const SComplexMatrix& a, const SComplexMatrix& b)
{
    const size_t aSize = a.Size();
    const size_t bSize = b.Size();

    SComplexMatrix ret;
    ret.Resize(aSize*bSize);

    for (size_t ax = 0; ax < aSize; ++ax)
    {
        for (size_t ay = 0; ay < aSize; ++ay)
        {
            const TComplex& aValue = a.Get(ax, ay);

            for (size_t bx = 0; bx < bSize; ++bx)
            {
                for (size_t by = 0; by < bSize; ++by)
                {
                    const TComplex& bValue = b.Get(bx, by);

                    size_t nx = ax*bSize + bx;
                    size_t ny = ay*bSize + by;

                    ret.Get(nx,ny) = aValue * bValue;
                }
            }
        }
    }

    return ret;
}

//=================================================================================
SComplexMatrix operator* (const SComplexMatrix& a, const SComplexMatrix& b)
{
    assert(a.Size() == b.Size());
    const size_t size = a.Size();

    SComplexMatrix ret;
    ret.Resize(size);

    for (size_t nx = 0; nx < size; ++nx)
    {
        for (size_t ny = 0; ny < size; ++ny)
        {
            TComplex& val = ret.Get(nx, ny);
            val = 0.0f;
            for (size_t i = 0; i < size; ++i)
                val += a.Get(i, ny) * b.Get(nx, i);
        }
    }

    return ret;
}

//=================================================================================
SComplexVector operator* (const SComplexVector& a, const SComplexMatrix& b)
{
    assert(a.Size() == b.Size());
    const size_t size = a.Size();

    SComplexVector ret;
    ret.Resize(size);

    for (size_t i = 0; i < size; ++i)
    {
        TComplex& val = ret.Get(i);
        val = 0;
        for (size_t j = 0; j < size; ++j)
            val += a.Get(j) * b.Get(i, j);
    }

    return ret;
}

//=================================================================================
int main (int argc, char **argv) {

    // 2 qubit entanglement circuit demo
    {
        // make the circuit
        const SComplexMatrix H1 = KroneckerProduct(c_hadamardGate, c_identity2x2);
        const SComplexMatrix circuit = H1 * c_controlledNotGate;

        // display the circuit
        printf("Entanglement circuit:n");
        circuit.Print();

        // permute the inputs and see what comes out when we pass them through the circuit!
        for (size_t i = 0; i < 4; ++i)
        {
            SComplexVector input = KroneckerProduct((i & 2) ? c_qubit1 : c_qubit0, (i & 1) ? c_qubit1 : c_qubit0);
            SComplexVector output = input * circuit;
            printf("\ninput:");
            input.Print();
            printf("output:");
            output.Print();
        }
    }

    // 4 qubit demo: flip the second qubit if the fourth qubit is true
    {
        // make the circuit
        const SComplexMatrix cnot4qubit = KroneckerProduct(KroneckerProduct(c_controlledNotGate, c_identity2x2), c_identity2x2);
        const SComplexMatrix swap12 = KroneckerProduct(KroneckerProduct(c_swapGate, c_identity2x2), c_identity2x2);
        const SComplexMatrix swap23 = KroneckerProduct(KroneckerProduct(c_identity2x2, c_swapGate), c_identity2x2);
        const SComplexMatrix swap34 = KroneckerProduct(KroneckerProduct(c_identity2x2, c_identity2x2), c_swapGate);
        const SComplexMatrix circuit = 
            swap34 *
            swap23 *
            swap12 *
            swap23 *
            cnot4qubit *
            swap23 *
            swap12 *
            swap23 *
            swap34;

        // display the circuit
        printf("nFlip 2nd qubit if 4th qubit true circuit:n");
        circuit.Print();

        // permute the 16 possible inputs and see what comes out when we pass them through the circuit!
        for (size_t i = 0; i < 16; ++i)
        {
            SComplexVector input = KroneckerProduct(
                KroneckerProduct(
                    KroneckerProduct(
                        (i & 8) ? c_qubit1 : c_qubit0,
                        (i & 4) ? c_qubit1 : c_qubit0),
                    (i & 2) ? c_qubit1 : c_qubit0),
                (i & 1) ? c_qubit1 : c_qubit0);
            SComplexVector output = input * circuit;
            printf("\ninput:");
            input.Print();
            printf("output:");
            output.Print();
        }
    }


    WaitForEnter();

    return 0;
}

Here is the first part of the output, which shows the results of the entangling circuit. You can see that each input gives the expected output Bell state.

Below is the second part of the output, which is the circuit that flips the 2nd qubit if the 4th qubit is true.

Each entry in the input and output vectors is the amplitude (which, when squared, gives the probability) of the state it represents. The states start at the left with 0000, then 0001, then 0010, and continue until the right-most value, which represents 1111.

If you look at the second input/output pair, the input is [0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0], which means it has a 100% chance of the 4 qubits being 0001 (aka the qubits represent the number 1). The output vector is [0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0], which means it has a 100% chance of being 0101 (aka the qubits represent the number 5). Since the input did have the 4th bit set (4th from the left), the circuit flipped the 2nd bit. So, you can see that it worked!

If you check all the input/output pairs, you’ll see that they all follow this rule.

We used only whole, real numbers here; no fractional probabilities or imaginary amplitudes were involved. What the circuit does in those situations is a little bit harder to get intuition for, but rest assured that it does “the right thing” there as well.

Next Up

Now that we have the basics of quantum computing down pretty well, it’s time to analyze a couple quantum algorithms to see how they work!

Quantum Computing For Programmers Part I: One Qubit

Are you a programmer? Do you have interest in learning how quantum computing works? Does hard core math and crazy abstract “philosophical” type physics questions about the nature of reality make you feel like you could never understand quantum computing?

If so, me too, welcome to the club! We are programmers, not mathematicians or physicists, but we are masters of our realm – computation and logic. With that, I say that quantum computing is ultimately meant for us, since it is computing and programming, so let’s figure it out and do some interesting stuff 😛

Before we start, I want to mention that these posts should be readable stand-alone, but if you want to get into the deeper details, you probably want to give this list of references I’m using a read at some point. Perhaps before reading my posts, or after, or both!
Quantum Computing References

Lastly, to be clear, quantum computers seem like they are a long way off from being common place objects. These posts will show how quantum computers work, and allow you to simulate quantum computers on your own computer with some simple C++. The code won’t run as fast as it would on a real quantum computer (obviously!), but it should let you better understand how Quantum Computing (QC) works and let you explore and experiment on your own machine.

Probabilities (aka Superpositions)

At the core of QC is the qubit. A qubit is like a regular bit, in that it can have the value of 0 or 1, but unlike a regular bit, it can also be somewhere in the middle where it has a probability of being in each state.

That might sound strange, but think of a qubit like a coin. It can be heads up, or it can be heads down, or it can be in the air, waiting to land heads up or heads down.

When the coin is in the air, if it is perfectly balanced, it has a 50/50 chance of becoming heads or tails.

If the coin is not perfectly balanced (no real coin can be perfectly balanced!), it will have some bias towards heads or towards tails so will not be a pure 50/50 chance of heads or tails.

While the coin is on the ground, it is either a heads (0) or a tails (1), but while it’s in the air, you can imagine that it is a superposition of 0 and 1. When it lands, that superposition will collapse into a 0 or 1 value, which is also what happens when you “observe” or measure a qubit.

One interesting thing about quantum computing, and part of where its power comes from, is that you can perform logic on superpositional values, such that when you measure the final superpositional result, you get the answer from your logic circuit as if the values had been fixed all the way through the system, instead of being in superpositions.

The bummer about this though is that since it’s all based on probabilities, your answer has a chance of being not the value you wanted to get. Quantum circuits / quantum algorithms will try to combat this by doing operations iteratively to make the probability of the desired answer go up, and the probability of the undesired answers go down.

They can also run the quantum algorithm multiple times, to take multiple samples and be more sure about their results.

You still with me? Pretty simple so far right?

Before moving on I also wanted to mention that regular computers – like the one you are using right now – also have a greater than zero chance that they will give you the wrong answer to a calculation. The error rate can increase when the components are too hot, but even otherwise, a stray particle of background radiation could hit your cpu and flip a bit, giving you the wrong answer. You could fight this by doing every calculation 3 times and going with majority rules, but even then, if two bits get flipped, the majority will be wrong. Nothing you can do can completely eliminate the random chance that your calculations will come up with the wrong answer, but the chances are low enough that we rely on computers to do the right things, even in the most critical of situations (like, not starting a thermonuclear war).

With quantum computers you can get that same level of assurance. You can never completely eliminate the chance that a calculation is wrong, but you can drive the probability of error as low as it is on a standard computer, so in practice that shouldn’t really be an issue, for the cases where you need “certainty”.

Probability Vectors (aka Amplitude Vectors)

Using the coin flip scenario above, we could describe the probability of a coin landing heads or tails as a vector where the first element is the chance that it will land heads, and the second element is the chance that it will land tails.

A perfectly balanced coin would look like this for instance: [0.5,0.5]

If the coin was 75% likely to land heads, and only 25% likely to land tails the vector would look like this: [0.75,0.25]

You might notice that the elements of the vector have to add up to 1. This makes sense because there are only two options and one of the two MUST happen (unless it lands perfectly on edge), so they better add up to 1! The technical term for this is that the vector must have an L1 Norm of 1.

Qubits describe their probabilities a little bit differently though. Instead of storing the probability of them being 0 or 1, they store what’s called the amplitude, which when squared gives the probability. Doing this lets them do something really important that I’ll talk about lower down, but for now, you can hopefully just accept that this is how it works.

Consider a qubit having a 50% chance that it’s a 0, and a 50% chance that it’s a 1. We need some number that when squared gives 0.5 aka 1/2. That value is 1/\sqrt{2}.

So, for a qubit that has an even chance of being a 0 or 1, a vector to represent it is [1/\sqrt{2}, 1/\sqrt{2}].

Then, a vector describing a qubit that has a 75% chance being a 0 and a 25% chance of being a 1 would be [\sqrt{3}/2, 1/2]. If you square the first number you can see that you get 3/4 (75% chance) and squaring the second number gives you 1/4 (25% chance).

I got those vectors by solving the simple equation:
x^2 = P

Where P is the desired probability from 0 to 1.

You might now notice that the length of a qubit amplitude vector has to be 1. In other words, it’s a normalized vector. The technical term for this is that the vector must have an L2 norm of 1.
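If you want to verify that for yourself, here’s a tiny standalone snippet (separate from the full program later in this post) that builds the amplitude vector for the 75/25 qubit and checks its L2 norm:

#include <stdio.h>
#include <cmath>

int main ()
{
    // a qubit that is 75% likely to be 0 and 25% likely to be 1
    const float probability0 = 0.75f;
    const float probability1 = 0.25f;

    // amplitude is the square root of probability (for real, positive amplitudes)
    const float amplitude0 = std::sqrt(probability0); // sqrt(3)/2
    const float amplitude1 = std::sqrt(probability1); // 1/2

    // the squared L2 norm of the amplitude vector must be 1
    printf("amplitudes = [%0.4f, %0.4f]\n", amplitude0, amplitude1);
    printf("squared L2 norm = %0.4f\n", amplitude0*amplitude0 + amplitude1*amplitude1);
    return 0;
}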

Still nothing too crazy going on…

Ket Notation

There is some notation you’ll come across a lot when reading about quantum computing / mechanics / etc that looks like this:
1/\sqrt{2}(|0\rangle + |1\rangle)

That is “Ket Notation” and comes from “Bra-Ket Notation”. Its bark is worse than its bite.

If you multiply it out, you get this:
1/\sqrt{2} * |0\rangle + 1/\sqrt{2} * |1\rangle

All that means is that the “0” state of the vector has an amplitude of 1/\sqrt{2} and so does the “1” state. Writing that as a vector, we get what we already saw above for the qubit which had 50/50 odds of being a 0 or a 1:
[1/\sqrt{2}, 1/\sqrt{2}]

Here is the ket notation for the qubit which had a 75% chance of being a 0 and a 25% chance of being a 1:
1/2(\sqrt{3}|0\rangle + |1\rangle)

You can multiply that out to get this:
\sqrt{3}/2*|0\rangle + 1/2*|1\rangle

Which translates to the vector we saw before:
[\sqrt{3}/2, 1/2]

Lastly, you might see it shown like this:
|0\rangle

All that means is that the “0” state is 1. Since the “1” state isn’t listed, it has a value of 0. That is the same as this vector:
[1, 0]

Ket notation is used because you only have to list the states that actually have values, and not the ones that have zeros, making a more compact representation. That will be more useful when we move into a larger number of qubits.

Note that the mapping of which state belongs to which spot in the vector is completely arbitrary. I chose to make the left value be the “0” state and the right value be the “1” state, but it really doesn’t matter how you define it, so long as you are consistent.

Phase

Phase is the reason that qubits store amplitude, which square to probabilities, instead of storing probabilities themselves.

I said that the amplitude vector for a qubit which has a 50/50 chance of being a 0 or a 1 was this:
[1/\sqrt{2}, 1/\sqrt{2}]

I lied a bit though, and that is only one possible answer. Here’s another that has a 50/50 chance:
[-1/\sqrt{2}, 1/\sqrt{2}]

This gets interesting when you add two vectors together. First lets add two amplitude vectors which have the same phases for each state:
[1/\sqrt{2}, 1/\sqrt{2}] + [1/\sqrt{2}, 1/\sqrt{2}] = [\sqrt{2}, \sqrt{2}]

Each component added together.

Now, let’s add two amplitude vectors where the |1\rangle state has opposite phase in one of the vectors:
[1/\sqrt{2}, 1/\sqrt{2}] + [1/\sqrt{2}, -1/\sqrt{2}] = [\sqrt{2}, 0]

You can see that the |0\rangle states added together while the |1\rangle states subtracted and resulted in zero.

This is called destructive interference, and it is one of the things that makes quantum computing powerful. We can change a qubit’s phase without affecting its value (or the probabilities of its values).

I left out another important amplitude vector for describing a qubit that has a 50/50 chance of being a 0 or a 1:
[1/\sqrt{2}, 1/\sqrt{-2}]

Which can also be written as:
[1/\sqrt{2}, 1/(\sqrt{2}*i)]

Yep, that is an imaginary number in the 1 state!

Since amplitude squared is probability, it means we can also use imaginary numbers for amplitude vector values. If you know that the term phase relates to angles and rotations, you might want to have a look at these two things to follow the rabbit hole a bit deeper. Totally optional (;
Using Imaginary Numbers To Rotate 2D Vectors
Wikipedia: Bloch Sphere

However, since imaginary numbers can be involved, squaring amplitudes to get probabilities isn’t enough. Like for example with the vector [0,i], if you square the |1\rangle state to get the probability, you get -1, or -100%. That isn’t right!

The real operation for getting probability from amplitude is to multiply the amplitude by its complex conjugate. In other words, if you have a complex number, you flip the sign of the imaginary part to get the conjugate, and multiply by that.

As an example, if you have 3+5i, the complex conjugate is 3-5i.

In the example of the vector [0,i], you get the probability of the |1\rangle state by multiplying i by -i to get 1, or 100%.

Complex conjugate sounds scary, but hopefully you can see it isn’t as scary as it sounds. You just flip the sign of the imaginary component of the complex number.
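Here’s what that looks like in C++ with std::complex and std::conj – it’s also how the probability calculation is done in the code at the end of this post:

#include <stdio.h>
#include <complex>

int main ()
{
    // the |1> state amplitude from the [0, i] example
    const std::complex<float> amplitude(0.0f, 1.0f);

    // probability = amplitude times its complex conjugate (the result is always real)
    const std::complex<float> probability = amplitude * std::conj(amplitude);
    printf("probability = %0.2f\n", probability.real()); // prints 1.00
    return 0;
}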

Common One Qubit Logic Gates

It’s finally time to dive into some logic gates so we can actually do some quantum programming!

Real quickly I want to mention that we are going to be simulating quantum computing by multiplying the qubit amplitude vectors by matrices. The matrices themselves are unitary matrices, which preserves the length of the vectors (keeping them normalized).

One requirement of quantum gates is that they must be reversible, and unitary matrices are also reversible.

The matrices are square matrices and their dimensions will always be 2^N by 2^N where N is the number of qubits involved in the quantum circuit. With a single qubit, that means we will be working with 2×2 matrices which is fairly small.

When working with more qubits, the size of the matrix grows very quickly though. Quantum computers are very fast at doing these “matrix multiplies”, which is part of where their power comes from. Large matrix multiplies will be slow for us on regular (classical) computers, but a quantum computer would be able to do them very quickly. That’s where the difference is in our simulations versus the real deal.

Anyways, let’s get onto some specific logic gates! (These are from Wikipedia: Quantum Gate)

Not Gate (Pauli-X Gate)

This is a not gate, but it is also described as a 180 degree rotation around the x axis of the Bloch sphere.

\begin{bmatrix} 0 & 1 \\ 1 & 0 \\ \end{bmatrix}

You might notice that it looks like a backwards identity matrix. That is exactly how it works too… if you multiply a 2 element vector by that matrix, it will flip the elements in that vector!

In the case of our qubit amplitude vectors, this has the result of swapping the probability that the qubit will be 0, with the probability that the qubit will be 1.

If the qubit is not in a superposition, and instead is actually just a 0 or a 1, it will act exactly like a traditional NOT gate and flip it from one value to the other.

If the qubit is in a superposition, it will just flip the probabilities of each state.

This maps the |0\rangle state to |1\rangle and vice versa.
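Here it is worked out on a general amplitude vector:

\begin{bmatrix} 0 & 1 \\ 1 & 0 \\ \end{bmatrix} * \begin{bmatrix} a \\ b \\ \end{bmatrix} = \begin{bmatrix} b \\ a \\ \end{bmatrix}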

Pauli-Y Gate

This is described as a 180 degree rotation around the y axis of the Bloch sphere, and it maps |0\rangle to i|1\rangle and |1\rangle to -i|0\rangle.

\begin{bmatrix} 0 & -i \\ i & 0 \\ \end{bmatrix}

It’s like a NOT but also adjusts phase.

Pauli-Z Gate

This is described as a 180 degree rotation around the z axis of the Bloch sphere. All it does in practice is negate the phase of the |1\rangle state.

\begin{bmatrix} 1 & 0 \\ 0 & -1 \\ \end{bmatrix}

General Phase Adjustment Gates

You can adjust the phase of individual states by using a matrix like this one, which is a more general version of the Pauli-Z gate. It adjusts the phase of the |1\rangle state to whatever value is desired, without affecting the probabilities of the qubit values.

\begin{bmatrix} 1 & 0 \\ 0 & e^{i\theta} \\ \end{bmatrix}

The value e^{i\theta} ends up being a complex number, that can also be written like this:
cos(\theta)+i sin(\theta)

where \theta is any angle in radians.
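For example, plugging in \theta = \pi gives e^{i\pi} = cos(\pi) + i sin(\pi) = -1, which recovers exactly the Pauli-Z gate from above.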

You’ll find that circuits commonly only adjust the phase of the |1\rangle state. This is because the absolute phase of an individual state doesn’t matter (it doesn’t affect probabilities); what does matter is the difference in phase between the |0\rangle state and the |1\rangle state, since that can cause constructive or destructive interference. For this reason, you’ll often see the |0\rangle state without a phase value, and only the |1\rangle state will get its phase modified during calculations.

Hadamard Gate (Interesting Gate!)

Ok so the NOT gate is somewhat useful. You are probably thinking the others might come in handy if you need to adjust the phase of a qubit, but so far, nothing has been that spectacular for single qubit gates.

Well, here’s the more interesting gate.

What this gate does is if you pass it a pure 0 or 1, it will output a value that has a 50% chance of being a 0 or a 1.

The interesting part happens when you put that output through a Hadamard gate again – you get the original value of 0 or 1 out!

It can do this because it stores the info about the original value as phase information. When the input value is |0\rangle, the Hadamard gate outputs [1/\sqrt{2}, 1/\sqrt{2}] which has matching phases for each state. When the input value is |1\rangle, the gate outputs [1/\sqrt{2}, -1/\sqrt{2}] which has opposite phases for each state.

1/\sqrt{2} \begin{bmatrix} 1 & 1 \\ 1 & -1 \\ \end{bmatrix}
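Here is the round trip worked out for an input of |0\rangle. The first application makes the 50/50 superposition, and the second application brings back the pure state:

1/\sqrt{2}*\begin{bmatrix} 1 & 1 \\ 1 & -1 \\ \end{bmatrix} * \begin{bmatrix} 1 \\ 0 \\ \end{bmatrix} = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \\ \end{bmatrix} \quad , \quad 1/\sqrt{2}*\begin{bmatrix} 1 & 1 \\ 1 & -1 \\ \end{bmatrix} * \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \\ \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ \end{bmatrix}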

This gate is a crucial part of both Grover’s algorithm – which can search an unsorted list in O(\sqrt{N}) – and Shor’s algorithm, which can factor large numbers much faster than classical computers can, putting certain types of cryptography at risk of being cracked by quantum computers.

It has some other uses too, which you can check out here:
Does a Hadamard Gate have uses outside of pure and evenly mixed states?

Code

Here is some example code to show some single qubit circuits in action.

#include <stdio.h>
#include <array>
#include <complex>
#include <cmath>
#include <initializer_list>
#include <utility>

typedef std::array<std::complex<float>, 2> TQubit;
typedef std::array<std::complex<float>, 4> TComplexMatrix;
const float c_pi = 3.14159265359f;

//=================================================================================
static const TQubit c_qubit0 = { 1.0f, 0.0f };  // false aka |0>
static const TQubit c_qubit1 = { 0.0f, 1.0f };  // true aka |1>
static const TQubit c_qubit01_0deg = { 1.0f / std::sqrt(2.0f), 1.0f / std::sqrt(2.0f) }; // 50% true. 0 degree phase
static const TQubit c_qubit01_180deg = { 1.0f / std::sqrt(2.0f), -1.0f / std::sqrt(2.0f) }; // 50% true. 180 degree phase

// A not gate AKA Pauli-X gate
// Flips false and true probabilities (amplitudes)
// Maps |0> to |1>, and |1> to |0>
// Rotates PI radians around the x axis of the Bloch Sphere
static const TComplexMatrix c_notGate =
{
    {
        0.0f, 1.0f,
        1.0f, 0.0f
    }
};
static const TComplexMatrix c_pauliXGate = c_notGate;

// Pauli-Y gate
// Maps |0> to i|1>, and |1> to -i|0>
// Rotates PI radians around the y axis of the Bloch Sphere
static const TComplexMatrix c_pauliYGate =
{
    {
        { 0.0f, 0.0f }, { 0.0f, -1.0f },
        { 0.0f, 1.0f }, { 0.0f, 0.0f }
    }
};

// Pauli-Z gate
// Negates the phase of the |1> state
// Rotates PI radians around the z axis of the Bloch Sphere
static const TComplexMatrix c_pauliZGate =
{
    {
        1.0f, 0.0f,
        0.0f, -1.0f
    }
};

// Hadamard gate
// Takes a pure |0> or |1> state and makes a 50/50 superposition between |0> and |1>.
// Put a 50/50 superposition through and get the pure |0> or |1> back.
// Encodes the original value in the phase information as either matching or
// mismatching phase.
static const TComplexMatrix c_hadamardGate =
{
    {
        1.0f / std::sqrt(2.0f), 1.0f / std::sqrt(2.0f),
        1.0f / std::sqrt(2.0f), 1.0f / -std::sqrt(2.0f)
    }
};

//=================================================================================
void WaitForEnter ()
{
    printf("nPress Enter to quit");
    fflush(stdin);
    getchar();
}

//=================================================================================
TQubit ApplyGate (const TQubit& qubit, const TComplexMatrix& gate)
{
    // multiply qubit amplitude vector by unitary gate matrix
    return 
    {
        qubit[0] * gate[0] + qubit[1] * gate[1],
        qubit[0] * gate[2] + qubit[1] * gate[3]
    };
}

//=================================================================================
int ProbabilityOfBeingTrue (const TQubit& qubit)
{
    float prob = std::round((qubit[1] * std::conj(qubit[1])).real() * 100.0f);
    return int(prob);
}

//=================================================================================
TComplexMatrix MakePhaseAdjustmentGate (float radians)
{
    // This makes a gate like this:
    //
    // [ 1  0             ]
    // [ 0  e^(i*radians) ]
    //
    // The gate will adjust the phase of the |1> state by the specified amount.
    // A more general version of the pauli-z gate

    return
    {
        {
            1.0f, 0.0f,
            0.0f, std::exp(std::complex<float>(0.0f,1.0f) * radians)
        }
    };
}

//=================================================================================
void Print (const TQubit& qubit)
{
    printf("[(%0.2f, %0.2fi), (%0.2f, %0.2fi)] %i%% true",
        qubit[0].real(), qubit[0].imag(),
        qubit[1].real(), qubit[1].imag(),
        ProbabilityOfBeingTrue(qubit));
}

//=================================================================================
// Runs a qubit through a sequence of (label, gate) steps, printing the qubit state at each step
void DemoCircuit (const TQubit& qubit, std::initializer_list<std::pair<char, TComplexMatrix>> gates)
{
    TQubit v = qubit;
    printf("  ");
    Print(v);
    for (const std::pair<char, TComplexMatrix>& step : gates)
    {
        printf("\n  %c = ", step.first);
        v = ApplyGate(v, step.second);
        Print(v);
    }
    printf("\n\n");
}

//=================================================================================
int main (int argc, char **argv)
{
    // test each gate / circuit against |0>, |1>, and the 50/50 superposition with 180 degree phase
    const TQubit testQubits[] = { c_qubit0, c_qubit1, c_qubit01_180deg };

    // Not Gate
    printf("Not gate:\n");
    for (const TQubit& qubit : testQubits)
        DemoCircuit(qubit, { { '!', c_notGate } });

    // Pauli-y gate
    printf("Pauli-y gate:\n");
    for (const TQubit& qubit : testQubits)
        DemoCircuit(qubit, { { 'Y', c_pauliYGate } });

    // Pauli-z gate
    printf("Pauli-z gate:\n");
    for (const TQubit& qubit : testQubits)
        DemoCircuit(qubit, { { 'Z', c_pauliZGate } });

    // 45 degree phase adjustment gate
    printf("45 degree phase gate:\n");
    const TComplexMatrix phaseGate = MakePhaseAdjustmentGate(c_pi / 4.0f);
    for (const TQubit& qubit : testQubits)
        DemoCircuit(qubit, { { 'M', phaseGate } });

    // Hadamard gate round trip
    printf("Hadamard gate round trip:\n");
    for (const TQubit& qubit : testQubits)
        DemoCircuit(qubit, { { 'H', c_hadamardGate }, { 'H', c_hadamardGate } });

    // 1 bit circuit: Hadamard -> Pauli-Z -> Hadamard
    printf("Circuit Hadamard->Pauli-Z->Hadamard:\n");
    for (const TQubit& qubit : testQubits)
        DemoCircuit(qubit, { { 'H', c_hadamardGate }, { 'Z', c_pauliZGate }, { 'H', c_hadamardGate } });

    WaitForEnter();

    return 0;
}

Here’s the output of the program, showing the qubit circuits in action.

Next Up

In the next part, we’ll look at how multiple qubits work together in quantum circuits so we can do more interesting things. We might need a quick post explaining the Bloch sphere before that though. It’s a fairly important thing, but this post is already enough to digest, so I didn’t include it.

Solving Nested Modulus Equations

x \% 2 = 1

The above equation is pretty easy to solve; it just means that x is any value that gives a remainder of 1 when divided by 2. In other words, x is all odd numbers. The more formal answer is:

x \equiv 1 + 2\mathbb{Z}

That reads as “x is equivalent to 1 plus 2 times any integer”. x has infinitely many solutions, so long as they fit that constraint.

That’s easy enough to figure out, but how would you solve this equation?

(x \% 5) \%2 = 1

or this one?

((x \% 7) \% 5) \%2 = 1

One Level (Simple Case)

There is something called the quotient remainder theorem that lets us change an equation like this:

A \% B = C

Into one like this:

A \equiv C + B \mathbb{Z}

The \mathbb{Z} means “all integers”, which implies that there are infinitely many solutions for A. This reads as “A is equivalent to C plus B times any integer”.

Let’s work out a couple examples.

x \% 5 = 3

That transforms to:

x \equiv 3 + 5 \mathbb{Z}

If you try plugging any positive or negative integer in place of Z, you’ll see that you get a number that satisfies the first equation.

Here’s another one:

x \% 538 = 211

That transforms to:

x \equiv 211 + 538 \mathbb{Z}

Once again you can plug any positive or negative integer in for Z and get a value for x that satisfies the original equation.

Pretty easy and pretty handy right? Let’s get a little more complex.

Two Levels

Let’s say you want to solve this equation from the intro.

(x \% 5) \%2 = 1

The first thing you might do is transform it to the below.

x \% 5 \equiv 1 + 2 N_1 where N_1 \in \mathbb{Z}.

What now? Can we transform it again? If we do, we get this:

x \equiv 1 + 2 N_1+ 5 N_2 where N_i \in \mathbb{Z}.

Plugging 1 in for both N_1 and N_2, we get 8, which does satisfy the original equation.

If we plug 3 in for N1 and 1 in for N2 though, we get 12 which DOESN’T satisfy the original equation. What gives??

Well it turns out there are subtle restrictions on our equations that got lost in the shuffle. Let’s start over…

(x \% 5) \%2 = 1

In effect, what that equation says is “Something divided by two gives a remainder of 1” where the something happens to be x % 5, but we don’t really care about what the something actually is right now.

The equation also implies that the right side of the equation is a value modulo two, and its only valid values are {0,1}. Another way to write this is to say that the right side \in \mathbb Z_2, which reads as “in the integers mod 2”.

1 is obviously in the set of valid values {0,1}, so we don’t need any special notation just yet, but let’s keep it in mind as we move onto the next step.

x \% 5 \equiv 1 + 2 \mathbb Z

The above says that x divided by 5 gives a remainder that is 1 plus two times any integer. However, there is a catch: the right side of the equation is a mod 5 value, which means its only valid values are {0,1,2,3,4}. In other words, the right side of the equation is \in \mathbb Z_5.

This implicitly limits the values we can plug in for Z: only values of Z where 1+2*Z is in {0,1,2,3,4} are allowed. Let’s write this out a little more formally.

x \% 5 \equiv a where a = 1 + 2 \mathbb Z and a \in \mathbb Z_5

Let’s transform the equation again to solve for x:

x \equiv a + 5\mathbb Z where a = 1 + 2 \mathbb Z and a \in \mathbb Z_5

Note that there is no modulus on the left side of the equation now, so the right side of the equation is unrestricted in the values it can take.

How do we use this resulting formula though to find valid values of x?

First we figure out the valid values of a. a = 1 + 2 \mathbb Z where a \in \mathbb Z_5, so its only valid values have to be a subset of {0,1,2,3,4}.

Plugging a -1 in for Z, we get a value for a of -1, which isn’t a valid answer so we throw it away.

Plugging a 0 in for Z, we get a value for a of 1. That is a valid answer so we keep it!

Plugging a 1 in gives us 3 which is valid, so we keep that too.

Plugging a 2 in gives us a value of 5, which isn’t valid, so we throw it away and know that we are done.

Our values for a are {1,3}.

The solution we got was this:
x \equiv a + 5\mathbb Z

Plugging in each value of a means that we get two equations as a solution for x. Valid values of x are the union of these two lists – both equations give valid answers.

x \equiv 1 + 5\mathbb Z
x \equiv 3 + 5\mathbb Z

The first equation gives us values of {…, -4, 1, 6, 11, …}
The second equation gives us values of {…, -2, 3, 8, 13, …}

Taking the union of those, our solutions for x are:
{…, -4, -2, 1, 3, 6, 8, 11, 13, …}

If you plug those into the original equation (x \% 5) \%2 = 1, you’ll see that they are valid solutions! Those equations also represent ALL solutions, so they aren’t only valid, they are also complete.
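If you’d rather let the computer do the checking, here’s a quick brute force loop (separate from the full solver program below) that confirms the solution list:

#include <stdio.h>

int main ()
{
    // brute force: print every x in [0, 20) satisfying (x % 5) % 2 == 1
    for (int x = 0; x < 20; ++x)
        if ((x % 5) % 2 == 1)
            printf("%i ", x); // prints: 1 3 6 8 11 13 16 18
    printf("\n");
    return 0;
}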

Let’s step it up just one more notch.

Three Levels (Boss Mode)

Let’s solve the hardest equation from the intro:

((x \% 7) \% 5) \%2 = 1

This starts off just like the two-leveled equation. Something on the left mod 2 is equal to 1. The 1 on the right side is \in \mathbb Z_2, but since that’s obviously true for this constant, we don’t need to do anything special. So, we transform it to the below:

(x \% 7) \% 5 \equiv a
a = 1 + 2 \mathbb Z
a \in \mathbb Z_5

Then we do another transformation:
x \% 7 \equiv b
b = a + 5 \mathbb Z
a = 1 + 2 \mathbb Z
a \in \mathbb Z_5
b \in \mathbb Z_7

Then for the last transformation, we get this:
x \equiv b + 7 \mathbb Z
b = a + 5 \mathbb Z
a = 1 + 2 \mathbb Z
a \in \mathbb Z_5
b \in \mathbb Z_7

Now that we have our equations worked out, we need to start finding out what the values actually are. We start with a because it’s what b and x are based on, and is made up of constants. We get {1,3} again as the valid values of a. That gives us:
x \equiv b + 7 \mathbb Z
b = a + 5 \mathbb Z
a = [1,3]
b \in \mathbb Z_7

Now we want to plug each value of a into b and find all the valid values for b. When we plug 1 into b for a, it becomes b = 1 + 5 \mathbb Z, b \in \mathbb Z_7. The valid values for that are {1,6}.

When we plug 3 into b for a, it becomes b = 3 + 5 \mathbb Z, b \in \mathbb Z_7. The only valid value there is {3}.

That means that our valid values for b are {1,3,6}. It’s the union of the valid values we found for each value of a.

That makes our equations into this:
x \equiv b + 7 \mathbb Z
b = [1,3,6]

If we plug those values of b into the equation for x, we get these three equations:

x \equiv 1 + 7 \mathbb Z
x \equiv 3 + 7 \mathbb Z
x \equiv 6 + 7 \mathbb Z

The first equation gives us some x values of {…, -6, 1, 8, 15, …}. The second equation gives us x values of {…, -4, 3, 10, 17, …}. The third equation gives us x values of {…, -1, 6, 13, 20, …}.

Taking the union of all of those, we get…
{…, -6, -4, -1, 1, 3, 6, 8, 10, 13, 15, 17, 20, …}

If you plug those numbers into the original equation ((x \% 7) \% 5) \%2 = 1, you can see that they are valid solutions!

When you are performing this algorithm, if you ever hit a case where there are no valid answers in one of the steps – like say, there were no valid answers for equation b in the above – that means that there is no solution to your equation.

Sample Code

Since solving these equations can be pretty manual and tedious, here is some code that can solve these equations for you. If you were confused at all by the explanation above, the code may also be able to show you how it works better, especially if you step through it and see what it’s doing.

#include <stdio.h>
#include <algorithm>
#include <array>
#include <vector>

//=================================================================================
void WaitForEnter()
{
    printf("nPress Enter to quit");
    fflush(stdin);
    getchar();
}

//=================================================================================
void AddSolutions(std::vector<int>& solutions, int add, int multiply, int mod)
{
    int Z = 0;
    while (1)
    {
        int value = multiply * Z + add;
        if (value < mod)
            solutions.push_back(value);
        else
            return;
        ++Z;
    }
}

//=================================================================================
void AddSolutionsFromSolutions (std::vector<int>& solutions, const std::vector<int>& adds, int multiply, int mod)
{
    std::for_each(
        adds.begin(),
        adds.end(),
        [&solutions, multiply, mod] (int add)
        {
            AddSolutions(solutions, add, multiply, mod);
        }
    );
}

//=================================================================================
int main(int argc, char **argv)
{
    // a,b,c,d...
    // ((x % a) % b) % c = d
    // etc.
    //const int c_values[] = {7, 5, 2, 1};
    const int c_values[] = { 12, 9, 7, 5, 3, 2 };
    const size_t c_numValues = sizeof(c_values) / sizeof(c_values[0]);
    
    // print the equation
    printf("Solving for x:n");
    for (size_t i = 0; i < c_numValues - 1; ++i)
        printf("(");
    printf("x");
    for (size_t i = 0; i < c_numValues - 1; ++i)
        printf(" %% %i)", c_values[i]);
    printf(" = %inn", c_values[c_numValues-1]);

    // print the solution equations
    for (size_t i = 0; i < c_numValues - 2; ++i)
    {
        char eqn = i > 0 ? 'A' + i - 1 : 'x';
        char eq = i > 0 ? '=' : 0xF0; // 0xF0 is the "equivalent to" glyph in console code page 437

        printf("%c %c %c + %i*Z", eqn, eq, 'B' + i - 1, c_values[i]);
        if (i > 0)
            printf(" (in Z_%i)\n", c_values[i-1]);
        else
            printf(" (in Z)\n");
    }
    printf("%c = %i + %i*Z (in Z_%i)\n\n", 'A' + c_numValues - 2 - 1, c_values[c_numValues - 1], c_values[c_numValues - 2], c_values[c_numValues - 3]);

    // gather up the permutation of solutions for each equation, starting with the lowest equation which has only constants
    std::array<std::vector<int>, c_numValues - 2> solutions;
    AddSolutions(solutions[c_numValues - 3], c_values[c_numValues - 1], c_values[c_numValues - 2], c_values[c_numValues - 3]);
    for (size_t i = c_numValues - 3; i > 0; --i)
        AddSolutionsFromSolutions(solutions[i-1], solutions[i], c_values[i], c_values[i-1]);

    // Detect empty set
    if (solutions[0].size() == 0)
    {
        printf("No solutions!n");
        WaitForEnter();
        return 0;
    }

    // Print the more specific solution equations
    printf("x = ");
    for (size_t i = 0, c = solutions[0].size(); i < c; ++i)
    {
        if (i < c - 1)
            printf("%c U ", 'a' + i);
        else
            printf("%c\n", 'a' + i);
    }
    std::sort(solutions[0].begin(), solutions[0].end());
    for (size_t i = 0, c = solutions[0].size(); i < c; ++i)
        printf("%c = %i + %iZ\n", 'a' + i, solutions[0][i], c_values[0]);
  
    // Print specific examples of solutions (first few numbers in each)
    std::vector<int> xValues;
    printf("\n");
    for (size_t i = 0, c = solutions[0].size(); i < c; ++i)
    {
        printf("%c = {..., ", 'a' + i);
        for (int z = 0; z < 3; ++z)
        {
            printf("%i, ", solutions[0][i] + z * c_values[0]);
            xValues.push_back(solutions[0][i] + z * c_values[0]);
        }
        printf("...}\n");
    }

    // Show the list of specific values of X
    std::sort(xValues.begin(), xValues.end());
    printf("\nx = {..., ");
    std::for_each(xValues.begin(), xValues.end(), [](int v) {printf("%i, ", v); });
    printf("...}\n");

    // Test the solutions to verify that they are valid!
    bool valuesOK = true;
    for (size_t i = 0, c = xValues.size(); i < c; ++i)
    {
        int value = xValues[i];
        for (size_t j = 0; j < c_numValues - 1; ++j)
            value = value % c_values[j];

        if (value != c_values[c_numValues - 1])
        {
            printf("Solution %i is invalid!!\n", xValues[i]);
            valuesOK = false;
        }
    }
    if (valuesOK)
        printf("\nAll solutions shown tested valid!\n");

    WaitForEnter();
    return 0;
}

Here’s an example run, where it solves a 5 level equation.

Links

If you know of a better way to solve this type of equation let me know. This is what I found when trying to figure this stuff out, but it doesn’t mean it’s the only way to do it.

The one thing I don’t like about this technique is that when you find solutions, it’s very manual, and very specific to the values used. Imagine if one of the modulus divisors was a variable instead of them all being constants. How would you solve it then? I’m not really sure…

Have some links!

Math Stack Exchange: How to solve nested congruences?

Khan Academy: The Quotient Remainder Theorem

Solving Simultaneous Congruences (Chinese Remainder Theorem)

x \equiv 2 \pmod{3}

The equation above is a congruence. What it says is that x % 3 is 2.

The equals sign with three bars means “is equivalent to”, so more literally what the equation says is “x is equivalent to 2, when we are looking at only the integers mod 3”.

5 is a solution, so is 8, so is 11 and so is -1!

The general solution for x is x = 2 + 3*N where N is any integer, aka N \in \mathbb{Z}.

What if you were trying to find an integer value x that satisfied the multiple congruences below? How would you solve it?

x \equiv 2 \pmod{3}
x \equiv 2 \pmod{4}
x \equiv 1 \pmod{5}

You could use brute force, but you’d have to check every value from 0 to 60 (not including 60), since 60 is the least common multiple of the moduli. In this case the LCM is just 3*4*5 = 60, because the moduli are pairwise co-prime (each pair has a greatest common divisor of 1).

As the number of equations grew, or the mod values were larger, brute force could grow to be a large problem very quickly.

Well luckily there is a better way called the Chinese Remainder Theorem.

BTW – the answer is 26. Or 26 mod 60 more correctly, or 26 + 60*N where N is any integer.
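(You can verify it: 26 % 3 = 2, 26 % 4 = 2, and 26 % 5 = 1, so it satisfies all three congruences.)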

The Chinese Remainder Theorem

The CRT was first published sometime between the 3rd and 5th centuries by Sun Tzu – but not the Sun Tzu who wrote “The Art of War”; that was a different person. It was later proven true by Gauss in 1801.

When trying to learn the CRT, I found this amazing video that explains it extremely well. You should give it a look: The Chinese Remainder Theorem made easy.

In that video, it talks about modular inversion. You can read about the details of that in the last post I made: Modular Multiplicative Inverse.

An Example

Let’s work through an example that you can follow along with just to make sure you understand it, since I pointed you elsewhere for explanations 😛

Here is our set of congruences:
x \equiv 5 \pmod{7}
x \equiv 8 \pmod{11}
x \equiv 2 \pmod{3}

First up, we want to break it into 3 sections, combined with addition.
x = \_\_\_\_\_ + \_\_\_\_\_ + \_\_\_\_\_

In the above, the first empty space will be the solution to the first equation, the second empty space will be the solution to the second equation, and the third empty space will be the solution to the third equation.

Each section is going to get a coefficient, which is the product of the mods of every equation except the one that section is solving. We still have one unknown per term, which we are going to solve for in a moment.

x = 11*3*N_1 + 7*3*N_2 + 7*11*N_3
or
x = 33*N_1 + 21*N_2 + 77*N_3

The reason we make those coefficients is because it isolates our terms so that we can solve each equation individually then combine them to get the combined solution.

This works because, looking at the first term, no matter what we end up choosing for N_1, the term will always be a perfect multiple of 3 and of 11. That means it contributes zero to the other equations (mod 3 and mod 11), so it won’t affect whatever values we come up with for them.

Next up, we need to solve for the N’s.

The first term needs to have an N_1 such that 33*N_1 \equiv 5 \pmod 7.

How do we solve that? We could use brute force, testing every number 0 through 6 (otherwise known as 7-1), but in the last post we showed a better way to do it using the extended euclidean algorithm. So, that link once again is: Modular Multiplicative Inverse.

The answer when solving using inversion comes out to be 15. Note that our answer is in “mod 7” space, so you could use any value 1 + 7*I where I \in \mathbb{Z} (15 is 1 + 7*2), but we’ll stick with 15 to make the steps easier to follow.
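To spell out that inversion step: 33 \equiv 5 \pmod{7}, and the modular multiplicative inverse of 5 mod 7 is 3 (because 5*3 = 15 \equiv 1 \pmod{7}). Multiplying both sides of 5*N_1 \equiv 5 \pmod{7} by that inverse gives:

N_1 \equiv 3*5 \equiv 15 \pmod{7}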

That gives us this:
x = 33*15 + 21*N_2 + 77*N_3
or
x = 495 + 21*N_2 + 77*N_3

The values of N_2 and N_3 end up being -8 and -2 respectively. (For the second term we need 21*N_2 \equiv 8 \pmod{11}, and since 21 \equiv -1 \pmod{11}, that means N_2 \equiv -8. For the third term we need 77*N_3 \equiv 2 \pmod{3}, and since 77 \equiv -1 \pmod{3}, that means N_3 \equiv -2.) That gives us:
x = 33*15 + 21*-8 + 77*-2
or
x = 495 - 168 - 154
or
x = 173

Since this is just one of many solutions, the real answer is:
x \equiv 173 \pmod{231}
or
x = 173 + 231*N where N \in \mathbb{Z}

Let’s check our result and see if we got it right…

173 % 7 is indeed 5.
173 % 11 is 8.
173 % 3 is 2.

Woot, it worked!

Example Code

#include <stdio.h>
#include <algorithm>
#include <array>

// x = a mod m
struct SEquation
{
    int a;
    int m;
};
 
//=================================================================================
unsigned int ExtendedEuclidianAlgorithm (int smaller, int larger, int &s, int &t)
{
    // make sure smaller <= larger before starting
    bool swapped = false;
    if (larger < smaller)
    {
        swapped = true;
        std::swap(smaller, larger);
    }
 
    // set up our storage for the loop.  We only need the last two values so will
    // just use a 2 entry circular buffer for each data item
    std::array<int, 2> remainders = { larger, smaller };
    std::array<int, 2> ss = { 1, 0 };
    std::array<int, 2> ts = { 0, 1 };
    int indexNeg2 = 0;
    int indexNeg1 = 1;
 
    // loop
    while (1)
    {
        // calculate our new quotient and remainder
        int newQuotient = remainders[indexNeg2] / remainders[indexNeg1];
        int newRemainder = remainders[indexNeg2] - newQuotient * remainders[indexNeg1];
 
        // if our remainder is zero we are done.
        if (newRemainder == 0)
        {
            // return our s and t values as well as the quotient as the GCD
            s = ss[indexNeg1];
            t = ts[indexNeg1];
            if (swapped)
                std::swap(s, t);
            return remainders[indexNeg1];
        }
 
        // calculate this round's s and t
        int newS = ss[indexNeg2] - newQuotient * ss[indexNeg1];
        int newT = ts[indexNeg2] - newQuotient * ts[indexNeg1];
 
        // store our values for the next iteration
        remainders[indexNeg2] = newRemainder;
        ss[indexNeg2] = newS;
        ts[indexNeg2] = newT;
 
        // move to the next iteration
        std::swap(indexNeg1, indexNeg2);
    }
}
 
//=================================================================================
void WaitForEnter ()
{
    printf("nPress Enter to quit");
    fflush(stdin);
    getchar();
}
 
//=================================================================================
int main(int argc, char **argv)
{
    const SEquation equations[] =
    {
        { 2, 3 },
        { 2, 4 },
        { 1, 5 }
    };

    const int c_numEquations = sizeof(equations) / sizeof(equations[0]);

    // print out the equations
    printf("Solving for x:n");
    for (int i = 0; i < c_numEquations; ++i)
        printf("eq %i:  x = %i mod %in", i, equations[i].a, equations[i].m);
    printf("n");

    // make sure the m's are pairwise co-prime
    for (int i = 0; i < c_numEquations; ++i)
    {
        for (int j = i + 1; j < c_numEquations; ++j)
        {
            int s, t;
            int gcd = ExtendedEuclidianAlgorithm(equations[i].m, equations[j].m, s, t);
            if (gcd != 1)
            {
                printf("%i and %i are not co-prime (index %i and %i)n", equations[i].m, equations[j].m, i, j);
                WaitForEnter();
                return 0;
            }
        }
    }

    // calculate the coefficients for each term
    std::array<int, c_numEquations> coefficients;
    coefficients.fill(1);
    for (int i = 0; i < c_numEquations; ++i)
    {
        for (int j = 0; j < c_numEquations; ++j)
        {
            if (i != j)
                coefficients[i] *= equations[j].m;
        }
    }

    // now figure out how much to multiply each coefficient by to make it have the specified modulus residue (remainder)
    int result = 0;
    for (int i = 0; i < c_numEquations; ++i)
    {
        int s, t;
        ExtendedEuclidianAlgorithm(coefficients[i], equations[i].m, s, t);
        coefficients[i] *= t * equations[i].a;
    }

    // calculate result and simplify it to the smallest positive integer mod lcm
    // lcm is the product when they are pairwise coprime, as the gcd of any two is 1.
    int lcm = 1;
    for (int i = 0; i < c_numEquations; ++i)
    {
        lcm *= equations[i].m;
        result += coefficients[i];
    }
    result = result % lcm;
    if (result < 0)
        result += lcm;

    // print out the answer
    printf("x = %i mod %inor...n", result, lcm);
    printf("x = %i + %i*NnWhere N is any positive or negative integer.nn", result, lcm);

    // verify that our result is correct
    printf("Verifying Equations:n");
    for (int i = 0; i < c_numEquations; ++i)
        printf("eq %i:  %i mod %i = %i (%s)n", i, result, equations[i].m, result % equations[i].m, (result % equations[i].m == equations[i].a) ? "PASS" : "!!FAIL!!");

    WaitForEnter();
    return 0;
}

Here’s the output of the program:
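Solving for x:
eq 0:  x = 2 mod 3
eq 1:  x = 2 mod 4
eq 2:  x = 1 mod 5

x = 26 mod 60
or...
x = 26 + 60*N
Where N is any positive or negative integer.

Verifying Equations:
eq 0:  26 mod 3 = 2 (PASS)
eq 1:  26 mod 4 = 2 (PASS)
eq 2:  26 mod 5 = 1 (PASS)

Press Enter to quit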

One final note. If the program fails to find an answer, it doesn’t necessarily mean that no answer exists. The algorithm above requires the moduli to be pairwise co-prime. For instance, you could imagine multiplying everything by 2, which would make the moduli share 2 as a common factor. That would make this algorithm fail, even though a valid answer still existed.

You can try the method of successive substitution as an alternative when that happens.
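To give a feel for how that works, here is a minimal sketch of folding congruences together two at a time, which tolerates moduli that share factors. Everything here, including the name MergeCongruences, is my own illustration rather than code from those links:

#include <stdio.h>

// returns gcd(a, b), and sets s and t such that s*a + t*b = gcd(a, b)
long long ExtendedGCD (long long a, long long b, long long &s, long long &t)
{
    if (b == 0)
    {
        s = 1;
        t = 0;
        return a;
    }
    long long s1, t1;
    long long gcd = ExtendedGCD(b, a % b, s1, t1);
    s = t1;
    t = s1 - (a / b) * t1;
    return gcd;
}

// merges x = a1 mod m1 and x = a2 mod m2 into x = a mod m, where m = lcm(m1, m2).
// returns false if the two congruences contradict each other.
bool MergeCongruences (long long a1, long long m1, long long a2, long long m2, long long &a, long long &m)
{
    long long s, t;
    long long gcd = ExtendedGCD(m1, m2, s, t);

    // solvable only if the residues agree modulo the gcd of the moduli
    if ((a2 - a1) % gcd != 0)
        return false;

    // s*m1 + t*m2 = gcd, so s*(m1/gcd) = 1 mod (m2/gcd): s is the inverse we need
    long long md = m2 / gcd;
    long long k = (((a2 - a1) / gcd % md) * (s % md)) % md;

    m = m1 / gcd * m2;  // lcm(m1, m2)
    a = ((a1 + m1 * k) % m + m) % m;
    return true;
}

int main ()
{
    // the example equations from above, multiplied through by 2:
    // the moduli 6, 8, 10 are not pairwise co-prime, but a solution still exists
    long long a = 4, m = 6;
    if (MergeCongruences(a, m, 4, 8, a, m) && MergeCongruences(a, m, 2, 10, a, m))
        printf("x = %lli mod %lli\n", a, m);
    else
        printf("no solution\n");
    return 0;
}

Running this prints x = 52 mod 120, which is just the earlier answer of 26 mod 60 doubled, as you’d expect.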

Links

Wikipedia: The Chinese Remainder Theorem

Wikipedia: Method Of Successive Substitution

Also, it looks like Khan Academy has a good bit on modular math!

Khan Academy: What is Modular Arithmetic?