Matrix Form of Bezier Curves

Bezier curves are most often talked about either in terms of the De Casteljau algorithm, or in terms of a mathematical function (Bernstein Polynomials).

Every now and then though, you see people talking about Bezier curves being calculated via matrices. If you ever wondered what that was all about, this post should hopefully explain and demystify that a bit.

If you don’t know how to come up with the equation of a Bezier curve for any number of control points, you should give this a read first:
Easy Binomial Expansion & Bezier Curve Formulas

And if you are curious about the De Casteljau algorithm, you can learn about that here:
The De Casteljau Algorithm for Evaluating Bezier Curves

Ok, all read up on that stuff? Let’s get talking about Bezier curves in matrix form! There are shadertoy links at the end with working webgl glsl demos that include source code.

Making the Matrix Form of Bezier Curves

Coming up with the matrix for a Bezier curve is surprisingly easy. Keep in mind the matrix we are making is for glsl, which uses column major matrix order, so you might have to adjust things if you are using a row major matrix order setup (mostly, just transpose the matrix).

The first step is to get the formula for a Bezier curve. We’ll work through the example using a quadratic Bezier curve with 3 control points A,B,C, so we start with the formula below:

f(t) = A*(1-t)^2 + B*2t(1-t) + C*t^2

The next step is to break the equation into one equation per term. Each term has a control point, so we are basically splitting the formula up so that we have one formula per control point.

A*(1-t)^2 \\ B*2t(1-t) \\ C*t^2

Next, we remove the control points and expand each term to get:

1-2t+t^2 \\ 2t-2t^2 \\ t^2

Now, explicitly write out the coefficients of all powers of t that are present:
1*t^0-2*t^1+1*t^2 \\ 0*t^0+2*t^1-2*t^2 \\ 0*t^0+0*t^1+1*t^2

Now the final step. Take the constants that multiply your powers of t and make a matrix out of them. You are done!

\begin{bmatrix} 1 & -2 & 1 \\ 0 & 2 & -2 \\ 0 & 0 & 1 \\ \end{bmatrix}

Using the Matrix Form

Using the matrix form of Bezier curves is also pretty simple.

First, we need to make a vector of the power series of our t value:

powerSeries = \begin{bmatrix} t^0 & t^1 & t^2 \\ \end{bmatrix}

Which can also be written as:

powerSeries = \begin{bmatrix} 1 & t & t^2 \\ \end{bmatrix}

You also need a vector of your control points:

controlPoints = \begin{bmatrix} A & B & C \\ \end{bmatrix}

You next perform this operation to get a result vector:

result = powerSeries * curveMatrix * controlPoints

Then, you add up all components of result to get the value of the curve at time t.

value = result[0] + result[1] + result[2]

All done!

Note that this is a one dimensional Bezier curve. You need to do this operation once per axis to get your final multi dimensional Bezier curve point.

If you are confused by that last line, check out this post: One Dimensional Bezier Curves
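
To make that concrete, here is a minimal standalone C++ sketch (mirroring the glsl math on the CPU, with made up helper names) that evaluates a one dimensional quadratic Bezier curve with the matrix form and verifies it against the Bernstein form:

#include <cstdio>

// Evaluate a 1d quadratic Bezier curve at time t using the matrix form.
// powerSeries * curveMatrix gives the three Bernstein basis values, which are
// then multiplied by the control points and summed. Note that the rows below
// are the columns of the glsl (column major) matrix shown earlier.
float EvaluateMatrixForm (float A, float B, float C, float t)
{
    const float curveMatrix[3][3] = {
        { 1.0f, -2.0f,  1.0f },  // coefficients of (1-t)^2 = 1 - 2t + t^2
        { 0.0f,  2.0f, -2.0f },  // coefficients of 2t(1-t) = 2t - 2t^2
        { 0.0f,  0.0f,  1.0f },  // coefficients of t^2
    };
    const float powerSeries[3] = { 1.0f, t, t * t };
    const float controlPoints[3] = { A, B, C };

    float value = 0.0f;
    for (int i = 0; i < 3; ++i)
    {
        float basis = 0.0f;
        for (int j = 0; j < 3; ++j)
            basis += powerSeries[j] * curveMatrix[i][j];
        value += basis * controlPoints[i];
    }
    return value;
}

int main ()
{
    const float A = 1.0f, B = 3.0f, C = 2.0f;
    for (int i = 0; i <= 4; ++i)
    {
        float t = float(i) / 4.0f;
        float bernstein = A*(1-t)*(1-t) + B*2.0f*t*(1-t) + C*t*t;
        printf("t=%0.2f  matrix=%f  bernstein=%f\n", t, EvaluateMatrixForm(A, B, C, t), bernstein);
    }
    return 0;
}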

Multiplying the Control Points In

You might notice that if you are evaluating several points on the same curve, you are going to be multiplying the curveMatrix matrix by the controlPoints vector over and over. You can multiply the control points into the Bezier curve matrix to make a specific matrix for those control points if you want to. You multiply the columns of the matrix by the control points, and adjust the result calculation as shown below.

// Multiply the control points into the curve matrix
curveMatrix[0] *= A;
curveMatrix[1] *= B;
curveMatrix[2] *= C;

// Use the curve matrix that has the control points baked in, to do less math to get the result vector.
// You would calculate the curve matrix once and re-use it multiple times of course!
vec3 result = powerSeries * curveMatrix;
float value = result.x + result.y + result.z;

Closing

You might wonder when you’d use the matrix form. One time to use the matrix form would be when you had fast matrix math support (like on the GPU). Another time to use the matrix form though is if you ever want to cut up a Bezier curve into multiple smaller sub curves. The matrix form can help make that easier, and you can read more about that here if you want: A Matrix Formulation of the Cubic Bezier Curve

Here are some shadertoys that show this all working in webgl/glsl pixel shaders, along with source code:

Shadertoy: 1D Linear Bezier Matrix Form
Shadertoy: 1D Quadratic Bezier Matrix Form
Shadertoy: 1D Cubic Bezier Matrix Form

Actually Making Signed Distance Field Textures With JFA

This post is an addendum to the last post where I say that you can make distance field textures with JFA but don’t fully explain how to make SIGNED distance field textures, which is what you really want.

If you want to go straight to a working demo with webgl pixel shader source code, here is the shadertoy: Shadertoy: JFA SDF Texture

If you naively use a distance transform to make a distance field texture, you’ll get an UNSIGNED distance field texture, where you only have the distance to the surface of the object from the outside, but won’t have the distance to the surface of the object from the inside.

This is important because signed distance field textures have both, and use bilinear interpolation of distance on each side of the shape surface to make a nice smooth line. Below is what happens when you try to use an unsigned distance field texture (aka the distance transform of the image, using JFA / Voronoi information), using the zero distance line as the surface of the object:

It looks ok (if not fairly pixelated), but you can really see it break down when you zoom in:

So you might say to yourself, maybe I need to keep the surface line at distance 0.5 instead of 0.0 so that there is distance information to interpolate? If you do that, the first thing you might notice is that the objects get fatter:

But it does look better when you zoom in, which is a plus:

The real issue is that you really just need the distance from each pixel to the surface of the object from both the inside and the outside. In our case, our Voronoi diagram we make with JFA only gives the distance from the outside. So what is the solution? At first I was thinking maybe you can get the gradient of this data at the point of each pixel and “push the zero line in” a little bit to give at least one pixel layer worth of inside data. However, a brilliant friend of mine came up with the actual solution: You invert your source data so empty space becomes seed, and seed becomes empty space, and you run JFA again to get the distance from the inside!

That actually works very well. It’s also very easy to combine them. You make a pixel shader that reads the data from the outside Voronoi diagram and the inside Voronoi diagram, calculates the output distance (0.5 + outsideDistance * 0.5 - insideDistance * 0.5), and outputs that 0 to 1 distance value in one or more of the color channels.

Here’s a glsl excerpt below, note that we divide the distance by 8 and clamp between 0 and 1 so that the data is suitable for a normalized color image (normalized as in the color channels can store values between 0 and 1):

// calculate distances from seed coordinates
float outsideDist = clamp(length(outsideSeedCoord-fragCoord) / 8.0, 0.0, 1.0);
float insideDist  = clamp(length(insideSeedCoord-fragCoord)   / 8.0, 0.0, 1.0);
    
// calculate output distance
float signedDistance = 0.5 + outsideDist * 0.5 - insideDist * 0.5;
        
// set the color based on that distance
fragColor = vec4(signedDistance);
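
To see that encoding in isolation, here is a tiny brute force C++ sketch (an exhaustive distance search on a 1d “image”, not JFA) that computes the same 0.5 centered signed distance per pixel:

#include <stdio.h>
#include <math.h>
#include <algorithm>

int main ()
{
    // 1d "image": 1 = inside an object, 0 = empty space
    const int image[16] = { 0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0 };

    for (int i = 0; i < 16; ++i)
    {
        // brute force the distance to the nearest inside pixel and the nearest
        // outside pixel. JFA computes these same distances, just much faster.
        float outsideDist = 1000.0f, insideDist = 1000.0f;
        for (int j = 0; j < 16; ++j)
        {
            float d = fabsf(float(i - j));
            if (image[j] == 1)
                outsideDist = std::min(outsideDist, d);
            else
                insideDist = std::min(insideDist, d);
        }
        outsideDist = std::min(outsideDist / 8.0f, 1.0f);
        insideDist  = std::min(insideDist  / 8.0f, 1.0f);

        // 0.5 is the surface, below 0.5 is inside, above 0.5 is outside
        float signedDistance = 0.5f + outsideDist * 0.5f - insideDist * 0.5f;
        printf("pixel %2d: %0.3f\n", i, signedDistance);
    }
    return 0;
}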

It actually looks a lot like the first image where we use the zero distance line of the unsigned distance field texture, so we still aren’t quite there:

When you zoom in, it looks a little better, but something still seems a bit off:

The final step to making this look good is to realize that the power of signed distance textures is in their ability to interpolate distance information well. When we have a full resolution texture, there is no interpolation going on. We actually need to decrease the size of our distance field texture to make it look better. If only all problems were solved by making textures smaller!

Here is the resulting image when making the distance field texture 1/4 as large on each axis (1/16th as big total):

And zooming in you can see that it scales very well. The zoom is a 20x magnification, on top of the magnification we already get from it being a larger texture:

And just to show the intermediary textures, here is the outside distance Voronoi diagram:

And the inside distance Voronoi diagram (The seed is in bright green, the dim green is the empty space that has distance information):

And here is the final distance field texture used to render the final result I showed above.

Zoomed in to show just how low resolution it is! This is the thing that looks like a + or a sword just left of middle.

Again, here is the shadertoy that does this technique, generating a signed distance field texture on the fly for randomly placed objects, and then using that signed distance field to render a larger image that you can further zoom in to:
Shadertoy: JFA SDF Texture

GPU Texture Sampler Bezier Curve Evaluation

Below is a paper I submitted to jcgt.org that unfortunately did not get accepted. Maybe next time!

The main idea of this paper is that bilinear interpolation can be equivalent to the De Casteljau algorithm, which means that if you set up a texture in a specific way, and sample from it at specific texture coordinates, that it will in fact give you Bezier curve points as output! It scales up for higher dimensional textures, as well as higher order curves.
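
Here is a minimal C++ sketch of that core idea (simulating the bilinear texture sample in software, so the names and layout are my own): store P0, P1, P1, P2 in a 2×2 “texture” and sample it at (t, t) to get the quadratic Bezier point.

#include <stdio.h>

// Bilinearly interpolate a 2x2 "texture" at texture coordinate (u, v).
float Bilinear (const float tex[2][2], float u, float v)
{
    float top    = tex[0][0] + (tex[0][1] - tex[0][0]) * u;
    float bottom = tex[1][0] + (tex[1][1] - tex[1][0]) * u;
    return top + (bottom - top) * v;
}

int main ()
{
    // quadratic Bezier control points
    const float P0 = 1.0f, P1 = 3.0f, P2 = 2.0f;

    // Laying the texture out like this makes a sample at (t, t) run the
    // De Casteljau algorithm: row 0 lerps P0->P1, row 1 lerps P1->P2, and the
    // v axis lerps between those two results.
    const float tex[2][2] = { { P0, P1 }, { P1, P2 } };

    for (int i = 0; i <= 4; ++i)
    {
        float t = float(i) / 4.0f;
        float viaTexture   = Bilinear(tex, t, t);
        float viaBernstein = P0*(1-t)*(1-t) + P1*2.0f*t*(1-t) + P2*t*t;
        printf("t=%0.2f  texture=%f  bernstein=%f\n", t, viaTexture, viaBernstein);
    }
    return 0;
}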

The image below shows this in action for a quadratic Bezier curve (3 control points) being stored and recalled from a 2×2 texture (there is actually a curve stored in each color channel).

This image is from an extension linked to lower down which applies the technique to surfaces and volumes:

The primary feedback from the reviewers and editor was that:

  • It was an interesting technique and they thought it was a paper worth reading.
  • The usage case was fairly limited though – basically only when you are compute bound in your shader program, and have some curve calculations to offload to the texture sampler. Or if you are already using a lookup texture and would benefit from fewer instructions and smaller lookup textures.
  • It could have been shorter, both by tightening up the writing and by being less thorough. For instance, it didn’t need to show equivalence to both the De Casteljau algorithm and Bernstein polynomials, since it’s already known that those are equivalent.
  • They wanted some more performance details

I agree with the feedback, and don’t feel like taking the time to change and resubmit or submit elsewhere, so I’m sharing it here on my blog. I hope you enjoy it and find it interesting (:

Here is the paper:
GPUBezier2016.pdf

Here is the supplemental materials (opengl and webgl source code):
SupplementalMaterials.zip

Here is the webgl demo from the supplemental materials, hosted on my site:
GPU Efficient Texture Based Bezier Curve Evaluation

Here are some working shadertoy demos of the technique:
Shadertoy: Mystery Curves – Quadratic
Shadertoy: Mystery Curves – Cubic
Shadertoy: Mystery Curves – Quartic
Shadertoy: Mystery Curves – Quintic

Extensions

Continuations of this work:

Failed Experiments

Continuations that didn’t work out:

What are your thoughts? Is this cool? Is it lame? Got some ideas to improve it? Leave a comment! (:

Normalized Vector Interpolation TL;DR

My blog posts often serve as “external memory”, allowing me to go back and remember how specific things work months or years after I spent the time to learn about them.

So far it’s worked amazingly well! Instead of having a hazy memory of “oh um… i did bicubic interpolation once, how does that work again?” I can go back to my blog post, find the details with an explanation and simple working code, and can very rapidly come back up to speed. I seriously recommend keeping a blog if you are a programmer or similar. Plus, you know you really understand something when you can explain it to someone else, so it helps you learn to a deeper level than you would otherwise.

Anyways, this is going to be a very brief post on vector interpolation that I want to commit to my “external memory” for the future.

This is an answer to the question… “How do I interpolate between two normalized vectors?” or “How do I bilinearly or bicubically interpolate between normalized vectors?”

As an answer I found the three most common ways to do vector interpolation:

  • Slerp – short for “spherical linear interpolation”, this is the most correct way, but is also the costliest. In practice you likely do not need the precision.
  • lerp – short for “linear interpolation”, you just do a regular linear interpolation between the vectors and use that as a result.
  • nlerp – short for “normalized linear interpolation”, you just normalize the result of a lerp. Useful if you need your interpolated vector to be a normalized vector.

In practice, lerp/nlerp give a pretty close interpolated direction so long as the angle they are interpolating across is sufficiently small (say, 90 degrees or less), and nlerp is of course good at keeping the right length, if you need a normalized vector. If you want to preserve the length while interpolating between non normalized vectors, you could always interpolate the length and direction separately, as shown in the sketch below.
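
Here is a minimal C++ sketch of that last idea (the vec2 struct and the lerpDirectionAndLength helper name are made up for the example): nlerp the direction, lerp the length, then recombine.

#include <stdio.h>
#include <math.h>

struct vec2 { float x, y; };

float length (vec2 v)         { return sqrtf(v.x * v.x + v.y * v.y); }
vec2  scale (vec2 v, float s) { return { v.x * s, v.y * s }; }
vec2  mix (vec2 a, vec2 b, float t) { return { a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t }; }

// Interpolate non normalized vectors, preserving interpolated length:
// lerp the length, nlerp the direction, then recombine them.
vec2 lerpDirectionAndLength (vec2 start, vec2 end, float percent)
{
    float len = length(start) + (length(end) - length(start)) * percent;
    vec2 dir = mix(scale(start, 1.0f / length(start)), scale(end, 1.0f / length(end)), percent);
    return scale(dir, len / length(dir));
}

int main ()
{
    vec2 a = { 2.0f, 0.0f };
    vec2 b = { 0.0f, 4.0f };
    vec2 mid = lerpDirectionAndLength(a, b, 0.5f);
    printf("mid = (%f, %f), length = %f\n", mid.x, mid.y, length(mid)); // length is 3
    return 0;
}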

Here is an example of the three interpolations on a large angle. Dark grey = start vector, light grey = end vector. Green = slerp, blue = lerp, orange = nlerp.

Here is an example of a medium sized angle (~90 degrees) interpolating the same time t between the angles.

Lastly, here’s a smaller angle (~35 degrees). You can see that the results of lerp / nlerp are more accurate as the angle between the interpolated vectors gets smaller.

If you do lerp or nlerp, you can definitely do both bilinear as well as bicubic interpolation, since they are just regularly interpolated values (and then optionally normalized).

Using slerp, you can do bilinear interpolation, but I’m not sure how bicubic would translate.

Did I miss something important? Leave a comment and let me know!

Code

Here’s some glsl code for slerp, lerp and nlerp. This code is for vec2’s specifically but the same code works for vectors of any dimension.

//============================================================
// adapted from source at:
// https://keithmaggio.wordpress.com/2011/02/15/math-magician-lerp-slerp-and-nlerp/
vec2 slerp(vec2 start, vec2 end, float percent)
{
     // Dot product - the cosine of the angle between 2 vectors.
     // (named cosAngle instead of dot, so we don't shadow the built in dot())
     float cosAngle = dot(start, end);
     // Clamp it to be in the range of Acos()
     // This may be unnecessary, but floating point
     // precision can be a fickle mistress.
     cosAngle = clamp(cosAngle, -1.0, 1.0);
     // Acos(cosAngle) returns the angle between start and end,
     // And multiplying that by percent returns the angle between
     // start and the final result.
     float theta = acos(cosAngle)*percent;
     vec2 RelativeVec = normalize(end - start*cosAngle); // Orthonormal basis
     // The final result.
     return ((start*cos(theta)) + (RelativeVec*sin(theta)));
}

vec2 lerp(vec2 start, vec2 end, float percent)
{
     return mix(start,end,percent);    
}

vec2 nlerp(vec2 start, vec2 end, float percent)
{
     return normalize(mix(start,end,percent));    
}

Links

An interactive shadertoy demo I made, that is also where the above images came from:
Shadertoy: Vector Interpolation

Further discussion on this topic may be present here:
Computer Graphics Stack Exchange: Interpolating vectors on a grid

Good reads that go deeper:
Math Magician – Lerp, Slerp, and Nlerp
Understanding Slerp, Then Not Using It

Hiding a Lookup Table in a Modulus Operation

Lookup tables are a tool found in every programmer’s tool belt.

Lookup tables let you pre-calculate a complex calculation in advance, store the results in a table (an array), and then during performance critical parts of your program, you access that table to get quick answers to the calculations, without having to do the complex calculation on the fly.

In this post I’ll show a way to embed a lookup table inside of a single (large) number, where you extract values from that lookup table by taking a modulus of that number with different, specific values.

This technique is slower and takes more memory than an actual lookup table, but it’s conceptually interesting, so I wanted to share.

Also, I stumbled on this known technique while working on my current paper. The paper will make this technique a bit more practical, and I’ll share more info as soon as I am able, but for now you can regard this as a curiosity 😛

Onto the details!

1 Bit Input, 1 Bit Output: Pass Through

Let’s learn by example and start with a calculation that takes in an input bit, and gives that same value for an output bit. It’s just a 1 bit pass through lookup table.

\begin{array}{c|c} \text{Input} & \text{Output} \\ \hline 0 & 0 \\ 1 & 1 \\ \end{array}

To be able to convert that to something we can decode with modulus we have to solve the following equations:

x \% k_0 = 0 \\ x \% k_1 = 1

x is the number that represents our lookup table. k_0 and k_1 are the values that we modulus x against to get our desired outputs out.

It looks as if we have two equations and three unknowns – which would be unsolvable – but in reality, x is the only unknown. The k values can be whatever values it takes to make the equations true.

I showed how to solve equations like these in a previous post: Solving Simultaneous Congruences (Chinese Remainder Theorem).

You can also use this chinese remainder theorem calculator, which is handy: Chinese Remainder Theorem Calculator

The short answer here is that the k values can be ANY numbers, so long as they are pairwise co-prime to each other – AKA they have a greatest common divisor of 1.

If we pick 3 and 4 for k0 and k1, then using the chinese remainder theorem we find that x can equal 9 and the equations are true. Technically the answer is 9 mod 12, so 21, 33, 45 and many other numbers are also valid values of x, but we are going to use the smallest answer to keep things smaller, and more manageable.

So, in this case, the value representing the lookup table would be 9. If you wanted to know what value it gave as output when you plugged in the value 0, you would modulus the lookup table (9) against k0 (3) to get the output. If you wanted to know what value it gave as output when you plugged in the value 1, you would modulus the lookup table (9) against k1 (4) to get the output. The table below shows that it passes through the value in both cases like it should:

\begin{array}{c|c|c|c} \text{Input} & \text{Symbolic} & \text{Numeric} & \text{Output} \\ \hline 0 & x \% k_0 & 9 \% 3 & 0\\ 1 & x \% k_1 & 9 \% 4 & 1\\ \end{array}
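
If you’d like to sanity check values like that with code, here is a minimal brute force C++ sketch that just searches for the smallest x satisfying a set of congruences (the full program later in the post uses the Chinese remainder theorem to solve them directly):

#include <stdio.h>
#include <vector>

// Brute force search for the smallest x where x % keys[i] == residues[i] for all i.
// A solution is guaranteed to exist when the keys are pairwise co-prime.
int SolveCongruences (const std::vector<int>& keys, const std::vector<int>& residues)
{
    for (int x = 0; ; ++x)
    {
        bool ok = true;
        for (size_t i = 0; i < keys.size(); ++i)
            ok = ok && (x % keys[i] == residues[i]);
        if (ok)
            return x;
    }
}

int main ()
{
    // the pass through lookup table: x % 3 = 0, x % 4 = 1
    printf("pass through: x = %d\n", SolveCongruences({ 3, 4 }, { 0, 1 })); // prints 9
    return 0;
}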

1 Bit Input, 1 Bit Output: Not Gate

Let’s do something a little more interesting. Let’s make the output bit be the reverse of the input bit. The equations we’ll want to solve are this:

x \% k_0 = 1 \\ x \% k_1 = 0

We can use 3 and 4 for k0 and k1 again if we want to. Using the Chinese remainder theorem to solve the equations gives us a value of 4 for x. Check the truth table below to see how this works:

\begin{array}{c|c|c|c} \text{Input} & \text{Symbolic} & \text{Numeric} & \text{Output} \\ \hline 0 & x \% k_0 & 4 \% 3 & 1\\ 1 & x \% k_1 & 4 \% 4 & 0\\ \end{array}

1 Bit Input, 1 Bit Output: Output Always 1

What if we wanted the output bit to always be 1 regardless of input bit?

x \% k_0 = 1 \\ x \% k_1 = 1

Using 3 and 4 for our k values again, we solve and get a value of 1 for x. Check the truth table to see it working below:

\begin{array}{c|c|c|c} \text{Input} & \text{Symbolic} & \text{Numeric} & \text{Output} \\ \hline 0 & x \% k_0 & 1 \% 3 & 1\\ 1 & x \% k_1 & 1 \% 4 & 1\\ \end{array}

Hopefully one bit input to one bit output makes sense now. Let’s move on (:

2 Bit Input, 1 Bit Output: XOR Gate

Things get a little more interesting when we bump the number of input bits up to 2. If we want to make a number which represents XOR, we now have 4 equations to solve.

x \% k_{00} = 0 \\ x \% k_{01} = 1 \\ x \% k_{10} = 1 \\ x \% k_{11} = 0

In general we will have 2^N equations, where N is the number of input bits.

You might have noticed that I use subscripts for k corresponding to the input bits that the key represents. This is a convention I’ve found useful when working with this stuff. Makes it much easier to see what’s going on.

Now with four equations, we need 4 pairwise coprime numbers – no number has a common factor with another number besides 1.

Let’s pull them out of the air. Umm… 3, 4, 5, 7

Not too hard with only two bits of input, but you can see how adding input bits makes things a bit more complex. If you wanted to make something that took in two 16 bit numbers as input for example, you would need 2^32 co-prime numbers, since there would be a total of 32 bits of input!

When we solve those four equations, we get a value of 21 for x.

Notice how x is larger now that we have more input bits? That is another added complexity as you add more input bits. The number representing your program can get very, very large, and require you to use “multi precision integer” math libraries to store and decode the programs, when the numbers get larger than what can be held in a 64 bit int.

Boost has a decent library for this, check out boost::multiprecision::cpp_int, it’s what I use. You can download boost from here: http://www.boost.org/doc/libs/1_59_0/more/getting_started/windows.html

Anyhow, let’s check the truth table to see if our values work:

\begin{array}{c|c|c|c} \text{Input} & \text{Symbolic} & \text{Numeric} & \text{Output} \\ \hline 00 & x \% k_{00} & 21 \% 3 & 0 \\ 01 & x \% k_{01} & 21 \% 4 & 1 \\ 10 & x \% k_{10} & 21 \% 5 & 1 \\ 11 & x \% k_{11} & 21 \% 7 & 0 \end{array}

Woot, it worked.

2 Bit Input, 2 Bit Output: OR, AND

What happens when we add another bit of output? Basically we just treat each output bit as its own lookup table. This means that if we have two output bits, we will have two numbers representing our program (one for each bit), and that this is true regardless of how many input bits we have.

Let’s make the left output bit (x_0 ) be the OR of the input bits and the right output bit (x_1 ) be the AND of the input bits.

That gives us these two sets of equations to solve:

x_0 \% k_{00} = 0 \\ x_0 \% k_{01} = 1 \\ x_0 \% k_{10} = 1 \\ x_0 \% k_{11} = 1 \\ \\ x_1 \% k_{00} = 0 \\ x_1 \% k_{01} = 0 \\ x_1 \% k_{10} = 0 \\ x_1 \% k_{11} = 1 \\

We can use the same coprime numbers for our k values as we used in the last section (3,4,5,7). Note that we use the same k values in each set of equations. This is intentional and required for things to work out!

If we solve each set of equations we get 141 for x0, and 120 for x1.

Let’s see if that worked:

\begin{array}{c|c|c|c} \text{Input} & \text{Symbolic} & \text{Numeric} & \text{Output} \\ \hline 00 & x_0 \% k_{00}, x_1 \% k_{00} & 141 \% 3, 120 \% 3 & 00 \\ 01 & x_0 \% k_{01}, x_1 \% k_{01} & 141 \% 4, 120 \% 4 & 10 \\ 10 & x_0 \% k_{10}, x_1 \% k_{10} & 141 \% 5, 120 \% 5 & 10 \\ 11 & x_0 \% k_{11}, x_1 \% k_{11} & 141 \% 7, 120 \% 7 & 11 \end{array}

Hey, it worked again. Neat!
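
Here is a minimal C++ sketch of decoding that two bit output at runtime, with the x and k values from above hard coded in. One modulus per output bit gives the full answer:

#include <stdio.h>

int main ()
{
    // x0 encodes the OR of the input bits, x1 encodes the AND
    const int x0 = 141;
    const int x1 = 120;
    const int keys[4] = { 3, 4, 5, 7 }; // k00, k01, k10, k11

    for (int input = 0; input < 4; ++input)
    {
        int orBit  = x0 % keys[input];
        int andBit = x1 % keys[input];
        printf("input %d%d -> output %d%d\n", (input >> 1) & 1, input & 1, orBit, andBit);
    }
    return 0;
}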

Example Code

Now that we have the basics worked out, here is some sample code.

The lookup table takes in 8 bits as input, mapping 0..255 to 0..2pi, and gives the sine of that value as output in a float. So it has 8 bits of input and 32 bits of output.

#include <vector>
#include <array>
#include <algorithm>
#include <iostream>
#include <cmath>
#include <boost/multiprecision/cpp_int.hpp>
#include <stdint.h>
#include <string.h>
#include <memory>

typedef boost::multiprecision::cpp_int TINT;
typedef std::vector<TINT> TINTVec;

const float c_pi = 3.14159265359f;

//=================================================================================
void WaitForEnter ()
{
    printf("nPress Enter to quit");
    fflush(stdin);
    getchar();
}

//=================================================================================
static TINT ExtendedEuclidianAlgorithm (TINT smaller, TINT larger, TINT &s, TINT &t)
{
    // make sure A <= B before starting
    bool swapped = false;
    if (larger < smaller)
    {
        swapped = true;
        std::swap(smaller, larger);
    }

    // set up our storage for the loop.  We only need the last two values so will
    // just use a 2 entry circular buffer for each data item
    std::array<TINT, 2> remainders = { larger, smaller };
    std::array<TINT, 2> ss = { 1, 0 };
    std::array<TINT, 2> ts = { 0, 1 };
    size_t indexNeg2 = 0;
    size_t indexNeg1 = 1;

    // loop
    while (1)
    {
        // calculate our new quotient and remainder
        TINT newQuotient = remainders[indexNeg2] / remainders[indexNeg1];
        TINT newRemainder = remainders[indexNeg2] - newQuotient * remainders[indexNeg1];

        // if our remainder is zero we are done.
        if (newRemainder == 0)
        {
            // return our s and t values as well as the quotient as the GCD
            s = ss[indexNeg1];
            t = ts[indexNeg1];
            if (swapped)
                std::swap(s, t);

            // if t < 0, add the modulus divisor to it, to make it positive
            if (t < 0)
                t += smaller;
            return remainders[indexNeg1];
        }

        // calculate this round's s and t
        TINT newS = ss[indexNeg2] - newQuotient * ss[indexNeg1];
        TINT newT = ts[indexNeg2] - newQuotient * ts[indexNeg1];

        // store our values for the next iteration
        remainders[indexNeg2] = newRemainder;
        ss[indexNeg2] = newS;
        ts[indexNeg2] = newT;

        // move to the next iteration
        std::swap(indexNeg1, indexNeg2);
    }
}

//=================================================================================
void MakeKey (TINTVec &keys, TINT &keysLCM, size_t index)
{
    // if this is the first key, use 3
    if (index == 0)
    {
        keys[index] = 3;
        keysLCM = keys[index];
        return;
    }

    // Else start at the last number and keep checking odd numbers beyond that
    // until you find one that is co-prime.
    TINT nextNumber = keys[index - 1];
    while (1)
    {
        nextNumber += 2;
        if (std::all_of(
            keys.begin(),
            keys.begin() + index,
            [&nextNumber] (const TINT& v) -> bool
            {
                TINT s, t;
                return ExtendedEuclidianAlgorithm(v, nextNumber, s, t) == 1;
            }))
        {
            keys[index] = nextNumber;
            keysLCM *= nextNumber;
            return;
        }
    }
}

//=================================================================================
void CalculateLookupTable (
    TINT &lut,
    const std::vector<uint64_t> &output,
    const TINTVec &keys,
    const TINT &keysLCM,
    const TINTVec &coefficients,
    size_t bitMask
)
{
    // figure out how much to multiply each coefficient by to make it have the specified modulus residue (remainder)
    lut = 0;
    for (size_t i = 0, c = keys.size(); i < c; ++i)
    {
        // we either want this term to be 0 or 1 mod the key.  if zero, we can multiply by zero, and
        // not add anything into the bit value!
        if ((output[i] & bitMask) == 0)
            continue;

        // if 1, use chinese remainder theorem
        TINT s, t;
        ExtendedEuclidianAlgorithm(coefficients[i], keys[i], s, t);
        lut = (lut + ((coefficients[i] * t) % keysLCM)) % keysLCM;
    }
}

//=================================================================================
template <typename TINPUT, typename TOUTPUT, typename LAMBDA>
void MakeModulus (TINTVec &luts, TINTVec &keys, LAMBDA &lambda)
{
    // to keep things simple, input sizes are being constrained.
    // Do this in x64 instead of win32 to make size_t 8 bytes instead of 4
    static_assert(sizeof(TINPUT) < sizeof(size_t), "Input too large");
    static_assert(sizeof(TOUTPUT) < sizeof(uint64_t), "Output too large");

    // calculate some constants
    const size_t c_numInputBits = sizeof(TINPUT) * 8;
    const size_t c_numInputValues = 1 << c_numInputBits;
    const size_t c_numOutputBits = sizeof(TOUTPUT) * 8;

    // Generate the keys (coprimes)
    TINT keysLCM;
    keys.resize(c_numInputValues);
    for (size_t index = 0; index < c_numInputValues; ++index)
        MakeKey(keys, keysLCM, index);

    // calculate co-efficients for use in the chinese remainder theorem
    TINTVec coefficients;
    coefficients.resize(c_numInputValues);
    std::fill(coefficients.begin(), coefficients.end(), 1);
    for (size_t i = 0; i < c_numInputValues; ++i)
    {
        for (size_t j = 0; j < c_numInputValues; ++j)
        {
            if (i != j)
                coefficients[i] *= keys[j];
        }
    }

    // gather all the input to output mappings by permuting the input space
    // and storing the output for each input index
    std::vector<uint64_t> output;
    output.resize(c_numInputValues);
    union
    {
        TINPUT value;
        size_t index;
    } input;
    union
    {
        TOUTPUT value;
        size_t index;
    } outputConverter;

    for (input.index = 0; input.index < c_numInputValues; ++input.index)
    {
        outputConverter.index = 0; // clear all bytes, since value may be smaller than index
        outputConverter.value = lambda(input.value);
        output[input.index] = outputConverter.index;
    }

    // iterate through each possible output bit, since each bit is its own lut
    luts.resize(c_numOutputBits);
    for (size_t i = 0; i < c_numOutputBits; ++i)
    {
        const size_t bitMask = size_t(1) << i;
        CalculateLookupTable(
            luts[i],
            output,
            keys,
            keysLCM,
            coefficients,
            bitMask
        );
    }
}

//=================================================================================
int main (int argc, char **argv)
{
    // Look up tables encodes each bit, keys is used to decode each bit for specific
    // input values.
    TINTVec luts;
    TINTVec keys;

    // this is the function that gets turned into modulus work
    typedef uint8_t TINPUT;
    typedef float TOUTPUT;
    auto lambda = [] (TINPUT input) -> TOUTPUT
    {
        return sin(((TOUTPUT)input) / 255.0f * 2.0f * c_pi);
    };

    MakeModulus<TINPUT, TOUTPUT>(luts, keys, lambda);

    // show last lut and key to show what kind of numbers they are
    std::cout << "Last Lut: " << *luts.rbegin() << "n";
    std::cout << "Last Key: " << *keys.rbegin() << "n";

    // Decode all input values
    std::cout << "n" << sizeof(TINPUT) << " bytes input, " << sizeof(TOUTPUT) << " bytes outputn";
    for (size_t keyIndex = 0, keyCount = keys.size(); keyIndex < keyCount; ++keyIndex)
    {
        union
        {
            TOUTPUT value;
            size_t index;
        } result;

        result.index = 0;

        for (size_t lutIndex = 0, lutCount = luts.size(); lutIndex < lutCount; ++lutIndex)
        {
            TINT remainder = luts[lutIndex] % keys[keyIndex];
            size_t remainderSizeT = size_t(remainder);
            result.index += (remainderSizeT << lutIndex);
        }

        std::cout << "i:" << keyIndex << " o:" << result.value << "\n";
    }

    WaitForEnter();
    return 0;
}

Here is some output from the program. The first is to show what the last (largest) look up table and key look like. Notice how large the look up table number is!

Here it shows some sine values output from the program, using modulus against the large numbers calculated, to get the bits of the result out:

How to Get Lots of Pairwise Co-Prime Numbers?

You can generate a list of pairwise coprimes using brute force. Have an integer that you increment, and check if it’s pairwise co-prime to the existing items in the list. If it is, add it to the list! Rinse and repeat until you have as many as you want.

That is the most practical way to do it, but there are two other interesting ways I wanted to mention.

The first way is using Fermat numbers. The Fermat numbers are an infinite list of pairwise co-prime numbers and are calculated as 2^{2^n}+1 where n is an integer. Fermat numbers also have the benefit that you can get the nth item in the list without calculating the numbers that came before it. The only problem is that the numbers grow super huge very fast. The first 7 values are: 3, 5, 17, 257, 65537, 4294967297, 18446744073709551617. If Fermat numbers didn’t grow so quickly, they sure would be useful for things like this technique.

The second way is using something called Sylvester’s sequence. It too is an infinite list of pairwise co-prime numbers, and it too grows very large very quickly unfortunately. I also don’t believe there is a way to calculate the Nth item in the list directly. Every number is based on previous numbers, so you have to calculate them all from the beginning. No random access!
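
Here is a minimal C++ sketch that generates the start of both sequences, using 64 bit integers and stopping before they overflow:

#include <stdio.h>
#include <stdint.h>

int main ()
{
    // Fermat numbers: F(n) = 2^(2^n) + 1. Pairwise co-prime, and you can get
    // the nth item directly, but F(6) already overflows 64 bits.
    for (int n = 0; n <= 5; ++n)
    {
        uint64_t f = (1ULL << (1ULL << n)) + 1;
        printf("F(%d) = %llu\n", n, (unsigned long long)f);
    }

    // Sylvester's sequence: a(0) = 2, then a(n+1) = a(n)^2 - a(n) + 1.
    // Also pairwise co-prime, but each term needs the previous one, so there
    // is no random access.
    uint64_t a = 2;
    for (int n = 0; n <= 6; ++n)
    {
        printf("S(%d) = %llu\n", n, (unsigned long long)a);
        if (n < 6)
            a = a * a - a + 1; // S(7) would overflow 64 bits
    }
    return 0;
}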

Beyond Binary

In this post I showed how to work in binary digits, but there is no reason why you have to encode single bits in the lookup tables.

Instead of encoding 0 or 1 in each modulus “lookup table”, you could also perhaps store base 10 numbers in the tables and have 0-9. Or, maybe you encode a byte per lookup table.

Encoding more than one bit effectively makes both your input and your output smaller, which helps the algorithm do more with less.

Your keys will need to be larger though, since the keys have to be larger than the value you plan to store, and your resulting lookup table will be a larger number as well. It might make the technique more worthwhile though.

I’ll leave that as an exercise for you. If you try it and find neat stuff, post a comment and let us know, or drop me an email or something. It’d be neat to hear if people find any practical usage cases of this technique 😛

The End, For Now!

I want to point out that increasing the number of input bits in this technique is a pretty expensive thing to do, but increasing the number of output bits is a lot cheaper. It kind of makes sense in a way if you think about it. Input bits add information from the outside world that must be dealt with, while output bits are just fluff that can easily be diluted or concentrated by adding or removing bits that are associated with, and calculated from, the input bits.

Another problem you may have noticed with this technique is that if you have a really expensive calculation that you are trying to “flatten” into modulus math like this, that you have to run that calculation many, many times to know what values a lookup table would give you. You have to run it once per possible input to get every possible output. That is expected when making a lookup table, since you are paying a cost up front to make things faster later.

The paper I’m working on changes things a bit though. One of the things it does is it makes it so doing this technique only requires that you evaluate the function once, and it calculates all values simultaneously to give the end result that you can then do modulus against. It’s pretty cool IMO and I will share more details here as soon as I am able – and yes, i have actual working code that does that, believe it or not! I’m looking forward to being able to share it later on. Maybe someone will find some really cool usage case for it.

Quantum Computing For Programmers Part 2: Multiple Qubits

In part 1 (Quantum Computing For Programmers Part I: One Qubit) we looked at the basics of quantum computing and how to deal with a single qubit. Here we’ll talk about the interesting things that happen when you involve multiple qubits!

Multiple Qubit Probabilities & Possibilities

In the last post we used the analogy of a coin to describe a qubit. The coin can be either heads or tails, or flipping in the air, waiting to become either a heads or tails when it lands. The same is true of how a qubit works, where it can either be a 0 or a 1 or some superposition of both, only deciding which it is when observed. Furthermore, just like a real coin, there can be a bias towards becoming one value or the other.

We represented a coin by having a probability for each possible state (heads or tails), but represented a qubit by having an AMPLITUDE for each possible state (0 or 1), where you square an amplitude to get the probability.

Going back to the coin analogy, how would probabilities work if we had two coins?

The four possible outcomes with two coins are:
Heads/Heads
Heads/Tails
Tails/Heads
Tails/Tails

Each coin individually could either be sitting on the table heads or tails, or up in the air, waiting to become either a heads or a tails, with some probability of becoming a heads or a tails.

To figure out the probability of each possible outcome, you just multiply the probability of each coin’s outcome together.

For instance, let’s say that the first coin (coin A) is already sitting on the table as a heads, and the second coin (coin B) is flipping in the air, with a 1/3 chance of becoming heads and a 2/3 chance of becoming tails. Here is how you would calculate the probability of each possible state:

\begin{array}{c|c|c|c} \text{Outcome A/B} & \text{Coin A Probability} & \text{Coin B Probability} & \text{Outcome Probability (A*B)}\\ \hline heads / heads & 100\% & 33\% & 33\% \\ heads / tails & 100\% & 67\% & 67\% \\ tails / heads & 0\% & 33\% & 0\% \\ tails / tails & 0\% & 67\% & 0\% \\ \end{array}

Qubits actually work the same way! Using the same values as the coins, let’s say qubit A is a 0, and qubit B has a 1/3 chance of becoming a 0, and a 2/3 chance of becoming a 1. Converting those probabilities to amplitudes you could get A=[1,0], B=[\frac{1}{\sqrt{3}}, \frac{\sqrt{2}}{\sqrt{3}}].

\begin{array}{c|c|c|c|c} \text{Outcome AB} & \text{Qubit A Amplitude} & \text{Qubit B Amplitude} & \text{Outcome Amplitude(A*B)} & \text{Outcome Probability}\\ \hline 00 & 1 & 1/\sqrt{3} & 1/\sqrt{3} & 33\% \\ 01 & 1 & \sqrt{2}/\sqrt{3} & \sqrt{2}/\sqrt{3} & 67\% \\ 10 & 0 & 1/\sqrt{3} & 0 & 0\%\\ 11 & 0 & \sqrt{2}/\sqrt{3} & 0 & 0\% \\ \end{array}

Note that in both cases, the probabilities add up to 100% like you’d expect.

In the case of qubits, the resulting vector is [\frac{1}{\sqrt{3}}, \frac{\sqrt{2}}{\sqrt{3}}, 0, 0] , which is still normalized, and represents the amplitudes for the 4 possible states of those two qubits: 00, 01, 10 and 11.

When working with one qubit (or coin), there are 2 possible outcomes 0 or 1 (heads or tails). When working with two qubits (or coins), there are four possible outcomes 00, 01, 10, 11 (heads/heads, heads/tails, tails/heads, tails/tails). When working with three qubits or coins, there are eight possible outcomes: 000,001,010 … 111.

With both coins and qubits there are 2^N possibilities, where N is the number of qubits or coins you have.

If you had 8 qubits, you’d have to use a 256 dimensional vector to describe the possibilities, since 2^8 is 256. When performing quantum computing with 8 qubits, you only have to deal with the 8 qubits. When simulating quantum computing on a regular, classical computer, you have to deal with the 256 amplitudes. This kind of gives a glimpse at how quantum computers can be faster than regular computers at certain things. There is an economy of scale working against us on regular computers simulating quantum computing.

The method we used to combine the probabilities of single qubits into an amplitude vector representing multiple qubits is called the Kronecker product. That’s just a fancy way of saying we have to multiply everything from the first vector by everything from the second vector, to get a third vector that is bigger than the first two. You’ll see it represented like this: A \otimes B and while it’s similar to the “outer product” and even uses the same symbol (!), it gives a slightly different result versus if you did an outer product and then vectorized the matrix.

The Kronecker product of vectors works like this:

\begin{bmatrix} A_1 \\ A_2 \\ ... \\ A_M \\ \end{bmatrix} \otimes \begin{bmatrix} B_1 \\ B_2 \\ ... \\ B_N \\ \end{bmatrix} = \begin{bmatrix} A_1 B_1 \\ A_1 B_2 \\ ... \\ A_1 B_N \\ A_2 B_1 \\ A_2 B_2 \\ ... \\ A_2 B_N \\ ... \\ A_M B_1 \\ A_M B_2 \\ ... \\ A_M B_N \\ \end{bmatrix}

Let’s show an example involving two qubits.

The first qubit has a 75% chance of being heads so its amplitude is [\sqrt{3}/2,1/2]. The second qubit has a 2/3 chance of being heads so it has an amplitude of [\sqrt{2}/\sqrt{3}, 1/\sqrt{3}].

To calculate the amplitude vector representing these two qubits, we do a kronecker product:
\begin{bmatrix} \cfrac{\sqrt{3}}{2} \\ \cfrac{1}{2} \\ \end{bmatrix} \otimes \begin{bmatrix} \cfrac{\sqrt{2}}{\sqrt{3}} \\ \cfrac{1}{\sqrt{3}} \\ \end{bmatrix} = \begin{bmatrix} \cfrac{\sqrt{3}}{2}\cfrac{\sqrt{2}}{\sqrt{3}} \\ \cfrac{\sqrt{3}}{2}\cfrac{1}{\sqrt{3}} \\ \cfrac{1}{2}\cfrac{\sqrt{2}}{\sqrt{3}} \\ \cfrac{1}{2}\cfrac{1}{\sqrt{3}} \\ \end{bmatrix}

If you simplify that, you get:
\begin{bmatrix} \cfrac{1}{\sqrt{2}} \\ \cfrac{1}{2} \\ \cfrac{1}{\sqrt{6}} \\ \cfrac{1}{2\sqrt{3}} \\ \end{bmatrix}

or horizontally for easier reading:
\begin{bmatrix} \cfrac{1}{\sqrt{2}} & \cfrac{1}{2} & \cfrac{1}{\sqrt{6}} & \cfrac{1}{2\sqrt{3}} & \end{bmatrix}

Squaring those values to get the probabilities of each state, we get the below, which you might notice adds up to 100%, meaning the vector is still normalized!
\begin{bmatrix} 50\% & 25\% & 17\% & 8\% \end{bmatrix}

This generalizes for larger numbers of qubits, so you could use the Kronecker product to combine a vector describing 4 qubits with a vector describing 3 qubits, to come up with a vector that describes 7 qubits (which would have 128 amplitudes in it, since 2^7 = 128!).
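
Here is a minimal C++ sketch of the Kronecker product of amplitude vectors, reproducing the two qubit example above:

#include <stdio.h>
#include <math.h>
#include <vector>

// Kronecker product of two vectors: every value of a multiplied by every value of b.
std::vector<float> KroneckerProduct (const std::vector<float>& a, const std::vector<float>& b)
{
    std::vector<float> result;
    result.reserve(a.size() * b.size());
    for (float av : a)
        for (float bv : b)
            result.push_back(av * bv);
    return result;
}

int main ()
{
    // 75% / 25% and 67% / 33% probabilities, stored as amplitudes
    std::vector<float> qubitA = { sqrtf(3.0f) / 2.0f, 0.5f };
    std::vector<float> qubitB = { sqrtf(2.0f / 3.0f), sqrtf(1.0f / 3.0f) };

    std::vector<float> combined = KroneckerProduct(qubitA, qubitB);
    for (size_t i = 0; i < combined.size(); ++i)
        printf("amplitude[%zu] = %f, probability = %0.0f%%\n",
            i, combined[i], combined[i] * combined[i] * 100.0f);
    return 0;
}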

The kronecker product also generalizes to matrices, which we will talk about shortly.

To be able to work with multiple qubits in a quantum circuit, you need to represent all qubits involved in a single vector. Doing this, while being the correct thing to do, also allows entanglement to happen, which we will talk about next!

Entanglement – Simpler Than You Might Think!

Time to demystify quantum entanglement!

Quantum entanglement is the property whereby two qubits – which can be separated by any distance, such as several light years – can be observed and come up with the same value (0 or 1). Whether they are 0s or 1s is random, but they will both agree, when entangled in the way that causes them to agree (you could alternately entangle them to always disagree).

Interestingly, in 1964 John Bell proved that the mechanism that makes this happen is NOT that the qubits share information in advance. You can read more about that here – it’s near the end: Ars Technica – A tale of two qubits: how quantum computers work.

A common misconception is that this means that we can have faster than light (instantaneous) communication across any distance. That turns out not to be true, due to the No-Communication-Theorem (Wikipedia).

Interestingly though, you can use separated entangled qubits in separated quantum circuits to do something called Quantum Pseudo Telepathy (Wikipedia), which lets you do some things that would otherwise be impossible – like reliably winning a specially designed game that would otherwise be a game of chance.

I don’t yet understand enough about Quantum Pseudo Telepathy to see why it isn’t considered communication. I also have no idea how entanglement is actually “implemented” in the universe, but nobody seems to (or if they do, they aren’t sharing!). How can it be that two coins flipped on opposite sides of the galaxy can be guaranteed to both land with the same side facing up?

Despite those mysteries, the math is DEAD SIMPLE, so let me share it with you, it’ll take like one short paragraph. Are you ready?

Quantum computing works by manipulating the probabilities of the states of qubits. If you have two qubits, there are four possible sets of values when you observe them: 00, 01, 10, 11. If you use quantum computing to set the probabilities of the 01 and 10 states to a 0% chance, and set the probabilities of the 00 and 11 states to a 50% chance each, you’ve now entangled the qubits such that when you observe them, they will always agree, because you’ve gotten rid of the chances that they could ever DISAGREE. Similarly, if instead of setting the 01 and 10 states to 0%, you set the probability of the 00 and 11 states to 0%, you’d have entangled qubits which would always disagree when you observed them, because you’ve gotten rid of the chances that they could ever AGREE.

That’s all there is to it. Strange how simple it is, isn’t it? The table below shows how the only possible outcomes are that the qubits agree, but it is a 50/50 chance whether they are a 0 or a 1:

\begin{array}{c|c|c} \text{Outcome} & \text{Amplitude} & \text{Probability}\\ \hline 00 & 1/\sqrt{2} & 50\% \\ 01 & 0 & 0\% \\ 10 & 0 & 0\% \\ 11 & 1/\sqrt{2} & 50\% \\ \end{array}

Entanglement isn’t limited to just two qubits, you can entangle any number of qubits together.

Entanglement has a special mathematical meaning. If you can represent a state of a group of qubits by a kronecker product, they are not entangled. If you CAN’T represent a state of a group of qubits by a kronecker product, they ARE entangled. These two things – entanglement and lack of kronecker product factorability (made that term up) – are the same thing.

As an example, what two vectors could you use the kronecker product on to get the entangled two qubit state 1/\sqrt{2}(|00\rangle+|11\rangle) (or in vector form [1/\sqrt{2}, 0, 0, 1/\sqrt{2}])? You’ll find there aren’t any! That state is the entangled state where the two qubits will always have the same value when you observe them.
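
For two qubits (with real amplitudes, to keep it simple) you can test that factorability directly in code: the state [v_0, v_1, v_2, v_3] can be written as a Kronecker product exactly when v_0 v_3 = v_1 v_2 (reshaped into a 2×2 matrix, it has rank 1). Here is a minimal sketch:

#include <stdio.h>
#include <math.h>

// A two qubit state [v0, v1, v2, v3] factors into a Kronecker product of two
// single qubit states exactly when v0*v3 == v1*v2, so failing that test means
// the qubits are entangled.
bool IsEntangled (const float v[4])
{
    return fabsf(v[0] * v[3] - v[1] * v[2]) > 0.0001f;
}

int main ()
{
    const float bellState[4]    = { 1.0f / sqrtf(2.0f), 0.0f, 0.0f, 1.0f / sqrtf(2.0f) };
    const float productState[4] = { 0.5f, 0.5f, 0.5f, 0.5f }; // [1/sqrt(2), 1/sqrt(2)] kron itself

    printf("bell state entangled: %s\n", IsEntangled(bellState) ? "yes" : "no");       // yes
    printf("product state entangled: %s\n", IsEntangled(productState) ? "yes" : "no"); // no
    return 0;
}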

Entangled qubits are nothing special in quantum circuits. You don’t have to take any special precautions when working with them. They are an interesting byproduct of quantum computing, and so basically are something neat that comes out of your quantum circuit, but they don’t require any extra thought when using them within your circuits. Don’t worry about them too much (:

Multi Qubit Gates

Let’s have a look at some multi qubit gates! (These are again from Wikipedia: Quantum Gate)

Swap Gate

Given two qubits, with possible states |00\rangle, |01\rangle, |10\rangle, |11\rangle, this gate swaps the amplitudes (probabilities) of |01\rangle,  |10\rangle and leaves the probabilities of the states |00\rangle and |11\rangle alone.

That might seem weird, but swapping the two qubits maps the |00\rangle state to itself (both qubits are 0 either way you order them), so its probability is unaffected. It’s the same situation for the |11\rangle state.

Here’s the matrix for the swap gate:
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}

Square Root of Swap Gate

I don’t really understand a usage case for this gate myself, but it apparently does a half way swap between two qubits. That once again modifies the probabilities of the |01\rangle,  |10\rangle states, but leaves state |00\rangle and |11\rangle alone again, for the same reason as the swap gate.

Wikipedia says that if you have this gate, and the single qubit gates, that you can do universal quantum computation. In other words, you can build a fully functional quantum computer using just this and the single qubit gates.

Here’s the matrix:
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \frac{1}{2}(1+i) & \frac{1}{2}(1-i) & 0 \\ 0 & \frac{1}{2}(1-i) & \frac{1}{2}(1+i) & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix}

Controlled Not Gate

The controlled not (CNOT) gate flips the value of the second qubit if the first qubit is true. This is a logic / flow control type of gate.

Interestingly, this gate is also useful for creating entangled qubits, which you’ll be able to see lower down!

Here is the matrix:
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ \end{bmatrix}

The controlled not gate is also referred to as the quantum XOR gate. To see why, check out the truth table below for this gate:
\begin{array}{c|c} \text{Input} & \text{Output} \\ \hline 00 & 00 \\ 01 & 01 \\ 10 & 11 \\ 11 & 10 \\ \end{array}

If you look at the right most bit of the outputs, you’ll notice that it’s the XOR of the two input bits!

All quantum gates need to be reversible, and having these two qubits be the output of a hypothetical quantum XOR gate allows that to happen. If you look at the right most bit of the inputs, you’ll notice that it is also the XOR of the two output bits. It’s bidirectional, which is kind of weird and kind of interesting 😛

Generalized Control Gate

You can actually convert any single qubit gate into a controlled gate. That makes a 2 qubit gate which only does work on the second qubit if the first qubit is true.

How you do that is you make a 4×4 identity matrix, and make the lower right 2×2 sub-matrix into the single qubit matrix you want to use.

In other words, if your single qubit matrix is this:
\begin{bmatrix} U_{00} & U_{01} \\ U_{10} & U_{11} \\ \end{bmatrix}

Then the controlled version of the matrix would be this:
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & U_{00} & U_{01} \\ 0 & 0 & U_{10} & U_{11} \\ \end{bmatrix}

Pretty simple right?
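
Here is a minimal C++ sketch of that construction. Putting a control on the single qubit not gate should give us the CNOT matrix from the last section, and it does:

#include <stdio.h>

// Make a controlled two qubit gate (4x4) from a single qubit gate (2x2):
// identity in the upper left, the single qubit gate in the lower right.
void MakeControlledGate (const float u[2][2], float out[4][4])
{
    for (int y = 0; y < 4; ++y)
        for (int x = 0; x < 4; ++x)
            out[y][x] = (x == y && x < 2) ? 1.0f : 0.0f;
    for (int y = 0; y < 2; ++y)
        for (int x = 0; x < 2; ++x)
            out[y + 2][x + 2] = u[y][x];
}

int main ()
{
    const float notGate[2][2] = { { 0.0f, 1.0f }, { 1.0f, 0.0f } };
    float cnot[4][4];
    MakeControlledGate(notGate, cnot);
    for (int y = 0; y < 4; ++y)
        printf("%0.0f %0.0f %0.0f %0.0f\n", cnot[y][0], cnot[y][1], cnot[y][2], cnot[y][3]);
    return 0;
}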

Toffoli Gate

The Toffoli gate is a 3 qubit gate that is also known as the CCNOT gate or controlled controlled not gate. It flips the third qubit if the first two qubits are true.

Its matrix looks pretty uninteresting:
\begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ \end{bmatrix}

This is also known as the quantum AND gate. If you look at the truth table below, you’ll see that when the right most qubit of the input is 0, the right most qubit of the output is the AND of the two left qubits. We need the three bits of input mapping to three bits of output like this so that the gate is reversible.

\begin{array}{c|c} \text{Input} & \text{Output} \\ \hline 000 & 000 \\ 001 & 001 \\ 010 & 010 \\ 011 & 011 \\ 100 & 100 \\ 101 & 101 \\ 110 & 111 \\ 111 & 110 \\ \end{array}

Fredkin Gate

The Fredkin gate is a controlled swap gate. Here’s the matrix:
\begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ \end{bmatrix}

Just like we talked about how to make a generalized controlled gate acting on two qubits, you should be able to notice that this gate is just a specific version of a generalized controlled 3 qubit gate. You take an 8×8 identity matrix, and make the lower right 4×4 matrix be the 2 qubit gate that you want to add a control on. In this case, the lower right 4×4 matrix is just the swap gate we mentioned earlier in the post (:

Circuit To Entangle Qubits

The circuit to entangle qubits is pretty simple, so let’s start with that as the first quantum circuit we look at. It takes in two qubits. The first qubit is put through a Hadamard gate, and then both qubits are put through a controlled not gate. That circuit looks like this (image courtesy of Quantum Circuit Simulator):

The value you plug in for the second qubit determines what type of entanglement you get out. Setting the second qubit to 0, you will get entangled qubits which always agree when they are observed. Setting it to 1, you will get entangled qubits which always disagree when they are observed. The first qubit controls whether the phase of each state matches or mismatches. Note that these are the four Bell States (Wikipedia: Bell State).

\begin{array}{c|c|c} \text{Input} & \text{Output In Ket Notation} & \text{Output As Vector} \\ \hline 00 & \frac{1}{\sqrt{2}}(|00\rangle+|11\rangle) & [1/\sqrt{2},0,0,1/\sqrt{2}] \\ 01 & \frac{1}{\sqrt{2}}(|01\rangle+|10\rangle) & [0,1/\sqrt{2},1/\sqrt{2},0] \\ 10 & \frac{1}{\sqrt{2}}(|00\rangle-|11\rangle) & [1/\sqrt{2},0,0,-1/\sqrt{2}] \\ 11 & \frac{1}{\sqrt{2}}(|01\rangle-|10\rangle) & [0,1/\sqrt{2},-1/\sqrt{2},0]\\ \end{array}

In a quantum circuit, you can’t just apply a gate to an individual qubit though. You have to make the matrix of your gate such that it applies to all qubits, applying the “identity” matrix to the qubits you don’t want to be affected by the gate.

So, how do we make a matrix that applies the Hadamard gate to qubit 1, and identity to qubit 2? You use the kronecker product!

Since we want to apply the Hadamard matrix to qubit 1 and identity to qubit 2, we are going to calculate H \otimes I (If we wanted to apply the Hadamard gate to qubit 2 and identity to qubit 1 we would calculate I \otimes H instead).

H \otimes I =  1/\sqrt{2}* \begin{bmatrix} 1 & 1 \\ 1 & -1 \\ \end{bmatrix} \otimes \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ \end{bmatrix} = 1/\sqrt{2}* \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \\ \end{bmatrix}

If you notice, the result of the kronecker product is just every value in the left matrix, multiplied by every value in the right matrix. Basically, the result is a 2×2 grid of identity matrices, where each of those identity matrices is multiplied by the corresponding value from the same cell in the left matrix. Since the left matrix has a 1 in all cells except the lower right, the same is true of the result… it’s a positive identity matrix in each cell, except the lower right one, which is a negative identity matrix. Hopefully that makes sense, it’s a lot easier than it sounds…
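
Here is a minimal C++ sketch of the Kronecker product for 2×2 matrices, computing the H ⊗ I matrix shown above:

#include <stdio.h>
#include <math.h>

// Kronecker product of two 2x2 matrices, giving a 4x4: every value of a
// multiplies a full copy of b.
void KroneckerProduct2x2 (const float a[2][2], const float b[2][2], float out[4][4])
{
    for (int ay = 0; ay < 2; ++ay)
        for (int ax = 0; ax < 2; ++ax)
            for (int by = 0; by < 2; ++by)
                for (int bx = 0; bx < 2; ++bx)
                    out[ay * 2 + by][ax * 2 + bx] = a[ay][ax] * b[by][bx];
}

int main ()
{
    const float r = 1.0f / sqrtf(2.0f);
    const float hadamard[2][2] = { { r, r }, { r, -r } };
    const float identity[2][2] = { { 1.0f, 0.0f }, { 0.0f, 1.0f } };

    float hKronI[4][4];
    KroneckerProduct2x2(hadamard, identity, hKronI); // H on qubit 1, identity on qubit 2

    for (int y = 0; y < 4; ++y)
        printf("%+0.3f %+0.3f %+0.3f %+0.3f\n", hKronI[y][0], hKronI[y][1], hKronI[y][2], hKronI[y][3]);
    return 0;
}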

The second gate in the quantum circuit is the CNOT gate. We can actually multiply the gate we just made by the CNOT gate to represent the full quantum circuit as a single matrix. This uses regular matrix multiplication.

1/\sqrt{2}* \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \\ \end{bmatrix} * \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ \end{bmatrix} = 1/\sqrt{2}* \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & -1 \\ 0 & 1 & -1 & 0 \\ \end{bmatrix} = \begin{bmatrix} 1/\sqrt{2} & 0 & 0 & 1/\sqrt{2} \\ 0 & 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 1/\sqrt{2} & 0 & 0 & -1/\sqrt{2} \\ 0 & 1/\sqrt{2} & -1/\sqrt{2} & 0 \\ \end{bmatrix}

Lets plug some values into the circuit and see what comes out!

Lets start by plugging in 00. Our input qubits are [1, 0] and [1, 0]. The kronecker product of those two qubit vectors is [1, 0, 0, 0]. Now let’s multiply that vector by the matrix of our quantum circuit.

\begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix} * \begin{bmatrix} 1/\sqrt{2} & 0 & 0 & 1/\sqrt{2} \\ 0 & 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 1/\sqrt{2} & 0 & 0 & -1/\sqrt{2} \\ 0 & 1/\sqrt{2} & -1/\sqrt{2} & 0 \\ \end{bmatrix} = \begin{bmatrix} 1/\sqrt{2} & 0 & 0 & 1/\sqrt{2} \end{bmatrix}

Comparing this to the table showing how our input maps to output, we got the right answer! If you plugged in the other values listed, you would get the rest of the bell state entanglements.
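
Here is a minimal C++ sketch that runs all four basis state inputs through the circuit matrix, reproducing the Bell state table from earlier:

#include <stdio.h>
#include <math.h>

// Multiply a 4 entry row vector (the two qubit state) by a 4x4 matrix (the circuit).
void ApplyCircuit (const float state[4], const float m[4][4], float out[4])
{
    for (int col = 0; col < 4; ++col)
    {
        out[col] = 0.0f;
        for (int row = 0; row < 4; ++row)
            out[col] += state[row] * m[row][col];
    }
}

int main ()
{
    const float r = 1.0f / sqrtf(2.0f);
    // (H kron I) * CNOT, the entanglement circuit matrix from above
    const float circuit[4][4] = {
        {    r, 0.0f, 0.0f,    r },
        { 0.0f,    r,    r, 0.0f },
        {    r, 0.0f, 0.0f,   -r },
        { 0.0f,    r,   -r, 0.0f },
    };

    for (int input = 0; input < 4; ++input)
    {
        float state[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
        state[input] = 1.0f;
        float result[4];
        ApplyCircuit(state, circuit, result);
        printf("input %d%d -> [%+0.3f, %+0.3f, %+0.3f, %+0.3f]\n",
            (input >> 1) & 1, input & 1, result[0], result[1], result[2], result[3]);
    }
    return 0;
}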

Unentangling Qubits

All gates are reversible, so all circuits are reversible. To get the reverse circuit for a given matrix, you just get the inverse of the matrix. Since quantum gates (and circuits) are unitary matrices, taking the inverse of one of these matrices just means taking the conjugate transpose of the matrix. In other words, you take the transpose of the matrix, and then just negate the imaginary component of any complex numbers. In this example, there are no imaginary numbers, so you just take the transpose of the matrix.

Since our circuit matrix is this:

\begin{bmatrix} 1/\sqrt{2} & 0 & 0 & 1/\sqrt{2} \\ 0 & 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 1/\sqrt{2} & 0 & 0 & -1/\sqrt{2} \\ 0 & 1/\sqrt{2} & -1/\sqrt{2} & 0 \\ \end{bmatrix}

That means that the inverse must be this, since this is the (conjugate) transpose.

\begin{bmatrix} 1/\sqrt{2} & 0 & 1/\sqrt{2} & 0 \\ 0 & 1/\sqrt{2} & 0 & 1/\sqrt{2} \\ 0 & 1/\sqrt{2} & 0 & -1/\sqrt{2} \\ 1/\sqrt{2} & 0 & -1/\sqrt{2} & 0 \\ \end{bmatrix}

Let’s try it out by putting the output from last section in and seeing what comes out the other side:

\begin{bmatrix} 1/\sqrt{2} & 0 & 0 & 1/\sqrt{2} \end{bmatrix} * \begin{bmatrix} 1/\sqrt{2} & 0 & 1/\sqrt{2} & 0 \\ 0 & 1/\sqrt{2} & 0 & 1/\sqrt{2} \\ 0 & 1/\sqrt{2} & 0 & -1/\sqrt{2} \\ 1/\sqrt{2} & 0 & -1/\sqrt{2} & 0 \\ \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix}

It worked! We got our original input back.
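If you’d rather compute the reverse circuit in code than by hand, here is a minimal sketch of a conjugate transpose function, written against the SComplexMatrix type from the code listing below (ConjugateTranspose is a name I’m introducing here, not something from the original listing):

//=================================================================================
// Returns the conjugate transpose of a matrix, which is the inverse of a unitary matrix
SComplexMatrix ConjugateTranspose (const SComplexMatrix& m)
{
    const size_t size = m.Size();
    SComplexMatrix ret;
    ret.Resize(size);

    for (size_t x = 0; x < size; ++x)
        for (size_t y = 0; y < size; ++y)
            ret.Get(x, y) = std::conj(m.Get(y, x)); // transpose, negating imaginary parts

    return ret;
}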

Making Matrices for More Complex Circuits

We talked about how to make single qubit gates apply to specific qubits by using the Kronecker product with identity to move the gate to the right location.

If you have 4 qubits and you want to apply some single qubit gate (say Hadamard) to the 3rd qubit, you would just do this to get the matrix:
I \otimes I \otimes H \otimes I
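Using the KroneckerProduct function and the c_hadamardGate / c_identity2x2 constants from the code further down, that works out to a couple of nested calls (the Kronecker product is associative, so the grouping doesn’t matter; hadamardQubit3 is just a made up variable name):

const SComplexMatrix hadamardQubit3 =
    KroneckerProduct(
        KroneckerProduct(
            KroneckerProduct(c_identity2x2, c_identity2x2),
            c_hadamardGate),
        c_identity2x2);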

What if you want to do complex things with multi qubit gates though? Like what if you want to do a controlled not gate where the 2nd qubit flips its value if the 4th qubit is true?

The answer is actually pretty simple… you use swap gates to shuffle the qubit values around so that they line up with the inputs of the gate, then you do the gate, and finally undo all the swaps to get the qubit values back to their original positions.

We need to swap qubits so that the 4th qubit value is in the 1st qubit slot, and the 2nd qubit value is in the 2nd qubit slot. We can do that with the following swaps: first swap qubits 3 and 4, then qubits 2 and 3, then qubits 1 and 2, and finally qubits 2 and 3 again.

Once we have done those swaps, we can do our controlled not, then do the swaps in reverse order to return the qubits to their original positions.

In circuit diagrams you’ll usually just see the simpler version drawn – a controlled not with the control on the 4th qubit and the target on the 2nd – but at least now you know how to build the expanded version out of swap gates.

Below is the mathematical way that you would get the matrix representing this gate.

I is the identity matrix, S is the swap gate, and C is the controlled not gate.

M = \\ (I \otimes I \otimes S) * \\ (I \otimes S \otimes I) * \\ (S \otimes I \otimes I) * \\ (I \otimes S \otimes I) * \\ (C \otimes I \otimes I) * \\ (I \otimes S \otimes I) * \\ (S \otimes I \otimes I) * \\ (I \otimes S \otimes I) * \\ (I \otimes I \otimes S) \\

Code

Here is some simple C++ to show both the entanglement circuit and the more complicated controlled not gate we described.

#include <stdio.h>
#include <cmath>
#include <vector>
#include <complex>
#include <assert.h>

typedef std::complex<float> TComplex;

//=================================================================================
struct SComplexVector
{
public:
    TComplex& Get (size_t i) { return v[i]; }

    const TComplex& Get (size_t i) const { return v[i]; }

    size_t Size() const
    {
        return v.size();
    }

    void Resize (size_t s)
    {
        v.resize(s);
    }

    void Print () const
    {
        printf("[");
        for (size_t i = 0, c = v.size(); i < c; ++i)
        {
            if (i > 0)
                printf(", ");
            const TComplex& val = Get(i);
            if (val.imag() == 0.0f)
            {
                if (val.real() == 0.0f)
                    printf("0");
                else if (val.real() == 1.0f)
                    printf("1");
                else
                    printf("%0.2f", val.real());
            }
            else
                printf("%0.2f + %0.2fi", val.real(), val.imag());
        }
        printf("]n");
    }

    std::vector<TComplex> v;
};

//=================================================================================
struct SComplexMatrix
{
public:
    TComplex& Get (size_t x, size_t y)
    {
        return v[y*Size() + x];
    }

    const TComplex& Get (size_t x, size_t y) const
    {
        return v[y*Size() + x];
    }

    // For an MxM matrix, this returns M
    size_t Size () const
    {
size_t ret = (size_t)std::sqrt((double)v.size());
        assert(ret*ret == v.size());
        return ret;
    }

    // For an MxM matrix, this sets M
    void Resize(size_t s)
    {
        v.resize(s*s);
    }

    void Print() const
    {
        const size_t size = Size();

        for (size_t y = 0; y < size; ++y)
        {
            printf("[");
            for (size_t x = 0; x < size; ++x)
            {
                if (x > 0)
                    printf(", ");
                
                const TComplex& val = Get(x, y);
                if (val.imag() == 0.0f)
                {
                    if (val.real() == 0.0f)
                        printf("0");
                    else if (val.real() == 1.0f)
                        printf("1");
                    else
                        printf("%0.2f", val.real());
                }
                else
                    printf("%0.2f + %0.2fi", val.real(), val.imag());
            }
            printf("]n");
        }
    }

    std::vector<TComplex> v;
};

//=================================================================================
static const SComplexVector c_qubit0 = { { 1.0f, 0.0f } };  // false aka |0>
static const SComplexVector c_qubit1 = { { 0.0f, 1.0f } };  // true aka |1>

// 2x2 identity matrix
static const SComplexMatrix c_identity2x2 =
{
    {
        1.0f, 0.0f,
        0.0f, 1.0f,
    }
};

// Given the states |00>, |01>, |10>, |11>, swaps the |01> and |10> states.
// In other words, it swaps the two qubits. It leaves the probability of both
// qubits being on, or both being off, alone, and swaps the odds of exactly
// one of them being on.
static const SComplexMatrix c_swapGate =
{
    {
        1.0f, 0.0f, 0.0f, 0.0f,
        0.0f, 0.0f, 1.0f, 0.0f,
        0.0f, 1.0f, 0.0f, 0.0f,
        0.0f, 0.0f, 0.0f, 1.0f
    }
};

// Controlled not gate
// If the first qubit is true, flips the value of the second qubit
static const SComplexMatrix c_controlledNotGate =
{
    {
        1.0f, 0.0f, 0.0f, 0.0f,
        0.0f, 1.0f, 0.0f, 0.0f,
        0.0f, 0.0f, 0.0f, 1.0f,
        0.0f, 0.0f, 1.0f, 0.0f
    }
};

// Hadamard gate
// Takes a pure |0> or |1> state and makes a 50/50 superposition between |0> and |1>.
// Put a 50/50 superposition through and get the pure |0> or |1> back.
// Encodes the original value in the phase information as either matching or
// mismatching phase.
static const SComplexMatrix c_hadamardGate =
{
    {
        1.0f / std::sqrt(2.0f), 1.0f / std::sqrt(2.0f),
        1.0f / std::sqrt(2.0f), 1.0f / -std::sqrt(2.0f)
    }
};

//=================================================================================
void WaitForEnter ()
{
    printf("nPress Enter to quit");
    fflush(stdin);
    getchar();
}

//=================================================================================
SComplexVector KroneckerProduct (const SComplexVector& a, const SComplexVector& b)
{
    const size_t aSize = a.Size();
    const size_t bSize = b.Size();

    SComplexVector ret;
    ret.Resize(aSize*bSize);

    for (size_t i = 0, ic = aSize; i < ic; ++i)
    {
        for (size_t j = 0, jc = bSize; j < jc; ++j)
        {
            size_t n = i * bSize + j;
            ret.Get(n) = a.Get(i)*b.Get(j);
        }
    }
    return ret;
}

//=================================================================================
SComplexMatrix KroneckerProduct (const SComplexMatrix& a, const SComplexMatrix& b)
{
    const size_t aSize = a.Size();
    const size_t bSize = b.Size();

    SComplexMatrix ret;
    ret.Resize(aSize*bSize);

    for (size_t ax = 0; ax < aSize; ++ax)
    {
        for (size_t ay = 0; ay < aSize; ++ay)
        {
            const TComplex& aValue = a.Get(ax, ay);

            for (size_t bx = 0; bx < bSize; ++bx)
            {
                for (size_t by = 0; by < bSize; ++by)
                {
                    const TComplex& bValue = b.Get(bx, by);

                    size_t nx = ax*bSize + bx;
                    size_t ny = ay*bSize + by;

                    ret.Get(nx,ny) = aValue * bValue;
                }
            }
        }
    }

    return ret;
}

//=================================================================================
SComplexMatrix operator* (const SComplexMatrix& a, const SComplexMatrix& b)
{
    assert(a.Size() == b.Size());
    const size_t size = a.Size();

    SComplexMatrix ret;
    ret.Resize(size);

    for (size_t nx = 0; nx < size; ++nx)
    {
        for (size_t ny = 0; ny < size; ++ny)
        {
            TComplex& val = ret.Get(nx, ny);
            val = 0.0f;
            for (size_t i = 0; i < size; ++i)
                val += a.Get(i, ny) * b.Get(nx, i);
        }
    }

    return ret;
}

//=================================================================================
SComplexVector operator* (const SComplexVector& a, const SComplexMatrix& b)
{
    assert(a.Size() == b.Size());
    const size_t size = a.Size();

    SComplexVector ret;
    ret.Resize(size);

    for (size_t i = 0; i < size; ++i)
    {
        TComplex& val = ret.Get(i);
        val = 0;
        for (size_t j = 0; j < size; ++j)
            val += a.Get(j) * b.Get(i, j);
    }

    return ret;
}

//=================================================================================
int main (int argc, char **argv) {

    // 2 qubit entanglement circuit demo
    {
        // make the circuit
        const SComplexMatrix H1 = KroneckerProduct(c_hadamardGate, c_identity2x2);
        const SComplexMatrix circuit = H1 * c_controlledNotGate;

        // display the circuit
        printf("Entanglement circuit:n");
        circuit.Print();

        // permute the inputs and see what comes out when we pass them through the circuit!
        for (size_t i = 0; i < 4; ++i)
        {
            // build the two qubit input state from the bits of i (first qubit = high bit)
            SComplexVector input = KroneckerProduct(
                (i & 2) ? c_qubit1 : c_qubit0,
                (i & 1) ? c_qubit1 : c_qubit0);

            SComplexVector output = input * circuit;
            printf("\ninput:");
            input.Print();
            printf("output:");
            output.Print();
        }
    }

    // 4 qubit demo: flip the second qubit if the fourth qubit is true
    {
        // make the circuit
        const SComplexMatrix cnot4qubit = KroneckerProduct(KroneckerProduct(c_controlledNotGate, c_identity2x2), c_identity2x2);
        const SComplexMatrix swap12 = KroneckerProduct(KroneckerProduct(c_swapGate, c_identity2x2), c_identity2x2);
        const SComplexMatrix swap23 = KroneckerProduct(KroneckerProduct(c_identity2x2, c_swapGate), c_identity2x2);
        const SComplexMatrix swap34 = KroneckerProduct(KroneckerProduct(c_identity2x2, c_identity2x2), c_swapGate);
        const SComplexMatrix circuit = 
            swap34 *
            swap23 *
            swap12 *
            swap23 *
            cnot4qubit *
            swap23 *
            swap12 *
            swap23 *
            swap34;

        // display the circuit
        printf("nFlip 2nd qubit if 4th qubit true circuit:n");
        circuit.Print();

        // permute the inputs and see what comes out when we pass them through the circuit!
        for (size_t i = 0; i < 16; ++i)
        {
            // build the four qubit input state from the bits of i (first qubit = high bit)
            SComplexVector input = KroneckerProduct(
                KroneckerProduct(
                    KroneckerProduct(
                        (i & 8) ? c_qubit1 : c_qubit0,
                        (i & 4) ? c_qubit1 : c_qubit0),
                    (i & 2) ? c_qubit1 : c_qubit0),
                (i & 1) ? c_qubit1 : c_qubit0);

            SComplexVector output = input * circuit;
            printf("\ninput:");
            input.Print();
            printf("output:");
            output.Print();
        }
    }


    WaitForEnter();

    return 0;
}

Here is the first part of the output, which shows the results of the entangling circuit. You can see that each input gives the expected output Bell state.

The second part of the output is from the circuit that flips the 2nd qubit if the 4th qubit is true.

Each entry in the input and output vectors is the amplitude of the state it represents; squaring the amplitude gives the probability of that state. The states start at the left with 0000, then 0001, then 0010, and continue until the very right most value, which represents 1111.

If you look at the second input/output pair the input is [0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0], which means it has a 100% chance of the 4 qubits being 0001 (aka the qubits represent the number 1). The output vector is [0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0], which means that it has a 100% chance of being 0101 (aka the qubits represent the number 5). Since the input did have the 4th bit set (4th from the left), it flipped the 2nd bit. So, you can see that it worked!

If you check all the input/output pairs, you’ll see that they all follow this rule.

We used only whole, real numbers here – no fractional probabilities or imaginary amplitudes. What the circuit does in those situations is a little bit harder to get intuition for, but rest assured that it does “the right thing” there as well.

Next Up

Now that we have the basics of quantum computing down pretty well, it’s time to analyze a couple quantum algorithms to see how they work!

Quantum Computing For Programmers Part Ib: Bloch Sphere

If you read anything on quantum computing you are extremely likely to see the Bloch sphere, so it’s probably important to explain what it is and how it works.

Image from Wikipedia: Qubit

The Bloch sphere is a way of visually representing a qubit.

The north pole is the |0\rangle state, and the south pole is the |1\rangle state. The Z axis is vertical and runs from the |0\rangle to the |1\rangle state. A qubit’s state is represented by a point on the sphere.

If you read the last post, you might remember that there was a Pauli-Z gate which changed the phase of a qubit without changing its probability, as well as a more generalized version of the phase changing gate which also just rotated around the Z axis. Looking at the Bloch sphere, you can see why that is true! Rotating a point around the Z axis moves it around the sphere horizontally, but it doesn’t make it any closer or farther to either the north or south pole. Since the distance to the poles is preserved, the probability of being one or the other stays the same, even though the phase changes.

The not gate, aka the Pauli-X gate, rotates a point around the X axis 180 degrees. Looking at the sphere, you can see how that would change a |1\rangle to a |0\rangle, or otherwise flip the probabilities & amplitudes of the states.

The Pauli-Y gate is a little more complex though (pun intended!) as it maps |0\rangle to i|1\rangle and |1\rangle to -i|0\rangle by rotating 180 degrees around the Y axis. You may wonder how |1\rangle and i|1\rangle can both refer to the south pole, but the situation there is that every point EXCEPT the poles has a unique representation. So, they really do both refer to the south pole.

To find where a qubit sits on the sphere, the first step is to figure out the spherical coordinates theta (\theta) and phi (\phi) values, which you can then convert to a 3d point.

Start with whatever qubit you have in this form, where you know what the values for alpha (\alpha) and beta (\beta) are:
|\psi\rangle = \alpha|0\rangle + \beta|1\rangle

You can also write a qubit as:
|\psi\rangle = \cos\left(\tfrac{\theta}{2}\right) |0 \rangle \, + \, e^{i \phi}  \sin\left(\tfrac{\theta}{2}\right) |1 \rangle

Or like this, with less Euler, and more trig:
|\psi\rangle = \cos\left(\tfrac{\theta}{2}\right) |0 \rangle \, +  \, ( \cos \phi + i \sin \phi) \, \sin\left(\tfrac{\theta}{2}\right) |1 \rangle

You can then set your known alpha and beta values to the trig (or Euler) based equations:
\alpha = \cos\left(\tfrac{\theta}{2}\right)
\beta = \, ( \cos \phi + i \sin \phi) \, \sin\left(\tfrac{\theta}{2}\right)

and then solve for theta and phi (I wave my hands a bit here).

Once you have the theta and phi values figured out, you can convert that to a unit distance 3d point representing a point on the sphere by using the below, which describes the X,Y,Z components of the point:
(\sin \theta \cos \phi, \; \sin \theta  \sin \phi, \; \cos \theta)
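Here is a small C++ sketch of that conversion, assuming we take phi to be the relative phase arg(β) − arg(α) so that the global phase drops out (QubitToBlochSphere and SBlochPoint are names made up for this example):

#include <cmath>
#include <complex>

struct SBlochPoint { float x, y, z; };

// Convert a qubit a|0> + b|1> to a unit length 3d point on the Bloch sphere
SBlochPoint QubitToBlochSphere (std::complex<float> a, std::complex<float> b)
{
    float theta = 2.0f * std::acos(std::abs(a)); // cos(theta/2) = |alpha|
    float phi = std::arg(b) - std::arg(a);       // relative phase between the states
    return { std::sin(theta) * std::cos(phi),
             std::sin(theta) * std::sin(phi),
             std::cos(theta) };
}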

Hopefully this explanation makes enough sense that if you encounter the Bloch sphere, or things talking about rotations on the Bloch sphere, you won’t be too lost 😛

Ok… time to move on to multiple qubit quantum circuits so we can do more interesting things and perhaps analyze the most well known quantum algorithms!

Quantum Computing For Programmers Part I: One Qubit

Are you a programmer? Do you have interest in learning how quantum computing works? Does hard core math and crazy abstract “philosophical” type physics questions about the nature of reality make you feel like you could never understand quantum computing?

If so, me too, welcome to the club! We are programmers, not mathematicians or physicists, but we are masters of our realm – computation and logic. With that, I say that quantum computing is ultimately meant for us, since it is computing and programming, so let’s figure it out and do some interesting stuff 😛

Before we start, I want to mention that these posts should be able to be read stand alone, but if you want to get into the deeper details, you probably want to give this list of references I’m using a read at some point. Perhaps before reading my posts, or after, or both!
Quantum Computing References

Lastly, to be clear, quantum computers seem like they are a long way off from being commonplace objects. These posts will show how quantum computers work, and allow you to simulate quantum computers on your own computer with some simple C++. The code won’t run as fast as it would on a real quantum computer (obviously!), but it should let you better understand how Quantum Computing (QC) works and let you explore and experiment on your own machine.

Probabilities (aka Superpositions)

At the core of QC is the qubit. A qubit is like a regular bit, in that it can have the value of 0 or 1, but unlike a regular bit, it can also be somewhere in the middle where it has a probability of being in each state.

That might sound strange, but think of a qubit like a coin. It can be heads up, or it can be heads down, or it can be in the air, waiting to land heads up or heads down.

When the coin is in the air, if it is perfectly balanced, it has a 50/50 chance of becoming heads or tails.

If the coin is not perfectly balanced (no real coin can be perfectly balanced!), it will have some bias towards heads or towards tails so will not be a pure 50/50 chance of heads or tails.

While the coin is on the ground, it is either a heads (0) or a tails (1), but while it’s in the air, you can imagine that it is a superposition of 0 and 1. When it lands, that superposition will collapse into a 0 or 1 value, which is also what happens when you “observe” or measure a qubit.

One interesting thing about quantum computing – and part of where its power comes from – is that you can perform logic on superpositional values, such that when you measure the final superpositional result, you get an answer from your logic circuit as if the values had been fixed all the way through the system, instead of being in superpositions.

The bummer about this though is that since it’s all based on probabilities, your answer has a chance of not being the value you wanted. Quantum circuits / quantum algorithms try to combat this by doing operations iteratively to make the probability of the desired answer go up, and the probability of the undesired answers go down.

They can also run the quantum algorithm multiple times, to take multiple samples and be more sure about their results.

You still with me? Pretty simple so far right?

Before moving on I also wanted to mention that regular computers – like the one you are using right now – also have a greater than zero chance that they will give you the wrong answer to a calculation. The error rate can increase when the components are too hot, but even otherwise, a stray particle of background radiation could hit your cpu and flip a bit, giving you the wrong answer. You could fight this by doing every calculation 3 times and going with majority rules, but even then, if two bits get flipped, the majority will be wrong. Nothing you can do can completely eliminate the random chance that your calculations will come up with the wrong answer, but the chances are low enough that we rely on computers to do the right things, even in the most critical of situations (like, not starting a thermonuclear war).

With quantum computers you can get that same level of assurance. You can never completely eliminate the chance that a calculation is wrong, but you can get to the same level of confidence you’d have with a standard computer, so that shouldn’t really be an issue in practice, for the cases where you need “certainty”.

Probability Vectors (aka Amplitude Vectors)

Using the coin flip scenario above, we could describe the probability of a coin landing heads or tails as a vector where the first element is the chance that it will land heads, and the second element is the chance that it will land tails.

A perfectly balanced coin would look like this for instance: [0.5,0.5]

If the coin was 75% likely to land heads, and only 25% likely to land tails the vector would look like this: [0.75,0.25]

You might notice that the elements of the vector have to add up to 1. This makes sense because there are only two options and one of the two MUST happen (unless it lands perfectly on edge), so they better add up to 1! The technical term for this is that the vector must have an L1 Norm of 1.

Qubits describe their probabilities a little bit differently though. Instead of storing the probability of them being 0 or 1, they store what’s called the amplitude, which when squared gives the probability. Doing this lets them do something really important that I’ll talk about lower down, but for now, you can hopefully just accept that this is how it works.

Consider a qubit having a 50% chance that it’s a 0, and a 50% chance that it’s a 1. We need some number that when squared gives 0.5 aka 1/2. That value is 1/\sqrt{2}.

So, for a qubit that has an even chance of being a 0 or 1, a vector to represent it is [1/\sqrt{2}, 1/\sqrt{2}].

Then, a vector describing a qubit that has a 75% chance being a 0 and a 25% chance of being a 1 would be [\sqrt{3}/2, 1/2]. If you square the first number you can see that you get 3/4 (75% chance) and squaring the second number gives you 1/4 (25% chance).

I got those vectors by solving the simple equation:
x^2 = P

Where P is the desired probability from 0 to 1.

You might now notice that the length of a qubit amplitude vector has to be 1. In other words, it’s a normalized vector. The technical term for this is that the vector must have an L2 norm of 1.
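As a tiny concrete example, here is a sketch of building a (phaseless) amplitude vector from a desired probability (MakeQubit is just a made up helper name):

#include <stdio.h>
#include <cmath>

// Build a qubit amplitude vector from the probability of the qubit being 1
void MakeQubit (float probabilityOfOne, float qubit[2])
{
    qubit[0] = std::sqrt(1.0f - probabilityOfOne); // amplitude of the 0 state
    qubit[1] = std::sqrt(probabilityOfOne);        // amplitude of the 1 state
}

int main ()
{
    float q[2];
    MakeQubit(0.25f, q); // 75% chance of 0, 25% chance of 1
    printf("[%f, %f]\n", q[0], q[1]); // prints [0.866025, 0.500000] aka [sqrt(3)/2, 1/2]
    return 0;
}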

Still nothing too crazy going on…

Ket Notation

There is some notation you’ll come across a lot when reading about quantum computing / mechanics / etc that looks like this:
1/\sqrt{2}(|0\rangle + |1\rangle)

That is “Ket Notation” and comes from “Bra-Ket Notation”. Its bark is worse than its bite.

If you multiply it out, you get this:
1/\sqrt{2} * |0\rangle + 1/\sqrt{2} * |1\rangle

All that means is that the “0” state of the vector has an amplitude of 1/\sqrt{2} and so does the “1” state. Writing that as a vector, we get what we already saw above for the qubit which had 50/50 odds of being a 0 or a 1:
[1/\sqrt{2}, 1/\sqrt{2}]

Here is the ket notation for the qubit which had a 75% chance of being a 0 and a 25% chance of being a 1:
1/2(\sqrt{3}|0\rangle + |1\rangle)

You can multiply that out to get this:
\sqrt{3}/2*|0\rangle + 1/2*|1\rangle

Which translates to the vector we saw before:
[\sqrt{3}/2, 1/2]

Lastly, you might see it shown like this:
|0\rangle

All that means is that the “0” state is 1. Since the “1” state isn’t listed, it has a value of 0. That is the same as this vector:
[1, 0]

Ket notation is used because you only have to list the states that actually have values, and not the ones that have zeros, making a more compact representation. That will be more useful when we move into a larger number of qubits.
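For example, jumping ahead to two qubits for a moment, the state where the first qubit is 0 and the second is 1 can be written as just |01\rangle, rather than spelling out the four element vector:

|01\rangle = \begin{bmatrix} 0 & 1 & 0 & 0 \end{bmatrix}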

Note that the mapping of which state belongs to which spot in the vector is completely arbitrary. I chose to make the left value be the “0” state and the right value be the “1” state, but it really doesn’t matter how you define it, so long as you are consistent.

Phase

Phase is the reason that qubits store amplitude, which square to probabilities, instead of storing probabilities themselves.

I said that the amplitude vector for a qubit which has a 50/50 chance of being a 0 or a 1 was this:
[1/\sqrt{2}, 1/\sqrt{2}]

I lied a bit though, and that is only one possible answer. Here’s another that has a 50/50 chance:
[-1/\sqrt{2}, 1/\sqrt{2}]

This gets interesting when you add two vectors together. First let’s add two amplitude vectors which have the same phases for each state:
[1/\sqrt{2}, 1/\sqrt{2}] + [1/\sqrt{2}, 1/\sqrt{2}] = [\sqrt{2}, \sqrt{2}]

Each component added together.

Now, let’s add two amplitude vectors where the |1\rangle state has opposite phase in one of the vectors:
[1/\sqrt{2}, 1/\sqrt{2}] + [1/\sqrt{2}, -1/\sqrt{2}] = [\sqrt{2}, 0]

You can see that the |0\rangle states added together while the |1\rangle states subtracted and resulted in zero.

This is called destructive interference, and is one of the things that makes quantum computing powerful. We can change a qubit’s phase without affecting its value (or probability of values).

I left out another important amplitude vector for describing a qubit that has a 50/50 chance of being a 0 or a 1:
[1/\sqrt{2}, 1/\sqrt{-2}]

Which can also be written as:
[1/\sqrt{2}, 1/(\sqrt{2}*i)]

Yep, that is an imaginary number in the 1 state!

Since amplitude squared is probability, it means we can also use imaginary numbers for amplitude vector values. If you know that the term phase relates to angles and rotations, you might want to have a look at these two things to follow the rabbit hole a bit deeper. Totally optional (;
Using Imaginary Numbers To Rotate 2D Vectors
Wikipedia: Bloch Sphere

However, since imaginary numbers can be involved, squaring amplitudes to get probabilities isn’t enough. Like for example with the vector [0,i], if you square the |1\rangle state to get the probability, you get -1, or -100%. That isn’t right!

The real operation for getting probability from amplitude is to multiply the amplitude by its complex conjugate. In other words, if you have a complex number, you multiply the imaginary part by -1 to get the conjugate, and then multiply the original number by that.

As an example, if you have 3+5i, the complex conjugate is 3-5i.

In the example of the vector [0,i], you get the probability of the |1\rangle state by multiplying i by -i to get 1, or 100%.

Complex conjugate sounds scary, but hopefully you can see it isn’t as scary as it sounds. You just flip the sign of the imaginary component of the complex number.
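Here’s that operation in code. The standard library even has std::norm, which returns exactly this amplitude-times-conjugate value (the squared magnitude), despite the slightly confusing name:

#include <stdio.h>
#include <complex>

int main ()
{
    std::complex<float> amplitude(0.0f, 1.0f); // the |1> amplitude from the [0, i] example

    // probability = amplitude * conjugate(amplitude)
    float probability = (amplitude * std::conj(amplitude)).real();
    printf("%f\n", probability);          // prints 1.000000
    printf("%f\n", std::norm(amplitude)); // same thing, prints 1.000000
    return 0;
}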

Common One Qubit Logic Gates

It’s finally time to dive into some logic gates so we can actually do some quantum programming!

Real quickly I want to mention that we are going to be simulating quantum computing by multiplying the qubit amplitude vectors by matrices. The matrices themselves are unitary matrices, which preserves the length of the vectors (keeping them normalized).

One requirement of quantum gates is that they must be reversible, and unitary matrices are also reversible.

The matrices are square matrices and their dimensions will always be 2^N by 2^N where N is the number of qubits involved in the quantum circuit. With a single qubit, that means we will be working with 2×2 matrices which is fairly small.

When working with more qubits the size of the matrix grows very quickly though. Quantum computers are very fast at doing these “matrix multiplies”, which is part of where their power comes from. Large matrix multiplies are slow for us on regular (classical) computers, but quantum computers can do them very quickly. That’s where the difference is in our simulations versus the real deal.

Anyways, let’s get onto some specific logic gates! (These are from Wikipedia: Quantum Gate)

Not Gate (Pauli-X Gate)

This is a not gate, but is also described as a 180 degree rotation around the x axis of the Bloch sphere.

\begin{bmatrix} 0 & 1 \\ 1 & 0 \\ \end{bmatrix}

You might notice that it looks like a backwards identity matrix. That is exactly how it works too… if you multiply a 2 element vector by that matrix, it will flip the elements in that vector!

In the case of our qubit amplitude vectors, this has the result of swapping the probability that the qubit will be 0, with the probability that the qubit will be 1.

If the qubit is not in a superposition, and instead is actually just a 0 or a 1, it will act exactly like a traditional NOT gate and flip it from one value to the other.

If the qubit is in a superposition, it will just flip the probabilities of each state.

This maps the |0\rangle state to |1\rangle and vice versa.

Pauli-Y Gate

This is described as a 180 degree rotation around the y axis of the Bloch sphere, and maps |0\rangle to i|1\rangle and |1\rangle to -i|0\rangle.

\begin{bmatrix} 0 & -i \\ i & 0 \\ \end{bmatrix}

It’s like a NOT but also adjusts phase.

Pauli-Z Gate

This is described as a 180 degree rotation around the z axis of the Bloch sphere. All it does in practice is negate the phase of the |1\rangle state.

\begin{bmatrix} 1 & 0 \\ 0 & -1 \\ \end{bmatrix}

General Phase Adjustment Gates

You can adjust the phase of individual states by using a matrix like this one, which is a more general version of the Pauli-Z gate. It adjusts the phase of the |1\rangle state to whatever value is desired, without affecting the probabilities of the qubit values.

\begin{bmatrix} 1 & 0 \\ 0 & e^{i\theta} \\ \end{bmatrix}

The value e^{i\theta} ends up being a complex number, that can also be written like this:
cos(\theta)+i sin(\theta)

where \theta is any angle in radians.
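As a sanity check, plugging \theta = \pi into this gate gives e^{i\pi} = \cos(\pi) + i \sin(\pi) = -1, which turns it back into the Pauli-Z gate:

\begin{bmatrix} 1 & 0 \\ 0 & e^{i\pi} \\ \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -1 \\ \end{bmatrix}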

You’ll find that circuits commonly only adjust the |1\rangle state. This is because the absolute phase of an individual state doesn’t matter (since it doesn’t affect probabilities), but what does matter is the difference in phase between the |0\rangle state and the |1\rangle state, since that can cause constructive or destructive interference. For this reason, you’ll often see the |0\rangle state without a phase value, and only the |1\rangle state will get its phase modified during calculations.

Hadamard Gate (Interesting Gate!)

Ok so the NOT gate is somewhat useful. You are probably thinking the others might come in handy if you need to adjust the phase of a qubit, but so far, nothing has been that spectacular for single qubit gates.

Well, here’s the more interesting gate.

What this gate does is if you pass it a pure 0 or 1, it will output a value that has a 50% chance of being a 0 or a 1.

The interesting part happens when you put that output through a Hadamard gate again – you get the original value of 0 or 1 out!

It can do this because it stores the info about the original value as phase information. When the input value is |0\rangle, the Hadamard gate outputs [1/\sqrt{2}, 1/\sqrt{2}] which has matching phases for each state. When the input value is |1\rangle, the gate outputs [1/\sqrt{2}, -1/\sqrt{2}] which has opposite phases for each state.

1/\sqrt{2} \begin{bmatrix} 1 & 1 \\ 1 & -1 \\ \end{bmatrix}

This gate is a crucial part in both Grover’s algorithm – which can search an unsorted list in O(\sqrt{N}), as well as in Shor’s algorithm, which can factor large numbers much faster than classical computers, putting certain types of cryptography at risk for being cracked by quantum computers.

It has some other uses too, which you can check out here:
Does a Hadamard Gate have uses outside of pure and evenly mixed states?

Code

Here is some example code to show some single qubit circuits in action.

#include <stdio.h>
#include <cmath>
#include <array>
#include <complex>

typedef std::array<std::complex<float>, 2> TQubit;
typedef std::array<std::complex<float>, 4> TComplexMatrix;
const float c_pi = 3.14159265359f;

//=================================================================================
static const TQubit c_qubit0 = { 1.0f, 0.0f };  // false aka |0>
static const TQubit c_qubit1 = { 0.0f, 1.0f };  // true aka |1>
static const TQubit c_qubit01_0deg = { 1.0f / std::sqrt(2.0f), 1.0f / std::sqrt(2.0f) }; // 50% true. 0 degree phase
static const TQubit c_qubit01_180deg = { 1.0f / std::sqrt(2.0f), -1.0f / std::sqrt(2.0f) }; // 50% true. 180 degree phase

// A not gate AKA Pauli-X gate
// Flips false and true probabilities (amplitudes)
// Maps |0> to |1>, and |1> to |0>
// Rotates PI radians around the x axis of the Bloch Sphere
static const TComplexMatrix c_notGate =
{
    {
        0.0f, 1.0f,
        1.0f, 0.0f
    }
};
static const TComplexMatrix c_pauliXGate = c_notGate;

// Pauli-Y gate
// Maps |0> to i|1>, and |1> to -i|0>
// Rotates PI radians around the y axis of the Bloch Sphere
static const TComplexMatrix c_pauliYGate =
{
    {
        { 0.0f, 0.0f }, { 0.0f, -1.0f },
        { 0.0f, 1.0f }, { 0.0f, 0.0f }
    }
};

// Pauli-Z gate
// Negates the phase of the |1> state
// Rotates PI radians around the z axis of the Bloch Sphere
static const TComplexMatrix c_pauliZGate =
{
    {
        1.0f, 0.0f,
        0.0f, -1.0f
    }
};

// Hadamard gate
// Takes a pure |0> or |1> state and makes a 50/50 superposition between |0> and |1>.
// Put a 50/50 superposition through and get the pure |0> or |1> back.
// Encodes the original value in the phase information as either matching or
// mismatching phase.
static const TComplexMatrix c_hadamardGate =
{
    {
        1.0f / std::sqrt(2.0f), 1.0f / std::sqrt(2.0f),
        1.0f / std::sqrt(2.0f), 1.0f / -std::sqrt(2.0f)
    }
};

//=================================================================================
void WaitForEnter ()
{
    printf("nPress Enter to quit");
    fflush(stdin);
    getchar();
}

//=================================================================================
TQubit ApplyGate (const TQubit& qubit, const TComplexMatrix& gate)
{
    // multiply qubit amplitude vector by unitary gate matrix
    return 
    {
        qubit[0] * gate[0] + qubit[1] * gate[1],
        qubit[0] * gate[2] + qubit[1] * gate[3]
    };
}

//=================================================================================
int ProbabilityOfBeingTrue (const TQubit& qubit)
{
    float prob = std::round((qubit[1] * std::conj(qubit[1])).real() * 100.0f);
    return int(prob);
}

//=================================================================================
TComplexMatrix MakePhaseAdjustmentGate (float radians)
{
    // This makes a gate like this:
    //
    // [ 1  0             ]
    // [ 0  e^(i*radians) ]
    //
    // The gate will adjust the phase of the |1> state by the specified amount.
    // A more general version of the pauli-z gate

    return
    {
        {
            1.0f, 0.0f,
            0.0f, std::exp(std::complex<float>(0.0f,1.0f) * radians)
        }
    };
}

//=================================================================================
void Print (const TQubit& qubit)
{
    printf("[(%0.2f, %0.2fi), (%0.2f, %0.2fi)] %i%% true",
        qubit[0].real(), qubit[0].imag(),
        qubit[1].real(), qubit[1].imag(),
        ProbabilityOfBeingTrue(qubit));
}

//=================================================================================
int main (int argc, char **argv)
{
    // Not Gate
    {
        printf("Not gate:\n  ");

        // Qubit: false
        TQubit v = c_qubit0;
        Print(v);
        printf("\n  ! = ");
        v = ApplyGate(v, c_notGate);
        Print(v);
        printf("\n\n  ");

        // Qubit: true
        v = c_qubit1;
        Print(v);
        printf("\n  ! = ");
        v = ApplyGate(v, c_notGate);
        Print(v);
        printf("\n\n  ");

        // Qubit: 50% chance, reverse phase
        v = c_qubit01_180deg;
        Print(v);
        printf("\n  ! = ");
        v = ApplyGate(v, c_notGate);
        Print(v);
        printf("\n\n");
    }

    // Pauli-y gate
    {
        printf("Pauli-y gate:\n  ");

        // Qubit: false
        TQubit v = c_qubit0;
        Print(v);
        printf("\n  Y = ");
        v = ApplyGate(v, c_pauliYGate);
        Print(v);
        printf("\n\n  ");

        // Qubit: true
        v = c_qubit1;
        Print(v);
        printf("\n  Y = ");
        v = ApplyGate(v, c_pauliYGate);
        Print(v);
        printf("\n\n  ");

        // Qubit: 50% chance, reverse phase
        v = c_qubit01_180deg;
        Print(v);
        printf("\n  Y = ");
        v = ApplyGate(v, c_pauliYGate);
        Print(v);
        printf("\n\n");
    }

    // Pauli-z gate
    {
        printf("Pauli-z gate:\n  ");

        // Qubit: false
        TQubit v = c_qubit0;
        Print(v);
        printf("\n  Z = ");
        v = ApplyGate(v, c_pauliZGate);
        Print(v);
        printf("\n\n  ");

        // Qubit: true
        v = c_qubit1;
        Print(v);
        printf("\n  Z = ");
        v = ApplyGate(v, c_pauliZGate);
        Print(v);
        printf("\n\n  ");

        // Qubit: 50% chance, reverse phase
        v = c_qubit01_180deg;
        Print(v);
        printf("\n  Z = ");
        v = ApplyGate(v, c_pauliZGate);
        Print(v);
        printf("\n\n");
    }

    // 45 degree phase adjustment gate
    {
        printf("45 degree phase gate:\n  ");
        TComplexMatrix gate = MakePhaseAdjustmentGate(c_pi / 4.0f);

        // Qubit: false
        TQubit v = c_qubit0;
        Print(v);
        printf("\n  M = ");
        v = ApplyGate(v, gate);
        Print(v);
        printf("\n\n  ");

        // Qubit: true
        v = c_qubit1;
        Print(v);
        printf("\n  M = ");
        v = ApplyGate(v, gate);
        Print(v);
        printf("\n\n  ");

        // Qubit: 50% chance, reverse phase
        v = c_qubit01_180deg;
        Print(v);
        printf("\n  M = ");
        v = ApplyGate(v, gate);
        Print(v);
        printf("\n\n");
    }

    // Hadamard gate
    {
        printf("Hadamard gate round trip:\n  ");

        // Qubit: false
        TQubit v = c_qubit0;
        Print(v);
        printf("\n  H = ");
        v = ApplyGate(v, c_hadamardGate);
        Print(v);
        printf("\n  H = ");
        v = ApplyGate(v, c_hadamardGate);
        Print(v);
        printf("\n\n  ");

        // Qubit: true
        v = c_qubit1;
        Print(v);
        printf("\n  H = ");
        v = ApplyGate(v, c_hadamardGate);
        Print(v);
        printf("\n  H = ");
        v = ApplyGate(v, c_hadamardGate);
        Print(v);
        printf("\n\n  ");

        // Qubit: 50% chance, reverse phase
        v = c_qubit01_180deg;
        Print(v);
        printf("\n  H = ");
        v = ApplyGate(v, c_hadamardGate);
        Print(v);
        printf("\n  H = ");
        v = ApplyGate(v, c_hadamardGate);
        Print(v);
        printf("\n\n");
    }

    // 1 bit circuit
    // Hadamard -> Pauli-Z -> Hadamard
    {
        printf("Circuit Hadamard->Pauli-Z->Hadamard:\n  ");

        // Qubit: false
        TQubit v = c_qubit0;
        Print(v);
        printf("\n  H = ");
        v = ApplyGate(v, c_hadamardGate);
        Print(v);
        printf("\n  Z = ");
        v = ApplyGate(v, c_pauliZGate);
        Print(v);
        printf("\n  H = ");
        v = ApplyGate(v, c_hadamardGate);
        Print(v);
        printf("\n\n  ");

        // Qubit: true
        v = c_qubit1;
        Print(v);
        printf("\n  H = ");
        v = ApplyGate(v, c_hadamardGate);
        Print(v);
        printf("\n  Z = ");
        v = ApplyGate(v, c_pauliZGate);
        Print(v);
        printf("\n  H = ");
        v = ApplyGate(v, c_hadamardGate);
        Print(v);
        printf("\n\n  ");

        // Qubit: 50% chance, reverse phase
        v = c_qubit01_180deg;
        Print(v);
        printf("\n  H = ");
        v = ApplyGate(v, c_hadamardGate);
        Print(v);
        printf("\n  Z = ");
        v = ApplyGate(v, c_pauliZGate);
        Print(v);
        printf("\n  H = ");
        v = ApplyGate(v, c_hadamardGate);
        Print(v);
        printf("\n");
    }

    WaitForEnter();

    return 0;
}

Here’s the output of the program, showing the qubit circuits in action.

Next Up

In the next part, we’ll look at how multiple qubits work together in quantum circuits so we can do more interesting things. We might need a quick post explaining the Bloch sphere before that though. It’s a fairly important thing, but this post is already enough to digest, so I didn’t include it.

Quantum Computing References

I’m in the middle of some research to better understand quantum computing so that I can write a short series of blog posts entitled “Quantum Computing for Programmers”.

These posts will be light on – and possibly completely absent of – hardcore math and physics, and instead speak more to programmers. It will aim to show the rules of quantum computing, explain which operations are fast and which aren’t, show the basic building blocks of quantum logic, combine those simple gates into more complicated quantum circuits, and most importantly it will have simple sample C++ code which shows this stuff in action, so you can do your own (simulated) quantum computing experimentation and exploration on your own computer.

This page is a list of the references I’ve found so far, and I’ll keep updating it as I find new, useful references.

Gentle Introduction

Ars Technica – A tale of two qubits: how quantum computers work

Technical Details, Very Well Explained

Twisted Oak Studios – What Quantum Computers Do Faster, with Caveats

Twisted Oak Studios – Grover’s Quantum Search Algorithm

A Bit More Advanced, Still Very Readable

Scott Aaronson – Lecture 9: Quantum

Scott Aaronson – Lecture 10: Quantum Computing

Good Info, More Mathy

Math ∩ Programming – A Motivation for Quantum Computing

Math ∩ Programming – The Quantum Bit

Math ∩ Programming – Multiple Qubits and the Quantum Circuit

Useful Wikipedia Pages

Wikipedia – Commonly Used Quantum Gates

Wikipedia – Quantum Circuit

Wikipedia – Deutsch–Jozsa algorithm

Quantum Algorithm Stuff

A Quantum Network Flow Puzzle

Building your own Quantum Fourier Transform

An interactive page that shows you how quantum pseudo telepathy is not communication

Twisted Oak Studios – Implementing Quantum Pseudo-Telepathy

Other

An article on some people who are working on emulating (not simulating) quantum computers using sound waves

A Photonic C-NOT Gate Breakthrough for Quantum Computing

Wolfram Alpha Demonstrations: Generating Entangled Qubits

Wikipedia: Simon’s problem

Quantum Circuit Simulator in a Web Page

Solving Nested Modulus Equations

x \% 2 = 1

The above equation is pretty easy to solve: it just means that x is any value that, when you divide by 2, gives a remainder of 1. In other words, x is all odd numbers. The more formal answer is:

x \equiv 1 + 2\mathbb{Z}

That reads as “x is equivalent to 1 plus 2 times any integer”. x has infinitely many solutions, so long as they fit that constraint.

That’s easy enough to figure out, but how would you solve this equation?

(x \% 5) \%2 = 1

or this one?

((x \% 7) \% 5) \%2 = 1

One Level (Simple Case)

There is something called the quotient remainder theorem that lets us change an equation like this:

A \% B = C

Into one like this:

A \equiv C + B \mathbb{Z}

The Z means “all integers”, which implies that there are infinitely many solutions for A. This reads as “A is equivalent to C plus B times any integer”.

Let’s work out a couple examples.

x \% 5 = 3

That transforms to:

x \equiv 3 + 5 \mathbb{Z}

If you try plugging any positive or negative integer in place of Z, you’ll see that you get a number that satisfies the first equation.

Here’s another one:

x \% 538 = 211

That transforms to:

x \equiv 211 + 538 \mathbb{Z}

Once again you can plug any positive or negative integer in for Z and get a value for x that satisfies the original equation.

Pretty easy and pretty handy right? Let’s get a little more complex.

Two Levels

Let’s say you want to solve this equation from the intro.

(x \% 5) \%2 = 1

The first thing you might do is transform it to the below.

x \% 5 \equiv 1 + 2 N_1 where N_1 \in \mathbb{Z}.

What now? Can we transform it again? If we do, we get this:

x \equiv 1 + 2 N_1+ 5 N_2 where N_i \in \mathbb{Z}.

Plugging 1 in for both N_1 and N_2, we get 8, which does satisfy the original equation.

If we plug 3 in for N1 and 1 in for N2 though, we get 12 which DOESN’T satisfy the original equation. What gives??

Well it turns out there are subtle restrictions on our equations that got lost in the shuffle. Let’s start over…

(x \% 5) \%2 = 1

In effect, what that equation says is “Something divided by two gives a remainder of 1” where the something happens to be x % 5, but we don’t really care about what the something actually is right now.

The equation also implies that the right side of the equation is a value modulo two, and its only valid values are {0,1}. Another way to write this is to say that the right side \in \mathbb Z_2, which reads as “in all integers mod 2”.

1 is obviously in the set of valid values {0,1}, so we don’t need any special notation just yet, but let’s keep it in mind as we move onto the next step.

x \% 5 \equiv 1 + 2 \mathbb Z

The above says that x divided by 5 gives a remainder that is 1 plus two times all integers. However there is a catch. The right side of the equation is a mod 5 value, which means its only valid values are {0,1,2,3,4}. In other words, the right side of the equation is \in \mathbb Z_5.

This has an effect of implicitly limiting the valid values that we can plug in to Z. It limits it to values of Z where 1+2*Z is in {0,1,2,3,4}. Let’s write this out a little more formally.

x \% 5 \equiv a where a = 1 + 2 \mathbb Z and a \in \mathbb Z_5

Let’s transform the equation again to solve for x:

x \equiv a + 5\mathbb Z where a = 1 + 2 \mathbb Z and a \in \mathbb Z_5

Note that there is no modulus on the left side of the equation now, so the right side of the equation is unrestricted in the values it can take.

How do we use this resulting formula though to find valid values of x?

First we figure out the valid values of a. a = 1 + 2 \mathbb Z where a \in \mathbb Z_5, so its valid values have to be a subset of {0,1,2,3,4}.

Plugging a -1 in for Z, we get a value for a of -1, which isn’t a valid answer so we throw it away.

Plugging a 0 in for Z, we get a value for a of 1. That is a valid answer so we keep it!

Plugging a 1 in gives us 3 which is valid, so we keep that too.

Plugging a 2 in gives us a value of 5, which isn’t valid, so we throw it away and know that we are done.

Our values for a are {1,3}.

The solution we got was this:
x \equiv a + 5\mathbb Z

Plugging in each value of a means that we get two equations as a solution for x. Valid values of x are the union of these two lists – both equations give valid answers.

x \equiv 1 + 5\mathbb Z
x \equiv 3 + 5\mathbb Z

The first equation gives us values of {…, -4, 1, 6, 11, …}
The second equation gives us values of {…, -2, 3, 8, 13, …}

Taking the union of those, our solutions for x are:
{…, -4, -2, 1, 3, 6, 8, 11, 13, …}

If you plug those into the original equation (x \% 5) \%2 = 1, you’ll see that they are valid solutions! Those equations also represent ALL solutions, so they aren’t only valid, they are also complete.
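If you want to sanity check answers like this numerically, below is a small brute force sketch. One gotcha: C++’s % operator can return negative remainders for negative numbers, so the sketch uses a little Mod helper (a made up name) that behaves like the mathematical modulus used in this post:

#include <stdio.h>

// Mathematical modulus: always returns a value in [0, divisor)
int Mod (int value, int divisor)
{
    int ret = value % divisor;
    return (ret < 0) ? ret + divisor : ret;
}

int main ()
{
    // print every x in [-10, 15] satisfying (x % 5) % 2 = 1
    for (int x = -10; x <= 15; ++x)
        if (Mod(Mod(x, 5), 2) == 1)
            printf("%i ", x);
    printf("\n"); // prints: -9 -7 -4 -2 1 3 6 8 11 13
    return 0;
}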

Let’s step it up just one more notch.

Three Levels (Boss Mode)

Let’s solve the hardest equation from the intro:

((x \% 7) \% 5) \%2 = 1

This starts off just like the two leveled equation. Something on the left mod 2 is equal to 1. The 1 on the right side is \in \mathbb Z_2, but since that’s obviously true for this constant, we don’t need to do anything special. So, we transform it to the below:

(x \% 7) \% 5 \equiv a
a = 1 + 2 \mathbb Z
a \in \mathbb Z_5

Then we do another transformation:
x \% 7 \equiv b
b = a + 5 \mathbb Z
a = 1 + 2 \mathbb Z
a \in \mathbb Z_5
b \in \mathbb Z_7

Then for the last transformation, we get this:
x \equiv b + 7 \mathbb Z
b = a + 5 \mathbb Z
a = 1 + 2 \mathbb Z
a \in \mathbb Z_5
b \in \mathbb Z_7

Now that we have our equations worked out, we need to start finding out what the values actually are. We start with a because it’s what b and x are based on, and is made up of constants. We get {1,3} again as the valid values of a. That gives us:
x \equiv b + 7 \mathbb Z
b = a + 5 \mathbb Z
a = [1,3]
b \in \mathbb Z_7

Now we want to plug each value of a into b and find all the valid values for b. When we plug 1 into b for a, it becomes b = 1 + 5 \mathbb Z, b \in \mathbb Z_7. The valid values for that are {1,6}.

When we plug 3 into b for a, it becomes b = 3 + 5 \mathbb Z, b \in \mathbb Z_7. The only valid value there is {3}.

That means that our valid values for b are {1,3,6}. It’s the union of the valid values we found for each value of a.

That makes our equations into this:
x \equiv b + 7 \mathbb Z
b = [1,3,6]

If we plug those values of b into the equation for x, we get these three equations:

x \equiv 1 + 7 \mathbb Z
x \equiv 3 + 7 \mathbb Z
x \equiv 6 + 7 \mathbb Z

The first equation gives us some x values of {…, -6, 1, 8, 15, …}. The second equation gives us x values of {…, -4, 3, 10, 17, …}. The third equation gives us x values of {…, -1, 6, 13, 20, …}.

Taking the union of all of those, we get…
{…, -6, -4, -1, 1, 3, 6, 8, 10, 13, 15, 17, 20, …}

If you plug those numbers into the original equation ((x \% 7) \% 5) \%2 = 1, you can see that they are valid solutions!

When you are performing this algorithm, if you ever hit a case where there are no valid answers in one of the steps – like say, there were no valid answers for equation b in the above – that means that there is no solution to your equation.

Sample Code

Since solving these equations can be pretty manual and tedious, here is some code that can solve these equations for you. If you were confused at all by the explanation above, the code may also be able to show you how it works better, especially if you step through it and see what it’s doing.

#include <stdio.h>
#include <algorithm>
#include <array>
#include <vector>

//=================================================================================
void WaitForEnter()
{
    printf("nPress Enter to quit");
    fflush(stdin);
    getchar();
}

//=================================================================================
void AddSolutions(std::vector<int>& solutions, int add, int multiply, int mod)
{
    int Z = 0;
    while (1)
    {
        int value = multiply * Z + add;
        if (value < mod)
            solutions.push_back(value);
        else
            return;
        ++Z;
    }
}

//=================================================================================
void AddSolutionsFromSolutions (std::vector<int>& solutions, const std::vector<int>& adds, int multiply, int mod)
{
    std::for_each(
        adds.begin(),
        adds.end(),
        [&solutions, multiply, mod] (int add)
        {
            AddSolutions(solutions, add, multiply, mod);
        }
    );
}

//=================================================================================
int main(int argc, char **argv)
{
    // a,b,c,d...
    // ((x % a) % b) % c = d
    // etc.
    //const int c_values[] = {7, 5, 2, 1};
    const int c_values[] = { 12, 9, 7, 5, 3, 2 };
    const size_t c_numValues = sizeof(c_values) / sizeof(c_values[0]);
    
    // print the equation
    printf("Solving for x:n");
    for (size_t i = 0; i < c_numValues - 1; ++i)
        printf("(");
    printf("x");
    for (size_t i = 0; i < c_numValues - 1; ++i)
        printf(" %% %i)", c_values[i]);
    printf(" = %inn", c_values[c_numValues-1]);

    // print the solution equations
    for (size_t i = 0; i < c_numValues - 2; ++i)
    {
        char eqn = i > 0 ? 'A' + i - 1 : 'x';
        char eq = i > 0 ? '=' : 0xF0; // 0xF0 is the '≡' (equivalence) glyph in the code page 437 console font

        printf("%c %c %c + %i*Z", eqn, eq, 'B' + i - 1, c_values[i]);
        if (i > 0)
            printf(" (in Z_%i)\n", c_values[i-1]);
        else
            printf(" (in Z)\n");
    }
    printf("%c = %i + %i*Z (in Z_%i)\n\n", 'A' + c_numValues - 2 - 1, c_values[c_numValues - 1], c_values[c_numValues - 2], c_values[c_numValues - 3]);

    // gather up the permutation of solutions for each equation, starting with the lowest equation which has only constants
    std::array<std::vector<int>, c_numValues - 2> solutions;
    AddSolutions(solutions[c_numValues - 3], c_values[c_numValues - 1], c_values[c_numValues - 2], c_values[c_numValues - 3]);
    for (size_t i = c_numValues - 3; i > 0; --i)
        AddSolutionsFromSolutions(solutions[i-1], solutions[i], c_values[i], c_values[i-1]);

    // Detect empty set
    if (solutions[0].size() == 0)
    {
        printf("No solutions!n");
        WaitForEnter();
        return 0;
    }

    // Print the more specific solution equations
    printf("x = ");
    for (size_t i = 0, c = solutions[0].size(); i < c; ++i)
    {
        if (i < c - 1)
            printf("%c U ", 'a' + i);
        else
            printf("%cn", 'a' + i);
    }
    std::sort(solutions[0].begin(), solutions[0].end());
    for (size_t i = 0, c = solutions[0].size(); i < c; ++i)
        printf("%c = %i + %iZn", 'a' + i, solutions[0][i], c_values[0]);
  
    // Print specific examples of solutions (first few numbers in each)
    std::vector<int> xValues;
    printf("n");
    for (size_t i = 0, c = solutions[0].size(); i < c; ++i)
    {
        printf("%c = {..., ", 'a' + i);
        for (int z = 0; z < 3; ++z)
        {
            printf("%i, ", solutions[0][i] + z * c_values[0]);
            xValues.push_back(solutions[0][i] + z * c_values[0]);
        }
        printf("...}n");
    }

    // Show the list of specific values of X
    std::sort(xValues.begin(), xValues.end());
    printf("nx = {..., ");
    std::for_each(xValues.begin(), xValues.end(), [](int v) {printf("%i, ", v); });
    printf("...}n");

    // Test the solutions to verify that they are valid!
    bool valuesOK = true;
    for (size_t i = 0, c = xValues.size(); i < c; ++i)
    {
        int value = xValues[i];
        for (size_t j = 0; j < c_numValues - 1; ++j)
            value = value % c_values[j];

        if (value != c_values[c_numValues - 1])
        {
            printf("Solution %i is invalid!!n", xValues[i]);
            valuesOK = false;
        }
    }
    if (valuesOK)
        printf("nAll solutions shown tested valid!n");

    WaitForEnter();
    return 0;
}

Here’s an example run, where it solves a 5 level equation.

Links

If you know of a better way to solve this type of equation let me know. This is what I found when trying to figure this stuff out, but it doesn’t mean it’s the only way to do it.

The one thing I don’t like about this technique is that when you find solutions, it’s very manual, and very specific to the values used. Imagine if one of the modulus divisors was a variable instead of them all being constants. How would you solve it then? I’m not really sure…

Have some links!

Math Stack Exchange: How to solve nested congruences?

Khan Academy: The Quotient Remainder Theorem