Interleaved Gradient Noise: A Different Kind of Low Discrepancy Sequence

The python code that goes along with this post can be found at

In 2014, Jorge Jimenez from Activision presented a type of noise optimized for use with Temporal Anti Aliasing called Interleaved Gradient Noise, or IGN. This noise helps the neighborhood sampling history rejection part of TAA be more accurate, allowing the render to be closer to ground truth. IGN was ahead of its time. It still isn’t as well known or understood as it should be, and it shows the way for further advancements.

IGN can be used whenever you need a per pixel random number in rendering, and in this post we’ll compare and contrast IGN against three of its cousins: white noise, blue noise and Bayer matrices. Below are the 16×16 textures that we’ll be using for comparisons in this post.

For a first comparison, let’s look at the histograms of each texture. There are 256 pixels and the histogram has 256 buckets.

IGN is made by plugging the integer x and y pixel coordinates into a function and gives a floating point value out. It has a fairly uniform histogram. White noise is floating point white noise and has a fairly uneven histogram. Blue noise was made with the void and cluster algorithm, stored in a U8 texture, and has a perfectly uniform histogram – all 256 values are present in the 16×16 texture. Bayer also has all 256 values present in the texture.

Here is C++ code for calculating IGN:

float IGN(int pixelX, int pixelY)
{
    return std::fmodf(52.9829189f * std::fmodf(0.06711056f*float(pixelX) + 0.00583715f*float(pixelY), 1.0f), 1.0f);
}

How Is IGN Low Discrepancy?

An informal definition of low discrepancy is that the number of points in any region is close to the total number of points multiplied by the fraction of the area that the region covers. That is, if you had 10 points, you’d expect every 1/10th section of the area to have one point in it, and you’d expect all 3/10th sections to have 3 points. An important note is that low discrepancy sequences want LOW discrepancy, but not zero discrepancy. Check out wikipedia for a more formal explanation:

Evenly distributed samples are good for sampling, and thus numerical integration. Imagine you had a photograph and you wanted to calculate the brightness of the photo by taking 10 sample points and averaging them, instead of averaging all of the pixels. If your sample points clumped together in a few spots, your average will likely be too bright or too dark. If your points are evenly spaced all over the image, your average is more likely to be more accurate.

Zero discrepancy is regular sampling though, which can resonate with patterns in the data and give biased results. Low discrepancy avoids that, while still gaining benefits of being fairly evenly distributed.

IGN is low discrepancy in a different sort of way. If you look at any 3×3 block of pixels, even overlapping ones, you will find that the 9 values roughly match all values 0/9, 1/9, 2/9, … , 8/9, but that they are a bit randomized from the actual values. Every 3×3 block of pixels makes a low discrepancy set on the 1D number line.
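You can check this yourself by porting the C++ function to Python and looking at a single block. This is only a sketch: the names `ign`, `block_values` and `gaps` are mine, it uses double precision instead of 32 bit floats, and it only looks at the 3×3 block at the origin.

```python
import math

def ign(pixel_x, pixel_y):
    """Python port of the C++ IGN function (double precision, not float32)."""
    inner = math.fmod(0.06711056 * pixel_x + 0.00583715 * pixel_y, 1.0)
    return math.fmod(52.9829189 * inner, 1.0)

def block_values(x0, y0, size=3):
    """Sorted IGN values of the size x size block whose corner is (x0, y0)."""
    return sorted(ign(x0 + dx, y0 + dy)
                  for dy in range(size) for dx in range(size))

def gaps(values):
    """Distances between neighboring values in an already sorted list."""
    return [b - a for a, b in zip(values, values[1:])]

if __name__ == "__main__":
    values = block_values(0, 0)
    print("values:", ["%.3f" % v for v in values])
    print("gaps:  ", ["%.3f" % g for g in gaps(values)])
```

The gaps are not all exactly 1/9, but none of them collapse to zero and none of them are huge, which is the "a bit randomized" behavior described above. Try other block corners to see how the values drift.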

Let’s pick a couple blocks of pixels and look at the distance between values in those pixels. First is IGN, which has a very low, and constant, standard deviation. The values are well spaced.

Here is white noise which has clumps and voids so has very high variance in distance between values:

Here is blue noise which does a lot better than white noise, but isn’t as good as IGN.

Lastly here is Bayer which is better than white noise, but is still pretty clumpy.

How Does IGN Make TAA Work Better?

TAA, or temporal anti aliasing, tries to make better renders for cheaper by amortizing rendering costs across multiple frames. Why take 10 samples in 1 frame, when you can take 1 sample for 10 frames and combine them?

The challenge in TAA is that objects are often moving, and so is the camera. You can use the current frame’s camera matrix, the previous frame’s camera matrix, and motion vectors to try and map pixels between frames (called temporal reprojection) but there are times when objects become occluded, or similar events that cause the found history to actually be invalid. If you don’t handle these cases and throw out the invalid history, you get ghosting where pixels use invalid history.

A common way to handle the problem of ghosting is to make a minimum and maximum RGB color cube of the 3×3 neighboring pixel colors for the current frame of a pixel, and clamp the previous frame’s pixel color to be inside of that box. The clamping makes any history which is too different be much closer to what is expected. The previous frame’s clamped pixel color is then linearly interpolated towards the current frame’s pixel color by a value such as 0.1. That leaky integration is called “Exponential Moving Average” which allows a running average that forgets old samples over time, without having to store the previous samples.
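Here is a minimal sketch of that clamp and blend for a single pixel. The function names and the plain list of RGB triples are assumptions made for illustration. A real TAA resolve runs in a shader and has more going on (reprojection, color space choices, and so on).

```python
def clamp_to_neighborhood(history, neighborhood):
    """Clamp an RGB history color into the min/max box of the neighborhood."""
    lo = [min(color[i] for color in neighborhood) for i in range(3)]
    hi = [max(color[i] for color in neighborhood) for i in range(3)]
    return [min(max(history[i], lo[i]), hi[i]) for i in range(3)]

def resolve_pixel(history, current, neighborhood, alpha=0.1):
    """Clamp the history, then lerp it towards the current color by alpha.

    Feeding the result back in as next frame's history gives the
    exponential moving average.
    """
    clamped = clamp_to_neighborhood(history, neighborhood)
    return [h + alpha * (c - h) for h, c in zip(clamped, current)]

if __name__ == "__main__":
    green = [0.0, 0.2, 0.0]
    magenta = [1.0, 0.0, 1.0]
    # No magenta in the neighborhood: the magenta history gets clamped away.
    print(resolve_pixel(magenta, green, [green] * 9))
    # One magenta neighbor: the magenta history survives the clamp.
    print(resolve_pixel(magenta, green, [green] * 8 + [magenta]))
```

The second call is the interesting one: a single magenta pixel in the neighborhood is enough to keep the history from being thrown away.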

A great read for more details on TAA is “A Survey of Temporal Antialiasing Techniques” by Yang et al:

So where does IGN come in?

When TAA samples the 3×3 neighborhood, the intent is to get an idea of what possible colors the pixel should be able to take, based on the other pixels in the local area. The more this neighborhood accurately represents the possible values of pixels in this local area, the more accurate the color clipping history rejection will be. IGN makes the local area more accurately represent the full set of possibilities in small neighborhoods of pixels.

For instance, let’s say you had a bright magenta object in front of a dark green forest background, and you were using stochastic alpha to make the magenta object be semi transparent. That is, the bright magenta object may have an opacity of 0.1111… (1/9) so using a random number per pixel in this object, you’d let 1/9th of the pixels be written to the screen, while 8/9ths of them would be discarded.

Ideally, you’d want every 3×3 block of pixels in this magenta object to have a single magenta pixel surviving the stochastic alpha test so that the neighborhood sampling would see that magenta was a possibility, and to keep the previous pixel’s history instead of rejecting it, allowing the pixel to converge to 1/9th opacity better.

With white noise random numbers, you would end up with clumps of magenta pixels and voids where they should be but aren’t. This makes TAA reject history more often than it should, making for a worse, less converged result.

With IGN, every 3×3 block of pixels (even overlapping blocks) has a low discrepancy set of scalar values, so you can expect 1 pixel out of every 3×3 block of 9 pixels to survive the stochastic alpha test. This is how IGN improves rendering under TAA.
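To get a feel for this, the sketch below counts how many pixels pass a 1/9 opacity test in every 3×3 block (overlapping blocks included) of a 16×16 region, for IGN and for white noise. The helper name `survivors_per_block`, the region size and the random seed are all assumptions, and `ign` is a double precision Python port of the C++ function.

```python
import math
import random

def ign(pixel_x, pixel_y):
    """Python port of the C++ IGN function (double precision, not float32)."""
    inner = math.fmod(0.06711056 * pixel_x + 0.00583715 * pixel_y, 1.0)
    return math.fmod(52.9829189 * inner, 1.0)

def survivors_per_block(noise, width, height, opacity):
    """Count pixels passing the alpha test in every 3x3 block of the region."""
    passed = [[noise(x, y) < opacity for x in range(width)]
              for y in range(height)]
    counts = []
    for y in range(height - 2):
        for x in range(width - 2):
            counts.append(sum(passed[y + dy][x + dx]
                              for dy in range(3) for dx in range(3)))
    return counts

if __name__ == "__main__":
    rng = random.Random(1234)  # arbitrary seed for the white noise
    noises = (("IGN", ign), ("white", lambda x, y: rng.random()))
    for name, noise in noises:
        counts = survivors_per_block(noise, 16, 16, 1.0 / 9.0)
        print(name, "min", min(counts), "max", max(counts),
              "mean %.2f" % (sum(counts) / len(counts)))
```

Comparing the spread of the counts (min and max) between the two noise types is the point here. The mean should be near 1 for both, but the clumps and voids show up in the extremes.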

Blue noise sort of has this property, but not as much as IGN does. Bayer looks like it has this property but the regular grid of the result isn’t good for diagonal distances, while also looking more artificial.

Stochastic transparency using various per pixel noise types. The object has an opacity of 1/9. Friends don’t let friends use white noise!

In other situations where you need a per pixel random number, results like the above will normally hold as well: small regions of pixels will more accurately represent all the possibilities. This isn’t limited to stochastic alpha.

Derivation Of IGN And Extensions

If you were to sit down to make IGN you might define your constraints as: “Every 3×3 block of pixels in an infinite texture should have the values 1 through 9”. At this point, you’ve basically described sudoku. If you then go on to add “Also, this should include OVERLAPPING blocks”, you’ve made a generalized sudoku. It turns out this is too many conflicting constraints and is not solvable. A way to get around this problem would be to put a little bit of drift in the numbers over space so that it was mostly solved and the error of the imperfect solution was distributed over space. At this point, you have reached how IGN works.

I asked Jorge how he made IGN and it turned out to involve spending a full 8 hour day (or was it longer? I forget!) sitting at a computer tweaking constants by hand until they had the properties he was looking for. That is some serious dedication!

Much like in the spatiotemporal blue noise work, you might be wondering how to animate IGN over time. Jorge found a way to scroll the IGN over time to make individual pixels have better sampling over time, while still being perfectly good IGN over space. You scroll the texture by 5.588238 pixels each frame:

float IGN(int pixelX, int pixelY, int frame)
{
    frame = frame % 64; // need to periodically reset frame to avoid numerical issues
    float x = float(pixelX) + 5.588238f * float(frame);
    float y = float(pixelY) + 5.588238f * float(frame);
    return std::fmodf(52.9829189f * std::fmodf(0.06711056f * x + 0.00583715f * y, 1.0f), 1.0f);
}

If you are wondering how you might be able to make vector valued IGN, we did that in our spatiotemporal blue noise work by putting the scalar IGN values through a Hilbert curve. The scalar value was multiplied (and rounded) to make an integer index, and that was put into the Hilbert curve to make a vector out. When we used those vectors for rendering, the resulting noise in the render was very close to scalar IGN. There are probably other methods, but this ought to be a good starting point.
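As a rough sketch of that idea, the code below turns a scalar in [0,1) into a 2D vector by walking a Hilbert curve. The index to coordinate conversion is the standard iterative one. Everything else is an assumption made for illustration: the 16×16 curve resolution, using floor instead of rounding, and returning cell centers in [0,1)². It is not the exact code from the spatiotemporal blue noise work.

```python
def hilbert_d2xy(n, d):
    """Convert index d along a Hilbert curve into (x, y) on an n x n grid.

    n must be a power of two and 0 <= d < n * n.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:  # flip the quadrant
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x  # rotate the quadrant
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def scalar_to_vec2(value, n=16):
    """Map a scalar in [0, 1) to a point in [0, 1)^2 via the Hilbert curve."""
    index = min(int(value * n * n), n * n - 1)
    x, y = hilbert_d2xy(n, index)
    return ((x + 0.5) / n, (y + 0.5) / n)

if __name__ == "__main__":
    for value in (0.0, 0.25, 0.5, 0.75):
        print(value, scalar_to_vec2(value))
```

The reason a Hilbert curve is a nice fit is that scalars that are close together map to points that are close together, so the spacing of the scalar values carries over into 2D.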

Proposed Terminology: Low Discrepancy Grids

Low discrepancy sequences are ordered sequences of scalar or vector values. They are a function that looks like the below, with index being an integer, and value being a vector or a scalar:

\mathbf{\text{value}} = f(\text{index})

Or in C++:

std::vector<float> LowDiscrepancySequence(int index);

IGN works differently though. You plug in an integer x and y pixel coordinate and it gives you a floating point scalar value.

\text{value} = f(\text{Pixel}_{xy})

Or in C++:

float IGN(int pixelX, int pixelY);

Low discrepancy sequences are in contrast to low discrepancy sets. Sequences have an order, and taking any number of the values starting at index 0 will also be low discrepancy. Low discrepancy sets don’t have an order, and should only be expected to be low discrepancy if all the values in the set are considered together.

Other terminology calls low discrepancy sequences “progressive” and low discrepancy sets “non progressive”.

So what should we call IGN or similar noise functions that take in a multi dimensional integer index and spit out a scalar value? There is definitely an ordering, so it isn’t a set, but the ordering is 2 dimensional and there really isn’t a starting location, since negative numbers work just as well as positive numbers in the formula.

I propose we should refer to them as low discrepancy grids. That would cover the various types of grids: regular, irregular, skewed, curvilinear, and beyond, and these in any dimension. IGN itself more specifically would be a low discrepancy regular grid, or a low discrepancy Cartesian grid.


Related wikipedia pages:


Interleaved Gradient Noise is a very interesting noise pattern for use with per pixel random numbers, optimized towards neighborhood sampling rejection based TAA.

Even though it isn’t as widely known or understood as it should be, a secondary value to this work is showing that per pixel random numbers / sampling patterns can be generated for specific needs with great success.

IGN, along with the importance sampled, vector valued spatiotemporal blue noise work recently put out, are just two instances of this more general concept, and I believe they are just the beginning of other things yet to be created.

Why Can’t You Design Noise in Frequency Space?

The python code that goes along with this blog post can be found at

To evaluate the quality of a blue noise texture, you can analyze it in frequency space by taking a discrete Fourier transform. What you want to see is something that looks like TV static (white noise) with a darkened center, like the below. The frequencies in the center are the low frequencies, while the frequencies towards the edges are the high frequencies. This DFT shows high frequency randomness without any low frequency content, which is what blue noise is.

Left: blue noise texture. Right: discrete Fourier transform of that texture, showing a blue noise spectrum.

A common question then is: why can’t you just make what you want in frequency space and do an inverse Fourier transform to get the noise out you want? This could let you make all sorts of custom crafted types of noise, not just spatial blue noise.

Let’s try that out in 1D and see what happens.

First we make N complex values from polar coordinates that have a random angle 0 to 2pi and a random radius from 0 to 1. These will be the frequencies for our N noise values. We also want to make sure that the + and – frequency bins are complex conjugates of each other so that when we do an IDFT, we’ll get a strictly real valued signal.

After initializing these frequencies to white noise, we’ll multiply the values by a gaussian kernel to make the values towards the edges smaller. This is a low pass filter since the higher frequencies are reduced and the lower frequencies are mostly left alone. At this point, an IDFT would give us low frequency red noise, so we subtract these frequencies from the original white noise initialized frequencies. This is a high pass filter because the higher frequencies are left alone, while the low frequencies are reduced. At this point, an IDFT would give us high frequency blue noise. (There are a couple other things done, like setting the 0 Hz DC frequency bucket to a specific value. Check out the python code for more details.) Here is what we get if we do this for 64 noise values (N = 64):
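The steps above can be sketched in plain Python as below. This is not the code from the repo. It is a minimal version under a few assumptions of mine: the gaussian sigma is a free parameter, the DC bin is simply set to zero, the Nyquist bin is forced to be real for even N, and a slow O(N²) inverse DFT is used to avoid any dependencies.

```python
import cmath
import math
import random

def make_blue_spectrum(n, sigma, rng):
    """Random frequency bins, high pass filtered, with conjugate symmetry."""
    spectrum = [0j] * n  # bin 0 (DC) is left at zero, which is an assumption
    for k in range(1, n // 2 + 1):
        # White noise bin from polar coordinates: random radius and angle.
        white = cmath.rect(rng.random(), rng.uniform(0.0, 2.0 * math.pi))
        # Gaussian low pass, then subtract it from the original to high pass.
        low_pass = white * math.exp(-(k * k) / (2.0 * sigma * sigma))
        high_pass = white - low_pass
        if 2 * k == n:
            spectrum[k] = complex(high_pass.real, 0.0)  # Nyquist must be real
        else:
            spectrum[k] = high_pass
            spectrum[n - k] = high_pass.conjugate()  # keeps the signal real
    return spectrum

def inverse_dft(spectrum):
    """Naive O(N^2) inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]

def idft_blue_noise(n, sigma, seed=0):
    """Raw IDFT values, scaled and shifted into [0, 1]."""
    spectrum = make_blue_spectrum(n, sigma, random.Random(seed))
    raw = [v.real for v in inverse_dft(spectrum)]
    lo, hi = min(raw), max(raw)
    return [(v - lo) / (hi - lo) for v in raw]

if __name__ == "__main__":
    print(["%.2f" % v for v in idft_blue_noise(64, sigma=8.0)])
```

Keeping only `low_pass` instead of `high_pass` in the spectrum would give the red noise variant.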

Let’s see how this compares to 64 blue noise values made with the void and cluster algorithm:

The frequencies in the DFTs (right) look pretty similar, but the histogram (2nd from right) from the void and cluster algorithm is much more uniform, and the values (3rd from right) look a lot more even. The output of the IDFT actually gave us the “raw values” shown 2nd from left in the first image, which are out of the [0,1] range, but are scaled and shifted to make the “normalized values” shown next to it.

Let’s look at the histogram and DFT for each at 10,240 samples. First is the IDFT method, then void and cluster generated blue noise.

So interestingly, the IDFT method makes noise that is gaussian distributed. This kind of makes sense because we are filling out frequencies as uniform random white noise, which are turning into uniform random white noise sinusoids that are being summed together, which will tend towards a gaussian distribution as you sum up more of them. In contrast, the void and cluster method makes uniform distributed values which are perfectly uniform.

The other interesting difference is that the IDFT method has frequency magnitudes very closely matching a gaussian, while the void and cluster algorithm has distinct valleys. I’ve seen these valleys show up as ripples in DFTs like the below, for 2D blue noise DFTs. It’s unclear to me if the ripples add value to the noise or if they are an imperfect artifact, but seeing as we often see these ripples in DSP (like with sinc), it’s my guess that these probably do add value, but I can’t quantify it.

Most blue noise textures are uniform distributed (we recently put out some work showing how to make them non uniform distributed though), but if you wanted gaussian distributed blue noise for some reason, maybe this IDFT method would work well for you? Hard to say, but it could be interesting to try it out.

This is ultimately what the problem with the IDFT method is though… you get gaussian distributed values, not uniform, and the noise seems to be lower quality as well. If these issues could be solved, or if this noise has value as is, I think that’d be a really interesting and useful result. It would be interesting to then take this to vector valued masks and see if the same could be done there (check out my last post for more info).

The way I made noise through IDFT may be completely different than what you have in mind, and if so, you may get very different results. I’d love to hear any thoughts. I’m on twitter as @Atrix256.

I wonder if doing gradient descent on histograms and frequency phases could make uniform distributions and higher quality noise? Also, while there is importance in blue noise being actual blue noise (high frequency, better perceptually and designed to be removed with a gaussian blur), there is also importance in the fact that neighboring pixel values are very different from each other. I haven’t seen any methods for generating blue noise that were based on (anti)correlation but I would bet there’s a method waiting to be found there. If you do an auto correlation of a blue noise texture, it shows that pixels have anti correlation with their neighbors, and slight correlation with the neighbors of their neighbors, and even slighter anti correlation with the neighbors of those neighbors and so on. The ripple goes flat pretty quickly, so maybe an algorithm to satisfy those constraints wouldn’t be that difficult or have that long of a run time?

More Data

Here are some more comparisons of (1) Void and Cluster blue noise, (2) IDFT high pass filtered white noise to make blue noise, (3) IDFT low pass filtered white noise to make red noise. We’ll compare 8 values, 16, 32, 64, 128 and 256.

8 values:

16 values:

32 values:

64 values:

128 values:

256 values:

Not All Blue Noise is Created Equal Part 2

In 2017 I wrote a couple blog posts looking at animating noise for integration over time:

Recently we put out some research work I did with some other people at NVIDIA that is essentially the part three:

This blog post is a follow up blog post to one I wrote in 2018 that showed some subtle but important differences in blue noise textures:

In this blog post we look at some different types of blue noise in frequency space, including the new spatiotemporal blue noise types.

The source images and python processing script that generated the images in this post can be found at:

Some Memes

First, some spatiotemporal blue noise memes 🙂

How Do You Analyze Frequencies of Vector Valued Blue Noise Textures?

The “Blue-noise dithered sampling” paper was the first paper to make vector valued blue noise textures. In that paper, they did frequency analysis by DFTing each axis (color channel) separately and showing the results in the respective color channels.

That’s what we did as well in our spatiotemporal blue noise work, but it doesn’t tell the whole story. What this method tells you is that the X axis is blue noise, the Y axis is blue noise and the Z axis is blue noise, but it doesn’t tell you that the 3D vector itself is blue noise. There is in fact a whole paper regarding the difference between vectors as a whole being blue, or their individual components being blue, and how some insights there can lead to better image quality and convergence! (Projective Blue Noise)

Is there a way to do a Fourier transform of a texture full of vectors?

Spherical harmonics comes to mind, but I couldn’t reason out how you could use it for that purpose. You could use SH to fit all the vectors in a texture but you could shuffle the pixels in the texture and get the same results so that doesn’t seem right. It seems like you’d need to start with getting SH to be able to fit a 1D stream of (time series) vectors, and then extend it to 2D.

There is also apparently a way to do a Fourier transform in Clifford analysis that seems promising, but it’s above my understanding, so I am not sure how it works, or if it really even does work for this case.

Another method was suggested to me on twitter and I’ve lost the tweet unfortunately and can’t remember who sent it to me. If it was you, or you remember who it was, please let me know so I can say thank you and give credit! The idea was to take the DFT of each component (which results in a complex number per pixel, per color channel), get the magnitude of each pixel in each component (which results in a real number per pixel, per color channel) and then treat that as an N dimensional vector that you take the magnitude of for the final result. This seems to work well but I can’t explain why it works, so I am not certain about it.
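Here is a small sketch of that combined magnitude calculation. It is not the python script from the repo: the function names are made up, the channels are plain lists of lists, and it uses a naive 2D DFT so it is only practical for small textures.

```python
import cmath
import math

def dft2_magnitude(channel):
    """Magnitude of each 2D DFT bin of a square, list-of-lists channel."""
    n = len(channel)
    twiddle = [cmath.exp(-2j * math.pi * k / n) for k in range(n)]
    # 1D DFT along each row, then along each column.
    rows = [[sum(row[x] * twiddle[(u * x) % n] for x in range(n))
             for u in range(n)]
            for row in channel]
    return [[abs(sum(rows[y][u] * twiddle[(v * y) % n] for y in range(n)))
             for u in range(n)]
            for v in range(n)]

def combined_magnitude(channels):
    """Per bin, the length of the vector of per channel DFT magnitudes."""
    mags = [dft2_magnitude(channel) for channel in channels]
    n = len(mags[0])
    return [[math.sqrt(sum(m[v][u] ** 2 for m in mags)) for u in range(n)]
            for v in range(n)]

if __name__ == "__main__":
    red = [[1.0] * 4 for _ in range(4)]
    green = [[2.0] * 4 for _ in range(4)]
    combined = combined_magnitude([red, green])
    print("DC bin: %.3f" % combined[0][0])
```

With constant channels like in the usage example, all of the energy lands in the DC bin, which makes it an easy sanity check before feeding in real noise textures.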

In the image below, the top row is a blue noise texture where the red and green channels each have independently generated blue noise. The “DFT Single” column shows that when looking at the DFT of each channel separately, it shows blue noise – a dark hole in the low frequencies in the middle, and randomized higher frequencies. When looking at the DFT combined as described in the last paragraph, it shows a pinkish noise type frequency content though – strong low frequencies, and then white noise added on top. In contrast, the 2nd row noise was made with the “Blue-noise dithered sampling” paper’s method of making a true vec2 valued blue noise texture. The “DFT Single” column shows blue noise again, and the “Combined” column shows blue noise as well. This is what we’d expect because the scalar RG row makes the X and Y values of the vectors independently, and doesn’t care about the frequencies of the vectors made from combining them. The Vec2 RG row is made considering the combined vectors, on the other hand. You can see similar results for the Scalar RGB and Vec3 RGB rows. All textures are 64×64, 8 bit per color channel pngs.

The blue noise is much less pronounced in the true vector valued blue noise textures. Does this mean there is higher quality vector valued blue noise waiting to be found? I’m not sure, but I do think so. In our spatiotemporal blue noise work, we found that blue noise textures made with the void and cluster algorithm were higher quality than those made with the “blue-noise dithered sampling” method when generating scalar noise with each.


Next up let’s look at 2D slices and DFTs of those 2D slices of different types of blue noise textures. “Blue 2D” is 64 different 2D blue noise textures generated, each being 64×64. “Blue 2Dx1D” is spatiotemporal blue noise which is 64x64x64 (not included in the repo for this blog post, and generated with code hosted elsewhere). “Blue 3D” is 3D blue noise which is 64x64x64. We’ll look at both XY slices and XZ slices.

One interesting thing to see is that 3D blue noise looks the same when sliced on XY or on XZ while the other two are very different. Another is that the spatiotemporal blue noise has a lower cutoff frequency than the 2D blue noise. Maybe adjusting energy sigmas on time vs space could change that. Also, of course, 2D blue noise is white noise on the Z axis while the spatiotemporal blue noise has attenuated low frequencies on the Z axis.

3D DFT – Blue 2D

When we take a 3D DFT of a 64x64x64 image, we get frequency magnitudes that are also 64x64x64. It’s hard to visualize that so I’ll show slices of the 3D DFT – first the XY aligned slices, then the XZ aligned slices to get a different view of the same data.

First is 2D blue noise – 64 different 2D blue noise textures that are each 64×64. Here are the XY slices with slice 0 in the upper left, slice 1 to the right of it, and slice 63 in the lower right.

Here are the XZ slices which just show cross sections of the cylinder we were looking at in the XY slices. At slice 32, there is a vertical black line with a white dot in it. That white dot is DC, but I’m not sure why that black line is there.

3D DFT – Blue 2Dx1D

Next is the 2Dx1D blue noise, or spatiotemporal blue noise. First are the XY slices:

Then the XZ slices:

This spatiotemporal blue noise looks the same as the 2D blue noise we saw last, but with a darkening of all frequencies near where Z is zero. This is what we expected and hoped for: it’s 2D blue noise but also has attenuated low frequencies on the z axis (time).

3D DFT – Blue 3D

Here are the XY and then XZ slices of the 3D DFT of 64x64x64 3D blue noise. The 3D DFT is a darkened sphere surrounded by white noise, which is what we see in the slices as well, with a circle that grows and shrinks, like if you took slices of a sphere.

Inverting a Button Press (Featuring Current Dividers)

This post talks about how to make a circuit where you press a button to turn off a light, and also explains how and why it works.

Here’s a circuit lighting up an LED. 5 volts is powering the circuit and the LED has a voltage drop of 1.85 volts, leaving 3.15 volts. Those 3.15 volts are put across a 100 ohm resistor, resulting in about 32 milliamps (31.5 mA) of current through the circuit. That is a bit high for the LED, but my power supply is showing 25 milliamps of current actually going through the circuit, which is more in line with the actual limits of the LED.
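That arithmetic is just Ohm’s law applied to the voltage left over after the LED. As a tiny sketch (the function name is made up, and it treats the LED as a fixed voltage drop, which is an idealization):

```python
def led_current(supply_volts, led_drop_volts, resistor_ohms):
    """Current in amps through an LED in series with a resistor."""
    return (supply_volts - led_drop_volts) / resistor_ohms

if __name__ == "__main__":
    amps = led_current(5.0, 1.85, 100.0)
    print("%.1f mA" % (amps * 1000.0))
```

The measured current being lower than the calculated one is not surprising, since a real LED’s voltage drop changes with the current going through it.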


We can add a switch, so that the light is off until we press the button to turn it on. When the button is pressed, it closes the circuit and allows electricity to flow.


What if we want a circuit where the light is on until we press the button, and then the light turns off?

To make that happen, the basic idea is that you have a switch that, when pressed, connects the circuit to a lower resistance path to ground. It can’t be a zero resistance path to ground though, because then it would be a short circuit, draw a lot of current, and your components could heat up and catch fire.

Here I make a circuit through the LED with 200 Ohms of resistance, making 16mA. When the button is pressed, another path opens up though, which is 100 Ohms of resistance into ground (32mA).


Since there is less resistance when the button is pressed, there is more current and more power being used when the LED is off (!!!). My power supply says 11 milliamps / 55 milliwatts when the button isn’t pressed, and 45 milliamps / 225 milliwatts when the button is pressed, and the light is off. Interesting that turning off a light would use more power, isn’t it? To help lower the current draw when the button is pressed, you could replace the 2nd resistor with a higher resistor value, but that will also lower the current when the button isn’t pressed, and so make the LED dimmer.

You could keep the same overall resistance but have more of it in the 2nd resistor and less in the first resistor. At best it would make it so pressing the button only used a tiny bit more power than when it wasn’t pressed, but this setup will make it always use more power when the button is pressed.

Why Does Electricity Completely Bypass The Resistor and LED?

You might wonder why when the switch is closed, making the circuit below, that the electricity seems to completely bypass R1 and the LED. The circuit is connected, so shouldn’t some go through R1 and the LED too? What prevents that from happening?


This gets doubly strange when you realize that while there is no resistor on the path where the switch was, the wire itself does have resistance, so it’s like there is just a very small resistor on that path. Wires have effects on both voltage (voltage drops across sections of them) and current, just like resistors do, so why is this configuration special?

First let’s look at the current in parallel resistors. We’ll start with two 100 Ohm resistors in parallel off of a 5 volt source.


First we can calculate the equivalent parallel resistance of these two resistors. When resistors are in parallel, the effective resistance actually drops. The formula for equivalent parallel resistance is:

1/R = 1/R_1 + 1/R_2 + ... + 1/R_N

So for these, 1/R = 1/100 + 1/100 = 1/50. So, R = 50, meaning these two 100 Ohm resistors in parallel are equivalent to this:


50 Ohms of resistance at 5 volts means you get 5 volts / 50 ohms = 0.1 amps of current through either circuit.

When a circuit splits though like in the circuit with two 100 ohm resistors, the current splits as well and is divided, possibly unevenly, across those paths.

Paths with lower resistance get more percentage of the current. The formula for current through a resistor in a set of parallel resistors is:

I_k = (I * 1/R_k) / (1/R_1 + 1/R_2 + ... + 1/R_N)

Calculating it for R1, we have: (0.1 * 1/100) / (1/100 + 1/100) = 0.001 / (2/100) = 0.05 amps.

It isn’t really surprising that it gets half the amount of amps, since both resistors are the same value. They each get half the current.

What if we change the resistors to have values of 190 Ohms and 10 Ohms respectively?


First up, we can calculate the equivalent parallel resistance: 1/R = 1/190 + 1/10. R = 9.5 Ohms. At 5 volts, we get 5 volts / 9.5 ohms = 526mA of current.

Let’s now calculate how much of the current goes through the 190 Ohm resistor.

(0.526 * 1/190) / (1/190 + 1/10) = 0.026A or 26mA.

Let’s calculate how much goes through the 10 Ohm resistor.

(0.526 * 1/10) / (1/190 + 1/10) = 0.5A or 500mA.

Most of the current by far is now going through the 10 Ohm resistor, the smaller resistor. As R2 gets smaller and approaches 0 Ohms of resistance, like a wire would also approach, it’s going to approach taking all of the current, since one divided by the resistance controls what percentage of the current is taken down that path, and 1 divided by a very small number is a very large number.
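If you want to play with the numbers, here is a small sketch of the two formulas from this section. The function names are made up, and it assumes ideal resistors connected directly across an ideal voltage source.

```python
def parallel_resistance(resistors):
    """Equivalent resistance of resistors in parallel: 1/R = sum of 1/R_k."""
    return 1.0 / sum(1.0 / r for r in resistors)

def branch_currents(volts, resistors):
    """Current through each parallel resistor, using the current divider."""
    total_current = volts / parallel_resistance(resistors)
    conductance_sum = sum(1.0 / r for r in resistors)
    return [total_current * (1.0 / r) / conductance_sum for r in resistors]

if __name__ == "__main__":
    for resistors in ([100.0, 100.0], [190.0, 10.0], [190.0, 0.01]):
        currents = branch_currents(5.0, resistors)
        total = sum(currents)
        shares = ["%.4f%%" % (100.0 * c / total) for c in currents]
        print(resistors, "->", shares)
```

The last entry uses a 0.01 Ohm resistor to stand in for a wire. The percentage of current going through the 190 Ohm path drops to nearly nothing, which is the behavior we see when the switch is pressed.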

That’s the intuition for why current will almost 100% go through the open wire in the circuit with the switch when it’s available. When you put resistors in parallel like this, it’s actually called a current divider, much like a previous post showed how to use resistors to make voltage dividers.

When the switch is pressed in our setup, a small amount of current does go through the resistor though, and tries to go through the LED as well, but there’s another thing at play here: voltage.

Voltage, if you remember, is the difference in electrical potential between two points. The LED requires a difference of 1.8 volts minimum to let electricity through and to light up. When the switch is pressed, the voltage is the same on the positive and negative side of the LED, which means the LED has zero volts across it and does not light up or let electricity flow through it. Let’s explore why that is…


In this circuit, there is 5 volts, a total of 1000 Ohms, and so 5mA of current. The voltage drop across R1 is the full 5 volts.


This circuit also has 5 volts, 1000 Ohms total, and 5mA of current. To calculate the voltage drop of the resistors, you use Ohm’s law of the form V = IR. You multiply the current of the circuit by the resistor value to get the voltage drop across the resistor. For R1 it’s V = 0.005A * 900 Ohms = 4.5 volts. For R2 it’s V = 0.005A * 100 Ohms = 0.5 volts.

You can see that the larger resistor got most of the voltage drop, while the smaller resistor got very little.


This circuit has the same voltage, total resistance, and current. Skipping ahead to calculating the voltage drop across the resistors, R1 is 0.005 * 999 = 4.995 volts. R2 is 0.005 * 1 = 0.005 volts. As R2 approaches zero, the voltage drop also approaches zero. This is why in our original circuit (below), when the switch is closed, there is (almost) no voltage drop across the wire on the right, meaning the voltage above R1 and the voltage below the LED are the same, so there are 0 volts across the path of the circuit that has the LED on it. On top of that, nearly zero current is trying to go through that path.
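The voltage drop calculations from the last few circuits can be sketched like this. The function name is made up, and it assumes ideal resistors in series across an ideal voltage source.

```python
def series_voltage_drops(volts, resistors):
    """Voltage drop across each resistor in a series chain, using V = IR."""
    current = volts / sum(resistors)
    return [current * r for r in resistors]

if __name__ == "__main__":
    for resistors in ([1000.0], [900.0, 100.0], [999.0, 1.0], [999.99, 0.01]):
        drops = series_voltage_drops(5.0, resistors)
        print(resistors, "->", ["%.5f V" % d for d in drops])
```

The last entry shrinks R2 down to 0.01 Ohms to stand in for a wire, and its voltage drop shrinks right along with it.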


If you want to know more about this stuff, give this a read:

There is also a neat technique to do these sorts of calculations called nodal analysis:

There is another technique called mesh analysis:

Alternatives To This “Off Button” Circuit

The fact that the circuit uses more power when the light is off is pretty bad, so you are probably interested in some alternatives.

The button I am using is called a “single pole single throw” switch. The single pole means that there is one electrically connected input, and the single throw means that it only has one output that the input is connected to or not. Double pole switches are switches that can control two different circuits with the same button/switch – you can turn on two different parts of a circuit when one button is pressed. Double throw switches are switches that connect to one output if the button is pressed, and a different output if the button is not pressed. Using a double throw switch in the last circuit, you could hook the LED up to the output for when the button was not pressed, and you could leave the other output disconnected, for when the button was pressed. That would make it use no power when the light was turned off, instead of using more power.

With all this talk of switches, you may be wondering if there’s a switch that you can turn on and off with electricity, instead of requiring a human to actually press a button. There are in fact such things! There is something called a relay, which when it is given power, it powers an electromagnet inside of itself and closes a switch using that magnetism. You can actually hear them click as they turn on and off! Much more common for this task are transistors though, which allow small amounts of electricity to control the flow of larger amounts of electricity. This allows them to be used as electronic switches, but also allows them to work as amplifiers. It would be fun to write a blog post about them at a future point. Transistors can be used to make circuits that invert a button press value too though. At that point, we are basically talking about a NOT gate.

Thanks for reading!


Someone pointed out to me that LEDs themselves have internal resistance, so you could move all the resistance down into the shared path. The nice thing about this is that when the button is pressed, it only uses 25mA instead of 50mA. I tested it and it does indeed work!


Resistance and Voltage Dividers

When I first started working with electronics, I tended to think of my circuits, or even parts of my circuit, in isolation. The horror of it though is that your circuit is plugged into other things – at minimum a battery, but commonly other devices, or your house and the power grid – and those things can affect how your circuit works.

Beyond being physically connected with wires to other things, your circuits also have a connection to the rest of the world through electromagnetic fields.

In this post we are going to talk about voltage dividers, which on one hand can be useful if made on purpose, but can also be made by accident and cause strange behaviors.

Voltage Dividers

Voltage dividers are a way of giving you a lower voltage. If you have a 9 volt battery and only want 6 volts, a voltage divider can do that for you. There is a downside to voltage dividers that we’ll explore in this post, but they are incredibly simple to make: you only need two resistors.

First let’s look at a single resistor in a circuit. Let’s put a 1000 ohm resistor in a circuit with a 9 volt battery. If we connect our multimeter probes to the wire on the same side of the resistor and measure volts we’ll get zero volts (see diagram below). This is because volts is a measurement of the difference in electric potential between two points. Our multimeter is measuring the difference in electric potential between two points right next to each other on a wire, and the difference is essentially zero. The red and black arrows on the circuit diagram are where we connect the red (+) and black (-) probes of our multimeter. (Tangent: 9 milliamps should be going through this circuit since there is 1000 ohms of resistance and 9 volts. The power supply says 8 but it has limited accuracy, resistors are not exactly their labeled value, wires have resistance, etc. It also reads that there are 9 volts * 8 milliamps = 72 milliwatts of power being used.)

(Circuit diagrams made at

What if we put our multimeter on different sides of the resistor? In that case, we read 9 volts. The resistor makes it more difficult for electricity to cross, and thus there is a difference in electric potential of 9 volts between the two sides.

What would happen if we put two resistors in?

If we measure at the red and black arrows again, we’ll still have 9 volts. If we measure at the red and orange arrows though, we’ll see 4.5 volts. If we read at the orange and black arrows, we’ll also see 4.5 volts. We know that the whole circuit needs to go from 9 volts to 0 volts since that is what is provided by our battery, but it dropped by half on the first resistor, and then dropped the rest of the way on the second resistor. (Tangent: the total resistance here is 2000 ohms, so 4.5 milliamps would flow through the circuit)

Let’s change the value of the resistors and see what happens.

I didn’t have a 2000 ohm resistor so I just put two 1000 ohm resistors in series (more on that further down).

If we measure between the red and black, we still have 9 volts. If we measure between the red and orange, we get 3 volts though, and if we measure between the orange and black, we get 6 volts. Weird! (Tangent: The total resistance here is 3000 ohms, so there should be 9 volts / 3000 ohms = 3 milliamps flowing through the circuit but my power supply isn’t showing that correctly.)

Similarly, you can change the second resistor to be half instead of double and get the opposite result.

I didn’t have a 500 ohm resistor so I put two 1000 ohm resistors in parallel (more on that further down).

What is going on here is that the 9 volts are dropping off across the resistors based on their relative values. When the resistors are equal in value, they each get half of the voltage. When they are unequal, the voltage across the R2 resistor is calculated like this:

V_{R2} = V * \text{R2} / (\text{R1}+\text{R2})
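As a quick sanity check on that formula, here is a minimal Python sketch that reproduces the three measurements above. The function name divider_voltage is made up for illustration, and it assumes ideal resistors and an exact 9 volt supply.

```python
def divider_voltage(v_in, r1, r2):
    """Voltage across R2 when R1 and R2 are in series across v_in volts."""
    return v_in * r2 / (r1 + r2)


# The three dividers measured above, all on a 9 volt supply.
for r1, r2 in [(1000, 1000), (1000, 2000), (1000, 500)]:
    v_r2 = divider_voltage(9.0, r1, r2)
    v_r1 = 9.0 - v_r2  # whatever R2 does not drop, R1 drops
    print(f"R1={r1} ohms, R2={r2} ohms: {v_r1:.1f} V across R1, {v_r2:.1f} V across R2")
```

Real readings will be a little off from these numbers because of resistor tolerances and the accuracy of the meter.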

To actually use this as a power source, you would connect new wires as the positive and negative power for a sub circuit.

Note in the above, I’m not saying that is -6V and +6V, which would be 12 volts total, I’m just labeling the positive and negative sides of the 6 volts of power available.

You could use the top part as a 3 volt source if you wanted instead, or in addition to the 6 volts you are using from the bottom part. You aren’t limited to two levels either: putting N resistors in series gives you N voltage levels.

The famous 555 timer for instance internally uses a voltage divider with three 5K resistors to split the supply voltage into thirds, and that is commonly said to be why it’s called a 555. You can see it at the top of this diagram of a 555 timer, between the ground (pin 1) and the +Vcc supply (pin 8).

555 timer block diagram

(This image is from this 555 timer tutorial:

Resistors in Series vs in Parallel

When I needed a 2k Ohm resistor in the last section I put two 1k Ohm resistors in series. When you put resistors in series, their values add together, allowing you to additively create whatever resistance you need.

When I needed a 500 Ohm resistor and didn’t have one, I put two 1k Ohm resistors in parallel. This is because putting resistors in parallel gives electricity more than one path to get through, and thus has lower resistance than if there was only one of the resistors. The exact equation for the resistance of resistors in parallel is:

1/R = 1/R_1 + 1/R_2 + ... + 1/R_N

Where R_i is the value of a specific resistor.

This means that if you put two of the same valued resistors in parallel, the resistance will be cut in half. If you put three of them in parallel, the resistance will be cut to a third.
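Both rules are easy to play with in code. Here is a small Python sketch, assuming ideal resistors; the function names series and parallel are just illustrative.

```python
def series(*resistors):
    """Total resistance of resistors connected end to end."""
    return sum(resistors)


def parallel(*resistors):
    """Total resistance of resistors connected side by side: 1/R = sum(1/R_i)."""
    return 1.0 / sum(1.0 / r for r in resistors)


print(f"two 1k in series:     {series(1000, 1000):.1f} ohms")
print(f"two 1k in parallel:   {parallel(1000, 1000):.1f} ohms")
print(f"three 1k in parallel: {parallel(1000, 1000, 1000):.1f} ohms")
```

The parallel result is always lower than the smallest resistor in the group, which is a handy sanity check when combining odd values.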

This formula comes up again in electronics. For capacitors, when you put them in parallel, their capacitance adds. When you put them in series, their capacitance follows the parallel resistor equation. It’s the same formulas, but parallel / series reversed. Strange huh?

1/C = 1/C_1 + 1/C_2 + ... + 1/C_N

Where C_i is the value of a specific capacitor (in Farads).

Something else strange is that this is why thicker wire has less resistance too. There are more paths for electricity to travel through the thicker wire, compared to thinner wire, so resistance goes down.

Below are some images of two 1k Ohm resistors in series and in parallel, with the multimeter showing the total resistance value.

One resistor:

Two resistors in series:

Two resistors in parallel:

What Happens When Using a Voltage Divider?

Ok so let’s start with the voltage divider we set up before.

Now let’s say we actually use that 6 volts to power something. That something will have a resistance of 2k Ohms. Maybe it’s some kind of light bulb.

We can simplify this circuit though. The 2k Ohms of our load, and the 2k Ohms of the voltage divider are in parallel so we can use our formula for parallel resistance, or remember that two resistors of equal value in parallel have half the resistance of one. So that means we could describe our circuit this way, as far as resistance is concerned:

The problem with that is that our voltage divider has changed. The resistors are equal now, which means that our 6 volts has dropped down to 4.5 volts!

If we decreased the resistance of what we were powering, the voltage would drop too. Intuitively, imagine if you had a short circuit so had zero resistance across the load, the electricity would completely bypass the 2k Ohm resistor in the voltage divider as if it weren’t there, so there would be zero volts difference between the top and bottom of the 2k Ohm resistor.

If we increased the resistance of what we were powering, we would raise the combined parallel resistance there on the 2nd part of the voltage divider, but luckily would at most have 2k ohm resistance. For instance, using a 1 mega ohm resistive load, the parallel resistance formula gives us a resistance of 1.996 k Ohms. So, if we had a high resistance load, we’d get nearly our full 6 volts, but would never quite have the full 6 volts. At the limit, if our load was disconnected, and thus had infinite resistance, we would get the full 6 volts.
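To put numbers on how a load pulls the divider’s output down, here is a small Python sketch. It assumes the divider from above (9 volts, R1 = 1k Ohm, R2 = 2k Ohms) with ideal components, and the function names are hypothetical.

```python
def parallel(r_a, r_b):
    """Two resistors in parallel (product over sum, so a 0 ohm short works too)."""
    return (r_a * r_b) / (r_a + r_b)


def loaded_divider_voltage(v_in, r1, r2, r_load):
    """Voltage across R2 once a load of r_load ohms is connected across it."""
    r_bottom = parallel(r2, r_load)
    return v_in * r_bottom / (r1 + r_bottom)


# A short circuit, a heavy load, the 2k load from above, and a 1 mega ohm load.
for r_load in (0, 500, 2_000, 1_000_000):
    v_out = loaded_divider_voltage(9.0, 1000, 2000, r_load)
    print(f"load = {r_load:>9} ohms -> {v_out:.3f} V")
```

The output climbs toward 6 volts as the load resistance grows, but never quite reaches it while a load is attached.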

If you know the resistance of the load you are plugging into the voltage divider, you can take it into account and choose a resistor for the voltage divider that gives you the desired parallel resistance amount and thus the right voltage. Some loads have variable resistances though, and then you have a problem and should look at other methods of changing the DC voltage level, such as a buck converter.

Some loads draw almost no current though (they have a very high resistance), and for those a voltage divider can come in really handy. Supplying a voltage to a transistor’s base, or to an op amp’s input, or to an optocoupler’s input for instance can make great use of these because they just “read” the voltage signal there without putting much extra load on it.

The lesson here is that whenever you plug things together, you might get strange drops in voltage because you’ve accidentally created a voltage divider. If the resistance of your load is sufficiently higher than the internal resistance of whatever you’ve plugged it into, you can ignore the voltage drop, but that also decreases the amperage so may not be desirable.

This effect even comes up in batteries (and other power sources) which essentially can be modeled as an ideal voltage source in series with a small resistance (like 10 ohms). If you use a low valued resistor on a battery, the voltage will drop because you are secretly part of a voltage divider involving the internal resistance of the battery (and in fact, that “internal resistor” can’t take that much power and will start heating up, which can be dangerous! So don’t short circuit batteries!). Since a battery’s resistance is so small, your resistance level is likely to be much higher when using the battery to power something, and this isn’t something you really have to worry about in normal situations.

Of course, all this talk only deals with DC and resistors. Things get more complex when you have capacitors, inductors or AC power.

Maximum Power (Watts)

So we saw that as R2’s resistance gets larger, the voltage across R2 becomes larger, and at infinite resistance, it gets all the voltage available.

We also know that the larger the resistance, the lower the amps in the circuit, so getting that voltage comes at a cost.

Watts is a unit of measurement of power and is volts multiplied by amps. It turns out that if you want the R2 side of your voltage divider to get maximum power (watts), R1 should equal R2. Wikipedia has more about that here:

Here are some graphs showing this: if resistor R1 is 1k Ohms, you get the highest amount of watts when R2 is also 1k Ohms, despite the behavior of the volts and amps.
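The same thing can be checked numerically. This is a rough Python sketch that sweeps R2 and reports which value gets the most power. The 9 volt supply, the sweep range and the function name are assumptions for illustration, not values taken from the graphs.

```python
def r2_power(v_in, r1, r2):
    """Power in watts dissipated in R2 of an unloaded R1/R2 divider."""
    current = v_in / (r1 + r2)   # same current flows through both resistors
    v_r2 = current * r2          # voltage across R2
    return v_r2 * current        # watts = volts * amps


V_IN = 9.0   # assumed supply voltage
R1 = 1000    # 1k Ohm, as in the text

candidates = range(100, 5001, 100)  # R2 values to try, in ohms
best_r2 = max(candidates, key=lambda r2: r2_power(V_IN, R1, r2))
print(f"most power in R2 at R2 = {best_r2} ohms "
      f"({r2_power(V_IN, R1, best_r2) * 1000:.2f} mW)")
```

Changing V_IN scales the wattage but does not move the peak, which stays at R2 = R1.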

Calculating Resistance (and Voltage) of an Unknown Circuit

Since plugging your circuit into other things can make an implicit / unintentional voltage divider, you probably want to know how much resistance some other black box circuitry might have. Luckily you can figure this out using Ohm’s law (see last post: Voltage, Amps, Resistance and LEDs (Ohm’s Law)) and some simple algebra.

First, connect a resistor to the + and – and measure the amps in the circuit. If you use a resistor that is too low value, or has too low of a wattage rating, the resistor will get hot, possibly start glowing or burst into flames (resistors have a rating in watts and the common ones for small electronics like those seen in this post can handle 1/4 of a watt). So basically be careful if doing this with high voltages – and in fact, if my blog is your primary source of knowledge, please don’t mess with high voltage 🙂

So let’s say we connect a 1k Ohm resistor and read a value of 0.01 amps or 10 milliamps.

Ohms law says:

I = V/R

where I is current, V is volts and R is resistance.

So we now have this formula:

0.01 = V / (1000 + R_1)

We have one equation with two unknowns, so we need a second equation to make it solvable. Let’s say we take an amperage measurement using a 500 Ohm resistor and get 0.017 amps or 17 milliamps.

That gives us a second equation:

0.017 = V / (500 +  R_1)

We now have two equations with two unknowns!

We can solve the first equation for V and get:

V = 0.01 * (1000 + R_1)

From there we can plug V into the second equation to get:

0.017 = 0.01 * (1000 + R_1) / (500 + R_1)

Solving for R1, we get:

R_1 = (0.01 * 1000 - 0.017 * 500) / (0.017 - 0.01) =  214.28 \Omega

If you do the calculations, you get 214.28 ohms, which means the unknown circuit has that much resistance.

What’s nice is that you can also use this to get the total amount of voltage available to this circuit by plugging this resistance into the first equation that we solved for V:

V = 0.01 * (1000 + 214.28) = 12.14 \text{volts}

This was a toy example I made up, using 12 volts and 200 ohms of resistance, so our answer is pretty close. The inaccuracies came from rounding off the numbers, but you’ll get the same problems in real life from not completely accurate measurements and imperfect electronic components.

For convenience, here are the equations to calculate the resistance of an unknown circuit, without having to do the algebra each time.

R_1 = ( I_A * R_{2A} - I_B * R_{2B}) / (I_B-I_A)

Where R_1 is the resistance of the unknown circuit. R_{2A} is the first resistor value you connected and measured to get I_A amps. R_{2B} is the second resistor value you connected and measured to get I_B amps.

Once you have the R_1 value, you can plug it into this to get the voltage available to the circuit:

V = I_A * (R_{2A} + R_1)
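For anyone who would rather let a computer do the arithmetic, the two equations above translate directly into Python. The function names are hypothetical; the inputs are the toy measurements from earlier (10 mA through 1k Ohm, 17 mA through 500 Ohms).

```python
def unknown_resistance(i_a, r_2a, i_b, r_2b):
    """Internal resistance R_1 from two current readings with two known resistors."""
    return (i_a * r_2a - i_b * r_2b) / (i_b - i_a)


def source_voltage(i_a, r_2a, r_1):
    """Voltage of the source once the internal resistance R_1 is known."""
    return i_a * (r_2a + r_1)


r_1 = unknown_resistance(0.010, 1000, 0.017, 500)
volts = source_voltage(0.010, 1000, r_1)
print(f"R_1 = {r_1:.2f} ohms, V = {volts:.2f} volts")
```

The two test resistors need to be different enough that the two current readings are clearly distinct. Otherwise the subtraction in the denominator magnifies any measurement error.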

Let’s take these equations for a spin with a battery. I accidentally popped the fuse on my digital multimeter and can’t use it to measure amps so I’ll use my analog multimeter.

First I’ll measure the amps with a 1k Ohm resistor. The knob is set to 10 milliamp measurements so the bottom row of readings (that are labeled 0 to 10) is where you read from. I drew some yellow to show you where to read from. I read 8.6 milliamps.

Next I’ll put two 1k Ohm resistors in series to make 2k Ohms of resistance and measure amps to get what looks like 4.6 milliamps.

Ok so let’s plug our values into the equations!

R_1 = ( I_A * R_{2A} - I_B * R_{2B}) / (I_B-I_A)

R_1 = ( 0.0086 * 1000 - 0.0046 * 2000) / (0.0046-0.0086) = 150 \Omega

So it looks like this 9 V battery has 150 ohms of resistance. I’ve heard that as a battery is used, its resistance goes up, so with such a large resistance maybe this battery is close to needing to be replaced.

Let’s calculate how many volts it has.

V = I_A * (R_{2A} + R_1)

V = 0.0086 * (1000 + 150) = 9.89 volts

So, the battery has 9.89 volts inside of it. Either they made the battery have higher than 9 volts inside of it, to account for internal resistance dropping the output voltage, or my 5$ analog multimeter is not very accurate and these are just ball park figures.


Thanks for reading and hopefully you found this interesting or useful.

Have any requests or ideas for other topics to write about? Drop me a message on twitter at @Atrix256.

Voltage, Amps, Resistance and LEDs (Ohm’s Law)

I’ve taken up learning electronics during the pandemic, and have enjoyed it quite a bit. I’ve been programming for 25+ years, so it’s nice to have something different to learn and work on that is still both technical and creative. It’s cool getting a deeper understanding of how the fundamental forces of nature work, as well as being able to MacGyver a hand crank powered flashlight from an old printer if needed (Check out a 40 second video of that here!). It’s also nice having something physical to show at the end of the day, although it does require consumable parts, so there are pros and cons vs making software.

Friends (Hi Wayne!) and YouTube have helped me learn a lot, but I found the subject pretty alien at first and wanted to try my hand at some explanations from a different POV. This post starts that journey by taking the first steps into DC electronics.

Ultra Basics

Electricity flows if there is a path for it to flow in and the flow is made up of electrons.

Electrons are negatively charged so travel from the negative side of a circuit to the positive side.

Conventional current flow is backwards from this though, and says that electricity flows from the positive side to the negative side. In this view, it’s not electrons flowing but positive charge, which you can picture as “holes” flowing. Holes are a weird concept, but they are just a place that will accept an electron.

Here is an open circuit, which means that there is a gap. Since the circuit is not closed, electricity cannot flow. (made in

If you close the circuit, like the below, electricity is able to flow.

The circle on the left is a power source with a + and – terminal. It’s labeled as a 1.5 volt double A battery.

Here is a diagram of a circuit with a switch that can be used to open or close the circuit. Being able to read and make circuit diagrams is really helpful when building things or trying to understand how circuits work.

Of note: the higher the voltage, the farther that electricity can jump across gaps. So, while at low voltage, a circuit may be open, turning up the voltage may make it closed when the electricity arcs across!

Ohm’s Law


IMAGE CREDIT: Eberhard Sengpiel

The most useful thing you can learn about DC electricity is Ohm’s law which mathematically explains the relationship between voltage, amperage and resistance. Ohm’s law is:

I = V/R

In the equation I stands for Intensity and means current aka amps, V stands for voltage and R stands for resistance.

If electricity was water, voltage would be the water pressure, amperage would be how much water was running through the pipe, and resistance would be a squeezing of the pipe, like in the image above.

Current is measured in amperes (amps) or the letter A. 500mA is 500 milliamps, or half of an amp, and 1.2A is 1.2 amps. Note: electricity is dangerous! It can take only a few hundred milliamps to be fatal, but voltage is needed to be able to let those amps penetrate your skin.

Voltage is measured in volts or the letter V. If you see 9V on a battery, that means it’s a 9 volt battery, and is capable of providing 9 volts.

Resistance is measured in Ohms or the omega symbol \Omega. So if you see 5\Omega that means 5 Ohms of resistance. If you see 5k\Omega that means 5 kiloohms which is 1000 times as much resistance. If you see 5M\Omega with a capital M, that means 5 megaohms, which is 1000 times as much resistance again.

Where Ohm’s law comes in handy is when you know two of these three values and you are trying to calculate the third one.

As written, the formula shows how to calculate amps when you know voltage and resistance, but you can use algebra to re-arrange it to a formula for any of the three:

I = V/R

R = V/I

V = I*R

This comes up quite often – if you know how much voltage a battery has, and you know how many amps you need, you can use this to calculate the value of the resistor to get the desired amps.

Diodes, LEDs and Resistors

LED stands for Light Emitting Diode. A diode is something which lets electricity only flow in one direction and it has a couple of common uses:

  • Protecting circuits from electricity flowing in the wrong direction.
  • Turning Alternating Current (AC) into Direct Current (DC) by rectifying it (preventing the negative part of AC from getting through. Same as last bullet point)
  • Lots of cool tricks, like stabilizing uneven power levels by letting voltage over a specific value “spill over” out of the circuit.

Here is a pack of various diodes I bought from amazon for 10$. There are quite a few different types of diodes, which are useful for different situations.

Here are some diodes close up. The black one is a rectifier diode 1N4001, and the more colorful one is a switching diode 1N4148. Those part numbers are actually written on the diodes themselves but are a bit hard to see. You can use these numbers to look up the data sheet for the parts to understand how they work, what their properties are, how much voltage and amps they can handle, and often even see simple circuit diagrams on using them for common tasks. Data sheets are super useful and if doing electronics work, you will be googling quite a few of them! Here is the data sheet for 1N4148 which I found by googling for “1N4148 data sheet” and clicking the first link. 1N4148 Data sheet.

Here are two circuit diagrams with diodes in them. The black triangle with the line on it is a diode. The arrow shows the direction that it allows conventional flow to travel. The line on the arrow corresponds to the bands on the right of the diodes in the image above, which is the negative side of the diode (cathode). The left circuit is a closed circuit and allows electricity to flow. That diode is forward biased. The circuit on the right has the diode reverse biased which does not allow electricity to flow.

LEDs can do many things regular diodes can do, since they are diodes, but they have the property that when electricity flows through them, they light up. Since they are diodes, and only let electricity flow in one direction, LEDs have a + side and a – side and you have to hook them up correctly in a circuit for them to light up. If you hook them up the wrong way, it doesn’t damage them, but they don’t light up and they don’t close the circuit for electricity to flow. The symbol for an LED is the diode symbol, but with arrows coming out of it.

Here is a pack of LEDs I have that came as part of a larger electronics kit. You can get a couple hundred LEDs in a variety of colors from amazon for about 10$. Some LEDs are in colored plastic cases, some are in clear cases. There are even LEDs that shine in infrared and ultraviolet. LEDs also come in different sizes. This pack has 3mm and 5mm LEDs.

Here is an up close look at a white LED. The longer leg is the positive side, which means you need to plug the positive side of the circuit into it if you want it to light up. The negative side has a shorter leg, and also has a flat side on the circular ring at the bottom, which can’t really be seen in this picture.

All diodes have a voltage drop, which is a voltage amount consumed by the diode. If you are providing less than that amount of voltage, the diode will act as an open switch, and electricity won’t flow through it. The specific voltage drop for diodes can be found in data sheets, but I’ve found it difficult to find data sheets for LEDs. Luckily I picked up a “Mega328” component tester from amazon for 15$. It lets you plug in a component, press the blue button, and then tells you information about the component. It’s super handy! Here you can see the voltage drop of 2 different LEDs. The smaller red LED has a voltage drop of 1.88V while the larger green LED has a voltage drop of 2.5V. If you supply them with less than that amount of voltage, they will not light up!

So what would happen if we tried to connect the LEDs to the batteries below?

The large green LED has a 2.5V voltage drop, while the AAA battery only has 1.5V as you can see on the label. That means the LED doesn’t light up.

The smaller red LED has a 1.88V voltage drop and is connected to a 9V battery so it has enough voltage and should light up. Let’s use Ohm’s law to calculate how much current – in amps – is going through the LED.

I = V/R and in our case V is 9 and R is 0 because we have no resistance.

I = \frac{9}{0} = \infty

Oops we have infinite current! The LED is destroyed pretty quickly after you plug it in.

There isn’t actually infinite current, because the metal wires connected to the LED have a very tiny amount of resistance to them, just like all wire, and the battery has a limit of how many amps it can give. So in any case, it isn’t infinite amps, but it is a very large number, limited by how many amps the 9V battery can actually deliver. The LED would actually be destroyed. You should basically always use a resistor with an LED to limit the current and keep it from being destroyed. Here is an interesting read about how to calculate the internal resistance of a battery which will then tell you how many amps it can give you: Measuring Internal Resistance of Batteries.

When you have a circuit with this low of resistance, it’s considered a short circuit, and if the LED didn’t get destroyed, the battery would start getting hot and it could become a dangerous situation. This is also why short circuits themselves are bad news. They have a LOT of current running through them which can cause things to heat up, melt and catch fire.

3mm and 5mm LEDs typically want 20 milliamps maximum (20mA or 0.02A) to be at full brightness. If you give them less, they will be less bright but still function.

We can calculate then how much resistance they want to be maximally bright if we know the voltage of the power source we are using and the voltage drop off of the LED we are trying to power.

Let’s take the larger green LED with a 2.5V voltage drop, and power it with a 9V battery, aiming to get 20mA.

First we subtract the voltage drop from the supply to see how much voltage we have to work with: 9V – 2.5V = 6.5V.

Next, we know we want 20mA and we have 6.5V, and we are just trying to solve for resistance so we use Ohm’s law: R = V/I.

R = 6.5V / 0.02A = 325\Omega

So, we need 325 ohms of resistance to get 20mA in our LED from a 9V battery. Here is a pack of resistors I got from amazon for 12$.
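The resistor calculation above is simple enough to wrap in a couple of helper functions. This is a minimal Python sketch using the same simplified model as the text (the LED just subtracts a fixed voltage drop). The function names are made up for illustration, and the 470 ohm value in the second call is only an example of plugging in a resistor you happen to have.

```python
def led_resistor(v_supply, v_drop, i_target):
    """Series resistance in ohms that limits the LED current to i_target amps."""
    return (v_supply - v_drop) / i_target


def led_current(v_supply, v_drop, resistance):
    """LED current in amps for a given series resistance in ohms."""
    return (v_supply - v_drop) / resistance


# Green LED with a 2.5V drop on a 9V battery, aiming for 20mA.
print(f"{led_resistor(9.0, 2.5, 0.020):.0f} ohms needed for 20 mA")

# Going the other way: what current does an available resistor give?
print(f"{led_current(9.0, 2.5, 470) * 1000:.1f} mA with a 470 ohm resistor")
```

Picking the next resistor value above the calculated one keeps the current under the target, at the cost of a slightly dimmer LED.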

Resistors have funny colored bands on them which tell you their rating. You can find charts for decoding them all over the place, but again, the “Mega328” will tell you this too.

In fact, a multi meter will tell you as well. Multi meters aren’t very expensive. Here’s one I got from amazon for 35$ which has tons of features and works really nicely.

I don’t have any 325 ohm resistors, but I do have 470 ohm resistors, so I’ll just use one of those. That’s 14mA if you do the math, which is a bit lower than 20mA, but it still works just fine despite not being as bright as it could be. You can get different resistances by connecting resistors in parallel or series and doing some math, but this works for now. I used a mini breadboard (the green thing) to hook this circuit up. Every horizontal line of 5 holes is connected to each other electrically. It’s a nice way to play with circuits without having to solder things together. By convention, red is used for the positive terminal and black or blue is used for the negative.

By the way, quick fun fact. A 1.5V AA battery is considered dead when it has dropped down to 1.35V. At this point, it still has energy in it though! If you are clever with electronics, you could make circuitry to use this power from dead batteries to give you 1.5V or higher, and you could drain so called dead batteries even further.

LEDs Turning Light Into Power

Many things in electronics turn out to be reversible. Speakers work as poor microphones, and microphones work as poor speakers. Similarly, LEDs can work as poor solar cells and turn light into energy. Want to see? Here I hook my multimeter up to an LED, and have it set to read volts. It reads 48.7mV. Energy is flowing all around us from radio waves, etc, so it’s picking up some of that.

When I put the LED in the beam of the flashlight, it jumps up to 1.644V. Pretty cool huh?

Did You Like This Post?

It’s a little different than what I usually write about, but hopefully you liked it. Careful though, this stuff escalates quickly. Before you know it you’ll be harvesting optocouplers and coils from old printers to make a rail gun.

Perlin Noise Experiments

I talk and write a lot about noise so people will sometimes ask me about Perlin noise and other types of noise used for procedural content generation. I’m not usually much help because the noise I focus on is more about sampling and stochastic rendering techniques.

I was recently ray marching some Perlin noise based fog though, and came across Eevee’s ( great write up on Perlin noise here:

While reading that, it caught my eye that clumping of the random numbers was a problem. “Of course!” I thought to myself “White noise has clumping problems. I wonder how using blue noise instead would fare?” and decided to write this blog post, thinking also that low discrepancy sequences could be useful. These are the results of those, plus some more basic Perlin noise experiments. TL;DR nothing ground breaking was found, but there may still be some things of interest here.

The simple C++ code that generated the images for this post, and the small python script to make DFTs is available at


2D Perlin noise uses a grid of 2D vectors that is smaller than the final image resolution. To shade a pixel, it gets the four corners of the cell containing the pixel, takes the dot product of the vector from each corner to the pixel with the vector stored at that corner, and bilinearly interpolates these four scalar values to get the color of the pixel.
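The steps above can be sketched in a few lines of Python. This is an illustrative sketch rather than the C++ code that generated the images: the function names, the white noise gradient angles, the wrap-around tiling and the seed are all assumptions.

```python
import math
import random


def make_gradients(grid_size, seed=0):
    """One random unit vector per grid corner (white noise angles)."""
    rng = random.Random(seed)
    grads = []
    for _ in range(grid_size * grid_size):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        grads.append((math.cos(angle), math.sin(angle)))
    return grads


def perlin(x, y, grads, grid_size):
    """Unsmoothed 2D Perlin noise at grid-space position (x, y)."""
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0  # fractional position inside the cell

    def corner(ix, iy):
        # Wrap the lookup so the noise tiles, then dot the corner's vector
        # with the vector from that corner to the sample point.
        gx, gy = grads[(iy % grid_size) * grid_size + (ix % grid_size)]
        return gx * (x - ix) + gy * (y - iy)

    # Plain bilinear interpolation of the four corner values.
    top = corner(x0, y0) * (1 - fx) + corner(x0 + 1, y0) * fx
    bottom = corner(x0, y0 + 1) * (1 - fx) + corner(x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy


grid_size = 16    # 16x16 cells
image_size = 256  # so each cell covers 16x16 pixels
grads = make_gradients(grid_size)
scale = grid_size / image_size
image = [[perlin(px * scale, py * scale, grads, grid_size)
          for px in range(image_size)]
         for py in range(image_size)]
print("value range:", min(map(min, image)), max(map(max, image)))
```

The values are signed and are zero exactly at the grid corners, so they need to be remapped (for example v * 0.5 + 0.5) before being written out as pixel colors.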

If you just do that, you get an image that looks like this (Image on left, discrete Fourier transform on right):

That obviously is no good, so just like Inigo Quilez does in his article (, the fractional part of the pixel’s position on the grid is put through a smoothing function to round it out a bit. The original paper used smooth step ( which looks like this:

An improvement in a follow up paper is to use smoother step instead, which is a higher degree interpolating polynomial, which looks like this:
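For reference, here are the standard smoothstep and smootherstep polynomials as a minimal Python sketch. In the noise, one of them would be applied to the fractional cell position before it is used as the interpolation weight. The function names are the conventional ones, not necessarily the ones used in the code for this post.

```python
def smoothstep(t):
    """3t^2 - 2t^3: first derivative is zero at t = 0 and t = 1."""
    return t * t * (3.0 - 2.0 * t)


def smootherstep(t):
    """6t^5 - 15t^4 + 10t^3: first and second derivatives are zero at t = 0 and t = 1."""
    return t * t * t * (t * (t * 6.0 - 15.0) + 10.0)


for i in range(5):
    t = i / 4
    print(f"t={t:.2f}  smoothstep={smoothstep(t):.4f}  smootherstep={smootherstep(t):.4f}")
```

Both curves pass through (0, 0), (0.5, 0.5) and (1, 1). Smootherstep is flatter near the ends, which removes the second derivative discontinuity at cell borders.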

Different Sized Grids

This shows what it looks like to use different sized grids for the Perlin noise. The first uses 2×2 grids, then 4×4, then 8×8, then 16×16, then 32×32 and lastly 64×64. It’s interesting that the 2×2 grid Perlin noise looks a bit like blue noise. If you look at the DFT it does a bit as well, but is missing the highest frequencies at the corners, and has quite a bit of low frequency noise.

White Noise

Here we use a cell size of 16×16 on 256×256 images, using 1, 2 and 3 octaves. Each octave uses the same (repeating) white noise vectors.

Here a different set of white noise vectors is used per octave, which doesn’t seem to change the quality much:

Blue Noise

Here a 16×16 blue noise texture is used to generate the angle of the 2D vectors for the grid, on the 256×256 image. A 64×64 blue noise texture and DFT is also shown to see things more clearly. The same blue noise texture is used for each octave. First is the blue noise texture and DFT, then the Perlin noise made with 1, 2 and 3 octaves.

The noise doesn’t look that different visually when using blue noise instead of white, but the DFT has a bunch of dark circles repeated in it, which I believe is because the blue noise has a dark circle in the middle, and we are seeing some kind of convolutional effect. In any case, the lack of clumping in blue noise doesn’t seem to really change anything significantly.

Here we use a “different” blue noise texture for each layer. We actually just use a low discrepancy sequence (R2) to find an offset to read for each octave. Using an LDS to offset reads into a blue noise texture makes for roughly maximally independent reads, which can act as independent blue noise for some usage cases (not 100% sure if that’s true here since there are different scales of the same texture involved, but meh).
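A sketch of how such per octave offsets could be computed is below. The R2 sequence multiplies the index by 1/g and 1/g² and keeps the fractional part, where g is the plastic constant. The function name, the Offset2D struct and the choice to scale to integer texel offsets are assumptions made for illustration, not the post's actual code.

```cpp
#include <cmath>

struct Offset2D { int x, y; };

// Returns a texel offset for the given index (e.g. the octave number) using the R2 sequence.
// index is assumed to be non-negative.
Offset2D R2TextureOffset(int index, int textureSize)
{
    // g is the "plastic constant", the real root of x^3 = x + 1
    const double g = 1.32471795724474602596;
    const double a1 = 1.0 / g;
    const double a2 = 1.0 / (g * g);

    // The fractional part of index * a, which is in [0, 1)
    double u = std::fmod(double(index) * a1, 1.0);
    double v = std::fmod(double(index) * a2, 1.0);

    return { int(u * textureSize), int(v * textureSize) };
}
```

Octave i would then read the texture at ((x + offset.x) % textureSize, (y + offset.y) % textureSize). Index 0 gives an offset of (0, 0), so the first octave reads the texture unshifted.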

Interleaved Gradient Noise

For the “low discrepancy sequence” route, we need a low discrepancy sequence which you plug a 2D integer pixel index into and get a scalar value out. I don’t know that common thinking calls IGN a low discrepancy sequence, or that something of this configuration could be considered a LDS, but I think of it as one because it has the property that every 3×3 block of values (even when they overlap!) has roughly all values 0/9, 1/9, … 8/9.

Here is IGN used to get the angle to make the vectors for the Perlin noise grid, using the same noise values for each octave.
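As a rough sketch, turning a per grid corner noise value into a vector could look like the below. The IGN constants are the usual published ones. Mapping the 0 to 1 noise value to a full turn of angle is my assumption about how the angle is made, and the Gradient and GradientFromNoise names are made up for illustration.

```cpp
#include <cmath>

struct Gradient { float x, y; };

// Interleaved gradient noise. Returns a value in [0, 1) for non-negative pixel coordinates.
float IGN(int pixelX, int pixelY)
{
    float inner = std::fmod(0.06711056f * float(pixelX) + 0.00583715f * float(pixelY), 1.0f);
    return std::fmod(52.9829189f * inner, 1.0f);
}

// Treats the noise value as a fraction of a full turn and makes a unit vector from that angle.
Gradient GradientFromNoise(float noise01)
{
    const float c_twoPi = 6.28318530718f;
    float angle = noise01 * c_twoPi;
    return { std::cos(angle), std::sin(angle) };
}
```

The same GradientFromNoise step works for white noise or blue noise texture values too. Only the source of the 0 to 1 number changes.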

Here, R2 is used once again to make “independent” noise values per octave.

An interesting looking result but maybe not real useful. Maybe this just shows that you can plug different styles of noise into Perlin noise to get other looks in the results?

Bigger Renders

Here are larger renders of single octave white noise. First is a 16×16 grid, then a 64×64.

And here’s the same using blue noise – first the 16×16 blue noise texture used for the grid, then the 64×64 blue noise.

Mean Squared Error is Variance

It’s April and this is my first blog post of the year. 2020/2021 has been a hard time for me like it has been for so many other people. After being absolutely destroyed at the end of last year, I discovered I have issues with both anxiety and depression and am talking to a therapist working through the problems, essentially debugging my life and thought patterns to live a better life. The virus and the BS related to the last president pushed me to a breaking point that I just couldn’t brute force muscle through like I normally do. Much improved now though luckily!

So, onto the main topic…

When analyzing randomized things, I often find myself wanting to graph averages to show how well things converge, and also wanting to graph variance or standard deviation to show how much they swing above and below that average. Averages alone can hide that important information. Variance shows up as noise when rendering too, so low variance is a nice thing.

I’ve seen quite a few sampling papers only report variance, not averages, and I never really understood why. The other day someone casually mentioned that mean squared error is variance and it threw me for a loop.

After thinking about it a bit, I was convinced: mean squared error is in fact variance, and root mean squared error is standard deviation. Let me show you…

To calculate the variance of a stream of values, you keep track of:

  1. Average value
  2. Average squared value

Then, variance is just this:

Variance = AverageSquaredValue – AverageValue*AverageValue

And you can square root that to get the standard deviation.
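A small sketch of tracking those two averages for a stream of values is below. The RunningStats name is made up, and the averages are updated incrementally so that no list of values needs to be stored. Note that this gives the population variance (dividing by N, not N-1).

```cpp
#include <algorithm>
#include <cmath>

struct RunningStats
{
    double averageValue = 0.0;
    double averageSquaredValue = 0.0;
    int count = 0;

    void Add(double value)
    {
        ++count;
        // Move each average toward the new value by 1/count of the difference
        averageValue += (value - averageValue) / double(count);
        averageSquaredValue += (value * value - averageSquaredValue) / double(count);
    }

    double Variance() const
    {
        return averageSquaredValue - averageValue * averageValue;
    }

    double StdDev() const
    {
        // Clamp to zero so rounding error can't give a tiny negative number to sqrt
        return std::sqrt(std::max(0.0, Variance()));
    }
};
```

For example, adding the values 2, 4, 4, 4, 5, 5, 7, 9 gives an average of 5, a variance of 4 and a standard deviation of 2.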

BTW, there is a nice and easy numerically stable way to keep a “running average”, which you can read about here:

When we are talking about error, we know that the average value should be 0 if our process is unbiased, so we can modify the variance equation to be the below:

Variance = AverageSquaredValue

And since the value we are tracking is error, we can write it as:

Variance = AverageSquaredError

MSE is “mean squared error” where the word average above is the mean, so…

Variance = MeanSquaredError

And you can square root that to get the standard deviation of the error, which is also RMSE “Root Mean Squared Error”.
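It can help to see what happens if you don’t assume the average error is zero. Writing the error as e and rearranging the variance formula from earlier (average squared value minus squared average value) gives the standard identity:

\text{MSE} = E[e^2] = \text{Var}(e) + E[e]^2

So MSE is exactly the variance when the average error (the bias) is zero, and it is variance plus squared bias otherwise.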

The nice thing about MSE being variance and RMSE being std dev is that if you are ok seeing squared error instead of regular error, you can have a single graph that communicates both error and variance in one.

I also find it interesting that squared error is used because that links it to “least squares” curve fitting, which is pretty darn useful, and makes it feel a lot more ok to be looking at squared error instead of regular error. A benefit of using squared error is that it makes outliers a lot larger / more costly. This means that given the choice between one large error, or many little ones that add up to the same amount of error, minimizing squared error will choose the many little ones instead. That means less noise in a render, and less variance.

This was a short post, but I have another one in mind I want to write next – and soon – that ought to be pretty interesting, combining my favorite noise for sampling (blue noise) and a commonly used noise for procedural content generation (Perlin noise).

Until then, stay safe!

Multiple Importance Sampling in 1D

This is a follow up to an article I wrote a few years ago on Monte Carlo integration and importance sampling in 1D:

The simple, well commented code that generated all the data for this post can be found at:

A challenge when doing Monte Carlo integration in rendering is that the function you are trying to integrate is often made up of other functions multiplied together. While you may know how to importance sample some of the parts individually, you ultimately have to choose which thing to importance sample, because you are generating random numbers according to whichever thing you choose.

In rendering, the three things usually being multiplied together are lighting, material and visibility (which makes shadows). Lighting and materials are things you can usually importance sample and are based on the type of light (like a spherical area light) and the material model (Like a PBR microfacet BRDF), while visibility is not usually able to be importance sampled because it is entirely due to the geometry in a scene as to whether a pixel can see a light or not.

If you importance sample based on lighting, you can get poor results when the material ended up being more important to the result. Likewise, if you importance sample based on material, you can get poor results when the lighting ended up being more important to the result.

Multiple importance sampling is a way to make it so that you don’t have to choose, and you can get the benefits of both. More generally, it lets you combine N different importance sampling techniques.


Before going into the explanation, here is how you actually get 1 MIS sample using the balance heuristic, when you have two importance sampling techniques:

F is the function being integrated. PDF1 / InverseCDF1 are for the first importance sampling technique. PDF2 / InverseCDF2 are for the second importance sampling technique. You do this in a loop N times, and take the average of those N estimates, to get your final estimate.
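As a minimal sketch of one such sample, here is a version that uses the naming described above. The specific choices of F, PDF1 and PDF2 are assumptions picked for illustration: F(x) = sin(x)*2x on 0 to pi, a PDF shaped like sin(x), and a PDF shaped like x. The guards against a zero PDF sum are my own addition.

```cpp
#include <cmath>
#include <random>

const double c_pi = 3.14159265358979323846;

// The function being integrated from 0 to pi (assumed for this example)
double F(double x) { return std::sin(x) * 2.0 * x; }

// Technique 1: PDF(x) = sin(x)/2
double PDF1(double x) { return std::sin(x) / 2.0; }
double InverseCDF1(double u) { return std::acos(1.0 - 2.0 * u); }

// Technique 2: PDF(x) = x * 2/pi^2
double PDF2(double x) { return x * 2.0 / (c_pi * c_pi); }
double InverseCDF2(double u) { return c_pi * std::sqrt(u); }

// One MIS sample using the balance heuristic
double MISSample(std::mt19937& rng)
{
    std::uniform_real_distribution<double> dist(0.0, 1.0);

    // Technique 1 makes an x and y, and evaluates both PDFs at its x
    double x1 = InverseCDF1(dist(rng));
    double y1 = F(x1);
    double pdfSum1 = PDF1(x1) + PDF2(x1);

    // Technique 2 does the same
    double x2 = InverseCDF2(dist(rng));
    double y2 = F(x2);
    double pdfSum2 = PDF1(x2) + PDF2(x2);

    // Each y is divided by the sum of the PDFs at its x, and the results are added
    double estimate = 0.0;
    if (pdfSum1 > 0.0) estimate += y1 / pdfSum1;
    if (pdfSum2 > 0.0) estimate += y2 / pdfSum2;
    return estimate;
}

// Averages N MIS samples to get the final estimate
double MISEstimate(int sampleCount, unsigned int seed)
{
    std::mt19937 rng(seed);
    double sum = 0.0;
    for (int i = 0; i < sampleCount; ++i)
        sum += MISSample(rng);
    return sum / double(sampleCount);
}
```

The exact integral of this F from 0 to pi is 2*pi, so MISEstimate should land near 6.283 as the sample count grows.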

You can generalize to more techniques by just following the pattern. Each sampling technique generates its own x and y. Each sampling technique calculates the pdf for that x value for each of the other pdfs. The estimate is the sum of: each y value divided by the sum of each pdf for the corresponding x value.

Note that if part of the function F is expensive (like raytracing for visibility!) you don’t have to do that for each sample. You could get your estimate of lighting multiplied by material like in the above, and after combining them, you could then do your raytracing to multiply in the visibility term.

MIS Explained

You can get a single sample from a monte carlo estimator by randomly generating an x value and calculating the estimate as the function value at that x, divided by the PDF value of choosing that x.

\text{Estimate} = \frac{f(x)}{\text{PDF}(x)}

You may also remember that as the shape of the pdf (histogram) of the random numbers gets closer to the shape of the function you are trying to integrate, that you can get a closer estimate to the actual answer with fewer samples. This is called importance sampling.

Let’s say though that you want to integrate the function f multiplied by the function g and you are able to generate random numbers in the shape of f, and random numbers in the shape of g, but not random numbers in the shape of f multiplied by g.

You know that you can choose to importance sample based on f or g, but that the choice is better or worse situationally. Sometimes you want f, other times you want g.

The simplest way to combine these would be to just use them both for each sample and average them. You could also alternate so that even numbered samples are importance sampled by f and odd numbered samples are importance sampled by g. This is the same as giving each technique a weighting of 0.5.

We can do better though!

We can make an x value to importance sample based on f, and another x value to importance sample based on g, and then we can calculate the PDF values of each x for each PDF.

If we have good importance sampling PDFs, higher PDF values mean higher quality samples, while lower PDF values mean lower quality samples. We now have the means to give a weighting to a sample based on its quality as shown below, where we calculate the weight for sample “A”. Sample “B” would do the same.

\text{Weight}_A = \frac{\text{PDF}_A(x_A)}{\text{PDF}_A(x_A)+\text{PDF}_B(x_A)}

This is called the “balance heuristic”. There are other heuristics that you can use instead, which you can read about in Veach’s thesis (in the links section) and other MIS papers which have come out since then.

If we have a Monte Carlo estimate sample like this:

\frac{f(x_A)}{\text{PDF}_A(x_A)}

Some interesting cancelation happens if we multiply that by the weight.

\frac{f(x_A)}{\text{PDF}_A(x_A)} * \frac{\text{PDF}_A(x_A)}{\text{PDF}_A(x_A)+\text{PDF}_B(x_A)} = \frac{f(x_A)}{\text{PDF}_A(x_A)+\text{PDF}_B(x_A)}

That form is the same form seen in the code from the last section, where we also had a sample B that we added to it to get the final estimate.

You may be wondering why sample A and sample B are added together… shouldn’t they be averaged?

Well, if you look at the denominator in that last formula, two PDFs are added together. Each PDF integrates to 1, so their sum integrates to 2, which is twice the density that a single normalized PDF would have. That means each weighted estimate is going to be roughly half as big as it should be. When you add two of them together, they are going to be as large as they should be. All that has happened is that instead of adding them together and dividing by two to average them, we have implicitly divided each by two in advance before adding them. We are still averaging the two samples. It isn’t exactly averaging, since the PDF values will vary from sample to sample, but on the whole, it’s still an unbiased combination of the two techniques, which is why we still get the correct answer.
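To make that more concrete, here are the expected values of the two weighted samples written out, assuming x_A is drawn from PDF_A, x_B is drawn from PDF_B, and the two PDFs together cover everywhere that f is non zero:

E\left[\frac{f(x_A)}{\text{PDF}_A(x_A)+\text{PDF}_B(x_A)}\right] + E\left[\frac{f(x_B)}{\text{PDF}_A(x_B)+\text{PDF}_B(x_B)}\right] = \int \frac{f(x)\,\text{PDF}_A(x)}{\text{PDF}_A(x)+\text{PDF}_B(x)} dx + \int \frac{f(x)\,\text{PDF}_B(x)}{\text{PDF}_A(x)+\text{PDF}_B(x)} dx = \int f(x) dx

Neither term is the full integral on its own, but the two fractions add up to 1 at every x, so the sum is.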

If three PDFs were involved, the weighted samples would be one third the size they should be, and there would be three to add together.

If four PDFs were involved, the weighted samples would be one fourth the size they should be, and there would be four to add together.

It generalizes to any number of importance sampling techniques involved.

One Sample MIS

If you are a fan of stochastic rendering like me, you may be wondering if you really have to do both (all) of the samples, or if you can use the weighting to choose one stochastically and end up with the correct result for less work.

Yes, you can indeed do this and in Veach’s thesis he calls this the “One-Sample Model” in section 9.2.4.

In this case, what you do is calculate the weight for each sample, and then divide each of those weights by the sum of the weights to get a probability for taking that specific sample.

After you roll a random number and choose the single sample to contribute to the estimate, you need to multiply that sample’s Monte Carlo estimate by its weight and divide by the chance of choosing it. You are left with something that uses multiple PDFs for importance sampling different parts of the function, but each sample evaluates the function F only once. Useful if F is costly to evaluate.

If you expand out weight1, weight2 and weight1chance, you’ll find that some things cancel out and you are left with the below for actually calculating the estimate. I have to admit I don’t have a good intuitive explanation for why that works, but the algebra says it does, and it checks out experimentally. If you have an explanation, leave a comment!
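Here is a sketch of how the one sample version could look, using the weight1, weight2 and weight1chance names from the text. The F, PDF and inverse CDF functions are the same assumed examples as before (F(x) = sin(x)*2x from 0 to pi), and the zero guards are my own addition. The simplification used is that dividing a weight by its chance of being chosen leaves the sum of the two weights.

```cpp
#include <cmath>
#include <random>

const double c_pi = 3.14159265358979323846;

// Assumed example function and sampling techniques, on the domain 0 to pi
double F(double x) { return std::sin(x) * 2.0 * x; }
double PDF1(double x) { return std::sin(x) / 2.0; }
double InverseCDF1(double u) { return std::acos(1.0 - 2.0 * u); }
double PDF2(double x) { return x * 2.0 / (c_pi * c_pi); }
double InverseCDF2(double u) { return c_pi * std::sqrt(u); }

static double SafeDivide(double a, double b) { return b > 0.0 ? a / b : 0.0; }

// One sample MIS: F is only evaluated for the sample that gets chosen
double OneSampleMIS(std::mt19937& rng)
{
    std::uniform_real_distribution<double> dist(0.0, 1.0);

    double x1 = InverseCDF1(dist(rng));
    double pdf11 = PDF1(x1);
    double pdf12 = PDF2(x1);

    double x2 = InverseCDF2(dist(rng));
    double pdf21 = PDF1(x2);
    double pdf22 = PDF2(x2);

    // Balance heuristic weights, and the chance of choosing sample 1
    double weight1 = SafeDivide(pdf11, pdf11 + pdf12);
    double weight2 = SafeDivide(pdf22, pdf21 + pdf22);
    double totalWeight = weight1 + weight2;
    if (totalWeight <= 0.0)
        return 0.0;
    double weight1chance = weight1 / totalWeight;

    // (F(x1) / pdf11) * weight1 / weight1chance simplifies to (F(x1) / pdf11) * totalWeight,
    // and the same happens for sample 2.
    if (dist(rng) < weight1chance)
        return F(x1) / pdf11 * totalWeight;
    return F(x2) / pdf22 * totalWeight;
}

double OneSampleMISEstimate(int sampleCount, unsigned int seed)
{
    std::mt19937 rng(seed);
    double sum = 0.0;
    for (int i = 0; i < sampleCount; ++i)
        sum += OneSampleMIS(rng);
    return sum / double(sampleCount);
}
```

One way to see that it works: for a fixed pair of x values, averaging over the random choice gives weight1 * F(x1) / pdf11 + weight2 * F(x2) / pdf22, which is the regular two sample MIS estimate.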

Piecewise Importance Sampling

Multiple importance sampling is a method for combining any number of importance sampling techniques to sample a specific function.

Something interesting though is that not every PDF involved has to cover the entire function.

What I mean is that you could have a PDF which sampled only from the left half of the domain of a function, and another PDF which sampled only from the right half of the domain of a function.

What would happen is that the inverse CDF for the first technique would only generate x values on the left half of the function to integrate, and the PDF would give zero for any value on the right half of the function.

The second technique would do the opposite.

MIS would not care about this in the least. It would function as normal and let you importance sample a function piecewise, if you could make PDFs that fit the parts of a function well, but weren’t able to make a PDF that fit the entire function well.

Veach’s thesis goes into other things as well, such as being able to give different sample counts to different techniques. It’s definitely worth a read!

Experiment #1 – Importance Sampling & Warm Up

Quick reminder, the code that made the data for these experiments is at:

First up we are going to integrate the function y=\sin(x)*\sin(x) from 0 to pi, doing 10,000 different tests, each test doing 5000 samples, and average the results. We are going to use regular Monte Carlo (mc) as well as importance sampled Monte Carlo (ismc), using the PDF y=\sin(x)/2. Below is the function we want to integrate, and the PDF that we are going to use to importance sample it.
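A sketch of the two estimators for this experiment is below. The function names are made up for illustration, and this is not the code that made the data. Regular Monte Carlo uses uniform x values with a PDF of 1/pi, while the importance sampled version generates x by inverting the CDF of sin(x)/2, which is x = acos(1 - 2u).

```cpp
#include <cmath>
#include <random>

const double c_pi = 3.14159265358979323846;

// The function being integrated from 0 to pi
double F(double x) { return std::sin(x) * std::sin(x); }

// Regular Monte Carlo: uniform x in [0, pi), so PDF(x) = 1/pi
double MonteCarloEstimate(int sampleCount, unsigned int seed)
{
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    double sum = 0.0;
    for (int i = 0; i < sampleCount; ++i)
    {
        double x = dist(rng) * c_pi;
        sum += F(x) * c_pi; // dividing by 1/pi is multiplying by pi
    }
    return sum / double(sampleCount);
}

// Importance sampled Monte Carlo using PDF(x) = sin(x)/2
double ImportanceSampledEstimate(int sampleCount, unsigned int seed)
{
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    double sum = 0.0;
    for (int i = 0; i < sampleCount; ++i)
    {
        double x = std::acos(1.0 - 2.0 * dist(rng)); // inverse CDF of sin(x)/2
        double pdf = std::sin(x) / 2.0;
        if (pdf > 0.0)
            sum += F(x) / pdf;
    }
    return sum / double(sampleCount);
}
```

The exact answer is pi/2, and both estimators converge to it. The importance sampled one just gets there with less variance.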

We could show the absolute value of the error at each step (the error being averaged over all those tests) and get this. (data from out1.abse.csv)

That isn’t super easy to read other than seeing importance sampling seems to be less erratic and lower error more reliably. We can change it to be on a log/log plot which helps see decay rates better (especially when things like low discrepancy sequences are involved, which we’ll see later).

That’s an improvement, but there is a lot of noise, even after 10,000 tests. Monte Carlo is noisy by definition, so as you can see, sometimes it gets really low error, but then pops right back up in the next few samples. That erratic nature is not good and if you are doing integration per pixel, the variance will make the noise especially bad. In fact, variance is what we really care about. So long as the integration is converging to the right thing (is unbiased / has zero bias), variance will tell us how quickly it is converging on the right answer.

Here is a log/log variance graph. You can more easily see that the importance sampling is a clear win over the non importance sampled Monte Carlo Integration. (data from out1.var.csv)

Now that we see that yes, importance sampling is helpful, and we have our testing conventions worked out, let’s continue on to more interesting topics!

Experiment #2 – Multiple Importance Sampling

Next up, we are going to integrate the function y=\sin(x)*2x from 0 to pi. We are going to use regular Monte Carlo, but also importance sample using y=\sin(x)/2 again, and also y=x*\frac{2}{\pi^2}. We are also going to do multiple importance sampling using both of those PDFs in conjunction, and also do the “single sample method” of MIS. Here are the functions mentioned.

Here is the log/log variance graph.

Monte Carlo (mc, blue) is the obvious worst. Multiple importance sampling (mismc, green) is the obvious best. The second place worst is importance sampling by the line function (ismc2, yellow). The second place best is importance sampling by the sin based PDF (ismc1, red). The one sample method (mismcstoc, purple) seems to be basically the same as the red line. It takes half as many samples as mismc, so it isn’t surprising that it does worse.

It is good to see that multiple importance sampling is worthwhile though and does significantly better than the two importance sampling methods involved do by themselves.

Experiment #3 – Piecewise Importance Sampling

Next we are going to do piecewise MIS. We are going to integrate y=\sin(3*x)*\sin(3*x)*\sin(x)*\sin(x) using three PDFs for importance sampling where each is just y=\sin(x)/2 shrunken on the x axis to be 1/3 the size and shifted over so that each PDF is responsible for one third of the function domain. The first PDF for example is y=\sin(3*x)*\frac{3}{2} from 0 to pi/3.
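Here is a sketch of how the three piecewise PDFs and the MIS estimate could be written. The names are made up for illustration, and the clamping and zero guards are my own additions to keep floating point edge cases from causing trouble.

```cpp
#include <algorithm>
#include <cmath>
#include <random>

const double c_pi = 3.14159265358979323846;

// The function being integrated from 0 to pi
double F(double x)
{
    double a = std::sin(3.0 * x);
    double b = std::sin(x);
    return a * a * b * b;
}

// PDF for technique k (0, 1 or 2): sin(x)/2 shrunk to a third of the domain and shifted over.
// It is zero outside of its own third.
double PiecewisePDF(int k, double x)
{
    double lo = double(k) * c_pi / 3.0;
    double hi = double(k + 1) * c_pi / 3.0;
    if (x < lo || x > hi)
        return 0.0;
    return std::max(0.0, std::sin(3.0 * (x - lo)) * 1.5);
}

// Inverse CDF for technique k: only generates x values inside of its own third
double PiecewiseInverseCDF(int k, double u)
{
    return double(k) * c_pi / 3.0 + std::acos(1.0 - 2.0 * u) / 3.0;
}

// One MIS sample using all three techniques and the balance heuristic
double PiecewiseMISSample(std::mt19937& rng)
{
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    double estimate = 0.0;
    for (int k = 0; k < 3; ++k)
    {
        double x = PiecewiseInverseCDF(k, dist(rng));
        double pdfSum = 0.0;
        for (int j = 0; j < 3; ++j)
            pdfSum += PiecewisePDF(j, x);
        if (pdfSum > 0.0)
            estimate += F(x) / pdfSum;
    }
    return estimate;
}

double PiecewiseMISEstimate(int sampleCount, unsigned int seed)
{
    std::mt19937 rng(seed);
    double sum = 0.0;
    for (int i = 0; i < sampleCount; ++i)
        sum += PiecewiseMISSample(rng);
    return sum / double(sampleCount);
}
```

The exact integral of this function from 0 to pi is pi/4. Since the PDFs don’t overlap, the sum of PDFs at any x is really just the one PDF responsible for that third, and each technique ends up estimating the integral of its own piece.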

Here is the function we are integrating, showing the 3 zones the PDFs cover:

Here is the first of the PDFs. The other two look the same but are shifted over on the x axis.

Here is the variance for regular Monte Carlo versus the piecewise importance sampling, showing that it is a significant improvement to do the piecewise IS here.

Experiment #4 – Low Discrepancy Sequences

Unsurprisingly it turns out that low discrepancy sequences are useful when doing multiple importance sampling. It would be fun to look at using LDS in MIS / IS deeper in a future blog post, especially because things change in higher dimensions, but here are some interesting results in the meantime.

Here is the first experiment, which compared Monte Carlo (mc, blue) to importance sampling (ismc, yellow), now also using low discrepancy sequences for both.

For low discrepancy Monte Carlo (mclds, orange), instead of using white noise independent random numbers 0 to 1 to make my x values, I start the x value at a random number in 0 to 1 for the first sample x value, but then I add the golden ratio to it and use modulus to keep it between 0 and 1 for each subsequent sample. This is the “Golden Ratio Additive Recurrence Low Discrepancy Sequence”. That beats both Monte Carlo, and importance sampled Monte Carlo by a significant amount.
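A sketch of that sequence is below, along with using it in place of white noise to integrate the function from the first experiment. The class and function names are made up for illustration, and the starting value is assumed to be non-negative.

```cpp
#include <cmath>

const double c_pi = 3.14159265358979323846;
const double c_goldenRatio = 1.61803398874989484820;

// Golden ratio additive recurrence low discrepancy sequence
class GoldenRatioSequence
{
public:
    // start would normally be a random number in [0, 1)
    explicit GoldenRatioSequence(double start) : m_value(std::fmod(start, 1.0)) {}

    // Returns the current value, then adds the golden ratio and wraps back into [0, 1)
    double Next()
    {
        double result = m_value;
        m_value = std::fmod(m_value + c_goldenRatio, 1.0);
        return result;
    }

private:
    double m_value;
};

// Integrates sin(x)*sin(x) from 0 to pi, using the sequence to make the x values
double IntegrateSinSquaredLDS(int sampleCount, double start)
{
    GoldenRatioSequence sequence(start);
    double sum = 0.0;
    for (int i = 0; i < sampleCount; ++i)
    {
        double x = sequence.Next() * c_pi;
        double s = std::sin(x);
        sum += s * s * c_pi; // PDF is 1/pi, so dividing by it multiplies by pi
    }
    return sum / double(sampleCount);
}
```

For importance sampling, the value from Next() would go through the inverse CDF instead of being scaled by pi.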

For low discrepancy importance sampled Monte Carlo (ismclds, green), I did the same, but put that sequence through the inverse CDF to generate numbers from that PDF, using LDS as input. It’s worked well here in 1D, but mixing LDS and IS can be problematic in higher dimensions due to the LDS being distorted from the importance sampling warping, and then losing its low discrepancy properties.

Here is the second experiment, which compared MC to IS to MIS, now including low discrepancy sequences:

Everything improved by using LDS, but interestingly, the order of best to worst changed.

Not using LDS, multiple importance sampling was the winner. Using LDS, MIS is still the winner. Since there are two streams of random numbers needed for the MIS (one for each importance sampling technique), I used a different low discrepancy sequence for each. For the first technique, I used the golden ratio sequence. For the second technique, I did the same setup, but used the square root of two instead of the golden ratio. The golden ratio is the best choice for this kind of thing, because it is the most irrational number, but square root of two is a pretty good second choice.

Not using LDS, Monte Carlo was the worst performing, but using LDS, Monte Carlo is in the middle, and it’s the first importance sampling technique that does the worst. The second importance sampling technique is in the middle whether you use LDS or not though.

Here is the third experiment now with LDS, which compared Monte Carlo to a piecewise importance sampled function.

This MIS here needs 3 streams of random numbers, so for the LDS, I used the golden ratio sequence, the square root of 2 sequence, and a square root of 5 sequence. Once again, LDS helps convergence quite a bit in both cases!

I’m starting to run out of “known good irrational numbers” so I’m glad we are at the end of the LDS experiments. There are other types of low discrepancy sequences that don’t use irrational numbers, but then you start having to consider the LDS quality along with the results and all the permutations. If you want to go into a deep dive about irrational numbers, give this article of mine a read:

Before moving on, look at that last graph again. The amount of variance that 5,000 white noise samples has is the same variance that piecewise importance sampling had, when using only 10 low discrepancy samples. Without LDS though, even the MIS strategy took something like 800 samples to reach that level of variance.

In graphics, these samples could easily represent rays shot into the world for something like global illumination, soft shadows, or raytraced reflections.

It would be real easy to try the most naive Monte Carlo algorithm, find out that you need 5000 samples to converge and give up.

Facing this, you may bust out the MIS and try to do better, finding that you could cut the cost to about 1/6 of what it was, at 800 samples needed to converge. That’s still a ton of samples for real time rendering, so is still out of budget. It would be real easy to give up at this point as well.

If you take it one step further and figure out how to get a nice LDS into the MIS instead of white noise random numbers, you could find that you can decrease it even further, down to 1/80th of what MIS gave you, or 1/500th of the cost of the naive Monte Carlo.

10 samples is still quite a few if we are talking about per pixel raytracing, but that is in the realm of real time affordable.

Good sampling matters, and can help you do some pretty amazing things.

Experiment #5 – Blue Noise

Where low discrepancy sequences are deterministic number sequences that give you good coverage over a sampling domain, blue noise sequences are randomized (non deterministic) number sequences that do the same.

There is some nuance to LDS vs blue noise, and when one or the other should be used. The summary is that regular blue noise converges at the same rate as white noise (there are variants like projective blue noise which do better at convergence) but that it starts with a lower error. Blue noise also has better noise perceptually, which is also more easily filtered (it is high frequency noise only, instead of full spectrum noise). So, the rule in graphics is basically that if you can converge with LDS, do that, else use blue noise to hide the error. Blue noise also does better at keeping its desirable properties when put through transformation functions, such as importance sampling.

Unfortunately, blue noise is pretty expensive to calculate, especially with the algorithm I’m using for it, so the sample and testing counts are going to be decreased for these tests to 100 tests, using 500 samples each. Blue noise is best for low sample counts anyways, so decreasing the sample count makes for a more appropriate comparison.

Here is the first experiment, which compared MC to ISMC. Now it has blue noise results, to go along with the LDS results.

The result shows that blue noise does better than white noise, but not as well as LDS.

Here is the second experiment, comparing MC to MIS, now with blue noise. You can see how again the blue noise quality is between white and LDS as far as variance is concerned.

Here is the third experiment, showing the effectiveness of the piecewise importance sampling, using MIS. Once again, blue noise has variance between white noise and LDS.


Here are some other great links for learning about MIS via different points of views and different explanations.

Veach’s thesis that introduced MIS and goes into quite a few other options for MIS, as well as more rigorous proofs on variance bounds and similar

Thanks for reading!

Frequency Domain Image Compression and Filtering

Over 4 years ago I wrote a short blog post on images in the frequency domain:

It’s time to revisit the topic a bit and add some more things.

If you are curious about how the Fourier transform works, which can transform images or other data into the frequency domain, give this a read:

The C++ code that goes with this blog post can be found at

Image Compression

When you transform an image into the frequency domain, you get a complex number (with a real and imaginary component) per pixel that you can use to get information about the frequencies (literal sine and cosine waves) that go into making the image. One piece of information is the “phase” or starting angle of that wave. You get the phase by using atan2(imaginary, real). The other piece of information is the “amplitude” of that wave, or how large the wave is in the image. The amplitude is the length of the 2d vector (real, imaginary).
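In C++, those two values can be read from a std::complex as shown in this small sketch. The Amplitude and Phase wrapper names are my own. std::abs gives the length of the (real, imaginary) vector and std::arg gives the same angle as atan2(imaginary, real).

```cpp
#include <complex>

// The amplitude of a frequency: the length of the 2D vector (real, imaginary)
double Amplitude(const std::complex<double>& frequency)
{
    return std::abs(frequency);
}

// The phase of a frequency in radians: atan2(imaginary, real)
double Phase(const std::complex<double>& frequency)
{
    return std::arg(frequency);
}
```

As an example, the complex number 3 + 4i has an amplitude of 5, and the complex number 0 + 1i has a phase of pi/2.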

A quick and easy way to do image compression then, is to convert an image to frequency space, find the lowest amplitude frequencies and throw them away – literally zero out the complex number. If you throw enough of them away, it’ll take less data to describe the frequency content of an image, than the pixels of the image, and you’ll have compressed the image.

The more aggressive you are at throwing away frequencies though, the more the image quality will degrade. This is “lossy” compression and is a simplified version of how jpg image compression works. Lossy compression is in contrast to lossless compression like you find in png files, which use something more like a .zip compression algorithm to perfectly encode all the source data.

In the code that goes with this post, the DoTestZeroing() function throws out the lowest 10% amplitude frequencies, then the lowest 20%, then 30% and so on up to 90%. At each stage, it writes all complex frequency values out into a binary file, which can then be compressed using .zip as a method for realizing the image compression. As the data gets more zeros, it gets more compressible.
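The zeroing step could be sketched like the below. This is not the DoTestZeroing() function itself, just a hypothetical helper (ZeroLowestAmplitudes) showing the idea of sorting by amplitude and zeroing out the smallest ones.

```cpp
#include <algorithm>
#include <complex>
#include <cstddef>
#include <numeric>
#include <vector>

// Zeroes out the given fraction (0 to 1) of frequencies that have the smallest amplitudes.
// Returns how many frequencies were zeroed.
size_t ZeroLowestAmplitudes(std::vector<std::complex<double>>& frequencies, double fraction)
{
    fraction = std::min(1.0, std::max(0.0, fraction));
    size_t zeroCount = size_t(fraction * double(frequencies.size()));
    zeroCount = std::min(zeroCount, frequencies.size());

    // Make a list of indices, sorted from smallest amplitude to largest.
    // std::norm is squared amplitude, which sorts the same as amplitude but avoids a sqrt.
    std::vector<size_t> order(frequencies.size());
    std::iota(order.begin(), order.end(), size_t(0));
    std::sort(order.begin(), order.end(),
        [&frequencies](size_t a, size_t b)
        {
            return std::norm(frequencies[a]) < std::norm(frequencies[b]);
        });

    // Throw away the lowest amplitude frequencies
    for (size_t i = 0; i < zeroCount; ++i)
        frequencies[order[i]] = 0.0;

    return zeroCount;
}
```

Calling this with a fraction of 0.9 before doing the inverse DFT would correspond to the most aggressive setting described above.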

The top row in the image below shows an original 512×1024 image, the DFT amplitude information, and the DFT phase information. The bottom row shows the same, but for an image which has had its lowest 90% amplitude frequencies thrown away. The DFT data is 8MB for both (uncompressed), and compresses to 7.7MB for the top picture, but only 847KB for the bottom picture. The inverse DFT was used to turn the modified frequency data on the bottom back into an image.

Here is another image which is 512×512 and has DFT which is 4MB uncompressed. The top image’s DFT data compresses to 3.83MB, while the bottom compresses to 438KB.

While fairly effective, this is also a pretty naive way of doing frequency based image compression!

More sophisticated methods use the “Discrete Cosine Transform” or DCT instead of the DFT because it tends to make more of the frequency magnitudes zero, consolidating the data into fewer important frequencies, which means it’s already smaller before you start throwing away frequencies. DCT and DFT also pretend that the images go on forever, instead of just stopping at the edge. DFT acts as if those images repeat in a tiled fashion, while DCT acts as if they are mirrored at each repeat, which can also be a nice property for image quality.

Other methods break an image up into blocks before doing frequency based compression. Also, you can use wavelets to compress images, or principal component analysis or singular value decomposition. You can also fit your image with “whatever” basis functions you want, using L1 norm regularization to push the coefficients of your fit toward zero, making the fit data more sparse, just like DCT does compared to DFT.

Another thing you can do is use compressed sensing to skip a couple steps: You take a couple randomized but roughly evenly spaced samples from the image (blue noise or LDS are going to be good options here), and then you can e.g. find Fourier basis coefficients (DFT!) that match the sparse/irregular data samples you took. This is like throwing out low amplitude frequencies, but without having to DFT the whole data set and then throw things out. It starts with sparse data and then fits it.

Bart Wronski has several write ups on his blog in this area, so give them a read if you are interested:

This is a great read showing how to fit data using L1 regularization and all the related information you might be interested in:

This video is a great overview of the random grab bag of other things I mentioned:

Image Filtering

In my previous post on this topic I showed how you could throw away frequencies that were farther than a certain distance from the center to low pass filter an image, aka blur it. I also showed how if you threw away frequencies closer than a certain distance, it would high pass filter an image, aka sharpen it.

That throwing away of frequency data based on distance is the same as multiplying the frequency data by a mask which has a 1.0 in some places and a 0.0 in others. You can generalize this to multiply frequencies by any number. In the below I restrict the multiplications to be between 0 and 1, but you could definitely go to larger numbers or even go to negative numbers if you wanted.

The below shows the patterns that the images are multiplied by in this section. Top row left to right is a low pass filter, then a stronger low pass filter (gets rid of more high frequencies than the other) and lastly is a notch filter or “band stop” filter. The bottom row is the complement, such that you get the bottom by subtracting the image from white (1.0). Left to right, the bottom row is a high pass filter, then a weaker high pass filter (lets more low frequencies in) and then a band pass filter which only lets certain frequencies through.
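A minimal sketch of a low pass mask, its complement, and the multiplication is below. It assumes a hard edged mask (exactly 0 or 1) and that the frequency data has been arranged so the zero frequency is at the center of the image. The names and the hard edge are my assumptions for illustration. The patterns used in this section could just as well have soft edges.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Low pass mask: 1.0 for frequencies within radius of the center, 0.0 for those farther away
double LowPassMask(int x, int y, int width, int height, double radius)
{
    double dx = double(x) - double(width / 2);
    double dy = double(y) - double(height / 2);
    return std::sqrt(dx * dx + dy * dy) <= radius ? 1.0 : 0.0;
}

// High pass mask: the complement, made by subtracting the low pass mask from 1.0
double HighPassMask(int x, int y, int width, int height, double radius)
{
    return 1.0 - LowPassMask(x, y, width, height, radius);
}

// Multiplies each frequency by the low pass mask value at its location
void ApplyLowPass(std::vector<std::complex<double>>& frequencies, int width, int height, double radius)
{
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            frequencies[size_t(y) * size_t(width) + size_t(x)] *= LowPassMask(x, y, width, height, radius);
}
```

A band stop mask can be made the same way by testing the distance against an inner and outer radius, with band pass as its complement.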

First up is the “Loki and Alan” picture. Frequencies and actual picture values filtered from the pictures on the top are present in the pictures on the bottom and vice versa. In this way, blurring compared to sharpening (and edge detection) are two sides of the same coin. It just matters which part you throw away and which part you keep.

Here is what the frequency magnitudes look like. Note that each image has the magnitudes put through a log function, and also normalized to be 1.0 max. This is why even though the high pass filters (and band pass) darken the middle, it doesn’t seem like it. The renormalization obscures that fact a bit, and the middle is brightest (largest amplitudes) which we saw when throwing out the lowest amplitudes in the last section.

Here are the same filters applied to the scenery image. The top right image has some strange patterns in it if you look closely (click the image to view the full size in another tab).

Image Convolution

In the last section, we made “images” with a distance function and used their values as multipliers to filter out certain frequencies.

In this section, we are going to take two images, put them into frequency space, multiply them together, take them out of frequency space, and see what kind of results come out.

There is something called the “convolution theorem” which tells us that multiplication in the frequency domain is the same as convolution between the images. Convolution is an expensive operation, because you have to loop through all the pixels of one image, and at each pixel, loop through the pixels of the other image, doing multiplications and additions. Convolution is so slow that it can actually be quicker to take the two images you want convolved into the frequency domain, multiply them together, and then take the result back out of frequency space to be an image again.
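A quick way to convince yourself of the convolution theorem is to do it both ways and compare. This sketch is 1D to keep it short, uses a naive DFT instead of an FFT, and the function names and sample values are made up for the example. Note that the convolution that matches a plain DFT multiply is the wrapping (circular) kind.

```python
import cmath
import math

def dft(signal, inverse=False):
    """Naive 1D DFT, fine for a handful of samples."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(signal[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                    for t in range(n))
        out.append(total / n if inverse else total)
    return out

def circular_convolve(a, b):
    """Direct convolution: a loop inside a loop, indices wrap around."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]

def convolve_via_dft(a, b):
    """Transform both, multiply per frequency, transform back."""
    fa, fb = dft(a), dft(b)
    product = [x * y for x, y in zip(fa, fb)]
    return [c.real for c in dft(product, inverse=True)]

signal = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
kernel = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]

direct = circular_convolve(signal, kernel)
through_frequency = convolve_via_dft(signal, kernel)
```

With a naive DFT like this one there is no speed win. The speed up only shows up when the DFT is done with an FFT.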

Convolution is used in graphics for things like blurs, sharpening, or applying bokeh for depth of field, so speeding it up can be a big help! Convolution is also used in audio for things like reverberation which makes audio sound like it was played inside of a cave or a big cathedral.

Technical note: the “kernel” image needs to be centered at pixel (0,0), not the center of the image. Also, the kernel image should be normalized so that summing up all of its pixels adds up to 1.0. You also need to zero pad (add a black pixel border to) both the source image and kernel image to be the size of source+kernel+1 on the x and y axes before DFT’ing so they are the same size, and to avoid wrapping problems. After you are done multiplying and inverse DFT’ing, you can remove the black border again.
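Those steps can be sketched roughly as below. This is not the code from this post, just a minimal pure Python illustration with a naive (slow) DFT and made up function names, so it is only usable on tiny images. It normalizes the kernel, pads both images to source+kernel+1, places the kernel so its center lands on pixel (0,0) with the rest wrapping around, multiplies in frequency space, and crops the padding off again.

```python
import cmath
import math

def dft2(img, inverse=False):
    """Naive 2D DFT of a list of rows. Very slow, only for tiny examples."""
    h, w = len(img), len(img[0])
    sign = 1 if inverse else -1
    out = [[0j] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            total = 0j
            for y in range(h):
                for x in range(w):
                    angle = sign * 2 * math.pi * (u * x / w + v * y / h)
                    total += img[y][x] * cmath.exp(1j * angle)
            out[v][u] = total / (w * h) if inverse else total
    return out

def convolve_padded(source, kernel):
    sh, sw = len(source), len(source[0])
    kh, kw = len(kernel), len(kernel[0])
    # Padded size of source+kernel+1 on each axis.
    h, w = sh + kh + 1, sw + kw + 1
    kernel_sum = sum(sum(row) for row in kernel)

    # Source goes in the top left corner, the rest stays black (zero).
    src = [[0.0] * w for _ in range(h)]
    for y in range(sh):
        for x in range(sw):
            src[y][x] = source[y][x]

    # Kernel is normalized and shifted so its center is at pixel (0,0).
    ker = [[0.0] * w for _ in range(h)]
    for y in range(kh):
        for x in range(kw):
            ker[(y - kh // 2) % h][(x - kw // 2) % w] = kernel[y][x] / kernel_sum

    fs, fk = dft2(src), dft2(ker)
    product = [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(fs, fk)]
    result = dft2(product, inverse=True)

    # Remove the padding again.
    return [[result[y][x].real for x in range(sw)] for y in range(sh)]

white = [[1.0] * 4 for _ in range(4)]
box = [[1.0] * 3 for _ in range(3)]
blurred = convolve_padded(white, box)
```

In this example the interior pixels of the all white image stay at 1.0, while the edge pixels get darker because the kernel is averaging in black padding there.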

Here are the 4 images we are going to use as kernel images: A star, a plus, a circle, and a blob.

Here are the DFT magnitudes of those images.

Here is the “Loki and Alan” picture convolved with those kernel images.

You can see that the images somehow take on the qualities of the kernel… the star one is very angular, the plus one is very “plus like” and the circular one is very circular. Note how the blob acts a lot like a low pass filter! In frequency space, it does actually look like one, so that makes sense:

Here is the scenery picture convolved by the same shapes.

If you think the above looks weird when doing convolution on images, you should give a listen to convolution being used in audio. When used for reverb it sounds good, and sounds correct, but if you use it to convolve arbitrary audio samples together, you can get some really interesting and bizarre sounds! You can hear that here:

The dark border around the image is an artifact from adding a black border around the images to make them the right size (zero padding). If you instead just make the convolution kernel image as large as the image you are convolving (and that is already a power of 2, since this FFT requires that), you’d get the below, which has part of the image “wrapping” across from the other side.

If you used the DCT (discrete cosine transform) instead, it would MIRROR the texture instead of wrapping it, so most of the time you’d get pixels more similar to what should be there than with the DFT, which wraps. Another way to solve this problem, if you are doing convolution in image space instead of frequency space, is to throw away any samples that fall outside of the valid area of the images. In this case, you want to sum up the weights of the samples you actually took and divide the final convolution sum by that total to normalize it. That will make pixels near the border have higher weights than they should, but it can be a less jarring artifact than the black border, wrapping, or mirroring artifacts.
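The image space approach with renormalized weights could look something like the sketch below. The function name and test image are made up for the example, and the kernel is assumed to be centered on its middle pixel.

```python
def convolve_renormalized(source, kernel):
    """Image space convolution that skips samples outside the image."""
    sh, sw = len(source), len(source[0])
    kh, kw = len(kernel), len(kernel[0])
    cy, cx = kh // 2, kw // 2  # assumed kernel center
    out = [[0.0] * sw for _ in range(sh)]
    for y in range(sh):
        for x in range(sw):
            total = 0.0
            weight = 0.0
            for ky in range(kh):
                for kx in range(kw):
                    sy, sx = y - (ky - cy), x - (kx - cx)
                    # Throw away samples that fall outside the image.
                    if 0 <= sy < sh and 0 <= sx < sw:
                        total += source[sy][sx] * kernel[ky][kx]
                        weight += kernel[ky][kx]
            # Divide by the weight of the samples actually taken.
            out[y][x] = total / weight if weight != 0.0 else 0.0
    return out

white = [[1.0] * 4 for _ in range(4)]
box = [[1.0] * 3 for _ in range(3)]
result = convolve_renormalized(white, box)
```

For an all white image and a box kernel, every output pixel stays at 1.0, including the corners, so there is no dark border.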

Truth be told, many of the operations in this article can be done in a handful of lines of python. I find a lot of value in implementing things myself though, as it helps me internalize the ideas to better understand when and how to use them, and how to avoid problems/mysteries that come up when things are used as black boxes. I feel the tide turning though after a recent look at the sea of algorithms relating to SVD, PCA and finding eigenvectors. That is some crazy stuff, and way too much for a single person to deal with, while still trying to be competent in other topics 😛