Many people know that when you blur an image, you are applying a low pass filter that removes high frequencies.
From this, it’d be reasonable to expect that applying a high pass filter would sharpen an image, right?
Well, it turns out that is not the case! High pass filtering an image gives you everything that a low pass filter would remove from the picture, and it gives you ONLY that. Because of this, high pass filtering an image looks quite a bit different than you’d expect.
So when you sharpen an image in something like Photoshop, what is it doing?
It does a high pass filter and then adds those high-frequency details back to the original image, boosting the high-frequency content. This is called an “Unsharp mask” (https://en.wikipedia.org/wiki/Unsharp_masking). You may need to open the original image and the one below in separate browser tabs to flip back and forth and see the difference.
The algorithm for sharpening an image is then:
Blur an image using whatever blur you wish (e.g., Box, Tent, Gaussian)
Subtract the blurred result from the original image to get high-frequency details.
Add the high-frequency details to the original image.
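The three steps above can be sketched on a 1D signal. This is a minimal sketch, not the method Photoshop uses internally. It assumes a box blur whose edge samples reuse the nearest value, and input values in [0, 1]. The final clamp is the simple range handling discussed further down, not the only option. The function names are hypothetical, and std::clamp needs C++17.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Box blur over a (2 * radius + 1) wide window. Samples past the ends of the
// signal reuse the nearest edge value (one of several edge-handling choices).
std::vector<float> BoxBlur1D(const std::vector<float>& image, int radius)
{
    assert(radius >= 0);
    const int count = static_cast<int>(image.size());
    std::vector<float> blurred(image.size(), 0.0f);
    for (int i = 0; i < count; ++i)
    {
        float sum = 0.0f;
        for (int offset = -radius; offset <= radius; ++offset)
            sum += image[std::clamp(i + offset, 0, count - 1)];
        blurred[i] = sum / static_cast<float>(2 * radius + 1);
    }
    return blurred;
}

// Unsharp mask: details = image - blurred, result = image + details,
// clamped so the output stays in [0, 1].
std::vector<float> UnsharpMask1D(const std::vector<float>& image, int radius)
{
    const std::vector<float> blurred = BoxBlur1D(image, radius);
    std::vector<float> sharpened(image.size());
    for (std::size_t i = 0; i < image.size(); ++i)
    {
        const float details = image[i] - blurred[i];
        sharpened[i] = std::clamp(image[i] + details, 0.0f, 1.0f);
    }
    return sharpened;
}
```

Take the step edge {0.2, 0.2, 0.2, 0.8, 0.8, 0.8} with radius 1. The flat parts come out unchanged, and the two samples beside the edge are pushed apart to 0.0 and 1.0. That extra local contrast is what reads as sharpness.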
Algebraically, this can be expressed as:
image + (image – blurred)
2 * image – blurred
Blurring is most commonly done by convolving an image with a low pass kernel that sums to 1. If we assume blurring is done that way, we can build a sharpening kernel which encodes the equation we just derived. For “image”, we’ll use the identity kernel for convolution, which is all zeros except for a 1 in the center. That gives us this:
2 * identity – blur
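The 2 * identity – blur construction is easy to do in code for any 3×3 blur kernel. This is a minimal sketch with hypothetical names. For the 3×3 box blur it produces 17/9 in the center tap and –1/9 in the other eight taps, and the weights still sum to 1.

```cpp
#include <array>
#include <cassert>

using Kernel3x3 = std::array<std::array<float, 3>, 3>;

// A 3x3 box blur: all nine weights are 1/9, so the kernel sums to 1.
Kernel3x3 BoxBlur3x3()
{
    Kernel3x3 box{};
    for (auto& row : box)
        row.fill(1.0f / 9.0f);
    return box;
}

// Sharpening kernel = 2 * identity - blur. The identity kernel is zero
// everywhere except for a 1 in the center tap.
Kernel3x3 SharpenFromBlur(const Kernel3x3& blur)
{
    Kernel3x3 sharpen{};
    for (int y = 0; y < 3; ++y)
        for (int x = 0; x < 3; ++x)
        {
            const float identity = (x == 1 && y == 1) ? 1.0f : 0.0f;
            sharpen[y][x] = 2.0f * identity - blur[y][x];
        }
    return sharpen;
}

// The sum of the weights is the brightness multiplier of the filter.
float KernelSum(const Kernel3x3& kernel)
{
    float sum = 0.0f;
    for (const auto& row : kernel)
        for (float weight : row)
            sum += weight;
    return sum;
}
```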
If we wanted to make a 3×3 box blur into a sharpening filter we would calculate it this way:
That makes this result:
You could also get a Gaussian blur kernel, like this one of Sigma 0.3 (calculated from http://demofox.org/gauss.html, it’s already normalized so adds up to 1.0) and calculate a sharpening filter from that:
That makes this result:
If you are wondering why a blur kernel has to sum to 1, it technically doesn’t, but whatever it sums to is the brightness multiplier of the image it is being applied to. You can even use this to your advantage if you want to adjust the brightness while doing a blur. For instance, a 3×3 box blur with every weight set to 2/9 instead of 1/9 also doubles the image brightness, because it sums to 2.0.
When using the formulation of 2 * identity – blur to calculate our sharpening filter, if blur sums to 1, and of course identity sums to 1, our equation becomes 2 * 1 – 1 = 1, so our sharpening filter also sums to 1, which means it doesn’t make the image brighter or darker. You could of course multiply the sharpening filter by a constant to make it brighten or darken the image at the same time, just like with a blur.
You might have noticed that the blur kernels only had values between 0 and 1. This means that if we use them to filter values between 0 and 1, our results will also be between 0 and 1 (so long as the blur kernel sums to 1 and we aren’t adjusting brightness).
In contrast, our sharpening filters had values that were negative, AND had values that were greater than 1. This is a problem because now if we filter values between 0 and 1, our results could be less than 0, or greater than 1. We need to deal with that by clamping and/or remapping the range of output to valid values (tone mapping). In the examples shown here, I just clamped the results.
This problem can come up in low pass filters too (like Lanczos, which is based on sinc and has negative weights). It doesn’t come up with box or Gaussian filters, because all of their weights are non-negative.
You might be like me and think it’s weird that a low pass filter (blur kernel) sums to 1, while a high pass filter sums to 0. Some good intuition I got from this on twitter (thanks Bart!) is that the sum of the filter is the filter response for a constant signal. For a low pass filter, you’d pass the constant (0 hz aka DC) signal through, while for a high pass filter, you’d filter it out.
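The “sum of the filter is its response to a constant signal” intuition can be checked directly. Filter a constant signal with a low pass, a high pass, a sharpening, and a brightening kernel, and look at the output. This sketch uses hypothetical 1D 3-tap kernels built from a box blur. It treats the signal as periodic so every sample sees a full neighborhood. The loop computes a correlation, which is the same as convolution for these symmetric kernels.

```cpp
#include <cassert>
#include <vector>

// Filters a 1D signal with an odd-length kernel, wrapping around at the ends.
// This is a correlation; it equals convolution for the symmetric kernels below.
std::vector<float> Filter1D(const std::vector<float>& signal, const std::vector<float>& kernel)
{
    assert(kernel.size() % 2 == 1 && !signal.empty());
    const int count = static_cast<int>(signal.size());
    const int radius = static_cast<int>(kernel.size()) / 2;
    std::vector<float> result(signal.size(), 0.0f);
    for (int i = 0; i < count; ++i)
        for (int k = -radius; k <= radius; ++k)
        {
            const int index = ((i + k) % count + count) % count;
            result[i] += signal[index] * kernel[k + radius];
        }
    return result;
}

// Example 3-tap kernels built from a box blur:
const std::vector<float> kBlur     = {1.0f / 3.0f, 1.0f / 3.0f, 1.0f / 3.0f};   // sums to 1
const std::vector<float> kHighPass = {-1.0f / 3.0f, 2.0f / 3.0f, -1.0f / 3.0f}; // identity - blur, sums to 0
const std::vector<float> kSharpen  = {-1.0f / 3.0f, 5.0f / 3.0f, -1.0f / 3.0f}; // 2 * identity - blur, sums to 1
const std::vector<float> kBrighten = {2.0f / 3.0f, 2.0f / 3.0f, 2.0f / 3.0f};   // blur that sums to 2
```

A constant 0.5 comes out as 0.5 through the blur and the sharpen, 0 through the high pass, and 1.0 through the brightening blur. Each time it is scaled by the kernel’s sum.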
First off, fuck Putin. I wish the world was giving more direct support to Ukraine against the invasion (It seems like today, that is starting to happen though luckily!). There’s too much tolerance happening for bad behavior IMO. Violence is terrible, and that’s why it has to be stopped as quickly and decisively as possible. You stop the trouble makers, you don’t just hope they’ll stop making trouble. Ukraine is fighting back hard and IMO we are all hoping they are successful, but they shouldn’t have to do this alone.
Onto the math!
For some reason, the discrete cosine transform (DCT) has been confusing to me for a long time, even though I have been intimately familiar with the discrete Fourier transform (DFT). I expected that there would be more to it than there was.
This post is a follow up to my last post, which talks about how to adjust the position of points in a point set to change the frequencies present in the point set. The last post is actually significantly more complex than this one!
There are two web demos that go with this post, that work like the demos from last post, but using the DCT instead of the DFT:
Let’s start with a quick overview of the DFT. Here’s the formula for calculating the DFT of 1D signals:
$$X_k = \sum_{n=0}^{N-1} x_n e^{-2\pi i k n / N}$$
N is the length of the signal, k is the frequency being evaluated (0 for DC, 1 for 1hz, 2 for 2hz, etc), n is the index of the current value in the signal, $x_n$ is the value at that index, and $X_k$ is the complex valued coefficient representing the phase and magnitude of frequency k in the signal.
You can also express the equation like this, which explicitly breaks the sum into real and imaginary parts:
$$X_k = \sum_{n=0}^{N-1} x_n \cos(2\pi k n / N) - i \sum_{n=0}^{N-1} x_n \sin(2\pi k n / N)$$
From here, if you wanted to get the magnitude of the frequency in the signal, you’d treat the real and imaginary parts of the coefficient as x and y values of a vector and get the length of the vector. If you wanted to get the phase of the frequency (how much it is offset in the signal), you use atan2(imaginary, real).
Things get a lot simpler for the DCT: we only look at the real / cosine term:
All the symbols are the same except for $X_k$, which is now a scalar value: the frequency magnitude. You don’t get phase information like you do with the Fourier transform, but there’s no more complex math (ha!) to calculate the magnitude.
That fact makes it a lot easier to get the derivative, which tells us how changing the value at a specific index in the signal affects a specific frequency. Suppose we want to know how changing the value at index m affects frequency magnitude $X_k$. Every term of the sum except the one involving index m is a constant whose derivative is zero, which makes the derivative this:
You can gather up this value for frequency k for each of the N values and get a gradient which will tell you how to adjust all values in the signal to increase or decrease frequency k.
You can also gather up this value for index m, for each frequency k, to get a gradient that tells you how changing this signal value affects all frequencies.
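Here is a small sketch of the dense 1D case and its gradient. It assumes the cosine-only transform is the real part of the DFT, X_k = Σ x_n cos(2πkn/N). The textbook DCT-II uses cos(πk(n + 1/2)/N) instead, so swap the cosine argument if that is the variant you want. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr double kPi = 3.14159265358979323846;

// ASSUMPTION: X_k = sum_n x_n * cos(2 * pi * k * n / N), the real part of the DFT.
double CosineCoefficient(const std::vector<double>& signal, int k)
{
    assert(!signal.empty());
    const double N = static_cast<double>(signal.size());
    double sum = 0.0;
    for (std::size_t n = 0; n < signal.size(); ++n)
        sum += signal[n] * std::cos(2.0 * kPi * k * static_cast<double>(n) / N);
    return sum;
}

// dX_k / dx_m = cos(2 * pi * k * m / N): every other term of the sum is a
// constant with respect to x_m. Evaluating it for every m gives the gradient
// that says how to change each value to raise (or lower) frequency k.
std::vector<double> CosineGradient(std::size_t signalLength, int k)
{
    assert(signalLength > 0);
    std::vector<double> gradient(signalLength);
    const double N = static_cast<double>(signalLength);
    for (std::size_t m = 0; m < signalLength; ++m)
        gradient[m] = std::cos(2.0 * kPi * k * static_cast<double>(m) / N);
    return gradient;
}
```

Take an 8-sample signal holding two full cycles of a cosine, x_n = cos(2π·2n/8). For it, X_2 comes out as 4, which is N/2 with no normalization applied.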
1D Point Set DCT
We can change the DCT formula to be for sparse values in 1D, instead of a dense N valued signal.
And once again, it’s super easy to take the derivative, which tells us how much moving a specific point q affects the frequency.
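The point set version can be sketched the same way, under the same assumption as the dense case. A point at position p in [0, 1) plays the role of n/N in the cosine-only transform, giving X_k = Σ cos(2πkp). The names are hypothetical. Comparing the analytic derivative against a finite difference is an easy way to sanity check it.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

constexpr double kPi = 3.14159265358979323846;

// ASSUMPTION: a point at position p in [0, 1) stands in for n / N, so
// X_k = sum over points of cos(2 * pi * k * p).
double PointSetCosineCoefficient(const std::vector<double>& points, int k)
{
    assert(k >= 0);
    double sum = 0.0;
    for (double p : points)
        sum += std::cos(2.0 * kPi * k * p);
    return sum;
}

// Only the term for point q depends on q, so
// dX_k / dq = -2 * pi * k * sin(2 * pi * k * q).
double PointSetCosineDerivative(double q, int k)
{
    return -2.0 * kPi * k * std::sin(2.0 * kPi * k * q);
}
```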
To calculate a DCT of an MxN image, we multiply a cosine of frequency j along the x axis by a cosine of frequency k along the y axis.
If we want to take the derivative of how this frequency pair changes as the signal value at a specific location pq changes, it’s again pretty easy. All values in the sum are constants except for the one involving the value at pq. The cosine terms are also constants in this context.
2D Point Set DCT
We can change the 2D dense signal DCT into a 2D point set DCT that looks like this:
If we want to get the derivative of frequency as we move a specific 2D point q around, we need to take a partial derivative on the x axis, and a partial derivative on the y axis. This tells us how much the frequency magnitude changes as we move the point on the x and y axis.
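For the 2D point set, here is a sketch that assumes the coefficient is the product of an x axis cosine and a y axis cosine, X_jk = Σ cos(2πj p_x) cos(2πk p_y). This matches the “frequency j across x multiplied by frequency k across y” description. Each partial derivative treats the other axis’s cosine as a constant factor. The names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

constexpr double kPi = 3.14159265358979323846;

struct Point2D { double x, y; };
struct Partials2D { double dx, dy; };

// ASSUMPTION: X_jk = sum over points of cos(2*pi*j*p.x) * cos(2*pi*k*p.y).
double PointSetCosineCoefficient2D(const std::vector<Point2D>& points, int j, int k)
{
    assert(j >= 0 && k >= 0);
    double sum = 0.0;
    for (const Point2D& p : points)
        sum += std::cos(2.0 * kPi * j * p.x) * std::cos(2.0 * kPi * k * p.y);
    return sum;
}

// Partial derivatives for one point q: on each axis, the other axis's cosine
// is a constant factor.
Partials2D PointSetCosineDerivative2D(const Point2D& q, int j, int k)
{
    const double cx = std::cos(2.0 * kPi * j * q.x), sx = std::sin(2.0 * kPi * j * q.x);
    const double cy = std::cos(2.0 * kPi * k * q.y), sy = std::sin(2.0 * kPi * k * q.y);
    return {-2.0 * kPi * j * sx * cy, -2.0 * kPi * k * cx * sy};
}
```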
There are some differences between using the DCT and DFT for frequency analysis and similar tasks.
For one, the DFT assumes that the data you give it infinitely repeats. The DCT however assumes that the data you give it repeats forever too, but that each time it repeats, it is flipped like in a mirror.
Another difference comes from the DFT having phase while the DCT doesn’t. Translating the data changes the DCT frequency magnitudes. For the DFT, it changes only the phases and leaves the frequency magnitudes alone.
More concretely, imagine you have a 2hz cosine wave starting at x=0 (so, has a phase of 0 degrees). Both DFT and DCT will recognize this as a 2hz frequency with amplitude 1.
If you move this wave to the right so that it starts at x = pi/2, the DFT will still show a 2hz frequency with amplitude 1, but now with a phase of pi (180 degrees). A 2hz wave has a period of pi, so the shift is half a period and turns cos(2x) into -cos(2x). The DCT, however, will show a 2hz frequency with amplitude -1!
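Here is a quick numeric version of this example. It assumes the signal covers x in [0, 2π), so the 2hz cosine is cos(2x), sampled 16 times. It compares the DFT bin with the cosine-only coefficient (the real part of the same bin). Both are scaled by 2/N so a unit-amplitude cosine reads as 1. The names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

constexpr double kPi = 3.14159265358979323846;

// Samples cos(frequency * (x - shift)) at evenly spaced x in [0, 2 * pi).
std::vector<double> ShiftedCosine(int sampleCount, double frequency, double shift)
{
    assert(sampleCount > 0);
    std::vector<double> samples(sampleCount);
    for (int n = 0; n < sampleCount; ++n)
    {
        const double x = 2.0 * kPi * static_cast<double>(n) / static_cast<double>(sampleCount);
        samples[n] = std::cos(frequency * (x - shift));
    }
    return samples;
}

// DFT bin k, scaled by 2 / N so a unit-amplitude cosine reports magnitude 1.
std::complex<double> DFTBin(const std::vector<double>& samples, int k)
{
    const double N = static_cast<double>(samples.size());
    std::complex<double> sum(0.0, 0.0);
    for (std::size_t n = 0; n < samples.size(); ++n)
        sum += samples[n] * std::polar(1.0, -2.0 * kPi * k * static_cast<double>(n) / N);
    return sum * (2.0 / N);
}

// Cosine-only coefficient: the real part of the same scaled DFT bin.
double CosineBin(const std::vector<double>& samples, int k)
{
    return DFTBin(samples, k).real();
}
```

For the unshifted wave, bin 2 has magnitude 1, phase 0, and cosine coefficient 1. After shifting by π/2, the magnitude is still 1, the phase is ±π, and the cosine coefficient is –1.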
If you play around in the demos that go with this post you can see this in action, that translation matters for DCT, but not for DFT, when looking at frequency magnitudes.
Lastly, DCT frequency magnitudes can be negative, whereas DFT magnitudes can’t. The demos are adjusted to account for this.
Once again, thanks for reading, and I hope you find this useful or at least interesting.
And here’s hoping Ukraine comes out on top soon, with friends coming to their aid and helping them rebuild. What a terrible and senseless situation.
As a quick recap, k is the frequency (1 for 1hz, 2 for 2hz, etc), $X_k$ is the complex frequency information for that frequency, n is the index of the current value, $x_n$ is the value at that index, and N is the total number of values.
If you have a 1D point set, one way to Fourier transform it is to make a 1D array of data, fill it with zeros, and then plot those points as ones into that 1D array. You then Fourier transform that array using the formula above.
It’s up to you what size to make that array. If you make the array smaller, the transform will be faster, but it will be less accurate in general. If the array is larger, the transform will be slower, but will be more accurate in general.
Here are some examples of arrays of different sizes holding the point set (0.0, 0.25, 0.5, 0.75), where we assume the array holds the values [0,1):
See how the size 9 array doesn’t have the points evenly spaced? When the point values don’t line up exactly with the array cells, larger arrays make the plots more accurate.
Another way to transform these points is to think of each point as a delta function that is zero everywhere except at its value, where the function is one. We can then modify the Fourier transform to sum up the contribution of each point towards a specific frequency. Since we know the function of each point is zero everywhere except at one location where it is one, we only have to evaluate it at one place. That gives us this:
$$X_k = \sum_{p \in P} e^{-2\pi i k p}$$
Where again, k is the frequency being analyzed (1 for 1hz, 2 for 2hz, etc), $X_k$ is the complex frequency information for that frequency, P is the set of points in the point set, and p is the value of the current point in the point set.
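Both ways of transforming a point set can be compared in a short sketch, using the (0.0, 0.25, 0.5, 0.75) example. The text doesn’t pin down the binning rule, so this assumes a point goes into cell floor(p * size) and that any occupied cell is set to 1. The names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

constexpr double kPi = 3.14159265358979323846;

// Delta function version: X_k = sum over points p of e^(-2 pi i k p).
std::complex<double> PointSetDFT(const std::vector<double>& points, int k)
{
    std::complex<double> sum(0.0, 0.0);
    for (double p : points)
        sum += std::polar(1.0, -2.0 * kPi * k * p);
    return sum;
}

// Array version, step 1: plot each point as a 1 into an array of `size` cells.
// ASSUMPTION: a point p in [0, 1) goes into cell floor(p * size).
std::vector<double> RasterizePoints(const std::vector<double>& points, int size)
{
    assert(size > 0);
    std::vector<double> cells(size, 0.0);
    for (double p : points)
    {
        assert(p >= 0.0 && p < 1.0);
        cells[static_cast<int>(std::floor(p * size)) % size] = 1.0;
    }
    return cells;
}

// Array version, step 2: an ordinary DFT of the array.
std::complex<double> ArrayDFT(const std::vector<double>& cells, int k)
{
    const double N = static_cast<double>(cells.size());
    std::complex<double> sum(0.0, 0.0);
    for (std::size_t n = 0; n < cells.size(); ++n)
        sum += cells[n] * std::polar(1.0, -2.0 * kPi * k * static_cast<double>(n) / N);
    return sum;
}
```

With 8 cells, the points land on cells 0, 2, 4 and 6. Both versions then report a magnitude of 4 at k = 4 and 0 at k = 1. With 9 cells, the same points land on cells 0, 2, 4 and 6 out of 9, which is not evenly spaced. The delta version does not depend on an array size at all.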
Adjusting 1D Point Sets in Frequency Space
To adjust the frequency magnitudes of our point set, we need the formula for calculating a frequency’s magnitude. We also need to differentiate it to get its gradient, so we know which points to move in which direction to increase or decrease specific frequency magnitudes.
To calculate the magnitude of a specific frequency, we start with our equation from the last section.
Remembering Euler’s formula, $e^{ix} = \cos(x) + i\sin(x)$, we can turn that into this:
The magnitude of the frequency is the length of the vector (real, imaginary), so let’s break up the real and imaginary parts:
The magnitude would then be:
We’ll remove the square root to make it easier to differentiate, and get this:
Now we need to get the derivative of that function with respect to each point p; together, those derivatives are called the gradient. Each one tells us how quickly the (squared) frequency magnitude changes as that point moves. Its sign tells us the direction: a negative number means moving the point to the left increases the magnitude, and a positive number means moving it to the right does.
We can use the sum rule to break our function into 2 simpler derivatives and deal with them separately:
Let’s start by differentiating the real (cosine) term:
We can use the product rule to deal with the squaring:
That is all differentiated except for the last term:
If we are differentiating this function g for a specific point (we’ll call it $p_i$), all of the terms are constants and disappear, except for the $\cos(2\pi k p_i)$ term. This is easily differentiable because $\frac{d}{dp_i}\cos(2\pi k p_i) = -2\pi k \sin(2\pi k p_i)$.
We can plug our g result back into f and get this as the derivative of our cosine (real) term:
If we do the same steps for the sine (imaginary) term we end up with this:
We can then add f’ and h’ together to get the full equation we were looking for!
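For reference, here is the whole 1D derivation written out in one place. In it, $p_i$ is the point being differentiated, and f and h are named as in the steps above. This is a reconstruction from the described steps, so it is worth double-checking against the demo source. The imaginary part is minus the sine sum, but the sign disappears when it is squared.

```latex
X_k = \sum_{p \in P} e^{-2\pi i k p} = \sum_{p \in P} \cos(2\pi k p) - i \sum_{p \in P} \sin(2\pi k p)

|X_k|^2 = \underbrace{\Big(\sum_{p \in P} \cos(2\pi k p)\Big)^2}_{f} + \underbrace{\Big(\sum_{p \in P} \sin(2\pi k p)\Big)^2}_{h}

f' = \frac{\partial f}{\partial p_i} = -4\pi k \sin(2\pi k p_i) \sum_{p \in P} \cos(2\pi k p)

h' = \frac{\partial h}{\partial p_i} = 4\pi k \cos(2\pi k p_i) \sum_{p \in P} \sin(2\pi k p)

\frac{\partial |X_k|^2}{\partial p_i} = f' + h' = 4\pi k \Big[\cos(2\pi k p_i) \sum_{p \in P} \sin(2\pi k p) - \sin(2\pi k p_i) \sum_{p \in P} \cos(2\pi k p)\Big]
```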
If you calculate this value for each point and put it into an array, you’ll have the gradient of the frequency k’s magnitude (squared). If you want to increase the frequency k’s magnitude, you add the gradient to the points. If you want to decrease the frequency k’s magnitude, you subtract the gradient from the points.
The gradient is only guaranteed to be accurate for an infinitesimally small step, so what I do in the web app is normalize this gradient vector, and multiply it by a step size before I move the points. I also multiply the step size by how many points there are so that the distance that each point moves doesn’t decrease as more points are added (otherwise they would have to share a unit length movement among N points, which gets shorter as N gets bigger). I also have a step count which allows this operation to be done M times, re-evaluating the gradient each time. This results in an interactive user controlled gradient descent.
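Here is a sketch of the 1D update loop described above, with hypothetical names. It computes the gradient of |X_k|², normalizes it, scales it by the step size times the point count, and adds or subtracts it. The gradient is re-evaluated on each of the step-count iterations. Wrapping moved points back into [0, 1) is an assumption, since the demo may clamp or do something else. For integer k, wrapping doesn’t change |X_k| because the point set transform is periodic with period 1.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr double kPi = 3.14159265358979323846;

// |X_k|^2 = (sum cos(2 pi k p))^2 + (sum sin(2 pi k p))^2
double MagnitudeSquared(const std::vector<double>& points, int k)
{
    double c = 0.0, s = 0.0;
    for (double p : points)
    {
        c += std::cos(2.0 * kPi * k * p);
        s += std::sin(2.0 * kPi * k * p);
    }
    return c * c + s * s;
}

// d|X_k|^2 / dp_i = 4 pi k (cos(2 pi k p_i) * S - sin(2 pi k p_i) * C),
// where C and S are the cosine and sine sums over all points.
std::vector<double> MagnitudeSquaredGradient(const std::vector<double>& points, int k)
{
    double c = 0.0, s = 0.0;
    for (double p : points)
    {
        c += std::cos(2.0 * kPi * k * p);
        s += std::sin(2.0 * kPi * k * p);
    }
    std::vector<double> gradient(points.size());
    for (std::size_t i = 0; i < points.size(); ++i)
    {
        const double theta = 2.0 * kPi * k * points[i];
        gradient[i] = 4.0 * kPi * k * (std::cos(theta) * s - std::sin(theta) * c);
    }
    return gradient;
}

// Moves the points along the normalized gradient, scaled by stepSize * pointCount,
// re-evaluating the gradient on every step. ASSUMPTION: points wrap into [0, 1).
void AdjustFrequency(std::vector<double>& points, int k, double stepSize, int stepCount, bool increase)
{
    assert(!points.empty() && stepSize >= 0.0);
    for (int step = 0; step < stepCount; ++step)
    {
        const std::vector<double> gradient = MagnitudeSquaredGradient(points, k);
        double length = 0.0;
        for (double g : gradient)
            length += g * g;
        length = std::sqrt(length);
        if (length == 0.0)
            return; // stationary point: no direction to follow
        const double scale = (increase ? 1.0 : -1.0) * stepSize * static_cast<double>(points.size()) / length;
        for (std::size_t i = 0; i < points.size(); ++i)
        {
            const double moved = points[i] + gradient[i] * scale;
            points[i] = moved - std::floor(moved);
        }
    }
}
```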
You can play with the demo here to see it all in action:
The 2D Fourier transform is just the 1D Fourier transform done on each axis. Because of this, you have a horizontal frequency j and a vertical frequency k, for an image that is MxN pixels:
$$X_{j,k} = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} x_{m,n} e^{-2\pi i (j m / M + k n / N)}$$
$X_{j,k}$ is the complex frequency information for the frequency pair (j, k), j is the horizontal frequency, k is the vertical frequency, and $x_{m,n}$ is the pixel value at location (m, n).
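We can change this to be a Fourier transform of points p again like this:
$$X_{j,k} = \sum_{p \in P} e^{-2\pi i j p_x} e^{-2\pi i k p_y}$$
Where $p_x$ is the x component of the point and $p_y$ is the y component of the point.
We can use the identity $e^a e^b = e^{a+b}$ to simplify the equation a bit:
$$X_{j,k} = \sum_{p \in P} e^{-2\pi i (j p_x + k p_y)}$$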
… and we are done 🙂
Adjusting 2D Point Sets in Frequency Space
Let’s jump to breaking the function into real and imaginary parts.
The magnitude would then be:
We can square both sides again to make it easier to differentiate:
Remembering that we can differentiate each term independently, let’s start with the cosine (real) part again and start differentiation using the product rule:
We now have two variables to get derivatives for though: the x and y components of a specific point. We’ll call them $p_{i,x}$ and $p_{i,y}$. We’ll start with $p_{i,x}$. Remember that all terms of the sum except the ones with $p_{i,x}$ in them are constants and will become zero.
We can do the same with the y component and get:
We can do the same process with the sine (imaginary) part and get these two equations:
We then add the real and imaginary x functions together to get a value for x, and we add the real and imaginary y functions together to get a value for y.
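Here are the 2D steps written out. They use $\theta_p = 2\pi(j p_x + k p_y)$ for each point, with $p_{i,x}$ and $p_{i,y}$ as the components of the point being moved and $\theta_i$ as its angle. This is a reconstruction from the described steps, so double-check it before relying on it. Both partial derivatives share the same bracketed factor and differ only in the 4πj versus 4πk scale.

```latex
X_{j,k} = \sum_{p \in P} e^{-2\pi i (j p_x + k p_y)}, \qquad \theta_p = 2\pi (j p_x + k p_y)

|X_{j,k}|^2 = \Big(\sum_{p \in P} \cos\theta_p\Big)^2 + \Big(\sum_{p \in P} \sin\theta_p\Big)^2

\frac{\partial |X_{j,k}|^2}{\partial p_{i,x}} = 4\pi j \Big[\cos\theta_i \sum_{p \in P} \sin\theta_p - \sin\theta_i \sum_{p \in P} \cos\theta_p\Big]

\frac{\partial |X_{j,k}|^2}{\partial p_{i,y}} = 4\pi k \Big[\cos\theta_i \sum_{p \in P} \sin\theta_p - \sin\theta_i \sum_{p \in P} \cos\theta_p\Big]
```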
So the gradient of the function now has a 2D vector for every entry instead of a scalar. I’m not sure if this is still called a gradient. To normalize it, I sum up the length of each of the 2D vectors and then divide them all by that value. I then multiply by the number of points for the same reason as before, and I also have a step size multiplier, and a step count, just as in the 1D case.
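The 2D update differs from the 1D one mainly in two ways: the gradient has a 2D vector per point, and the normalization changes. This sketch follows the normalization as described. It divides by the sum of the vectors’ lengths, not the Euclidean length of the whole gradient, and then scales by step size times point count. The names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

constexpr double kPi = 3.14159265358979323846;

struct Vec2 { double x, y; };

// |X_jk|^2 with theta_p = 2 pi (j p.x + k p.y)
double MagnitudeSquared2D(const std::vector<Vec2>& points, int j, int k)
{
    double c = 0.0, s = 0.0;
    for (const Vec2& p : points)
    {
        const double theta = 2.0 * kPi * (j * p.x + k * p.y);
        c += std::cos(theta);
        s += std::sin(theta);
    }
    return c * c + s * s;
}

// One 2D vector per point: both partials share the same bracketed factor,
// scaled by 4 pi j for x and 4 pi k for y.
std::vector<Vec2> MagnitudeSquaredGradient2D(const std::vector<Vec2>& points, int j, int k)
{
    double c = 0.0, s = 0.0;
    for (const Vec2& p : points)
    {
        const double theta = 2.0 * kPi * (j * p.x + k * p.y);
        c += std::cos(theta);
        s += std::sin(theta);
    }
    std::vector<Vec2> gradient;
    gradient.reserve(points.size());
    for (const Vec2& p : points)
    {
        const double theta = 2.0 * kPi * (j * p.x + k * p.y);
        const double shared = std::cos(theta) * s - std::sin(theta) * c;
        gradient.push_back({4.0 * kPi * j * shared, 4.0 * kPi * k * shared});
    }
    return gradient;
}

// Normalization as described: divide every 2D entry by the sum of the entry
// lengths, then multiply by stepSize * pointCount.
void NormalizeAndScale(std::vector<Vec2>& gradient, double stepSize)
{
    assert(!gradient.empty());
    double totalLength = 0.0;
    for (const Vec2& g : gradient)
        totalLength += std::sqrt(g.x * g.x + g.y * g.y);
    if (totalLength == 0.0)
        return;
    const double scale = stepSize * static_cast<double>(gradient.size()) / totalLength;
    for (Vec2& g : gradient)
    {
        g.x *= scale;
        g.y *= scale;
    }
}
```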
If you are up for a challenge, I have one for you!
Blue noise points are “randomized but roughly evenly spaced” and yet, I once stumbled on how to make a point set which shows a blue noise spectrum but has clumps. It’s a preset in the 1D demo and it looks like this:
Red noise on the other hand is defined as randomized but clumping points.
The challenge is this: Can you find points which are reasonably spread out on the numberline, or otherwise not clumping, but show a red noise spectrum?
In some TAA implementations, instead of taking the full 3×3 neighborhood around a pixel, only the 4 cardinal direction neighbors will be sampled, making a plus shape (+) of sampling. This can reduce memory bandwidth requirements because it cuts the neighborhood sampling in half, from 8 samples down to 4.
In this post I’ll show a low discrepancy grid that is optimized for this sampling pattern. The formula for it is below, where pixelX and pixelY are integer pixel coordinates.
z = ((x+3y+0.5)/5) mod 1
or as code:
```cpp
#include <cmath>
float PlusShapedLDG(int pixelX, int pixelY) {
    return std::fmod((float(pixelX) + 3.0f * float(pixelY) + 0.5f) / 5.0f, 1.0f);
}
```
While we are talking about LDGs I also want to show another one based on a generalization of the golden ratio to 2D, made by Martin Roberts (https://twitter.com/TechSparx) which is calculated like this:
z = (x / 1.32471795724474602596 + y / (1.32471795724474602596 * 1.32471795724474602596)) mod 1
At the end of the post we’ll analyze these noise types along with some others:
Derivation of Plus Shaped Low Discrepancy Grid
It took a couple attempts at deriving this before I was successful.
We want a regular grid of values where each plus shape has every value 0/5, 1/5, 2/5, 3/5, 4/5. When I say every plus shape, I’m including overlapping ones. In TAA, when a pixel looks at its plus shaped neighborhood, we want it to get as accurate a representation as possible of the total possibilities for that pixel in that region of the screen. The pixels it finds should very accurately represent the actual histogram of what is possible in this area of pixels. The more accurate we make this, the better the neighborhood sampling history rejection/preservation logic will work.
I started out by putting symbols in a plus shape like this, planning to solve for the actual values later:
I next needed to figure out how to fill in the corners of these pixels. I opted to do so like this, trying to make the repeated values be as far away from the original values as possible.
You can see that at the center of each edge is the center of a plus shaped pattern which has 4 of the 5 letters already, so we can complete the plus by adding the 5th letter.
To fill out the rest of this grid, you can notice that there is a pattern of how letters are duplicated in the above: Their copy is either two to the right and one down, or two down and one to the left. You can use this pattern to complete this 5×5 square.
After filling out this 5×5 square you can see that both rules are true: symbols are repeated both two cells down one cell to the left, and also two cells to the right and one cell down.
Interestingly, if you continue growing this square outwards, it just repeats this 5×5 tile over and over, so we are done figuring out how to tile our values, but we still don’t know where the values should be or how to make a formula that calculates them.
At first I tried plugging in 0.0 for A, 0.2 for B, 0.4 for C, 0.6 for D and 0.8 for E. That made a really messy looking grid that I was unsure how to replicate with a formula.
Thinking about it differently, I looked at the first row which goes in order B,D,E,A,C and I made the values be in that order. B got value 0.0, D got 0.2, etc. That left me with this:
To make things easier to see, here are those values multiplied by 5:
It’s a bit easier to see a pattern here, isn’t it?
Starting at the upper left as (0,0), we can see that going to the right, the value increases by 1. Since this tile repeats infinitely, it means that when we go past 4, we go back to zero. So for that the formula would be z = x % 5.
We can also notice that taking a step down on the y axis, we add 3, but once again wrap around if we get past four. Putting this into the previous equation, it becomes z = (x + 3y) % 5.
We want this divided by 5 to be the values 0/5, 1/5, … 4/5, so our final equation becomes z = Fract((x + 3y)/5). Or z = ((x + 3y)/5) mod 1. Whichever notation you prefer.
Now for some subtlety. If you take the average of 0/5, 1/5, 2/5, 3/5, 4/5 you get 0.4. To make this unbiased, we want the average to be 0.5, which we can do by adding 1/10th to every value. That means our equation becomes z = (((x + 3y)/5) + 1/10) mod 1 or z = ((x + 3y + 0.5)/5) mod 1.
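Here is a small sketch that checks the derivation. It computes the LDG and confirms that every plus shape, overlapping ones included, holds each of the five values exactly once. It also confirms that the average over a 5×5 tile is 0.5 after the 0.5/5 offset. It works from the integer index (x + 3y) mod 5, which is the value times 5 before the offset. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// The plus shaped LDG from above, for non-negative pixel coordinates.
float PlusShapedLDG(int pixelX, int pixelY)
{
    return std::fmod((float(pixelX) + 3.0f * float(pixelY) + 0.5f) / 5.0f, 1.0f);
}

// Integer index 0..4 of a pixel, so its LDG value is (index + 0.5) / 5.
// Wrapped so that negative coordinates work too.
int PlusBucket(int x, int y)
{
    return ((x + 3 * y) % 5 + 5) % 5;
}

// True if every plus shape centered in [0, width) x [0, height), including
// overlapping ones, contains each of the five values exactly once.
bool EveryPlusHasAllFiveValues(int width, int height)
{
    const int offsets[5][2] = {{0, 0}, {1, 0}, {-1, 0}, {0, 1}, {0, -1}};
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
        {
            bool seen[5] = {false, false, false, false, false};
            for (const auto& offset : offsets)
            {
                const int bucket = PlusBucket(x + offset[0], y + offset[1]);
                if (seen[bucket])
                    return false;
                seen[bucket] = true;
            }
        }
    return true;
}

// Average of the LDG over one 5x5 tile; 0.5 means the 0.5 / 5 offset removed the bias.
double TileAverage()
{
    double sum = 0.0;
    for (int y = 0; y < 5; ++y)
        for (int x = 0; x < 5; ++x)
            sum += PlusShapedLDG(x, y);
    return sum / 25.0;
}
```

PlusBucket(x + 2, y + 1) and PlusBucket(x − 1, y + 2) both equal PlusBucket(x, y), since each adds exactly 5 to x + 3y. That is the copy pattern used to fill in the 5×5 tile.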
Another way to solve this problem could be to divide by 4 instead of 5, which gives 0/4, 1/4, 2/4, 3/4, 4/4 instead of 0/5, 1/5, 2/5, 3/5, 4/5 and would average to 0.5 as well. You may very well want to do that situationally, depending on what you are using the numbers for. A reason NOT to do that would be in situations where a value of 0 is the same as a value of 1, like if you were multiplying the value by 2*pi and using it as a rotation. In that situation, 0 degrees would occur twice as often as the other values and add bias that way. If you were using it for a stochastic alpha test, though, having both 0 and 1 values would not introduce bias.
Before analyzing this noise, let’s talk about the R2 LDG.
To make the R2 low discrepancy sequence, you divide the index by 1.32471795724474602596 and fract to get the x component of the LDS, and divide index by 1.32471795724474602596 * 1.32471795724474602596 and fract to get the y component of the LDS.
To make the R2 low discrepancy grid, you divide the integer x pixel coordinate by 1.32471795724474602596, divide the integer y pixel coordinate by 1.32471795724474602596*1.32471795724474602596, add them together and fract to get the final scalar value.
Interestingly, this works with any rank 1 lattice, so there is some exploration to be done here IMO, to find more low discrepancy grids and see what sort of properties they have.
In fact, you can express both the plus shaped LDG and this R2 LDG in this rank 1 lattice style LDG:
z = (x * A + y * B) mod 1
With the plus shaped LDG, A is 1/5 and B is 3/5.
With the R2 LDG, A is 1 / 1.32471795724474602596, and B is 1 / (1.32471795724474602596*1.32471795724474602596).
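The rank 1 lattice form can be sketched with a single function, parameterized by A and B. The optional offset is an addition of this sketch, so that the unbiased plus shaped LDG (whose +0.5/5 term adds 0.1) fits the same form. The names are hypothetical, and non-negative pixel coordinates are assumed.

```cpp
#include <cassert>
#include <cmath>

// Rank 1 lattice style LDG: z = (x * A + y * B + offset) mod 1.
double Rank1LatticeLDG(int x, int y, double a, double b, double offset = 0.0)
{
    assert(x >= 0 && y >= 0);
    const double value = double(x) * a + double(y) * b + offset;
    return value - std::floor(value);
}

constexpr double kPlastic = 1.32471795724474602596;

double R2LDG(int x, int y)
{
    return Rank1LatticeLDG(x, y, 1.0 / kPlastic, 1.0 / (kPlastic * kPlastic));
}

// The plus shaped LDG: A = 1/5, B = 3/5, plus the 0.1 unbiasing offset.
double PlusShapedLDGAsLattice(int x, int y)
{
    return Rank1LatticeLDG(x, y, 1.0 / 5.0, 3.0 / 5.0, 0.1);
}
```

R2LDG(1, 0) is about 0.75488 and R2LDG(0, 1) is about 0.56984, the two R2 step sizes.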
Here we’ll use various types of grid noises to turn greyscale images into black and white stippled images. We do this by testing each image pixel against the corresponding noise pixel. If the noise pixel is a lower value (darker) than the image pixel, we put a white pixel in the output, else we put a black pixel in the output.
Here we test noise values against the transparency value, and if the noise is less, we write a magenta pixel, else we don’t. The percentage of pixels that survive the transparency test is shown, and ideally it would match the transparency value.
One thing worth talking about is that the percentage of white noise pixels that survive the alpha test swings pretty wildly compared to what the actual transparency value is. This effectively makes the pixels more opaque or more transparent than they should be, which causes problems when filtering spatially and/or temporally. That is on top of how white noise clumps together and leaves holes, which make it harder to filter than more equally spaced data points.
Another thing worth pointing out is that the plus shaped noise is VERY wrong at 10% and 30%, but does very well at 20% and 40%. This is because the plus noise is discretized into 1/5th increments (its only values are 0.1, 0.3, 0.5, 0.7 and 0.9). The other noises have all values (0 to 255, because these are U8 textures), which means they work better at arbitrary opacities.
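The discretization effect is easy to reproduce. The plus noise has only five values, so the fraction of pixels that pass noise < alpha moves in 20% jumps. This sketch computes the surviving fraction over a region. The names are hypothetical. The noise is computed from the integer (x + 3y) mod 5, so the values are exactly 0.1, 0.3, 0.5, 0.7 and 0.9, which avoids fmod rounding right at the thresholds.

```cpp
#include <cassert>

// Plus shaped noise computed from its integer index, so the only values
// are exactly (index + 0.5) / 5.
double PlusNoise(int x, int y)
{
    const int index = ((x + 3 * y) % 5 + 5) % 5;
    return (static_cast<double>(index) + 0.5) / 5.0;
}

// Alpha test: a pixel is written when the noise is less than alpha.
// Returns the fraction of pixels in a width x height region that survive.
double AlphaTestSurvival(double alpha, int width, int height)
{
    assert(width > 0 && height > 0);
    int survivors = 0;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            if (PlusNoise(x, y) < alpha)
                ++survivors;
    return static_cast<double>(survivors) / static_cast<double>(width * height);
}
```

Over a 10×10 region this gives 0% at alpha 0.1, 20% at both 0.2 and 0.3, and 40% at 0.4.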
With all noises except the plus noise, as you smoothly increase the opacity, pixels will slowly start appearing. With the plus noise, as you smoothly increase the opacity, pixels will appear in large clumps, instead of appearing one by one. A way to deal with this could be to take a hint from stratified sampling, and instead of adding 1/10th to the noise to unbias it, you instead add a random number between 0 and 1/5th. It will still have the correct average, so won’t be biased, but the random numbers could break things up a bit. You could even use a blue noise texture as the source of those random numbers perhaps.
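One way to sketch the suggested fix is to keep the integer (x + 3y) mod 5 index but replace the constant 1/10 offset with a random offset in [0, 1/5). This sketch uses std::mt19937 white noise as a stand-in for the jitter source, which is an assumption; the text suggests a blue noise texture may work better. Each value stays inside its original fifth. The average stays 0.5 because the jitter averages 1/10. The names are hypothetical.

```cpp
#include <cassert>
#include <random>

// Plus shaped noise with a stratified jitter: index / 5 plus a random offset
// in [0, 1/5) instead of the fixed 1/10.
double JitteredPlusNoise(int x, int y, std::mt19937& rng)
{
    std::uniform_real_distribution<double> jitter(0.0, 1.0 / 5.0);
    const int index = ((x + 3 * y) % 5 + 5) % 5;
    return static_cast<double>(index) / 5.0 + jitter(rng);
}

// Average over a width x height region, using a fixed seed for repeatability.
double JitteredAverage(int width, int height, unsigned seed)
{
    assert(width > 0 && height > 0);
    std::mt19937 rng(seed);
    double sum = 0.0;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            sum += JitteredPlusNoise(x, y, rng);
    return sum / static_cast<double>(width * height);
}
```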
Here is a histogram of each noise, which shows what I’m talking about regarding the plus noise:
The plus shaped noise does very poorly in these 3×3 regions, because there are only 5 possible values, and we are looking at how unique the values are from 9 different pixels. It definitely is not optimized for this usage case.
R2 does quite a bit better, but not as well as IGN. That makes sense: R2 is meant for “general purpose use” as a low discrepancy grid, whereas IGN is meant specifically to have great LDS properties in a 3×3 region.
Plus Shaped Region Analysis
In this test, taking plus shaped samples and analyzing how their values lie on a numberlines, white noise shows the worst results as per usual. Bayer and also blue noise don’t do that great either.
Now, unlike the last test, where IGN beat R2, we can see that R2 beats IGN. This shows again that R2 is good in “general purpose uses” where IGN is optimized towards just 3×3 blocks.
Lastly, we see the plus noise doing the best here – in the situation it was optimized for, which is no surprise. Any randomization added to this noise to help break up the quantization artifacts will make this specific test have a higher standard deviation of distances. With good noise (like blue noise?) used to jitter, the standard deviation may only go up a little. Having the standard deviation go up a little bit probably would help results in general when using this noise. After all, the goal of low discrepancy sequences is to have LOW discrepancy (discrepancy being some variance in the spacing here) but not NO discrepancy, since having no discrepancy is regularly spaced sampling, which has some bad properties, including aliasing.
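Here is one plausible way to score a plus shaped neighborhood, as a sketch. Sort its five values on the [0, 1) number line and measure the gaps between neighbors. Include the gap that wraps from the largest value back around to the smallest. Then take the standard deviation of those gaps. Whether the analysis above wraps around, or uses exactly this metric, is an assumption. For the plus noise every gap is 0.2, so the score is 0. A neighborhood where all five values are equal scores 0.4. The names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

// Plus shaped noise from its integer index: (index + 0.5) / 5.
double PlusNoise(int x, int y)
{
    const int index = ((x + 3 * y) % 5 + 5) % 5;
    return (static_cast<double>(index) + 0.5) / 5.0;
}

// Standard deviation of the gaps between the sorted values of the plus shaped
// neighborhood centered at (x, y), treating [0, 1) as a circle.
double PlusGapStdDev(int x, int y, double (*noise)(int, int))
{
    assert(noise != nullptr);
    const int offsets[5][2] = {{0, 0}, {1, 0}, {-1, 0}, {0, 1}, {0, -1}};
    std::array<double, 5> values{};
    for (int i = 0; i < 5; ++i)
        values[i] = noise(x + offsets[i][0], y + offsets[i][1]);
    std::sort(values.begin(), values.end());

    std::array<double, 5> gaps{};
    for (int i = 0; i < 4; ++i)
        gaps[i] = values[i + 1] - values[i];
    gaps[4] = 1.0 - values[4] + values[0]; // wrap-around gap

    double mean = 0.0;
    for (double gap : gaps)
        mean += gap;
    mean /= 5.0; // always 0.2, since the gaps add up to 1
    double variance = 0.0;
    for (double gap : gaps)
        variance += (gap - mean) * (gap - mean);
    return std::sqrt(variance / 5.0);
}
```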
Plus Shaped Noise vs IGN
Jorge (maker of IGN) derived the same plus shaped sampling noise that I did (I did my derivation after he said he had found such a noise, and then we compared to see if we found the same thing). He put the noise through some tests, using a plus shaped neighborhood sampling TAA implementation, and he found that IGN performed better than this plus shaped sampling noise. I’m not sure of the details of his test, or how much better IGN did, but it would be interesting to do some analysis and share those details. I may do that at some point, but if someone else does it first, please share! I’m curious if the problems came up due to the discretized values of this plus noise, and if jittering the values using good noise helps the problems at all.
You might be wondering how IGN is fully floating point when the plus noise is discretized.
If we tried to derive IGN the same way as we did with the plus noise, we would want every 3×3 block of pixels to have every value 0/9, 1/9, … 8/9, even overlapping ones. If you work through this generalized sudoku, you’ll find that there are too many constraints and it actually isn’t solvable. A way to get around this is to have some numerical drift of the values over space, so that you spread the error of it not being solvable over distance. That is what IGN does, and it is why IGN isn’t a discretized noise with only x/9 values. I’m not sure if IGN optimally distributes this error evenly over distance or not though. That would be an interesting thing to look at.
Hopefully you found this post interesting and have some new tools in your toolbelt.
If you ever need VECTOR valued noise but only have SCALAR valued noise, you can try putting your scalar values through a Hilbert curve to turn your scalars into vectors. In my experience, this isn’t as high quality as having true vector valued noise, but does actually work in preserving the scalar noise properties in the resulting vector valued noise somewhat so is a lot better than nothing.
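Here is a minimal sketch of the idea, assuming an n×n Hilbert curve with n a power of two. The scalar picks a distance along the curve. The classic d2xy walk (as described on Wikipedia’s Hilbert curve page) turns that distance into a cell, and the cell’s center becomes the 2D vector. Consecutive distances land in adjacent cells, so nearby scalar values map to nearby vectors. The names and the choice of cell centers are assumptions.

```cpp
#include <cassert>

struct HilbertCell { int x, y; };

// Classic d2xy walk: converts distance d along the curve into the cell (x, y)
// of an n x n grid, where n is a power of two.
HilbertCell HilbertD2XY(int n, int d)
{
    assert(n > 0 && (n & (n - 1)) == 0 && d >= 0 && d < n * n);
    int x = 0, y = 0;
    for (int s = 1, t = d; s < n; s *= 2, t /= 4)
    {
        const int rx = 1 & (t / 2);
        const int ry = 1 & (t ^ rx);
        if (ry == 0) // rotate / flip this quadrant so the sub-curves connect
        {
            if (rx == 1)
            {
                x = s - 1 - x;
                y = s - 1 - y;
            }
            const int temp = x;
            x = y;
            y = temp;
        }
        x += s * rx;
        y += s * ry;
    }
    return {x, y};
}

// Scalar in [0, 1) -> 2D vector in [0, 1)^2 via the center of a Hilbert cell.
void ScalarToVector(double value, int n, double& outX, double& outY)
{
    int d = static_cast<int>(value * static_cast<double>(n) * static_cast<double>(n));
    if (d < 0)
        d = 0;
    if (d >= n * n)
        d = n * n - 1; // guards value == 1.0 and rounding at the top end
    const HilbertCell cell = HilbertD2XY(n, d);
    outX = (cell.x + 0.5) / static_cast<double>(n);
    outY = (cell.y + 0.5) / static_cast<double>(n);
}
```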
If you try using this noise or try out any of the things mentioned above or similar, it would be great to hear how it goes for you, either here as a comment or on twitter at https://twitter.com/Atrix256.