Follow me on this train of thought 😛
1) There’s this thing called super sampling where you render an image at a larger resolution, so that you can properly downsample it to the right size (the size of your screen for instance) to avoid aliasing problems. The problem here is that you are rendering more pixels, so it is more expensive to render, which is usually a deal breaker for real time applications that are trying to push the envelope of performance – like modern games.
2) There’s a way to get around this with something called Temporal Supersampling where you use the last frame rendered to provide extra information for the current frame, so that in a way, you get supersampled data by spreading it out over 2 frames. (More info on supersampling: Temporal supersampling). You get better results by jittering (offseting) the pixels you render from frame to frame, by a sub-pixel amount. This is the usual monte carlo sampling kind of situation… find some cheap but well behaving pseudorandom number generator you can run in your pixel shader to offset each pixel by, or use a regular pattern of some sort that gives good enough results.
3) That gives you 2 samples if you only compost the last and current frame, but more samples is better of course. You could keep more frames from the past around, but that takes up the precious resource of memory. Apparently, when the hardware does MSAA (multisampling antialiasing), it has different configurations for different numbers of samples and it’s configurable somehow. If you have 2 samples, they may be 2 vertical dots, or 2 horizontal dots. If you have 3 samples, it might look like a “3” on a domino. If you have 5 samples it might look like a “5” on a domino.
4) Sometimes a corner will be sampled so that a sample can be shared across multiple pixels to increase efficiency. There is this really interesting thing called “flipquads” that samples on an edge for that same reason. You can see some info on here: An Extremely Inexpensive Multisampling Scheme. Basically, you only do two samples per pixel, sampling at 2 of the 4 sample locations on the edge of a pixel, so that the pixels that share the edge can use the results. Effectively, you are doing 2 sample per pixel, but getting 4 samples per pixel due to sample sharing.
5) If you combine flipquads with temporal supersampling, it means that you get 4 samples for the cost of 2, amortized over 2 frames. So, you essentially just render the normal amount of pixels (1 sample per pixel), compost frame N against frame N-1, and get the benefit of a 4 tap MSAA. So, it’s really cheap, and yes… it does actually help significantly, despite the fact that so many samples are redundant.
None of the above is anything new… I watched it all in various SIGGRAPH 2014 presentations earlier today from big name modern games – and man am i amazed what people are doing these days!
Now for the new part…
One way for raytracers to get better visual quality is to do multiple rays per sample, doing monte carlo sampling, where each of the rays in the group is perturbed by tiny amounts. Some details here: Advanced Topics in Computer Graphics: Sampling Techniques
In my own personal OpenCL real time raytracer, I don’t have the luxury of doing multiple rays per pixel – and in fact, I have a graphics option that allows you to render only half the screen (top / bottom) each frame alternating, to cut the number of rays down so that it runs faster!
What if a person was able to do temporal supersampling with a realtime raytracer, using flipquads to make it so it could get the information of 4 rays per pixel, while only taking a single ray cast per pixel each frame? Wouldn’t that be something?
There are some technical details to work out but I think there is some real magic here waiting to happen.
The biggest technical problem I foresee is reprojecting the pixels from the last frame to the current frame. This probably would work ok if your rays had a strict projection matrix governing them, but there may be difficulties with reflection and refraction, and honestly, I personally want to distort camera rays for game effects (like being underwater) so wouldn’t want to be stuck with a strict projection matrix. Maybe there’s some clever solution to make it all ok though…
Also – the link to flipquads is actually an explanation of “fliptris” a technique using 1.25 samples per pixel. If that were amortized across 2 frames, that means you would only need to cast 62% of your rays theoretically. That might be a nice performance win, while gaining the benefits of temporal supersampling and ultimately having 3 samples for each pixel!