Picking Fairly From a List of Unknown Size With Reservoir Sampling

There was an awesome graphics paper in 2020 that put together some fun and interesting statistics ideas to try and solve the “many lights” problem of ray tracing. The many lights problem is that when you are trying to shade a pixel on a surface, you technically need to shoot a ray at every light to see if it is shining onto the surface or not. This is relatively ok for a handful of lights, but becomes a big problem when you have 100,000 lights. The method in the paper has each pixel try a couple samples, share what was found with neighbors, and share with itself next frame to have some memory between frames so that it can improve where it samples over time.

The paper is a good read and you should check it out: “Spatiotemporal reservoir resampling for real-time ray tracing with dynamic direct lighting” https://research.nvidia.com/sites/default/files/pubs/2020-07_Spatiotemporal-reservoir-resampling/ReSTIR.pdf

In this post, we are going to be talking about one of the methods the paper uses, which is reservoir sampling. This allows you to stream a list of data and fairly (uniform randomly) choose an item from the list as you go. This also works when you want to choose items in the list with different probabilities. You can also just choose a random subset of the list to look at, do the same process, and the results aren’t bad.

This is useful in the many lights problem because it allows a random subset of lights to be considered by each pixel, each frame, and over time they find the best lights to sample with limited ray counts.

There are lots of other uses for reservoir sampling though, within rendering and game development, and outside of it too.

The 237 line single file C++ program that made the data for this post can be found at: https://github.com/Atrix256/WeightedReservoirSampling

Uniform Reservoir Sampling

Choosing an item uniform randomly from a streaming list of unknown size is surprisingly easy. Starting at index 1, the chance of selecting an item is 1/index. This means that index 1 is chosen when seen, then there is a 50% chance (1/2) to select index 2 instead. After that, there is a 33% chance (1/3) to select index 3 instead, and a 25% chance (1/4) to select index 4, and so on.

The mathematics of why this works is illustrated well in this youtube video: https://www.youtube.com/watch?v=A1iwzSew5QY

int index = 0;
int chosenValue = 0;

while(MoreValuesAvailable())
{
  index++;
  int nextValue = GetNextValue();
  float chance = 1.0f / float(index);
  if (GetRandomFloat01() < chance)
    chosenValue = nextValue;
}

Weighted Reservoir Sampling

Adding non uniform sampling to the algorithm is not much more difficult. The chance to choose a specific item becomes the weight of that item, divided by the sum of the weights of the items seen so far. The weights don’t have to be a normalized PDF or PMF, you can use weights of any scale. If you use a weight of “1” for all the items, it becomes the same as uniform reservoir sampling.

float weightSum = 0.0f;
int chosenValue = 0;

while(MoreValuesAvailable())
{
  item nextValue = GetNextValue();
  weightSum += nextValue.weight;
  float chance = nextValue.weight / weightSum;
  if (GetRandomFloat01() < chance)
    chosenValue = nextValue.value;
}
PDF(x) =3x^2

Subset Uniform Reservoir Sampling

If you are trying to uniform randomly choose an item from a list that you know the size of, but is just too big, you can do uniform reservoir sampling on a random subset of the list. The larger the random subset, the better the results, but smaller subsets can work too – especially if you are amortizing a search of a larger subset over time.

int chosenValue = 0;

for (int index = 1; index <= sampleCount; ++index)
{
  int nextValue = GetRandomListValue();
  float chance = 1.0f / float(index);
  if (GetRandomFloat01() < chance)
    chosenValue = nextValue;
}
10% of the list was sampled in each test

Subset Weighted Reservoir Sampling

Adding weighting to the subset reservoir sampling is pretty easy too.

float weightSum = 0.0f;
int chosenValue = 0;

for (int index = 1; index <= sampleCount; ++index)
{
  item nextValue = GetRandomListValue();
  weightSum += nextValue.weight;
  float chance = nextValue.weight / weightSum;
  if (GetRandomFloat01() < chance)
    chosenValue = nextValue;
}
10% of the list was sampled in each test. PDF(x) =3x^2

Combining Reservoirs

If you have multiple reservoirs, they can be combined. The paper mentioned uses this to combine last frame reservoir samples with current frame reservoir samples.

void CombineReservoirs(int chosenValueA, float weightSumA, int chosenValueB, float weightSumB, int& newChosenValue, float& newWeightSum)
{
  newWeightSum = weightSumA + weightSumB;
  float chanceA = weightSumA  / newWeightSum;
  if (GetRandomFloat01() < chanceA)
    newChosenValue = chosenValueA;
  else
    newChosenValue = chosenValueB;
}

For uniform sampling in the above, weightSum is the number of items seen. It’s as if every item had a weight of 1.

Choosing Multiple Items

If you want to choose N items instead of just 1, you could run the single item selection code N times in parallel to choose those N items (Note: you could have the same item chosen multiple times). If you do that, you’ll notice that the weight sum / count will be the same for each of them, so you don’t actually need to store that per each and it can be shared. That will give you something like this for uniform reservoir sampling.

int index = 0;
int chosenValues[N];

while(MoreValuesAvailable())
{
  index++;
  int nextValue = GetNextValue();
  float chance = 1.0f / float(index);
  for (int i = 0; i < N; ++i)
  {
    if (GetRandomFloat01() < chance)
      chosenValue[i] = nextValue;
  }
}

4 comments

    • If you are picking a single item from a stream, it will pick a stream item with uniform (equal) probability. If you have an index or value in there multiple times, it has the effect of giving that index or value a higher probability of being selected. It’s like giving it a higher weight. So, if an item is in there twice, while the rest of the items are in there once, it will be twice as likely to be selected.
      If you are talking about picking several items from the same stream, it is possible to get the same item or value multiple times, even if it only appears in the stream once. This type of random selection is called “selection with replacement” which means if you were drawing items from a hat randomly, you’d put it back in the hat after selecting it. If you don’t want that to happen search for “selection without replacement”.
      Hope this helps!

      Like

      • Thank you for the reply!

        Yes, I was referring to the first case. I found the paragraph in the paper that mentions this. The authors employed weighted reservoir sampling with replacement, but didn’t elaborate much on it:

        Reservoir sampling algorithms are classified based on whether element xi may appear multiple times in the output set, i.e. if samples are chosen with or without replacement. Literature has mostly focused on sampling without replacement, as it is a fundamentally more difficult problem. Fortunately, we want independent selections xi for Monte Carlo integration, so we only consider weighted reservoir sampling with replacement below.

        https://research.nvidia.com/sites/default/files/pubs/2020-07_Spatiotemporal-reservoir-resampling/ReSTIR.pdf

        It’s amazing how reservoir sampling eliminates the need for storing past samples while also avoiding the cost of sampling, which, at best, still requires O(logN) (using a Fenwick tree).

        Thank you for the explanatory post about reservoir sampling. Much appreciated!

        Like


Leave a comment