# Fast & Lightweight Random “Shuffle” Functionality – FIXED!

In this post I’m going to show a way to make an iterator that will visit items in a list in a random order, only visit each item once, and tell you when it’s visited all items and is finished. It does this without storing a shuffled list, and it also doesn’t have to keep track of which items it has already visited.

This means you could have a list that is a million items long and no matter how many you have visited, it will only take two uint32s to store the current state of the iterator, and it will be very fast to get the next item in the list (somewhere around 1 to 4 times the cost of calling the rand() function!).

This is a follow up post to an older post called Fast & Lightweight Random “Shuffle” Functionality.

In that older post, things didn’t work quite like I expected them to so it was back to the drawing board for a little while to do some research and experimentation to find a better solution. I’m not going to go back over the basics I talked about in that article so go back and have a read there first if anything is confusing.

## High Level

In the last post on this topic we talked about how the high level goal was to map the index to a random number, and because we were randomizing the bits in a deterministic (and non destructive) way, we needed to iterate over the whole “next power of 2” items and reject any that were too large. Only doing this could we be sure that we visited every index. The problem I hit last time though was that I could not get the numbers to look random enough.

To solve this, i decided what i needed to do was ENCRYPT the index with a block cipher. When you encrypt data, it should come out looking like random data, even though the data you put in may be sequential or have other easily seen patterns. What else is great, is that when using a block cipher, each unique input should equate to a unique output which means that if we encrypt the full power of 2 range as input, we will get the full power of 2 range out as output, but just in a different order.

Once I realized this was a good solution, my next problem to tackle was that I knew of no block algorithms that would work for a variable number of bits. There are block cipher algorithms that will work for LARGE number of bits, but there is no algorithm I knew of where you can tell it “hey, i only want to use 4 bits” or “i only want to use 5 bits”.

In the end the answer I found was to roll my own, but use existing, well established technology to do so. In my specific case, I’m also aiming for high speed functions since I want this functionality used in real time gaming applications.

What I came up with in the end is not cryptographically secure, but using the same techniques I have laid out, you should be able to drop in a different block cipher algorithm if that is a requirement.

## Feistel Network

As it turns out, there is a basic primitive of cryptography called a Feistel Network. It’s used by quite a number of modern ciphers and hash functions, and it is surprisingly simple.

For a balanced Feistel Network, you split the data into a left and a right side, and do the following, which consists of a single round (you can do as many rounds as you like):

```Left[i+1]  = Right[i];
Right[i+1] = Left[i] ^ RoundFunction(Right[i], key);
```

After performing however many rounds you wish, you combine the left and the right sides again to get your encrypted data.

To unencrypt, the feistel network works much the same but only in reverse, looking like the below:

```Right[i] = Left[i+1];
Left[i] = Right[i+1] ^ RoundFunction(Left[i+1], key);
```

The neat thing about Feistel Networks is that the round function can be any deterministic function that performs whatever operations it wants – even destructive and irreversible operations such as division or bit shifts. Even so, the feistel network as a whole is reversible, no matter what you do in the round function, and you can unencrypt to get your origional data back.

This threw me for quite a loop and I couldn’t get my head around why this worked for a long while until I found a webpage that explained it pretty well. unfortunately I lost the link and cannot find it again but the basic idea is this… For each round of encryption, the right side is encrypted using the key and the left side. This means that at any point, no matter how many rounds you’ve done on your data, the right side should be able to be decrypted using the key and the left side. If you have the key, and you know how many rounds were used in encryption, you have all the data you need to decrypt it again. Hopefully that makes sense… I had to work it out on paper a little bit to see it fully.

The other great thing about Feistel Networks is that you can make them be however many bits you want. So, if i want each side of the Feistel Network to be 1 bit, I can do that. Or, if i want each side to be 128 bits, I can do that too!

You can also tune the quality / performance a bit by doing less or more rounds.

BTW the Tiny Encryption Algorithm uses a Feistel Network if you want to see a simple example in action.

With the “variable bit size support” problem solved, next I needed to come up with a round function that did a good job of taking sequential numbers as input and spitting out seemingly random numbers as output. Thanks to what I was saying before, the round function doesn’t need to be reversible, so there are a lot of options available.

I ended up deciding to go with a hash function, specifically Murmur Hash 2 (which I actually also used in my last post if you’d like to see some more information on it! The Incredible Time Traveling Random Number Generator).

Since the hash spits out numbers that might be anything in the range of an unsigned int, but I only want N bits, I just AND the hash against a mask to get the number of bits I want. There’s probably a higher quality method of smashing down the bits using XOR or something, but my quality needs aren’t very high so I just opted to AND it.

A downside of going with the balanced Feistel Network approach is that before this, I only had to round up to the next power of 2, but now, since each half of the data needs to be a power of 2, I actually have to make sure I have an even number of bits and have to round up to the next power of 4. This means that when it’s looking for valid indices to return in the shuffle, it may have to calculate up to 4 different indices on average before it finds a valid one. Not the greatest thing in the world, but also not the worst and definitely not a deal breaker in my eyes.

## The Code

The code can be found on github at source.cpp

Use it in good health (:

Here are some example runs of the program as well.

Note: since writing this blog post, I’ve found out that my setup isn’t all that random and you can do better by changing the round function and/or number of rounds. If you don’t get the results you were hoping with using my code, the technique is solid, even if my implementation isn’t the best.

1. By the way, another neat thing this shuffler can do is you can do a reverse query. What i mean is you can say “I’m shuffling 10,000 things, and my seed is 16453. What position in the shuffle will number 34 come in?”. You would just “decrypt” 34 and what comes out will be the index in the shuffle that it shows up in.

The biggest usage case i can see for this would be being able to query if you have already encountered a specific number in a shuffle yet or not. You could say “what position in the shuffle does 34 show up in. Is that number < our current position in the shuffle? if so, we have encountered it".

I think there are other usage cases too if you get creative (:

Like

2. m4xxed says:|

Hey man,

I really like the idea of a fast & lightweight shuffle functionality and how you managed to achieve it!
I just struggle on some lines of your code:

1,2,3: what did you #include?
56: is the “” intentional or a typo?
117 & 123: where did you define the methods “NextNumber()” and “LastNumber()”? What do they do?
127: where did you define the type “index” that you are referencing here?

I’d be really happy if you could clear that out for me!

Like

3. m4xxed says:|

I expected something like that. Thank you for the full code, it makes sense to me now.

I am using this functionality for a simulation of a coral reef where we need to update all entities in the reef in a random order. It will come in quite handy if it is really fast enough.

By the way I liked the seed you used in your main() for the MurmurHash2()-methods :D.
Will definitely use the same!

Liked by 1 person

• Very cool idea! Let me know how it goes!

Fair warning, I think the randomness quality could be improved, while also improving the speed, by using a different round function.

I don’t know what the magic sauce is specifically, but a colleague tried using this for something and the randomness wasn’t enough for his liking. It’s possible to make even cryptographically secure random shuffles with this, so it isn’t the technique, just my specific round function and number of rounds.

Like

4. Hi Alan, thanks for the article 🙂

I was wondering if it would be trivial to extend the system to return range instead of single value?
For instance, if I have a sky filled with stars and I would like to update “N” stars per frame. Calling the shuffler N times would work but the values might be scattered around (cache misses).

I am guessing making the hash function return values in multiples of N might work, just wondering if there’s a better way to achieve what I am looking for :p

Regards!

Like

• The best I can think of is if you knew this N in advance, is making an array that had N stars per entry, then using this to select which index of N stars to update.
There is locality sensitive hashing which maybe could be used here instead somehow but I can’t think of how.
One thought is that if you are doing this once a frame, you would actually want the stars that are updated to be scattered around for visual reasons, so that it wasn’t so obvious that a small section was being updated at a time. In that case, you’d probably want some kind of low discrepancy sequence to choose which N to update so they were more spread out than random white noise would give you.
An interesting problem you have!
I’ll keep thinking about it. If you come up with something it’d be cool to hear what you did!

Like

5. Amit Mehar says:|