Getting Ideal Disk Reads on the Cheap

If you are making a game, chances are you are loading assets and/or data from disk, and you would like to load them as fast as possible, so that players can play your game instead of staring at loading screens.

To make the fastest load possible, you want as few disk reads as you can get away with, you want each read to be as large as possible, you want to read in the same order as the data is on disk, and if possible, you want to read from a single file that is kept open the whole time.

There are many reasons why the above makes for ideal disk reading, but some of them are…

  • Every time you do a disk read, if the disk is in use by another process, you have to wait for that process to be done with the disk before you can start your read. The fewer disk reads you do, the fewer times you might have to wait on something else to relinquish the disk. Making each read as large as possible means fewer disk reads too.
  • You want to read the data from disk in the same order that it’s stored on disk, because for some drive types (such as CD-ROM, DVD and hard disk drives) there is a physical “read head” that the drive has to move around to get to the data. The more that read head has to move around, the longer you spend waiting for it to move, versus getting the data you want from the disk drive. Some drives, such as SSDs, have no read head to move and so essentially no seek time, but reading sequentially can still help there due to more efficient use of disk buffers.
  • You want to read the data from a single file when possible, versus having multiple files, because opening a file is expensive, and doing multiple reads from multiple files has much more erratic performance than doing multiple reads from a single file. Also, since we usually have no way of knowing how files are actually laid out on disk, putting all the data into a single file is the only way to be reasonably sure that you can read data in order and minimize read head movement between reads (as long as the file is not fragmented on disk).

From Non Ideal to Ideal Disk Reads

So, how do you go from having a bunch of small reads from many different files, to fewer reads (or just one) from a single file?

Professionally made video games (especially console games) will often come up with a “packing process” to put all the files together into a single bundle, and possibly compress or encrypt it. They’ll also use this “pack time” opportunity to pre-calculate whatever they can that might be expensive at load time, for instance building a navigation mesh or baking lightmaps.

That is pretty cool, but is quite a bit of work. What if you don’t have the opportunity or willingness to make such a thing, or you want to get a sense of how much something like that would improve things?

Well, Paul, a buddy of mine, told me about a neat technique for doing this quickly and easily. Now that I’ve heard of it, I see references to it ALL THE TIME. It’s bizarre.

Basically, what you do is give your file read functions a special mode of operation where whenever they read data from disk, they also append that data to a special file.

After the load process is complete, you will then have a file that contains all the data your game wanted to read from disk during loading, in the order that it asked for it.
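To make the recording step concrete, here is a minimal sketch in Python. The class name `RecordingFile` and the idea of passing in a shared `log` file object are my own assumptions for illustration, not part of any particular engine. It also assumes no read asks for more bytes than the source file has left, since the log stores only the bytes, not the requested sizes.

```python
class RecordingFile:
    """Hypothetical file wrapper: reads normally, and also appends every
    byte it returns to a shared log file (the "special file")."""

    def __init__(self, path, log):
        self._file = open(path, "rb")
        # 'log' is a binary file object opened for writing by the caller,
        # once, at the start of the load (an assumption of this sketch).
        self._log = log

    def seek(self, offset):
        # While recording, seeks still go to the real file.
        self._file.seek(offset)

    def read(self, size):
        data = self._file.read(size)
        # Append exactly the bytes handed back to the game, in request order.
        self._log.write(data)
        return data

    def close(self):
        self._file.close()
```

You would open the log once when loading starts, hand it to every `RecordingFile` your loader creates, and close it when loading ends. What is left on disk is every byte the load asked for, in the order it asked.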

Next, you give your read and seek functions a special mode of operation where read will return the next bytes from that special file, and seek will do nothing. You can open that file at the beginning of the load operation and close it at the end.

Following that, you are now a lot closer to ideal… you are reading from a single file, and you are reading sequentially.
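Here is a rough Python sketch of that replay mode. `ReplayFile` is a hypothetical name, and the sketch assumes the game asks for exactly the same sequence of read sizes as it did while recording. The `EOFError` check is my own addition, a cheap sanity check rather than part of the technique.

```python
class ReplayFile:
    """Hypothetical stand-in for a normal file during a replayed load."""

    def __init__(self, log):
        # 'log' is the recorded file, opened once in binary read mode by
        # the caller and shared by every ReplayFile (an assumption here).
        self._log = log

    def seek(self, offset):
        # Deliberately does nothing: the log is already in request order.
        pass

    def read(self, size):
        # Whatever file the game thinks it is reading, it gets the next
        # 'size' bytes of the log.
        data = self._log.read(size)
        if len(data) != size:
            raise EOFError("replay log ended early; did the read order change?")
        return data

    def close(self):
        # The shared log is closed by whoever opened it, when loading ends.
        pass
```

Because it exposes the same `seek`/`read`/`close` calls as a regular file, the loading code does not need to know which mode it is running in.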

For the rest of it (reading as few times as possible, with reads as large as possible), if you have the RAM free, you don’t need your read function to go to the file at all: load the whole file into RAM with a single disk read when loading starts, and have your read function just serve data from memory. When loading is finished, you can free that memory, OR if you want to make subsequent loads faster (like if your game can have multiple plays in one session), you can keep it around so that subsequent loads don’t even have to touch the disk drive.
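A minimal Python sketch of the in-memory variant might look like this. `PreloadedLog`, `rewind` and `release` are names I made up for illustration, and the sketch assumes the whole recorded file fits comfortably in RAM.

```python
class PreloadedLog:
    """Hypothetical in-memory version of the recorded load file."""

    def __init__(self, log_path):
        # One large read pulls the whole recorded file into RAM.
        with open(log_path, "rb") as f:
            self._data = f.read()
        self._pos = 0

    def seek(self, offset):
        # Still a no-op: data is served strictly in recorded order.
        pass

    def read(self, size):
        # Serve the next 'size' bytes from memory; no disk access here.
        chunk = self._data[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk

    def rewind(self):
        # Keep the buffer and start over, for a later load in the same session.
        self._pos = 0

    def release(self):
        # Drop the buffer once loading is done and it is no longer needed.
        self._data = b""
        self._pos = 0
```

Call `rewind()` before each additional load if you keep the buffer around, or `release()` after the first load if you would rather have the memory back.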

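Of course this assumes you have a deterministic read order, or that you can make it deterministic. The special file only hands data back in the order it was recorded, so a load that asks for things in a different order would get the wrong bytes. Still, it’s a pretty useful tool for the toolbox IMO.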
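If you are not sure your read order is deterministic, one option (my own extension, not something the technique requires) is to record the list of requests alongside the data and check each replayed request against it. The sketch below is Python, and `save_trace`, `load_trace` and `TraceChecker` are hypothetical names.

```python
import json


def save_trace(requests, trace_path):
    """Write the recorded (path, offset, size) requests to a side file."""
    with open(trace_path, "w", encoding="utf-8") as f:
        json.dump(requests, f)


def load_trace(trace_path):
    with open(trace_path, "r", encoding="utf-8") as f:
        # JSON has no tuples, so convert each entry back for comparison.
        return [tuple(entry) for entry in json.load(f)]


class TraceChecker:
    """Checks replayed read requests against the recorded sequence."""

    def __init__(self, expected):
        self._expected = list(expected)
        self._next = 0
        self.diverged = False

    def matches(self, path, offset, size):
        # Once the order has diverged, nothing after it can be trusted.
        if self.diverged or self._next >= len(self._expected):
            self.diverged = True
            return False
        if self._expected[self._next] != (path, offset, size):
            self.diverged = True
            return False
        self._next += 1
        return True
```

When `matches` returns False, the safe move is to fall back to normal disk reads for the rest of the load and re-record the special file.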

Windows Does This Too!

Ok so as it turns out, the Windows file cache (SuperFetch) uses a variation of this technique as well. Check it out!

From “SuperFetch: How it Works & Myths”:

Let’s focus on decreasing boot times first. During the Windows boot process, the same files need to be accessed at different times. SuperFetch records which data and files need to be accessed at which times, and stores this data in a trace file. During subsequent boots, this information is used to make the loading of said data/files more efficient, resulting in shorter boot times.

SuperFetch performs more tasks to make the boot process more efficient. It also interacts with the defragmenter to make sure that the files accessed during the boot process are stored on the disk in the order they are accessed in. It performs this as a routine task every three days; the specific file layout is stored in /Windows/Prefetch/Layout.ini.