Question submitted by (07 July 2001)
I have the following problem: As in every game project, there are a lot of resources -- images, sound, text etc. Now my problem is how to organize them within the program. Of course I can make an array and then access every element in hard coded form. Of course that works, but it's not general and as soon as some files are changed / deleted etc. one has to change the whole project. I also considered using static identifiers for every image but that's not a good solution either. So I'd like to ask you how this is handled in other game projects in a more general way so that the program itself is more independent from the content of a particular file. Another question that has something to do with the above is that some games (e.g. Quake) use one big file which holds all resources. Is this better for management or is it just because it's uncomfortable to have hundreds of tiny files?
In every project I've ever been involved with, resources are dealt with in a combination of the following two ways:
First, resources specific to a hard-coded effect (say, a lens flare effect) are directly referenced by the code that handles the effect. This means that if you look in "lensflar.cpp" you might see a call that looks something like image.load("lensflar.img");.
Second, resources are referenced by other resources. So rather than a texture or data file being hard-coded into the program itself, it is referenced by other data. For example, you might use a map file format for your levels that contains references to its own textures (by name.) These textures are specified by the level designer or artist, not the programmer. And since the level designer or artist is responsible for their maps and their textures, it only makes sense that they manage them through the level editor.
This is typically a form of hierarchy. For example, a level might also contain a series of references to files for models that are to be included with the level. Each model file would then contain references to the textures, animation data, etc. required to properly manage that model. Your level data might also contain animation data, trigger information, and even references to files for actors within the level (which, of course, might refer to a data file for each actor, containing AI information, more texture references, etc.)
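To make the hierarchy idea concrete, here is a minimal C++ sketch of walking such references from a level down to its textures. Everything in it is an assumption for illustration: the ReferenceTable type, the collectResources() name, and the idea of holding the references in memory rather than parsing them out of real level, model, and actor files.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for data files: each resource name maps to
// the names of the resources it references. In a real game these lists would
// be parsed out of the level, model, and actor files themselves.
using ReferenceTable = std::map<std::string, std::vector<std::string>>;

// Walk the hierarchy from one root resource (for example a level) and collect
// every resource that must be loaded, each exactly once.
void collectResources(const ReferenceTable& refs,
                      const std::string& name,
                      std::set<std::string>& needed)
{
    assert(!name.empty());

    if (!needed.insert(name).second)
        return; // already visited; this also guards against reference cycles

    auto it = refs.find(name);
    if (it == refs.end())
        return; // leaf resource such as a texture: it references nothing else

    for (const std::string& child : it->second)
        collectResources(refs, child, needed);
}
```

The program only ever names the root (the level); everything below it is decided by whoever authored the data.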
And as for large WAD-like files, have you considered the ZIP file format? I use these myself. Using ZIP files lets me use existing tools (like WinZip) to create them, which saves me the time of writing such a tool myself. They also allow for hierarchical directory structures. You can use the ZIP file format for free (thanks to the late, great Phil Katz), and you don't even need to worry about compression if you don't want to: simply fill your ZIPs with non-compressed files. However, zlib is a free library, and I've been told that reading compressed data and decompressing it can be faster than simply loading the raw data from the disk, since fewer bytes have to be read. You can get ZIP format information here.
Are ZIP files (or any other WAD-like file) better than loading from individual files? Yes, they have a few advantages. Primarily, they allow you to avoid the overhead of a call to open each individual file. In my case, when a user asks my ZIP file class to open a file, it simply finds the file in the stored-directory and performs an fseek() to the start of that file. Also, no extra file pointers are allocated, so you don't have to worry about closing them. This translates into fewer chances of a resource leak. The directory lookup can be optimized as well (more on this shortly.)
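A rough sketch of that "find it in the stored-directory and fseek() to it" idea might look like the following. This is not my actual ZIP class and it does not parse the ZIP format: the DirEntry and Archive names are made up for the example, the directory is filled in by hand, and the sub-files are assumed to be stored uncompressed.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// One stored-directory entry. In a real ZIP these fields would come from the
// archive's own directory records; here they are supplied by the caller.
struct DirEntry {
    std::string name;   // path of the sub-file inside the archive
    long        offset; // byte offset of its data from the archive start
    long        size;   // length of its (uncompressed) data in bytes
};

// Hypothetical archive wrapper: one shared FILE* serves every sub-file.
class Archive {
public:
    Archive(std::FILE* fp, std::vector<DirEntry> dir)
        : fp_(fp), dir_(std::move(dir)) { assert(fp_ != nullptr); }

    // "Open" a sub-file: find it in the stored-directory and seek to it.
    // Returns the shared file pointer, or nullptr if the name is unknown.
    // The caller must not fclose() the result.
    std::FILE* open(const std::string& name, long* size = nullptr) {
        for (const DirEntry& e : dir_) { // linear scan, kept simple on purpose
            if (e.name == name) {
                if (std::fseek(fp_, e.offset, SEEK_SET) != 0)
                    return nullptr;
                if (size)
                    *size = e.size;
                return fp_;
            }
        }
        return nullptr;
    }

private:
    std::FILE*            fp_;  // owned by the caller, closed once at shutdown
    std::vector<DirEntry> dir_; // the stored-directory
};
```

Because every open() hands back the same pointer, only one sub-file can be read at a time with this approach; that is the trade-off for never allocating extra file pointers.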
There are a few gotchas, though. First, you can spend some time writing a really nice implementation that handles all the primary operations (fopen/fread/fwrite/fseek/ftell/blah/blah/blah) so that each sub-file acts just like a real file (including EOF markers), or you can simply choose to use file formats that do not require fseeks. In my case, I chose to do neither. :) Instead, if I access a file (within the ZIP file) that requires fseeks to parse it out properly, I simply use relative seeks (as soon as I get a file pointer from my ZIP class, I call ftell() to find the start of the file). This saves me from having to write a full-out file system and also saves me from the overhead involved in such a class.
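The relative-seek trick can be wrapped in a few tiny helpers. This is only a sketch under the assumption that the archive class returns a plain FILE* already positioned at the first byte of the sub-file; the SubFile struct and the three function names are invented for the example.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: remembers where a sub-file begins inside the archive
// so that parsers can keep using file-relative offsets.
struct SubFile {
    std::FILE* fp;   // shared archive file pointer
    long       base; // archive offset of the sub-file's first byte
};

// Call right after the archive class hands back its file pointer, while the
// pointer is still positioned at the start of the sub-file.
SubFile beginSubFile(std::FILE* fp) {
    assert(fp != nullptr);
    return SubFile{fp, std::ftell(fp)};
}

// Seek to 'offset' bytes from the start of the sub-file, not of the archive.
bool seekInSubFile(const SubFile& sf, long offset) {
    return std::fseek(sf.fp, sf.base + offset, SEEK_SET) == 0;
}

// Current position measured from the start of the sub-file.
long tellInSubFile(const SubFile& sf) {
    return std::ftell(sf.fp) - sf.base;
}
```

Note that nothing here stops a parser from reading past the end of the sub-file; a format that relies on hitting EOF would still need the fuller wrapper described above.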
Another downside to WAD-like files is that you can eat up a lot of memory just storing the directory of files (the stored-directory). On a past project, we were eating up megabytes (with a capital M and a capital B) with our stored-directory because each file had a 256-char array allocated for it, and there were thousands of files. This, of course, was a poor implementation and was abruptly corrected as soon as we realized where all that memory was going. Also, with a few thousand files, scanning the stored-directory to find the requested file can be slow. If I remember correctly, speeding this up with a binary tree cut our loading times almost in half.
There is another optimization you can do that will speed up your loading times AND reduce memory even further. That is to use a hash function to munge a filename (with full path, if necessary) into a unique 32-bit value.
An example implementation would load the stored-directory from the WAD-like file and munge each of the filenames into a unique 32-bit value. Once the entire stored-directory is loaded (and munged), all of the actual filenames are thrown away and only the munged values are kept. The stored-directory is then sorted. When a caller requests a file, the filename is munged (using the same hash function) and the resulting 32-bit value is found within the sorted data using a binary search. A hash table would also work.
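Here is one way the munge, sort, and binary-search steps could be sketched in C++. The hash shown is 32-bit FNV-1a, which I am swapping in purely for illustration; it is not the hash function from the project described here. The HashedEntry, HashedDirectory, and mungeName names are likewise made up, and the lookup returns only a file offset to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>
#include <vector>

struct HashedEntry {
    std::uint32_t key;    // munged filename; the name itself is not kept
    long          offset; // where the sub-file starts in the archive
};

// 32-bit FNV-1a over the lower-cased name (an assumed choice of hash).
std::uint32_t mungeName(const std::string& name) {
    std::uint32_t h = 2166136261u;
    for (unsigned char c : name) {
        h ^= static_cast<std::uint32_t>(std::tolower(c));
        h *= 16777619u;
    }
    return h;
}

class HashedDirectory {
public:
    // Munge the name and store only the 32-bit key plus the offset.
    void add(const std::string& name, long offset) {
        entries_.push_back({mungeName(name), offset});
        sorted_ = false;
    }

    // Sort once, after the whole stored-directory has been loaded.
    void finalize() {
        std::sort(entries_.begin(), entries_.end(),
                  [](const HashedEntry& a, const HashedEntry& b) {
                      return a.key < b.key;
                  });
        sorted_ = true;
    }

    // Binary search on the munged key. Returns the offset, or -1 if absent.
    long find(const std::string& name) const {
        assert(sorted_); // finalize() must run before any lookup
        const std::uint32_t key = mungeName(name);
        auto it = std::lower_bound(
            entries_.begin(), entries_.end(), key,
            [](const HashedEntry& e, std::uint32_t k) { return e.key < k; });
        return (it != entries_.end() && it->key == key) ? it->offset : -1;
    }

private:
    std::vector<HashedEntry> entries_;
    bool sorted_ = true; // an empty directory is trivially sorted
};
```

Each entry here is a 32-bit key plus an offset rather than a 256-char name, and a lookup among a few thousand files takes about a dozen comparisons.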
This can produce very fast file lookups even in the worst cases. Just be careful that your hash function (used to munge your filenames) is case-insensitive. In my case, I simply convert the input strings to lower-case before I munge them. Also, it's important that each munged value is unique. I'm not an expert on these things, so I just threw something together that I thought would produce "relatively unique" (laugh if you want :) values. Then, when I sort the data, I look for duplicates. If any duplicates are found, the software performs an assert, letting me know that I need to hack together a new hash function, hopefully one that works better. :)
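The lower-casing and the duplicate check can be sketched like this. It is an illustration rather than my actual code: the function names are hypothetical, and the check works on whatever 32-bit values your own hash function produced.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>
#include <vector>

// Fold a filename to lower case so "Wall.IMG" and "wall.img" munge alike.
std::string toLowerName(std::string name) {
    for (char& c : name)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return name;
}

// Sort the munged values in place and report whether any two are equal.
// After sorting, equal keys sit next to each other, so one pass finds them.
bool hasDuplicateKeys(std::vector<std::uint32_t>& keys) {
    std::sort(keys.begin(), keys.end());
    return std::adjacent_find(keys.begin(), keys.end()) != keys.end();
}

// Load-time check: fire an assert if two filenames munged to the same value.
void checkDirectoryKeys(std::vector<std::uint32_t>& keys) {
    assert(!hasDuplicateKeys(keys) &&
           "filename hash collision: time for a new hash function");
}
```

Keep in mind that assert() compiles away when NDEBUG is defined, so a check like this only protects you if the full set of files is loaded at least once in a debug build.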
Response provided by Paul Nettle
This article was originally an entry in flipCode's Ask Midnight, a Question and Answer column with Paul Nettle that's no longer active.