flipcode - 8-bit Rendering With Translucency Maps

8-bit Rendering With Translucency Maps
by (21 August 1999)

Introduction

Recently it came to be that I was hired for contract work on a port of a Playstation game to the PC (which shall remain nameless). At the time, it sounded like a fairly simple task. Todays computers are, after all, much faster than the Playstation, right? Yeah, well, this is true for any computer bought this year, but due to some bizarre twist of fate that wasn't to be our target platform. It seems the company I'm working for is a huge fan of Deer Hunter--not because they have ever *played* the game, but because of its tremendous success. If you look on the back of the box, I believe you'll see the minimum requirements for that game: P133, 16MB ram. Note, this doesn't include a 3d accelerator, because, well, how many people buying games at Walmart would have one? So, of course, if Deer Hunter has those requirements, so must the Playstation port as well. Perfectly sound reasoning, right? Well, I guess it is from a management point of view.

Well, also due to a sore lack of reasoning on my part I took the job (along with a few friends). < STUPIDITY >I mean, a P133 has got to be faster than the Playstation, right? And D3D software mode can't be all that slow, right? < /STUPIDITY > As you may have guessed, the only option was to implement a software rasterizer of our own. For speed concerns we decided to use 8-bit paletted mode for the renderer. We need every little ounce of speed we can get, and none of us are by any means another Carmack or Abrash. This raised another issue, however. The Playstation supports both translucency and Gouraud shading, which we'd need to have in our port as well. How does one do these things in 8-bit paletted mode? Well, that's the topic of this article. Herein details the method we used to implement our rasterizer, as well as its limitations and possible alternatives.

First, you may ask why you'd ever need to implement a software rasterizer, and an 8-bit one, no less. With the exception of my present situation, I can't really think of a definite answer to that question. God forbid any of you are ever in my present situation (as it turns out, the software rasterizer is the least of our problems... all I'll say is that spaghetti C and PSX assembly commented in Japanese is not a pretty sight). However, the technique I'll describe here is very effective in creating unique effects both in 3d and 2d, allowing for things that simply cannot be done in real-time non-paletted modes. Also, it never hurts to have lower minimum requirements, as long as the visual quality stays relatively the same (which is quite possible, if implemented correctly). A lot of the methods covered in this article you may already know, but perhaps haven't really found any decent use for up until now. Other things (hopefully some things) you may not have even thought of as yet, but will find useful.

Let's start with the basic problem: we want to create an 8-bit game, but not have it look like it was made back in the 80's (bright, distinct, ugly colors). That's what we'd get if we went with a standard 332 palette for our game (3 bits for red, 3 for green, and 2 for blue). A 332 palette has 256 colors (duh), but most of those colors are spread out into a range of colors that just aren't very useful and/or don't fit together well (for instance there are 7 shades of pure red, but only 2 shades of gray [not counting black and white]). This palette does have one primary advantage, though: all of the blend operations that can be performed in 16+ bit modes can be performed equally well in this mode (with notable gradient loss, however). This is definitely the easiest method of implementing translucency and shading in an 8-bit game, but the detail loss is prohibitively large. I would not suggest ever using this method but for the most simple of games. Anyways, these days it is all about visual quality, and users just won't stand for primitive graphics, even if it does mean reduced hardware requirements.

The only alternative is to use a palette optimized for the graphics in your game. Creating an optimized palette is a fairly simple task and is handled by most graphics manipulation packages (Photoshop, Debabelizer, etc.). I believe there is also quite a bit of shareware that will allow you to do this (correct me if I am wrong). With an optimized palette, the game will suddenly look a whole lot better. Even if the textures vary drastically in colors from one another, an optimized palette will allow for much better quality than any form of fixed palette (such as 332). But then, how do we implement translucency and shading using an optimized palette? Unlike 332 mode, there is no quick mathematical trick to determine what the blend of two colors will be, so there is no efficient way to do translucency and shading. Of course, what I just said is not completely true, otherwise I wouldn't be writing this article.

There are actually two ways (that I know about) of blending two colors together from an arbitrary optimized palette in an efficient manner (ie. without searching through the entire palette for the closest blend match during every color blend). One is to use an ordered optimized palette, and the other is to use a 2d blend look-up table. The former entails creating the optimized palette, then sorting the entries in such a manner that allows two color indices to be combined via some mathematical operation between the two (preferably a simple one, like an add or an or), the result of which will be another palette index. This method is, however, quite inaccurate with most palettes (although completely accurate with some palettes), and is inflexible. Since I decided not to use this technique, I'll not be covering it here. The latter involves precomputing a look-up table for all of the possible blends between two color indices. This method has the benefit that it is 100% flexible (*all* possible blends between two palette indices can be described by the table), and that it is fairly fast. It is not perfect, however, since it requires 256x256 (64K) entries for each blend type, which has the dual effect of being hard on memory conservation and cache efficiency. Both of these problems will be discussed as well.

The easy part -- implementation. For every polygon or sprite we wish to draw translucently, there is a translucency map associated with it. To keep things simple, the translucency map is allocated as single linear 64K byte array. This linear array represents a 2d 256x256 array, so in order to index an element at row r and column c we simply need to shift the most significant index left by 8, like so: A[(r<<8)+c] (this is assuming that the array is stored such that the first row is 0-255, second row is 256-511, etc.). This is an useful operation, so lets just make a macro for it:


#define INDEX_TRANSMAP(T, r, c)		(T)[((r)<<8)+(c)]

From this point on, all a translucency blend requires is the following: *destinationPtr = INDEX_TRANSMAP(transMap, *sourcePtr, *destinationPtr); If, for example, the translucency map transMap describes an average blend, the output value of INDEX_TRANSMAP will be the closest match in the palette to that blend. Although this does not guarantee the optimal output, with most palettes it will be pretty damn close. The only thing left is generating the translucency map. This too, is a fairly simple task, but should not be done at run-time, since generation can take quite a while (it's O(n^3), where n is likely 256, the size of the palette). Basically, given a source palette and destination palette (they need not be the same) for every source/destination index pair, the blend is computed and the destination palette is searched for the closest match to the output. The only thing that needs to be specified is how each pair is to be blended. Thankfully, I already have a decent utility for creating translucency maps that should be included at the end of this article. It is capable of setting opacity percentages on a per-color basis, defining custom blend equations, and a preview window of the resulting blend. Below I've listed some of the pros and cons of using a translucency map.

Pros

Any blend method possible within the palette can be represented with a blend look-up table.

The calculation time is constant regardless of what kind of blend is performed. In non-paletted blends you are limited to blending operations that can be performed in real-time, like adds, ors, multiplies, etc. Anything more complex, such as divides and non-linear operations would usually be considered too expensive. With a translucency map, no such constraint exists.

Source and destination color keying is unnecessary since it can be encoded directly into the translucency map. This eliminates one test and jump from the innermost loop of the drawing routine. Depending on how tight the loop is, this can make a big difference. For instance, in my own blitting code, translucent blits were 30% to 50% faster than plain (non-translucent) color-keyed blits. Of course, results may vary...

Simulated alpha channel. If the sprites/textures maintain a sort of color/translucency value consistency, a translucency map can be created that allows certain portions of the sprite/texture to be more or less opaque than other portions. For example, say we had a bitmap of flames rendered against a black background. We could apply a translucency map to it that forced the higher-intensity areas of the bitmap to be more opaque than the less intense areas. Anything completely black would be transparent, while the flames would solidify near the brighter center portions. Granted, this is not really alpha channel, but in many cases it looks very nice.

Light maps. Grayscale bitmaps can be used with a translucency map that uses the source pixel's intensity as an intensity modifier for the destination pixel (either a lighten or a darken amount).

Gouraud shading for RGB or grayscale vertex colors. Instead of using INDEX_TRANSMAP(transMap, *sourcePtr, *destinationPtr) for the blend method, you can use INDEX_TRANSMAP(transMap, *sourcePtr, interpolatedColor), where interpolatedColor is the color interpolated from the vertex colors during Gouraud shading. The vertex colors can be represented by any fixed RGB palette like 332 or by grayscale, and interpolated just like vertex colors in non-paletted modes. RGB vertex colors sacrifice granularity for color (which isn't so bad since it's used on top of a texture), while grayscale sacrifices colored light for smoother gradients (which isn't so bad either, since colored gouraud shading doesn't look that great in any mode). The vertex colors can also be used with different shading blend maps to allow for different effects (such as sharper lights or fades from one color to a different color).

Fog. The depth value of a pixel can be used as an index into a translucency map that fades to the fog color.

Anti-aliased fonts. Take white fonts anti-aliased against a black background, then use a translucency map that sets output opacity based on input intensity. Voila, anti-aliased fonts. Different translucency maps can be used to change the color and general look of the font.

Cons

Translucency map size. Each translucency map is 64K... That means that having just 16 of them would take up a meg of memory. Eh, who cares? I'd be hard pressed to need a whole 16 separate translucency maps, and even if I did, that's only a small portion of what my game would take up. Memory gets cheaper all the time.

Cache coherency. This is a tricky one. If you're blasting the cache already, there really isn't much need to worry about the translucency map blasting it as well. If you aren't then this could be a problem, but still may not be. I guess you'd really have to performance analyze it to be sure. At any rate, on my old P166 this didn't seem to be a problem.

Discrete translucency levels. Each translucency blend type requires a separate translucency map, and you can only have a finite number of them. I'd say this would be a problem if you were running in true color, where discrete levels of color in general aren't too noticable. Since this is in paletted mode, though, finer levels of translucency don't really help very much, since there are only so many colors in the palette to go around.

Destination reads. Since a destination read is required for every pixel, on-hardware surfaces are a no no. Video memory reads are prohibitively slow, so all of the images/textures, including the backbuffer, must be in system memory. Of course, this is generally the case for any sort of non-hardware supported blends, since destination surface reads are required regardless of display mode.

Closing

Ok, now I'm getting tired of writing this... so I hope I've gotten most of the relevant details across. If I haven't, then feel free to email me at jd_wild@uclink4.berkeley.edu with any questions. I doubt many will use this information since everybody seems to concentrate mostly on 3d hardware these days, but just on the off-chance that someone out there is still trying to make a good ole 2d game (or even a 3d game with modest hardware requirements), I thought I'd share this.

Download article_transmaps.zip (34k)

Addendum

Well, it has been brought to my attention that a little more detailed description of the translucency map implementation might be helpful. This is my own fault, I should have realized this before. I'm sorry if I over-simplified the problem, but I'll attempt to remedy that here.

First, I would really suggest downloading article_transmaps.zip, as experimenting with different translucency maps can give you an idea of what can be done with them. If you're interested in generating translucency maps yourself (without this tool), then you are pretty much on your own, because I will not be covering the topic here any further. However, if you'd really like to check out how it is done for yourself, just email me with a request for the utility source code -- I'll be glad to send it your way.

As for actually using translucency maps, there are a number of ways that it can be done. The basic mechanism is quite simple. Whenever performing any blitting or rasterizing operation, the main purpose of the routine you're working with is to write bits (presumably the correct ones) to the destination surface. In a standard blit, this would merely require copying lengths of scanlines of a bitmap to a destination area of the same size, like so (assume that all surfaces being used are 8-bit):


for (int count = 0; count < srcHeight; count++)
{
	memcpy(dstPtr, srcPtr, srcWidth);
	srcPtr += srcPitch;
	dstPtr += dstPitch;
}

This loop iterates through the entire height of the source image, each time copying the line of pixels to the destination with memcpy. Note: pitch refers to the actual byte-width of the surfaces being used, which need not be the same as the blit width (and usually won't be). But this function is only useful for straight, unaltered blits from one surface to another. It is thus not very helpful to us. If we wish to alter the bitmap during blit on a per-pixel level, as we would need to for color-keyed blits, we'll need to handle each pixel individually, like so:


unsigned char* finalSrcPtr = srcPtr + (srcHeight * srcPitch);
unsigned char* endLineSrcPtr = srcPtr + srcWidth - srcPitch;
int srcIncrement = srcPitch - srcWidth;
int dstIncrement = dstPitch - srcWidth;
for (; srcPtr < finalSrcPtr; srcPtr += srcIncrement, dstPtr += dstIncrement)
{
	endLineSrcPtr += srcPitch;
	for (; srcPtr < endLineSrcPtr; srcPtr++, dstPtr++)
	{
		if (*srcPtr == colorKey) continue;  // Skip write if current pixel is of color colorKey
		*dstPtr = *srcPtr;
	}
}

The code above is more complicated than it needs to be due to it being somewhat optimized. The basic premise is the same as the previous though, except that instead of using memcpy to copy an enter row all at once, we iterate through each byte (pixel) in the row one at a time. This is slower than the previous method, since pentium string instructions cannot be used to copy the entire line at once, but we have to do this if we wish to analyze the surface on a per-pixel basis. For color-keying we do just that. Before it is written, each pixel is first checked to see if it is the same as the color key. If it is, we skip it.

Now to the point. If we wish to do translucency map blends all we need change is: one, remove the color-key check because it will now be redundant, and two, replace the copy "*dstPtr = *srcPtr;" with "*dstPtr = INDEX_TRANSMAP(transMap, *srcPtr, *dstPtr);". The resulting code is:


#define INDEX_TRANSMAP(T, r, c) (T)[((r)<<8)+(c)]

unsigned char* finalSrcPtr = srcPtr + (srcHeight * srcPitch);
unsigned char* endLineSrcPtr = srcPtr + srcWidth - srcPitch;
int srcIncrement = srcPitch - srcWidth;
int dstIncrement = dstPitch - srcWidth;
for (; srcPtr < finalSrcPtr; srcPtr += srcIncrement, dstPtr += dstIncrement)
{
	endLineSrcPtr += srcPitch;
	for (; srcPtr < endLineSrcPtr; srcPtr++, dstPtr++)
	{
		*dstPtr = INDEX_TRANSMAP(transMap, *srcPtr, *dstPtr);
	}
}

Thus, each destination pixel will now be set to the blend (as defined by transMap) of the source pixel with the destination pixel. Depending on what you've defined transMap to be, this single blit function might be used for average blits, color-keyed blits, additive blits, whatever. The blit function now requires a couple extra reads (one from the destination surface, and another from the translucency map), a shift and two adds. This speed loss is mostly (if not completely) offset by the speed gained from removing the conditional (which, in tight loops, can tend to muck up the pipeline). And if you're still worried about speed, you can unroll the loop to make it a bit faster still (which from my tests gains about 15% to 20% in performance).

Now, how do I apply this to my 3d game? Simple. In the polygon rasterizer there should be a line very similar to the copy in the color-keyed blit, "*dstPtr = *srcPtr;". To get the same effects available with translucent sprite blits, all we need to do is replace that line the same way we replaced the that line in the blit routine. dstPtr will remain the same (it is just a pointer to a byte in the destination surface), but now srcPtr may refer to a location in texture memory (or, in the case of non-texture polygons, can be merely replaced by a color). Now each pixel drawn for the polygon will be blended with the background as defined by the translucency map being used.

For effects other than translucency, we just alter the parameters being blended. For example, if we want to implement gouraud shading, we'd use INDEX_TRANSMAP(transMap, *srcPtr, interpolatedColor), where interpolatedColor is the color (either grayscale or RGB, scaled to the range 0 to 255) interpolated from the vertex colors during Gouraud shading. You may have noticed, however, that this method has one limitation: you cannot have a translucent Gouraud shaded polygon. You can have one or the other, but not both. But, if we're willing to sacrifice a little more speed, it is quite possible to accumulate translucency map blends, like so: "INDEX_TRANSMAP(transMap1, INDEX_TRANSMAP(transMap2, srcColor, blendColor), dstColor);". This has the affect of first blending srcColor with blendColor (which could be the Gouraud texture/interpolated color blend), then feeding that output to another blend with dstColor (which could be the translucency blend). Thus, for a little speed loss, we can do two blends at once. Of course, there is no reason why more than two couldn't be done at a time.

I hope this helps explain a little more clearly the ideas I brought up in my article. The other techniques I mentioned (such as anti-aliased fonts, lightmaps, and simulated alpha) are all implemented with the standard source/destination blend detailed. Of course, there are many other things that can be done with translucency maps, so I encourage those who are interested to experiment. As before, if you have any questions or comments, email me: jd_wild@uclink4.berkeley.edu.