flipcode - Thoughts On Visibility Determination

Thoughts On Visibility Determination - In Dynamic, Semi-Static, and Static Scenes
by (16 June 1999)

Introduction

A few months ago, I released a document about visibility determination for arbitrary scenes, using an octree for spatial subdivision and a c-buffer for occlusion culling. The idea was to find unprocessed octree nodes using the gaps in the c-buffer, by casting rays through these open spaces. Clipping against octree node boundaries wasn't neccessary, so I thought.

I got feedback, from Paul Nettle, Edward Kmett, Conor Stokes and Jaap Suter. The algorithm doesn't work. Since then, I discussed many techniques with many people, read many docs, and generally got wiser. :)

And now the time has come to correct the previous document. In this new document I describe a technique for visibility determination that should work. I emphasize that none of the described techniques are new, possibly even the combination of techniques I present here is not new. Neither can I guarantee that it will work, although I'm pretty sure it's right this time. Finally, I do not guarantee that this is the perfect solution (I present two solutions, by the way, each with it's own weaknesses), so, in short, this document is a discussion of a potentially interesting algorithm, and it's purpose is to give the discussion about visibility algorithms an impulse.

Spatial Subdivision

The two algorithms that I present here both use the octree (or kd-tree) for spatial subdivision. A short description of both subdivision algorithms:

Octree

First, the bounding box of the entire scene is determined. This box is enlarged so that it becomes a perfect cube (some octree implementations do not do this). This box becomes the root node. If a node (including the root node) contains more than a specified number of polygons (e.g., 30-50) the node is split in 8 equally sized child nodes. Now the node doesn't contain polygons anymore, but only 8 child nodes. The polygon set is distributed over the child nodes, and the child nodes are subdivided again if they still contain more than the specified amount of polygons. Besides using a fixed number of polygons as a termination criterium for node subdivision, other criterea can be used as well: For example a minimum node size, or a maximum recursion depth.

Kd-Tree

The kd-tree looks very much like the octree data structure. This time however, the root node (which doesn't have to be a perfect cube anymore) is not split in 8 childnodes, but it is split in half, using alternating x/y/z-axis aligned splitting planes. The splitting planes do not neccessarily split a node in half, but rather in two halves containing about the same amount of polygons. >From here on, I shall consider the octree and kd-tree as two almost identical structures, although I consider the kd-tree to be a better spatial subdivision. The two techniques have so much in common, that choosing one is really not important for the two algorithms.

Rendering The Tree

A kd-tree (or an octree, kd-tree assumed from here on) can be traversed front-to-back relatively easily. Traversing is performed using a recursive function that determines at which side of a splitting plane the camera is located; these splitting planes are generated when the octree or kd-tree is compiled - a kd-tree contains a single splitting plane per node, the octree contains three splitting planes.

For octree traversal, check each of the three planes in turn to find the correct order to process the child nodes. The 'camera side' of each plane is processed first, then the opposite side. This way, we quickly find the kd-tree node that contains the camera. This is obviously the closest node, so we process this node first. This node also has no child nodes, so the recursive function returns, and the opposite side of the last encountered splitting plane is processed. This way we find the next closest node. Repeating this process, we visit every node, in the correct order. Note that to traverse the trees in back-to-front order, we simply pick the side of the plane that the camera is NOT in: Now the node with the camera will be the last node that is visited. This might be usefull in some cases (but not in the two algorithms that I discuss below).

Also note that neither the kd-tree nor the octree provide us with any kind of visibility information; they only provide us with the nodes sorted by depth. This even doesn't mean that the polygons are sorted: The polygons in the nodes still need some processing if we want them perfectly sorted.

Algorithm I

Using the octree or kd-tree as the basis of the algorithm, we can now start performing visibility tests to terminate the traversing of the tree when a processed node would not contribute to the image on screen (i.e., it's completely obscured by previously drawn polygons in previously processed nodes). By discarding nodes we can skip large portions of the scene - a discarded node may have many child nodes. This algorithm assumes no clipping of polygons to node boundaries. Since we can now not guarantee correct sorting of the polygons, we need to fix this somehow. In hardware, we could simply use the z-buffer. For software, Paul Nettle's s-buffer is a better choice. The s-buffer allows random- order insertion of polygons. Inserted polygon spans are clipped against existing spans, so that the s-buffer always contains only visible portions of the inserted spans. In case you are not familiar with the s-buffer, I suggest you read Paul's document.

The good thing of the s-buffer is that we can also use it to test whether a polygon would change the s-buffer if we inserted it. This means that we now have a basic visibility test for the polygons: A polygon that changes the s-buffer is visible, all other polygons are invisible. Instead of testing polygons against the s-buffer, we can also test node boundaries against the s-buffer: Simply insert the imaginary octree node faces as if they where polygons, and see if the s-buffer would change. Now we do have a good termination criterium for octree traversal: An invisible node can't have visible child nodes or polygons. Note that we cannot use the hardware z-buffer for this: There is no (fast) way to test whether a polygon would change the z-buffer. So, for hardware rendering, simply maintain a software s-buffer too. This sounds expensive, but for hardware rendering, we do not have to perform span rasterization, so the time needed to insert polygons in the s-buffer is not the same as performing software rendering.

Notes On Algorithm I

This algorithm has some interesting characteristics. First, the octree (or kd-tree) is very easy to construct, since no splitting occurs. In fact, building the tree can be done while loading the scene from disk. It's also very easy to rebuild parts of the map at runtime: Polygons can thus be moved freely around. This means that there is no need to have a distinction between moving objects (actors) and static scenery, and this implies that moving objects will occlude geometry (nodes) the same way as the static scenery does. The s-buffer is expensive, especially for complex scenes, since it gets slower when more spans are already in the lists. But the algorithm does support very 'nasty' scenes, like imported 3DSMax files with very small polygons, intersecting polygons, tiny gaps and so on. Everything will be sorted out by the s-buffer. And, a hole will only cause further traversal of octree nodes when it is at least a single pixel on-screen: That means that the occlusion is pixel-based. For very large, truely dynamic scenes with relative low local complexity, this seems to be a good algorithm. Note that it is also very simple to implement. The s-buffer insertion routine is probably the hardest part. Finally note that this algorithm will perform relatively better in software than in hardware, since the overhead for hardware rasterization is rather big.

Algorithm II

There exists an alternative for the s-buffer. This alternative is called the 'c-buffer', which stands for 'coverage buffer'. It has several advantages over the s-buffer, but is also limited in some ways: The c-buffer represents the covered pixels on the screen. Inserted spans may not be (partially) in front of existing spans, since no depth information is stored in the c-buffer. The c-buffer assumes that inserted spans are already drawn on the screen, so no texture info is stored. The only thing that the c-buffer does is clipping newly inserted spans against existing spans, because spans are considered to be inserted front-to-back. The c-buffer is therefore much faster: It's insertion routine performs simpler clipping, and stores less, and inserted spans are never removed because of newly inserted spans. Since the c-buffer does not have any information about the contents of the spans except for which pixels are covered, spans can be merged when new spans touch existing spans. That means that a scene that fills the entire screen will result in a c-buffer with a single span per screen line. The c-buffer performs thus much better with scenes that have many polygons, especially when these polygons are inserted with some spatial coherence. A good thing is that we have some spatial coherence: Polygons are inserted per octree node, and it's therefore unlikely that they are far apart.

There is one big problem with the c-buffer: It requires the spans to be inserted in the perfect front-to-back order. This means that we have to split polygons against octree node boundaries, and that the polygons inside the node need to be sorted perfectly. Painters is not enough here: We need so-called "mini-bsp's" for each node. Now the scene compilation process becomes much more time consuming: Each inserted polygon is clipped against every node that it is in, and once a node contains 50 polygons or less, these 50 polygons need to be processed using a BSP compiler. The c-buffer can be used to test for (partially) visible nodes, just like the s-buffer. So, now we have a cheaper algorithm for visibility calculations, and we can still use it for 'dirty' scenes, with intersecting polygons, tiny holes and so on, but at a cost: The scene needs quite a bit of preprocessing. It should still be possible to perform this processing while loading the scene.

Notes On Algorithm II

Compiling a random scene to an octree with splitted polygons and perfect sorting is a relatively expensive process. It is obvious that this results in a much less dynamic scene: Recalculating parts of the scene is possible, but it takes some time. If many nodes are affected by the changes, this might take too much time to call the renderer 'real-time'. In practise, I think it is possible to process explosion damage and one-time changes in the geometry, like adding buildings, but it is not very well possible to move large things smoothly through the scene.

It is probably best to move the dynamic parts of the scene out of the basic algorithm, and draw them using z-buffering once the scene is finished. This still allows modifications to the static scenery, like the explosion damage I mentioned. It also allows for real dynamic objects that animate and/or move every frame. These objects cannot occlude nodes however, since they are not considered while traversing the octree. The price for inserting them in the octree is probably higher, though.

Note that dynamic objects can still be occluded by the environment: By placing an actor in the octree nodes that intersect it's bounding box or -sphere, the actor can be discarded if none of the nodes that it is in are marked as visible.

This algorithm should perform much better in terms of rendering speed: Maintaining the c-buffer is considerably cheaper than maintaining an s-buffer, and drawing the actors once the scenery is drawn using the z-buffer is probably cheaper than adding 500 or more actor polygons to the s-buffer every frame (not to mention the changes in the octree).

This algorithm is also better suited for hardware rendering, since the z-buffer in hardware is virtually 'free'. Just rendering actors, and discarding the possible occlusion of nodes caused by their presence is very likely faster than the full occlusion culling that the previous algorithm provides.

Comparison Of Algorithm I & II

Both algorithms are interesting ways to perform visibility calculations for rendering 3D scenes at interactive rates. Neither algorithm 1 nor algorithm 2 return a perfect visibility test: Algorithm 1 does however return a 'perfect set' for a raster image. This perfection comes at a price, though, especially when actors with high polygon counts are inserted into the s-buffer. Note that really dynamic scenes are rare: Not every game requires lots of moving polygons. Most scenes still consist of mainly static polygons, and that's not just because engines can't handle more dynamic scenery. Note that true dynamic scenes make various other often used techniques impossible: Lightmaps generated using radiosity are usually not calculated at runtime. Both algorithms provide great freedom for the input set. The algorithms do not rely on perfectly sealed worlds or precisely aligned volumes, like Quake does. They simply render the input. This can ease the creation of scenes for computer games dramatically.

Extension Of Algorithm II

The final part of this document describes a highly experimental extension of the second algorithm. I must emphasize, even more than at the beginning of this document, that this is not tested, it's just interesting, I hope. Before Chris Babcock (another cool 3D researcher) explained to me that octree boundary planes can be tested against the s- and c-buffers, I tried to cull nodes against a beamtree. This beamtree was to be filled with polygons from the mini-bsp's at the final nodes of the kd-tree. This approach introduces many problems, although it would be resolution independent, which is a very nice feature. Edward Kmett pointed out that especially bad input sets and numerical instability would cause great problems with the beamtree. However, it is still possible to extend the second algorithm with this beamtree, as Conor Stokes explained to me this morning: If we build the beamtree using only the 'large occluders', we can first test the nodes against the beamtree (effectively, see if they are behind one or more of the solid regions in the beamtree), and if this test fails, we can still test the nodes against the s- or c-buffer. Good thing about this approach is that testing a node against a simple beamtree is much cheaper than testing the node faces against the s- or c-buffer. Bad thing is that when the node is visible anyway, we have just wasted extra time. This obviously requires some experiments.

The first thing that I asked Conor Stokes was: "What is a big occluder?" - A small polygon can be a large occluder when it is very close, and a large polygon can be a small occluder when it's far away, or looked at from a big angle. The answer is simple: Count the pixels that it covers in the c-buffer. If it exceeds a fixed constant, add it to the beamtree.

Another Addition: Mixing 1 And 2

Martin Hristov just sent me a note: It is possible to mix algorithm one and two. The c-buffer needs perfectly sorted polygons, and a mini-bsp can do that. However, the mini-bsp makes it very hard to modify a node at run-time. There's however another technique for getting perfect sorted spans to the c-buffer: The s-buffer... That's right, we can use a 'local s-buffer' at the octree node level for drawing just the spans in the octree node, and once this set of spans is completed, it can be sent to the screen and the c-buffer. Now we have a faster s-buffer, because it always contains as much polygons as an octree node contains (rather than all the polygons that are visible at once), and the nodes are much more dynamic. We still need to split polygons against nodes, but that can be done at run-time, much faster than recalculating the whole mini-bsp. Also note that ONLY nodes with moving polygons need to be processed using the local s-buffer technique: All other nodes can still be handled with a mini-bsp. This enhancement makes algorithm 2 better than algorithm 1 in most cases. Only in case a lot of polygons are moving around so that even just splitting is too much, the use of a global s-buffer becomes interesting.

Final Words

The algorithms that I presented here both provide a way to perform visibility calculations for arbitrary input sets with low local complexity. The second algorithm scales well with increasing complexity, but it is not resolution independent, and will therefore be considerably slower at higher resolutions. The first algorithm allows big changes in the scene at runtime. Both algorithms are relatively easy to implement. If you have any thoughts that you wish to share, or if you see any problems that I overlooked, please contact me:

E-mail: phantom@goright.to

Happy coding,
Jacco Bikker - a.k.a. "THE PHANTOM"

Article Series:

Thoughts On Visibility Determination - In Dynamic, Semi-Static, and Static Scenes