Thoughts On Visibility Determination - In Dynamic, Semi-Static, and Static Scenes
by (16 June 1999)
|Return to The Archives
A few months ago, I released a document about visibility
determination for arbitrary scenes, using an octree for
spatial subdivision and a c-buffer for occlusion culling.
The idea was to find unprocessed octree nodes using the
gaps in the c-buffer, by casting rays through these open
spaces. Clipping against octree node boundaries wasn't
neccessary, so I thought.
I got feedback, from Paul Nettle, Edward Kmett, Conor Stokes and Jaap Suter. The algorithm doesn't work. Since then, I discussed many techniques with many people, read many docs, and generally got wiser. :)
And now the time has come to correct the previous document. In this new document I describe a technique for visibility determination that should work. I emphasize that none of the described techniques are new, possibly even the combination of techniques I present here is not new. Neither can I guarantee that it will work, although I'm pretty sure it's right this time. Finally, I do not guarantee that this is the perfect solution (I present two solutions, by the way, each with it's own weaknesses), so, in short, this document is a discussion of a potentially interesting algorithm, and it's purpose is to give the discussion about visibility algorithms an impulse.
The two algorithms that I present here
both use the octree (or kd-tree) for spatial subdivision.
A short description
of both subdivision algorithms:
First, the bounding box of the entire scene is determined. This box is enlarged so that it becomes a perfect cube (some octree implementations do not do this). This box becomes the root node. If a node (including the root node) contains more than a specified number of polygons (e.g., 30-50) the node is split in 8 equally sized child nodes. Now the node doesn't contain polygons anymore, but only 8 child nodes. The polygon set is distributed over the child nodes, and the child nodes are subdivided again if they still contain more than the specified amount of polygons. Besides using a fixed number of polygons as a termination criterium for node subdivision, other criterea can be used as well: For example a minimum node size, or a maximum recursion depth.
The kd-tree looks very much like the octree data structure. This time however, the root node (which doesn't have to be a perfect cube anymore) is not split in 8 childnodes, but it is split in half, using alternating x/y/z-axis aligned splitting planes. The splitting planes do not neccessarily split a node in half, but rather in two halves containing about the same amount of polygons. >From here on, I shall consider the octree and kd-tree as two almost identical structures, although I consider the kd-tree to be a better spatial subdivision. The two techniques have so much in common, that choosing one is really not important for the two algorithms.
Rendering The Tree
A kd-tree (or an octree, kd-tree assumed from here on) can
be traversed front-to-back relatively easily. Traversing
is performed using a recursive function that determines at
which side of a splitting plane the camera is located;
these splitting planes are generated when the octree or
kd-tree is compiled - a kd-tree contains a single splitting
plane per node, the octree contains three splitting planes.
For octree traversal, check each of the three planes in turn to find the correct order to process the child nodes. The 'camera side' of each plane is processed first, then the opposite side. This way, we quickly find the kd-tree node that contains the camera. This is obviously the closest node, so we process this node first. This node also has no child nodes, so the recursive function returns, and the opposite side of the last encountered splitting plane is processed. This way we find the next closest node. Repeating this process, we visit every node, in the correct order. Note that to traverse the trees in back-to-front order, we simply pick the side of the plane that the camera is NOT in: Now the node with the camera will be the last node that is visited. This might be usefull in some cases (but not in the two algorithms that I discuss below).
Also note that neither the kd-tree nor the octree provide us with any kind of visibility information; they only provide us with the nodes sorted by depth. This even doesn't mean that the polygons are sorted: The polygons in the nodes still need some processing if we want them perfectly sorted.
Using the octree or kd-tree as the basis of the algorithm, we
can now start performing visibility tests to terminate the
traversing of the tree when a processed node would not
contribute to the image on screen (i.e., it's completely
obscured by previously drawn polygons in previously processed
nodes). By discarding nodes we can skip large portions of
the scene - a discarded node may have many child nodes.
This algorithm assumes no clipping of polygons to node
boundaries. Since we can now not guarantee correct sorting of
the polygons, we need to fix this somehow. In hardware, we
could simply use the z-buffer. For software, Paul Nettle's
s-buffer is a better choice. The s-buffer allows random-
order insertion of polygons. Inserted polygon spans are
clipped against existing spans, so that the s-buffer
always contains only visible portions of the inserted spans.
In case you are not familiar with the s-buffer, I suggest
you read Paul's document.
The good thing of the s-buffer is that we can also use it to test whether a polygon would change the s-buffer if we inserted it. This means that we now have a basic visibility test for the polygons: A polygon that changes the s-buffer is visible, all other polygons are invisible. Instead of testing polygons against the s-buffer, we can also test node boundaries against the s-buffer: Simply insert the imaginary octree node faces as if they where polygons, and see if the s-buffer would change. Now we do have a good termination criterium for octree traversal: An invisible node can't have visible child nodes or polygons. Note that we cannot use the hardware z-buffer for this: There is no (fast) way to test whether a polygon would change the z-buffer. So, for hardware rendering, simply maintain a software s-buffer too. This sounds expensive, but for hardware rendering, we do not have to perform span rasterization, so the time needed to insert polygons in the s-buffer is not the same as performing software rendering.
Notes On Algorithm I
|This algorithm has some interesting characteristics. First, the octree (or kd-tree) is very easy to construct, since no splitting occurs. In fact, building the tree can be done while loading the scene from disk. It's also very easy to rebuild parts of the map at runtime: Polygons can thus be moved freely around. This means that there is no need to have a distinction between moving objects (actors) and static scenery, and this implies that moving objects will occlude geometry (nodes) the same way as the static scenery does. The s-buffer is expensive, especially for complex scenes, since it gets slower when more spans are already in the lists. But the algorithm does support very 'nasty' scenes, like imported 3DSMax files with very small polygons, intersecting polygons, tiny gaps and so on. Everything will be sorted out by the s-buffer. And, a hole will only cause further traversal of octree nodes when it is at least a single pixel on-screen: That means that the occlusion is pixel-based. For very large, truely dynamic scenes with relative low local complexity, this seems to be a good algorithm. Note that it is also very simple to implement. The s-buffer insertion routine is probably the hardest part. Finally note that this algorithm will perform relatively better in software than in hardware, since the overhead for hardware rasterization is rather big.
There exists an alternative for the s-buffer. This
alternative is called the 'c-buffer', which stands for
'coverage buffer'. It has several advantages over the
s-buffer, but is also limited in some ways:
The c-buffer represents the covered pixels on the screen.
Inserted spans may not be (partially) in front of existing
spans, since no depth information is stored in the c-buffer.
The c-buffer assumes that inserted spans are already drawn
on the screen, so no texture info is stored. The only thing
that the c-buffer does is clipping newly inserted spans
against existing spans, because spans are considered to be
inserted front-to-back. The c-buffer is therefore much
faster: It's insertion routine performs simpler clipping,
and stores less, and inserted spans are never removed
because of newly inserted spans. Since the c-buffer does
not have any information about the contents of the spans
except for which pixels are covered, spans can be merged
when new spans touch existing spans. That means that a
scene that fills the entire screen will result in a c-buffer
with a single span per screen line. The c-buffer performs
thus much better with scenes that have many polygons,
especially when these polygons are inserted with some
spatial coherence. A good thing is that we have some spatial
coherence: Polygons are inserted per octree node, and it's
therefore unlikely that they are far apart.
There is one big problem with the c-buffer: It requires the spans to be inserted in the perfect front-to-back order. This means that we have to split polygons against octree node boundaries, and that the polygons inside the node need to be sorted perfectly. Painters is not enough here: We need so-called "mini-bsp's" for each node. Now the scene compilation process becomes much more time consuming: Each inserted polygon is clipped against every node that it is in, and once a node contains 50 polygons or less, these 50 polygons need to be processed using a BSP compiler. The c-buffer can be used to test for (partially) visible nodes, just like the s-buffer. So, now we have a cheaper algorithm for visibility calculations, and we can still use it for 'dirty' scenes, with intersecting polygons, tiny holes and so on, but at a cost: The scene needs quite a bit of preprocessing. It should still be possible to perform this processing while loading the scene.
Notes On Algorithm II
Compiling a random scene to an octree with splitted
polygons and perfect sorting is a relatively expensive
process. It is obvious that this results in a much less
dynamic scene: Recalculating parts of the scene is possible,
but it takes some time. If many nodes are affected by the
changes, this might take too much time to call the renderer
'real-time'. In practise, I think it is possible to process
explosion damage and one-time changes in the geometry, like
adding buildings, but it is not very well possible to move
large things smoothly through the scene.
It is probably best to move the dynamic parts of the scene out of the basic algorithm, and draw them using z-buffering once the scene is finished. This still allows modifications to the static scenery, like the explosion damage I mentioned. It also allows for real dynamic objects that animate and/or move every frame. These objects cannot occlude nodes however, since they are not considered while traversing the octree. The price for inserting them in the octree is probably higher, though.
Note that dynamic objects can still be occluded by the environment: By placing an actor in the octree nodes that intersect it's bounding box or -sphere, the actor can be discarded if none of the nodes that it is in are marked as visible.
This algorithm should perform much better in terms of rendering speed: Maintaining the c-buffer is considerably cheaper than maintaining an s-buffer, and drawing the actors once the scenery is drawn using the z-buffer is probably cheaper than adding 500 or more actor polygons to the s-buffer every frame (not to mention the changes in the octree).
This algorithm is also better suited for hardware rendering, since the z-buffer in hardware is virtually 'free'. Just rendering actors, and discarding the possible occlusion of nodes caused by their presence is very likely faster than the full occlusion culling that the previous algorithm provides.
Comparison Of Algorithm I & II
|Both algorithms are interesting ways to perform visibility calculations for rendering 3D scenes at interactive rates. Neither algorithm 1 nor algorithm 2 return a perfect visibility test: Algorithm 1 does however return a 'perfect set' for a raster image. This perfection comes at a price, though, especially when actors with high polygon counts are inserted into the s-buffer. Note that really dynamic scenes are rare: Not every game requires lots of moving polygons. Most scenes still consist of mainly static polygons, and that's not just because engines can't handle more dynamic scenery. Note that true dynamic scenes make various other often used techniques impossible: Lightmaps generated using radiosity are usually not calculated at runtime. Both algorithms provide great freedom for the input set. The algorithms do not rely on perfectly sealed worlds or precisely aligned volumes, like Quake does. They simply render the input. This can ease the creation of scenes for computer games dramatically.
Extension Of Algorithm II
The final part of this document describes a highly
experimental extension of the second algorithm. I must
emphasize, even more than at the beginning of this document,
that this is not tested, it's just interesting, I hope.
Before Chris Babcock (another cool 3D researcher) explained
to me that octree boundary planes can be tested against the
s- and c-buffers, I tried to cull nodes against a beamtree.
This beamtree was to be filled with polygons from the
mini-bsp's at the final nodes of the kd-tree. This approach
introduces many problems, although it would be resolution
independent, which is a very nice feature. Edward Kmett
pointed out that especially bad input sets and numerical
instability would cause great problems with the beamtree.
However, it is still possible to extend the second algorithm
with this beamtree, as Conor Stokes explained to me this
morning: If we build the beamtree using only the 'large
occluders', we can first test the nodes against the beamtree
(effectively, see if they are behind one or more of the
solid regions in the beamtree), and if this test fails, we
can still test the nodes against the s- or c-buffer.
Good thing about this approach is that testing a node
against a simple beamtree is much cheaper than testing the
node faces against the s- or c-buffer.
Bad thing is that when the node is visible anyway, we have
just wasted extra time. This obviously requires some
The first thing that I asked Conor Stokes was: "What is a big occluder?" - A small polygon can be a large occluder when it is very close, and a large polygon can be a small occluder when it's far away, or looked at from a big angle. The answer is simple: Count the pixels that it covers in the c-buffer. If it exceeds a fixed constant, add it to the beamtree.
Another Addition: Mixing 1 And 2
|Martin Hristov just sent me a note: It is possible to mix algorithm one and two. The c-buffer needs perfectly sorted polygons, and a mini-bsp can do that. However, the mini-bsp makes it very hard to modify a node at run-time. There's however another technique for getting perfect sorted spans to the c-buffer: The s-buffer... That's right, we can use a 'local s-buffer' at the octree node level for drawing just the spans in the octree node, and once this set of spans is completed, it can be sent to the screen and the c-buffer. Now we have a faster s-buffer, because it always contains as much polygons as an octree node contains (rather than all the polygons that are visible at once), and the nodes are much more dynamic. We still need to split polygons against nodes, but that can be done at run-time, much faster than recalculating the whole mini-bsp. Also note that ONLY nodes with moving polygons need to be processed using the local s-buffer technique: All other nodes can still be handled with a mini-bsp. This enhancement makes algorithm 2 better than algorithm 1 in most cases. Only in case a lot of polygons are moving around so that even just splitting is too much, the use of a global s-buffer becomes interesting.
The algorithms that I presented here both provide a way
to perform visibility calculations for arbitrary input sets
with low local complexity. The second algorithm scales well
with increasing complexity, but it is not resolution
independent, and will therefore be considerably slower at
higher resolutions. The first algorithm allows big changes
in the scene at runtime. Both algorithms are relatively
easy to implement.
If you have any thoughts that you wish to share, or if
you see any problems that I overlooked, please contact me:
Jacco Bikker - a.k.a. "THE PHANTOM"