Too Fast To Be Fun - Part 2 - PDA Programming
by (08 February 2001)

Return to The Archives

First, I want to thank everyone who e-mailed me in the last week - I got about 20 messages of people, most of which fully agreed with the first article. I also got a lot of information on some of the subjects I proposed, especially realtime raytracing: There are some very cool people working on that; from one of them (Thomas Ludwig) I even received a ready-to-be-published article describing lots of ways to speed up a realtime raytracer.

Nevertheless, I won't write about realtime raytracing in my first chapter, since I want to experiment with raytracers before writing about them - But I promise there will be an article on them. My appetite is definitely wettened. :)

Programming for PDAs

The first topic that I want to write about is 'PDA programming'. More specifically, coding for WindowsCE. Let me explain why I like PDA's: First, they are first-class geek devices. :) Second, they resemble old PC's: No floating point instruction set, no hardware accelerated 3D, and most importantly: There's almost no software for them. And that's odd, since they sell very well: Compaq recently estimated that the demand for their popular iPaq outnumbered stocks 25 times.

Recently, a lot of coders 'discovered' the iPaq. In a period of a mere three weeks there was a true 'software explosion': A Nes emulator, a gameboy emulator, a SNES emulator, another SNES emulator, an iPaq overclocking tool, a PC Engine emulator, a NeoGeo Pocket emulator, a port of MikMod (.s3m modfiles player), a port of Quake 1 and a lot more. Most of it is available for free, and for some of these, source code is available. At the beginning of this three week period, I bought an iPaq. So I had a good time. :)

I myself contributed a bit to the CE coding scene too: I wrote Lemmings for CE, and last night, I finished a little 3D engine. I'll write a bit about both.

By the way, I used Microsoft Embedded Visual Tools (the capitalization isn't correct, but as long as Microsoft is not spelled as MicroWarez or something like that, I think they show some weird attitude by calling the thing eMbedd3d v1sUa1 t00ls or something like that). This is an integrated IDE and compiler for all Windows CE devices; it's a free download (available here, ~300Mb).


When writing software for a palm computer, you have to consider more than just the hardware limitations like speed and the lack of an FPU. Data input for example is a huge problem: Basically, you have a touch screen, and that's it. Another problem is the small screen: The iPaq has a 240x320 display, and that's not just small, those are odd dimensions as well. So, a horizontal shoot 'm up is not a good idea; it looks bad on the narrow display, and continuesly pounding the touch screen is not a good idea. Lemmings, on the other hand, requires only mouse input, and the screen size can be lived with if the levels are designed properly. Other game types that work well on PDA's are RPG's, like Zelda, puzzles, board games and racing games.

I wrote the first version of Lemmings for the Philips Nino. That's an older Windows CE device, with the same display dimensions, but with less memory (8Mb), a slower processor (~80Mhz, I believe) and a 2bit display.

The processor speed is no problem. Lemmings is not a very processor-intensive game, and I had no problem at all getting it to run fast enough.

Memory, on the other hand, is another matter. If you have only 8Mb, a level that takes up 250Kb is a problem, especially if you want four of those. That pretty much rules out side way scrolling. I tried side way scrolling, by the way, and it just didn't work: The tapping on extra on-screen buttons just take the fun out of the game; a level that is small enough to fit on one screen is simply more fun to play.

To take maximum advantage of the available memory, I considered using a trick that I once used for a side way scrolling shoot 'm up: A c-buffer. Now what has a c-buffer to do with a shoot 'm up, I hear you ask? :) Well, here's what: Suppose, you have eight layers. Each layer, except for the last one, has large black areas, where you can see the next layer. The filled pixels between these black areas can thus be stored as spans of pixels, plus their position and length. The average Lemmings level contains a lot of black pixels, so this way of storing the level actually compresses it to less than a quarter of the original size. Nice. :) I didn't intend to use multiple layers for the Lemmings game, so displaying a level is straightforward. For the parallax scroller, I use a c-buffer to determine what parts of each scanline is already filled with pixels, so the next layer is effectively 'clipped' against the drawn layers, resulting in zero overdraw. The number of layers that can be displayed this way of course depends on the complexity of each layer, but in practice, ten or more layers are easily fast enough.

The next problem is the 2bit display. Four colors is just not enough for some decent graphics, and so I hacked around a bit. A typical LCD display is very slow; a blinking rectangle for example is only perceived as 'blinking' if the rate at wich it is turned on and off is less than ten times a second. The Nino actually allowed much faster writes to the display buffer than that, so I figured that if I would blink twenty times a second, I would effectively get a greyscale. :) This works, indeed, so now I can display sixteen greyscales; the flicker is noticeable, but not annoying, and the overall effect is much better than 4 colors.

Then came the iPaq. That machine is four times faster than a Nino, has four times the memory, and a sixteen bit display. To make things worse, the iPaq has a newer version of WindowsCE (3.0 instead of 2.01), and the display buffer is rotated by 270 degrees. Porting Lemmings to this device is easy, but in an ideal world, one would like Lemmings to run on both devices, without modifications to the game source. So, I wrote a little lib (called EasyCE) that offers the game a 16bit framebuffer, and displays it correctly on both the Nino (using greyscale emulation) and the iPaq (using a rotated blitter) and all other WinCE devices. The Lemmings source code contains no platform specific code, and no #ifdef's. It's even easy to port to Win32, using OpenPTC, for example.

I consider Lemmings to be 'done', and I added it to my portfolio. :) I've added the sources for Lemmings, so if you feel like doing a windows port, go ahead. Be my guest. Should be a doddle. (Editor's Note: The images and sounds that work with the demo are not included in the above zip file. If you would like those, contact The Phantom here)

A 3D Engine For The iPaq

As I already mentioned, the iPaq does not have a floating point coprocessor. And no 3D hardware, of course. So there was a clear challenge for the prophet of software rendering, a.k.a. "The Phantom". :)

About two years ago I was still working for Lost Boys Interactive, a multimedia company located in Amsterdam. At that time, I was working on a software rendering engine that should have come close to the performance of a VooDoo (how foolish). Just before the cancelling of the project, it could display a 50K poly 3DS model of a bull at 15 frames per second, on a P200.With phong shading enabled, it still ran at 12fps. I want that on my iPaq. :) This became even more important when someone released a 3DS file of the iPaq in all it's glory; 50K polygons describing every subtle bend and hole that I touched so often on my beloved companion. Wouldn't it be nice to be able to draw this model in realtime on my iPaq? And, the model just asks for it, on the display of the rendered iPaq, there should be another iPaq, and so on...

So, I need some good 3D code. However, nobody writes fixed point code any more.

[A bit about fixed point, for those of you who are not familiar with it: Fixed point math basically means that you use whole numbers only. To get the desired accuracy, you use large whole numbers. For example, instead of 1.734 you could also write 1734, without losing any information. Of course, you need to remember that you multiplied by 1,000, so if you multiply two of those numbers, you need to divide the result by 1000 (because the result would be 1,000,000 times to large, instead of 1,000). On a computer, multiplying by powers of 2 is more efficient than multiplying by powers of 10, so you don't multiply by 1,000 but for example by 1024 (2^10). In that case, you are working with 22.10 fixed point math; you use 10 bits for the fractional part, and 22 for the whole part. So, after a multiply of two of those numbers, you shift the result 10 bits to the right to get the desired 22.10 fixed point answer.]

The problem with fixed point math is of course keeping the precision high enough, while not running out of bits. Multiplying two numbers illustrates this problem: 22.10 times 22.10 results in an intermediate result that is 44.20, that's 64bit. If you don't have 64bits, you loose something: The highest 32 bits of the whole part. So, the temporary value is in fact 12.20 fixed point, and in such an intermediate value you can only fit the result of a multiply of two 6.10 fixed point values. That means that there's little space for your whole numbers, even if you use only 10 bits for the fractional part. Fixed point math is thus continuesly juggling of bits (and I haven't even mentioned negative numbers).

So, now I have my 3D engine. It's extremely stable, sub-pixel and sub-texel accurate, wich basically means that there are no shaky pixels anywhere. So far it's just a spinning cube, but I will add a nice 3DS loader tonight, and phong shading tomorrow, so I will have my spinning iPaq on an iPaq on an iPaq before the weekend. And all this with integer math only.

[Sub-pixel and sub-texel accuracy: Not all lines starting at (1,1) and going to (100,2) look the same. Actually, they do if you always pass your line code integer coordinates, but if you pass floating point coordinates, it becomes slightly more complicated. The example line goes one pixel down, and a hundred pixels to the right. So, where's the scanline change? Visually, it does matter if the 'step' is at x=10 or x=90. You can make the line appear to move smoothly if you take this into account. So, a line that starts at (1,1.9) and ends at (100,2.0) will make the step almost immediately, while a line that starts at (1.0) and ends at (2.9) will make it just before reaching x=100 (note that 2.9 will still be rounded to 2). The same goes for texture U and V. When you rasterize a polygon, you would normally interpolate an x position from the first scanline to the last scanline that each edge covers. A scanline has an integer y position, so if you want to be more accurate, you take the difference between the integer y position and the real y position into account. The x position of the edge on the first scanline is thus corrected as follows:

float diff = (fy - iy);
x += diff * deltax; 

where fy is the floating point y coordinate of the first scanline, iy the integer coordinate of the first scanline, and deltax is the value that you add to x when you go to the next scanline.

The same happens when you start drawing the actual spans; the left side of the span is stored as a floating point value, while you start drawing on an integer position. So:

float diff = (fx - ix);

u += diff * deltau; v += diff * deltav;

where deltau and deltav are the values that you add to u and v when you proceed to the next pixel.]

Of course, there are easier ways to solve the accuracy problems, even when using fixed point numbers. You could use a 64bit intermediate value. There are nice data types in C and C++ that support this. I didn't want to use this right away though, because I have no idea how this would get compiled (I don't want to replace the floating point emulator by a 64bit emulator), and I don't want to use assembly. Assembly would have solved the whole problem, since virtually every 32 bit processor supports 64 bit intermediate results, just for the purpose of fixed point math; but sadly not every WinCE device has the same processor, wich means that I would need to write assembly for the Hitachi SH3 and SH4 processor, the MIPS and Intel's ARM and StrongARM. That's not platform independent, and I don't like that.

So I'm happy with the current results. I'm going to stress-test it as soon as possible.

You can download the 3D engine here, the zip file contains project files for the Embedded Visual Tools and Visual C++. I've also included an executable, in case you're really lazy. :)

Final Words

That's it for now. I hope that I have made clear what's so interesting about PDA's: Old skool coding, made neccessary by lots of contraints. The WinCE platform can theoretically be coded more or less like a Win32 machine, but if you want to write good code for it, you need to write radically different code. And if you do, WinCE PDA's (especially the iPaqs) are awesome machines.

Article Series:
  • Too Fast To Be Fun - Part 1
  • Too Fast To Be Fun - Part 2 - PDA Programming

    Copyright 1999-2008 (C) FLIPCODE.COM and/or the original content author(s). All rights reserved.
    Please read our Terms, Conditions, and Privacy information.