« w00t, one year!

latest

Announcing 32×32 »

Over-optimized   17 May 2009

A blog post I wrote a while ago on developing Geomex 1.1 which I kinda forgot about.

When I first implemented the drop shadows I talked about earlier, it wasn’t very promising. It looked good, but it caused a very noticeable drop in performance. Like, I lost 10 frames per second. Geomex didn’t feel as snappy as it used to. For a while I just decided to (err) drop the drop shadows, but then I decided, “what the heck, let’s get optimizing!”

I did a lot of good stuff. In this article I want to talk about how I optimized one function in particular: drawSubtile. A sub-tile is a quarter of each tile that the shapes are made out of. In this section of the function, we’re figuring out the coordinates for a sub-tile. The simple answer is that they start at (x, y) and end at (x+TILE_SIZE/2, y+TILE_SIZE/2). The twist is that we’ve got two variables, xflip and yflip, which swap these coordinates around on their respective axes. So, for example, if xflip=YES, then the coordinates are (x+TILE_SIZE/2, y) and (x, y+TILE_SIZE/2).

Here’s what we started off with:

    short int x0, y0, x1, y1;

    if (!xflip)
    {
        x0 = x;
        x1 = x + TILE_SIZE/2;
    }
    else
    {
        x0 = x + TILE_SIZE/2;
        x1 = x;
    }

    if (!yflip)
    {
        y0 = y;
        y1 = y + TILE_SIZE/2;
    }
    else
    {
        y0 = y + TILE_SIZE/2;
        y1 = y;
    }

    short int verts[] = { x0, y0,  x1, y0,  x1, y1,  x0, y1 };

It’s pretty much what you’d expect the code to look like. If the axis isn’t flipped, store the smaller value in index 0 and the larger value in index 1, otherwise store them the other way around. It’s simple to read, but actually quite slow. If-statements tend to do that. Of course, you never really put effort into eliminating if-statements because it’s just not worth losing the readability for an insignificantly minor speed boost. However, this is a special case. When a function is called more than 10,000 times a second and it’s running on a handheld device, every little speed boost counts!

So I set off to eliminate the if-statements still in disbelief that I was actually bothering to do this. Result:

    short int xx[] = { x, x+TILE_SIZE/2 };
    short int yy[] = { y, y+TILE_SIZE/2 };

    short int x0 = xx[ xflip];
    short int x1 = xx[!xflip];

    short int y0 = yy[ yflip];
    short int y1 = yy[!yflip];

    short int verts[] = { x0, y0,  x1, y0,  x1, y1,  x0, y1 };

I looked at my implementation and laughed. It’s completely correct, but it’s just so silly! The way this works is by exploiting the fact that a boolean (true/false) value can be used as an array index (where true=1, false=0). So if xflip is set, x0 = xx[1], which is the flipped value. Similarly for x1, we use the index !xflip (read as: not xflip, or the opposite of xflip) which, in this example, gets us xx[0].

It’s simple, it’s silly, but it worked really well. I actually felt a noticeable speed boost! My frame-rate counter proved this as well. I was pretty pleased.

I should have left it at that. But no, I just had to have another go at it. I knew I could do it just a little bit better. I knew further optimization wouldn’t give me a significant performance boost (see: Law of diminishing returns), but I had to do better. I had to over-optimize the solution. And so, an over-optimized solution was produced:

    short int xx[] = {x, x};
    short int yy[] = {y, y};

    xx[!xflip] += TILE_SIZE/2;
    yy[!yflip] += TILE_SIZE/2;

    short int verts[] = { xx[0], yy[0],  xx[1], yy[0],
                          xx[1], yy[1],  xx[0], yy[1] };

The previous solution eliminated if-statements. This one minimizes the number of assignments. I’ll leave it as an exercise to the reader to figure out how this one works. If you “got” the last solution, this one shouldn’t be difficult to figure out.

I stared at my code, stroked my chin, sipped my tea. I’d optimized, I’d over-optimized, and I finally concluded this couldn’t be optimized any more.

Leave a Reply