The nice thing about my AVR example is that the AVR is very simple and deterministic, so you can count the cycles easily from the datasheet timings. And it's in ASM to begin with, so you get what you write. No worries about what the compiler will make of it.
A tree of IF statements is... one way to do it, I suppose. But if that's all you can think of, you have much to learn about the richness of programming!
Also, if you only care about one thing -- clock cycles, say, or code and data size, or memory usage -- there are many optimizations you can apply. Loop unrolling is one time saver. When you're doing transfer functions (i.e., one parameter comes in, one value goes out), a lookup table can save a tremendous amount of time. But large lookup tables aren't very practical on embedded systems with limited memory, so you can't go too crazy there.
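To make the lookup table idea concrete, here's a minimal sketch in C (not the code from my AVR example). The digit-to-7-segment mapping and the names seg7_table and seg7_lookup are just assumptions picked for illustration; the bit order (segment a = bit 0 through g = bit 6) is one common convention, not the only one.

```c
#include <stdint.h>

/* Hypothetical example: decimal digit -> 7-segment pattern.
   Bit 0 = segment a ... bit 6 = segment g (assumed wiring). */
static const uint8_t seg7_table[10] = {
    0x3F, /* 0 */
    0x06, /* 1 */
    0x5B, /* 2 */
    0x4F, /* 3 */
    0x66, /* 4 */
    0x6D, /* 5 */
    0x7D, /* 6 */
    0x07, /* 7 */
    0x7F, /* 8 */
    0x6F  /* 9 */
};

/* One bounds check and one indexed read, instead of a tree of IFs.
   Out-of-range input returns 0 (all segments off). */
static uint8_t seg7_lookup(uint8_t digit)
{
    if (digit > 9) {
        return 0x00;
    }
    return seg7_table[digit];
}
```

Ten bytes of table is cheap; a 256-entry or 64K-entry table is where the memory limit starts to bite. On an AVR built with avr-gcc you'd probably also want the table in flash (PROGMEM and pgm_read_byte) rather than RAM, which I've left out to keep the sketch portable.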
Taking advantage of the resources you have is always the best path. AVR has hardware multiply, which makes some things fantastically easy, even when the math looks worse on paper. One 16/16 division costs about as much as six 16x16 multiplies!
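One way to cash that in is to replace a division by a constant with a multiply and a shift. This is a sketch in C under my own assumptions (divide by 10, 16-bit unsigned input, a made-up function name div10_u16), not something lifted from the AVR code. The constant 52429 is 2^19 / 10 rounded up, and the result is exact for every 16-bit input.

```c
#include <stdint.h>

/* x / 10 using one widening multiply and a shift, no divide.
   52429 = ceil(2^19 / 10); the rounding error stays small enough
   that the quotient is exact for all x in 0..65535. */
static uint16_t div10_u16(uint16_t x)
{
    return (uint16_t)(((uint32_t)x * 52429UL) >> 19);
}
```

The trick only works when the divisor is known ahead of time. Whether it actually beats the library divide on a given part is something to check by counting cycles, since the 16x16 multiply is itself built from several 8x8 MUL instructions.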
Tim