For an example of an efficient approach, here's how the divide-by-10 might be implemented:
http://www.seventransistorlabs.com/uDivTen.asm.txt
Naturally, it's cryptic assembly, but hey, you asked.
Long-hand division (not shown here) is done the same way you do long division by hand, with one simplification: in binary, each quotient digit can only be a one or a zero. So each step is just a shift and a conditional subtraction, with the deciding bit conveniently arranged to be the sign bit. On AVR, it takes about 220 CPU cycles.
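For reference, the shift-and-subtract loop looks roughly like this in C. This is only a sketch, not the library routine a compiler would link in: the name udiv16_longhand and the result struct are made up for illustration, and it assumes a 16-bit unsigned dividend and a nonzero divisor.

```c
#include <stdint.h>

/* Hypothetical result type: quotient and remainder together. */
typedef struct {
    uint16_t quot;
    uint16_t rem;
} udiv16_result;

/* Long-hand (restoring) division, one quotient bit per pass.
 * The caller must not pass divisor == 0. */
udiv16_result udiv16_longhand(uint16_t dividend, uint16_t divisor)
{
    udiv16_result r = { 0, 0 };
    uint32_t rem = 0; /* wider than 16 bits so the shift cannot overflow */

    for (int i = 15; i >= 0; i--) {
        /* Bring down the next dividend bit, most significant first. */
        rem = (rem << 1) | ((uint32_t)(dividend >> i) & 1u);

        /* The quotient digit is 1 only if the divisor fits. */
        if (rem >= divisor) {
            rem -= divisor;
            r.quot |= (uint16_t)(1u << i);
        }
    }
    r.rem = (uint16_t)rem;
    return r;
}
```

Calling udiv16_longhand(1234, 10) gives a quotient of 123 and a remainder of 4. The sixteen passes through the loop are where the cycle count comes from.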
The linked divide-by-10 method only works for a constant divisor. It multiplies the dividend by the precomputed reciprocal of the divisor. The integer bits of the product are the quotient, and the extra (fraction) bits encode the remainder. Because AVR has a hardware multiply, this operation can be very fast, even though it's a roundabout method, conceptually.
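Here is the reciprocal-multiply idea sketched in C for a 16-bit unsigned dividend. It is not a transcription of the linked assembly. The scaling (reciprocal stored as 0xCCCD with 19 fraction bits), the function name and the struct are my own choices for illustration.

```c
#include <stdint.h>

/* Hypothetical result type for the sketch. */
typedef struct {
    uint16_t quot;
    uint8_t  rem;
} udiv10_result;

/* Divide a 16-bit unsigned value by 10 using a precomputed reciprocal.
 * 0xCCCD = 52429 = ceil(2^19 / 10), i.e. 1/10 with 19 fraction bits. */
udiv10_result udiv10_reciprocal(uint16_t n)
{
    udiv10_result r;

    /* 65535 * 52429 still fits in 32 bits. */
    uint32_t product = (uint32_t)n * 0xCCCDu;

    /* Integer part of the product is the quotient. */
    r.quot = (uint16_t)(product >> 19);

    /* Fraction part is (roughly) remainder/10; scale it back up by 10. */
    uint32_t frac = product & 0x7FFFFu;
    r.rem = (uint8_t)((frac * 10u) >> 19);

    return r;
}
```

For example, udiv10_reciprocal(1234) yields quotient 123 and remainder 4. The rounded-up reciprocal carries a small error, but over the 16-bit input range it is too small to change either result.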
A compiler probably won't output a function like this for constant division, but will use the long-hand function instead (built into a library). A compiler will use an in-line operation where the divisor is a power of 2, however: for unsigned values, those reduce to a shift-right operation. You will often see programmers write "x = y / 16" as "x = y >> 4", explicitly.
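The two spellings side by side, as a small sketch (function names are made up). I've used an unsigned type on purpose: for negative signed values, division truncates toward zero while a right shift typically does not, so the two can disagree.

```c
#include <stdint.h>

/* Division by a power of 2, written as a division. */
uint16_t div16_by_divide(uint16_t y)
{
    return y / 16;
}

/* The same operation written as an explicit shift: 16 == 2^4. */
uint16_t div16_by_shift(uint16_t y)
{
    return y >> 4;
}
```

Both return 62 for an input of 1000.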
Completing an operation such as sprintf(buf, "%u", var_is_uint16_t), in standard C and libraries, should take on the order of several thousand clock cycles to finish. (There are also a lot of wasted cycles and code space, spent jumping over the very complicated parts of sprintf, if it's a fully C-compliant sprintf.)
Tim