Are the clock-related fuses set correctly? Since you get some sort of output, the clock source is probably not set to an external oscillator by mistake (a somewhat common error). Does the clock rate match the F_CPU setting used when compiling the bootloader? Are any clock-division fuses set that might interfere with this?
Do you have a scope or logic analyzer that you could use to look at the output and see whether the baud rate is off? If the bootloader expects an 8 MHz clock and you have CKDIV8 set, then your actual baud rate would be 1/8th of the expected baud rate. If you use a 12 MHz crystal instead of 8 MHz, then the baud rate would be 50% too fast.
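
To put numbers on that, here is a rough sketch of the arithmetic in C. It assumes the bootloader uses the hardware USART in normal-speed mode (U2X = 0), where the baud rate is f_osc / (16 * (UBRR + 1)); your bootloader may use double-speed mode or a software UART instead. The function names and the 9600 baud figure are only illustrative and not taken from any particular bootloader.

```c
/* Divisor the bootloader would compile in for a given F_CPU and baud rate.
 * Assumes normal-speed mode (U2X = 0) and round-to-nearest, which is an
 * assumption about how the bootloader computes it. */
unsigned long ubrr_for(unsigned long f_cpu_hz, unsigned long baud)
{
    return (f_cpu_hz + 8UL * baud) / (16UL * baud) - 1UL;
}

/* Baud rate that actually appears on the wire when the chip runs at
 * actual_hz with that compiled-in divisor. */
double real_baud(unsigned long actual_hz, unsigned long ubrr)
{
    return (double)actual_hz / (16.0 * (double)(ubrr + 1UL));
}
```

For a bootloader built with F_CPU = 8000000 and 9600 baud, `ubrr_for` gives 51. With CKDIV8 set the chip really runs at 1 MHz, so `real_baud(1000000UL, 51UL)` comes out at about 1202 baud. With a 12 MHz crystal, `real_baud(12000000UL, 51UL)` is about 14423 baud. Those are the rates to look for on the scope or logic analyzer.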