Is performance an issue?
What a normal UART does is sample the line at 16x the baud rate (here 5x16=80Hz or 10400x16=166400Hz) looking for a start bit, and then 8 clocks later, start collecting samples 16 clocks apart, i.e. in the middle of each bit. That includes the start bit itself, which is sampled again at its midpoint to reject false start bits.
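To make the sampling scheme concrete, here is a rough sketch of that receiver as a state machine that you call once per 16x tick with the current level of the RX pin. The names (soft_uart_rx_t, soft_uart_rx_tick) are made up for illustration, and it assumes 8 data bits sent LSB first, no parity, one stop bit, and an idle-high line. Reading the pin is left to the caller because that part depends on your micro.

```c
#include <stdbool.h>
#include <stdint.h>

#define OVERSAMPLE 16u /* ticks per bit */

/* Hypothetical receiver state; zero-initialise before use. */
typedef struct {
    bool    active;    /* false = idle, hunting for a start bit */
    uint8_t countdown; /* ticks left until the next mid-bit sample */
    uint8_t bit_index; /* 0 = start bit, 1..8 = data bits, 9 = stop bit */
    uint8_t shift;     /* data bits collected so far */
} soft_uart_rx_t;

/* Call once per 16x tick. line is the RX pin level (true = high/idle).
 * Returns true when a complete byte has been stored in *out. */
bool soft_uart_rx_tick(soft_uart_rx_t *rx, bool line, uint8_t *out)
{
    if (!rx->active) {
        if (!line) {                        /* line went low: possible start bit */
            rx->active = true;
            rx->countdown = OVERSAMPLE / 2; /* 8 ticks to the middle of the start bit */
            rx->bit_index = 0;
            rx->shift = 0;
        }
        return false;
    }

    if (--rx->countdown != 0)
        return false;                       /* not at a sample point yet */
    rx->countdown = OVERSAMPLE;             /* next sample is one bit time away */

    if (rx->bit_index == 0) {               /* middle of the start bit */
        if (line)
            rx->active = false;             /* line is high again: false start */
        else
            rx->bit_index = 1;
        return false;
    }

    if (rx->bit_index <= 8) {               /* data bits arrive LSB first */
        rx->shift >>= 1;
        if (line)
            rx->shift |= 0x80;
        rx->bit_index++;
        return false;
    }

    rx->active = false;                     /* stop bit: back to idle either way */
    if (!line)
        return false;                       /* stop bit low: framing error, drop byte */
    *out = rx->shift;
    return true;
}
```

You would call soft_uart_rx_tick() from the timer interrupt with the pin level you just read, and the same routine works at 5 baud or 10400 baud as long as the tick rate is 16x the baud rate.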
So if your 16MHz micro can cope with an interrupt every 100 clocks or so (16MHz / 166400Hz is about 96), each usually lasting say 10 to 20 clocks, then why not do it all in software on an absolutely bog standard repeating timer interrupt (set it up once in the initialization code, then leave it)? Yes, this is inefficient. But it's also hardly any code, and it's difficult to make a mistake.
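If you want to check the budget for your own clock and baud rates, the arithmetic is small enough to write down. This is only a sketch: F_CPU_HZ and the helper names are my own, and the load figure ignores interrupt entry/exit overhead, which varies between micros.

```c
#include <stdint.h>

#define F_CPU_HZ   16000000UL /* assumed CPU clock: 16MHz */
#define OVERSAMPLE 16UL       /* timer ticks per bit */

/* Timer interrupt rate needed for a given baud rate. */
static uint32_t tick_hz(uint32_t baud)
{
    return baud * OVERSAMPLE;
}

/* CPU clocks available between two timer interrupts (rounded down). */
static uint32_t cycles_per_tick(uint32_t baud)
{
    return F_CPU_HZ / tick_hz(baud);
}

/* Rough CPU load in percent for an interrupt body of isr_cycles clocks. */
static uint32_t isr_load_percent(uint32_t baud, uint32_t isr_cycles)
{
    return (100UL * isr_cycles) / cycles_per_tick(baud);
}
```

At 10400 baud this gives 96 clocks between interrupts, so a 10 to 20 clock interrupt body costs very roughly 10% to 20% of the CPU. At 5 baud there are 200000 clocks between interrupts and the load is negligible.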
Of course you can fiddle with the different peripherals in various interrupts (get an edge-triggered interrupt for the 5 baud start bit, in that interrupt set a timer interrupt to occur in the middle of each bit, then after 10 of these enable the UART). But I suggest starting simple and letting your design evolve as you add features, so that you're doing the minimum work and writing the simplest and slowest code that still meets the realtime requirements of the application.
cheers, Nick