In a previous thread I asked for advice about something similar, so xeta sent me a PM about this thread, but I've only just checked my PM inbox.
In my application I didn't need any feedback control (so no latency problems), just acquisition for visualization on a PC, so I decided to use what I had on hand and was easy to use: a PIC18F2550/4550. I ran a few tests but couldn't get much above 1 kS/s (8 bits/sample), although I don't remember the exact limit. I think there is still some room for improvement, but not much.
It was done with the Microchip USB Framework CDC class (serial emulation), in quite a simple way. Acquisition started when a certain char was received from the PC and stopped when another command arrived. I started the A/D conversion inside a timer interrupt routine, taking care to set the ADC GO/DONE bit as the first instruction. After triggering the conversion I pushed the previously converted value into an array (also inside the interrupt), while the free-running main code sent the bytes once the buffer reached a certain fill level (something like 5 bytes, I think). Any overwrite of unsent data in the output buffer set a flag, so I could be sure about the throughput.
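The interrupt/main-loop split described above can be sketched roughly as follows. This is not the original firmware, just a minimal portable illustration: `adc_start_conversion()` and `adc_read_result()` are hypothetical stand-ins for setting GO/DONE and reading the result register, `BUF_SIZE` and `SEND_THRESHOLD` are assumed values, and the sketch drops a sample and sets the flag when the buffer is full instead of overwriting old data. The first tick returns whatever was in the result register, since no conversion has completed yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BUF_SIZE 64u        /* ring size (assumed); must be a power of two */
#define SEND_THRESHOLD 5u   /* fill level that triggers a send (assumed) */

static_assert((BUF_SIZE & (BUF_SIZE - 1u)) == 0, "BUF_SIZE must be a power of two");

static volatile uint8_t ring[BUF_SIZE];
static volatile uint8_t head;     /* written only by the interrupt */
static volatile uint8_t tail;     /* written only by the main loop */
static volatile bool overrun;     /* set when a sample could not be stored */

/* Hypothetical hardware hooks, stubbed so the sketch runs on a PC. */
static uint8_t fake_adc_result;
static unsigned adc_starts;
static void adc_start_conversion(void) { adc_starts++; }
static uint8_t adc_read_result(void) { return fake_adc_result; }

/* Body of the timer interrupt: one call per sampling period. */
void timer_isr(void)
{
    adc_start_conversion();                /* first, to keep sampling jitter low */
    uint8_t previous = adc_read_result();  /* value converted during the last period */
    uint8_t next = (uint8_t)((head + 1u) & (BUF_SIZE - 1u));
    if (next == tail) {
        overrun = true;                    /* buffer full: sample dropped, flag raised */
        return;
    }
    ring[head] = previous;
    head = next;
}

/* Number of samples waiting to be sent. */
static uint8_t ring_count(void)
{
    return (uint8_t)((uint8_t)(head - tail) & (BUF_SIZE - 1u));
}

/* Main-loop side: once at least SEND_THRESHOLD samples are waiting,
   copy up to max of them into out and return how many were copied. */
size_t drain_samples(uint8_t *out, size_t max)
{
    size_t n = 0;
    if (ring_count() < SEND_THRESHOLD) {
        return 0;
    }
    while (tail != head && n < max) {
        out[n++] = ring[tail];
        tail = (uint8_t)((tail + 1u) & (BUF_SIZE - 1u));
    }
    return n;
}

/* True if any sample was lost since start-up. */
bool overrun_seen(void) { return overrun; }
```

In real firmware the main loop would hand the drained bytes to the CDC transmit routine and check `overrun_seen()` after a run to confirm that no sample was lost at the chosen rate.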
I chose serial emulation for its ease of use and portability on the PC side. In fact my code was in Matlab, and the communication part is very simple, just a few lines.
If I had to start that project right now, I'd just use a microcontroller as a coordinator (perhaps the same one, because of its simplicity), an external ADC and an FTDI FT232 as a UART-to-USB bridge (which goes up to nearly 1 Mbit/s). If the system were a one-off and I had a little more budget, I'd use a C2000 (perhaps on a ControlCard, to make things simpler), also with an FTDI.
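As a rough sanity check on what a 1 Mbit/s UART link could carry, here is a small calculation. It assumes 8N1 framing (10 bits on the wire per byte), which is not stated above, and it ignores USB latency and any protocol overhead, so treat the result as a ceiling rather than an achievable figure. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on payload bytes per second for an asynchronous serial link.
   bits_per_frame is 10 for 8N1 (start bit + 8 data bits + stop bit). */
uint32_t uart_bytes_per_second(uint32_t baud, uint32_t bits_per_frame)
{
    assert(bits_per_frame > 0);
    return baud / bits_per_frame;
}

/* Upper bound on the sample rate when each sample takes bytes_per_sample bytes. */
uint32_t max_sample_rate(uint32_t baud, uint32_t bits_per_frame,
                         uint32_t bytes_per_sample)
{
    assert(bytes_per_sample > 0);
    return uart_bytes_per_second(baud, bits_per_frame) / bytes_per_sample;
}
```

With these assumptions, `max_sample_rate(1000000, 10, 1)` gives 100000, i.e. about 100 kS/s at 8 bits per sample, and half that if each sample is sent as two bytes.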