I was surprised to have to read to reply #47 before T3sl4co1l suggested CAN bus. I have been working on a CAN bus system for a client throughout this year, and it has been quite robust and reliable.
For this application I think 250 kbps (first version CAN) would be fine, as long as the audio data plus frame overhead fits within that rate. It is necessary to check what the official maximum wire length is at 250 kbps; one source that I have says 87 m and another says 250 m.
I understand that the requirement is for isolation; isolated CAN transceivers are available. It would be necessary to check the wire length separately for the isolated transceiver. I could look into these issues further if asked.
Anyway, the most compelling reason to use CAN instead of Ethernet is that CAN is real-time and Ethernet is not. Of course a non-real-time medium can still be used, by arbitrarily picking a latency and providing that much delay and buffering, so that packets arriving at anything up to the worst-case delay don't cause artifacts. But why do so? You're adding enormous complexity in order to get a substandard result.
With CAN you know that audio packets, given the highest-priority identifiers, will take priority over other (control) traffic, so you know, to within about one frame time, how long they will take to arrive. It would be prudent to provide one or two frames' worth of buffering just in case. However, the frame size of CAN is deliberately limited to just 8 data bytes (plus roughly 6 bytes of overhead on a standard frame for the identifier, control bits, CRC, ACK and end of frame, and a few stuff bits), so that buffering stays small. At, say, 16 kHz x 12-bit samples the latency is about 16 samples, or 1 ms. There is no way that Ethernet could match that.
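To put rough numbers on the frame timing, here is a back-of-envelope sketch. It assumes standard 11-bit identifiers, full 8-byte frames, whole 12-bit samples packed into each frame, and it ignores stuff bits; the function name is made up for illustration.

```python
# Rough classic-CAN budget for an audio stream.
# Assumptions: 11-bit identifiers, 8 data bytes per frame, stuff bits ignored.
OVERHEAD_BITS = 47  # SOF, ID, control, CRC, ACK, EOF and 3-bit interframe space
DATA_BITS = 64
FRAME_BITS = OVERHEAD_BITS + DATA_BITS  # 111 bits on the wire per frame


def audio_budget(bit_rate, sample_rate, sample_bits):
    """Return (samples per frame, frame time in ms, fraction of bus used)."""
    samples_per_frame = DATA_BITS // sample_bits
    frames_per_second = sample_rate / samples_per_frame
    frame_time_ms = FRAME_BITS / bit_rate * 1000
    bus_load = frames_per_second * FRAME_BITS / bit_rate
    return samples_per_frame, frame_time_ms, bus_load


for rate in (250_000, 500_000):
    n, t_ms, load = audio_budget(rate, 16_000, 12)
    print(f"{rate // 1000} kbps: {n} samples/frame, "
          f"{t_ms:.3f} ms/frame, bus load {load:.0%}")
```

Under these assumptions a frame takes a bit under half a millisecond at 250 kbps, so two frames of buffering is on the order of 1 ms. The same numbers put the uncompressed 16 kHz x 12-bit stream above 100% bus load at 250 kbps, so that particular combination would need compression, a lower sample rate or a faster bit rate such as 500 kbps.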
RS485 might, but I don't see a compelling reason to use RS485 over CAN bus, given that CAN basically starts with capabilities similar to RS485 and adds additional robustness (such as higher tolerance to clock rate error, etc.) and a framing protocol (address field, CRC, flags/stuffing, etc.) that you'd have to do yourself in software with RS485.
One good thing that can be said about RS485 is that it actively drives the bus for both 1 and 0, which generally allows faster speeds than CAN bus. CAN only drives the bus for 0 (the dominant state) and relies on the termination resistors to return it passively, and more slowly, to 1 (the recessive state). On the other hand, the CAN bus way greatly simplifies media access and collision management. With RS485 these add significant software complexity and require bus grant schemes etc. that hurt real-time performance. Hence, I would go with CAN bus here.
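The media access point can be illustrated with a toy model of CAN arbitration. This is only a simulation sketch: the function and the two identifiers are hypothetical, and real controllers do this in hardware, bit by bit, while transmitting.

```python
def arbitrate(ids, id_bits=11):
    """Return the identifier that wins bitwise arbitration on a shared bus."""
    contenders = set(ids)
    for bit in range(id_bits - 1, -1, -1):  # identifiers go out MSB first
        # Wired-AND bus: it reads 0 (dominant) if any node sends a 0.
        bus = min((i >> bit) & 1 for i in contenders)
        # A node that sent 1 (recessive) but reads back 0 stops transmitting.
        contenders = {i for i in contenders if (i >> bit) & 1 == bus}
    (winner,) = contenders
    return winner


AUDIO_ID = 0x010    # hypothetical high-priority (low-numbered) identifier
CONTROL_ID = 0x300  # hypothetical lower-priority identifier

print(hex(arbitrate([CONTROL_ID, AUDIO_ID])))
```

The lowest identifier always wins without any frame being destroyed, which is why giving the audio frames low-numbered identifiers lets them go ahead of control traffic with no bus grant scheme in software.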
One thing that can be quite tricky with digital audio distribution is sample rate error that builds up over time. Say you have an ADC and a DAC at each end, each running off a locally generated 16 kHz sample clock. If the DAC at one end is slightly faster than the ADC at the other end, the data transfer will underrun and filler samples might have to be generated. If the DAC is slower, the data transfer will overrun and samples might have to be dropped.
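To get a feel for how quickly this bites, here is a small sketch. The 100 ppm mismatch and the 8 samples of headroom are assumed figures chosen for illustration, not measurements.

```python
def seconds_per_slipped_sample(sample_rate_hz, mismatch_ppm):
    """Time for two free-running clocks to drift apart by one sample."""
    drift_samples_per_second = sample_rate_hz * mismatch_ppm * 1e-6
    return 1.0 / drift_samples_per_second


def seconds_until_slip(headroom_samples, sample_rate_hz, mismatch_ppm):
    """Time until a buffer with the given headroom underruns or overruns."""
    return headroom_samples * seconds_per_slipped_sample(
        sample_rate_hz, mismatch_ppm)


# 16 kHz clocks that differ by an assumed 100 ppm, with a 16-sample buffer
# that starts half full (8 samples of headroom in either direction).
print(seconds_per_slipped_sample(16_000, 100))
print(seconds_until_slip(8, 16_000, 100))
```

With those assumed figures the clocks slip by one sample roughly every 0.6 s, so a small buffer runs dry or fills up within seconds unless the rates are locked together or samples are inserted and dropped.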
There are many ways to deal with this. Some of those ways require a timebase to be sent over the medium, and this is something that CAN bus does brilliantly. Our application has a number of radio transmitters on the CAN bus whose transmissions must be tightly timed, and we are using a higher level protocol called UAVCAN on top of the basic CAN addressing and medium access protocol. UAVCAN has a time server built in that works really well, using the hardware to timestamp the sending and receiving of CAN frames.
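One generic way to use hardware timestamps for a shared timebase is sketched below. This is not the UAVCAN API: the class, method and field names are hypothetical, and the sketch assumes the time master puts the hardware transmit timestamp of its previous sync frame into each new sync frame, and that propagation delay on the wire is negligible.

```python
class SlaveClock:
    """Tracks the offset between the local clock and the master's clock."""

    def __init__(self):
        self.prev_rx_local_us = None  # local hardware timestamp of last sync
        self.offset_us = None         # master time minus local time

    def on_sync_frame(self, rx_local_us, prev_tx_master_us):
        # The payload says when the PREVIOUS sync frame left the master, and
        # we recorded when that same frame arrived here, so the difference
        # is the clock offset (ignoring propagation delay).
        if self.prev_rx_local_us is not None and prev_tx_master_us is not None:
            self.offset_us = prev_tx_master_us - self.prev_rx_local_us
        self.prev_rx_local_us = rx_local_us

    def to_master_time(self, local_us):
        return local_us + self.offset_us


clock = SlaveClock()
clock.on_sync_frame(rx_local_us=1_000, prev_tx_master_us=None)
clock.on_sync_frame(rx_local_us=2_001_000, prev_tx_master_us=5_000_000)
print(clock.offset_us, clock.to_master_time(2_001_000))
```

Because both timestamps are taken by the CAN hardware at the frame itself, software latency on either side drops out of the offset. Each node could then steer its sample clock against the shared timebase rather than padding or dropping samples.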
If only audio is to be sent over a dedicated connection of at least 3 wires (or 4 including power), then a small and cheap microcontroller will do it. The STM32F4s have pretty good number crunching ability if you want to do compression, but the CAN controller has some frustrating errata that wasted a lot of my time. 8051 derivatives with CAN can be had for next to nothing, but may require an external ADC/DAC if quality is a concern. There may be a middle course... NXP? Or, since you mentioned specific chips, if using those you could add an external CAN chip. I have a USBtin that uses an MCP2515 controller plus an MCP2551 transceiver; these are cheap and common.
cheers, Nick