The usual approach is an FPGA with at least one transceiver to do the embed/deembed, plus an LMH something-or-other to act as line equaliser and driver. IIRC Gennum/Semtech had a part that specifically did audio embed/deembed; it will be ancient by now, but it might still be worth asking about.
I would have thought that a very modest Artix-7 would get it done with no fuss. We use the -35s, but we also do video and scaling in there, so something smaller may well work for you.
Regards, Dan.