No, decoding the Manchester-style (biphase-mark) coded stream with a software-only solution is really not easy. That is also why a great many MCUs can produce an SPDIF data stream but cannot decode one. SPDIF transmitter peripherals are usually simple, while receivers are very complex peripherals that either use a dedicated analog PLL to recover the bit clock or use very high oversampling rates to sync to the stream (that is, for example, how the STM32F446 SPDIFRX peripheral works).
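To illustrate where the difficulty lies, here is a minimal Python sketch of biphase-mark coding. It assumes the hard part is already solved: the input is a list of ideal half-bit-cell levels, perfectly aligned to the bit clock. The function names are mine, and the SPDIF preambles (which deliberately violate the coding rule) are ignored.

```python
def bmc_encode(bits, level=0):
    """Biphase-mark encode: returns two half-cell line levels per data bit."""
    halves = []
    for bit in bits:
        level ^= 1          # every bit cell starts with a transition
        halves.append(level)
        if bit:
            level ^= 1      # a 1 adds a second transition in mid-cell
        halves.append(level)
    return halves


def bmc_decode(halves):
    """Decode clock-aligned half-cells: a bit is 1 when its two halves differ."""
    return [int(halves[i] != halves[i + 1])
            for i in range(0, len(halves) - 1, 2)]


bits = [1, 0, 1, 1, 0, 0, 1]
line = bmc_encode(bits)
decoded = bmc_decode(line)
```

With a recovered clock the decode is a one-liner, and it does not care about line polarity. Everything a real receiver struggles with (finding the cell boundaries at several MHz, tracking jitter, detecting preambles) is exactly what this sketch leaves out, which is why a PLL or heavy oversampling is needed.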
As the SRC4392 really needs some power (around 0.25 to 0.3 W), would it be a solution for you to use a dedicated SPDIF receiver (DIR9001) to convert to an I2S bus, and then use a different SRCxxxx chip that also has I2S input and output? That way you could switch the SRC off while still having the I2S data available.
If I remember correctly, the DIR9001 is quite low power (<70 mW, it seems), so it could be left running together with a small dedicated MCU that evaluates the I2S data.
But I guess it also depends on your budget for the project, and whether you can afford a separate chip for the SPDIF decoding, a separate resampler, and a dedicated MCU.
There are also other solutions, like using an MCU with an integrated SPDIF receiver peripheral. The MCU would decode the SPDIF and send out the I2S data. Then you could put the SRCxxxx after it, omitting the need for a separate SPDIF-to-I2S chip.
The STM32F446 also has enough grunt to run the ASRC algorithm itself, allowing you to omit the SRC4392 entirely. The integrated SPDIFRX also has 4 selectable inputs, I think. But this also depends on your abilities and on having enough development time for this solution.
Footnote: Assuming that digital SPDIF sources produce only zero samples when not playing might be wrong too. Due to calculation inaccuracies, a small amount of noise might be produced, for example, in some audio processing filters, namely the ones using feedback paths (the commonly used biquad cascade equalizers). Also, some audio sources might have a lower bit resolution (16 bit) but use noise-shaping algorithms to increase the dynamic range a bit, for example at the "volume setting" output stages.
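So if the MCU evaluates the I2S data to decide when the source is idle, a threshold is probably safer than a test for exact zeros. A rough Python sketch of the idea follows; the class name, the threshold of 4 LSBs and the hold count are only assumptions for illustration and would have to be tuned against the real source.

```python
def is_silent(samples, threshold=4):
    """True when every sample in the block stays within +/- threshold LSBs."""
    return all(abs(s) <= threshold for s in samples)


class SilenceDetector:
    """Reports silence only after several consecutive quiet blocks."""

    def __init__(self, threshold=4, hold_blocks=3):
        self.threshold = threshold      # assumed noise floor in LSBs
        self.hold_blocks = hold_blocks  # quiet blocks needed before "idle"
        self.quiet_blocks = 0

    def update(self, block):
        """Feed one block of signed PCM samples; returns True when idle."""
        if is_silent(block, self.threshold):
            self.quiet_blocks += 1
        else:
            self.quiet_blocks = 0       # any louder sample restarts the count
        return self.quiet_blocks >= self.hold_blocks


detector = SilenceDetector(threshold=4, hold_blocks=2)
first = detector.update([0, 1, -2, 3])    # quiet, but hold time not reached
second = detector.update([1, -1, 0, 2])   # second quiet block in a row
third = detector.update([0, 500, 0])      # audio is back
```

The hold count is there so that a short pause between tracks does not switch the downstream chip off and on again.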