I have to admit that when some CPU vendor says something like "we've added saturating multiply-and-accumulate instructions for DSP applications", I have trouble imagining what sort of algorithms they're supposed to help with.
Well, one use is in filters. If overflows would result in instability then you can use saturation arithmetic to help prevent that. You might get some nonlinearities, but better that than an outright oscillator.
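To make the difference concrete, here is a rough sketch of a multiply-accumulate step with wraparound versus saturation. The 16-bit accumulator width and the function names are just assumptions for illustration; real DSP cores usually have a wider accumulator, but the effect is the same.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def wrap16(x):
    """Two's-complement wraparound to 16 bits, as a plain add behaves on overflow."""
    return ((x + 32768) & 0xFFFF) - 32768

def sat16(x):
    """Clamp to the 16-bit range instead of wrapping."""
    return max(INT16_MIN, min(INT16_MAX, x))

def mac_wrap(acc, a, b):
    """acc + a*b with wraparound on overflow."""
    return wrap16(acc + a * b)

def mac_sat(acc, a, b):
    """acc + a*b with saturation on overflow."""
    return sat16(acc + a * b)

# 30000 + 100*100 = 40000, which does not fit in 16 bits.
print(mac_wrap(30000, 100, 100))  # -25536: the sign flips, a huge error fed back into the filter
print(mac_sat(30000, 100, 100))   # 32767: clipped, a much smaller error
```

The wrapped result jumps to the opposite end of the range, and in a feedback path that kind of jump is what can kick a filter into oscillation. The saturated result is merely clipped.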
Anyways, for this case I found it easier to just use a model that was proven to be stable. Or rather, one for which it's proven that a stable configuration exists. The actual probably-stable configuration is then conjured up out of optimization, simulation and a "Yeah, that'll probably do" proclamation.
For those interested, the delsig toolbox really helps with the above conjuration.
I mostly concentrated on the CIFB topology, since I found that to be relatively easy to map onto both mcu and fpga. Of course on the mcu it's a big loop and on the fpga it's a pipeline, but you get the idea.
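As a rough idea of what the "big loop" version looks like, here is a minimal sketch of a second-order CIFB modulator with a 1-bit quantizer. The coefficients are the plain textbook values that give a noise transfer function of (1 - z^-1)^2; they are an assumption for illustration, not the coefficients I actually used and not something produced by the delsig toolbox. The function name `cifb2` is made up as well.

```python
def cifb2(u_samples, a=(1.0, 2.0), b1=1.0, c=(1.0, 1.0)):
    """Second-order CIFB delta-sigma loop with delaying integrators.

    a  : feedback coefficients from the quantizer output into each integrator
    b1 : input feed-in coefficient to the first integrator
    c  : inter-stage gains (c[1] scales the quantizer input)
    All values are illustrative textbook defaults (assumption).
    """
    x1 = x2 = 0.0                      # integrator states
    bits = []
    for u in u_samples:
        y = c[1] * x2                  # quantizer input
        v = 1 if y >= 0 else -1        # 1-bit quantizer
        bits.append(v)
        # Both integrators update from the previous states (delaying integrators).
        x1_next = x1 + b1 * u - a[0] * v
        x2_next = x2 + c[0] * x1 - a[1] * v
        x1, x2 = x1_next, x2_next
    return bits

# A DC input of 0.5 should give a bitstream whose average is close to 0.5.
bits = cifb2([0.5] * 4096)
mean = sum(bits) / len(bits)
print(mean)
```

On the fpga the same two integrator updates become pipeline stages clocked once per sample instead of statements in a loop.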