Consider also that you have to keep state to correctly apply a filter: if your filter is n taps long, you need at least n-1 samples before the first one in the current buffer.
The FIR routines in CMSIS will do that for you (it's on you to provide the buffers, but the documentation is reasonably clear).
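To make the state-keeping concrete, here is a minimal sketch in plain C (not the CMSIS implementation). The names `FIR_TAPS`, `FIR_BLOCK`, `fir_state_t` and `fir_block` are made up for illustration. As far as I remember, the CMSIS float FIR wants a state buffer of numTaps + blockSize - 1 samples for the same reason, but check the documentation for the variant you use.

```c
#include <stddef.h>
#include <string.h>

#define FIR_TAPS  5   /* hypothetical filter length n */
#define FIR_BLOCK 8   /* hypothetical maximum samples per buffer */

typedef struct {
    float history[FIR_TAPS - 1];   /* last n-1 inputs of the previous buffer */
} fir_state_t;

/* Filter one buffer of len <= FIR_BLOCK samples.
   state->history holds the n-1 samples that came just before in[0]. */
static void fir_block(fir_state_t *state, const float *coeffs,
                      const float *in, float *out, size_t len)
{
    float work[FIR_TAPS - 1 + FIR_BLOCK];
    const size_t hist = FIR_TAPS - 1;

    /* Lay out [previous n-1 samples | current buffer] contiguously. */
    memcpy(work, state->history, hist * sizeof(float));
    memcpy(work + hist, in, len * sizeof(float));

    for (size_t i = 0; i < len; i++) {
        float acc = 0.0f;
        for (size_t k = 0; k < FIR_TAPS; k++) {
            /* y[i] = sum over k of h[k] * x[i - k] */
            acc += coeffs[k] * work[hist + i - k];
        }
        out[i] = acc;
    }

    /* Keep the last n-1 inputs for the next call. */
    memcpy(state->history, work + len, hist * sizeof(float));
}
```

Start with a zeroed `fir_state_t` and call `fir_block` once per buffer; the output is then the same as filtering the whole stream in one go.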
CMSIS DSP is probably not perfect, but it provides most of the building blocks needed, and is reasonably efficient.
Unfortunately, some "greek" is needed to understand what is going on.
The Analog resource I pointed to previously is quite decent.
The ToC is here.
I use Lyons' "Understanding Digital Signal Processing" as a reference; the approach is not overly complicated, and it has many examples.
Familiarity with the idea of convolution is good; luckily, 3Blue1Brown just released a very nice (as always) video on it.
As for downmixing: OK, I had thought of something more complicated (FIR decimation).
Be careful with levels, of course.
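A minimal sketch of what "careful with levels" can mean, assuming interleaved 16-bit stereo samples mixed down to mono (the function name and the data layout are my assumptions, not something from your code). Summing two int16 values can overflow int16, so the sum is done in 32 bits and halved before going back to 16.

```c
#include <stddef.h>
#include <stdint.h>

/* Mix interleaved stereo (L, R, L, R, ...) down to mono. */
static void downmix_stereo_to_mono(const int16_t *stereo, int16_t *mono,
                                   size_t frames)
{
    for (size_t i = 0; i < frames; i++) {
        /* Widen first: L + R does not fit in 16 bits in general. */
        int32_t sum = (int32_t)stereo[2 * i] + (int32_t)stereo[2 * i + 1];
        /* Halving brings the result back into the int16 range. */
        mono[i] = (int16_t)(sum / 2);
    }
}
```

Halving costs 6 dB of level on uncorrelated material; the alternative is to keep the full sum and saturate, which is a trade-off you have to pick for your signal.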
I imagine you'll be using integers? If you want to do some more advanced processing, consider an FFT-based processing chain: applying a filter becomes a multiplication rather than a convolution (watch the video!), and that's (often) a net computational gain if you need to apply more than one.
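For reference, the identity behind the "multiplication rather than convolution" remark, written for length-N DFTs with x the input block, h the filter's impulse response, and capitals their DFTs (the symbols are my notation). Note that the product corresponds to circular convolution, which is why blocks need zero padding and some overlap handling.

```latex
y[n] = \sum_{k=0}^{N-1} h[k]\, x[(n-k) \bmod N]
\quad\Longleftrightarrow\quad
Y[m] = H[m]\, X[m], \qquad m = 0, \dots, N-1
```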
Generating filter coefficients in the frequency domain is no harder than in the time domain, even simpler sometimes.
For a measure of FFT efficiency, and other DSP-on-STM32 topics, there's the classic AN4841.
As with the time domain, you need to save (at least part of) the previous buffer. The two commonly used methods are overlap-and-save and overlap-and-add; only the latter is treated in the Analog book, while Lyons describes both.
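A minimal sketch of the overlap-and-add bookkeeping, with made-up names (`OLA_TAPS`, `OLA_BLOCK`, `ola_state_t`, `ola_block`). To keep it self-contained, the per-block convolution is done directly in the time domain; in an actual FFT chain that step would be a zero-padded FFT, a multiplication, and an inverse FFT. It assumes OLA_BLOCK >= OLA_TAPS - 1.

```c
#include <stddef.h>
#include <string.h>

#define OLA_TAPS  4   /* hypothetical filter length */
#define OLA_BLOCK 8   /* hypothetical block length, >= OLA_TAPS - 1 */

typedef struct {
    float tail[OLA_TAPS - 1];   /* part of the result that spills past the block */
} ola_state_t;

/* Process exactly OLA_BLOCK input samples into OLA_BLOCK output samples. */
static void ola_block(ola_state_t *st, const float *h,
                      const float *in, float *out)
{
    float full[OLA_BLOCK + OLA_TAPS - 1] = {0};

    /* Full linear convolution of this block alone: the result is
       OLA_TAPS - 1 samples longer than the input block. */
    for (size_t i = 0; i < OLA_BLOCK; i++)
        for (size_t k = 0; k < OLA_TAPS; k++)
            full[i + k] += in[i] * h[k];

    /* Add the tail left over from the previous block. */
    for (size_t i = 0; i < OLA_TAPS - 1; i++)
        full[i] += st->tail[i];

    /* Emit one block, and save the new tail for the next call. */
    memcpy(out, full, OLA_BLOCK * sizeof(float));
    memcpy(st->tail, full + OLA_BLOCK, (OLA_TAPS - 1) * sizeof(float));
}
```

Overlap-and-save gets the same result by keeping the last n-1 input samples instead and discarding the first n-1 (wrapped-around) output samples of each block.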