RISC-V still has to catch up to what ARM has in this respect. An unfortunately I see no focus on MCUs at all at the moment, not even a recognition that MCUs are different from MPUs.
Than you're not looking.
There is an Embedded ABI being worked on, with fewer volatile registers -- probably a0..a3 and t0..t1 instead of a0..a7 and t0..t6 for Linux, thus cutting down the number of registers that need to be saved on an interrupt to six (plus ra) instead of fifteen. Also, all registers from 16..31 become callee-save, making the ABI identical between rv32i and rv32e. This is being worked on by people in the embedded community.
There is the CLIC (Core Local Interrupt Controller), a backward-compatible enhancement that was *specifically* designed with the needs and inputs of the embedded community. I've already pointed you to this. It provides direct vectoring to C functions decorated with an attribute, plus with a very small amount of ROMable code that can be built into a processor it provides vectoring to *standard* ABI C functions along with features such as interrupt chaining (dispatching to the next handler without restoring and re-saving registers), late dispatch (if I higher priority interrupt comes in while registers are being saved), and similar latencies to those ARM cores provide.
SiFive has developed a small 2-stage pipeline processor core *specifically* for deeply embedded real-time applications. There is no branch prediction .. all taken branches take 2 cycles. Other suppliers such as Syntacore and PULPino have similar cores, for example the Zero RISCy in the new NXP chip. SiFive has also developed an extension for the 3-series and 5-series cores to disable branch prediction for embedded real-time tasks (and of course turning part of the the icache into instruction scratchpad).
The Vector Extension working group had been bending over backwards to accommodate the wishes and needs of the embedded community and make the very lowest-end implementations simpler and better performing. As a simple example, the high-end people such as Esperano, Barcelona Supercomputer Centre, and Krste at SiFive wanted predicated-off vector lanes to be set to zero i.e. vadd dst,src1,src2 should not have to read dst. The embedded guys wanted the predicated-off vector lanes to be "left untouched". In this and in several other areas the design has been modified to better suit small embedded cores.
The Bit Manipulation Working Group is almost exclusively looking at things that primarily the embedded community want.
Claims that there is "no focus on MCUs at all at the moment, not even a recognition that MCUs are different from MPUs" is so far from the clearly obvious truth that it just about has to be trolling.