I admit I haven't thought about that lazy save/restore thing. How would you do this efficiently on RISC-V?
There are many possible strategies (that's policy), depending on what you want to optimise for, but the mechanism is well described over four pages in section 3.1.6.6, "Extension Context Status in mstatus Register", starting on p25 of (for example) riscv-privileged-20211203.pdf.
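Roughly, the mechanism in that section can be modelled as follows. This is a toy Python sketch, not kernel code. The bit positions are those of the FS, VS and XS fields in RV64 mstatus, with SD as the top bit. The function names are hypothetical. Hardware traps FP use when FS is Off and marks FS Dirty when FP state is modified. The OS saves state at a switch only when it is Dirty. The read-only SD bit lets it skip every per-extension check with a single test.

```python
# Bit positions of the 2-bit status fields in RV64 mstatus (privileged spec,
# section 3.1.6); SD is the top bit (bit 31 on RV32).
VS_SHIFT, FS_SHIFT, XS_SHIFT = 9, 13, 15
SD_BIT = 63
OFF, INITIAL, CLEAN, DIRTY = 0, 1, 2, 3


class IllegalInstruction(Exception):
    pass


def get_field(mstatus, shift):
    return (mstatus >> shift) & 0b11


def set_field(mstatus, shift, value):
    """Write a 2-bit status field and recompute the read-only SD summary bit."""
    mstatus = (mstatus & ~(0b11 << shift)) | (value << shift)
    dirty = any(get_field(mstatus, s) == DIRTY for s in (VS_SHIFT, FS_SHIFT, XS_SHIFT))
    return (mstatus & ~(1 << SD_BIT)) | (int(dirty) << SD_BIT)


def fp_instruction(mstatus):
    """Hardware side: an FP instruction that modifies FP state traps if FS is
    Off, and otherwise leaves FS Dirty."""
    if get_field(mstatus, FS_SHIFT) == OFF:
        raise IllegalInstruction("FP instruction with mstatus.FS == Off")
    return set_field(mstatus, FS_SHIFT, DIRTY)


def context_switch_out(mstatus, save_fp_regs):
    """OS side: if SD is clear nothing is dirty and no field needs checking;
    otherwise save the FP registers only if FS is Dirty, then mark them Clean."""
    if (mstatus >> SD_BIT) & 1 and get_field(mstatus, FS_SHIFT) == DIRTY:
        save_fp_regs()
        mstatus = set_field(mstatus, FS_SHIFT, CLEAN)
    return mstatus
```

A second switch-out with FS still Clean saves nothing, which is the whole point of the Clean/Dirty distinction.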
Read that and ask if you have any remaining questions.
Note that according to the V ABI the vector state is undefined after any function call, and system calls are a subset of function calls. So on any syscall the OS is entitled either to set the current VS to "Off" (future use of V after the syscall returns will then trap, at which point the OS can clear the state to 0s), or to clear all the vector registers right away and set VS to "Initial", which means the state won't need to be saved on the next task switch if no more V instructions are executed. Either way, the (rather large) V state has to be saved and restored only on a task switch caused by timeslice expiry (and only if it is Dirty), not on one caused by a synchronous system call.
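Here is a toy Python model of that policy. It assumes VLEN = 256 bits (so VLENB = 32 bytes) and ignores vl, vtype, vstart and vcsr. The Task class and the function names are hypothetical. A real kernel would do this with CSR accesses and vector loads and stores. The model shows both syscall options ("Off" with lazy clearing, or "Initial" with immediate clearing). Either way, a later preemption saves nothing unless V was used again.

```python
from dataclasses import dataclass, field

OFF, INITIAL, CLEAN, DIRTY = 0, 1, 2, 3
VLENB = 32                      # assumption: VLEN = 256 bits
NUM_VREGS = 32


def zeroed():
    return [bytes(VLENB)] * NUM_VREGS


@dataclass
class Task:
    vs: int = INITIAL
    vregs: list = field(default_factory=zeroed)
    saved_bytes: int = 0        # total V register state written to memory


def on_syscall_entry(task, turn_off=False):
    """V state is undefined after any call, so the kernel may discard it."""
    if turn_off:
        task.vs = OFF           # clear lazily, when V is next used
    else:
        task.vregs = zeroed()   # clear now; Initial means nothing to save
        task.vs = INITIAL


def on_vector_write(task, reg, value):
    if task.vs == OFF:          # trap handler: zero the state, then retry
        task.vregs = zeroed()   # (the retry is modelled inline here)
        task.vs = INITIAL
    task.vregs[reg] = value
    task.vs = DIRTY             # hardware marks VS Dirty on any V state write


def on_preempt(task):
    """Timeslice expiry: live V state must survive, but only Dirty state is saved."""
    if task.vs == DIRTY:
        task.saved_bytes += VLENB * NUM_VREGS
        task.vs = CLEAN
```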
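So VS and FS differ in this respect because of OS/ABI policy, not hardware: the hard-float calling convention has callee-saved FP registers (fs0 to fs11), so FP state can't simply be discarded at a syscall.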
> zero potential for save/restore bugs as long as you don't use the FPU in tasks that are defined not to support it
That means you have to know for sure what all the code in all the libraries you call does. That's practical in small embedded software, but not in desktop software. How do you know printf() doesn't execute any FP instructions at all, even if there's no %f in the format string? Or any other library you didn't write?
On the other hand, your scheme does allow the skilled developer to lie to the OS: say you don't use FP, but then actually do a small amount of FP computation using a limited register set, saving and restoring only the registers actually used (plus the FP state: rounding mode, accrued exception flags, ...).
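The trick can be sketched as a hypothetical Python simulation. A routine borrowing FP registers saves only the registers it touches, plus fcsr (frm in bits 7:5, fflags in bits 4:0). It restores them afterwards. Real code would use fsd/fld and the frcsr/fscsr pseudo-instructions in assembly. The routine must not call anything that touches other FP registers. The model also ignores preemption in the middle of the borrowed section.

```python
class FpState:
    """Toy model of one hart's FP state: f0-f31 plus the fcsr CSR."""

    def __init__(self):
        self.f = [0.0] * 32
        self.fcsr = 0            # frm in bits 7:5, fflags in bits 4:0


def with_borrowed_fp(state, regs, body):
    """Save only the FP registers in `regs` and fcsr, run `body`, then restore
    them, so whatever FP state was live before is unchanged afterwards."""
    saved_regs = {r: state.f[r] for r in regs}
    saved_fcsr = state.fcsr
    try:
        return body(state)
    finally:
        for r, value in saved_regs.items():
            state.f[r] = value
        state.fcsr = saved_fcsr


def one_third(state):
    # Hypothetical small FP computation that touches only f0 and f1.
    state.f[0] = 1.0
    state.f[1] = 3.0
    state.f[0] = state.f[0] / state.f[1]
    state.fcsr |= 0b00001        # 1/3 is inexact, so the NX flag accrues
    return state.f[0]
```

For example, `with_borrowed_fp(s, [0, 1], one_third)` returns 1/3 and leaves f0, f1, the rounding mode and the accrued flags exactly as it found them.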
That is, as long as the OS doesn't police it by turning the FPU off for threads that say they don't use FP, trapping, and printing an error message if they do. Policing would also solve the "how do I know printf() doesn't use FP?" problem.
Or I guess you could start with "uses FP" turned off for all threads, and turn it on the first time they execute an FP instruction, then leave it on permanently, saving and restoring their FP registers on every task switch from then on.
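Both policies depend on the same illegal-instruction trap taken while FS is Off. One polices the "no FP" declaration. The other enables FP permanently on first use. Here is a toy Python sketch of both. The Thread class and the policy strings are illustrative assumptions. Using 32 registers as the "save cost" is also a simplification.

```python
OFF, INITIAL, DIRTY = 0, 1, 3
NUM_FP_REGS = 32


class FpUsageError(Exception):
    """Raised when a thread that declared "no FP" executes an FP instruction."""


class Thread:
    def __init__(self, name, declares_fp=False):
        self.name = name
        self.uses_fp = declares_fp              # sticky once turned on
        self.fs = INITIAL if declares_fp else OFF


def fp_instruction(thread, policy):
    if thread.fs == OFF:                        # hardware raises illegal-instruction
        if policy == "police":
            raise FpUsageError(f"{thread.name}: FP used but not declared")
        if policy != "enable_on_first_use":
            raise ValueError(f"unknown policy {policy!r}")
        thread.uses_fp = True                   # leave FP on from now on
        thread.fs = INITIAL                     # then retry the faulting instruction
    thread.fs = DIRTY


def switch_out(thread):
    """Number of FP registers saved at this task switch: all or nothing."""
    return NUM_FP_REGS if thread.uses_fp else 0
```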
It might be ideal to have your "I use FP" / "I don't use FP" flag be instead "I use FP heavily, please eagerly restore my registers" / "I don't use FP, or rarely, please restore my FP registers lazily".
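Here is that hint as a toy Python model; the fp_hint attribute is hypothetical. An "eager" thread gets its FP registers loaded at every switch-in. A "lazy" thread starts each timeslice with FS Off. It pays for a trap plus a restore only in timeslices where it actually executes FP. In both cases, registers are saved only when FS is Dirty.

```python
OFF, CLEAN, DIRTY = 0, 2, 3


class Thread:
    def __init__(self, name, fp_hint):
        assert fp_hint in ("eager", "lazy")
        self.name = name
        self.fp_hint = fp_hint     # hypothetical per-thread hint
        self.fs = OFF
        self.restores = 0          # times f0-f31 and fcsr were loaded
        self.saves = 0             # times they were written back


def switch_in(t):
    if t.fp_hint == "eager":
        t.restores += 1            # load the FP registers now
        t.fs = CLEAN
    else:
        t.fs = OFF                 # defer: the first FP instruction will trap


def fp_instruction(t):
    if t.fs == OFF:                # lazy restore, done in the trap handler
        t.restores += 1
        t.fs = CLEAN
    t.fs = DIRTY                   # hardware marks the state dirty


def switch_out(t):
    if t.fs == DIRTY:              # write back only if actually modified
        t.saves += 1
    t.fs = OFF
```

The trade-off: eager restore avoids a trap per timeslice for heavy FP users, while lazy restore costs nothing for threads that rarely touch FP.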