Alright, I've done the test with Bruce's primes on the CH32V307 @144 MHz.
Here are the results:
// 319.155 sec WCH32V307 @ 144 MHz (-O1) bytes 46.0 billion clocks
// 293.020 sec WCH32V307 @ 144 MHz (-O3) bytes 42.2 billion clocks
(Bruce's primes is normally meant to be compiled at -O1; I also tested at -O3 to see the difference.)
Compiled with mainline GCC 13.2 (riscv64-unknown-elf).
I have always been a bit surprised by the Cortex M3 and M4 results in this list (I haven't run primes on those myself). It is odd that the M4, at least, seems to do worse than a "modest" RISC-V core.
Note that benchmarks are benchmarks and only tell part of the story. CoreMark, for instance, clearly shows a significant advantage for the Cortex M4. So don't get hung up on the exact numbers here: what they show is that the Qingke V4 is definitely on par with a Cortex M3/M4.
Small note: I derived the test code (everything outside Bruce's function) from the code I had written for my own RISC-V core, which used the rdcycle and rdinstret instructions to get the number of cycles and the number of retired instructions (to compute the CPI). This showed me that neither counter seems to be implemented on the Qingke V4: both always return 0. I had to use the SysTick timer instead, configured as an up counter with HCLK as its time base, to get the number of cycles, and I didn't find anything that gives the number of executed instructions. Not that it should matter for "normal" work with it, but I found this odd.