Author Topic: NEORV32 (RISC-V) soft-core "coremark" result (Read 4366 times)

FlyingDutch · « **on:** November 07, 2023, 12:05:18 pm »

Hello Forum,

Today I ran the "coremark" benchmark on the "NEORV32" soft CPU (a CPU based on the RISC-V ISA) - see the project link:

https://github.com/stnolting/neorv32

I loaded the "coremark" binary into the NEORV32 soft CPU using the built-in bootloader. The compiled (exe) program was 20 KB in size. The FPGA project was loaded onto a Spartan-7 (QMTECH FPGA board). The FPGA device is an XC7S15-1FTGB196C and the project clock is 50 MHz.
Because of the binary size I had to increase the processor-internal instruction memory (implemented in FPGA BRAM) to 20 KB - see code:

Code:

entity neorv32_test_setup_bootloader is
  generic (
    -- adapt these for your setup --
    CLOCK_FREQUENCY   : natural := 50000000; -- clock frequency of clk_i in Hz
    MEM_INT_IMEM_SIZE : natural := 20*1024;   -- size of processor-internal instruction memory in bytes
    MEM_INT_DMEM_SIZE : natural := 6*1024     -- size of processor-internal data memory in bytes
  );

I loaded the binary into the soft CPU and ran coremark from memory. The "coremark" sources are among the examples of the soft-core project:
https://github.com/stnolting/neorv32/tree/main/sw/example/coremark

I had to wait some time for the results (coremark was run for 2000 iterations). The screenshot below shows the "coremark" result:

[Screenshot: NEORV32 (RISC-V) soft-core "coremark" result]

As one can see, the score is 14 iterations per second. Because the NEORV32 soft-core runs at 50 MHz, we have 14/50 = 0.28 iter/sec/MHz. This is a weak result. For example, the ARM Cortex-M1 has a "coremark" of 1.85 iter/sec/MHz. Here are results for many popular MCUs:
https://en.wikichip.org/wiki/coremark-mhz?utm_content=cmp-true

https://www.st.com/content/st_com/en/arm-32-bit-microcontrollers/arm-cortex-m4.html
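
The per-MHz normalization used above is just the raw score divided by the clock frequency. A minimal Python sketch of that arithmetic, using only the figures quoted in this post (the function name is made up for illustration):

```python
def coremark_per_mhz(iterations_per_second: float, clock_mhz: float) -> float:
    """Normalize a CoreMark score (iterations/s) by the clock frequency in MHz."""
    if clock_mhz <= 0:
        raise ValueError("clock_mhz must be positive")
    return iterations_per_second / clock_mhz


# Figures quoted in the post: 14 iter/s measured at a 50 MHz clock.
neorv32 = coremark_per_mhz(14, 50)
# The Cortex-M1 figure is already given per MHz in the post.
cortex_m1 = 1.85

print(f"NEORV32: {neorv32:.2f} CoreMark/MHz")
print(f"Cortex-M1 is {cortex_m1 / neorv32:.1f}x higher per MHz")
```

With these inputs the gap to the quoted Cortex-M1 figure is roughly a factor of 6.6 per MHz.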

Here is the "coremark" project on GitHub:
https://github.com/eembc/coremark/tree/main

The NEORV32 soft-core had the integer multiply and divide instructions implemented, but no FPU (floating point unit). I added the FPU to the soft-core FPGA project, but this did not change anything in the "coremark" results.

I was wondering what the "coremark" result is for the Xilinx "MicroBlaze" soft CPU, and I found these results:
https://www.jblopen.com/microblaze-benchmarks-part-1-coremark-performance/

As one can see, the averaged "coremark" result is about 2.0 iter/sec/MHz - which is also a weak result.

Is it possible to conclude from this that soft CPUs implemented on FPGAs generally have very weak performance? A few posts below this one there was a post about the new "MicroBlaze" soft-core based on the RISC-V ISA - I am wondering what the "coremark" result is for this new soft-core.
Maybe someone has different "coremark" results for several soft CPUs?

Best Regards

betocool · « **Reply #1 on:** November 07, 2023, 02:03:45 pm »

I guess the ability to get started quickly on the NeoRV32 was more important than performance? There is a comment on the website about that, specifically about not being the most performant CPU.

This just opens the door to exploring other RISC-V cores! I'm not sure what architecture the MicroBlaze is, but one drawback is that it will only run on Xilinx devices.

My own personal 2c's only.

Cheers,

Alberto

FlyingDutch · « **Reply #2 on:** November 07, 2023, 02:48:34 pm »

Hi,

MicroBlaze™ (the older versions) is AMD's 32/64-bit RISC Harvard-architecture soft processor core with a rich instruction set optimized for embedded applications, whereas the "MicroBlaze V" processor is based on the 32-bit RISC-V instruction set architecture (ISA) and is highly optimized for AMD FPGAs. What worries me the most is that even at such a giant company as AMD (formerly Xilinx) the advanced soft-core CPU has weak performance. If one looks at the link I gave in the previous post:
https://www.jblopen.com/microblaze-benchmarks-part-1-coremark-performance/
one can see that without caches (running from DDR memory) the "coremark" is even worse, at 0.040 iter/sec/MHz. I planned to use the NEORV32 soft-core in a "real" project, but now I am afraid its performance will not be enough.

Best Regards

colorado.rob · « **Reply #3 on:** November 07, 2023, 03:54:22 pm »

Why not use a more performance-oriented RISC-V core?

https://github.com/SpinalHDL/NaxRiscv

FlyingDutch · « **Reply #4 on:** November 07, 2023, 04:59:57 pm »

Yes, with a "coremark" score of 5.02 CoreMark/MHz it makes more sense to use it. I will try this soft-core.

Thanks @colorado.rob

colorado.rob · « **Reply #5 on:** November 07, 2023, 05:09:13 pm »

Quote from: FlyingDutch on November 07, 2023, 04:59:57 pm

Yes, with a "coremark" score of 5.02 CoreMark/MHz it makes more sense to use it. I will try this soft-core.

There's always a trade-off. It is almost certainly going to require more resources. And may end up with a lower Fmax. I don't frequently need a high-performance soft core. In fact I am tempted to try SERV (https://github.com/olofk/serv) the next time I need a soft core.

langwadt · « **Reply #6 on:** November 07, 2023, 05:13:32 pm »

Quote from: FlyingDutch on November 07, 2023, 02:48:34 pm

Hi,

MicroBlaze™ (the older versions) is AMD's 32/64-bit RISC Harvard-architecture soft processor core with a rich instruction set optimized for embedded applications, whereas the "MicroBlaze V" processor is based on the 32-bit RISC-V instruction set architecture (ISA) and is highly optimized for AMD FPGAs. What worries me the most is that even at such a giant company as AMD (formerly Xilinx) the advanced soft-core CPU has weak performance. If one looks at the link I gave in the previous post:
https://www.jblopen.com/microblaze-benchmarks-part-1-coremark-performance/
one can see that without caches (running from DDR memory) the "coremark" is even worse, at 0.040 iter/sec/MHz. I planned to use the NEORV32 soft-core in a "real" project, but now I am afraid its performance will not be enough.

Best Regards

If memory is the bottleneck, is "per MHz" really a valid measure? Run it slower so the memory can keep up and you'll get a "better" per-MHz result.
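
To illustrate the point about memory-bound "per MHz" figures, here is a toy Python model. It assumes each benchmark iteration costs a fixed number of core cycles plus a number of memory accesses with a fixed latency in nanoseconds. All the numbers (cycle counts, access counts, latency) are made up for illustration and are not measurements of any real core.

```python
import math


def score(clock_mhz: int,
          core_cycles: int = 300_000,     # hypothetical compute cycles per iteration
          mem_accesses: int = 100_000,    # hypothetical uncached accesses per iteration
          mem_latency_ns: int = 100):     # hypothetical fixed memory latency
    """Return (iterations/s, iterations/s/MHz) for a memory-bound toy model."""
    # A fixed latency in ns costs more clock cycles the faster the core runs.
    stall_cycles = math.ceil(mem_latency_ns * clock_mhz / 1000)
    cycles_per_iter = core_cycles + mem_accesses * stall_cycles
    iter_per_s = clock_mhz * 1e6 / cycles_per_iter
    return iter_per_s, iter_per_s / clock_mhz


for f in (10, 50, 100):
    absolute, per_mhz = score(f)
    print(f"{f:4d} MHz: {absolute:7.2f} iter/s, {per_mhz:.3f} iter/s/MHz")
```

In this model the per-MHz figure rises as the clock is lowered, while the absolute iterations per second fall, which is why a per-MHz score says little when memory latency dominates.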

ejeffrey · « **Reply #7 on:** November 07, 2023, 05:30:54 pm »

Quote from: FlyingDutch on November 07, 2023, 02:48:34 pm

What worries me the most is that even at such a giant company as AMD (formerly Xilinx) the advanced soft-core CPU has weak performance.

This suggests a significant misunderstanding of the situation. MicroBlaze (and presumably MicroBlaze V) is not an "advanced" core in that sense. It is a single-issue, in-order, pipelined CPU. Its microarchitecture would have been "state of the art" in the late 80s or early 90s. What it is, is highly optimized to get good performance/area at a reasonable clock frequency on a Xilinx FPGA. It takes about 1000 LUTs (look-up tables) depending on the configuration. NaxRiscv is a superscalar, out-of-order CPU with register renaming. It achieves considerably better performance than MicroBlaze but at the cost of 12000 LUTs, well over 10x the resource usage of the MicroBlaze. This doesn't make one better than the other; it means they are designed with different goals. The small-area cores are vastly more commonly used in FPGAs than large superscalar OoO processors since they leave most of the logic area for your actual custom logic, and hard processors are always going to be much faster.

Quote

one can see that without caches (running from DDR memory) the "coremark" is even worse, at 0.040 iter/sec/MHz. I planned to use the NEORV32 soft-core in a "real" project, but now I am afraid its performance will not be enough.

Sure, any CPU will be slow if you remove all the caches. If you somehow disabled the caches in a modern x86-64 CPU it probably wouldn't run much faster than that. But that's a silly thing to contemplate: basically the entire thing is caches: not just the L1/L2/L3 cache and the TLB cache, but caches for decoded instructions, caches for branch prediction, store buffers, caches for speculative results, and so on. The actual ALUs are a tiny fraction of the actual area.

asmi · « **Reply #8 on:** November 07, 2023, 06:35:16 pm »

Yeah, in 99% of cases a softcore in an FPGA design is used to orchestrate other hardware blocks and to manage the HMI. You don't need much in the way of performance for that. And when you do - that's what Zynqs are for.

SiliconWizard · « **Reply #9 on:** November 07, 2023, 08:51:16 pm »

Quote from: colorado.rob on November 07, 2023, 05:09:13 pm

Quote from: FlyingDutch on November 07, 2023, 04:59:57 pm
Yes, with a "coremark" score of 5.02 CoreMark/MHz it makes more sense to use it. I will try this soft-core.
There's always a trade-off. It is almost certainly going to require more resources. And may end up with a lower Fmax. I don't frequently need a high-performance soft core. In fact I am tempted to try SERV (https://github.com/olofk/serv) the next time I need a soft core.

Yes, they are giving figures there: https://spinalhdl.github.io/NaxRiscv-Rtd/main/NaxRiscv/performance/index.html
For the performance configuration, that yields 13.3 KLUTs on an Artix-7, whereas simpler RISC-V cores rarely exceed 1K to 2 KLUTs. So, obviously, it's your decision, depending on the target FPGA and your application.
Its Fmax is 155 MHz, also on an Artix-7 (which is a mid-range FPGA). So even if 5 CoreMark/MHz is impressive, you'll get the same performance with a much simpler core with half the CoreMark/MHz running at twice the Fmax.
Synthesizing it on silicon would be pretty interesting, though.

Also, if you ever want to understand its code or modify it, it's SpinalHDL so you'll have a learning curve before you're even able to understand the first line of code.

But yes, in general what you need entirely depends on the context. There's no way we can provide a general answer other than this.

ejeffrey · « **Reply #10 on:** November 07, 2023, 09:59:02 pm »

Quote from: SiliconWizard on November 07, 2023, 08:51:16 pm

For the performance configuration, that yields 13.3 KLUTs on an Artix-7, whereas simpler RISC-V cores rarely exceed 1K to 2 KLUTs. So, obviously, it's your decision, depending on the target FPGA and your application.
Its Fmax is 155 MHz, also on an Artix-7 (which is a mid-range FPGA). So even if 5 CoreMark/MHz is impressive, you'll get the same performance with a much simpler core with half the CoreMark/MHz running at twice the Fmax.

Indeed, VexRiscv by the same author in its max-perf configuration gets 200 MHz, 2.57 CoreMark/MHz, and only ~2k LUTs on the same Artix-7 FPGA. So the OoO processor is 50% faster for 6x the area.

Another factor especially with FPGAs is that you might actually want to optimize for frequency -- for instance if you want to have the CPU and custom logic in the same clock domain, and you don't want to slow down the clock.
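
The "50% faster" comparison follows from multiplying CoreMark/MHz by Fmax. A short Python sketch of that arithmetic, using only the figures quoted in this thread (5.02 CoreMark/MHz at 155 MHz for NaxRiscv, 2.57 CoreMark/MHz at 200 MHz for VexRiscv). The dictionary layout and function name are just for illustration.

```python
# Figures as quoted in the thread for an Artix-7 target.
cores = {
    "NaxRiscv (performance config)": {"coremark_per_mhz": 5.02, "fmax_mhz": 155},
    "VexRiscv (max perf config)": {"coremark_per_mhz": 2.57, "fmax_mhz": 200},
}


def absolute_coremark(core: dict) -> float:
    """Absolute CoreMark at Fmax = (CoreMark/MHz) * Fmax in MHz."""
    return core["coremark_per_mhz"] * core["fmax_mhz"]


results = {name: absolute_coremark(c) for name, c in cores.items()}
for name, value in results.items():
    print(f"{name}: {value:.1f} CoreMark at Fmax")

speedup = (results["NaxRiscv (performance config)"]
           / results["VexRiscv (max perf config)"])
print(f"NaxRiscv / VexRiscv at Fmax: {speedup:.2f}x")
```

This gives about 778 versus 514 CoreMark at Fmax, a ratio of roughly 1.5.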

brucehoult · « **Reply #11 on:** November 07, 2023, 10:04:59 pm »

Quote from: SiliconWizard on November 07, 2023, 08:51:16 pm

For the performance configuration, that yields 13.3 KLUTs on an Artix-7, whereas simpler RISC-V cores rarely exceed 1K to 2 KLUTs. So, obviously, it's your decision, depending on the target FPGA and your application.

And then there is SERV, implementing RV32I using 125 LUTs and 164 FFs on Artix-7.

It's bit-serial. Most instructions take 32 cycles, some such as shifts and branches take 64 cycles. Registers are stored in part of the RAM space.

But it's very small. And fast enough for many FPGA purposes. And has a high fmax, so doesn't limit the rest of your design.

QERV was just announced this month (week). It's the same design but with the data path widened from 1 bit to 4 bits (like the Z80 :-) ). It's 3x faster at a cost of an extra 15% LUTs (so 145 or so?)
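
As a rough feel for what those cycle counts mean, here is a small Python sketch. The 32 and 64 cycle costs, the 125 LUT figure and the 15% / 3x numbers come from the post above. The 100 MHz clock and the 20% share of 64-cycle instructions are assumptions picked purely for illustration, not measured values.

```python
def avg_cycles_per_instruction(slow_fraction: float) -> float:
    """Average cycles when a fraction of instructions takes 64 cycles, the rest 32."""
    return 32 * (1 - slow_fraction) + 64 * slow_fraction


def mips(clock_mhz: float, slow_fraction: float) -> float:
    """Millions of instructions per second for a bit-serial core in this model."""
    return clock_mhz / avg_cycles_per_instruction(slow_fraction)


CLOCK_MHZ = 100        # assumed clock, not a measured Fmax
SLOW_FRACTION = 0.2    # assumed share of shifts/branches in the instruction mix

serv_mips = mips(CLOCK_MHZ, SLOW_FRACTION)
qerv_mips = 3 * serv_mips          # "3x faster", as stated in the post
qerv_luts = 125 * 1.15             # "extra 15% LUTs" on top of 125

print(f"SERV: ~{serv_mips:.1f} MIPS, 125 LUTs")
print(f"QERV: ~{qerv_mips:.1f} MIPS, ~{qerv_luts:.0f} LUTs")
```

Under these assumptions the bit-serial core lands at a few MIPS, which matches the "fast enough for many FPGA purposes" remark for control-type tasks.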

Someone · « **Reply #12 on:** November 07, 2023, 10:58:21 pm »

Quote from: brucehoult on November 07, 2023, 10:04:59 pm

Quote from: SiliconWizard on November 07, 2023, 08:51:16 pm
For the performance configuration, that yields 13.3 KLUTs on an Artix-7, whereas simpler RISC-V cores rarely exceed 1K to 2 KLUTs. So, obviously, it's your decision, depending on the target FPGA and your application.
And then there is SERV, implementing RV32I using 125 LUTs and 164 FFs on Artix-7.

Sure, so make a metric like coremark/1k.lut or something.....
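
A metric along those lines is easy to tabulate. The Python sketch below uses only figures quoted earlier in this thread (NaxRiscv: 5.02 CoreMark/MHz, 155 MHz, 13.3 kLUTs; VexRiscv: 2.57 CoreMark/MHz, 200 MHz, about 2 kLUTs). The metric names are made up here, and the VexRiscv LUT count is the approximate "~2k" from the thread.

```python
# (CoreMark/MHz, Fmax in MHz, kLUTs) as quoted in the thread for Artix-7.
cores = {
    "NaxRiscv": (5.02, 155, 13.3),
    "VexRiscv": (2.57, 200, 2.0),   # "~2k LUT" is approximate
}


def per_klut_metrics(cm_per_mhz: float, fmax_mhz: float, kluts: float):
    """Return (CoreMark/MHz per kLUT, absolute CoreMark at Fmax per kLUT)."""
    return cm_per_mhz / kluts, cm_per_mhz * fmax_mhz / kluts


table = {name: per_klut_metrics(*figures) for name, figures in cores.items()}
for name, (per_mhz_klut, abs_klut) in table.items():
    print(f"{name}: {per_mhz_klut:.2f} CoreMark/MHz/kLUT, "
          f"{abs_klut:.0f} CoreMark/kLUT at Fmax")
```

By this measure the small in-order core comes out several times more area-efficient than the out-of-order one, which is the trade-off discussed above.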

SiliconWizard · « **Reply #13 on:** November 08, 2023, 02:36:07 am »

"Serial" CPUs are fun (and yes, very economical).

colorado.rob · « **Reply #14 on:** November 08, 2023, 05:39:36 pm »

Quote from: brucehoult on November 07, 2023, 10:04:59 pm

QERV was just announced this month (week). It's the same design but with the data path widened from 1 bit to 4 bits (like the Z80 :-) ). It's 3x faster at a cost of an extra 15% LUTs (so 145 or so?)

And that's why I come here. I've not heard of QERV prior to today. Thank you.

https://riscv.org/news/2023/11/qamcom-boosts-risc-v-beyond-the-edge-with-qerv/

