Author Topic: VexRISC V - increasing pipeline depth (Read 4368 times)

glenenglish · « **on:** December 28, 2023, 03:34:09 am »

Anyone done this- increase pipeline depth from 5 to 6 stages?

The 5-stage pipeline really puts the screws on the timing in the FPU. It might simply need some target architecture mods. The currently supplied implementation runs 23 LUT logic levels deep!

I will learn SpinalHDL, it seems...
-glen
See attachment for pipeline / FPU steps.

SiliconWizard · « **Reply #1 on:** January 01, 2024, 01:40:06 am »

I've designed a RISCV core originally with a 5-stage pipeline and had to change it to 6-stage to reach higher Fmax as well, so I get the need.
Unfortunately, I don't know much about VexRISCV nor SpinalHDL. And from the lack of replies you got, I'm guessing that no one does here, or at least not enough to help.

The pipeline looks like a pretty textbook 5-stage RISC pipeline. In your case, I guess the stage to focus on is the Execute stage. Could be split in two.
But wouldn't just pipelining the FPU itself and keeping the original pipeline for the core be a better approach here?

dolbeau · « **Reply #2 on:** January 01, 2024, 08:53:35 am »

Quote from: glenenglish on December 28, 2023, 03:34:09 am

Anyone done this- increase pipeline depth from 5 to 6 stages?
The 5 stage pipeline really puts the screws on the timing in the FPU.

You might raise the issue in the repo; you're more likely to find some answers there. Depending on how urgent/important this is to you, you could fund the improvements to VexRiscv, which is how some of the features happen (e.g. the debug stuff was funded by Efinix).

asmi · « **Reply #3 on:** January 08, 2024, 12:35:33 am »

Quote from: SiliconWizard on January 01, 2024, 01:40:06 am

Unfortunately, I don't know much about VexRISCV nor SpinalHDL. And from the lack of replies you got, I'm guessing that no one does here, or at least not enough to help.

This is why I would never touch anything not designed using conventional HDLs with a long pole.

Quote from: SiliconWizard on January 01, 2024, 01:40:06 am

The pipeline looks like a pretty textbook 5-stage RISC pipeline. In your case, I guess the stage to focus on is the Execute stage. Could be split in two.

This is often the hardest pipeline stage to pipeline because it requires many downstream changes like additional stall/forward logic (which would not be needed if it was originally a 1-cycle stage).
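
To illustrate why a split Execute stage drags in extra stall/forward logic, here is a minimal Python sketch. It is a toy in-order model, not VexRiscv's actual hazard logic: `count_stalls`, the `(dest, sources)` program format and the register names are all made up for illustration, and it assumes results are forwarded as soon as Execute finishes.

```python
def count_stalls(program, exec_latency):
    """Count stall cycles for an in-order pipeline with full forwarding.

    program      -- list of (dest, sources) tuples, dest may be None
    exec_latency -- cycles from entering Execute until the result can be
                    forwarded (1 = single-cycle Execute, 2 = split Execute)
    """
    ready_at = {}  # register name -> cycle at which its value is forwardable
    cycle = 0      # cycle at which the current instruction enters Execute
    stalls = 0
    for dest, sources in program:
        # The instruction cannot enter Execute before all operands are ready.
        earliest = max([ready_at.get(reg, 0) for reg in sources], default=0)
        if earliest > cycle:
            stalls += earliest - cycle
            cycle = earliest
        if dest is not None:
            ready_at[dest] = cycle + exec_latency
        cycle += 1
    return stalls


# Three back-to-back dependent ALU operations (hypothetical example).
dependent_chain = [
    ("x1", ["x2", "x3"]),
    ("x4", ["x1", "x5"]),
    ("x6", ["x4", "x1"]),
]

print(count_stalls(dependent_chain, exec_latency=1))
print(count_stalls(dependent_chain, exec_latency=2))
```

In this model the single-cycle Execute never stalls on the dependent chain, while the two-cycle version loses one cycle per back-to-back dependency. That bubble is what the added interlock has to detect, on top of the new forwarding path from the second Execute stage.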

SiliconWizard · « **Reply #4 on:** January 08, 2024, 12:45:27 am »

Quote from: asmi on January 08, 2024, 12:35:33 am

This is often the hardest pipeline stage to pipeline because it requires many downstream changes like additional stall/forward logic (which would not be needed if it was originally a 1-cycle stage).

Yes. Which is why I then suggested focusing on the FPU itself if it's causing timing issues. From the posted diagram, the FPU already appears to be pipelined (which is pretty much unavoidable here anyway), so just adding a stage or modifying the existing stages a bit in the FPU should be enough and also much easier to do. (I didn't say easy, and that will still require learning SpinalHDL first.)
I'd be surprised if it wasn't already possible to configure the number of pipeline stages in the FPU as it is, though.
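
As a rough feel for what one extra FPU stage could buy, here is a back-of-the-envelope Python sketch. The 23 logic levels come from the first post; the per-level delay and the fixed overhead (clock-to-Q plus setup) are purely hypothetical placeholders, not Efinix figures, and it assumes the logic can be split perfectly evenly, which real retiming rarely achieves.

```python
import math


def levels_after_split(logic_levels, extra_stages):
    """Worst-case logic levels per stage if the path is split evenly."""
    return math.ceil(logic_levels / (extra_stages + 1))


def estimated_fmax_mhz(logic_levels, level_delay_ns, overhead_ns):
    """Crude Fmax estimate: period = levels * per-level delay + fixed overhead."""
    period_ns = logic_levels * level_delay_ns + overhead_ns
    return 1000.0 / period_ns


# Hypothetical delays, chosen only to show the shape of the trade-off.
LEVEL_DELAY_NS = 0.5  # assumed LUT + routing delay per logic level
OVERHEAD_NS = 1.0     # assumed clock-to-Q + setup

before = estimated_fmax_mhz(23, LEVEL_DELAY_NS, OVERHEAD_NS)
after = estimated_fmax_mhz(levels_after_split(23, 1), LEVEL_DELAY_NS, OVERHEAD_NS)
print(f"before: {before:.1f} MHz, after one extra stage: {after:.1f} MHz")
```

The fixed overhead is why halving the logic depth does not double Fmax, and once the FPU path is fixed, some other path in the core is likely to become the limit.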

I'm also not a big fan of those non-standard HDLs, but as SiFive started it (I think), a number of open-source RISC-V cores are written with SpinalHDL, so I can understand why some would like to reuse that.

brucehoult · « **Reply #5 on:** January 08, 2024, 01:30:48 am »

Quote from: SiliconWizard on January 08, 2024, 12:45:27 am

I'm also not a big fan of those non-standard HDLs, but as SiFive started it (I think), a number of open-source RISC-V cores are written with SpinalHDL, so I can understand why some would like to reuse that.

SpinalHDL has nothing to do with SiFive. They use Chisel, which was first released three years before SiFive was founded.

The advantages of Chisel over Verilog are similar to the advantages of C over assembly language. Chisel produces an IL called FIRRTL (Flexible Intermediate Representation for RTL). The FIRRTL can be fed through various tools, including the CIRCT optimiser (which uses LLVM, with involvement from Chris Lattner, especially in the period when he worked for SiFive). The tools can optimise the HDL for different targets e.g. FPGA vs ASIC, and can output different formats e.g. Verilog or VHDL.

High-level design languages such as Chisel and SpinalHDL make it much easier to manipulate your design, and modifying what goes in which pipe stage is a prime example of that -- in much the same way that it's far easier to combine / split / inline functions in C than in asm.

Can you get better results in asm or Verilog? Sure, given enough people and time, both of which are in short supply, especially as µarchs get more complex.

Resisting the use of Chisel (and to a lesser extent others) now is about as short-sighted as refusing to use C on a VAX or 68k or MIPS in the 1980s. Using Chisel is a huge part of the reason SiFive has been able to go from cores comparable to ARM7TDMI (e.g. E31) to comparable to Cortex-X2/X3 (with designs similar to M0, M3, M4, M7, A35, A55, A72, A76, A78 falling out along the way) in seven years vs the thirty years it took Arm, with 10% as many employees.

ejeffrey · « **Reply #6 on:** January 08, 2024, 04:11:03 am »

Quote from: asmi on January 08, 2024, 12:35:33 am

Quote from: SiliconWizard on January 01, 2024, 01:40:06 am
Unfortunately, I don't know much about VexRISCV nor SpinalHDL. And from the lack of replies you got, I'm guessing that no one does here, or at least not enough to help.
This is why I would never touch anything not designed using conventional HDLs with a long pole.

I know Verilog but not SpinalHDL or Scala, yet when I had to make some changes to VexRiscv I found it dramatically easier to figure out what to change than in any Verilog processor core I have looked at. And the level of out-of-the-box customization that actually works with VexRiscv is for all practical purposes impossible in conventional HDLs. If you look at highly configurable cores "written in" Verilog, they are usually an unholy abomination of Perl, Tcl, and Verilog, because the code generation features in Verilog are a laughable joke that were frankly unacceptable in the 80s when Verilog was designed, and haven't really gotten better.

I understand the reservations people have about Chisel / SpinalHDL and other similar higher-level RTL tools, and that they are worried about the tooling and verification support and so on. That all makes sense, as does the issue that there are a lot more HDL engineers who know Verilog and VHDL than who know the proposed replacement of the week. But SpinalHDL (and Chisel) compile to Verilog, and using an actual sane programming language to describe your code generation is better than using a pile of ad hoc scripts that use string manipulation to do the same thing.

laugensalm · « **Reply #7 on:** January 08, 2024, 09:47:19 pm »

I've ended up with up to 8 stages to handle some non-standard instructions for some experiments (borrowed from the excellent Blackfin architecture), so I guess this is not such a bad thing to do, if you can arrange instructions such that the branch penalty is low or you're in a situation where you can use hardware loops. I've also played with variable-length pipelines, but that can get really messy and is prone to causing congested hazard logic. And after all, you might not want to have to touch the compiler toolchain for your custom extensions.

Indeed this gets terribly complex a.k.a. unmaintainable with classic V*HDLs and the choice of a matching high level HDL depends mostly on existing designs or the desired transfer language. I personally prefer Python over Scala though.

Nowadays I'd tend to assume that the functional programming approach is the path with the least overhead for several reasons, one of them being the fact that code can be *run* (versus AST-parsed like VHDL) in order to simulate or transpile to a synthesizable intermediate language (be it Verilog or some other RTL description). But this is getting off the actual topic.

dolbeau · « **Reply #8 on:** January 15, 2024, 03:12:01 pm »

Quote from: ejeffrey on January 08, 2024, 04:11:03 am

I know Verilog but not SpinalHDL or Scala, yet when I had to make some changes to VexRiscv I found it dramatically easier to figure out what to change than in any Verilog processor core I have looked at. And the level of out-of-the-box customization that actually works with VexRiscv is for all practical purposes impossible in conventional HDLs.

Same here. I had never worked with SpinalHDL, or really any HDL (being a SW guy), and yet I was able to implement B, K (minus the random generator) and some of P in the core. And that includes a third read port to the register file and a split (even/odd) write port for double-width (64-bit) output. My own variant of the core for FB acceleration includes custom operations like a small 4x8-bit vector operator doing [a*b/255] in parallel (to support acceleration of Xrender) and double-register (64-bit) load and store to the cache...
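
For readers unfamiliar with that kind of operation, here is a small Python reference model of a packed 4x8-bit [a*b/255]. It is only a sketch of the arithmetic, not the actual SpinalHDL plugin: the function name is made up, and it assumes the brackets mean truncation (floor), whereas the real instruction may round differently.

```python
def mul_div255_packed(a, b):
    """Lane-wise (a * b) // 255 on four unsigned 8-bit lanes of a 32-bit word.

    Assumption: the result is truncated; a hardware version may round instead.
    """
    result = 0
    for lane in range(4):
        shift = 8 * lane
        x = (a >> shift) & 0xFF  # extract lane from first operand
        y = (b >> shift) & 0xFF  # extract lane from second operand
        # x * y is at most 255 * 255, so the quotient always fits in 8 bits.
        result |= ((x * y) // 255) << shift
    return result


# Typical use: scale a pixel's four channels by per-channel alpha values.
print(hex(mul_div255_packed(0xFF80FF00, 0xFFFF8000)))
```

In hardware the four lanes are independent multipliers working side by side, which is why it suits a single custom instruction with two 32-bit register operands.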

coppice · « **Reply #9 on:** January 15, 2024, 03:25:11 pm »

Quote from: laugensalm on January 08, 2024, 09:47:19 pm

(borrowed from the excellent Blackfin architecture)

I wouldn't call Blackfin excellent. An interesting experiment in general purpose/DSP focussed hybrids that ultimately ran out of steam might be more descriptive. It tried to scratch the same itch that led to the Freescale 56800 and TI C2000, but tried to scale the idea for larger applications.

glenenglish · « **Reply #10 on:** January 16, 2024, 04:17:04 am »

Thanks all for the responses.
Wow, there is quite an experienced team here.
I went chasing on
https://github.com/SpinalHDL/ and https://app.gitter.im/#/room/#SpinalHDL_VexRiscv:gitter.im
and found all the design folks there.
Indeed there is also a VexiiRiscv (in-order, multiple issue), and a NaxRiscv that is an out-of-order multiple-issue RISC-V that makes 4.5 coremarks....

The custom instruction stuff is the most exciting aspect of RISC-V, for me at least. I'm going to persevere with Efinix to see how that goes. It's been pointed out to me that the lack of distributed RAM in the Efinix fabric is a bit of an issue, and indeed I did find this compared to Xilinx in my SDR implementation synthesis tests. It takes a lot of XLR cells to make a 24 wide x 4 deep RAM (for an elastic FIFO). Not quite so bad for my pipelined design: the logic can go out to a few block RAMs, but this chews up a lot of resources getting there and back. Obviously this was an Efinix fabric tradeoff. -glen

