Author Topic: Typical speed of FPGAs (Read 27689 times)

legacy · « **Reply #25 on:** August 04, 2017, 03:30:47 pm »

Yup.

"Booth" is another interesting algorithm, used to build parallel multipliers for signed integers.

Formally it computes a MAC (multiply and accumulate), like this:

z = (x*y) + u

u: n+1 bits
x: n+1 bits
y: m+1 bits
z: n+m+1 bits

It's interesting because it can be boosted by DSP slices, if your FPGA happens to have them available.
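
For anyone who hasn't met it, here is a minimal Python sketch of the MAC above using radix-2 Booth recoding. It is only a software model of the idea, not what a DSP slice does internally; the name `booth_mac` and the `y_bits` parameter are made up for this illustration.

```python
def booth_mac(x, y, u, y_bits):
    """Return x*y + u using radix-2 Booth recoding of the multiplier y.

    y must fit in y_bits as a two's-complement signed number.
    """
    if not -(1 << (y_bits - 1)) <= y < (1 << (y_bits - 1)):
        raise ValueError("y does not fit in y_bits")
    acc = u          # start the accumulator at u, so the final add comes for free
    prev = 0         # implicit bit y[-1] = 0
    for i in range(y_bits):
        bit = (y >> i) & 1       # >> on a negative Python int is arithmetic,
                                 # so this reads the two's-complement bits of y
        digit = prev - bit       # Booth digit in {-1, 0, +1}
        acc += digit * (x << i)  # add, subtract, or skip the shifted multiplicand
        prev = bit
    return acc


print(booth_mac(7, -3, 5, 8))    # 7 * -3 + 5
```

Runs of identical bits in y produce digit 0, which is why hardware built this way can skip work on long strings of ones or zeros.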

legacy · « **Reply #26 on:** August 04, 2017, 04:02:30 pm »

Quote from: hans on August 04, 2017, 12:59:36 pm

15k LE generated just for that operation. Whoops!

eheheh, I had the same shock when I tried to implement a modified version of "bkm".

I have been researching a good way to calculate the complex exponential of a complex number, since the result is interesting!

ans=cmplxexp(x0.re, x0.im)

With a purely imaginary argument (x0.re = 0), you get a pair of trigonometric functions, { cos(phi), sin(phi) }
With a purely real argument (x0.im = 0), you get the real exponential

Combining them you can get the hyperbolic functions, { cosh(phi), sinh(phi) }
Manipulating them you can get the square root, logarithm, tan, tanh, etc.

It's a very powerful block of math which offers a lot of function-implementations!
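
The identities behind that list are easy to check numerically. This is a hedged sketch: `cmplxexp` here is just a wrapper around Python's `cmath.exp` standing in for the hardware block, not the BKM implementation, and the helper names are assumptions for illustration.

```python
import cmath
import math


def cmplxexp(re, im):
    """Stand-in for the hardware block: returns exp(re + j*im)."""
    return cmath.exp(complex(re, im))


def cos_sin(phi):
    """Purely imaginary argument: exp(j*phi) = cos(phi) + j*sin(phi)."""
    e = cmplxexp(0.0, phi)
    return e.real, e.imag


def cosh_sinh(phi):
    """Combine two purely real exponentials into the hyperbolic pair."""
    e_pos = cmplxexp(phi, 0.0).real
    e_neg = cmplxexp(-phi, 0.0).real
    return (e_pos + e_neg) / 2.0, (e_pos - e_neg) / 2.0


def tanh_from_exp(phi):
    """tanh falls out as sinh / cosh."""
    c, s = cosh_sinh(phi)
    return s / c


c, s = cos_sin(0.5)
print(math.isclose(c, math.cos(0.5)), math.isclose(s, math.sin(0.5)))
```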

But! It's damn complex, unstable(1), and even with the whole arithmetic kept in fixed point (QN8.24), the algorithm requires nine huge LUTs full of pre-calculated function values sampled at specific points, plus a lot of correction values to help the algorithm stay stable.

In short it consumes a lot of BRAM.

(1) this problem has been solved recently (though not formally) with an accelerating series that speeds up the convergence at the cost of introducing distortions. You need a control series to smooth them out, but it is stable, and it's damn fast!

32-bit data size? It takes 32 clock cycles! And you get the result on the 33rd!

Bad news: it also consumes a lot of multipliers and adders, and a lot of logic. Something like 20K LE (definitely too much area!!!), with a maximum speed of 130 MHz on my Spartan3E.

~ ~ ~ ~ ~ ~

Reading papers, I discovered that Intel has a similar technology implemented in their CPUs starting from the 80487. It was BKM-based, but their papers were never published as they are "industrial secrets". So, I wonder how they solved the convergence problem? And how did they implement it without wasting an area of silicon that, in my case, on FPGA, would take five times the area of the whole CPU_core?

Intel is a commercial company. There will be no answer to my questions, nor will their modified BKM algorithm be published or explained in detail.

A pity, but that's life

Conclusion:
since I need to implement the whole softcore, and since it eats resources, I have limited resources left for the Math-(fixedpoint)-CoProcessor, so I am implementing a soft version of CORDIC, which just computes the two most used trigonometric functions: { COS, SIN }.

In the future, I will try to put the modified BKM inside a dedicated FPGA, as if it were an "80487" companion chip.
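
For reference, the CORDIC rotation mode mentioned in the conclusion can be sketched in a few lines of Python. This is a generic textbook version under my own assumptions (24 fractional bits to echo the Q8.24 format above, 24 iterations, input angle limited to [-pi/2, pi/2]); it is not the poster's softcore implementation, and all names are hypothetical.

```python
import math

FRAC_BITS = 24                 # fractional bits (assumption, echoing Q8.24)
ITERATIONS = 24                # one shift-and-add step per iteration
ONE = 1 << FRAC_BITS

# Small table of arctan(2^-i) in fixed point: the only LUT CORDIC needs.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(ITERATIONS)]

# The rotations stretch the vector by a constant gain; pre-divide by it.
_gain = 1.0
for _i in range(ITERATIONS):
    _gain *= math.sqrt(1.0 + 2.0 ** (-2 * _i))
K_INV = round(ONE / _gain)


def cordic_cos_sin(angle_fixed):
    """Return (cos, sin) in fixed point for an angle in fixed-point radians.

    The angle must lie within [-pi/2, pi/2]; only shifts and adds are used.
    """
    x, y, z = K_INV, 0, angle_fixed
    for i in range(ITERATIONS):
        if z >= 0:   # rotate counter-clockwise by atan(2^-i)
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:        # rotate clockwise by atan(2^-i)
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    return x, y


c, s = cordic_cos_sin(round(0.5 * ONE))
print(c / ONE, s / ONE)        # compare against math.cos(0.5), math.sin(0.5)
```

No multipliers are needed inside the loop, which is the whole attraction when the softcore has already eaten most of the fabric.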

joeqsmith · « **Reply #27 on:** August 04, 2017, 04:31:38 pm »

Quote from: suicidaleggroll on August 04, 2017, 02:55:30 pm

Quote from: joeqsmith on August 04, 2017, 01:49:24 am
I bought a Digilent ARTY board about a year ago to play with the new tools from Xilinx. It was fairly inexpensive and came with a voucher for the license (really the reason I wanted it).

FYI - that voucher is pretty pointless. All of the Xilinx Artix models are covered under the free Webpack license, which does basically everything Design Edition does:
https://www.xilinx.com/products/design-tools/vivado/vivado-webpack.html

I think when I had checked into it, the ILA was not included.

rstofer · « **Reply #28 on:** August 04, 2017, 05:47:44 pm »

For the Artix 7, Xilinx has published timing specs

For the definitions - the CLB definitions start at page 58

https://www.xilinx.com/support/documentation/user_guides/ug474_7Series_CLB.pdf

For the values - see page 27 for details about the CLBs themselves, not including anything else in the path:

https://www.xilinx.com/support/documentation/data_sheets/ds181_Artix_7_Data_Sheet.pdf

I have no idea what to do with that information. Timing is probably swamped in routing and unless I spend a lifetime figuring out how the routing algorithm works, there probably isn't a lot I can do about it. What I could do is define my project in terms of CLBs and place the CLBs where I think is best and then let the router make the interconnects. Or, I can just hope that Xilinx knows best how to synthesize, place and route.

I'm going to take the easy way out and let the toolchain do the work. My major CPU project runs at 50 MHz and it wouldn't be a stretch to get it to 100 MHz. The CPU isn't pipelined in any way and it is a straightforward implementation of the instructions, as described in the original documentation. The original computer, released in '65, ran at about 400 kHz. I'm running 125x and all of the original software runs unaltered.

That's the real issue: Software! So you have this really great CPU, but do you have a native operating system, compilers, macro assembler, file system, input and output peripherals and an entire library full of application code? If not, why build it? If the CPU doesn't evolve into a fully functional SYSTEM, why bother?

BTW, the way you beat delay accumulation is to pipeline. Do a small bit of work and register the result. The following stage uses data in the register to do a small bit of work and registers that result. It's a PITA to work out the details but that's the way you get speed.
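
To make the register-between-stages idea concrete, here is a toy clocked model in Python. It is only a sketch under my own assumptions: a made-up two-stage datapath computing a*b + c, with `reg1` and `reg2` standing in for the pipeline registers. Real HDL would look different, but the timing behaviour is the same.

```python
def run_pipeline(inputs):
    """Two-stage pipeline: stage 1 multiplies, stage 2 adds.

    inputs is a list of (a, b, c) tuples, one presented per clock.
    Returns the value of the output register after every clock edge.
    """
    reg1 = None      # register after stage 1: (a*b, c)
    reg2 = None      # register after stage 2: a*b + c
    outputs = []
    # One extra bubble at the end flushes the last item through stage 2.
    for item in list(inputs) + [None]:
        # Both registers load on the same edge, from the OLD register values.
        next_reg2 = None if reg1 is None else reg1[0] + reg1[1]
        next_reg1 = None if item is None else (item[0] * item[1], item[2])
        reg1, reg2 = next_reg1, next_reg2
        outputs.append(reg2)
    return outputs


print(run_pipeline([(2, 3, 1), (4, 5, 6)]))   # [None, 7, 26]
```

The first result shows up one clock late, but after that a new result arrives on every clock, and each stage only has to fit a multiply or an add into the clock period rather than both.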

As a first approximation to FPGA speed, look at the price. There's a reason that Xilinx Virtex devices cost more than their Spartan devices. Part of it is speed, part of it is BlockRAM size and a lot of it is IO functionality. At some point, improvements in CLB propagation delays simply aren't possible so the manufacturers try other ideas to get overall throughput up.

rstofer · « **Reply #29 on:** August 04, 2017, 06:11:03 pm »

Quote from: joeqsmith on August 04, 2017, 04:31:38 pm

Quote from: suicidaleggroll on August 04, 2017, 02:55:30 pm
Quote from: joeqsmith on August 04, 2017, 01:49:24 am
I bought a Digilent ARTY board about a year ago to play with the new tools from Xilinx. It was fairly inexpensive and came with a voucher for the license (really the reason I wanted it).

FYI - that voucher is pretty pointless. All of the Xilinx Artix models are covered under the free Webpack license, which does basically everything Design Edition does:
https://www.xilinx.com/products/design-tools/vivado/vivado-webpack.html

I think when I had checked into it, the ILA was not included.

Yes, the ILA feature is included, even in the WebPack edition. I have run a couple of examples and, if I could ever figure out the constraints file, it might be a useful tool.

The other nice thing about Vivado WebPack is that it includes a lot of IP cores including Microblaze and several peripherals including everything required to implement Ethernet. Not every Microblaze peripheral is included and in some cases there are limitations. One example is that 100 Mb Ethernet is included free, 1000 Mb is not.

For those who have used Eclipse and the GNU toolchain, the SDK will have a comfortable feel. I haven't spent any time with the System Debugger other than to note that it holds the CPU in reset until the Run button is pushed (on the SDK) and the UART communicates with the SDK console (or any other terminal application), or, at least it does on the Digilent Nexys4 DDR board.

I'm not too sure how fussy Xilinx is about the voucher. I'm about to find out since I am building a new PC to accommodate Vivado. The process of regenerating the block layout and creating a bitfile is truly grim on my Quad Core 2.8 GHz I7 680. I can't afford REAL speed but I thought a Quad Core 4.2 GHz I7 1770K would be a step up. Theoretically, I can overclock that chip up to 5 GHz so Vivado would only be half as grim. Still grim, but only half as grim as it is today.

I was looking at the new AMD Ryzen Threadripper 1950X 16 core 32 thread chip (what a beast!) but Vivado will only use 8 threads and I don't do a lot of multitasking so what's the point? Still, sweet chip!

I really miss ISE (sigh)... And, yes, it's still installed to support my Spartan 3 projects.

brucehoult · « **Reply #30 on:** August 04, 2017, 06:41:44 pm »

Quote from: rstofer on August 04, 2017, 06:11:03 pm

The other nice thing about Vivado WebPack is that it includes a lot of IP cores including Microblaze and several peripherals including everything required to implement Ethernet. Not every Microblaze peripheral is included and in some cases there are limitations. One example is that 100 Mb Ethernet is included free, 1000 Mb is not.

I'm hoping/assuming one can use those Ethernet and DDR memory etc. IP blocks with your own designs, not only Microblaze (or ARM in the case of Zynq).

Quote

The process of regenerating the block layout and creating a bitfile is truly grim on my Quad Core 2.8 GHz I7 680. I can't afford REAL speed but I thought a Quad Core 4.2 GHz I7 1770K would be a step up. Theoretically, I can overclock that chip up to 5 GHz so Vivado would only be half as grim. Still grim, but only half as grim as it is today.

Assume you mean 860, not 680. I built a machine with one of those in January 2010. It was damn nice at the time, easily thrashing an 8 core 2.26 GHz Mac Pro I'd been loaned before it for pretty much everything. I overclocked it to 3.44 GHz base speed (3.6 speedstep). Geekbench3 10625 https://browser.geekbench.com/geekbench3/332494

I went from that to a 4790K, geekbench3 around 18000, so a big step up https://browser.geekbench.com/geekbench3/1160100

And then after I moved countries and didn't take that still very decent machine with me, I've built a 6700K, geekbench3 20300 https://browser.geekbench.com/geekbench3/5266295

Not that big a step from the 4790K to the 6700K, but it ends up near to twice the speed of the 860.

The really big step was from Nehalem to Ivy Bridge. It's only been tweaks since then.

rstofer · « **Reply #31 on:** August 04, 2017, 07:00:09 pm »

Quote from: hans on August 04, 2017, 12:59:36 pm

Just for kicks I did a straight multiply/divide in VHDL using 32-bit operands. I think it resulted in about 15k LE generated just for that operation. Whoops! I think our complete CPU only took like 3k LE, although that was still quite mediocre.

But you can get 18x18 signed multiply in a single cycle by using the internal multipliers. No such help for the division. It took me a very long time to find the algorithm for non-restoring signed 32/16 division. I finally found it in a book that just so happened to include pseudo microcode. I just translated it and it's been working since.
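
For readers hunting for the same thing, here is a minimal Python sketch of non-restoring division. It is the plain unsigned textbook form with a sign fix-up wrapped around it (truncating toward zero, as C does); that wrapper is my assumption and may well differ from the book's microcode, which I haven't seen. The function names are hypothetical, and this model returns a full-width quotient rather than a 16-bit one.

```python
def nonrestoring_divide(dividend, divisor, bits=32):
    """Unsigned non-restoring division: returns (quotient, remainder)."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    if not 0 <= dividend < (1 << bits):
        raise ValueError("dividend does not fit in the given width")
    rem = 0          # signed partial remainder
    quo = 0
    for i in range(bits - 1, -1, -1):
        bit = (dividend >> i) & 1
        # Shift in the next dividend bit, then subtract or add the divisor
        # depending on the sign of the previous partial remainder.
        if rem >= 0:
            rem = 2 * rem + bit - divisor
        else:
            rem = 2 * rem + bit + divisor
        quo = (quo << 1) | (1 if rem >= 0 else 0)
    if rem < 0:      # single correction step at the very end
        rem += divisor
    return quo, rem


def signed_divide(n, d, bits=32):
    """Signed wrapper: divide magnitudes, then restore the signs."""
    if d == 0:
        raise ZeroDivisionError("division by zero")
    q, r = nonrestoring_divide(abs(n), abs(d), bits)
    if (n < 0) != (d < 0):
        q = -q
    if n < 0:
        r = -r       # remainder takes the sign of the dividend
    return q, r


print(signed_divide(-7, 2))    # (-3, -1)
```

The point of "non-restoring" is that a negative partial remainder is never put back; the next step adds the divisor instead of subtracting it, so every iteration is exactly one add or subtract.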

I must be dense; I just couldn't wrap my head around some of the published algorithms. Too darn old...

For the Spartan 3, Xilinx has a document about multipliers and at page 6, describes a 35x35 design that looks like it requires 3 stages of pipelining. Of course, if there are a lot of things to multiply, they still get a new result on every clock.

https://www.xilinx.com/support/documentation/application_notes/xapp467.pdf
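
The arithmetic behind a 35x35 multiply built from 18x18 blocks can be sketched in Python. This is the usual split into an 18-bit signed upper half and a 17-bit unsigned lower half, which I believe is the idea the app note uses, but I have not reproduced its register placement; `mul18` and `mul35` are names invented for this illustration.

```python
MASK17 = (1 << 17) - 1


def mul18(a, b):
    """Model of one 18x18 signed hardware multiplier."""
    lo, hi = -(1 << 17), 1 << 17
    if not (lo <= a < hi and lo <= b < hi):
        raise ValueError("operand does not fit in 18 signed bits")
    return a * b


def mul35(a, b):
    """35x35 signed multiply assembled from four 18x18 partial products."""
    lo, hi = -(1 << 34), 1 << 34
    if not (lo <= a < hi and lo <= b < hi):
        raise ValueError("operand does not fit in 35 signed bits")
    # Upper 18 bits keep the sign; lower 17 bits are unsigned, so they
    # also fit a signed 18-bit multiplier input with the top bit at zero.
    a_hi, a_lo = a >> 17, a & MASK17
    b_hi, b_lo = b >> 17, b & MASK17
    p_hh = mul18(a_hi, b_hi)
    p_hl = mul18(a_hi, b_lo)
    p_lh = mul18(a_lo, b_hi)
    p_ll = mul18(a_lo, b_lo)
    # Weight and sum the partial products; in hardware these adds are
    # where the extra pipeline stages go.
    return (p_hh << 34) + ((p_hl + p_lh) << 17) + p_ll


print(mul35(123456789, -987654321) == 123456789 * -987654321)   # True
```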

Division is just plain hard.

rstofer · « **Reply #32 on:** August 04, 2017, 07:41:47 pm »

In terms of Vivado, here's a link to most of the reference material. There will be a test later!

https://www.xilinx.com/products/design-tools/vivado.html?resultsTablePreSelect=documenttype:SeeAll#documentation

joeqsmith · « **Reply #33 on:** August 04, 2017, 10:43:58 pm »

Quote from: rstofer on August 04, 2017, 06:11:03 pm

Quote from: joeqsmith on August 04, 2017, 04:31:38 pm
Quote from: suicidaleggroll on August 04, 2017, 02:55:30 pm
Quote from: joeqsmith on August 04, 2017, 01:49:24 am
I bought a Digilent ARTY board about a year ago to play with the new tools from Xilinx. It was fairly inexpensive and came with a voucher for the license (really the reason I wanted it).

FYI - that voucher is pretty pointless. All of the Xilinx Artix models are covered under the free Webpack license, which does basically everything Design Edition does:
https://www.xilinx.com/products/design-tools/vivado/vivado-webpack.html

I think when I had checked into it, the ILA was not included.

Yes, the ILA feature is included, even in the WebPack edition. I have run a couple of examples and, if I could ever figure out the constraints file, it might be a useful tool.

Now, but again I don't believe it was included in the early releases.

Quote from: rstofer on August 04, 2017, 06:11:03 pm

I was looking at the new AMD Ryzen Threadripper 1950X 16 core 32 thread chip (what a beast!) but Vivado will only use 8 threads and I don't do a lot of multitasking so what's the point? Still, sweet chip!

That is strange, my voucher license allows for 12. Is this a limit of the Webpack? Maybe they removed the limit with the newer release.

rstofer · « **Reply #34 on:** August 04, 2017, 11:43:21 pm »

OK, right off the bat, I am not an expert on Xilinx licensing! Even doing version upgrades terrifies me because I know I will need to get a new license and although the web site is vastly improved, it is still far from intuitive. But it works! At least it has so far. I just need to remember to include the checkbox for ISE 14.7 so license file covers both platforms.

Looking at the Digilent site, I don't see any reference to the voucher. I wonder if that is a thing of the past?

I agree, I don't think ILA was available early on. I'm not sure how much of the IP was available either. The fact that a simulator is included in the WebPack is quite a step up from ISE. I don't use simulation but it's nice to know it is there.

At some point, Xilinx realizes they are in the chip business, not the software business. I think they are beginning to loosen up. I realize they have a dumpster load of money involved in creating Vivado but that was then, this is now. It's a sunk cost, get over it! Get the software out there, it's not like it helps your competitor. The more people are talking about it, the more sales will eventually follow (except in my case). Especially make student versions available. Graduates tend to bring what they know.

I haven't been a big fan of Vivado. It is terribly slow, attention span shattering slow, VASTLY slower than ISE but maybe with more CPU horsepower I can overcome that. The workflow is pretty smooth and once I get more comfortable with the IDE, I'm sure I'll start to like it a lot more.

And I don't understand TCL. I have no idea how to create an XDC file! I think I may actually have to read a manual. Manuals are boring! I can copy and paste signal definitions but when adding the ILA, I have no idea what I am doing. The IP automation helps with some of it but I recall having to mess around in there to get it to work. Time to hit the books!

joeqsmith · « **Reply #35 on:** August 05, 2017, 05:11:07 am »

Quote from: rstofer on August 04, 2017, 07:41:47 pm

In terms of Vivado, here's a link to most of the reference material. There will be a test later!

https://www.xilinx.com/products/design-tools/vivado.html?resultsTablePreSelect=documenttype:SeeAll#documentation

Just install DocNav.

Quote from: rstofer on August 04, 2017, 11:43:21 pm

OK, right off the bat, I am not an expert on Xilinx licensing! Even doing version upgrades terrifies me because I know I will need to get a new license and although the web site is vastly improved, it is still far from intuitive. But it works! At least it has so far. I just need to remember to include the checkbox for ISE 14.7 so license file covers both platforms.

Looking at the Digilent site, I don't see any reference to the voucher. I wonder if that is a thing of the past?

Yes, I believe they now reference the webpack. If they really limit the number of CPUs with the webpack as you suggest, that seems like a miss on their part because again, I do not see a limit with the voucher license.

Quote from: rstofer on August 04, 2017, 11:43:21 pm

I agree, I don't think ILA was available early on. I'm not sure how much of the IP was available either. The fact that a simulator is included in the WebPack is quite a step up from ISE. I don't use simulation but it's nice to know it is there.

For the most part I was using a third-party simulator until iSim. The new version with Vivado is fairly nice. Still no code coverage from the little I played with it but looks like they have made some good progress with it over the old ISE. Watching it simulate a fairly long run now. I need to download the 2017 chain. I am just playing with the 2015 for now.

Quote from: rstofer on August 04, 2017, 11:43:21 pm

At some point, Xilinx realizes they are in the chip business, not the software business. I think they are beginning to loosen up. I realize they have a dumpster load of money involved in creating Vivado but that was then, this is now. It's a sunk cost, get over it! Get the software out there, it's not like it helps your competitor. The more people are talking about it, the more sales will eventually follow (except in my case). Especially make student versions available. Graduates tend to bring what they know.

I've been hearing this sort of comment for 20 years now. Things have changed. Some of the early Altera CPLD tools (DOS based schematic) were well out of the reach of a hobbyist. Things have really improved for us. Third party simulators have had analog view for some time. Screen shot of the sim I am currently running. I went through the GPIO and Microblaze Ethernet demos but then started working on my own. I'm still at the blinking LED stage but making some progress. Really just trying to learn the basic flow with the simulation and ILA. The actual ARTY board just sits there. Need to hook a scope up to it or something.

Quote from: rstofer on August 04, 2017, 11:43:21 pm

I haven't been a big fan of Vivado. It is terribly slow, attention span shattering slow, VASTLY slower than ISE but maybe with more CPU horsepower I can overcome that. The workflow is pretty smooth and once I get more comfortable with the IDE, I'm sure I'll start to like it a lot more.

And I don't understand TCL. I have no idea how to create an XDC file! I think I may actually have to read a manual. Manuals are boring! I can copy and paste signal definitions but when adding the ILA, I have no idea what I am doing. The IP automation helps with some of it but I recall having to mess around in there to get it to work. Time to hit the books!

I agree about the execution times of Vivado but that seems to be the norm with most tools I use. Of course the PCs get faster and the capability of the tools improves.

joeqsmith · « **Reply #36 on:** August 05, 2017, 12:25:52 pm »

Woke up and it actually made it through the test script without crashing. A little over 2 hours to simulate. Not bad. Using the analog feature appears to slow down the waveform viewing.

Adding the ila also seems to add a fair amount of time but at least for this very simple design, seems to work fine when looking at the three LED outputs.

I have seen the tool crash. A friend of mine told me how the MIG will crash if running under Win10. I was able to replicate the fault. Running the tools under Win 7 seems to be fine, which I verified as well. They are running two PCs. Generating the MIG under Win7, then loading the results onto the newer PC to work around the problem.

Attached the scope to the Arty's LEDs using 10X probes and the 6" ground clips. To the untrained eye, it's a little hard to tell this looks anything like the simulation or ila.

Maybe it's time to move beyond the blinking LEDs. The Arty has some controlled differential pairs.

joeqsmith · « **Reply #37 on:** August 05, 2017, 03:39:58 pm »

Still the flying leads but at least it's not off the LEDs and now you can make out the waveform patterns.

Really not a bad board for getting your feet wet with the tools.

rstofer · « **Reply #38 on:** August 05, 2017, 04:33:52 pm »

It's a good thing I didn't know that the MIG would crash on Win 10. The Microblaze EchoServer example at Digilent used the MIG to interface with the DDR and it runs all code from the DDR. I built it using Vivado 17.2 on Win 10.

Not to say there isn't some condition(s) under which it croaks under Win 10 but at least for that particular experiment MIG worked fine.

joeqsmith · « **Reply #39 on:** August 05, 2017, 04:52:03 pm »

Quote from: rstofer on August 05, 2017, 04:33:52 pm

It's a good thing I didn't know that the MIG would crash on Win 10. The Microblaze EchoServer example at Digilent used the MIG to interface with the DDR and it runs all code from the DDR. I built it using Vivado 17.2 on Win 10.

Not to say there isn't some condition(s) under which it croaks under Win 10 but at least for that particular experiment MIG worked fine.

Maybe 17.2 corrected it. If you want to give it a try, open the MIG generator. Just use the defaults until you reach the Pin/Bank selection menu and select Fixed. Now select Read XDC/UCF. Let me know what happens.

rstofer · « **Reply #40 on:** August 05, 2017, 07:10:17 pm »

Quote from: joeqsmith on August 05, 2017, 04:52:03 pm

Quote from: rstofer on August 05, 2017, 04:33:52 pm
It's a good thing I didn't know that the MIG would crash on Win 10. The Microblaze EchoServer example at Digilent used the MIG to interface with the DDR and it runs all code from the DDR. I built it using Vivado 17.2 on Win 10.

Not to say there isn't some condition(s) under which it croaks under Win 10 but at least for that particular experiment MIG worked fine.

Maybe 17.2 corrected it. If you want to give it a try, open the MIG generator. Just use the defaults until you reach the Pin/Bank selection menu and select Fixed. Now select Read XDC/UCF. Let me know what happens.

I don't know what to expect. I don't get any opportunity to set defaults or select Read XDC.

I created a new project and then used "Create Block Design" to add the mig_7 gadget. I got only a box with clock and reset signals. Then I added a Microblaze and ran "Run Block Automation". Finally, the mig had the other signals and a bus connection to the DDR. I right clicked on DDR2 and clicked 'Make External' to bring those signals out of the FPGA.

I need to study up on board.prj files because most of the pin definitions are in this file and are provided, in this case, by Digilent.

I know it all works out because the Digilent Echo Server runs. The only .XDC file in the source tree is eth_ref_clk.xdc which I created and contains only a definition for the ethernet reference clock output.

I suspect I didn't address what you were aiming for. In any event, nothing bad happened and Vivado seems quite happy.

joeqsmith · « **Reply #41 on:** August 05, 2017, 08:22:30 pm »

No problem. You don't really need to do anything beyond opening the IP Catalog and running through the MIG. No need to play with the Microblaze or other IP. If you wanted to just play with their Ethernet Echo demo, just open the block diagram, select the MIG and customize. Again, just go through the menus until you get to the pin selection and then select the Read XDC/UCF. It's going to warn you about overwriting your settings but go ahead. What happens?

rstofer · « **Reply #42 on:** August 05, 2017, 10:33:39 pm »

Quote from: joeqsmith on August 05, 2017, 08:22:30 pm

No problem. You don't really need to do anything beyond opening the IP Catalog and running through the MIG. No need to play with the Microblaze or other IP. If you wanted to just play with their Ethernet Echo demo, just open the block diagram, select the MIG and customize. Again, just go through the menus until you get to the pin selection and then select the Read XDC/UCF. It's going to warn you about overwriting your settings but go ahead. What happens?

Error message "Failed to generate IP 'mig_7series_0'. Failed to generate 'Custom UI' outputs"

joeqsmith · « **Reply #43 on:** August 05, 2017, 11:31:07 pm »

If you have Windows 7 with Vivado installed, you could try the same steps and see what happens. If everything repeats, you will get a different result.

rstofer · « **Reply #44 on:** August 06, 2017, 12:10:07 am »

I do have a version of Vivado on Win 7. I'll give it a try tomorrow. I doubt that it is version 2017.2 but maybe....

joeqsmith · « **Reply #45 on:** August 06, 2017, 12:41:19 am »

We have tried 2017.2, 16.3 and 15.2. These all behaved the same on the two OSes.

rstofer · « **Reply #46 on:** August 06, 2017, 01:17:55 am »

OK, I got an error attempting to customize the IP when using Win 10 that doesn't seem to be a problem with Win 7.

That is regrettable! I hope Xilinx gets it worked out fairly soon.

From my perspective, it may not be an issue. MIG does work with the Microblaze for the Nexys4_DDR using the various automations- that is a given. Digilent also offers a DDR component for the Nexys4_DDR that makes the DDR memory look like SRAM. I will create a dummy project in the near future to see if the component works under Win 10.

My project probably uses a configuration very much like the EchoServer. The difference is that I need 4 or 5 open ports and some way to get the streams out of the Microblaze and into my VHDL project. That is yet to be determined. Apparently it is non-trivial.

I am buying a much faster computer specifically for Vivado (plus some other tools). I planned to use Win 10 but I guess I can still buy Win 7 on the 'grey' market.

joeqsmith · « **Reply #47 on:** August 06, 2017, 02:04:05 am »

If you go to a better computer, it will be interesting to see if the webpack is still locked on the number of cores it allows you to select. They do only show you the max for your PC. So if you have 1 core, they only show the option to select one. Maybe that's why the webpack seems limited.

rstofer · « **Reply #48 on:** August 06, 2017, 03:04:11 am »

I ran some tests on my 3.2 GHz I7-860 quad core machine and, indeed, Vivado WebPack uses 8 logical processors. It is consistent with the info in the linked manual, Page 7.

Synthesis doesn't seem to use multiple threads (and it shouldn't) but the other processes use all 8 as they should.

Place and route keep the CPU pretty busy. Synthesis only uses about 18% but that's probably right because it only uses one thread.

I did notice that when the load gets high (80%+), the CPU throttles back from 3.2 GHz to 2.9 GHz. Not the kind of thing I would have anticipated.

The new machine will be a 4.5 GHz I7-1770K with 32 GB of 4133 MHz DDR4. I don't plan to overclock it but it is known to run at 5 GHz. Cooling being an issue...

The big difference in the new machine is a SSD PCIe x4 with 1TB of room. I have great hopes for this, especially at boot time.

AFAICT, 8 logical processors is all that Vivado will use. I was looking at the AMD 16 core 32 thread chip but if Vivado won't use all the threads, why bother?

https://www.xilinx.com/support/documentation/sw_manuals/xilinx2016_1/ug904-vivado-implementation.pdf

NorthGuy · « **Reply #49 on:** August 06, 2017, 03:31:53 am »

My computer with Vivado installed has CPU overheat warning buzzer. When Vivado gets busy, it is buzzing non-stop. If you're building a computer for Vivado, you may consider some sort of after-market cooling (liquid cooling perhaps) especially if you want to overclock.

It is ridiculous that Vivado is so slow. Although Xilinx says it's "SuperFast". Go figure.

