Author Topic: Microblaze 64 bit (Read 4495 times)

asmi · « **on:** September 27, 2023, 08:44:27 pm »

Hey guys,

I've recently been checking out my new FPGA board, which has a SODIMM with 8 GBytes of DDR3, and naturally, to get access to the entire memory, I have to use the 64-bit Microblaze. However, I've found it to be utterly unstable:
1. Their "Memory Test" template from Vitis is simply broken (they declare the memory size as a u32, so obviously 8GB won't fit). It does pass once I fix that, but it only tests something like the first 1024 words, so it's a long way from a true memory test. I did end up modifying the application so it would actually perform a full memory test (which took AGES!), but I needed to confirm memory operation one way or another anyway.
2. None of the lwIP templates work. They get stuck at random memory locations, and the location changes all the time, so I couldn't trace it to anything specific or find any logic in it.
3. Debugging is also hit-and-miss: when stepping over some function in the debugger, the CPU gets stuck somewhere, but if I place a breakpoint after that function and use the "Run" command instead, it works just fine.
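
To illustrate point 1, here is a minimal sketch of a memory test that keeps the size and index in 64-bit types. It is not the Vitis template; the names mem_test_u64 and DDR_SIZE_BYTES and the test pattern are made up for illustration, and a real test would want more patterns than this single pass.

```c
#include <stdint.h>

/* Hypothetical size: 8 GiB needs 34 bits, so it cannot be held in a u32. */
#define DDR_SIZE_BYTES (UINT64_C(8) << 30)

/*
 * Writes a value derived from each word's index, then reads everything back.
 * Returns n_words on success, otherwise the index of the first mismatch.
 * Both the count and the loop index are 64-bit, so ranges above 4 GiB work.
 */
uint64_t mem_test_u64(volatile uint64_t *base, uint64_t n_words)
{
    const uint64_t pattern = UINT64_C(0xAA5555AAAA5555AA);
    uint64_t i;

    for (i = 0; i < n_words; i++)
        base[i] = i ^ pattern;

    for (i = 0; i < n_words; i++)
        if (base[i] != (i ^ pattern))
            return i;

    return n_words;
}
```

For the whole SODIMM the call would be something like mem_test_u64(base, DDR_SIZE_BYTES / sizeof(uint64_t)), with base pointing at wherever the DDR is mapped.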

All those problems immediately go away if I use the 32-bit Microblaze (and of course then I can only use 2 GBytes of the 8 GBytes of DDR3 due to address space limitations), so it doesn't seem that any of them are caused by anything board-related, which is the main thing I'm trying to establish at the moment.

So I wonder if anyone else has any experience with the 64-bit MB to share, good or bad, as I'm kind of at a loss at this point and not sure where to dig.

brucehoult · « **Reply #1 on:** September 28, 2023, 12:00:29 am »

Have you considered RISC-V? There are dozens of RISC-V cores, both 32 and 64 bit, which work fine in Xilinx FPGAs. You can find a lot of them (along with an SoC framework to make them work) at...

https://github.com/enjoy-digital/litex

If you need professional support then there is at least (maybe others)...

https://info.bluespec.com/xilinx

asmi · « **Reply #2 on:** September 28, 2023, 12:11:34 am »

I have my own RISC-V core, so I don't need someone else's, but that's not what this question is about. None of them can offer infrastructure that's even close to what you get from Xilinx for free.

brucehoult · « **Reply #3 on:** September 28, 2023, 12:19:48 am »

Very good.

Sad about the "utterly unstable" part, then.

asmi · « **Reply #4 on:** September 28, 2023, 01:14:04 am »

Quote from: brucehoult on September 28, 2023, 12:19:48 am

Sad about the "utterly unstable" part, then.

That's how it seems to me, hence I want to hear about other people's experience with it, good or bad as the case may be.

SiliconWizard · « **Reply #5 on:** September 28, 2023, 03:32:32 am »

I have used the 32-bit variant in the past without much problem, but never tried the 64-bit variant. I stumbled upon this issue though: https://support.xilinx.com/s/question/0D52E00006hpW4oSAE/64-bit-microblaze-u64-variable-assigned-wrong-values
It doesn't look too good at first sight. No clue how well the 64-bit variant has been tested by Xilinx.

luudee · « **Reply #6 on:** September 28, 2023, 05:56:56 am »

Regarding the lack of stability, did you properly constrain your design?
Did you generate a Timing Report and make sure timing is met?

I've been working with Microblaze for over 20 years, they tend to be
very stable, and debugging works great as well.

Regarding 32/64 bit: You don't need a 64 bit Microblaze just because
you are using more than 4GB address space. Microblaze does have
an MMU option ...

Good Luck !
rudi

asmi · « **Reply #7 on:** September 28, 2023, 06:35:27 am »

Mystery solved! It turned out to be a bug in the Windows version of the compiler.

If you have a line like this:

Code:

u64 * mem = (u64*)0x20000000ULL;

On Windows that pointer ends up pointing to address zero, while on Ubuntu it initializes properly. It looks like the Windows build of the compiler treats all literals as 32-bit numbers regardless of suffixes.
I'll do some more tests using the Ubuntu compiler tomorrow, but what I've seen so far easily explains all those weird gremlins I've come across during my week of testing...
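
A quick way to catch this kind of truncation at start-up is to read a 64-bit constant back at run time and check both halves. This is only a sketch: DDR_BASE_ADDR is an assumed value (the 0x2_0000_0000 mapping, which needs more than 32 bits), and the function names are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed base address for illustration: needs 34 bits. */
#define DDR_BASE_ADDR UINT64_C(0x200000000)

/* Returns true if a 64-bit value has the expected upper and lower halves. */
bool halves_match(uint64_t v, uint32_t hi, uint32_t lo)
{
    return (uint32_t)(v >> 32) == hi && (uint32_t)v == lo;
}

/*
 * Stores the constant to a volatile and reads it back, so the value has to
 * be materialised by the generated code rather than compared at compile time.
 */
bool toolchain_keeps_64bit_constants(void)
{
    volatile uint64_t probe = DDR_BASE_ADDR;
    return halves_match(probe, 0x2u, 0x0u);
}
```

If toolchain_keeps_64bit_constants() returns false on the target, the upper half of the constant was lost somewhere in the toolchain, which would match the pointer ending up at zero.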

SiliconWizard · « **Reply #8 on:** September 28, 2023, 07:12:01 am »

This may be a similar issue as with the post I linked to above.

I highly recommend getting rid of the U, UL, ULL suffixes and using stdint macros instead (yes, they have been there for over 20 years - C99 - but many if not most C programmers have probably never seen them for some reason.)

On the "problematic compiler", try this instead, and report back if you can:
(stdint.h must be included.)

Code:

volatile u64 * mem = (volatile u64*) UINT64_C(0x20000000);

(I recommend adding the volatile qualifier here too, as shown.)

asmi · « **Reply #9 on:** September 28, 2023, 12:31:57 pm »

Quote from: SiliconWizard on September 28, 2023, 07:12:01 am

This may be a similar issue as with the post I linked to above.

It may very well be.

Quote from: SiliconWizard on September 28, 2023, 07:12:01 am

I highly recommend getting rid of the U, UL, ULL suffixes and using stdint macros instead (yes, they have been there for over 20 years - C99 - but many if not most C programmers have probably never seen them for some reason.)

Well, tell that to the Xilinx/AMD Vitis team, whose tools generate BSP headers with constants full of those things.

Quote from: SiliconWizard on September 28, 2023, 07:12:01 am

On the "problematic compiler", try this instead, and report back if you can:

I just did, and it still doesn't work, but even if it did, it would still be a no-go for me because the BSP generator creates a header with "oldschool" constants.

asmi · « **Reply #10 on:** September 28, 2023, 03:38:45 pm »

Quote from: luudee on September 28, 2023, 05:56:56 am

Regarding 32/64 bit: You don't need a 64 bit Microblaze just because
you are using more than 4GB address space. Microblaze does have
an MMU option ...

There is a problem with this approach: I can map 2GB of DDR3 into the 32-bit address space at 0x8000_0000, or I can map the full 8GB at address 0x2_0000_0000, but I can't do both at the same time. That means I either have access to only 2GB of RAM, but it's available from the get-go (including live debugging support), or I have 8GB, but then my application either needs to fit into local BRAMs, or I need some sort of bootloader which would initialize the MMU, load the main application at 0x2_0000_0000 and launch it.
Also, for some reason Vivado doesn't allow DDR to be mapped at 0x8000_0000/2G for the CPU but at 0x2_0000_0000/8G for other masters. I thought I could work around the problem by using CDMA to shuttle data between the CPU's "window" and the rest of DDR, but that didn't seem to work either. In fact, even the CDMA driver flatly ignores the upper 32 bits of the address (even though the hardware does support 64-bit addresses), so I had to resort to "manual" setup of the CDMA so it would actually use full addresses, but that didn't work (CDMA reports a "decode error").

luudee · « **Reply #11 on:** September 28, 2023, 03:57:37 pm »

You clearly do not understand how the MMU works. You might want to read up on that ....

asmi · « **Reply #12 on:** September 28, 2023, 04:35:13 pm »

Quote from: luudee on September 28, 2023, 03:57:37 pm

You clearly do not understand how the MMU works. You might want to read up on that ....

I will wait for someone more helpful. You clearly don't understand what my problem is.

asmi · « **Reply #13 on:** September 29, 2023, 09:34:56 pm »

I did some more digging, and it seems that the problem is not with the compiler per se, but with the assembler. I compiled the same C source file with the -S option (compile to assembly only), and both the Win and Linux compilers produced identical assembly code, but the assemblers produced different machine code:
1. The Linux assembler implemented addlik r1, -24 as addlik r1, 0xffe8 (the 16-bit two's complement of -24, which gets sign-extended to 64 bits), while the Win assembler implemented it as the pair imml 0x00ffff/addlik r1, 0xffe8, which is effectively addlik r1, 0x00ffffffe8. That is wrong, but consistent with the Win assembler forcing the upper 32 bits to zero.
2. The Linux assembler implemented addlik r3,r0,8589934592 properly: imml 0x020000/addlik r3, r0, 0x0000 => addlik r3, r0, 0x200000000, which is correct, while the Windows assembler implemented it as imml 0/addlik r3, r0, 0x0000 => effectively addlik r3, r0, 0. What's interesting here is that it did realize there is a long immediate involved (hence the imml instruction), but it still zeroed out the upper part of it.
3. The third interesting case is the instruction addlik r4,r0,-6172933522750876246, which requires two imml instructions (as each one can only hold a 24-bit value). The Linux assembler got this right: it emitted the sequence imml 0xaa5555, imml 0xaaaa55, addlik r4, r0, 0x55aa, which is effectively addlik r4, r0, 0xaa5555aaaa5555aa. The Windows version emitted only two instructions, imml 0xffaa55, addlik r4, r0, 0x55aa, which effectively means addlik r4, r0, 0xffffffffaa5555aa, as the immediate is sign-extended to 64 bits, which of course produces an incorrect result.
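
The three cases above can be reproduced with a small model of how the immediate is put together. This is a sketch based only on the behaviour described in this post (24 bits per imml prefix, 16 bits in the instruction, sign extension of anything shorter than 64 bits), not on the ISA manual, and effective_imm is a made-up name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Assembles the effective 64-bit immediate from up to two imml prefixes
 * (24 bits each, most significant prefix first) and the 16-bit immediate
 * carried by the instruction itself. A result shorter than 64 bits is
 * sign-extended from its top bit. Assumes n_imml <= 2.
 */
uint64_t effective_imm(const uint32_t *imml, size_t n_imml, uint16_t imm16)
{
    uint64_t value = 0;
    unsigned bits = 16 + 24 * (unsigned)n_imml;
    size_t i;

    for (i = 0; i < n_imml; i++)
        value = (value << 24) | (imml[i] & 0xFFFFFFu);
    value = (value << 16) | imm16;

    /* Sign-extend from the top bit of the assembled field. */
    if (bits < 64 && ((value >> (bits - 1)) & 1u))
        value |= ~UINT64_C(0) << bits;

    return value;
}
```

Feeding it the Linux sequences gives -24, 0x200000000 and 0xaa5555aaaa5555aa, while the Windows sequences give 0xffffffe8 for case 1 and 0xffffffffaa5555aa for case 3.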

asmi · « **Reply #14 on:** October 01, 2023, 07:06:20 pm »

In case anyone is curious, I've managed to get CDMA working with access to the full 8GB, while leaving the 32-bit Microblaze with access to only 2GB. I've attached the memory map I settled on. Diagram validation issues a critical warning:

Quote

CRITICAL WARNING: [BD 41-1267] Slave segment </mig_7series_0/memmap/memaddr> is mapped into related masters </axi_cdma_0/Data> and </microblaze_0/Data> at different offsets <0x2_0000_0000 [ 8G ]> and <0x8000_0000 [ 2G ]>. Please ensure that assignments to related address spaces have the same offset.

But it's possible to proceed anyway. I also had to modify the driver a bit, because as shipped it's impossible to pass a 64-bit address when using a 32-bit CPU (the API accepts pointers). The reason my previous attempt didn't work was compiler shenanigans again. It turns out that if you cast a 32-bit pointer (u32* in my case) straight to a u64 number, it performs sign extension (so 0x9000_0000 turns into 0xFFFF_FFFF_9000_0000), which causes an AXI decode error. The workaround is to cast the pointer to a 32-bit number first, like this: (u64)(u32)u32_ptr.
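
The difference between the two casts can be shown on plain integers. This is a sketch with made-up helper names; widen_sign_ext only models the sign extension described above for comparison. As far as I know, GCC documents sign extension for pointer casts to a wider integer type as its implementation-defined choice, so going through uintptr_t (unsigned and pointer-sized) first is the portable way to get zero extension.

```c
#include <stdint.h>

/* Zero-extends a 32-bit bus address: 0x90000000 -> 0x0000000090000000. */
uint64_t widen_zero_ext(uint32_t addr)
{
    return (uint64_t)addr;
}

/* Models the unwanted sign extension: 0x90000000 -> 0xFFFFFFFF90000000. */
uint64_t widen_sign_ext(uint32_t addr)
{
    return (addr & UINT32_C(0x80000000))
        ? (UINT64_C(0xFFFFFFFF00000000) | addr)
        : (uint64_t)addr;
}

/*
 * Pointer-to-address helper: converting through uintptr_t first means the
 * widening step is an unsigned integer conversion, which zero-extends.
 */
uint64_t bus_addr_from_ptr(const void *p)
{
    return (uint64_t)(uintptr_t)p;
}
```

On a 32-bit Microblaze, bus_addr_from_ptr(u32_ptr) should give the same result as the (u64)(u32)u32_ptr workaround.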

Another unrelated thing is that even when you enable the I/D caches, the data interface remains 32 bits wide by default, so reading from DDR takes more cycles than it needs to. If you've got a wide memory interface like my board does (it's got a 64-bit-wide SODIMM), with the MIG slave port being 512 bits wide, it makes sense to change the I/D interfaces to be a full cache line wide (as in "desktop" CPUs) so that an entire cache line is read in one cycle. To do that, double-click on the Microblaze IP and go through the pages until you get to the cache settings. Attached are the settings I use.

asmi · « **Reply #15 on:** October 01, 2023, 08:33:54 pm »

Quote from: karpouzi9 on October 01, 2023, 08:27:02 pm

By chance have you posted this on the Xilinx forums or to their support team?

Yes, I have, but it was late on Friday, so it will probably be a few more days until they respond. If they respond at all, that is.

asmi · « **Reply #16 on:** October 02, 2023, 09:52:24 pm »

Just out of curiosity, I ran Dhrystone benchmarks to compare the performance of MB32 with the default 32-bit bus width against a cacheline-wide bus, leaving all other cache settings the same. The wider bus gained about 3.5 DMIPS (from 133.2537 to 136.7331) and about 0.035 DMIPS/MHz (from 1.3325 to 1.3673).

