Author Topic: Xilinx FPGA, PCI-Express, ARM Cortex A - anyone got experience with that setup?  (Read 4998 times)


Offline TinkeringSteveTopic starter

  • Frequent Contributor
  • **
  • Posts: 297
Hi,

for the Xilinx Artix7 FPGA, there is the XDMA PCI-e bridge IP core and corresponding Linux driver provided by Xilinx.

Has anyone here ever connected such FPGA, via PCI-e, to an ARM based system, such as the iMX6?

Edit: It's the Xilinx AC701 FPGA board and a Toradex Apalis iMX6 board; their TK1 board was also tried (using the NVIDIA L4T based Linux image with kernel 3.10).

Xilinx themselves say on their AR# 65444 page that the driver is only for x86 systems.
The driver seems to be only for demonstrating one DMA transfer, and they have one forum thread where someone presents a substantial critique of how broken it actually is.
Since I'm not a kernel developer, I looked whether someone else might have extended this for continuous operation. Indeed, there is W.Zabolotny with his "v2_xdma" on github.

I have gotten that modified driver to work on Linux / x86_64 platform, i.e. high speed data transfer from FPGA to the host system.
But as expected, it did not work on the ARM system.
There is no support from Xilinx for this scenario (they explicitly said so, additionally in their forum).

I have not found a different solution out there, and so far, the only option seems to be to contract a third party to develop a driver for this, which makes me want to facepalm hard, as there basically is a driver but Xilinx doesn't think ARM platforms are important ;)

Or... are there solutions out there not hitting the wallet so hard?

Btw, I did try to just access things via the /dev/mem driver for memory-mapped access without DMA. On one Linux/ARM board this worked in principle, albeit much too slowly, but that board was ruled out for other reasons. On the boards I have now, with different ARM SoCs but all by the same manufacturer, I can't get any PCIe to work at all, even though one of them has the same SoC and the same NVIDIA-provided kernel (it's a Tegra K1), and they claim it should work. So apparently some changed Linux setting can have a huge impact on how well (or whether at all) PCIe works, which I would not have expected. I knew PCIe from the PC only, where so far it had been "just works" ;)
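For the record, this is roughly the kind of /dev/mem access I mean - a minimal user-space sketch, where the BAR address and window size are placeholders you'd take from lspci -v or the device tree, not values from my actual setup:

Code: [Select]
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define BAR_PHYS 0x01000000UL  /* placeholder: BAR0 physical address from lspci -v */
#define BAR_SIZE 0x1000UL      /* placeholder: size of the register window */

int main(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) { perror("open /dev/mem"); return 1; }

    /* map the FPGA's BAR into this process' address space (needs root) */
    volatile uint32_t *regs = mmap(NULL, BAR_SIZE, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, BAR_PHYS);
    if (regs == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    /* read a 32-bit register at offset 0, e.g. a version/ID register */
    printf("reg[0] = 0x%08x\n", (unsigned)regs[0]);

    munmap((void *)regs, BAR_SIZE);
    close(fd);
    return 0;
}

That's fine for register pokes, but every access is a single uncached bus transaction, which is presumably why it was so slow compared to DMA.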
« Last Edit: April 30, 2019, 10:47:17 am by TinkeringSteve »
 

Offline jeremy

  • Super Contributor
  • ***
  • Posts: 1079
  • Country: au
If I had to guess, I’d say it’s because they either want you to use a Zynq, or because PCIe IPs are very different between ARM semiconductor vendors, whereas the x86 ones are really just Intel.

If I recall correctly, there is even an Allwinner chip which is advertised as having PCIe, but it doesn’t actually work because they stuffed up the logic (they still advertise it as having PCIe though ;D )
 

Offline TinkeringSteveTopic starter

  • Frequent Contributor
  • **
  • Posts: 297
I would love to use a Zynq, I read reports of people getting 400MB/s just by using /dev/mem, as things are just interconnected internally. But apparently the current path has been walked down too far already to do that.

I get to hear "PCI-e is *the* standard connection between an FPGA and a CPU", but the Xilinx forums are full of people who don't get this to work on ARM. I was told one potential problem may be the lack of cache coherency, but it fails before that would be an issue, from my humble assessment. Also, some page I found listing different ARM cores' properties claimed the Cortex A15 to be coherent... (don't know how to look for that in datasheets, I'm a software dev who tinkers with hardware occasionally)
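From the little I understand so far, missing hardware coherency on its own shouldn't even be fatal - the kernel's streaming DMA API seems to exist exactly for that, doing the cache maintenance explicitly. Something like this, as far as I can tell from the book (not code from the XDMA driver, just the general pattern):

Code: [Select]
#include <linux/dma-mapping.h>

/* dev is the PCI device's struct device; buf is a kernel buffer
   that the FPGA is going to fill via DMA */
static void receive_via_dma(struct device *dev, void *buf, size_t len)
{
    dma_addr_t handle;

    /* hand the buffer over to the device; on a non-coherent ARM SoC
       this is where the CPU caches get cleaned/invalidated */
    handle = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
    if (dma_mapping_error(dev, handle))
        return;

    /* ... program 'handle' into the DMA engine, wait for completion ... */

    /* give the buffer back to the CPU before touching the data */
    dma_unmap_single(dev, handle, len, DMA_FROM_DEVICE);
}

In my case it never even got that far, though - the failure was earlier, at setup/mapping time.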
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2771
  • Country: ca
Reading the OP makes me facepalm so hard :palm:
1. There is no such thing as "ARM platform" in the same sense as "x86 platform" or "x86_64" one, because there are no standards covering peripherals. A CPU core is not a platform. So each ARM SoC vendor has its own, and quite often multiple platforms for different SoCs they make.
2. TS clearly doesn't understand the difference between "sample code" and "production code". Code provided by Xilinx is just a sample and it's not intended for production use.

Fundamentally, if you came into FPGA world expecting that everything will be developed and ready for you - you're in for a big disappointment. Even if you are planning to use any pre-existing code (whether developed by Xilinx, or some third party), you've got to understand how it works, otherwise it's bound to bite you in the behind sooner or later.

So get yourself a platform manual for whatever SoC you're using and figure out how to talk to PCIe peripheral, and come up with the code that would enumerate the bus and initialize your device. Once you get to that point, you can reference the sample driver to figure out how to set up PCIe DMA transactions. It's not very complicated.
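Assuming the SoC's PCIe host controller driver in the BSP actually works, the Linux PCI core then does the enumeration for you, and the device side boils down to a driver that binds on the vendor/device ID and maps a BAR. A bare-bones sketch (0x10ee is the Xilinx vendor ID; the device ID and names here are placeholders - use whatever your endpoint reports in lspci):

Code: [Select]
#include <linux/module.h>
#include <linux/pci.h>
#include <linux/io.h>

#define FPGA_VENDOR_ID 0x10ee   /* Xilinx */
#define FPGA_DEVICE_ID 0x7011   /* placeholder - whatever your XDMA endpoint reports */

static void __iomem *bar0;

static const struct pci_device_id fpga_ids[] = {
    { PCI_DEVICE(FPGA_VENDOR_ID, FPGA_DEVICE_ID) },
    { 0, }
};
MODULE_DEVICE_TABLE(pci, fpga_ids);

static int fpga_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
    int rc;

    rc = pci_enable_device(pdev);                 /* power up, enable MMIO decoding */
    if (rc)
        return rc;

    rc = pci_request_regions(pdev, "fpga_demo");  /* claim the BARs */
    if (rc)
        goto err_disable;

    bar0 = pci_iomap(pdev, 0, 0);                 /* map all of BAR0 */
    if (!bar0) {
        rc = -ENOMEM;
        goto err_release;
    }

    pci_set_master(pdev);                         /* required before the FPGA may DMA */
    dev_info(&pdev->dev, "BAR0[0] = 0x%08x\n", ioread32(bar0));
    return 0;

err_release:
    pci_release_regions(pdev);
err_disable:
    pci_disable_device(pdev);
    return rc;
}

static void fpga_remove(struct pci_dev *pdev)
{
    pci_iounmap(pdev, bar0);
    pci_release_regions(pdev);
    pci_disable_device(pdev);
}

static struct pci_driver fpga_driver = {
    .name     = "fpga_demo",
    .id_table = fpga_ids,
    .probe    = fpga_probe,
    .remove   = fpga_remove,
};
module_pci_driver(fpga_driver);
MODULE_LICENSE("GPL");

Everything DMA-specific comes on top of that, and that's where the sample driver is useful as a reference.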

Offline TinkeringSteveTopic starter

  • Frequent Contributor
  • **
  • Posts: 297
Reading the OP makes me facepalm so hard :palm:

If that's your standard reaction to somebody new to an area, I hope nobody will ever be so unfortunate as to endure being taught anything by you.


Quote
1. There is no such thing as "ARM platform" in the same sense as "x86 platform" or "x86_64"

I gave a specific example system, though, but I guess you were too busy facepalming, obstructing your view.
Also, it seems to me there could be commonalities to certain aspects - you yourself just meantioned one big difference to x86 platforms.

Quote
2. TS clearly doesn't understand the difference between "sample code" and "production code". Code provided by Xilinx is just a sample and it's not intended for production use.

Sample code that is fundamentally broken in several aspects and doesn't even basically work seems pretty useless to me, though (e.g. read dwd_pete's thread "C2H Streaming XDMA Linux Driver Broken").


Quote
Fundamentally, if you came into FPGA world expecting that everything will be developed and ready for you - you're in for a big disappointment.

Unless you buy Altera/Intel, from what I have been told lately by several unconnected people/companies.

Also you *seem* to be suggesting it's actually a good thing that nobody should be so "lazy" as to expect standard problems to be solved *once*, preferably by those closest to the source of the knowledge, as opposed to thousands of times over... What a waste from a broader angle, but whatever.

Quote
So get yourself a platform manual for whatever SoC you're using and figure out how to talk to PCIe peripheral, and come up with the code that would enumerate the bus and initialize your device. Once you get to that point, you can reference the sample driver to figure out how to set up PCIe DMA transactions. It's not very complicated.

Nothing seems complicated once it's second nature for you. I don't have any kernel development experience; that alone is a hurdle - and something like this doesn't exactly look like the first project a beginner in that area should try.
(At least the PhD from a Polish tech uni who wrote v2_xdma seems to agree - he at first wanted to write his own driver, but declared the XDMA core too complex (whether for his time frame or his willingness to invest work, he didn't say) and settled on modifying their driver.)

How about not being so pompous? Even if your reply contained potentially helpful statements, I don't see how that's needed. But if you're apparently a Linux kernel dev, I'm not surprised. Is anyone of you not autistic?
« Last Edit: April 29, 2019, 02:10:23 pm by TinkeringSteve »
 
The following users thanked this post: diyaudio

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2771
  • Country: ca
If that's your standard reaction to somebody new to an area, I hope nobody will ever be so unfortunate as to endure being taught anything by you.
It's my standard reaction to someone so pretentious as to facepalm over something without even understanding why things are the way they are, nor supplying any information that would actually allow anyone to say something helpful - like the FPGA in question, the devboard used (if any), speed grade, or any other relevant info. My statement is just a paraphrased quote from your post.
Also - I teach only those who genuinely want to learn and are prepared to do their homework.
Also you *seem* to be suggesting it's actually a good thing that nobody should be so "lazy" as to expect standard problems to be solved *once*, preferably by those closest to the source of the knowledge, as opposed to thousands of times over... What a waste from a broader angle, but whatever.
It's impossible to solve this problem once exactly because ARM is not a platform in traditional sense. You can thank SoC vendors for that.
How about not being so pompous? Even if your reply contained potentially helpful statements, I don't see how that's needed. But if you're apparently a Linux kernel dev, I'm not surprised. Is anyone of you not autistic?
I'm not a Linux kernel dev, although I did develop a few kernel-mode drivers for my projects in the past. Same goes for Windows. I just solve whatever problems come my way, just like many other professional engineers, instead of complaining that I can't do it just because I don't know how to do it yet. More to your point - I wasn't born with that knowledge, and I don't think that a PCIe peripheral driver is a particularly bad first project, as the Linux driver model is pretty straightforward - provided that you're familiar with the Linux userspace programming model, and are ready and willing to invest time into learning new things.
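For what it's worth, the DMA buffer setup side follows a pretty standard pattern too. A sketch (pdev is the device you get in probe; dma_set_mask_and_coherent() is the modern helper, kernels as old as 3.10 have dma_set_mask() and dma_set_coherent_mask() separately):

Code: [Select]
#include <linux/dma-mapping.h>
#include <linux/pci.h>

#define BUF_SIZE (1 << 20)   /* 1 MiB example buffer */

static int setup_dma_buffer(struct pci_dev *pdev,
                            void **cpu_addr, dma_addr_t *bus_addr)
{
    int rc;

    /* declare what addresses the endpoint can generate; fall back to
       32 bit if the SoC/root complex can't route more than that */
    rc = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
    if (rc)
        rc = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
    if (rc)
        return rc;

    /* coherent buffer: CPU and device see the same data even without
       hardware cache coherency on the interconnect */
    *cpu_addr = dma_alloc_coherent(&pdev->dev, BUF_SIZE, bus_addr, GFP_KERNEL);
    if (!*cpu_addr)
        return -ENOMEM;

    /* *bus_addr is what goes into the DMA descriptor,
       *cpu_addr is what the driver reads after the transfer completes */
    return 0;
}

The part that actually varies per design is how the DMA engine's descriptors and doorbell registers are laid out, and that is exactly what the sample driver documents by example.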

Offline Bassman59

  • Super Contributor
  • ***
  • Posts: 2501
  • Country: us
  • Yes, I do this for a living
2. TS clearly doesn't understand the difference between "sample code" and "production code". Code provided by Xilinx is just a sample and it's not intended for production use.

Fundamentally, if you came into FPGA world expecting that everything will be developed and ready for you - you're in for a big disappointment. Even if you are planning to use any pre-existing code (whether developed by Xilinx, or some third party), you've got to understand how it works, otherwise it's bound to bite you in the behind sooner or later.

The "promise" of SoCs is that the vendors do provide production-ready code. Nobody wants to write a TCP/IP or USB stack, or write a device driver for a peripheral. So this is why Xilinx and Intel have spent a fortune building toolsets that allow the customer to create systems which use such vendor-provided IP, and this frees the customer to create a product with their own special sauce.
 

Offline jeremy

  • Super Contributor
  • ***
  • Posts: 1079
  • Country: au
I’m sure if you looked at the source code for the latest rigol or siglent products, you would find a ton of Xilinx code copied verbatim.
 

Offline TinkeringSteveTopic starter

  • Frequent Contributor
  • **
  • Posts: 297
If that's your standard reaction to somebody new to an area, I hope nobody will ever be so unfortunate as to endure being taught anything by you.
It's my standard reaction to someone so pretentious as to facepalm over something without even understanding why things are the way they are

"Pretentious"? It was meant to convey a slight frustration about a "almost there - could work, but not quite" situation. I don't see how that matches the dictionary definition of pretentious. Or how it matches your "mere paraphrasing" but in different setting - angrily directed against an individual. You added a facepalm emoticon that could convey aggression, whereas my "paraphrased" sentence has a damn smiley at the end, for god's sake.
As for complaining about Xilinx - I got the notion that Xilinx is partially "at fault" (or, not caring about a maybe less important market for that particular sort of thing, as I put it) from the Xilinx forum thread by "dwd_pete" that I mentioned, where he detailed how broken their code actually is - and how he thinks that shouldn't be normal. He seemed experienced in this, giving the impression that this isn't necessarily what's to be expected. Their "support" forum is, in my experience of half a year or so, a joke. After that experience, I was told by a company that did some embedded work for my employer many years ago that they basically had the same experience - super expensive kit, poor support - and they just switched to Altera for a fraction of the price and, in his words, "everything was bliss" (with the same problem: a PCIe driver). dwd_pete also mentioned a better experience with Altera (to me personally; I'm not referring to his "marked solution", lol). (Just too late to switch here now.)

So how would I not get the impressions that lead me to the conclusion that, in some ways, there's something left to be desired in how Xilinx did things?
Compared to the whole field, my impression may be wrong - but it looks to me like Xilinx is worse with support, if Altera/now Intel does actually support you with regards to drivers.

Quote
, nor supplying any information that would actually allow to say something helpful - like FPGA in question, devboard used (if any), speed grade, any other relevant info.

I wouldn't have known that saying "Artix 7", like I did, is not enough. (I'll add it to the OP.)
But this is not stackoverflow. My idea was to first look around whether there are people who do have experience with the problem, and it seemed to make sense to say "ARM Cortex A platforms" - people then know what it's not about (a lot of people seem to have only x86 experience with this), and how many commonalities there are between the SoCs with regard to this specific issue, I do not know.
I would have thought that, as the forum discussion went on, it would "reveal itself" (by people mentioning it) what precise further information is needed, which I may not know exactly.

Quote
Also - I teach only those who genuinely want to learn and are prepared to do their homework.

No matter how much you presume it, this doesn't relate to my post. You assumed a lot of things without knowing me, or why I have the impression of the situation as I do.

Quote
It's impossible to solve this problem once exactly because ARM is not a platform in traditional sense. You can thank SoC vendors for that.

Was there ever any attempt at making it more uniform?


Quote
instead of complaining that I can't do it just because I don't know how to do it yet.
More to your point - I wasn't born with that knowledge, [...] , and are ready and willing to invest time into learning new things.

Willing and being allowed to are not the same thing, though. I do have a book about Linux driver development, and the extent of things to be known, especially with regard to a driver matching the XDMA core - impressions gleaned from all this - does not match your basically "piece of cake" description at all.
I have been doing some Linux user space programming, but not a lot. The SPI driver in the book, for example, looks a lot simpler than this.
I did go through the Xilinx driver code, and the mentioned modified one, and understood some aspects of it, btw. But not enough to see why it didn't work on my ARM SoC when it worked on x86. At some point an operation just came back with an error code whose source I didn't know, and the forum also yielded nothing (like, the kernel stopped doing something / couldn't do something - but not why).
Now I've even had ridiculous-seeming problems like: one board I have can't map the PCIe BARs for the FPGA card; another one does, but the system freezes when that PCIe card and the on-board Ethernet are used at the same time. (I went back and forth with the board vendor to try some things, which finally resulted in their shrugging - after all, it's also their BSPs being used.)
Lol, this all makes it seem like there is another whole world of stuff to know before I could get something to work with reasonable confidence.
If the books don't help with that and there is no expert nearby to look at it, it's basically an unaffordable time sink, with trench-war-like progress.

I didn't even start this thread out of a burning issue - the PCIe stuff was pretty much put on hold some time ago. There is an alternative, it's just not particularly beautiful and adds a bit to board cost. I just remembered that there might also be people here who have done this stuff, whom I hadn't asked yet, and was curious. It wouldn't hurt to have it in the next board version, should there be a solution.
« Last Edit: April 30, 2019, 10:53:00 am by TinkeringSteve »
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2771
  • Country: ca
Edit: It's the Xilinx AC701 FPGA board and a Toradex Apalis iMX6 board; their TK1 board was also tried (using the NVIDIA L4T based Linux image with kernel 3.10).
As I understand these are SoMs and so you need some kind of carrier board. Did you use something pre-existing, or designed a custom board? If the latter is true, did you check SI to rule out physical board issues?
Also just so it's clear - the SoM is the PCIe root complex, and FPGA is PCIe device, right? Do you have any kind of error messages from the host that might be relevant?

Offline TinkeringSteveTopic starter

  • Frequent Contributor
  • **
  • Posts: 297
Edit: It's the Xilinx AC701 FPGA board and a Toradex Apalis iMX6 board; their TK1 board was also tried (using the NVIDIA L4T based Linux image with kernel 3.10).
As I understand these are SoMs and so you need some kind of carrier board. Did you use something pre-existing, or designed a custom board? If the latter is true, did you check SI to rule out physical board issues?
Also just so it's clear - the SoM is the PCIe root complex, and FPGA is PCIe device, right? Do you have any kind of error messages from the host that might be relevant?

Hey,
yes, they are replaceable modules, and there is a custom carrier board planned, but so far I have been using the ones from the module vendor - or rather two of them: a huge one with lots of I/O, and a compact one.
And yes, the SoM is the host, the FPGA is an endpoint.
As for checking signal integrity, does this require special gear? One of the companies I spoke to about potentially outsourcing the problem mentioned they had a dedicated PCIe analyzer of some sort.

I have also used a Tegra K1 module + carrier from a different vendor, which was ruled out for unrelated reasons. But that one did at least work with the FPGA over PCIe insofar as I could access things via /dev/mem, which was good enough for accessing configuration register blocks.
Interestingly, it has the LAN on USB3, vs. the Toradex board which has it on PCIe - and that is the one that would freeze when PCIe and LAN were both used to a significant degree at the same time.
When that happened, there was one error the kernel spat out massively on the serial console until it decided to auto-reboot: MSELECT, status=0x100. It seems to be specific to NVIDIA L4T's modifications to the kernel.

The iMX6 board had kernel startup messages saying things like "BAR 15: no space for [mem size 0x02000000 pref]" - listing a lot more BARs than I was used to seeing after booting the other boards with the FPGA connected & programmed. I got a hint that BAR size limits may have been violated (although there were entries like this for orders-of-magnitude smaller sizes as well), but at that point the whole thing had already been put on hold and I didn't look further into what *exactly* that means and how to fix it. I only started to dig the subject out again yesterday, after months of hiatus.
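(In case someone lands here later: you can see what the card is asking for without any driver loaded, by reading its resource file in sysfs - each line is "start end flags" for one BAR. A quick sketch; the 0000:01:00.0 address is just an example, not necessarily where the FPGA shows up on these boards.)

Code: [Select]
#include <stdio.h>

/* /sys/bus/pci/devices/<bdf>/resource: one "start end flags" line per BAR */
int main(void)
{
    const char *path = "/sys/bus/pci/devices/0000:01:00.0/resource"; /* example BDF */
    unsigned long long start, end, flags;
    int bar = 0;
    FILE *f = fopen(path, "r");

    if (!f) { perror(path); return 1; }

    while (fscanf(f, "%llx %llx %llx", &start, &end, &flags) == 3) {
        if (end)   /* unused BARs show up as all zeros */
            printf("BAR %d: size 0x%llx, flags 0x%llx\n",
                   bar, end - start + 1, flags);
        bar++;
    }
    fclose(f);
    return 0;
}

A 32 MB prefetchable BAR like the one in that message has to fit into the memory window the host bridge provides, and "no space" means the kernel couldn't fit it in there.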
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2771
  • Country: ca
Are you using this board? If so, the PCIe x1 ports on it are connected through a PCIe switch (see schematics here). My memory is rather rusty on the PCIe spec, but I'm pretty sure switches have their own config space, which is probably why you're seeing more BARs than you expect. Unfortunately I don't have this hardware on hand, so I can't really do any testing to confirm this theory, but it sounds plausible.

Offline TinkeringSteveTopic starter

  • Frequent Contributor
  • **
  • Posts: 297
Yeah, that's one of the boards. Somewhat larger than the average PC mainboard these days, which I found slightly amusing. They didn't skimp on connectors.

I'll read about the subject, thanks for looking that up.
 

Offline technix

  • Super Contributor
  • ***
  • Posts: 3508
  • Country: cn
  • From Shanghai With Love
    • My Untitled Blog
If I had to guess, I’d say it’s because they either want you to use a Zynq, or because PCIe IPs are very different between ARM semiconductor vendors, whereas the x86 ones are really just Intel.

If I recall correctly, there is even an Allwinner chip which is advertised as having PCIe, but it doesn’t actually work because they stuffed up the logic (they still advertise it as having PCIe though ;D )
It actually makes financial sense to use a Zynq for this type of project. At least in China, a bare Zynq-7010 chip XC7Z010-1CLG400C costs less than a bare i.MX6 Dual chip MCIMX6D6AVT08AD, and a Zynq-7020 XC7Z020-1CLG400C costs about the same as said i.MX6 Dual chip. It may be cheaper and easier if your project is implemented using a Zynq.

I would love to use a Zynq, I read reports of people getting 400MB/s just by using /dev/mem, as things are just interconnected internally. But apparently the current path has been walked down too far already to do that.

I get to hear "PCI-e is *the* standard connection between an FPGA and a CPU", but the Xilinx forums are full of people who don't get this to work on ARM. I was told one potential problem may be the lack of cache coherency, but it fails before that would be an issue, from my humble assessment. Also, some page I found listing different ARM cores' properties claimed the Cortex A15 to be coherent... (don't know how to look for that in datasheets, I'm a software dev who tinkers with hardware occasionally)
The PCIe spec is too deeply entrenched with the x86 and amd64 platforms, retaining a crazy amount of platform dependencies. For using PCIe on any other platform workarounds have to be implemented and that becomes heavily vendor specific. Not only your OS and driver but also your hardware need to be implemented with those workarounds in mind.

If the power draw of your hardware is not strictly limited, how about investigating a solution similar to the LattePanda boards? That gives you a few proper amd64 cores from Intel, and all the PCIe quirks go away.
« Last Edit: June 05, 2019, 04:58:29 am by technix »
 

Offline asmi

  • Super Contributor
  • ***
  • Posts: 2771
  • Country: ca
The PCIe spec is too deeply entrenched with the x86 and amd64 platforms, retaining a crazy amount of platform dependencies. For using PCIe on any other platform workarounds have to be implemented and that becomes heavily vendor specific. Not only your OS and driver but also your hardware need to be implemented with those workarounds in mind.
This is absolutely NOT the case. The reason it's so easy on x86 is that it's actually standardized, while that isn't so on other platforms. Part of the reason, I suspect, is that ARM CPUs were until recently 32 bit, and PCIe requires a few big holes in the address space, which simply didn't exist. Just configuration space access requires 256M worth of addresses per PCIe root port, and on top of that you need to have a hole below the 4G boundary to support PCIe devices with 32bit BARs, and another huuuge hole somewhere else for 64bit BARs. This is only really possible once you have a 64bit (or pseudo-64bit) address space. With 64 bit ARMs becoming a commodity I still hope that somebody will actually bother to implement PCIe properly with full support of memory mapping. The PCIe cores that exist in some SoCs are just a massive pain in the butt to use, while it absolutely shouldn't be that way.
Another big reason it's so simple on x86 is the presence of the BIOS, which handles all PCIe configuration, device mappings and so on, so by the time user code receives control, everything is already configured and ready to go, while on other platforms user code has to do it all by itself. Fundamentally, once all configuration is completed, using PCIe devices is the same as using any other memory-mapped peripheral, as all lower-level details are hidden from user code by the hardware.
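To put a number on the 256M: with the standard ECAM scheme, the bus/device/function/register numbers map linearly into an address offset, so one segment is 256 buses x 32 devices x 8 functions x 4 KiB = 256 MiB of address space, before you've mapped a single BAR. Roughly:

Code: [Select]
#include <stdint.h>

/* PCIe ECAM: each function gets a 4 KiB config page, laid out linearly */
static inline uint64_t ecam_offset(unsigned bus, unsigned dev,
                                   unsigned fn, unsigned reg)
{
    return ((uint64_t)bus << 20) |   /* 256 buses     x 1 MiB each  */
           ((uint64_t)dev << 15) |   /*  32 devices   x 32 KiB each */
           ((uint64_t)fn  << 12) |   /*   8 functions x 4 KiB each  */
           reg;                      /* offset within the 4 KiB page */
}
/* max offset: ecam_offset(255, 31, 7, 0xfff) = 0x0FFFFFFF, i.e. 256 MiB - 1 */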
« Last Edit: June 05, 2019, 02:30:20 pm by asmi »
 

Offline technix

  • Super Contributor
  • ***
  • Posts: 3508
  • Country: cn
  • From Shanghai With Love
    • My Untitled Blog
The PCIe spec is too deeply entrenched with the x86 and amd64 platforms, retaining a crazy amount of platform dependencies. For using PCIe on any other platform workarounds have to be implemented and that becomes heavily vendor specific. Not only your OS and driver but also your hardware need to be implemented with those workarounds in mind.
This is absolutely NOT the case. The reason it's so easy on x86 is that it's actually standardized, while that isn't so on other platforms. Part of the reason, I suspect, is that ARM CPUs were until recently 32 bit, and PCIe requires a few big holes in the address space, which simply didn't exist. Just configuration space access requires 256M worth of addresses per PCIe root port, and on top of that you need to have a hole below the 4G boundary to support PCIe devices with 32bit BARs, and another huuuge hole somewhere else for 64bit BARs. This is only really possible once you have a 64bit (or pseudo-64bit) address space. With 64 bit ARMs becoming a commodity I still hope that somebody will actually bother to implement PCIe properly with full support of memory mapping. The PCIe cores that exist in some SoCs are just a massive pain in the butt to use, while it absolutely shouldn't be that way.
Another big reason it's so simple on x86 is the presence of the BIOS, which handles all PCIe configuration, device mappings and so on, so by the time user code receives control, everything is already configured and ready to go, while on other platforms user code has to do it all by itself.
So far, 64-bit ARMs with PCIe are few and far between. As for the BIOS, there is UEFI on ARM that does the same, but everyone friggin' loves u-boot.

As for this project, an amd64 SoC may be the path of least resistance.
« Last Edit: June 05, 2019, 02:33:45 pm by technix »
 

