The projects were robotics focused; I started using their MCUs in 2011, I think. After resolving most of the issues, we stopped updating the tools, since trying new versions kept introducing all-new issues. That was probably a year or two ago. I'm using an XLF-216-512 now for another project. The most recent issue was the compiler being unable to determine the stack size for a function, so it had to be specified explicitly. At least that gave me data I could use, although I had to dig through some build files to find where the function came from.
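For anyone who hits the same diagnostic: the XMOS toolchain has a pragma for exactly this situation. A minimal sketch, assuming the stock xC tools; the function and the figure of 256 words are placeholders, and the real number has to come from measurement or the map file:

    /* Tell the tools how much stack to reserve for the next function
       declared, for when they can't derive it themselves (e.g. function
       pointers or hand-written assembly in the call graph). The unit is
       words, and 256 here is purely illustrative. */
    #pragma stackfunction 256
    void process_block(int buf[], unsigned n)
    {
        for (unsigned i = 0; i < n; i++) {
            buf[i] >>= 1;  /* placeholder work */
        }
    }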
I wonder what caused that.
Was the function (or anything it called) recursive? No compiler can determine the recursion depth without knowing the worst-case inputs.
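The classic case, purely for illustration: the depth, and hence the stack usage, depends on a runtime value that no static analysis can bound:

    /* Illustrative only: total stack usage is depth x frame size, and
       the depth here equals the runtime argument n, so a static
       analyser cannot bound it without knowing the worst-case input. */
    unsigned factorial(unsigned n)
    {
        if (n <= 1)
            return 1;
        return n * factorial(n - 1);
    }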
Was the function's source code available?
Yes, the source code was available; it was written by us. It wasn't recursive either. I suspect it had to do with some C/XC interfacing issue, but it was easier to throw stack space at it, since there aren't going to be any issues with that. No matter what we do, the analyzer tools completely fail on these sections of code. They eat any memory we give them and then fail when they hit the heap limit, and their version of Eclipse fails to start with the heap set beyond 2GB.
From that description it isn't perfectly clear whether the stack problem is in the IDE or in the embedded executable. If the stack problem is in the executable, I'd be worried, but if it is in the IDE then it is more amenable to bugfixing.
Either way it is annoying.
This is no strawman; I do like the hardware, it's the software I think needs work. On the other hand, when I requested they add some functions to their DSP library, they did, specifically henk. The I2C library was busted and I had to write it myself when I started, but they eventually fixed that.
Don't care; crap libraries are omnipresent. Improving tools cannot help; you have to improve the developer.
Regardless of whether you care, it is something they tout, seeing as there is no hardware support for common peripherals.
Given the number of "errata" for common processors/peripherals and the difficulty of working around those, the odd library "infelicity" is relatively solvable. At worst you can write the library yourself.
The build-vs-buy for libraries is an omnipresent dilemma in all development.
Hell, 35 years ago I was approached to implement HDL libraries for TTL ICs. The client didn't give a monkey's about whether or not the libraries were accurate, since all the marketing department had to do was allow their client to tick a box. I declined to provide a quote.
I think they still have an issue with [[notification]] where, when it is placed in an order that the compiler apparently doesn't like, it silently fails to function (this is one that I first reported). We had an issue, which still remains, where for some reason 2 tasks were unable to communicate normally. One would actually become unresponsive for about 4 hours, then respond normally, wait 4 hours, and respond again. The response I got was that it wasn't possible, even as a timer issue, because the timers wrap around every 41 seconds. They were unable to help with the issue, and it required creating another thread just to facilitate data transfers back and forth. I never sent in a report on that because I didn't have time then to narrow down the cause, and I adapted the project to use the extra task.
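For readers who haven't used xC: [[notification]] is part of the interface mechanism, and the usual shape looks roughly like this (a sketch with made-up names, not code from the project above):

    /* Sketch of the usual [[notification]] pattern in xC; all names
       are illustrative. */
    interface data_if {
        /* Server raises this to signal that data is waiting. */
        [[notification]] slave void data_ready(void);
        /* Client calls this to collect it, clearing the notification. */
        [[clears_notification]] int get_data(void);
    };

    void consumer(client interface data_if i)
    {
        while (1) {
            select {
                case i.data_ready(): {
                    int d = i.get_data();
                    /* ... use d ... */
                    break;
                }
            }
        }
    }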
Strange and concerning.
It is entirely possible (and easy) to get livelock in multiprocess systems.
Were the source and receiver processes on the same or different tiles?
Were you using the synchronous or asynchronous comms mechanisms?
Could asynchronous source and receiver processes be operating at the same loop interval, and at exactly the wrong phase, so that there was a memory clash of some sort?
Was it dependent on optimisation level?
Could anything "odd" be seen in the machine code instructions related to interprocessor comms?
This was an issue with synchronous communication across tiles using interfaces, the same setup as now but without the extra thread. I went back and looked, and I had it slightly wrong: it was 4 hours to the first response, and then it'd work normally. As usual, the data was passed by reference and memcpy'd, since the compiler optimizes memcpy. This is, and always has been, the proper way to do it.
You can't memcpy across tile boundaries, since tiles have separate memories. Memcpy only works between cores on the same tile.
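To make the distinction concrete, here's roughly the shape involved (a sketch reusing the made-up data_if from above, with assumed task names; when client and server sit on different tiles, the compiler marshals the interface call over the switch, so any memcpy stays local to each tile's own memory):

    #include <platform.h>

    /* Assumed task prototypes; bodies omitted. */
    void producer(server interface data_if i);
    void consumer(client interface data_if i);

    int main(void)
    {
        interface data_if i;
        par {
            on tile[0]: producer(i);   /* server end */
            on tile[1]: consumer(i);   /* client end */
        }
        return 0;
    }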
It was totally independent of optimization level. While trying to debug it, we were spitting data out over a couple of UARTs, which we monitored and timestamped. We got the message that it was making the call, and 4 hours later the confirmation from the other thread. Neither thread was executing ANY code during that time. There were no other tasks trying to communicate with the threads (during debugging we had it cut down to just the 2 threads at one point), and the thread that took 1/6 of a day to respond was busy for about 800ns every 10ms. That's it. The assembly didn't look abnormal. No warnings with -Wall either. I suspect it was an issue with the scheduler, but it was impossible at the time to verify. Unfortunately we don't get paid to find their bugs; we could have made some money charging by the hour.
The scheduler is a slightly nebulous concept in the xCORE devices, since much of the traditional RTOS functionality is in the hardware.
I can't make sense of those times (800ns/10ms/4hours) either!
There was a period of about 2 years where it would fail to define functions, and you had to do it yourself in assembly or it would fail to compile and give you a page of error messages. They fixed this after 14.2.4, I think, but none of our projects would compile on later versions without more work. They recommend -Os for everything because the dual-issue optimizations are poor at best.
My experience is with 14.3.0 and the single-issue processors, so I wouldn't have seen that.
There are so many problems with C optimisations for any processor/standard/compiler that I'm not surprised there are problems with xC, since it is built using traditional C compilers.
Except that in all their materials the numbers are inflated, since you essentially only get dual issue when hand-writing assembly; the C almost never takes advantage of it. So 4000 MIPS is really 2000 MIPS. That's significant. The XS1 parts had a slight advantage in this respect for lower core-count designs, since each "thread" could execute instructions at up to 125MHz: with four or fewer active threads, the four-stage pipeline gives each one clock/4, i.e. 125MHz at 500MHz. The XS2 pipeline schedules as if at least five cores are active, so each is limited to 100MHz max.
OK, that's significant - and disappointing.
I've only used XS1 devices, so I wouldn't have noticed.
Perhaps dual issue might still be practical in cases where, say, a DSP kernel can be hand-optimised.
You may have had no issues, and that's great, but they do have issues, despite the fact that a lot of their work goes back to the transputer days. I think they have a lot of talented people, but unless you're specifically doing an audio-based project, where you can use a lot of the code they take care to get right, I don't think the cost, in time or money, is worth it to start a large or serious project.
That's always a key decision!
Thanks for your points.
This wasn't even a large project, ~30k lines of code. These MCUs are really neat, somewhat unique, and relatively unknown. It's important not to read too much into the marketing materials. We have no plans to stop using them, but you can really find yourself in trouble if you are waiting for them to fix their tools while you're on a deadline, without enough experience to know where to look. That said, not everyone will have issues, just as you had none. We've had projects with no real issues, but we've also had designs that had to be totally reworked because of such issues, and that costs A LOT of time.
Yes, indeed it does. We've all seen that in other environments[1], and it is a shame to see it occurring in this environment.
It is always valuable to read such "war stories" from people that don't have a hidden agenda or prejudices. Thanks again.
[1] I have a tendency to make an estimate, and then keep the figure the same but use the next larger unit, e.g. 3 weeks -> 3 months. And every time someone asks me if it is accurate, I add 10% on principle.