The projects were robotics focused; I started using their MCUs in 2011, I think. After resolving most of the issues, we stopped updating the tools, since trying new versions kept introducing all-new issues. That was probably a year or two ago. I'm using an XLF-216-512 now for another project. The most recent issue was the compiler being unable to determine the stack size for a function, so it had to be specified explicitly. At least that gave me data I could use, although I had to dig through some build files to find where the function came from.
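For anyone who hits the same diagnostic: the XMOS toolchain has a pragma for exactly this situation. A minimal sketch, assuming the stock xC tools; the function and the figure of 256 words are placeholders, and the real number has to come from measurement or the map file:

    /* Tell the tools how much stack to reserve for the next function
       declared, for when they can't derive it themselves (e.g. function
       pointers or hand-written assembly in the call graph). The unit is
       words, and 256 here is purely illustrative. */
    #pragma stackfunction 256
    void process_block(int buf[], unsigned n)
    {
        for (unsigned i = 0; i < n; i++) {
            buf[i] >>= 1;  /* placeholder work */
        }
    }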
I wonder what caused that.
Was the function (or anything it called) recursive? No compiler can determine the recursion depth without knowing the worst-case inputs.
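The classic case, purely for illustration: the depth, and hence the stack usage, depends on a runtime value that no static analysis can bound:

    /* Illustrative only: total stack usage is depth x frame size, and
       the depth here equals the runtime argument n, so a static
       analyser cannot bound it without knowing the worst-case input. */
    unsigned factorial(unsigned n)
    {
        if (n <= 1)
            return 1;
        return n * factorial(n - 1);
    }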
Was the function's source code available?
Yes, the source code was available; it was written by us. It wasn't recursive either. I suspect it had to do with some C/XC interfacing issue, but it was easier to throw stack space at it, since there aren't going to be any issues with that. No matter what we do, the analyzer tools completely fail on these sections of code. They eat any memory we give them and then fail when they hit the heap limit, and their version of Eclipse fails to start with the heap set beyond 2GB.
From that description it isn't perfectly clear whether the stack problem is in the IDE or in the embedded executable. If the stack problem is in the executable, I'd be worried, but if it is in the IDE then it is more amenable to bugfixing.
Either way it is annoying.
This is no strawman; I do like the hardware, it's the software I think needs work. On the other hand, when I requested they add some functions to their DSP library, they did, specifically henk. The I2C library was busted and I had to write it myself when I started, but they eventually fixed that.
Don't care; crap libraries are omnipresent. Improving tools cannot help; you have to improve the developer.
Regardless of whether you care, it is something they tout, seeing as there is no hardware support for common peripherals.
Given the number of "errata" for common processors/peripherals and the difficulty of working around those, the odd library "infelicity" is relatively solvable. At worst you can write the library yourself.
The build-vs-buy for libraries is an omnipresent dilemma in all development.
Hell, 35 years ago I was approached to implement HDL libraries for TTL ICs. The client didn't give a monkey's about whether or not the libraries were accurate, since all the marketing department had to do was allow their client to tick a box. I declined to provide a quote.
I think they still have an issue with [[notification]] where, when it is placed in an order that the compiler apparently doesn't like, it silently fails to function (this is one that I first reported). We had an issue, which still remains, where for some reason 2 tasks were unable to communicate normally. One would actually become unresponsive for about 4 hours, then respond normally, wait 4 hours, and respond again. The response I got was that it wasn't possible, even as a timer issue, because the timers wrap around every 41 seconds. They were unable to help with the issue, and it required creating another thread just to facilitate data transfers back and forth. I never sent in a report on that because I didn't have time then to narrow down the cause, and I adapted the project to use the extra task.
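For readers who haven't used xC: [[notification]] is part of the interface mechanism, and the usual shape looks roughly like this (a sketch with made-up names, not code from the project above):

    /* Sketch of the usual [[notification]] pattern in xC; all names
       are illustrative. */
    interface data_if {
        /* Server raises this to signal that data is waiting. */
        [[notification]] slave void data_ready(void);
        /* Client calls this to collect it, clearing the notification. */
        [[clears_notification]] int get_data(void);
    };

    void consumer(client interface data_if i)
    {
        while (1) {
            select {
                case i.data_ready(): {
                    int d = i.get_data();
                    /* ... use d ... */
                    break;
                }
            }
        }
    }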
Strange and concerning.
It is entirely possible (and easy) to get livelock in multiprocess systems.
Were the source and receiver processes on the same or different tiles?
Were you using the synchronous or asynchronous comms mechanisms?
Could asynchronous source and receiver processes be operating at the same loop interval, and at exactly the wrong phase, so that there was a memory clash of some sort?
Was it dependent on optimisation level?
Could anything "odd" be seen in the machine code instructions related to interprocessor comms?
This was an issue with synchronous communication across tiles using interfaces, the same setup as now but without the extra thread. I went back and looked, and I had it slightly wrong: it was 4 hours to the first response, and then it'd work normally. As usual, the data was passed by reference and memcpy'd, since the compiler optimizes memcpy. This is, and always has been, the proper way to do it.
You can't memcpy across tile boundaries, since tiles have separate memories. Memcpy only works between cores on the same tile.
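To make the distinction concrete, here's roughly the shape involved (a sketch reusing the made-up data_if from above, with assumed task names; when client and server sit on different tiles, the compiler marshals the interface call over the switch, so any memcpy stays local to each tile's own memory):

    #include <platform.h>

    /* Assumed task prototypes; bodies omitted. */
    void producer(server interface data_if i);
    void consumer(client interface data_if i);

    int main(void)
    {
        interface data_if i;
        par {
            on tile[0]: producer(i);   /* server end */
            on tile[1]: consumer(i);   /* client end */
        }
        return 0;
    }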
It was totally independent of optimization level. While trying to debug it, we were spitting data out over a couple of UARTs, which we monitored and timestamped. We got the message that it was making the call, and 4 hours later the confirmation from the other thread. Neither thread was executing ANY code during that time. There were no other tasks trying to communicate with the threads (during debugging we had it cut down to just the 2 threads at one point), and the thread that took 1/6 of a day to respond was busy for about 800ns every 10ms. That's it. The assembly didn't look abnormal. No warnings with -Wall either. I suspect it was an issue with the scheduler, but it was impossible at the time to verify. Unfortunately we don't get paid to find their bugs; we could have made some money charging by the hour.
The scheduler is a slightly nebulous concept in the xCORE devices, since much of the traditional RTOS functionality is in the hardware.
I can't make sense of those times (800ns/10ms/4hours) either!
There was a period of about 2 years where it would fail to define functions, and you had to do it yourself in assembly or it would fail to compile and give you a page of error messages. They fixed this after 14.2.4, I think, but none of our projects would compile on later versions without more work. They recommend -Os for everything because the dual-issue optimizations are poor at best.
My experience is with 14.3.0 and the single-issue processors, so I wouldn't have seen that.
There are so many problems with C optimisations for any processor/standard/compiler that I'm not surprised there are problems with xC, since it is built using traditional C compilers.
Except that in all their materials the numbers are inflated, since you essentially only get dual issue when hand-writing assembly; the C almost never takes advantage of it. So 4000 MIPS is really 2000 MIPS. That's significant. The XS1 parts had a slight advantage in this respect for lower core-count designs, since each "thread" could execute instructions at up to 125MHz: with four or fewer active threads, the four-stage pipeline gives each one clock/4, i.e. 125MHz at 500MHz. The XS2 pipeline schedules as if at least five cores are active, so each is limited to 100MHz max.
OK, that's significant - and disappointing.
I've only used XS1 devices, so I wouldn't have noticed.
Perhaps dual issue might still be practical in cases where, say, a DSP kernel can be hand-optimised.
You may have had no issues, and that's great, but they do have issues, despite the fact that a lot of their work goes back to the transputer days. I think they have a lot of talented people, but unless you're specifically doing an audio-based project, where you can use a lot of the code they take care to get right, I don't think the cost, in time or money, is worth it to start a large or serious project.
That's always a key decision!
Thanks for your points.
This wasn't even a large project, ~30k lines of code. These MCUs are really neat, somewhat unique, and relatively unknown. It's important not to read too much into the marketing materials. We have no plans to stop using them, but you can really find yourself in trouble if you are waiting for them to fix their tools while you're on a deadline, without enough experience to know where to look. That said, not everyone will have issues, just as you had none. We've had projects with no real issues, but we've also had designs that had to be totally reworked because of such issues, and that costs A LOT of time.
Yes, indeed it does. We've all seen that in other environments[1], and it is a shame to see it occurring in this environment.
It is always valuable to read such "war stories" from people that don't have a hidden agenda or prejudices. Thanks again.
[1] I have a tendency to make an estimate, and then keep the figure the same but use the next larger unit, e.g. 3 weeks -> 3 months. And every time someone asks me if it is accurate, I add 10% on principle.