Hmm, lemme see here... yeah, they're all using BP for the stack frame. I gave that aspect as an example of the level of generality I was thinking of.
Ah, okay. Yes, that is a result of the 80x86 instruction set and register architecture (see e.g. ref.x86asm.net).
The idiom runs so deep that Intel later added the ENTER and LEAVE instructions to construct and tear down the stack frame using EBP and ESP in exactly that way.
Basically, the sum-of-two-registers indirect addressing mode only supports four register combinations: [BX+SI], [BX+DI], [BP+SI], and [BP+DI]. The BX register is one of the four registers that are split into two sub-register halves, BL and BH. SI and DI are auto-incremented by the string instructions and are used for the other indirect addressing modes. That basically leaves BP for the stack frame (SP being the stack pointer itself).
However, GCC, for example, supports -fomit-frame-pointer, and the Intel Compiler Collection supports /Oy; both omit reserving BP for the stack frame whenever possible.
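To make the idiom concrete, here is a minimal sketch of the classic BP-anchored frame. The function name and exact offsets are made up for illustration, assuming a small-model 16-bit near call with parameters pushed right to left:

    /* Illustrative only: for a function like this, a 16-bit compiler
     * would typically emit a BP-anchored prologue and epilogue, roughly:
     *
     *     push bp          ; save the caller's frame pointer
     *     mov  bp, sp      ; BP now anchors this call's frame
     *     sub  sp, 2       ; reserve space for 'local'
     *     ...              ; 'a' at [bp+4], 'b' at [bp+6], 'local' at [bp-2]
     *     mov  sp, bp      ; tear the frame down
     *     pop  bp
     *     ret
     *
     * ENTER 2,0 and LEAVE are the later single-instruction equivalents of
     * that prologue and epilogue.  With -fomit-frame-pointer (GCC) or /Oy,
     * the compiler addresses locals relative to SP instead, leaving BP/EBP
     * free for general use.
     */
    int sum_plus_one(int a, int b)
    {
        int local = 1;
        return a + b + local;
    }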
Is... is it not actually useful to speak of "ABI"s in such general terms?
Welp, I don't usually go into the details, no; they're not that useful, really. The differences on x86 are quite small, anyway.
On x86-64, the calling conventions differ significantly. In particular, the Microsoft calling convention places the first four parameters in registers, using the integer registers (RCX, RDX, R8, R9) or XMM registers depending on each parameter's type; those four register slots are shared, so at most four parameters arrive in registers. SYSV places the first six integer or pointer parameters in registers, plus up to eight floating-point or SIMD parameters in XMM registers. Any additional parameters are put on the stack, which for very "hot" functions can cause a notable slowdown.
(In other words, for portable, efficient x86-64 code, one prefers to keep the number of parameters per C call to four or less if Windows matters; for non-Windows C code, up to six integer or pointer parameters plus eight floating-point or XMM parameters is fine. That's about it. I'd need to re-check for C++, but I typically just assume the object is passed as a hidden pointer parameter, and that otherwise the calling convention is similar.)
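As a concrete sketch (hypothetical signatures; the register assignments below follow the published conventions as I read them, so treat them as illustrative):

    #include <stddef.h>

    /* All four parameters arrive in registers under both conventions:
     *   Microsoft x64: n -> RCX, p -> RDX, scale -> XMM2, len -> R9
     *                  (four parameter slots, shared between integer
     *                   and XMM registers)
     *   SysV AMD64:    n -> RDI, p -> RSI, len -> RDX, scale -> XMM0
     */
    double scale_sum(int n, const double *p, double scale, size_t len);

    /* Fine under SysV (six integer registers: RDI, RSI, RDX, RCX, R8, R9),
     * but under Microsoft x64 only the first four arrive in registers;
     * e and f go on the stack for every call, which is what can make a
     * very "hot" function measurably slower there. */
    long combine6(long a, long b, long c, long d, long e, long f);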
Well yeah, like I said, about three?
I mostly pointed those out because there was a whole other culture (Unix, VxWorks, and the much rarer embedded folks) that wasn't Microsoft on x86, and those who did use DOS or Windows way back when often erroneously think that was all that 80x86 was used for. No, there was a LOT more.
Moreover, there were a lot of standardization efforts outside the Microsoft enclave, especially the Single UNIX Specification, which was eventually merged with POSIX (IEEE Std 1003.1).
Or did they use soft INT calls like MS-DOS too?
That varies between the ABIs/calling conventions. On the 8086, 80186-80486, the
int instruction is basically the easiest way to do a "syscall".
x86-64 (AMD64 instruction set) provides a separate
syscall assembly instruction.
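As a minimal sketch of what that boils down to (assuming x86-64 Linux and the SysV AMD64 convention; normally you would just call write() or syscall(2) from libc instead):

    /* Raw write(1, msg, len) via the `syscall` instruction.
     * Assumes x86-64 Linux, SysV AMD64 ABI: syscall number in RAX,
     * arguments in RDI, RSI, RDX; RCX and R11 are clobbered by the
     * instruction itself. */
    #include <sys/syscall.h>   /* SYS_write */

    static long raw_write(int fd, const void *buf, unsigned long len)
    {
        long ret;
        __asm__ volatile ("syscall"
                          : "=a"(ret)              /* result returned in RAX */
                          : "a"((long)SYS_write),  /* syscall number in RAX */
                            "D"((long)fd),         /* 1st argument in RDI */
                            "S"(buf),              /* 2nd argument in RSI */
                            "d"(len)               /* 3rd argument in RDX */
                          : "rcx", "r11", "memory");
        return ret;
    }

    int main(void)
    {
        static const char msg[] = "hello via raw syscall\n";
        raw_write(1, msg, sizeof msg - 1);
        return 0;
    }

On 32-bit Linux the same thing went through int 0x80, and under MS-DOS through int 0x21; the instruction changed, but the idea of trapping into the kernel with register-passed arguments stayed the same.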
In the MS-DOS era, it was common for each compiler to have its own calling convention; there wasn't an ABI per se, except maybe for the BIOS and such.
One of my first proper paid programming gigs in the early-mid nineties was to copy-protect a commercial MS-DOS program, so that each copy was different, and contained a difficult-to-replace version identifier. A key part of that was that I replaced the EXE relocator code with my own hand-written one, which modified (not frobnicated, but close) basically all of the code, and failed if the key in the file itself did not match what the relocator code incidentally produced. (Basically, each serial number was then a random number.) The funky part of that job was that I didn't get any access to the source code (talk about paranoid people!), and I actually wrote most of the assembly code using MS-DOS
debug.exe.
In other words, even the "run-time linker" itself was provided by the compiler, and not the OS.
It is important to note that the x86 BIOS, too, became much more complex when ACPI came around. It is nothing like "load some registers and then run int 0xHH" anymore.
Maybe that's not common enough to share ABIs, I don't know.
Actually, the commonalities were more due to a desire for compatibility, especially in the C compilers. The 86open project (started in 1997) is what brought Linux so close to Unix. Their work was completed in 1999, when the ELF file format was chosen for executables and dynamic libraries across most operating systems running on x86. Even Intel participated, and started efforts that eventually made their Intel Compiler Collection compatible enough with GCC to be used on SYSV ABI OSes on x86. At and after the turn of the century, if the OSes used the same calling convention, you could run binaries for one operating system on a completely different operating system, using a simple "shim" layer in between!
How many of you have heard of Open64? It is the GPL-licensed C compiler AMD provided for the new AMD64 (x86-64) instruction set, and what Nvidia used to optimize code in their CUDA toolchain. It continued the standardization path that bridged different companies and open source projects in the nineties on x86, and in some ways culminated in ISO C99 and POSIX.1-2001 (IEEE Std 1003.1-2001). Around that time, Microsoft saw the existing standardization efforts as their opponent, and did all sorts of nefarious stuff that basically stopped any further cross-platform standardization for well over a decade. (Even their brightest star, the Annex K "safe" bounds-checking interfaces (the ones with the _s suffix you see in Microsoft documentation), is being discussed for removal in a future version of the C standard, as there isn't any real use for them.)
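For those who haven't run into them, here is a rough sketch of what an _s variant looks like next to the portable way of doing the same thing. The function is made up for illustration; the _s interfaces are optional in C11, and in practice it is mostly Microsoft's C runtime that ships them:

    #define __STDC_WANT_LIB_EXT1__ 1  /* ask for Annex K declarations, if available */
    #include <stdio.h>
    #include <string.h>

    void copy_name(char *dst, size_t dstsz, const char *src)
    {
    #if defined(__STDC_LIB_EXT1__) || defined(_MSC_VER)
        /* Annex K ("_s") form: takes the destination size and reports an
         * error instead of silently overflowing. */
        strcpy_s(dst, dstsz, src);
    #else
        /* Portable equivalent: snprintf never writes past dstsz bytes and
         * always NUL-terminates (when dstsz > 0). */
        snprintf(dst, dstsz, "%s", src);
    #endif
    }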
In a very real sense, a lot of positive momentum (not towards a single goal, but towards interoperability, modularity, and freedom of choice) was lost around the turn of the century. We might be at the point where we have most of that back, but I'm not sure (as in I do not know; I do not have enough information to be sure one way or another).
Even GCC development stagnated horribly, becoming a cult-like gathering of "if you don't have a PhD in Computer Science, we shall not read your messages, you peon" self-aggrandizing idiots. (I believe that was a big reason why AMD supported Open64 so much more than GCC.) I guess it was the appearance of LLVM and Clang, and a new generation of developers, that brought about an internal change in GCC. It's still slow and cumbersome to participate in, but it's definitely better than it was at some point! And GCC is significant, because not only does it support a huge swath of different hardware, it also has quite a few language frontends.
So, yeah; many of the choices made were compromises, but they were made for a good reason: to achieve at least the possibility of interoperability across operating systems on x86.