I'm surprised manufacturers are keeping variable-length instructions these days; it must play merry hell with the prefetcher/pipeline.
Arm would very probably not be where they are today without Thumb/Thumb2. Being able to run at 1 CPI (one clock per instruction) from 16-bit-wide RAM with no cache was critical to them winning the Nokia deal, and no doubt all the other "feature phone" deals.
You should see how the deeply embedded world screams if your RISC-V code is even 5% or 10% bigger than what they're already using (Cortex-M)! "Our ROMs are full -- we can't use your ISA if we have to take features out, and we want to reduce the BoM, not increase it." Hence the recent extensions aimed specifically at embedded (the Zc* family: Zcb, Zcmt, Zcmp), which add things such as 2-byte instructions for load/store byte, table jump, and push/pop of multiple registers.
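To see why a few extra 2-byte encodings matter that much to those customers, here is the back-of-envelope arithmetic, assuming every instruction is either 2 or 4 bytes. The 10,000-instruction program and the 60%/50% split are made-up illustrative numbers, not measurements of any real compiler or codebase.

```python
def code_size_bytes(n_short: int, n_long: int) -> int:
    """Bytes of code for n_short 2-byte and n_long 4-byte instructions."""
    return 2 * n_short + 4 * n_long


# Hypothetical 10,000-instruction program (illustrative numbers only).
TOTAL = 10_000
baseline = code_size_bytes(6_000, TOTAL - 6_000)  # 60% have 2-byte forms
fewer = code_size_bytes(5_000, TOTAL - 5_000)     # only 50% have 2-byte forms
no_short = code_size_bytes(0, TOTAL)              # no 2-byte instructions at all

for name, size in [("60% short", baseline), ("50% short", fewer), ("no short", no_short)]:
    growth = 100 * (size - baseline) / baseline
    print(f"{name:>9}: {size:6d} bytes ({growth:+.1f}% vs 60% short)")
```

With these numbers the sizes come out at 28,000, 30,000 and 40,000 bytes: losing 2-byte forms for a tenth of the instructions costs about 7%, right in the range the customers complain about, and dropping them entirely costs about 43%.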
There are a few FPGA soft-core designs, meant for controlling things, that leave out the 2-byte instructions, but I have not yet seen any RISC-V microcontroller vendor -- no matter how tiny their chip -- choose to do so. Even the $0.10 CH32V003, which cuts the CPU registers from 32 to 16 to save space, keeps the variable-length instructions. At 48 MHz it can run one 2-byte instruction per clock cycle, while a 4-byte instruction takes two clock cycles -- limited by fetch speed from flash, which I think is 4 bytes wide but takes two cycles to access.
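The fetch limit described above can be written as a toy model. It assumes, as the post only guesses, a 4-byte-wide flash taking 2 cycles per access, which gives 2 bytes per cycle; it ignores branches (which waste part of a fetched word) and any prefetch buffer the real core may have. The constant and function names are made up for illustration.

```python
# Assumed flash parameters (the post says "I think"; not from a datasheet).
FLASH_WIDTH_BYTES = 4
FLASH_ACCESS_CYCLES = 2
CLOCK_MHZ = 48


def fetch_limited_cycles(n_short: int, n_long: int) -> float:
    """Cycles to fetch straight-line code when flash bandwidth is the only limit."""
    bytes_per_cycle = FLASH_WIDTH_BYTES / FLASH_ACCESS_CYCLES  # 2.0 bytes/cycle
    return (2 * n_short + 4 * n_long) / bytes_per_cycle


# Hypothetical mix: 600 two-byte and 400 four-byte instructions.
cycles = fetch_limited_cycles(600, 400)
cpi = cycles / 1000
print(f"CPI {cpi:.2f}, about {CLOCK_MHZ / cpi:.1f} MIPS at {CLOCK_MHZ} MHz")
```

Under these assumptions a 2-byte instruction costs 1 cycle and a 4-byte one costs 2, so the hypothetical 60/40 mix runs at a CPI of 1.4, roughly 34 MIPS at 48 MHz. Every instruction moved to a 2-byte form is a cycle saved as well as two bytes of flash.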
The corresponding $0.10 Arm-based microcontroller, the PUYA PY32, uses ARMv6-M, which is very cut down compared to ARMv7 but still has mixed instruction lengths -- it's Thumb1 plus half a dozen 4-byte instructions (BL, DMB, DSB, ISB, MRS, MSR) to make it a viable stand-alone ISA.
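On the "merry hell with the prefetcher" point: in both ISAs the length of an instruction is fixed by a few bits of its first 16-bit halfword, so the fetch stage can find the next instruction boundary without decoding the whole instruction. Below is a minimal Python sketch of just that length check, written from the public encoding rules (Thumb-2: top five bits 0b11101, 0b11110 or 0b11111 mean a 32-bit instruction; RISC-V: low two bits other than 0b11 mean a 16-bit instruction). The function names are made up for illustration, and RISC-V's reserved longer-than-32-bit encodings are simply rejected.

```python
from collections.abc import Callable, Iterator, Sequence


def thumb_insn_length(hw: int) -> int:
    """Length in bytes of a Thumb instruction, given its first halfword."""
    top5 = (hw >> 11) & 0x1F
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2


def riscv_insn_length(hw: int) -> int:
    """Length in bytes of a RISC-V instruction, given its first halfword."""
    if hw & 0b11 != 0b11:
        return 2  # compressed (16-bit) encoding
    if (hw >> 2) & 0b111 != 0b111:
        return 4  # standard 32-bit encoding
    raise ValueError("longer-than-32-bit encoding (reserved)")


def split_stream(halfwords: Sequence[int],
                 length_fn: Callable[[int], int]) -> Iterator[tuple[int, int]]:
    """Yield (byte offset, length) for each instruction in a halfword stream."""
    i = 0
    while i < len(halfwords):
        n = length_fn(halfwords[i])  # only the first halfword is inspected
        yield (2 * i, n)
        i += n // 2  # skip the second halfword of a 4-byte instruction


# RISC-V: c.nop (0x0001), addi x0,x0,0 (0x00000013, low halfword first), c.nop
print(list(split_stream([0x0001, 0x0013, 0x0000, 0x0001], riscv_insn_length)))
# Thumb: NOP (0xBF00), BL (0xF000 0xF800), MOVS r0,#0 (0x2000)
print(list(split_stream([0xBF00, 0xF000, 0xF800, 0x2000], thumb_insn_length)))
```

Both streams split into a 2-byte, a 4-byte and a 2-byte instruction. Note that the Thumb BL's second halfword (0xF800) would itself look like the start of a 32-bit instruction if examined in isolation, which is why the splitter has to walk forward from a known boundary rather than jump into the middle of code.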
Still unsure what the real advantage of a 64-bit processor is for most small embedded application projects...
None, if you've got single-digit KB to a few hundred KB of RAM and flash. But you might as well if you've got 10s of MB or more. You don't have to be feeling cramped in 4 GB before you switch, particularly if you want to be sure you've got room to grow your application without having to switch ISAs again.
More memory used, more cache required, more of everything, just for the occasional handling of 64-bit data...
Very little more. Code size for 32-bit and 64-bit is near enough identical (at least in RISC-V). Structures and arrays containing pure data don't change in size; they only grow if you put pointers in them, which pretty much implies you're using a heap, which you don't do anyway on tiny machines. Most sensible data structures have a lot more data in them than pointers, which are overhead whether they are 2, 4, or 8 bytes. The biggest change is doubling the size of saved registers and return addresses on the stack. Who cares, when you've got a single-digit KB stack on your many-MB machine?
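A quick way to check the "only pointers grow" argument is to lay out a few structs by hand. This sketch assumes each field is aligned to its own size and that `int` stays 4 bytes while pointers go from 4 to 8 (as in the usual ILP32/LP64 C ABIs); the three struct shapes are hypothetical examples, not taken from any real codebase.

```python
def struct_size(field_sizes: list[int]) -> int:
    """Size of a C struct whose fields (in declaration order) are aligned to their own size."""
    offset = 0
    max_align = 1
    for size in field_sizes:
        offset = (offset + size - 1) // size * size  # pad up to the field's alignment
        offset += size
        max_align = max(max_align, size)
    return (offset + max_align - 1) // max_align * max_align  # tail padding


def shapes(ptr: int) -> dict[str, list[int]]:
    """Hypothetical structs, given the pointer size in bytes."""
    return {
        "pure data (u32 + 64 x i16)": [4] + [2] * 64,
        "buffer node (next + u32 + 64 x i16)": [ptr, 4] + [2] * 64,
        "tiny list node (int + next)": [4, ptr],
    }


for name in shapes(4):
    s32 = struct_size(shapes(4)[name])
    s64 = struct_size(shapes(8)[name])
    print(f"{name}: {s32} -> {s64} bytes ({100 * (s64 - s32) / s32:+.0f}%)")
```

The pure-data struct stays at 132 bytes, the sample buffer with one link pointer goes from 136 to 144 bytes (about 6%), and only the pointer-dominated list node doubles, from 8 to 16 bytes.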
I guess for my applications, the amount of stuff you can hold in a single cache line shrinks, so you need more cache. Cache misses hurt.
Tiny machines don't have a data cache; they have SRAM. But see the previous point: "sensible data structures keep the percentage used for pointers low anyway". Slightly less tiny machines start to have an icache, especially to help when running from flash. See "32-bit and 64-bit code size is essentially the same".
OK, great for number crunching, but isn't heavy number crunching these days best dealt with by long vector instructions?
I guess doubles are handled natively in single cycles with a 64-bit processor... one advantage.
The processors that are the main subject of this thread of course have 512-bit vector registers, and they run dual-issue at 1 GHz. Inside a spacecraft is an "embedded" use, but these CPUs aren't all that small. This is not an RP2040 competitor. (The RP2040 is also ARMv6-M.)