Author Topic: Microchip announces PIC64 ... and it's RISC-V. (Read 8698 times)

EverydayMuffin · « **Reply #25 on:** July 09, 2024, 09:55:05 pm »

Quote from: westfw on July 09, 2024, 09:47:40 pm

I don't know if I "trust" Microchip to design "Application processor" style chips (ie 500MHz+, "designed to run linux")

It might end up with something important "missing" (like RPi's rp2040 (in the other direction): "oh - you want some sort of code protection? Huh.")

Microchip already has a line of Application-class processors capable of running Linux.

SAM9 (Arm9), SAMA5 (Arm Cortex-A5) and SAMA7 (Arm Cortex-A7).

The PIC64-GX isn't even their first chip using this RISC-V processor core. PolarFire SoC is Linux capable and has been around a few years.

PCB.Wiz · « **Reply #26 on:** July 09, 2024, 09:57:31 pm »

Quote from: westfw on July 09, 2024, 09:47:40 pm

I don't know if I "trust" Microchip to design "Application processor" style chips (ie 500MHz+, "designed to run linux")

It might end up with something important "missing" (like RPi's rp2040 (in the other direction): "oh - you want some sort of code protection? Huh.")

Very different markets and very different customers.
See

https://edacafe.com/nbc/articles/1/2080615/Microchip-Unveils-Industrys-Highest-Performance-64-bit-HPSCMicroprocessor-MPU-Family-New-Era-Autonomous-Space-Computing

brucehoult · « **Reply #27 on:** July 09, 2024, 10:16:37 pm »

Quote from: EverydayMuffin on July 09, 2024, 09:55:05 pm

The PIC64-GX isn't even their first chip using this RISC-V processor core. PolarFire SoC is Linux capable and has been around a few years.

Yes, the Microchip/Microsemi PolarFire SoC uses the SiFive U54-MC core complex (five cores and cache) inside an FPGA, much like Zynq uses Arm. A dev kit ("Icicle") using an early-run 250k LE chip shipped in July 2020, and mass production, including versions down to 23k LEs, shipped a couple of years later.

Microsemi also made a specialised I/O expansion board for the HiFive Unleashed -- which debuted the U54-MC core complex -- in June 2018.

It's not really all that surprising Microsemi decided to make a zero-FPGA LEs version of the PolarFire SoC. The surprising thing is that Microchip decided to call it "PIC64".

glenenglish · « **Reply #28 on:** July 09, 2024, 10:43:28 pm »

Does anyone know what semiconductor node it is manufactured on ? Can get some idea from the VddCore...
We're STILL waiting for Polarfire2.....

brucehoult · « **Reply #29 on:** July 09, 2024, 11:31:01 pm »

Quote from: glenenglish on July 09, 2024, 10:43:28 pm

Does anyone know what semiconductor node it is manufactured on ? Can get some idea from the VddCore...
We're STILL waiting for Polarfire2.....

"PIC64GX is 28nm from UMC, and the PIC64-HPSC is fabricated using GlobalFoundries' 12nm 12LP+ process node"

https://www.electronicsweekly.com/news/products/micros/microchip-picks-risc-v-to-go-64bit-but-will-do-64bit-arm-too-2024-07/

625 MHz is very conservative for 28nm! The same cores in the FU540-C000 in 2018 were also in 28nm. It was designed for "up to 1.5 GHz", though out of the box it's set to 1.0 GHz. I've always run mine at 1.45 GHz and never had a problem (in a temperature-controlled domestic environment, not in industrial conditions)

glenenglish · « **Reply #30 on:** July 10, 2024, 12:01:59 am »

really ?
I've never considered 625 MHz conservative for 28nm, although I am used to FPGA structures, not ASIC structures like this. I would consider 625MHz 28nm fairly fast stuff in the FPGA world.... Depends on the dielectric etc, whether low static is an issue, speed etc tradeoffs.
I wonder what Polarfire2 is going to be. I think it's 16nm. 28nm Polarfire isn't fast (380 MHz multipliers), otherwise I'd be using it.

brucehoult · « **Reply #31 on:** July 10, 2024, 01:16:20 am »

Quote from: glenenglish on July 10, 2024, 12:01:59 am

really ?
I've never considered 625 MHz conservative for 28nm, although I am used to FPGA structures, not ASIC structures like this. I would consider 625MHz 28nm fairly fast stuff in the FPGA world.... Depends on the dielectric etc, whether low static is an issue, speed etc tradeoffs.

Well, yes, FPGA is different. You've got quite a lot of gate delays in every LUT -- something like 6 ASIC gate delays for a Xilinx LUT6 I think. That's maybe not too bad compared to random logic when you actually have a single output calculated from 6 inputs, but I suspect a lot of the time they're actually used to implement one 2 input function and an independent 3 input function which is probably two gate delays in random logic. So you're likely losing a factor of three in propagation delay / MHz right there. And then there are the routing delays which are likely to be a lot bigger than in an ASIC.
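
The "factor of three" above is simple arithmetic on the post's own rough estimates. A minimal sketch, assuming those two figures (about 6 gate delays per LUT6, about 2 for the equivalent random logic) and ignoring routing delay entirely; the function name is purely illustrative:

```python
# Rough logic-delay comparison using only the estimates quoted in the post.
LUT6_GATE_DELAYS = 6          # post's guess for one Xilinx LUT6, in ASIC gate delays
RANDOM_LOGIC_GATE_DELAYS = 2  # post's guess for a 2-input plus a 3-input function in an ASIC


def lut_slowdown(lut_delays: int, asic_delays: int) -> float:
    """Ratio of LUT delay to equivalent ASIC random-logic delay (routing ignored)."""
    return lut_delays / asic_delays


slowdown = lut_slowdown(LUT6_GATE_DELAYS, RANDOM_LOGIC_GATE_DELAYS)
print(f"logic-only slowdown: {slowdown:.1f}x")
```

Routing delay comes on top of this, so the real gap between FPGA and ASIC clock rates is likely larger than this logic-only ratio.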

SiliconWizard · « **Reply #32 on:** July 10, 2024, 01:57:21 am »

For sure. To get an idea of what you can achieve on a 28nm process in terms of CPU performance, you can have a look at Intel CPUs. You'll have to look back. I don't think Intel used a 28nm process though (that I could find), but the closest would be 32nm for its second-gen Core i7. The i7-2700K was running at up to 3.90 GHz. Yes, on 32nm. And yes, an Intel Core i7, immensely more complex logic-wise than any RV64 core.

brucehoult · « **Reply #33 on:** July 10, 2024, 02:24:49 am »

Quote from: SiliconWizard on July 10, 2024, 01:57:21 am

For sure. To get an idea of what you can achieve on a 28nm process in terms of CPU performance, you can have a look at Intel CPUs. You'll have to look back. I don't think Intel used a 28nm process though (that I could find), but the closest would be 32nm for its second-gen Core i7. The i7-2700K was running at up to 3.90 GHz. Yes, on 32nm. And yes, an Intel Core i7, immensely more complex logic-wise than any RV64 core.

More complex, yes, but the SG2380 due later this year is expected to have around Nehalem or Sandy Bridge performance -- maybe more like Core i3 (no turbo) than Core i7. We'll see. It's on a 14nm process.

The JH7110 found in half a dozen different SBCs and laptops now is 28nm and 1.5 GHz. The SpacemiT K1/M1 is also 28nm and runs at 1.6 GHz in the plastic package and 2.0 GHz in the metal package (they claim it runs cooler than the 1.6 GHz at the same load). The die appears to be designed for 2.4 GHz and maybe that's doable with an actual heatsink & fan.

The difference, I believe, is in the sophistication of manual or semi-manual layout used by Intel, and the engineer-hours expended on that, vs the automatic methods used by RISC-V vendors.

Another data point is that the Pentium III and PowerPC G4 were hitting around 1.2 GHz speeds on 180nm, while the first RISC-V chip, the FE310, did 320 MHz, and Cortex-M3/M4 on 180nm is generally 180 MHz.

Shorter pipelines also play a part in the Arm and RISC-V designs. Pentium III has 10 pipeline stages, while the last G4s had 7 (early ones had 4). Cortex M3/M4 have 4 pipeline stages, the E31 in the FE310 has 5.

SiliconWizard · « **Reply #34 on:** July 10, 2024, 02:53:50 am »

Quote from: brucehoult on July 10, 2024, 02:24:49 am

The difference, I believe, is in the sophistication of manual or semi-manual layout used by Intel, and the engineer-hours expended on that, vs the automatic methods used by RISC-V vendors.

Yes, that and the fact Intel have their own optimized logic cell libraries, compared to using just whatever you're given with some PDK.

But anyway, 2GHz+ should definitely be achievable with an optimized RISC-V design on a 28nm node, IMO.

Regarding pipelines, yes, but keep in mind that while Intel went for ridiculously long pipelines with the Pentium 3 and much worse, Pentium 4, they have reversed the trend with the Core series. I don't remember the number of stages on the last gen Core iX, but it's not that big anymore. Even on a 2nd gen core i7, I think it was much shorter than on the (infamous) P4.

brucehoult · « **Reply #35 on:** July 10, 2024, 05:27:51 am »

Quote from: SiliconWizard on July 10, 2024, 02:53:50 am

Regarding pipelines, yes, but keep in mind that while Intel went for ridiculously long pipelines with the Pentium 3 and much worse, Pentium 4, they have reversed the trend with the Core series. I don't remember the number of stages on the last gen Core iX, but it's not that big anymore. Even on a 2nd gen core i7, I think it was much shorter than on the (infamous) P4.

According to https://en.wikipedia.org/wiki/List_of_Intel_CPU_microarchitectures ...

5: 486 / Pentium

6: Pentium MMX

14: Pentium Pro, II

12: Pentium III

20: Pentium 4 (early)

10: Pentium M / Centrino

31: Pentium 4 (late)

12: Core / Core 2

20: Nehalem (first Core i7 etc)

14: Sandy Bridge - Cypress Cove (aka 11th gen desktop)

12: Golden Cove (12th gen P cores), Raptor Cove (13th & 14th gen P cores)

I've got a 16" Lenovo laptop with a 13th gen i9-13900HX with 8 P cores (16 threads) and 16 E cores and it's MIGHTY.

It beats the 32 core ThreadRipper 2990wx tower I built in early 2019 by a little on absolutely everything (e.g. Linux kernel, LLVM builds), and by up to 50% on bursty loads (e.g. binutils / gcc / glibc builds), all while using 1/3 the peak power (I've measured the ThreadRipper at 375W).

The idle power also seems to be about 1/3. It's hard to tell for sure as you can't disable the battery, but I just measured 28W at the wall with the screen closed (when not travelling I use it via ssh from my M1 Mac Mini). Pegging 1 CPU with an infinite loop at 5.3 GHz took it to 68W. The ThreadRipper idles at 80W. I just reset the kWh & minutes counters on my meter and I'll leave it for a few hours and see what the average is for the laptop.

coppice · « **Reply #36 on:** July 10, 2024, 12:30:15 pm »

Quote from: brucehoult on July 09, 2024, 11:31:01 pm

"PIC64GX is 28nm from UMC, and the PIC64-HPSC is fabricated using GlobalFoundries' 12nm 12LP+ process node"

Now that is interesting. Someone has a hardened process up and running at 12nm. You might expect the hardened versions to be in a coarser process than the commercial ones.

glenenglish · « **Reply #37 on:** July 10, 2024, 08:25:06 pm »

and note that they have the usual variable tiers of rad - ie rad hard, rad tol - suggesting they are going all in on this.

and.. they will have an engineering equivalent silicon for the HPSC available. Probably $ not within reach of non-pros, though, given the likely numbers they'll make.

https://ww1.microchip.com/downloads/aemDocuments/documents/MPU64/ProductDocuments/Brochures/PIC64-HPSC-Evaluation-Platform-00005464.pdf

4 x dual lockstep X280 processors plus a boot controller - this is a serious play by the company.. I'm guessing that something like this will also turn up combined with an FPGA arch on package.

westfw · « **Reply #38 on:** July 11, 2024, 11:56:29 pm »

Quote

Microchip already has a line of Application-class processors capable of running Linux.
SAM9 (Arm9), SAMA5 (Arm Cortex-A5) and SAMA7 (Arm Cortex-A7).

Hmm. Any recent developments AFTER the Atmel acquisition? (at least they re-formatted the datasheets)
For that matter, were the Atmel parts actually "successful" ?

EverydayMuffin · « **Reply #39 on:** July 12, 2024, 10:24:44 am »

Quote from: westfw on July 11, 2024, 11:56:29 pm

Hmm. Any recent developments AFTER the Atmel acquisition? (at least they re-formatted the datasheets)
For that matter, were the Atmel parts actually "successful" ?

Latest device announced in 2022.

https://www.microchip.com/en-us/about/news-releases/products/new-1ghz-sama7g54-is-the-first-single-core-mpu-with-mipi-csi-2

iMo · « **Reply #40 on:** July 12, 2024, 04:41:00 pm »

Quote from: westfw on July 11, 2024, 11:56:29 pm

Quote
Microchip already has a line of Application-class processors capable of running Linux.
SAM9 (Arm9), SAMA5 (Arm Cortex-A5) and SAMA7 (Arm Cortex-A7).
Hmm. Any recent developments AFTER the Atmel acquisition? (at least they re-formatted the datasheets)
For that matter, were the Atmel parts actually "successful" ?

MCHP tried some ~10y back with the PIC32MZ_DA, there is/was 32+MB stacked on-chip dram, afaik.
Not much visible success, imho..
PS: not sure if the pic32MZ got an MMU, afaik not (so no linux)..

https://www.microchipdirect.com/product/PIC32MZ2025DAG176-I/2J?samples=true

SiliconWizard · « **Reply #41 on:** July 14, 2024, 11:57:25 pm »

Finding its market can be tough.
10 years ago, ARM was clearly dominating for what was called "application processors". They were not necessarily running Linux (often not), but some proprietary OS's.
It was probably hard for the PIC32MZ with that much RAM to meet its market.

The strong point of Microchip has always been to be where others weren't.

The Atmel acquisition gives them a (small) share of the ARM market, but that's probably not where they should focus new developments. Directly competing with hundreds of other vendors is not a clever strategy. My thought is that they'll do just enough not to lose Atmel customers, and/or they'll try to convert them to other Microchip products.

One thing they don't have (well, technically they do, with the PIC32) are competitive 32-bit MCUs - I consider now the PIC32/MIPS as a dead end. So, meanwhile, they have Atmel MCUs for the "32-bit" market. They're going to release 64-bit RISC-V processors. I would find it reasonable to work on a new line of 32-bit RISC-V MCUs to replace their PIC32 line over time. Maybe they are.

Psi · « **Reply #42 on:** July 15, 2024, 01:35:31 am »

I think I'll wait for the PIC128

brucehoult · « **Reply #43 on:** July 15, 2024, 01:55:19 am »

Quote from: Psi on July 15, 2024, 01:35:31 am

I think I'll wait for the PIC128

RISC-V has that covered...

coppice · « **Reply #44 on:** July 15, 2024, 02:32:47 pm »

Quote from: SiliconWizard on July 14, 2024, 11:57:25 pm

One thing they don't have (well, technically they do, with the PIC32) are competitive 32-bit MCUs - I consider now the PIC32/MIPS as a dead end. So, meanwhile, they have Atmel MCUs for the "32-bit" market. They're going to release 64-bit RISC-V processors. I would find it reasonable to work on a new line of 32-bit RISC-V MCUs to replace their PIC32 line over time. Maybe they are.

Microchip wouldn't accept that MCUs mostly differentiate on what is not in the core - interesting peripherals, interesting mixes of things, service, etc. This was weird, as for years Microchip claimed their competitive edge was great service to customers of varying sizes. They went down the MIPS path to be different more than anything else. This was clearly dumb, as the licence burden of using an ARM core was not that great, and they were not able to offer seriously interesting mixes of peripherals and other functionality to stand out. Cores rarely stand out. Something like the MSP430 core stands out a bit, but the special features that really help that core do bursty ULP jobs exceedingly well could have been transplanted to other cores, like the ARM ones, making them do bursty ULP just as well. The fact that it hasn't says a lot about the clunky ARM licencing model, and which markets ARM has focussed on with the cores they offer. The patents on what the MSP430 does well have expired, so that is not a blocker.

glenenglish · « **Reply #45 on:** July 16, 2024, 05:37:40 am »

am surprised mfrs are keeping variable length instructions these days, must play merry hell with the prefetcher/pipeline.

Still unsure what is the real advantage of having a 64 bit processor for most embedded, small application projects...

more memory used, more cache required in hand, more of everything for the occasional handling of 64 bit data...

I guess for my applications, I see the amount of stuff you can hold in a single cache line reduces, so need more cache. cache misses hurt.

sure the thing can swap between 4 byte, 8 byte instructions , surely that consumes more area ?

OK, great for number crunching, but isnt heavy number crunching these days best dealt with long vector instructions ?
I guess doubles are handled natively in single cycles with a 64bit processor... one advantage.

if you are moving data around, that should all be via DMA, so the processor shouldnt care.

I'm probably just looking at this through my own window....

post script - I'm in need of some perspective modification, I think after writing this, that this view is a fairly narrow, current view of my own requirements, without having 'vision' of future requirements and progress in all things. More about that in my following post of the 32 v 8 bit revolution.

brucehoult · « **Reply #46 on:** July 16, 2024, 08:52:33 am »

Quote from: glenenglish on July 16, 2024, 05:37:40 am

am surprised mfrs are keeping variable length instructions these days, must play merry hell with the prefetcher/pipeline.

Arm very probably would not be the force they are today without Thumb/Thumb2. Being able to run at 1 CPI off 16 bit wide RAM with no cache was critical to them getting the Nokia deal, and no doubt all the other "feature phones".

You should see how the deeply embedded world screams if your RISC-V code size is even 5% or 10% bigger than what they're already using (Cortex-M)! "Our ROMS are full -- we can't use your ISA if we have to take features out and we want to reduce the BoM not increase it". Hence recent extensions aimed specifically at embedded, adding things such as 2-byte instructions for load/store byte, table-jump, and push/pop multiple registers.

There are a few designs meant for controlling things in an FPGA soft core, but I have not yet seen any RISC-V microcontroller vendor -- no matter how tiny their chip -- choose to leave out the 2-byte instructions. Even the $0.10 CH32V003 drops the CPU registers from 32 to 16 to save space, but they keep the variable length instructions. At 48 MHz they can run one of the 2-byte instructions per clock cycle, while taking two clock cycles to run the 4-byte instructions -- limited by fetch speed from flash, which I think is 4 bytes wide but takes two cycles to access.
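
The fetch-limited timing described above can be sketched as a toy model. It assumes only what the post states (and itself hedges with "I think"): flash delivers 4 bytes every two cycles, i.e. 2 bytes per cycle on average, so a 2-byte instruction costs one cycle and a 4-byte instruction costs two. Branches, load/store stalls, and any prefetch buffering are ignored, and the function name is hypothetical:

```python
def fetch_limited_cycles(instr_sizes, bytes_per_cycle=2):
    """Cycles to run straight-line code when instruction fetch is the only limit.

    instr_sizes: iterable of instruction lengths in bytes (2 or 4).
    bytes_per_cycle: assumed average fetch bandwidth (4-byte-wide flash, 2-cycle access).
    """
    return sum(size // bytes_per_cycle for size in instr_sizes)


# Hypothetical instruction mix: six compressed (2-byte) and four full-size (4-byte).
mixed = [2] * 6 + [4] * 4
all_wide = [4] * 10

mixed_cycles = fetch_limited_cycles(mixed)        # 6*1 + 4*2 = 14
all_wide_cycles = fetch_limited_cycles(all_wide)  # 10*2     = 20

print(f"mixed:    {mixed_cycles} cycles, CPI {mixed_cycles / len(mixed):.1f}")
print(f"all wide: {all_wide_cycles} cycles, CPI {all_wide_cycles / len(all_wide):.1f}")
```

Under these assumptions the 2-byte encodings help speed as well as code size on a part like this, which is consistent with vendors keeping them even on the smallest chips.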

The corresponding $0.10 PUYA PY32 Arm-based microcontroller uses ARMv6-M which is very cut down compared to ARMv7 but is still also mixed instruction length -- it's Thumb1 plus half a dozen 4-byte instructions to make a viable stand-alone ISA.

Quote

Still unsure what is the real advantage of having a 64 bit processor for most embedded, small application projects...

None, if you've got single digit KB to a few hundred KB of RAM and flash. But you might as well if you've got 10s of MB or more. You don't have to be feeling cramped in 4 GB before you switch -- and in particular if you want to ensure you've got room to grow your application without having to switch ISAs again.

Quote

more memory used, more cache required in hand, more of everything for the occasional handling of 64 bit data...

Very little more. The code size for 32 and 64 bit is identical (at least in RISC-V, near enough). Structures and arrays containing pure data don't change in size -- only if you put pointers in them, which pretty much implies you're using a heap, which you don't do anyway on tiny machines. Most sensible data structures have a lot more data in them than pointers, which are overhead whether they are 2, 4, or 8 bytes. The biggest change is doubling the size of saved registers and return addresses on the stack. Who cares when you've got a single-digit KB stack on your many-MB machine?
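
The point about pure-data structures versus pointer-heavy ones can be illustrated with a small layout calculator. This is a sketch only: it assumes the common "natural alignment" rule (each scalar field aligned to its own size, struct padded to its largest field), which typical 32-bit and 64-bit C ABIs follow for these field types, and the two example records are made up for illustration:

```python
def align_up(n, alignment):
    """Round n up to the next multiple of alignment."""
    return (n + alignment - 1) // alignment * alignment


def struct_size(field_sizes):
    """Size in bytes of a C-like struct of scalar fields under natural alignment."""
    offset = 0
    max_align = 1
    for size in field_sizes:
        offset = align_up(offset, size) + size
        max_align = max(max_align, size)
    return align_up(offset, max_align)


def sensor_record(ptr_bytes):
    # Hypothetical pure-data record: uint32 timestamp, three int16 axes, uint16 flags.
    return struct_size([4, 2, 2, 2, 2])


def list_node(ptr_bytes):
    # Hypothetical doubly linked list node: prev, next, and one int32 payload.
    return struct_size([ptr_bytes, ptr_bytes, 4])


for ptr_bytes in (4, 8):
    print(f"{ptr_bytes * 8}-bit: record={sensor_record(ptr_bytes)} bytes, "
          f"node={list_node(ptr_bytes)} bytes")
```

The pure-data record is the same size either way, while the pointer-heavy node doubles once padding is included, which is the distinction the post is drawing.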

Quote

I guess for my applications, I see the amount of stuff you can hold in a single cache line reduces, so need more cache. cache misses hurt.

Tiny machines don't have data cache, they have SRAM. But see the previous point "sensible data structures keep the percentage used for pointers low anyway". Slightly non-tiny machines start to have icache, especially to help running from flash. See "32 bit and 64 bit code size is essentially the same".

Quote

OK, great for number crunching, but isnt heavy number crunching these days best dealt with long vector instructions ?
I guess doubles are handled natively in single cycles with a 64bit processor... one advantage.

The processors that are the main subject of this thread of course have 512 bit vector registers. And they're running dual-issue at 1 GHz. Inside spacecraft is an "embedded" use, but the CPUs aren't all that small. This is not an RP2040 competitor. (also ARMv6-M)

glenenglish · « **Reply #47 on:** July 16, 2024, 09:05:46 pm »

Thanks Bruce for your thoughtfully written post, and the references to Thumb are good.
You are dead right about having a gigabyte spare so who cares. Data structures will be the same size.

It's interesting then, the rise of the not so small machines. Maybe I am seeing the rise of the 64 bit machine, just like the rise of 32 bit ARM cores that superseded the 8 bit cores.

The big advantage on 32 v 8 is that the usual quantities and precisions that I deal with (now) are very much more suited to a 32 bit variable size. Look at the cycle cost of doing a 3 bit right shift of a 32 bit int in AVR........ 64 bits would be more corner cases (for my work currently- or rather- the WAY i write currently) , although again having 64 bit accumulators , doubles etc, there becomes a point where the programming effort (which dominates low volume projects) dwarfs any silicon cost.
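
To put a number on the AVR shift example: an 8-bit AVR shifts one bit at a time, using LSR on the top byte and then ROR (rotate right through carry) on each lower byte. The sketch below emulates that sequence in Python and counts instructions. It assumes a fully unrolled sequence with the value already in four registers; a real compiler may emit a loop instead, which adds overhead. The helper names are illustrative:

```python
def lsr(byte):
    """Emulate AVR LSR: logical shift right by one bit. Returns (result, carry_out)."""
    return byte >> 1, byte & 1


def ror(byte, carry_in):
    """Emulate AVR ROR: rotate right through carry. Returns (result, carry_out)."""
    return (carry_in << 7) | (byte >> 1), byte & 1


def shift_right_32_on_8bit(value, count):
    """Shift a 32-bit value right by `count` using only 8-bit single-bit operations.

    Returns (result, instruction_count).
    """
    regs = [(value >> (8 * i)) & 0xFF for i in range(4)]  # regs[0] is the low byte
    instructions = 0
    for _ in range(count):
        regs[3], carry = lsr(regs[3])          # top byte: zero shifted in
        instructions += 1
        for i in (2, 1, 0):                    # lower bytes: carry ripples down
            regs[i], carry = ror(regs[i], carry)
            instructions += 1
    result = sum(b << (8 * i) for i, b in enumerate(regs))
    return result, instructions


result, instructions = shift_right_32_on_8bit(0xDEADBEEF, 3)
assert result == 0xDEADBEEF >> 3
print(f"{instructions} single-byte instructions for a 3-bit shift of a 32-bit int")
```

That is four instructions per bit position, so twelve for a 3-bit shift, where a 32-bit core needs a single shift instruction.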

The only reason I continue to use (modern) AVR (like AVR64DB64) on some basic projects for clients is that, when they take over the source maintenance, the chips and tools are in general simpler to work with for the non-expert. For me, it has become 'why bother with 8 bit', with extra work and lower throughput for the same clock speed on an 8 bit AVR core, when I am handling 32 bit precisions all the time now.

Perhaps the same shift will follow with 64 bit microcontroller cores.

On MPSoC, I run the A53 in 64 bit mode, and with half a gigabyte, I don't care too much except that, as usual, I am very cache aware on globals (and frequently accessed ones are in locked cache lines).

NorthGuy · « **Reply #48 on:** July 19, 2024, 01:34:35 pm »

Quote from: brucehoult on July 16, 2024, 08:52:33 am

Structures and arrays containing pure data don't change in size -- only if you put pointers in them, which pretty much implies you're using a heap, which you don't do anyway on tiny machines. Most sensible data structures have a lot more data in them than pointers, which are overhead whether they are 2, 4, or 8 bytes.

Objects (e.g. C++ objects) may have a huge portion of pointers in them, double-linked lists with little payloads may have more pointers than data (not to mention payloads may be pointers as well), tables of strings will be arrays of pure pointers ...

However, empirically there's nearly no difference between 32-bit and 64-bit software on PC, so this shouldn't be a big concern.

brucehoult · « **Reply #49 on:** July 19, 2024, 02:07:07 pm »

Quote from: NorthGuy on July 19, 2024, 01:34:35 pm

Quote from: brucehoult on July 16, 2024, 08:52:33 am
Structures and arrays containing pure data don't change in size -- only if you put pointers in them, which pretty much implies you're using a heap, which you don't do anyway on tiny machines. Most sensible data structures have a lot more data in them than pointers, which are overhead whether they are 2, 4, or 8 bytes.

Objects (e.g. C++ objects) may have a huge portion of pointers in them, double-linked lists with little payloads may have more pointers than data

Yes they could, but that's a very stupid way to program from both a size and speed perspective. If you have double-linked lists at all then they should have 20 or 30 payload items in each one. Sliding things around inside an array is much cheaper than chasing pointers -- at least assuming you have a cache or VM. But it also gets the pointer overhead down, which is important no matter whether pointers are 4 or 8 bytes.

Quote

tables of strings will be arrays of pure pointers ...

Tables of pointers to strings will be, obviously, but that's not the only or best way to make a table of strings.
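
One alternative to a table of pointers, not necessarily the one the poster had in mind, is to pack all the strings into a single blob and index it with small fixed-width offsets. A minimal sketch, assuming ASCII strings and a blob under 64 KiB so that 16-bit offsets suffice; the function names and sample strings are illustrative:

```python
from array import array


def build_string_table(strings):
    """Pack strings into one NUL-terminated blob plus a table of 16-bit offsets."""
    blob = bytearray()
    offsets = array("H")  # unsigned 16-bit entries; valid while the blob is < 64 KiB
    for s in strings:
        offsets.append(len(blob))
        blob += s.encode("ascii") + b"\0"
    return bytes(blob), offsets


def get_string(blob, offsets, index):
    """Look up string `index` by scanning from its offset to the terminating NUL."""
    start = offsets[index]
    end = blob.index(b"\0", start)
    return blob[start:end].decode("ascii")


blob, offsets = build_string_table(["OK", "ERROR", "TIMEOUT"])
print(get_string(blob, offsets, 2))
print(f"index overhead: {offsets.itemsize} bytes per string")
```

The index costs 2 bytes per string here regardless of whether the machine's pointers are 4 or 8 bytes wide, so the table does not grow when moving from a 32-bit to a 64-bit target.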

