XMOS barrel processors are a neat little artifact from an alien planet. Interesting, but mostly a curiosity. I'd rather use a SoC-FPGA and a standard language over some tiny company's proprietary custom stuff anyway.
The wacky thing about XMOS is the instruction set and programming language, not that it's a barrel processor, which is a perfectly fine way to design a multi-core CPU if you care about total throughput not peak single-thread speed.
At last week's RISC-V Summit in Munich, Ben Abdelhamid, a postdoc at Heidelberg University, showed his "BRISKI" RISC-V RV32I barrel processor, which implements up to 16 harts per core. It uses fewer than 800 LUTs and runs at 650+ MHz in a VU9P FPGA, with CPI = 1 always (no cache or branch predictor needed), achieving a pretty amazing 0.8 MIPS/LUT. It fits 1024 cores (16384 hardware threads) in that FPGA.
Barrel processors have a lot of advantages for "embarrassingly parallel" problems. Basically you can avoid the significant silicon real estate used by caches, branch prediction, and superscalar machinery, and spend it on many simple cores instead.
Sun used that to great advantage in their UltraSPARC T-series processors for server-side workloads. When I used them for a soft real-time telecoms server system, other engineers were stunned by the (easily achieved) performance. Shame Big Red borged them.
It is easy to produce fast parallel hardware; many companies have done it. Where they have fallen down is normally the software - especially when they expect people to be able to write correct parallel C/C++!
The key point about XMOS is not the processor (and who cares about the instruction set[1]) but the way the cores, switch fabric, peripherals, language and toolchain have been integrated into a remarkably pleasant ecosystem.
- CSP starts with the presumption that processing is parallel, and has a decent theoretical basis. Unsurprising when it was invented by Tony Hoare (Quicksort, monitors, the null reference, Turing Award), and its concepts have been implemented again and again (occam on the Transputer, Go, etc)
- xC is C with the bits that make parallelism "difficult" removed (e.g. pointer aliasing), plus RTOS and CSP parallelism constructs added
- the Eclipse-based IDE takes full advantage of everything being integrated: it compiles the xC (and C/C++, ugh) code, then examines the optimised binaries to determine the min/max execution times of every path between two points in the code
We urgently need hardware+software ecosystems that are based on parallelism from the ground (silicon) up to the application. 1970s tech with a few warts bolted on 40 years too late[2] simply won't cut it in the future. So far xC+xCORE+IDE is the only practical demonstration of a better system. Rust has promise, but is too far from the hardware.
We need more highly parallel ecosystems.
[1] they have even shipped chips where one of the cores was an ARM processor, and that foreign core was fully integrated into the hardware+xC+IDE ecosystem.
[2] start with the concept of a memory model, and never forget that C defined parallelism as a library matter — while explicitly declining to provide the language-level guarantees that would let a library implement correct parallel functions in C.