I'm not a fan of asymmetric processing for applications which aren't nailed down when the device ships. The asymmetry introduces a nasty question of whether a particular operation should be done here or there. That was a downfall of the CELL(?) architecture with one BIG and 8(?) LITTLEs a decade ago.
Ah, yes: IBM Cell architecture, used on the Sony Playstation 3.
Where the only operations that will be in LITTLE are fixed at design time, that disadvantage is not a major problem.
I'm specifically talking about small "accelerators" with tiny instruction sets, basically a simple ALU (I don't even need multiplication or division myself!), attached to e.g. a DMA engine or GPIO ports, that sort of thing; not full asymmetric cores, and not with full or random memory access.
Stuff like CRC or hash calculation while doing memory-to-memory DMA, peripheral bus implementations, PWM, PDM, ADC with averaging and min-max recording, even wide LUT.
To circle back to the original topic, it would be very nice if we had a programming language that could embed such "miniprograms" better than what we have right now. I don't know how many of you are aware, but GPU pixel and vertex shaders are written in C- or C++-like languages; for example, OpenGL and OpenGL ES GLSL shaders are supplied as strings in source form, and compiled by the graphics driver for its own hardware.
Perhaps some kind of augmented string format, or "#include" on steroids; perhaps a source file format where initial lines declared properties (accessible as compile-time constants), with the rest of the file available as an augmented string object? Or processable with an external compiler program, with the result provided as an augmented binary blob? The compile-time constant properties of that object are useful, if one needs to e.g. describe the resource use or expectations of that miniprogram.
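To make the "augmented string object" idea a bit more concrete, here is a rough sketch of how one can emulate it by hand in C11 today. Everything in it is hypothetical: the struct miniprogram type, the MINIPROGRAM macro, the property names, and the two-line made-up assembly body are only for illustration, and no existing compiler provides anything like this directly.

```c
#include <stddef.h>

/* Hypothetical "augmented string object": the text of the miniprogram
   plus a couple of properties describing its resource use. */
struct miniprogram {
    const char *text;             /* Body of the file, after the property lines */
    size_t      length;           /* Length of text, excluding the final NUL */
    unsigned    max_instructions; /* Example property (made up) */
    unsigned    scratch_words;    /* Example property (made up) */
};

/* Hand-written stand-in for what the compiler would generate from the file. */
#define MINIPROGRAM(body, insns, scratch) \
    { body, sizeof (body) - 1, (insns), (scratch) }

/* Properties as true compile-time constants, so they can be checked early. */
enum { CRC_STEP_INSNS = 2, CRC_STEP_SCRATCH = 1 };

/* Assumed limit of an imaginary accelerator with 32 instruction slots. */
_Static_assert(CRC_STEP_INSNS <= 32, "miniprogram does not fit the accelerator");

/* The mnemonics below are invented; they do not belong to any real device. */
static const struct miniprogram crc_step = MINIPROGRAM(
    "xor acc, in\n"
    "lut acc, crc_table\n",
    CRC_STEP_INSNS, CRC_STEP_SCRATCH);
```

The point is only that the text and its properties travel together, and that the properties are usable in static assertions; a real language feature would fill in both from the file itself.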
In Linux systems programming, I occasionally use seccomp BPF filters to limit a process to a subset of (explicitly allowed) system calls. This reduces the attack and bug surface when running e.g. dynamically compiled code; consider something that evaluates a user-defined expression a few hundred million times. BPF itself is a mini-language, with a small binary instruction set. The filters are "installed" for a thread or process, and run by the kernel before internal syscall dispatch. Currently, one writes such programs by populating an array using lots of preprocessor macros, which is otherwise OK, but calculating jumps is annoying:
    #include <stddef.h>         /* offsetof() */
    #include <sys/syscall.h>    /* SYS_* system call numbers */
    #include <linux/filter.h>   /* struct sock_filter, struct sock_fprog, BPF_* */
    #include <linux/seccomp.h>  /* struct seccomp_data, SECCOMP_RET_* */

    static const struct sock_filter strict_filter[] = {
        /* Load the system call number into the accumulator */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, (offsetof (struct seccomp_data, nr))),
        BPF_JUMP(BPF_JMP | BPF_JEQ, SYS_rt_sigreturn, 5, 0), // Jump to RET_ALLOW if match
        BPF_JUMP(BPF_JMP | BPF_JEQ, SYS_read, 4, 0),         // Jump to RET_ALLOW if match
        BPF_JUMP(BPF_JMP | BPF_JEQ, SYS_write, 3, 0),        // Jump to RET_ALLOW if match
        BPF_JUMP(BPF_JMP | BPF_JEQ, SYS_exit, 2, 0),         // Jump to RET_ALLOW if match
        BPF_JUMP(BPF_JMP | BPF_JEQ, SYS_exit_group, 1, 0),   // Jump to RET_ALLOW if match
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),         // No match: kill
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW)         // Match: allow the call
    };
    static const struct sock_fprog strict = {
        .len = (unsigned short)( sizeof strict_filter / sizeof strict_filter[0] ),
        .filter = (struct sock_filter *)strict_filter
    };
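One workaround for the jump arithmetic is to generate the array at run time from a plain list of allowed system calls. The following is only a sketch under the same assumptions as the filter above (a single load, one equality test per system call, then the kill and allow returns); build_allow_filter() is a hypothetical helper name, not part of any library.

```c
#include <stddef.h>
#include <sys/syscall.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Hypothetical helper: fill out[] with an allow-list filter.
   Returns the number of instructions written, or 0 if it cannot be built. */
static size_t build_allow_filter(struct sock_filter *out, size_t max,
                                 const int *allowed, size_t count)
{
    size_t n = 0;

    /* Needs one load, count compares, and two returns.
       Jump offsets are 8-bit, so at most 255 compares are supported here. */
    if (count < 1 || count > 255 || max < count + 3)
        return 0;

    /* Load the system call number into the accumulator. */
    out[n++] = (struct sock_filter)
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr));

    /* Compare i sits at index 1+i and RET_ALLOW at index count+2.
       Offsets are relative to the next instruction, so the jump is count-i. */
    for (size_t i = 0; i < count; i++)
        out[n++] = (struct sock_filter)
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (unsigned int)allowed[i],
                     (unsigned char)(count - i), 0);

    out[n++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL);
    out[n++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW);
    return n;
}
```

With the five system calls listed earlier, this produces the same eight instructions and the same jump offsets (5, 4, 3, 2, 1). It trades the const array for run-time initialization, which is exactly the kind of compromise better language support for miniprograms would make unnecessary.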
The RP2040 PIO and TI Sitara PRU code are examples of similar "miniprograms" in microcontroller environments.
Exactly how the "sub-language" embedding would work, I'm not sure. I'd like both "give me this file as an augmented string object at compile time" and "run this file through an external compiler and give me the result as an augmented binary blob"; but looking at existing code and build machinery that use such miniprograms/sub-languages might yield important ideas that reduce the barrier and learning curve for using such sub-languages more often.
The way GCC, Clang, and Intel CC support GCC-style extended asm should not be overlooked, either.
It is an extremely powerful macro assembler that lets one write assembly where the C compiler chooses the exact registers used for the assembly snippet, unlike e.g. externally included assembly source files. (With inlined functions whose bodies are basically just an extended asm statement, it means the compiler can choose which registers it uses in each inlined copy.)
The syntax for the constraints is absolutely horrible, though; otherwise it is pretty nice.
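As a small illustration of both the power and the constraint syntax, here is a sketch of an inline function whose body is a single extended asm statement. The function name add_u32() is made up for this example, and the addition itself is deliberately trivial; the interesting part is that the registers are left for the compiler to pick. It assumes GCC or Clang, with a plain C fallback for architectures other than x86 and AArch64.

```c
#include <stdint.h>

/* Hypothetical example: 32-bit addition via extended asm. */
static inline uint32_t add_u32(uint32_t a, uint32_t b)
{
#if defined(__x86_64__) || defined(__i386__)
    /* "+r": a is read and written, in any general-purpose register.
       "r":  b is read-only, in any general-purpose register.
       "cc": the instruction modifies the condition flags.
       AT&T operand order: addl source, destination. */
    __asm__ ("addl %1, %0" : "+r" (a) : "r" (b) : "cc");
    return a;
#elif defined(__aarch64__)
    uint32_t result;
    /* The %w modifier selects the 32-bit view of each register. */
    __asm__ ("add %w0, %w1, %w2" : "=r" (result) : "r" (a), "r" (b));
    return result;
#else
    /* Other architectures: plain C fallback. */
    return a + b;
#endif
}
```

Because the operands are described only by constraints, each inlined copy can end up using different registers, depending on what the surrounding code already has in them.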