OK, so I simply won't fill the 16 MB buffers with random(); I'll just average whatever happens to be there (most likely zeros or garbage) as doubles, in parallel across 4 threads:
https://gist.github.com/xk/b8b2ff4ab1455237906a8b13e3f1f02f
The i7 Mac:
unibodySierra:Desktop admin$ gcc -O0 threads.c -o threads
unibodySierra:Desktop admin$ time ./threads
real 0m0.054s
user 0m0.084s
sys 0m0.073s
And the OPi Zero:
pi@orangepi:~$ gcc -lpthread -O0 threads.c -o threads
pi@orangepi:~$ time ./threads
real 0m0.166s
user 0m0.440s
sys 0m0.140s
So the Mac is now 0.166/0.054 ≈ 3x faster, and more than 100x as expensive.
For the same test, the Odroid XU4 gives:
real 0m0.212s
user 0m0.130s
sys 0m0.625s
And the Odroid C2 (quad A53 @ 1.5 GHz):
real 0m0.142s
user 0m0.350s
sys 0m0.140s
This benchmark really runs for too short a time to be measured accurately on any of these systems, especially the Intel one, where the whole run takes about 54 ms.
Both Odroids use less user time than the Orange Pi, the XU4 by a factor of 3.4 (0.130 s vs 0.440 s). They are both Raspberry Pi-sized SBCs. The C2 costs $5 more than a Pi. The XU4 costs about twice as much as the C2, but a 4 A power supply is included. Both have 2 GB RAM, gigabit Ethernet, and much faster SD card and USB than a Pi (and eMMC too).