Author Topic: The search for a (CHEAP) supercomputer... (Read 18555 times)

GeorgeOfTheJungle · « **Reply #50 on:** May 02, 2017, 12:10:49 pm »

Quote from: cleaningOut on May 02, 2017, 01:19:45 am

None of those programs have been ported to ARM, and the companies involved have basically no incentive to do so.

Ported to ARM? What does that mean? Most things these days are written in C or higher-level languages, so it should be a matter of hitting "compile" in the proper toolchain (yeah, I know, I know...).

brucehoult · « **Reply #51 on:** May 02, 2017, 12:26:05 pm »

Quote from: GeorgeOfTheJungle on April 30, 2017, 09:30:53 pm

Cool, now the OPi (with dom0's code above) gives:

pi@orangepi:~$ gcc -lpthread -O0 threads.c -o threads
pi@orangepi:~$ time ./threads

real   0m0.831s
user   0m2.070s
sys   0m0.230s

But OSX does not have random_r, any ideas?

On an Odroid XU4 (quad Cortex A15 @2.0 GHz) I got:

real   0m0.398s
user   0m0.710s
sys   0m0.300s
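On the random_r question quoted above: one option that does not depend on glibc is to carry a small per-thread generator yourself. The following is only a minimal sketch, not dom0's code or anything from the gists. It swaps in Marsaglia's xorshift32 in place of random_r, and the names `xorshift32` and `fill_random` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Marsaglia xorshift32: advances *state and returns the new value.
 * The state must be non-zero, otherwise the generator stays at zero. */
uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    assert(x != 0);
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fill buf[0..n-1] with values in (0, 1) from a caller-owned state.
 * Give each thread its own state so no locking is needed. */
void fill_random(double *buf, size_t n, uint32_t *state)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = xorshift32(state) / 4294967296.0;   /* divide by 2^32 */
}
```

Seeding each thread with a different non-zero value (say, the thread index plus one) keeps the streams from being identical. POSIX `rand_r` also takes caller-owned state and should be another option, though I have not compared its quality or speed here.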

CJay · « **Reply #52 on:** May 02, 2017, 12:26:19 pm »

Quote from: GeorgeOfTheJungle on May 02, 2017, 12:10:49 pm

Quote from: cleaningOut on May 02, 2017, 01:19:45 am
None of those programs have been ported to ARM, and the companies involved have basically no incentive to do so.

Ported to ARM? What does that mean? Most things these days are written in C or higher-level languages, so it should be a matter of hitting "compile" in the proper toolchain (yeah, I know, I know...).

It would be interesting to see how a current Intel chip stacks up against something like an ARM, core to core so to speak, if the compiler/hardware could be persuaded to run code on just the CPU core and not the 'extra' (potentially) performance-enhancing on-chip peripherals.

An almost but not quite comparable example: ISTR the 1940s Bletchley Bombes were capable of outpacing a generic desktop PC CPU until a few years ago. Obviously that was a highly specialised application, and one that the Bombes had been specifically designed and optimised for, but even so, 60 or so years of technological development hadn't produced a generic machine that was any faster.

brucehoult · « **Reply #53 on:** May 02, 2017, 12:34:33 pm »

Quote from: GeorgeOfTheJungle on April 30, 2017, 09:49:49 pm

Ok so I simply won't fill the 16MB buffers with random(), just average whatever happens to be there (most likely zeros or garbage) as doubles in parallel in 4 threads:

https://gist.github.com/xk/b8b2ff4ab1455237906a8b13e3f1f02f

The i7 Mac:
unibodySierra:Desktop admin$ gcc -O0 threads.c -o threads
unibodySierra:Desktop admin$ time ./threads

real   0m0.054s
user   0m0.084s
sys   0m0.073s

And the OPi Zero:
pi@orangepi:~$ gcc -lpthread -O0 threads.c -o threads
pi@orangepi:~$ time ./threads

real   0m0.166s
user   0m0.440s
sys   0m0.140s

And the Mac is now 0.166/0.054 ≈ 3x faster, and more than 100x as expensive.

And for this, Odroid XU4 is:

real   0m0.212s
user   0m0.130s
sys   0m0.625s

And Odroid c2 (quad A53 @1.5 GHz):

real   0m0.142s
user   0m0.350s
sys   0m0.140s

This benchmark is really too short to measure accurately on any of these systems, but especially on the Intel.

Both Odroids use less user time than the Orange Pi, the XU4 by a factor of 3.4x (0.440 s vs 0.130 s). They are both Raspberry Pi-sized SBCs. The C2 costs $5 more than a Pi. The XU4 is about twice the price of the C2, but a 4A power supply is included. Both have 2 GB RAM, gigabit Ethernet, and much faster SD card and USB than a Pi (and eMMC too).
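Since these runs finish in a fraction of a second, one way to get steadier numbers is to repeat the averaging pass many times and time it from inside the program rather than with `time`. Here is a rough sketch of that idea. It is not the code from the gist: the function names, the `MAX_THREADS` limit and the way the buffer is split are my own assumptions.

```c
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime / CLOCK_MONOTONIC */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

#define MAX_THREADS 16            /* arbitrary limit for this sketch */

/* One slice of the buffer per thread; each thread writes only its own sum. */
struct slice {
    const double *data;
    size_t count;
    double sum;
};

static void *sum_slice(void *arg)
{
    struct slice *s = arg;
    double acc = 0.0;
    for (size_t i = 0; i < s->count; i++)
        acc += s->data[i];
    s->sum = acc;
    return NULL;
}

/* Average n doubles using nthreads worker threads. */
double parallel_average(const double *buf, size_t n, int nthreads)
{
    assert(n > 0 && nthreads > 0 && nthreads <= MAX_THREADS);
    pthread_t tid[MAX_THREADS];
    struct slice part[MAX_THREADS];
    size_t chunk = n / (size_t)nthreads;

    for (int t = 0; t < nthreads; t++) {
        part[t].data = buf + (size_t)t * chunk;
        /* The last thread also takes the remainder. */
        part[t].count = (t == nthreads - 1) ? n - (size_t)t * chunk : chunk;
        part[t].sum = 0.0;
        int rc = pthread_create(&tid[t], NULL, sum_slice, &part[t]);
        assert(rc == 0);
        (void)rc;
    }
    double total = 0.0;
    for (int t = 0; t < nthreads; t++) {
        pthread_join(tid[t], NULL);
        total += part[t].sum;
    }
    return total / (double)n;
}

/* Wall-clock seconds from a monotonic clock. */
double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Run the averaging pass `reps` times and return seconds per pass. */
double time_average(const double *buf, size_t n, int nthreads, int reps)
{
    assert(reps > 0);
    volatile double sink = 0.0;   /* keeps the result live under optimisation */
    double start = now_seconds();
    for (int r = 0; r < reps; r++)
        sink += parallel_average(buf, n, nthreads);
    (void)sink;
    return (now_seconds() - start) / reps;
}
```

Calling `time_average(buf, n, 4, 100)` on a 16 MB buffer would give a per-pass figure averaged over 100 passes. Thread creation is still inside the timed loop, so this measures startup cost as well as the summing itself.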

GeorgeOfTheJungle · « **Reply #54 on:** May 02, 2017, 12:50:30 pm »

Bruce, I have to admit that I switched the OPi to the performance governor for the tests:

Code:

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor 
interactive

echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor 
performance

Because if you don't do that it takes a while for the cpu to speed up...
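Note that the commands above only touch cpu0. Depending on the board and kernel, other cores or clusters may have their own cpufreq policy, so it can be worth checking every core before a benchmark run. A small sketch of that check, assuming the same sysfs layout as in the commands above; `read_first_line` and `print_governors` are illustrative names, not an existing API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read the first line of `path` into out (NUL-terminated, newline stripped).
 * Returns 0 on success, -1 if the file cannot be opened or read. */
int read_first_line(const char *path, char *out, size_t len)
{
    assert(out != NULL && len > 0);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(out, (int)len, f);
    fclose(f);
    if (ok == NULL)
        return -1;
    out[strcspn(out, "\n")] = '\0';
    return 0;
}

/* Print the governor of CPUs 0..ncpus-1; returns how many could be read. */
int print_governors(int ncpus)
{
    int found = 0;
    for (int cpu = 0; cpu < ncpus; cpu++) {
        char path[128], gov[64];
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor", cpu);
        if (read_first_line(path, gov, sizeof gov) == 0) {
            printf("cpu%d: %s\n", cpu, gov);
            found++;
        }
    }
    return found;
}
```

On a machine without cpufreq (or on OS X) `print_governors` simply finds nothing and returns 0.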

GeorgeOfTheJungle · « **Reply #55 on:** May 02, 2017, 02:33:43 pm »

But the xu4 has 8 real cores, so let's try that versus the i7's 4 cores + hyperthreading, shall we?
8 threads and 96MB buffers: https://gist.github.com/xk/ba76a426e4c391c0b22b3eddbdb11898

The xu4 8 ARM cores big/little:
pi@xu4:~$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance
pi@xu4:~$ gcc -lpthread -O0 threads.c -o threads
pi@xu4:~$ time ./threads

real   0m2.502s
user   0m6.875s
sys   0m9.675s

The Intel i7 4 cores+hyperthreading:
MBP-17:Desktop admin$ gcc -O0 threads.c -o threads
MBP-17:Desktop admin$ time ./threads

real   0m0.383s
user   0m0.904s
sys   0m0.874s

brucehoult · « **Reply #56 on:** May 02, 2017, 04:05:20 pm »

Quote from: GeorgeOfTheJungle on May 02, 2017, 02:33:43 pm

But the xu4 has 8 real cores, so let's try that versus the i7's 4 cores + hyperthreading, shall we?

Not really. The big and LITTLE cores don't ever run at the same time as each other. For performance questions you should ignore the LITTLE cores.

GeorgeOfTheJungle · « **Reply #57 on:** May 02, 2017, 04:12:07 pm »

Quote from: brucehoult on May 02, 2017, 04:05:20 pm

Not really. The big and LITTLE cores don't ever run at the same time as each other. For performance questions you should ignore the LITTLE cores.

I see the 8 cores @ 100% there in htop:

Code:

pi@xu4:~$ cat loop.sh
while [ true ]; do
    ./threads
done
pi@xu4:~$ sh loop.sh
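htop is one way to look. Another is to ask the OS how many processors it currently reports as online and size the thread count from that. A minimal sketch, with `online_cpus` and `threads_to_use` as made-up helper names. It only reports what the kernel exposes and says nothing about whether the big and LITTLE cores are actually scheduled at the same time.

```c
#include <assert.h>
#include <unistd.h>

/* Number of processors currently online, or -1 if it cannot be determined.
 * _SC_NPROCESSORS_ONLN is a common extension (glibc, macOS), not strict POSIX. */
long online_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

/* Clamp a requested thread count to the number of online processors. */
int threads_to_use(int wanted)
{
    assert(wanted > 0);
    long n = online_cpus();
    if (n < 1)
        return 1;              /* unknown: fall back to a single thread */
    return wanted < n ? wanted : (int)n;
}
```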

