Author Topic: benchmarking logical core use versus physical cores only (Read 1482 times)

Simon · « **on:** April 06, 2020, 07:12:47 pm »

I am running BOINC (COVID-19 WUs galore) and I am trying to figure out if using all 16 logical cores on my Ryzen 7 is actually any better than limiting it to 8 cores. According to the hardware monitor, the power does go up a lot when I run all logical cores rather than just enough for the physical cores.

Is there a utility that properly benchmarks this?

nctnico · « **Reply #1 on:** April 06, 2020, 07:48:19 pm »

Quote from: Simon on April 06, 2020, 07:12:47 pm

I am running BOINC (COVID-19 WUs galore) and I am trying to figure out if using all 16 logical cores on my Ryzen 7 is actually any better than limiting it to 8 cores. According to the hardware monitor, the power does go up a lot when I run all logical cores rather than just enough for the physical cores.

Is there a utility that properly benchmarks this?

Don't think so. It is highly dependent on which part of the system is the bottleneck: processing power or memory bandwidth. Based on the increased power consumption you measured, I'd say that using all logical cores offers more processor speed in this case. Still, I'd try to get a version of the software which runs on a GPU (graphics card). The GPU performance dwarfs the processing power of the CPU while having better energy efficiency.

Simon · « **Reply #2 on:** April 06, 2020, 07:49:55 pm »

Well, GPUs work for certain jobs but not all.

Nominal Animal · « **Reply #3 on:** April 07, 2020, 01:50:31 pm »

In Linux, you can use the taskset command/utility to change the CPU affinity of a running process, or to start a new process with a given affinity.

To find out which processor numbers are grouped to which physical core, you can run lscpu -e, or for example

Code:

awk -F '[\t ]*: *' '$1=="processor" { proc=$2 } $1=="core id" { core[$2] = core[$2] " " proc } END { for (c in core) printf "Core %s has processors%s\n", c, core[c] }' /proc/cpuinfo

On my two-core, two-thread-per-core Core i5-7200U, these output

Code:

CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ    MINMHZ
0   0    0      0    0:0:0:0       yes    3100.0000 400.0000
1   0    0      1    1:1:1:0       yes    3100.0000 400.0000
2   0    0      0    0:0:0:0       yes    3100.0000 400.0000
3   0    0      1    1:1:1:0       yes    3100.0000 400.0000

and

Code:

Core 0 has processors 0 2
Core 1 has processors 1 3

respectively; i.e. even processor numbers correspond to one core, and odd processor numbers to the other, if I boot with HyperThreading enabled.

So, if I wanted a process to run on physical cores only, I'd limit it to processors 0 and 1, or to 2 and 3.
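The same grouping can be sketched in Python. This is only an illustration, assuming an x86 Linux /proc/cpuinfo layout with "processor", "physical id" and "core id" lines; the function names are made up for this example, and the sample text below mirrors the two-core machine above rather than being read from a real system.

```python
from collections import defaultdict


def siblings_by_core(cpuinfo_text):
    """Group logical processor numbers by (physical id, core id)."""
    cores = defaultdict(list)
    proc = None
    package = "0"
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between processor entries
        key, value = key.strip(), value.strip()
        if key == "processor":
            proc = int(value)
            package = "0"  # default until this entry's "physical id" is seen
        elif key == "physical id":
            package = value
        elif key == "core id" and proc is not None:
            cores[(package, value)].append(proc)
    return dict(cores)


def one_processor_per_core(cpuinfo_text):
    """Pick the lowest-numbered logical processor of every physical core."""
    groups = siblings_by_core(cpuinfo_text)
    return sorted(min(procs) for procs in groups.values())


# Sample text shaped like /proc/cpuinfo on the two-core machine above.
SAMPLE = """\
processor\t: 0
physical id\t: 0
core id\t\t: 0

processor\t: 1
physical id\t: 0
core id\t\t: 1

processor\t: 2
physical id\t: 0
core id\t\t: 0

processor\t: 3
physical id\t: 0
core id\t\t: 1
"""

print(siblings_by_core(SAMPLE))
print(one_processor_per_core(SAMPLE))
```

On a real Linux system one would pass the contents of /proc/cpuinfo instead of SAMPLE, and the resulting list could then be handed to os.sched_setaffinity(0, cpus) (Linux only) or to taskset.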

If I were you, Simon, and running BOINC on Linux, I'd compare the total power use to BOINC performance with different task affinities et cetera, to see which configuration makes the most sense.

After all, the best benchmark is always to run your normal load, and just compare the performance and power use with different settings.

Simon · « **Reply #4 on:** April 07, 2020, 01:56:39 pm »

I'm on Windows. But as an example, on my little laptop that has an i5 with 8/4 cores (8 logical, 4 physical), setting 75% CPU hits 100% in Windows Task Manager, so it looks like I do indeed get 50% more work done and full use of a core by having 1.5 threads per core.

On my Ryzen 7 with 16/8 cores it takes 88% "processor" use to get near 100% usage. So it's practically saying that yes, my CPU is crap, or at least the RAM bandwidth is, and you pretty much have to alternate jobs to keep each core filled with work.
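The arithmetic behind those two figures can be written out as a small sketch. It assumes the percentage setting simply scales the number of logical processors in use; how BOINC actually rounds that to a whole number of tasks is not covered here, and the function name is invented for this example.

```python
def threads_per_core(cpu_percent, logical, physical):
    """Return (threads in use, threads per physical core) for a CPU limit."""
    threads = cpu_percent / 100 * logical
    return threads, threads / physical


# Laptop i5: 8 logical processors on 4 cores, limited to 75%.
laptop_threads, laptop_ratio = threads_per_core(75, 8, 4)
# Ryzen 7: 16 logical processors on 8 cores, limited to 88%.
ryzen_threads, ryzen_ratio = threads_per_core(88, 16, 8)

print(f"laptop: {laptop_threads:.2f} threads, {laptop_ratio:.2f} per core")
print(f"ryzen:  {ryzen_threads:.2f} threads, {ryzen_ratio:.2f} per core")
```

That works out to 6 threads on the laptop (1.5 per core) and about 14 threads on the Ryzen 7 (roughly 1.75 per core).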

andersm · « **Reply #5 on:** April 07, 2020, 02:12:04 pm »

On Windows you can use e.g. Task Manager to set affinity (Details, right-click the process, Set affinity), but it's a per-process attribute. Allowing the process to run only on every other CPU (0, 2, 4...) would restrict it to one thread per core. But to get the most accurate results, just reboot and disable hyper-threading in the BIOS.
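Tools that take the affinity as a bitmask rather than a list of checkboxes need the "every other CPU" selection as a number. A minimal sketch of that conversion follows, assuming sibling threads are numbered adjacently (0 and 1 on the first core, 2 and 3 on the second, and so on), which is worth verifying on the machine in question; the helper names are made up here.

```python
def every_other_cpu(logical_count):
    """Logical CPU numbers 0, 2, 4, ... below logical_count."""
    return list(range(0, logical_count, 2))


def affinity_mask(cpus):
    """Build a bitmask with one bit set per allowed logical CPU."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


cpus = every_other_cpu(16)        # 16 logical processors, 8 physical cores
mask = affinity_mask(cpus)
print(cpus)
print(hex(mask))                  # 0x5555
```

A hexadecimal mask like this is the form that, for instance, the /affinity option of the Windows start command expects (check start /? for the exact syntax on your system).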

Nominal Animal · « **Reply #6 on:** April 07, 2020, 02:23:09 pm »

On Windows, the same idea works: measuring your actual workload performance with different settings is always the best benchmark.
I just do not know the best tools to use for this on Windows.

For a number of reasons, mostly human-derived -- i.e., inefficient code --, it is typical that to fully utilize the CPUs in a system, one needs to use about 1.5 times the number of threads/processes. If the computation is light/fast, and the communications latencies long, then even more threads/processes are needed. This applies to all computationally intensive code, across all OSes.

Some of that has to do with latencies and the task at hand as a whole. When data has to be transferred (to storage, to another node in the cluster, or somewhere else over the network), you need "extra work" to keep the CPUs busy during the transfers.

This is also why the number of threads or processes a service uses should always be human-tunable, and not dictated by the number of CPUs or CPU cores on a system.

When the computation is intensive, the ALU:core ratio (assuming architectures where hyperthreads on the same core share the ALU resources) is key.
I haven't had my mittens on Ryzens or Threadrippers to actually verify this, but I do believe Ryzen 7 has enough ALUs per core to make it worthwhile to use hyperthreading/SMT, as if the threads were separate cores. (That is, a computationally intensive thread running on a core with mostly-idle or non-mathy sibling thread will be faster than when the sibling thread is also computationally intensive, but the difference is much less than 50% (I'd guess on the order of 10%-15% on Ryzen 7?), so it makes sense for parallel-type workloads like BOINC.)

Simon · « **Reply #7 on:** April 07, 2020, 06:20:23 pm »

Yeah, I just turned hyperthreading off and I'm not sure any of those tasks run any faster, but then why would the CPU be at 100%?

rrinker · « **Reply #8 on:** April 07, 2020, 06:43:57 pm »

Not sure what you mean by "why would the processor be at 100%". Do you mean why would it be at 100% with hyperthreading enabled if it isn't really getting any more work done? Easy: if there aren't enough core resources, say ALUs or maybe something else, it's running twice as many threads, but because of the contention each one delivers under half the throughput. Disable hyperthreading and you have half as many threads, but each one runs faster because there is no contention.
Either way, the physical cores are completely loaded, so the CPU shows 100% utilization or close to it, even if the workload isn't one that benefits from hyperthreading. Turning off the virtual cores doesn't mean the CPU is only running at half the load.

Simon · « **Reply #9 on:** April 07, 2020, 07:05:11 pm »

If my BOINC WUs take the same amount of time with 16 tasks on 16 logical cores as 8 tasks on 8 actual cores, how can the CPU be at 100% when 8 tasks run on 16 logical cores? With only 8 enabled, each one is shown to use 6.8% of the CPU rather than 6.25%.

Simon · « **Reply #10 on:** April 07, 2020, 07:34:57 pm »

The power consumption on the cores is up by 25%, and the package consumption jumps as well. I do not know which part the duplicated pipelines belong to; maybe the cores, with the increase in package power coming from more cache work.

Nominal Animal · « **Reply #11 on:** April 07, 2020, 07:47:04 pm »

Note that in my machine (Linux), "processors" 0 and 2 are in the same physical core, and "processors" 1 and 3 in the other physical core. I'm not sure how the numbering goes in Windows, but it might make sense to check to be sure.

Quote from: Simon on April 07, 2020, 07:05:11 pm

How can the CPU be at 100% when 8 tasks run on 16 logical cores?

It shouldn't be possible (unless there were at least 8 other tasks using CPU at the same time). Sounds like the CPU load calculation is a bit wonky? Maybe check the documentation of the utility you use to display it, for gotchas?

(I'm not sure, but BOINC itself might be displaying the percentage of time it does calculations, compared to the time it can run. Or it could be displaying something like the ratio of elapsed CPU time to elapsed wall clock time. It is annoying, but knowing exactly what each figure means, and how it is calculated, makes a big difference here.)
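To make the difference between those two clocks concrete, here is a small Python sketch that measures both for one piece of work. It says nothing about how BOINC or Task Manager compute their figures; the function name is invented for this example.

```python
import time


def cpu_to_wall_ratio(work):
    """Run work() and return (cpu_seconds, wall_seconds, cpu/wall ratio)."""
    wall_start = time.perf_counter()   # wall clock
    cpu_start = time.process_time()    # CPU time used by this process
    work()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu, wall, (cpu / wall if wall > 0 else 0.0)


def busy():
    # Pure computation: CPU time and wall clock time should be close.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


def idle():
    # Sleeping: wall clock time passes, almost no CPU time is used.
    time.sleep(0.2)


for name, work in (("busy", busy), ("idle", idle)):
    cpu, wall, ratio = cpu_to_wall_ratio(work)
    print(f"{name}: cpu={cpu:.3f}s wall={wall:.3f}s ratio={ratio:.2f}")
```

Note that time.process_time() sums the CPU time of all threads in the process, so for a multi-threaded program the ratio can exceed 1.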

Quote from: Simon on April 07, 2020, 07:05:11 pm

If my BOINC WUs take the same amount of time with 16 tasks on 16 logical cores as 8 tasks on 8 actual cores

Which time, wall clock time, or time running on a CPU?

If you mean each individual work unit completes in the same wall clock time in either configuration, then your computer can fully utilize all 16 threads on 8 cores.

If you mean each individual work unit requires the same amount of processing time to complete in either configuration, then your computer has either enough arithmetic-logic units to not starve any of the BOINC tasks, not even when two of them run on sibling threads on the same core; or so few arithmetic-logic units that running anything arithmetic-intensive on one thread fully stalls any arithmetic on the sibling thread on the same core.

Comparing the number of BOINC work units completed per wall clock time unit (in the entire computer, not per task!), in the different configurations, is the important metric here. As in "the computer completes N WUs per hour when running 8 threads on 8 physical cores, and M WUs per hour when running 16 threads."

When I run benchmarks, for a number of reasons I use the number of work units completed divided by the elapsed wall clock time as the metric. This absolutely requires that the machine is otherwise idle, as any other running tasks will throw off the measurement, but the measurement makes most practical sense.
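As a sketch of how such a comparison could be tabulated, including the power side of the question, something like the following would do. All the numbers are placeholders to be replaced with your own measurements, not results from any real run, and the Run class is made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Run:
    label: str
    work_units: int      # WUs completed by the whole machine during the run
    wall_hours: float    # elapsed wall clock time of the run
    avg_watts: float     # average power draw over the run

    @property
    def wu_per_hour(self):
        return self.work_units / self.wall_hours

    @property
    def wh_per_wu(self):
        # Energy (watt-hours) spent per completed work unit.
        return self.avg_watts * self.wall_hours / self.work_units


# PLACEHOLDER values only; substitute measured counts, durations and power.
runs = [
    Run("8 threads on 8 physical cores", work_units=40, wall_hours=24.0, avg_watts=90.0),
    Run("16 threads on 16 logical cores", work_units=48, wall_hours=24.0, avg_watts=110.0),
]

for run in runs:
    print(f"{run.label}: {run.wu_per_hour:.2f} WU/h, {run.wh_per_wu:.1f} Wh/WU")
```

Since individual work units differ in size, counting them over a long enough run (a day or more) should average that out better than timing any single unit.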

Simon · « **Reply #12 on:** April 07, 2020, 07:51:28 pm »

Well, I give up. It looks like it works best loaded with 14-15 intensive tasks like BOINC, and ultimately it might do more work, so I will leave it at that. As a BOINC WU is not repeatable, I was hoping to run some other test to find out.

