Performance optimization - post your saves

SoShootMe
Fast Inserter
Posts: 233
Joined: Mon Aug 03, 2020 4:16 pm

Post by SoShootMe »

mrvn wrote:
Tue Aug 24, 2021 1:59 am
SoShootMe wrote:
Fri May 07, 2021 4:07 pm
ptx0 wrote:
Fri May 07, 2021 2:38 pm
too bad we can't pin threads to cores
AFAIK you can, and I've often thought it may offer some (small) performance benefit due to caches. But you have to make sure other things are excluded from running on those cores too, so in most cases it seems like micro-management overkill.
There can be some gain with improved cache hits. If you have 2 sockets/cores with separate caches, then pinning threads to each set of cores/threads that share caches can be beneficial, even if they get interrupted every now and then.
The reason I wrote "small" was on the basis that it won't take long to "warm up" the cache compared to how long a thread will typically run before being pre-empted, assuming it doesn't call/trap into the kernel. In other words, only a small fraction of potential progress is lost by needing to fill the cache (non-pinned case) or, equivalently, there is only a small gain by not needing to fill the cache (pinned case).
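
To put rough numbers on that argument, here is a minimal sketch of the arithmetic. The cache size, fill rate and time slice below are illustrative assumptions, not measurements, and the model is deliberately crude: it treats warm-up as one bulk refill at a fixed rate and ignores latency-bound access patterns, which could make the real cost noticeably higher.

```python
def warmup_fraction(cache_bytes, fill_bytes_per_s, timeslice_s):
    """Fraction of a time slice spent refilling a cold cache (crude model)."""
    warmup_s = cache_bytes / fill_bytes_per_s
    return warmup_s / timeslice_s


# Assumed, illustrative numbers only (not taken from any real machine):
cache_bytes = 1 * 1024 * 1024   # 1 MiB of cache to refill
fill_bytes_per_s = 10e9         # 10 GB/s sustained fill rate
timeslice_s = 10e-3             # 10 ms of running before pre-emption

fraction = warmup_fraction(cache_bytes, fill_bytes_per_s, timeslice_s)
print(f"warm-up costs roughly {fraction:.1%} of the time slice")
```

Plugging in a larger cache or a shorter run between pre-emptions shows how quickly the fraction grows, which is when pinning starts to look more worthwhile.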

The problem with other threads running on the pinned core is that they will "steal" time from the pinned thread, and may "pollute" the cache, both eroding the gain from pinning. Of course, the smaller (or larger) that gain is, the more (or less) important it is to avoid time stealing/cache pollution.

Since you went on to talk about NUMA... I was considering only a non-NUMA system. Although a cache local to certain core(s) has some similarities to main memory local to certain core(s) in a NUMA system.

mrvn
Smart Inserter
Posts: 5111
Joined: Mon Sep 05, 2016 9:10 am

Post by mrvn »

SoShootMe wrote:
Tue Aug 24, 2021 7:19 am
mrvn wrote:
Tue Aug 24, 2021 1:59 am
SoShootMe wrote:
Fri May 07, 2021 4:07 pm
ptx0 wrote:
Fri May 07, 2021 2:38 pm
too bad we can't pin threads to cores
AFAIK you can, and I've often thought it may offer some (small) performance benefit due to caches. But you have to make sure other things are excluded from running on those cores too, so in most cases it seems like micro-management overkill.
There can be some gain with improved cache hits. If you have 2 sockets/cores with separate caches, then pinning threads to each set of cores/threads that share caches can be beneficial, even if they get interrupted every now and then.
The reason I wrote "small" was on the basis that it won't take long to "warm up" the cache compared to how long a thread will typically run before being pre-empted, assuming it doesn't call/trap into the kernel. In other words, only a small fraction of potential progress is lost by needing to fill the cache (non-pinned case) or, equivalently, there is only a small gain by not needing to fill the cache (pinned case).

The problem with other threads running on the pinned core is that they will "steal" time from the pinned thread, and may "pollute" the cache, both eroding the gain from pinning. Of course, the smaller (or larger) that gain is, the more (or less) important it is to avoid time stealing/cache pollution.

Since you went on to talk about NUMA... I was considering only a non-NUMA system. Although a cache local to certain core(s) has some similarities to main memory local to certain core(s) in a NUMA system.
I was kind of hinting at a middle ground there.

For example, you pin some threads to cores {0,1,2,3} and some to {4,5,6,7} because sets of 4 CPU cores/threads share L1/L2 caches. But there are 2 L3 caches, so you don't want threads to jump between those two. L3 caches are also the largest, so they take the longest to fill back up after a switch.
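
As a rough illustration of that middle ground, here is a minimal Linux-only sketch using Python's `os.sched_setaffinity`. The `L3_GROUPS` layout is a hypothetical assumption copied from the example above; which cores really share a cache is machine-specific and would need checking (for instance with `lscpu`). The helper names (`usable_groups`, `pin_current_thread`, `run_pinned`) are made up for this sketch. Each thread is restricted to a whole group rather than to a single core, so the scheduler can still move it around inside the group.

```python
import os
import threading

# Hypothetical layout: two groups of cores, each assumed to share a cache.
L3_GROUPS = [{0, 1, 2, 3}, {4, 5, 6, 7}]


def usable_groups(groups, allowed):
    """Keep only the cores of each group that this process may run on."""
    return [g & allowed for g in groups if g & allowed]


def pin_current_thread(cores):
    """Restrict the calling thread to the given set of cores (Linux only)."""
    os.sched_setaffinity(threading.get_native_id(), cores)


def worker(cores, seen, index):
    pin_current_thread(cores)
    # Real work would go here; this sketch only records the resulting mask.
    seen[index] = os.sched_getaffinity(threading.get_native_id())


def run_pinned(groups):
    """Start one thread per usable group and pin each to its group."""
    allowed = os.sched_getaffinity(0)
    # Fall back to "wherever we are allowed" if no group is usable here.
    groups = usable_groups(groups, allowed) or [allowed]
    seen = [None] * len(groups)
    threads = [
        threading.Thread(target=worker, args=(g, seen, i))
        for i, g in enumerate(groups)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return groups, seen


if __name__ == "__main__":
    if hasattr(os, "sched_setaffinity"):
        groups, seen = run_pinned(L3_GROUPS)
        for g, s in zip(groups, seen):
            print(f"wanted {sorted(g)}, got {sorted(s)}")
    else:
        print("os.sched_setaffinity is not available on this platform")
```

Note that this only keeps the pinned threads inside their groups. It does nothing to keep other processes off those cores, which is the time stealing/cache pollution problem discussed earlier in the thread.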
