mrvn wrote: Tue Aug 24, 2021 1:59 am
There can be some gain with improved cache hits. If you have 2 sockets/cores with separate caches then pinning threads to each set of cores/threads that share caches can be beneficial. Even if they get interrupted every now and then.
I wrote "small" because it won't take long to "warm up" the cache compared to how long a thread will typically run before being pre-empted, assuming it doesn't call/trap into the kernel. In other words, only a small fraction of potential progress is lost by needing to fill the cache (non-pinned case) or, equivalently, there is only a small gain from not needing to fill the cache (pinned case).
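To put a rough number on the warm-up argument, here is a back-of-the-envelope sketch. Everything in it is an assumption made up for illustration: the function name `warmup_fraction`, the 1 MiB cache, the 10 GiB/s fill rate and the 10 ms time slice are not measurements from any real machine. It also assumes the whole cache has to be refilled at full streaming bandwidth, which ignores miss latency and the actual working set.

```python
def warmup_fraction(cache_bytes, fill_bytes_per_sec, timeslice_sec):
    """Estimate the fraction of a time slice spent refilling a cold cache.

    Crude model: the entire cache is refilled once, at a constant rate,
    before the thread makes any useful progress.
    """
    warmup_sec = cache_bytes / fill_bytes_per_sec
    # A thread cannot lose more than the whole slice.
    return min(1.0, warmup_sec / timeslice_sec)


# Hypothetical figures, chosen only for illustration.
cache_bytes = 1 * 1024 * 1024   # 1 MiB core-local cache
fill_rate = 10 * 1024 ** 3      # 10 GiB/s from main memory
timeslice = 0.010               # 10 ms before pre-emption

fraction = warmup_fraction(cache_bytes, fill_rate, timeslice)
print(f"{fraction:.2%} of the time slice goes to warming the cache")
```

With those made-up figures the loss is roughly 1% of the slice, which is the sense in which the gain from pinning is "small". A much shorter time slice or a much larger cache would change the picture.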
The problem with other threads running on the pinned core is that they will "steal" time from the pinned thread, and may "pollute" the cache, both eroding the gain from pinning. Of course, the smaller (or larger) that gain is, the more (or less) important it is to avoid time stealing/cache pollution.
Since you went on to talk about NUMA... I was considering only a non-NUMA system, although a cache local to certain core(s) does have some similarities to main memory local to certain core(s) in a NUMA system.