avakining wrote: ↑Wed Dec 14, 2022 5:00 pm
Memory is clearly not the bottleneck here,
The suggested bottleneck is memory bandwidth, i.e. the rate at which data is transferred between memory and the processor. This is likely to be the case for code that repeatedly reads and writes far more data than fits in the processor cache, when most accesses are sequential and/or only a small fraction of the data transferred is actually processed or changed. A related bottleneck is memory latency, which is likely in a similar case but when most accesses are non-sequential.
Factorio must read and update the state of many entities each step of the simulation, so the "active" data will greatly exceed processor cache size, but the updates for each entity are often relatively computationally simple. This combination tends to make memory bandwidth and/or latency a more significant performance factor.
The relevance of this is that if performance is dominated or limited by a system's memory performance rather than its computational performance, a P-core may provide little or no benefit over an E-core, because they share the same memory performance. There are lots of factors so I'm far from certain, but I think this is unlikely for Factorio on an M1/M1 Pro.
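To illustrate the kind of update loop I mean, here is a minimal sketch. The `Entity` layout and `update_all` are entirely hypothetical (I have no knowledge of Factorio's real data structures); the point is only that each entity occupies a full 64-byte cache line while the update touches 4 bytes of it, so with enough entities the loop spends most of its time moving data rather than computing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical entity layout, NOT Factorio's actual one.
// 4 bytes are updated each step; the other 60 stand in for
// per-entity state that is transferred but not touched.
struct Entity {
    std::uint32_t progress;
    std::uint8_t  other_state[60];
};
static_assert(sizeof(Entity) == 64, "one entity per typical 64-byte cache line");

// One simulation step: a single add per entity, but every entity's
// cache line must be read from memory and later written back.
std::uint64_t update_all(std::vector<Entity>& entities) {
    std::uint64_t checksum = 0;
    for (Entity& e : entities) {
        e.progress += 1;          // trivially cheap computation
        checksum += e.progress;   // keeps the loop from being optimised away
    }
    return checksum;
}
```

With, say, a few million such entities the working set is hundreds of megabytes, far beyond any cache, and a faster core cannot make the memory deliver those lines any sooner.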
either Factorio is incorrectly not calling for a high QoS (probably 33, as it should request highest priority for p-cores), or macOS is incorrectly not allocating p-core usage for Factorio. Without looking the process calls, it's hard to tell where the fault lies.
The implication from Rseding is that Factorio does not set QoS. The OS decides when and on which core to schedule each thread in all cases, but in this case it does not have "high QoS" to influence its decisions. But QoS is only one factor, a hint; the OS will also use other factors, including historical observation, such as feedback provided by the CPU (if any exists) or the fact that a thread previously run on an E-core usually used its entire time slice.
To call this non-trivial is unquestionably a massive understatement, and cases where the OS "gets it wrong" are unavoidable. For example, at some particular moment, if there are more threads that are ready to run than the number of available P-cores, will performance be better if some particular thread is scheduled on an E-core, or if the OS doesn't schedule the thread because a P-core may become available very soon? There is no way to know for sure, with or without QoS.
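The dilemma above can be put as a toy calculation. This is a deliberately simplified model of my own, not how any real scheduler works: `work`, the core speeds and `wait` are all made-up quantities, and the crucial point is that a real OS does not know `work` or `wait` in advance.

```cpp
#include <cassert>

// Toy model, all quantities hypothetical.
// work:  amount of computation the thread needs (arbitrary units)
// speed: units of work a core completes per unit of time
// wait:  time until a P-core becomes free

// Time at which the thread finishes if started on an E-core right now.
double finish_on_e_core_now(double work, double e_speed) {
    return work / e_speed;
}

// Time at which the thread finishes if it waits for a P-core.
double finish_after_waiting_for_p_core(double work, double p_speed, double wait) {
    return wait + work / p_speed;
}

// True if running on the E-core immediately finishes sooner.
bool e_core_now_is_better(double work, double e_speed, double p_speed, double wait) {
    return finish_on_e_core_now(work, e_speed)
         < finish_after_waiting_for_p_core(work, p_speed, wait);
}
```

For example, with `work = 10`, `e_speed = 1` and `p_speed = 2`, waiting wins if a P-core frees up after 3 time units (finish at 8 rather than 10) but loses if it takes 6 (finish at 11). The scheduler has to guess which case it is in.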
It could be that Factorio is architecturally predisposed to "problems" in a system with cores of unequal performance, and the specific scheduling heuristics implemented in macOS for such a system are highlighting this. Marking game update-related threads with high QoS may cause the OS to schedule them only on P-cores (at least if power saving is not a priority), and perhaps result in better performance (at least sometimes). Equally, doing so may mean everything that matters now has high QoS so the OS's scheduling decisions will be the same as without QoS set and there is essentially no performance difference at all. The result could change if the OS's implementation does.
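For reference, marking a thread with high QoS on macOS is a single call. The sketch below assumes the thread would call it on itself at startup; the helper names are mine and hypothetical, and I am not suggesting this is how Factorio would or should do it. `QOS_CLASS_USER_INTERACTIVE` has the value 0x21, which is the "33" mentioned in the quote. On non-Apple platforms the helper does nothing.

```cpp
#include <cassert>
#ifdef __APPLE__
#include <pthread.h>
#include <pthread/qos.h>
#endif

// Hypothetical helper: reports whether the macOS QoS API was compiled in.
constexpr bool qos_api_available() {
#ifdef __APPLE__
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: asks macOS to treat the calling thread as
// user-interactive work. This is only a hint; the OS still decides
// which core the thread actually runs on.
// Returns true if the request was accepted, false otherwise
// (including on platforms without the API).
bool request_high_qos_for_current_thread() {
#ifdef __APPLE__
    return pthread_set_qos_class_self_np(QOS_CLASS_USER_INTERACTIVE, 0) == 0;
#else
    return false;
#endif
}
```

Each game update thread would call `request_high_qos_for_current_thread()` once when it starts. Whether that changes anything measurable is exactly the open question.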
Hopefully the above shows that it is not as simple as either Factorio or macOS being "incorrect" but a much more nuanced matter. Also, I agree with FuryoftheStars that this is more a suggestion than a bug, although I would add that it is probably worth investigating.