Allow one more rendering thread than available cores
Posted: Mon Oct 14, 2019 9:10 pm
I tested in-game and it seemed to help, so I tried to reduce the variables and built something in the map editor, and it still helped by increasing both FPS and UPS. Zoom all the way out in the attached save.
When setting core affinity to between 3 and 7 cores on Windows 10 with an i7-7700, each time I set the rendering threads to one more than that core count, there was a noticeable gain in FPS and UPS. Beyond one extra rendering thread, there wasn't a noticeable difference on my end.
Except when the physical/logical cores are already heavily utilized (such as allowing only two cores that are maxed out already), increasing the rendering thread count to one more than the available cores (set via process affinity) improves FPS and UPS. This was reliably reproducible.
I can't test with more than 8 rendering threads on my i7-7700 since Factorio won't allow it even if I set it in the config. I also disabled hyperthreading in the BIOS, and setting core affinity to 3 cores with 4 rendering threads yielded a performance increase there as well.
The map is set to run at a fast game speed so that UPS changes are more easily noticeable. It's worth a test, right?
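If anyone wants to repeat the affinity half of the test without clicking through Task Manager on every launch, a quick Python sketch along these lines does the same thing (this assumes psutil is installed and that the process is named factorio.exe, which is what I see on my machine). The rendering thread count still has to be changed by hand in the graphics settings.
[code]
# Rough sketch: pin every running Factorio process to a chosen set of logical cores.
# Assumes psutil is installed (pip install psutil) and the process is named factorio.exe.
import psutil

def set_factorio_affinity(core_ids):
    """Restrict factorio.exe to the given logical core indices, e.g. [0, 1, 2] for 3 cores."""
    for proc in psutil.process_iter(["name"]):
        if proc.info["name"] and proc.info["name"].lower() == "factorio.exe":
            proc.cpu_affinity(core_ids)
            print(proc.pid, "now pinned to", proc.cpu_affinity())

if __name__ == "__main__":
    set_factorio_affinity([0, 1, 2])  # 3-core affinity; then try 4 rendering threads in the video options
[/code]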
EDIT: Info I should've added:
With hyperthreading enabled:
7 core affinity and 8 rendering threads noticeably outperformed 7 core affinity and 7 rendering threads
8 core affinity and 8 rendering threads was marginally better than 7 core affinity and 8 rendering threads
With hyperthreading disabled via BIOS:
3 core affinity and 4 rendering threads noticeably outperformed 3 core affinity and 3 rendering threads
4 core affinity and 4 rendering threads was marginally better than 3 core affinity and 4 rendering threads
And just out of curiosity (because some people may be on weak 2/4-core CPUs), when the cores were heavily taxed:
2 core affinity and 2 rendering threads outperformed 2 core affinity and 3 rendering threads
2 core affinity and 4 rendering threads was worse than 2 core affinity and 3 rendering threads
When there's headroom on the cores in use, one extra rendering thread benefited both FPS and UPS, but more than one extra thread didn't. Using Process Explorer (not something to run while truly comparing in-game FPS/UPS), I could see that total CPU usage increased as rendering threads increased, even though the load was spread out, and FPS/UPS improved. The increase in CPU usage wasn't linear, and going more than one rendering thread beyond the available cores wasn't a performance improvement even though total CPU usage kept rising.
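For anyone who'd rather log the numbers than eyeball Process Explorer, something like this rough psutil sketch is what I have in mind (the one-second interval and CSV filename are just placeholders). Note that the sampling itself costs a little CPU, which is why I wouldn't trust it while truly comparing FPS/UPS.
[code]
# Minimal sketch: sample total and per-core CPU usage once a second and append to a CSV.
# Assumes psutil is installed; the interval and filename are arbitrary choices.
import csv, time
import psutil

ncores = psutil.cpu_count(logical=True)
with open("cpu_usage_log.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["time", "total_pct"] + [f"core{i}_pct" for i in range(ncores)])
    psutil.cpu_percent(percpu=True)  # prime the counters; the first reading is meaningless
    while True:
        time.sleep(1.0)
        per_core = psutil.cpu_percent(percpu=True)
        writer.writerow([round(time.time(), 1), round(sum(per_core) / ncores, 1)] + per_core)
        f.flush()
[/code]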
This would only be useful when there are lots of entities on the screen, as I originally noticed when someone accidentally dropped tens--or maybe hundreds--of thousands of items on the ground when downgrading many chests to wooden chests.
EDIT 2019-10-16:
Found an interesting discussion and this:
https://www.mcs.anl.gov/~huiweilu/pdf/ISC12-Lv.pdf
It mentions the usefulness of Intel's memory management. No clue if AMD performs similarly, and maybe Ryzen 3 has caught up, since I often read how it's only as good as your RAM. The gist from what I skimmed is that "data-intensive" workloads, rather than raw computation, benefit from Intel's memory management (as of Nehalem, 2008) when more threads are in use. Allowing more concurrent memory requests, as well as simply dividing the workload, is what's beneficial, from what I've read. So the bump in UPS could be explained, but the fact that Factorio calls them "rendering threads" has me wondering if that's just a name and those threads aren't strictly related to rendering but to data processing. I forget if it was the discussion, a different article linked from it, or that whitepaper, but it was also specifically mentioned that if the cores aren't fully taxed, there's often a benefit to running more threads for data-intensive workloads, at least on Intel as of back then.
So that aligns with what I observed. I think it's worth a shot to let Factorio use more threads than the OS-reported core count, especially after reading that article and given how often I've heard Rseding91 say that it's primarily a memory-bound game rather than CPU-bound.
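If anyone wants a quick sanity check of the "more threads than cores can help memory-bound work" idea outside of Factorio, here's a rough Python sketch of the kind of test I mean: a fixed pile of big array sums split across a varying number of threads (numpy releases the GIL during the sums, so the threads genuinely run in parallel). The chunk sizes and counts are arbitrary, and whether oversubscription helps will obviously depend on your hardware.
[code]
# Crude oversubscription test: a fixed amount of memory-bound work (summing big numpy
# arrays) split across N worker threads. Total work never changes; only the thread count does.
import os, queue, threading, time
import numpy as np

CHUNK_MB = 64      # per-chunk size, big enough to spill out of the CPU caches
NUM_CHUNKS = 16    # 16 x 64 MB = 1 GB of data touched per run

chunks = [np.random.rand(CHUNK_MB * 1024 * 1024 // 8) for _ in range(NUM_CHUNKS)]

def run(num_threads):
    jobs = queue.Queue()
    for c in chunks:
        jobs.put(c)

    def worker():
        while True:
            try:
                c = jobs.get_nowait()
            except queue.Empty:
                return
            np.sum(c)  # one pass over the chunk: mostly memory reads, little arithmetic

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    cores = os.cpu_count()
    for n in (max(cores - 1, 1), cores, cores + 1, cores + 2):
        print(f"{n:2d} threads: {run(n):.2f} s")
[/code]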