Performance optimization - post your saves

SoShootMe
Filter Inserter
Posts: 517
Joined: Mon Aug 03, 2020 4:16 pm

Post by SoShootMe »

AdamK wrote: Thu May 06, 2021 2:21 pm I'm not sure if memory latency would show itself in %CPU usage (on one hand, if CPU instruction execution is stalled by the need to wait on memory, it should be seen as %CPU usage; on the other hand, modern CPUs are so complex that I can't be sure here).
CPU usage is calculated by the OS, e.g. for a given core it is the proportion of time over some period (e.g. the last second) during which the OS had a thread scheduled on that core. Waiting on RAM (or on some other things, like execution resources, especially with SMT) is handled by the CPU itself, so a thread that is "stalled" because of this is still "running" from the OS point of view, and the time still counts as CPU usage.

On the other hand, a thread that is "stalled" waiting for data from storage (including paging activity) is different: in this case, the OS gets involved and will suspend the thread, marking it unable to be scheduled again until after the data are available. Suspended threads do not contribute to CPU usage.
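
To illustrate the distinction above, here is a minimal Python sketch (not part of the original post) that compares the CPU time accrued while a thread spins with the CPU time accrued while it is suspended. `time.sleep` stands in for a thread blocked on storage, which is an assumption made purely for illustration, and the function names are hypothetical.

```python
import time


def cpu_seconds_used(work):
    """Return the CPU time (user + system) this process accrued while running work()."""
    start = time.process_time()
    work()
    return time.process_time() - start


def spin(duration=0.2):
    """Busy loop: the thread stays runnable, so the OS keeps counting CPU time."""
    deadline = time.perf_counter() + duration
    while time.perf_counter() < deadline:
        pass


def blocked(duration=0.2):
    """Sleep: the OS suspends the thread, so almost no CPU time is accrued."""
    time.sleep(duration)


if __name__ == "__main__":
    print(f"spinning: {cpu_seconds_used(spin):.3f} s of CPU time")
    print(f"blocked:  {cpu_seconds_used(blocked):.3f} s of CPU time")
```

A thread stalled on RAM looks like the spinning case from the OS point of view: it stays scheduled, so its time is counted as CPU usage.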
jodokus31
Smart Inserter
Posts: 1621
Joined: Sun Feb 26, 2017 4:13 pm

Post by jodokus31 »

ptx0 wrote: Thu May 06, 2021 3:09 pm
jodokus31 wrote: Thu May 06, 2021 3:02 pm I have DDR4-3600 16-19-19-39 and a Ryzen 5600. And it's still only 16 UPS. I'm sure there are better rigs out there, but I can't imagine that it gets much better (like 2x) with current hardware.
i'm on DDR4-2666 Kingston ECC memory + Ryzen 3700x and it's 15 UPS because i'm using mimalloc on Linux as my system-wide memory allocator. if I start it up with glibc malloc there's 11 UPS. i'm sure with 4400MHz memory CL19 and mimalloc you might even get 25UPS! but not 30 UPS. probably never 30 with this map.
I just tested mimalloc:
- Without it I get 17-18 UPS, if I stand zoomed-in somewhere
- With it I get 19-20 UPS

So, there is a bit of potential, I guess.

BTW:
This is my CPU graph
It alternates between the cores and one core is quite high
ptx0
Smart Inserter
Posts: 1507
Joined: Wed Jan 01, 2020 7:16 pm

Post by ptx0 »

SoShootMe wrote: Fri May 07, 2021 2:23 am On the other hand, a thread that is "stalled" waiting for data from storage (including paging activity) is different: in this case, the OS gets involved and will suspend the thread, marking it unable to be scheduled again until after the data are available. Suspended threads do not contribute to CPU usage.
except on linux where iowait is included in system load calculations
jodokus31 wrote: Fri May 07, 2021 8:13 am It alternates between the cores and one core is quite high
too bad we can't pin threads to cores
SoShootMe
Filter Inserter
Posts: 517
Joined: Mon Aug 03, 2020 4:16 pm

Post by SoShootMe »

ptx0 wrote: Fri May 07, 2021 2:38 pm
SoShootMe wrote: Fri May 07, 2021 2:23 am On the other hand, a thread that is "stalled" waiting for data from storage (including paging activity) is different: in this case, the OS gets involved and will suspend the thread, marking it unable to be scheduled again until after the data are available. Suspended threads do not contribute to CPU usage.
except on linux where iowait is included in system load calculations
Fair point, but I don't consider iowait "CPU usage" (which I'd say is user, system, and perhaps nice time depending on context). As I understand it, it is basically a type (special case) of idle time. On the other hand, threads in iowait state contribute to the load average calculation (if that's what you meant), just like runnable threads, but then load average has a different meaning to CPU usage.
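
The accounting described above can be sketched in Python. This is a minimal illustration, not any tool's actual implementation: it assumes the cumulative tick counters found on the `cpu` line of `/proc/stat` on Linux, treats idle and iowait as "not busy", and uses invented sample values. The function names are hypothetical.

```python
# Order of the first eight counters on the "cpu" line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn a /proc/stat "cpu ..." line into a dict of cumulative tick counters."""
    parts = line.split()
    return dict(zip(FIELDS, map(int, parts[1:1 + len(FIELDS)])))


def cpu_usage_percent(before, after):
    """Percent of time spent outside idle and iowait between two samples."""
    delta = {name: after[name] - before[name] for name in after}
    total = sum(delta.values())
    if total == 0:
        return 0.0
    # iowait is counted together with idle: no thread was running on the CPU.
    not_busy = delta.get("idle", 0) + delta.get("iowait", 0)
    return 100.0 * (total - not_busy) / total


if __name__ == "__main__":
    # Invented sample values, for illustration only.
    before = parse_cpu_line("cpu  100 0 50 800 50 0 0 0 0 0")
    after = parse_cpu_line("cpu  160 0 70 850 120 0 0 0 0 0")
    print(f"CPU usage: {cpu_usage_percent(before, after):.1f}%")
```

In the invented sample, 70 of the 200 elapsed ticks are iowait, yet they do not raise the usage figure; the load average is a separate metric and is not computed here.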
ptx0 wrote: Fri May 07, 2021 2:38 pm
jodokus31 wrote: Fri May 07, 2021 8:13 am It alternates between the cores and one core is quite high
too bad we can't pin threads to cores
AFAIK you can, and I've often thought it may offer some (small) performance benefit due to caches. But you have to make sure other things are excluded from running on those cores too, so in most cases it seems like micro-management overkill. Generally I think it is better to just ignore per-core usage, which is not very meaningful with "busy" threads flitting around, and look at the total :).
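
As a rough sketch of what pinning looks like, the Python snippet below restricts the calling process to a chosen set of cores using `os.sched_setaffinity`, which Python only provides on some platforms (notably Linux). It pins the script itself, not the game, and the helper name `pin_current_process` is hypothetical.

```python
import os


def pin_current_process(cores):
    """Restrict the caller to the given cores and return the resulting core set.

    Returns None on platforms where os.sched_setaffinity is unavailable.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, cores)  # pid 0 refers to the caller
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        allowed = sorted(os.sched_getaffinity(0))
        print("allowed cores before:", allowed)
        print("allowed cores after: ", pin_current_process({allowed[0]}))
    else:
        print("CPU affinity is not exposed by Python on this platform")
```

For an already running program, the `taskset` utility from util-linux serves the same purpose from the shell. Neither approach keeps other processes off the chosen cores, which is the extra work mentioned above.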
AdamK
Long Handed Inserter
Posts: 64
Joined: Thu Jul 25, 2019 9:11 am

Post by AdamK »

Just to finish my thread-within-a-thread, today I updated my box from I7-8086K with DDR-2166 to I7-11700K with DDR-3200, both have/had decent mobos, memory also with as low CL as I could easily get, and I'm up from 9-10 FPS to 12-13 FPS. I'd say that 20% increase here is significant :) (I haven't tried overclocking it yet)

I forgot one thing: my base uses robots for building and a little logistics. The 9-10 FPS on the old config was with pretty idle bots, while the new figure is with bots having a bit of work, so I'd say I may expect +1 FPS at least ;)
Abarel
Inserter
Posts: 42
Joined: Wed Mar 13, 2019 10:20 pm

Post by Abarel »

AdamK wrote: Mon May 10, 2021 11:05 am Just to finish my thread-within-a-thread, [...] so I'd say I may expect +1 FPS at least ;)
Just in case someone gets here and has similar doubts: RAM bandwidth (not just frequency, but also latency) makes A LOT of difference.
As fun sidenote, save your map before using console commands if you are afraid of achievements, then run this little script (it will take several seconds, be patient):

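Code:

/c --[[ Destroys every entity of your force whose name contains "locomotive". Save first! ]]
for _, ent in pairs(game.player.surface.find_entities_filtered({force=game.player.force})) do
	--[[ plain substring match on the prototype name ]]
	if string.find(ent.name, "locomotive", 1, true) then
		ent.destroy()
	end
end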
My PC went from 13 UPS to 60 UPS on your map :-D
jodokus31
Smart Inserter
Posts: 1621
Joined: Sun Feb 26, 2017 4:13 pm

Post by jodokus31 »

Abarel wrote: Sat May 29, 2021 1:36 am
AdamK wrote: Mon May 10, 2021 11:05 am Just to finish my thread-within-a-thread, [...] so I'd say I may expect +1 FPS at least ;)
Just in case someone gets here and has similar doubts: RAM bandwidth (not just frequency, but also latency) makes A LOT of difference.
As fun sidenote, save your map before using console commands if you are afraid of achievements, then run this little script (it will take several seconds, be patient):

Code:

/c --[[ Destroys every entity of your force whose name contains "locomotive". Save first! ]]
for _, ent in pairs(game.player.surface.find_entities_filtered({force=game.player.force})) do
	--[[ plain substring match on the prototype name ]]
	if string.find(ent.name, "locomotive", 1, true) then
		ent.destroy()
	end
end
My PC went from 13 UPS to 60 UPS on your map :-D
Destroying all the locomotives is surely not a good workaround ;)
And destroying the whole production is not a good way to save UPS ;)
azesmbog
Filter Inserter
Posts: 254
Joined: Mon Jan 28, 2019 12:05 pm

Post by azesmbog »

AdamK wrote: Mon May 10, 2021 11:05 am Just to finish my thread-within-a-thread, today I updated my box from I7-8086K with DDR-2166 to I7-11700K with DDR-3200, both have/had decent mobos, memory also with as low CL as I could easily get, and I'm up from 9-10 FPS to 12-13 FPS. I'd say that 20% increase here is significant :) (I haven't tried overclocking it yet)
I also looked at your map.
I would say 90 percent of construction and transport drones are superfluous :)
However, everyone plays as he personally likes.
My average UPS on this map is 18-19 :(( And that is while idle.
(On my current map the UPS is 42-43, and even then I find it very low.)
DDR-3200 is not very productive for such a map; it would be necessary to raise the frequency to 3600 or higher and lower the timings, if possible.
Abarel
Inserter
Posts: 42
Joined: Wed Mar 13, 2019 10:20 pm

Post by Abarel »

jodokus31 wrote: Sat May 29, 2021 8:07 am Destroying all the locomotives is surely not a good workaround ;)
That was just a joke. :-D
I should have put in bold the "As fun sidenote"...

The real meaning is that the map was using too many of everything, and so the UPS dropped a lot. It would work almost the same deleting 70% of trains (10k trains, mostly idling), 90% of bots (221k logistic + 101k construction bots, mostly idling), and 40% of inserters (nearly 100k inserters, half of them idling most of the time). However, Factorio is such a great game that even with this map having 11M entities and 26M connectors, the map is still fun.
I focused the joke on trains because I think the main gain can be achieved by greatly reducing the number of trains (1-1 trains are fun, and you can do the same here with just 3k trains, as most of the 10k are idling almost full time), improving the signaling (OK in 3-way roundabouts, but ugly on 4-way crossings, stackers and almost everywhere else), optimizing the schedules, and even starting to optimize inserters, as train stations account for a big chunk of the inserters used (6 inserters to box, then 6 to belt, just to fill a single blue belt; these 12 inserters can be replaced with just 2, with a lot fewer belts and splitters). And there are 3k+ stations... so it is a big task, with big results.
ptx0
Smart Inserter
Posts: 1507
Joined: Wed Jan 01, 2020 7:16 pm

Post by ptx0 »

Abarel wrote: Sat May 29, 2021 1:23 pm It would work almost the same deleting 70% of trains (10k trains, mostly idling)
oh, are you using vehicle equipment grid mod?
jodokus31
Smart Inserter
Posts: 1621
Joined: Sun Feb 26, 2017 4:13 pm

Post by jodokus31 »

Abarel wrote: Sat May 29, 2021 1:23 pm
jodokus31 wrote: Sat May 29, 2021 8:07 am Destroying all the locomotives is surely not a good workaround ;)
That was just a joke. :-D
I should have put in bold the "As fun sidenote"...
I wondered a bit whether it was really a joke, and I also wanted to prevent somebody from blindly copying the command and destroying their save.

Regarding trains: If you make trains double-headed and have dead-end stations, the path finder gets relieved significantly
ptx0
Smart Inserter
Posts: 1507
Joined: Wed Jan 01, 2020 7:16 pm

Post by ptx0 »

jodokus31 wrote: Sun May 30, 2021 10:00 am Regarding trains: If you make trains double-headed and have dead-end stations, the path finder gets relieved significantly
please cite a source or provide a test case where this is true, because the pathfinder will not search both directions if it's not even possible to go both. it only searches from the locomotive side. RoRo with single loco facing front would be the least expensive.
jodokus31
Smart Inserter
Posts: 1621
Joined: Sun Feb 26, 2017 4:13 pm

Post by jodokus31 »

ptx0 wrote: Sun May 30, 2021 6:08 pm
jodokus31 wrote: Sun May 30, 2021 10:00 am Regarding trains: If you make trains double-headed and have dead-end stations, the path finder gets relieved significantly
please cite a source or provide a test case where this is true, because the pathfinder will not search both directions if it's not even possible to go both. it only searches from the locomotive side. RoRo with single loco facing front would be the least expensive.
If the path finder can skip a whole rail contraption, because there is no simple way through it (excluding forwards and then backwards), it should save performance. Assuming the path finder does it, but i have no source or evidence for this...
ptx0
Smart Inserter
Posts: 1507
Joined: Wed Jan 01, 2020 7:16 pm

Post by ptx0 »

jodokus31 wrote: Sun May 30, 2021 7:18 pm If the path finder can skip a whole rail contraption, because there is no simple way through it (excluding forwards and then backwards), it should save performance. Assuming the path finder does it, but i have no source or evidence for this...
the pathfinder is available on github
jodokus31
Smart Inserter
Posts: 1621
Joined: Sun Feb 26, 2017 4:13 pm

Post by jodokus31 »

ptx0 wrote: Mon May 31, 2021 2:50 pm the pathfinder is available on github
Interesting. Takes a bit of time to get into it.
eradicator
Smart Inserter
Posts: 5207
Joined: Tue Jul 12, 2016 9:03 am

Post by eradicator »

Not sure if this counts as "performance", but it seems to be the best-fitting thread on the forum. I've noticed that the mere existence of a lot of prototypes seems to slow down data stage loading of all subsequent mods. I've also noticed that at runtime the name list for Prototype/Entity.additional_pastable_entities seems to be copied for each entity, even if the prototypes are all defined with a reference to the exact same table. This might be fine for short lists, but my mod generates about 5000 prototypes, each of which needs the full list of names of all of them. That results in 5000*5000 * (estimated 50 bytes) = about 1 gigabyte, but at runtime I'm seeing a RAM increase of more than 3 gigabytes. And if the list was stored only once it should be closer to 250 megabytes. I'm not sure why additional_pastable_entities is a list of names and not types, but in the current state it's not realistic to use for this.

(The generation algo as such is not optimized and quite slow by itself.)
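
The estimate above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the prototype count and the 50-byte name size come from the post, while the 8-byte reference size and the "shared" layout (names stored once, each prototype keeping a list of references) are assumptions, not a description of how Factorio stores the list. The post's own 250 megabyte figure is not derived here.

```python
PROTOTYPES = 5000        # prototypes generated by the mod (from the post)
BYTES_PER_NAME = 50      # estimated size of one stored name (from the post)
BYTES_PER_REFERENCE = 8  # assumption: one 64-bit reference per shared entry


def copied_bytes(prototypes, bytes_per_name):
    """Every prototype stores its own copy of every name."""
    return prototypes * prototypes * bytes_per_name


def shared_bytes(prototypes, bytes_per_name, bytes_per_reference):
    """Names are stored once; every prototype keeps only references to them."""
    names = prototypes * bytes_per_name
    references = prototypes * prototypes * bytes_per_reference
    return names + references


if __name__ == "__main__":
    copied = copied_bytes(PROTOTYPES, BYTES_PER_NAME)
    shared = shared_bytes(PROTOTYPES, BYTES_PER_NAME, BYTES_PER_REFERENCE)
    print(f"copied per prototype: {copied / 1e9:.2f} GB")
    print(f"shared names:         {shared / 1e6:.2f} MB")
```

Under these assumptions the copied layout needs about 1.25 GB, in line with the "about 1 gigabyte" estimate, and the shared layout is several times smaller.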
eradicators-stockpile_1.9.2.zip
(1.32 MiB) Downloaded 628 times
Author of: Belt Planner, Hand Crank Generator, Screenshot Maker, /sudo and more.
Mod support languages: 日本語, Deutsch, English
My code in the post above is dedicated to the public domain under CC0.
eradicator
Smart Inserter
Posts: 5207
Joined: Tue Jul 12, 2016 9:03 am

Post by eradicator »

I know, not at all a gameplay problem, but maybe there's something to be optimized hiding? I did expect the cpu cost of spawning a single entity to be constant.

For testing something I used LuaSurface.create_entity to spawn in 3 groups of 5000 solar panels each, so the panels in each group all have the same position. I saved the game, and tried to load it, but loading takes ages even after removing all mods. I waited for more than 10 mins and the load bar just stays stuck at 100%. The map is otherwise new and empty. Spawning in the panels was also very slow as if it's trying to detect collision against all existing panels, but I'm not sure if that wasn't mod related.

I used this command 5 times while standing still. Then used editor area clone to copy the group 2 more times, then gave up because it was so slow :D.

Code:

/c
--[[ Spawns 1000 solar panels stacked on the player's position, with the optional build side effects disabled. ]]
local p = game.player
local ce, args = p.surface.create_entity, {name = 'solar-panel', force = p.force, position = p.position, raise_built = false,
create_build_effect_smoke = false, spawn_decorations = false, move_stuck_players = false}

for _=1, 1000 do ce(args) end
Attachments
_stationary_fusion_perftest_1.zip
(1.95 MiB) Downloaded 183 times
Author of: Belt Planner, Hand Crank Generator, Screenshot Maker, /sudo and more.
Mod support languages: 日本語, Deutsch, English
My code in the post above is dedicated to the public domain under CC0.
Rseding91
Factorio Staff
Posts: 14264
Joined: Wed Jun 11, 2014 5:23 am

Post by Rseding91 »

Stacking entities on top of each other causes anything that has to interact with that section of the map to grow linearly more expensive with each entity added. That issue is compounded if they are electric connecting entities because during migrations they disconnect and reconnect to nearby electric poles; but that has to interact with every one of the entities in the area as it searches for poles.

That ends up being O(N^2).
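
A toy model makes the quadratic growth concrete. The Python sketch below is not Factorio's code; it simply assumes that each newly placed entity has to look at every entity already stacked on the same spot and counts those interactions. The function name is hypothetical.

```python
def interactions_when_stacking(count):
    """Count entity-to-entity interactions when each newly placed entity
    has to look at every entity already occupying the same spot."""
    interactions = 0
    for already_placed in range(count):
        # The cost of one placement grows linearly with the stack height.
        interactions += already_placed
    return interactions


if __name__ == "__main__":
    for count in (1000, 5000, 10000):
        print(f"{count:>6} stacked entities -> {interactions_when_stacking(count):>11,} interactions")
```

The total is N(N-1)/2, so doubling the number of stacked entities roughly quadruples the work.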
If you want to get ahold of me I'm almost always on Discord.
eradicator
Smart Inserter
Posts: 5207
Joined: Tue Jul 12, 2016 9:03 am

Post by eradicator »

Rseding91 wrote: Thu Aug 19, 2021 12:56 am Stacking entities on top of each other causes anything that has to interact with that section of the map to grow linearly more expensive
As for the case of create_entity: As it doesn't do any apparent collision checking I expected it to not care at all about other entities. I added all the {foo = false} options in an attempt to disable as much checking as I could.
Rseding91 wrote: Thu Aug 19, 2021 12:56 am during migrations they disconnect and reconnect to nearby electric poles
Aha, interesting. So if migrating poles searched for connectables instead of the other way around, it would've worked as I expected. Thanks for the explanation!
mrvn
Smart Inserter
Posts: 5855
Joined: Mon Sep 05, 2016 9:10 am

Post by mrvn »

SoShootMe wrote: Fri May 07, 2021 4:07 pm
ptx0 wrote: Fri May 07, 2021 2:38 pm
SoShootMe wrote: Fri May 07, 2021 2:23 am On the other hand, a thread that is "stalled" waiting for data from storage (including paging activity) is different: in this case, the OS gets involved and will suspend the thread, marking it unable to be scheduled again until after the data are available. Suspended threads do not contribute to CPU usage.
except on linux where iowait is included in system load calculations
Fair point, but I don't consider iowait "CPU usage" (which I'd say is user, system, and perhaps nice time depending on context). As I understand it, it is basically a type (special case) of idle time. On the other hand, threads in iowait state contribute to the load average calculation (if that's what you meant), just like runnable threads, but then load average has a different meaning to CPU usage.
ptx0 wrote: Fri May 07, 2021 2:38 pm
jodokus31 wrote: Fri May 07, 2021 8:13 am It alternates between the cores and one core is quite high
too bad we can't pin threads to cores
AFAIK you can, and I've often thought it may offer some (small) performance benefit due to caches. But you have to make sure other things are excluded from running on those cores too, so in most cases it seems like micro-management overkill. Generally I think it is better to just ignore per-core usage, which is not very meaningful with "busy" threads flitting around, and look at the total :).
There can be some gain with improved cache hits. If you have 2 sockets/cores with separate caches, then pinning threads to each set of cores/threads that share caches can be beneficial, even if they get interrupted every now and then. The thread can jump between CPU threads or cores at little cost as long as it stays within the same cache.

After that it makes a huge difference with NUMA but only if your memory allocation and usage is also NUMA aware. For example you would allocate all inserters in a memory segment in one NUMA domain and pin the thread doing inserter updates to cores in the same NUMA domain. The thread would run a lot faster there. Or split the map into quadrants and place each quadrant into one NUMA domain and pin a thread there. Then each assembler, belt, inserter needs to be allocated in the right memory block and processed by the right thread.

Factorio isn't really designed for that. The core design was really single-threaded, and since then it has only received a few patch-ups for special things that could be made multi-threaded after the fact, like the fluid system, which could be broken into independent parts. But there isn't an inserter thread that runs in parallel with, for example, an assembler thread. AFAIK nothing in Factorio would be able to optimize for NUMA and pinned threads without a major overhaul. It's just not designed that way and maybe doesn't fit that memory model at all.

The best approach might be to split things by geography. Split the map into largish tiles, and everything in a tile gets pinned to one core and its closest memory. Of course that requires extra work at the tile boundaries, where threads would collide, but given large enough tiles that overhead can be minimized. But that would probably only help systems with NUMA, as the working set is far too big to make cache effects relevant in that optimization.
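
The geographic split can be sketched as a small Python example. Everything in it is hypothetical: the tile size, the entity list, and the round-robin assignment of tiles to workers are illustrative assumptions, and real pinning to cores or NUMA domains is left out entirely.

```python
from collections import defaultdict


def tile_of(position, tile_size):
    """Map a map position (x, y) to the coordinates of the tile containing it."""
    x, y = position
    return (int(x // tile_size), int(y // tile_size))


def group_by_tile(entities, tile_size):
    """Group (name, position) pairs by the tile their position falls into."""
    tiles = defaultdict(list)
    for name, position in entities:
        tiles[tile_of(position, tile_size)].append(name)
    return dict(tiles)


def assign_tiles_to_workers(tiles, worker_count):
    """Hand out tiles to workers round-robin, in sorted tile order."""
    return {tile: index % worker_count for index, tile in enumerate(sorted(tiles))}


if __name__ == "__main__":
    # Invented entities and positions, for illustration only.
    entities = [
        ("assembler", (10.5, 3.0)),
        ("inserter", (11.5, 3.0)),
        ("belt", (-1.0, 40.0)),
        ("inserter", (300.0, 7.0)),
    ]
    tiles = group_by_tile(entities, tile_size=256)
    print(tiles)
    print(assign_tiles_to_workers(tiles, worker_count=2))
```

The sketch ignores the hard part mentioned above: entities that interact across a tile boundary would still need some form of coordination between workers.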