Performance: Multicore pathfinding for "low" GHz servers

Hathwos · Post by **Hathwos** » Fri Mar 11, 2022 5:14 pm

TL;DR

Outsource pathfinding to idle cores, to optimize the main loop which is using one core 100% on server with multiple cores but low performance per core.

What ?

Instead of running the A* inside the main loop, i would suggest to implement some kind of event driven scenario for multicore usage.

- Spawn as many "pathfinding procceses" (PP) as needed/cores available, to prevent main loop using one core 100%
- Provide these PPs with "walkable area" data and changes (if needed, e.g. used landfill, a blocking placable)
- If main loop needs to calculate a path, just trigger a event with coordinates
- One of the (least used) PPs catches and handles the event and presents the result to the main loop

This would spread some of the calculation burden from main loop to other cores

: Factorio core usage.png (33.45 KiB) Viewed 6174 times

Why ?

This suggestion happens because: Trains

I'm running a Factorio dedicated multiplayer server and got blamed for low performance, so i looked into it.
The CPU used is a INTEL Celeron J4125 at 2 GHz, 4 physical cores

There are about 200 trains (mainly 1-2-1) on a mostly loop based track system with a mixture of end-stops and roro-stops

Factory running at full throuput (buffer chests/tanks for trains) somewhat arround 1k science per minute

Result: One core at 100% nearly all the time (see picture) and therefore increadable low ups (e.g. walk using 3 exoskeletons still slower then "normal" walkspeed)

Test: I figured that if i stop (or delete) all trains, then the server instantly goes "back to normal" with full ups (while factory still running from/into buffer chests full throughput) at around 30% one core usage

There would lie the benefit for the game.. better multicore support == bigger factories

(Except i missed something or got it horrobly wrong

)

-- EDIT: Save attached

DarkShadow44 · Post by **DarkShadow44** » Sun Mar 13, 2022 3:47 am

Multi-threading comes up every once in a while, and the answer we got is that it's not worth the effort. Multithreading is hard, determinism must be kept and performance is not guaranteed.

Post by **ssilk** » Sun Mar 13, 2022 5:46 am

The problem is then, that you need to calculate, when the pathfinding will be ready. Because when you don’t know when the calculation is ready on every computer that participate this current game, then some calculations will be ready sooner or later. And then the biters will start walking sooner or later. You kill the determinism if you do such things.

You can of course send an event, if ready, so that if all other participating computers get exactly the number of events equals the number of participants we know that we now can start to walk. But this will produce so much overhead in communication, that it is cheaper to not implement it and instead calculate things in a way, that keeps the determinism.

moved to won’t implement until someone has a better idea

Post by **boskid** » Sun Mar 13, 2022 6:25 am

"One could simply say "make it multithreaded" but there are a lot of technical challenges to solve before this is even possible..." (FFF #364)

Every time i am seeing those "make it multithreaded" topics i am always thinking about our priorities related to code. Basically we have this list of priorities:
1/ Validity (produces correct result, is save-load stable and does not desync)
2/ Readability
3/ Performance

If we know given path of code is really hot, we are sometimes willing to give up on readability (of a certain piece of code) in order to get a performance gain, but we never compromise on the validity. I literally do not care if a given solution would be fast if it would desync, crash or do whatever we never want to see. What everyone making those suggestions is seeing is only the cpu load value saying "it is single threaded" and then there is the "make it multithreaded" suggestion. It would be really great if such suggestions would also have some discussion about how to solve the technical challenges to keep a solution valid in a way that would make it even possible to consider a multithreading stuff. A suggestion which does not even recognize the validity part i am considering to be just rude because it assumes that we did not consider it multithreaded already when we always reconsider this every time we see some piece of code showing on the profiler. I can make every code extremally fast by simply deleting everything, it would not even need multithreading, it would be just pure fast, but it would not be valid at all.

I was already heavily thinking about making trains multithreaded or at least the trains pathfinder multithreaded but i have always this reminder that we have chain signals in game. Lets have this specific setup:

setup

save file

In this setup all train stops on top are named the same and there are a lot of chain signals. First both trains want to simply go straight as those train stops are the closest ones and there is no other traffic around

initial path

But once both trains are within the chain signal section, left and right train stops are closed which forces a repath of both trains at the same tick. First train to repath will see that it is in the middle of intersection (internally it is called InChainSignalSection) so the beginning of the pathfinder has to follow some really specific requirements like not selecting a path that would make a train to stop at the next signal while in chain signal section because such train could jam a heavily used intersection with proper signalling and players would just consider this behavior to be broken, unexpected or whatever. If a train is in chain signal section, it will not select a path where a train would be forced to stop in the middle of intersection so it may happen a train will find longer path that first lets it quit a chain signal section first and then wait there.

path after closing outer stops

If you open the attached save file and resume the time from the map editor you will notice that left train will arrive at the middle train stop but the second train will repath and go around and wait in a block where it is not on the intersection. If you would try experimenting with this save file you may notice that order of which train is going straight and which is going around is based on the internal trains update order (first train repathing after the outer train stops are closed will go straight). This is because trains repathing is basically an atomic operation consisting of 3 parts: "release all signals, find a path, reacquire signals up to a trains braking distance or farther if there are chain signals involved". State of signals is a shared state of a trains network so anything related to trains has to properly deal with state of signals and keep this single operation atomic. I could throw any amount of threads doing the trains pathfinding but it would not matter at all because there would be always only a single request being processed at a given point in time and the update thread would be waiting for the result of the pathfinder anyway. There is literally no reason in making pathfinder to be done in a separate threads since the update thread can do that computation on its own already as it is a blocking operation.

Since the state of signals is considered a shared state, in theory it could be possible to isolate separate rail networks as in most cases they are unable to interact with each other making it a candidate for a trains pathfinder, but then the update of trains themselves is also sequential because entities moving on a surface have to reregister them on the surface in a deterministic order because entity registration order on a surface determines order of entities returned when we do any kind of entity searches and we assume that entity searches give entities in the same order on all clients (they are save-load stable)

ickputzdirwech · Post by **ickputzdirwech** » Sun Mar 13, 2022 9:03 am

boskid wrote: Sun Mar 13, 2022 6:25 am in theory it could be possible to isolate separate rail networks as in most cases they are unable to interact with each other making it a candidate for a trains pathfinder

You mean something like the transport lines but for rail networks?

boskid wrote: Sun Mar 13, 2022 6:25 am but then the update of trains themselves is also sequential because entities moving on a surface have to reregister them on the surface in a deterministic order because entity registration order on a surface determines order of entities returned when we do any kind of entity searches and we assume that entity searches give entities in the same order on all clients (they are save-load stable)

Except for Factorio modding I don’t know anything about programming. I must therefore ask: is it possible (meaning your “valid”) to do the reregistration of the moved trains separately after all rail networks have been updated? Maybe in a order by rail network? First all trains from rail network 1, then 2 and so on. And if possible could it have significant performance improvements?

BlueTemplar · Post by **BlueTemplar** » Sun Mar 13, 2022 10:44 am

This seems to be a good thread for this question :

While it's not the case for the abovelinked game,

Map screenshot

I've heard that regular grid train systems that you so often see in megabases were really bad UPS-wise, because they were causing a powers of 2 complexity explosion for the train pathfinder due to the equal weights for the two adjacent paths when moving in a generally diagonal direction.

Has anyone tried to benchmark this ?

Post by **boskid** » Sun Mar 13, 2022 11:31 am

ickputzdirwech wrote: Sun Mar 13, 2022 9:03 am You mean something like the transport lines but for rail networks?

Yes, something like transport line groups (or fluid systems). Issue is that it is really unlikely to see multiple rail networks in a single file since most of the best ways to use rails is to have a single rail network making it easily expandable for adding new blocks of production that can interact with other blocks.

ickputzdirwech wrote: Sun Mar 13, 2022 9:03 am Except for Factorio modding I don’t know anything about programming. I must therefore ask: is it possible (meaning your “valid”) to do the reregistration of the moved trains separately after all rail networks have been updated? Maybe in a order by rail network? First all trains from rail network 1, then 2 and so on. And if possible could it have significant performance improvements?

No, trains themselves are also doing entity searches which are part of collision checks. If you would have a rail network without any signals, 2 trains may collide with each other so first train after it was moved has to reregister on the new position making second train colliding which prevents its movement to the same spot (if a collision would not revert movement, 2 entities would start overlapping with each other and it would be no longer possible to split them without deconstructing them first). If 2 trains would do the reregistration at later stages, they would fail to collide with each other in those kind of situations.

BlueTemplar wrote: Sun Mar 13, 2022 10:44 am I've heard that regular grid train systems that you so often see in megabases were really bad UPS-wise, because they were causing a powers of 2 complexity explosion

This is wrong. There is an upper bound of how many steps will a trains pathfinder do: worst case is to have 4 nodes for every rail segment: for each travel direction for each InChainSignalSection value: basically having a huge rail network with only chain signals but with a single normal signal for a train to make it possible to leave a chain signal section and start reconsidering entire network from the new perspective of "finally i left the intersection". Since the amount of rail segments is limited by the amount of rail pieces (every rail segment contains at least one rail entity), that means the amount of work is linear to the amount of entitites. Because of the implementation of the pathfinder, there is also priority queue involved (in this case its a MinHeap) and every node doing an expansion may need to touch at most 3 other rail segments. Total worst case amount of work is then 3*4*N log N which means it is the O(N log N) class.

In most cases trains pathfinder will return early because of the heuristic used when it finds a path. Worst case for the performance is when there is an unreachable train stop causing a train to enter a "no path" state. Only way for a trains pathfinder to get into the "no path" result is to visit every rail segment reachable from the trains point of view and seeing that none of visited segments is next to the goal. Having a lot of trains in a no path state on a large rail network and doing many forced repaths can just make game unplayable.

robot256 · Post by **robot256** » Sun Mar 13, 2022 12:39 pm

boskid wrote: Sun Mar 13, 2022 11:31 am
ickputzdirwech wrote: Sun Mar 13, 2022 9:03 am You mean something like the transport lines but for rail networks?
Yes, something like transport line groups (or fluid systems). Issue is that it is really unlikely to see multiple rail networks in a single file since most of the best ways to use rails is to have a single rail network making it easily expandable for adding new blocks of production that can interact with other blocks.

This makes perfect sense. If you try to keep trains separated from each other to avoid delays at intersections, that also tends to keep each individual network much simpler.

Large separate rail networks pretty much only happen when they are on different surfaces. You could get 3 or 4 large ones in a long Space Exploration game, but most offworld rail networks will be small. I suppose PvP could also lead to it, but I doubt many players are building parallel megabases in PvP.

Nidan · Post by **Nidan** » Sun Mar 13, 2022 4:00 pm

boskid wrote: Sun Mar 13, 2022 6:25 am What everyone making those suggestions is seeing is only the cpu load value saying "it is single threaded" and then there is the "make it multithreaded" suggestion. It would be really great if such suggestions would also have some discussion about how to solve the technical challenges to keep a solution valid in a way that would make it even possible to consider a multithreading stuff.

Most performance improvement suggestions here are indeed weak, whether due to missing or insufficient programming knowledge or failure to account for some requirement or limitation of the engine. Even when one did their research and accounted for the well known requirements/limitations (e.g. determinism), there's usually a dev post somewhere (forums, FFF, reddit, …) making a particular suggestion infeasible. Since you are one of the few who actually care about performance the low hanging fruits are likely already taken care of. Long story short: It's hard to make a good suggestion.

Examples

While we can always come up ideas and even implement them, those implementations will always be in a void, discounting constraints of the engine we don't know about. The solution would be having code access so we can try out changes and get meaningful results, but let's stay realistic: Not gonna happen.
How about a middle ground? For a given piece of the engine where you want improvements (e.g. fluid network), extract that piece into a library and make the interface the library uses public, so the community can program and, more importantly, test and compare their own attempts, allowing us to weed out the bad ideas among ourselves and only bother you / the devs with actual improvements. Add some goals and performance constraints and you could call this a contest.

BlueTemplar · Post by **BlueTemplar** » Mon Mar 14, 2022 4:37 pm

Thank you for the explanation of rail pathfinder complexity.

aka13 · Post by **aka13** » Mon Mar 14, 2022 7:45 pm

Thanks, that was a great insight on the pathfinder.

Factorio Forums