0.17.xx optimizing base for ups

PunPun · Post by **PunPun** » Thu Oct 10, 2019 7:34 pm

mrvn wrote: Thu Oct 10, 2019 10:40 amA* expands the path where |path| + H() is the cheapest. The heuristic H() must not overestimate the cost to reach the goal. It usually underestimates it relative to the amount of path remaining.

Until here you are correct.

mrvn wrote: Thu Oct 10, 2019 10:40 amWith many paths of identical cost that means the paths that have been least explored will have the lowest estimates cost and will be expanded first.

No they are not. Assuming we are using a sane heuristic(in this case that would be euclidean distance to goal) the paths that deviate the least from the direct line between the start and the goal are explored first.

Post by **Rseding91** » Thu Oct 10, 2019 11:05 pm

viewtopic.php?t=52635

manjhi · Post by **manjhi** » Fri Oct 11, 2019 12:17 am

Is there a way to drill down on entity updates? For my current base the entity updates is taking the most of the time and I'd like to see which entities are actually contributing the most.

quyxkh · Post by **quyxkh** » Fri Oct 11, 2019 3:42 am

You're using Dijkstra even for single-stop stations?

PunPun · Post by **PunPun** » Fri Oct 11, 2019 10:37 am

Rseding91 wrote: Thu Oct 10, 2019 11:05 pm viewtopic.php?t=52635

OHGODS. It's a djikstra. The only thing djikstra has over A* is that it is easier to implement. No wonder big railnetworks with lots of intersections are slow.

Post by **Rseding91** » Fri Oct 11, 2019 6:52 pm

PunPun wrote: Fri Oct 11, 2019 10:37 am
Rseding91 wrote: Thu Oct 10, 2019 11:05 pm viewtopic.php?t=52635
OHGODS. It's a djikstra. The only thing djikstra has over A* is that it is easier to implement. No wonder big railnetworks with lots of intersections are slow.

If you say so. The only time I've ever seen rail path finding being "slow" is on rail networks like this: https://www.factorio.com/blog/post/fff-296 and even then it's not actually slow relative to the rest of the simulation (still runs at 60 UPS) - it just takes enough time that I can measure it. All of the time ends up being memory latency at that point.

PunPun · Post by **PunPun** » Sat Oct 12, 2019 1:25 pm

Rseding91 wrote: Fri Oct 11, 2019 6:52 pm
PunPun wrote: Fri Oct 11, 2019 10:37 am
Rseding91 wrote: Thu Oct 10, 2019 11:05 pm viewtopic.php?t=52635
OHGODS. It's a djikstra. The only thing djikstra has over A* is that it is easier to implement. No wonder big railnetworks with lots of intersections are slow.
If you say so. The only time I've ever seen rail path finding being "slow" is on rail networks like this: https://www.factorio.com/blog/post/fff-296 and even then it's not actually slow relative to the rest of the simulation (still runs at 60 UPS) - it just takes enough time that I can measure it. All of the time ends up being memory latency at that point.

Which is why I mentioned that djikstra is easier to implement. I guess there is no point in spending development time making something 10 times faster if it's taking less than 10% of the simulation time. But the point of "all the time being memory latency" A* would be faster than djikstra even more as it needs to access a fraction of the nodes that djikstra does. If for some reason the train pathfinding needs optimizing in the future the first thing I would do is implement A*. Even tough it is more complicated than djikstra it is not a hard to implement algorithm. And its performance is wastly superior to djikstra.

Also someone should correct the wiki https://wiki.factorio.com/Railway/Train_path_finding. It claims A* is used when it is actually djikstra. And those two are infact different algorithms.

Post by **Bilka** » Sat Oct 12, 2019 1:33 pm

PunPun wrote: Sat Oct 12, 2019 1:25 pm Also someone should correct the wiki https://wiki.factorio.com/Railway/Train_path_finding. It claims A* is used when it is actually djikstra. And those two are infact different algorithms.

Well, do you have a reliable source for it being djikstra? I cannot find an FFF that says whether it is A* or djikstra. The only thing I can find is viewtopic.php?p=293877#p293877 which says A*. If I am to believe the comments in the original pathfinding commit from 0.5.0, it's also A*, but that was 6 years ago.

PunPun · Post by **PunPun** » Sat Oct 12, 2019 1:55 pm

Bilka wrote: Sat Oct 12, 2019 1:33 pm
PunPun wrote: Sat Oct 12, 2019 1:25 pm Also someone should correct the wiki https://wiki.factorio.com/Railway/Train_path_finding. It claims A* is used when it is actually djikstra. And those two are infact different algorithms.
Well, do you have a reliable source for it being djikstra? I cannot find an FFF that says whether it is A* or djikstra. The only thing I can find is viewtopic.php?p=293877#p293877 which says A*. If I am to believe the comments in the original pathfinding commit from 0.5.0, it's also A*, but that was 6 years ago.

Well the sourcecode that is in the See also part of the page is dijkstra.

Also I don't think Rseding91 would link to a thread like this that is also saying that it is dijkstra unless he tought it was correct.

Rseding91 wrote: Thu Oct 10, 2019 11:05 pm viewtopic.php?t=52635

Edit: just realized it is spelled dijkstra not djikstra.

Post by **Bilka** » Sat Oct 12, 2019 2:03 pm

PunPun wrote: Sat Oct 12, 2019 1:55 pm Well the sourcecode that is in the See also part of the page is djikstra.

Also I don't think Rseding91 would link to a thread like this that is also saying that it is djikstra unless he tought it was correct.
Rseding91 wrote: Thu Oct 10, 2019 11:05 pm viewtopic.php?t=52635

He also thought that he was correct with the A*, aka not a reliable source. I guess I shall trust aaargha's judgement of the source code, afaik they know trains pretty well.

Post by **Rseding91** » Sat Oct 12, 2019 6:15 pm

I don't know if I've ever stated what algorithm the trains use for path finding (If I did say it was using a given algorithm in the past or not): I didn't write it, and I don't maintain it. I still couldn't say if it's actually using a given algorithm or not. All I did is link to the topic where others have talked about it in great detail already.

If I did say it was using A* in the past and it's not then I was wrong. Regardless of what algorithm it's using I have implemented a few optimizations on top of the code to improve performance so if someone thinks they might have ideas about how to make it even faster I'd love to hear them.

Edit: I did say it was A* (viewtopic.php?p=293877#p293877) so I was wrong.

PunPun · Post by **PunPun** » Sun Oct 13, 2019 12:56 am

Rseding91 wrote: Sat Oct 12, 2019 6:15 pmIf I did say it was using A* in the past and it's not then I was wrong. Regardless of what algorithm it's using I have implemented a few optimizations on top of the code to improve performance so if someone thinks they might have ideas about how to make it even faster I'd love to hear them.

In that case I shall try to catch you on discord once I have cought some sleep. Optimizing pathfinding depends a lot on what data is already there from other gameplay and how it is structured so trying to figure this out in a forum correspondance would be annoying and slow.

mrvn · Post by **mrvn** » Mon Oct 14, 2019 9:47 am

PunPun wrote: Thu Oct 10, 2019 7:34 pm
mrvn wrote: Thu Oct 10, 2019 10:40 amA* expands the path where |path| + H() is the cheapest. The heuristic H() must not overestimate the cost to reach the goal. It usually underestimates it relative to the amount of path remaining.
Until here you are correct.
mrvn wrote: Thu Oct 10, 2019 10:40 amWith many paths of identical cost that means the paths that have been least explored will have the lowest estimates cost and will be expanded first.
No they are not. Assuming we are using a sane heuristic(in this case that would be euclidean distance to goal) the paths that deviate the least from the direct line between the start and the goal are explored first.

Sure. But in a grid the distance is more a manhatten metric. So going NWNWNWNW is longer than the euclidean distance predicted and then paths more distance from the directed line are predicted to be shorter. The longer the path the wider the area A* explores becomes.

Apart from all that A* has to work around other trains as well. There are many paths in a grid and it will try to avoid blocked paths. Even take longer paths if they have less obstacles. No matter how good A* is in trying the best paths first the fact that there are so many possibilities costs time.

PunPun · Post by **PunPun** » Mon Oct 14, 2019 2:47 pm

mrvn wrote: Mon Oct 14, 2019 9:47 am
PunPun wrote: Thu Oct 10, 2019 7:34 pm
mrvn wrote: Thu Oct 10, 2019 10:40 amA* expands the path where |path| + H() is the cheapest. The heuristic H() must not overestimate the cost to reach the goal. It usually underestimates it relative to the amount of path remaining.
Until here you are correct.
mrvn wrote: Thu Oct 10, 2019 10:40 amWith many paths of identical cost that means the paths that have been least explored will have the lowest estimates cost and will be expanded first.
No they are not. Assuming we are using a sane heuristic(in this case that would be euclidean distance to goal) the paths that deviate the least from the direct line between the start and the goal are explored first.
Sure. But in a grid the distance is more a manhatten metric. So going NWNWNWNW is longer than the euclidean distance predicted and then paths more distance from the directed line are predicted to be shorter. The longer the path the wider the area A* explores becomes.

Apart from all that A* has to work around other trains as well. There are many paths in a grid and it will try to avoid blocked paths. Even take longer paths if they have less obstacles. No matter how good A* is in trying the best paths first the fact that there are so many possibilities costs time.

This.

mrvn wrote: Thu Oct 10, 2019 10:40 amWith many paths of identical cost that means the paths that have been least explored will have the lowest estimates cost and will be expanded first.

Looks like this and is also what factorios current pathfinding does because it is not using A* but dijkstra

: dworst.PNG (28.05 KiB) Viewed 8046 times

: davarage.PNG (31.81 KiB) Viewed 8046 times

: dbest.PNG (31.11 KiB) Viewed 8046 times

-
A proper A* would look like this.
-

: aworst.PNG (34.51 KiB) Viewed 8046 times

: aavarage.PNG (32.04 KiB) Viewed 8046 times

: abest.PNG (29.93 KiB) Viewed 8046 times

Ofcourse the less nodes you have in your graph the better as it means less work. And A* isn't magic. But it's not as bad as you are trying to make it.

mrvn · Post by **mrvn** » Mon Oct 14, 2019 2:53 pm

Thanks, once again it shows: A picture says more than a thousand words.

Looks like there is still a lot of optimization possible for train path finding if it's using plain dijkstra even if grids will always be a bad case for it simply because they have so many junctions.

PunPun · Post by **PunPun** » Mon Oct 14, 2019 4:27 pm

mrvn wrote: Mon Oct 14, 2019 2:53 pm Thanks, once again it shows: A picture says more than a thousand words.

I understand. Pathfinding and especially A* is a subject that is not obvious without visual examples. Also if you google A* stuff the internet is full of misinformation posted by people who do not understand how A* actually works. So it is really common for people who have not spent 100+h coding A* implementations to have the wrong idea about it. Also the biggest misconception about A* is that it does not support multiple goals. When all you have to do is search the path backwards. You can have multiple start nodes and it works extremely well. In a grid it most of the time only expands from the closest start node only.

mrvn wrote: Mon Oct 14, 2019 2:53 pm Looks like there is still a lot of optimization possible for train path finding if it's using plain dijkstra even if grids will always be a bad case for it simply because they have so many junctions.

Indeed and it gets even worse when the start is not near an edge of the grid. As dijkstra expands away from the goal at the same rate as it expands towards the goal.

There is however a situation where A* will be much much worse and you'd want to do a special case to detect it before starting A*. If a train is pointing towards a dead end. Because we want to support N stations in a single search we are searching from stations to the train. And if the train is unreachable because it is pointing towards a dead end A* will have to search all the nodes in the entire network. Ofcourse this is just as bad as having a train pointing towards the network trying to path to an unreachable station in the current implementation.

Maybe a bidirectional algorithm where we expand from the train with dijkstra and expand from the stations with A* would work the best. If either exhausts all nodes it can terminate early as there is no path. At worst case this approach would be just as slow as plain dijkstra and at best it would be less than twice the nodes as the best case A* when a path exists.

Edit: I tought up a situation where the bidirectional algorithm would net twice the nodes. Situation being two big networks train in one and station in the other. But if we think about realistic gameplay situations the bidirectional dijkstra/A* hybrid or just regular A* would be the fastest on avarage.

Edit edit: The bidirectional hybrid algorithm would also eliminate the need to search two paths with A* with double headed trains.

mrvn · Post by **mrvn** » Thu Oct 17, 2019 10:36 am

PunPun wrote: Mon Oct 14, 2019 4:27 pm There is however a situation where A* will be much much worse and you'd want to do a special case to detect it before starting A*. If a train is pointing towards a dead end. Because we want to support N stations in a single search we are searching from stations to the train. And if the train is unreachable because it is pointing towards a dead end A* will have to search all the nodes in the entire network. Ofcourse this is just as bad as having a train pointing towards the network trying to path to an unreachable station in the current implementation.

There will always be worst cases.

The first improvement I would do there is to add a rail network id like the electric network id or fluid system id. If the train and stations aren't on the same rail network (e.g. on different surfaces or simply not connected) there is no point searching for a path.

PunPun wrote: Mon Oct 14, 2019 4:27 pm Maybe a bidirectional algorithm where we expand from the train with dijkstra and expand from the stations with A* would work the best. If either exhausts all nodes it can terminate early as there is no path. At worst case this approach would be just as slow as plain dijkstra and at best it would be less than twice the nodes as the best case A* when a path exists.

Edit: I tought up a situation where the bidirectional algorithm would net twice the nodes. Situation being two big networks train in one and station in the other. But if we think about realistic gameplay situations the bidirectional dijkstra/A* hybrid or just regular A* would be the fastest on avarage.

You can also do A* towards multiple goals. The problem there is the heuristic function. You have to estimate the distance to the goal, which comes down to distance to the closest station. If that station turns out not to be the goal the heuristic will greatly underestimate the cost to reach the actual best station. Which makes A* perform less than optimal. As it expands it will constantly turn back to reach the closest station until it gets far enough from one station so that another is the closest. Then it will race towards that station and spread around it again. For that scenario though you would have to have tracks signaled bidirectional with stations on the wrong side (for the train) and no easy way to turn the train around.

Another option is to do multiple A* in parallel, one per goal. If you do a bidirectional search you already have 2 searches in parallel. It's not much harder to extend that to N+1 searches. Just takes a bit more memory. N times as much which could be a problem. I have tons of station named "fuel". If they are all enabled that would be a lot of memory. Maybe fall back to dijkstra for >10 stations or don't go bidirectional then.

PunPun wrote: Mon Oct 14, 2019 4:27 pm Edit edit: The bidirectional hybrid algorithm would also eliminate the need to search two paths with A* with double headed trains.

For double headed trains you search with A* starting at the destinations. Then you expand the paths from each station backwards till one of them hits the segment the train is in. Why would you need to search twice?

Single headed trains sound more troublesome to me. Because when you near the segment where the train is in but going the wrong way then how do you estimate the distance to the train?

Here is another worst case scenario for you: The train is on a dead end with the station at the end. But it's facing the wrong way. It has to drive away from the station, turn around somewhere and come back. If there is nowhere to turn around then it has to search the whole network. Forward search, backward search are the same. bidirectional would search the whole network twice.

I note that all the worst cases seem to be "No Path" situations. Something a working rail network shouldn't have. And if it does that searches once every time the rail network is modified. The more frequent case will be trains stopped at a chain signal. Far more frequent I would hope.

PunPun · Post by **PunPun** » Sat Oct 19, 2019 6:21 pm

mrvn wrote: Thu Oct 17, 2019 10:36 amFor double headed trains you search with A* starting at the destinations. Then you expand the paths from each station backwards till one of them hits the segment the train is in. Why would you need to search twice?

Because the ends are not neccessarily in the same segment. If you have a 100+ car long train the ends can be really far from eachother.

mrvn wrote: Thu Oct 17, 2019 10:36 ambidirectional would search the whole network twice.

I think you misunderstand how a bidirectional search works. When one search reaches a node that the other search has already seen the algorithm terminates and combines the two searches to make the path. It is a regularly used optimization for many cases. It would never search the network more than once.

mrvn wrote: Thu Oct 17, 2019 10:36 am You can also do A* towards multiple goals. The problem there is the heuristic function. You have to estimate the distance to the goal, which comes down to distance to the closest station. If that station turns out not to be the goal the heuristic will greatly underestimate the cost to reach the actual best station. Which makes A* perform less than optimal.

No you can't just use the distance to the closest goal. If you do that you break the algorithm and are no longer guaranteed to get the best path to a goal.

mrvn wrote: Thu Oct 17, 2019 10:36 am Another option is to do multiple A* in parallel, one per goal. If you do a bidirectional search you already have 2 searches in parallel. It's not much harder to extend that to N+1 searches. Just takes a bit more memory.

It does not take just a bit more memory. It also takes N+1 times more time. We are trying to do less work not more. And a bidirectional search is not the same as two searches in parallel.

mrvn · Post by **mrvn** » Mon Oct 21, 2019 9:51 am

PunPun wrote: Sat Oct 19, 2019 6:21 pm
mrvn wrote: Thu Oct 17, 2019 10:36 amFor double headed trains you search with A* starting at the destinations. Then you expand the paths from each station backwards till one of them hits the segment the train is in. Why would you need to search twice?
Because the ends are not neccessarily in the same segment. If you have a 100+ car long train the ends can be really far from eachother.

Right. The distance heuristic would be from the nose/tail of the train and differ quite a bit for a 100+ car train.

PunPun wrote: Sat Oct 19, 2019 6:21 pm
mrvn wrote: Thu Oct 17, 2019 10:36 ambidirectional would search the whole network twice.
I think you misunderstand how a bidirectional search works. When one search reaches a node that the other search has already seen the algorithm terminates and combines the two searches to make the path. It is a regularly used optimization for many cases. It would never search the network more than once.

No, I understand that part. But with there being no path the two searches would never hit the same segment going in the same direction. They can reach every single segment and always be in opposing driving directions.

Ok, to be fair combined they hit every possible state (segment + direction) exactly once.

PunPun wrote: Sat Oct 19, 2019 6:21 pm
mrvn wrote: Thu Oct 17, 2019 10:36 am You can also do A* towards multiple goals. The problem there is the heuristic function. You have to estimate the distance to the goal, which comes down to distance to the closest station. If that station turns out not to be the goal the heuristic will greatly underestimate the cost to reach the actual best station. Which makes A* perform less than optimal.
No you can't just use the distance to the closest goal. If you do that you break the algorithm and are no longer guaranteed to get the best path to a goal.

For A* to work the heuristic must never report a distance larger than what it actually is. As long as that remains true A* will find the shortest path. The better the heuristic the faster it finds it usually. A heuristic of H() = min([<distance to each goal>]) fits the requirement.

Note: You don't pick the closest goal at the start and then use the distance to that. Every time the heuristic has to find the closest goal from the position it is at. Otherwise I agree with you. If you predetermine the closest goal it will break A*.

PunPun wrote: Sat Oct 19, 2019 6:21 pm
mrvn wrote: Thu Oct 17, 2019 10:36 am Another option is to do multiple A* in parallel, one per goal. If you do a bidirectional search you already have 2 searches in parallel. It's not much harder to extend that to N+1 searches. Just takes a bit more memory.
It does not take just a bit more memory. It also takes N+1 times more time. We are trying to do less work not more. And a bidirectional search is not the same as two searches in parallel.

Is there a way to record all the train path finding the game does for say 10 minutes in a megabase? Might be helpful to run the same set of path finding through the various improvements suggested and see how many nodes in the graph each one touches.

PunPun · Post by **PunPun** » Mon Oct 21, 2019 5:57 pm

mrvn wrote: Mon Oct 21, 2019 9:51 amFor A* to work the heuristic must never report a distance larger than what it actually is. As long as that remains true A* will find the shortest path. The better the heuristic the faster it finds it usually. A heuristic of H() = min([<distance to each goal>]) fits the requirement.

I was too tired. You are correct it does find the shortest path. But if you have two lists of points and you need to connect any one point from one list to any one point on the other list with the shortest path. You explore less nodes if you choose the bigger list as the startnodes and the smaller list as goals.

Factorio Forums

0.17.xx optimizing base for ups

Re: 0.17.xx optimizing base for ups

Re: 0.17.xx optimizing base for ups

Re: 0.17.xx optimizing base for ups

Re: 0.17.xx optimizing base for ups

Re: 0.17.xx optimizing base for ups

Re: 0.17.xx optimizing base for ups

Re: 0.17.xx optimizing base for ups

Re: 0.17.xx optimizing base for ups

Re: 0.17.xx optimizing base for ups

Re: 0.17.xx optimizing base for ups

Re: 0.17.xx optimizing base for ups

Re: 0.17.xx optimizing base for ups

Re: 0.17.xx optimizing base for ups

Re: 0.17.xx optimizing base for ups

Re: 0.17.xx optimizing base for ups

Re: 0.17.xx optimizing base for ups

Re: 0.17.xx optimizing base for ups

Re: 0.17.xx optimizing base for ups

Re: 0.17.xx optimizing base for ups

Re: 0.17.xx optimizing base for ups