Parrallel processing in games & applications

Post all other topics which do not belong to any other category.
PunkSkeleton
Long Handed Inserter
Long Handed Inserter
Posts: 86
Joined: Sun Oct 09, 2016 2:10 pm
Contact:

Re: Parrallel processing in games & applications

Post by PunkSkeleton »

First of all check how UltraSPARC T5 CPUs are made - they have 4x as many cores as ALUs. Why? Because in most applications memory latency is the main problem. This is a perfect example of how multi-threading helps to deal with memory latency issues which seem to be a main performance bottleneck in Factorio.
Next: multi-threading belts is probably one of the most trivial things in the game as you have "millions" of completely independent belt networks in most factories. Why wouldn't you want to do those independent belt updates in parallel unless all the belt update does not happen in consistent point during frame cycle. Synchronization? Hell no, just make each thread update modulo 0-<number of threads> belt network. The only synchronization is: "I'm finished with belts, let's proceed with other updates". Will this make one thread working longer then the other? Yes, but it is simple implementation and it will give some gains in performance. You can also somehow sort the networks by size and use some more clever work division (biggest then 8th biggest in thread 1, 2nd biggest then 7th biggest in thread 2) but this is just more optimization.
Also belt updates seem not so easy to multi-thread by area especially with belt optimization. What happens on one side of the belt often affects what happens on the other end. There is no limit in effect distance here, at least I don't see it while playing the game.

Multi-thread by area: make them large, especially when using scheduler. 64x64 isn't a big size at all, let those threads work for a while. You don't need performance gains for small factories - those already work perfectly fine. Also there are two approaches:
1. Everything behaves exactly the same as during single-threaded update. That means you need some complicated logic on cross border updates. This is something I would do in my work as I absolutely cannot change the result of the program by making it multi-threaded. In this case you really want to have as big areas as possible.
2. Accept that multi-threaded version of the game works slightly differently. This seems to be what Harkonnen had done and it is fine for games. But this is also harder to debug in case of problems.
Rseding91
Factorio Staff
Factorio Staff
Posts: 14735
Joined: Wed Jun 11, 2014 5:23 am
Contact:

Re: Parrallel processing in games & applications

Post by Rseding91 »

PunkSkeleton wrote:First of all check how UltraSPARC T5 CPUs are made - they have 4x as many cores as ALUs. Why? Because in most applications memory latency is the main problem. This is a perfect example of how multi-threading helps to deal with memory latency issues which seem to be a main performance bottleneck in Factorio.
Next: multi-threading belts is probably one of the most trivial things in the game as you have "millions" of completely independent belt networks in most factories. Why wouldn't you want to do those independent belt updates in parallel unless all the belt update does not happen in consistent point during frame cycle. Synchronization? Hell no, just make each thread update modulo 0-<number of threads> belt network. The only synchronization is: "I'm finished with belts, let's proceed with other updates". Will this make one thread working longer then the other? Yes, but it is simple implementation and it will give some gains in performance. You can also somehow sort the networks by size and use some more clever work division (biggest then 8th biggest in thread 1, 2nd biggest then 7th biggest in thread 2) but this is just more optimization.
Also belt updates seem not so easy to multi-thread by area especially with belt optimization. What happens on one side of the belt often affects what happens on the other end. There is no limit in effect distance here, at least I don't see it while playing the game.

Multi-thread by area: make them large, especially when using scheduler. 64x64 isn't a big size at all, let those threads work for a while. You don't need performance gains for small factories - those already work perfectly fine. Also there are two approaches:
1. Everything behaves exactly the same as during single-threaded update. That means you need some complicated logic on cross border updates. This is something I would do in my work as I absolutely cannot change the result of the program by making it multi-threaded. In this case you really want to have as big areas as possible.
2. Accept that multi-threaded version of the game works slightly differently. This seems to be what Harkonnen had done and it is fine for games. But this is also harder to debug in case of problems.
  • Belts cascade update other belts in any connected sequence otherwise they would trip over each other as they try to move items
  • Belts activate inserters, loaders, mining drills, and other belts that are waiting on the given belt
  • Entities going active/inactive changes the place the entity sits in the list of active entities for a given chunk
  • Entities going active/inactive changes the chunks active state changing where it sits in the list of active chunks on the surface
So, you can't "... just make each thread update modulo 0-<number of threads> belt network." You would need to ensure that all operations happen in a deterministic way (regardless of thread count) without race conditions. And that's if it even helps in the end since there's a non-zero cost to start a thread (or start it working on something) that you don't pay when you do all of the work asynchronously.

Not to say it's impossible and not to say it wouldn't help but it's nowhere near as simple as you're trying to make it sound.
If you want to get ahold of me I'm almost always on Discord.
hoho
Filter Inserter
Filter Inserter
Posts: 681
Joined: Sat Jan 18, 2014 11:23 am
Contact:

Re: Parrallel processing in games & applications

Post by hoho »

Having multiple threads in flight per core on Sparc makes sense. Not so much on a x86 CPU.

Most obvious difference is that Sparc has almost an order of magnitude higher memory throughput.
Second, it doesn't have to context switch to wait for data to be pulled into CPU/caches. Context switch pollutes the cache - takes time.
User avatar
MeduSalem
Smart Inserter
Smart Inserter
Posts: 1686
Joined: Sun Jun 08, 2014 8:13 pm
Contact:

Re: Parrallel processing in games & applications

Post by MeduSalem »

hoho wrote:Having multiple threads in flight per core on Sparc makes sense. Not so much on a x86 CPU.

Most obvious difference is that Sparc has almost an order of magnitude higher memory throughput.
Second, it doesn't have to context switch to wait for data to be pulled into CPU/caches. Context switch pollutes the cache - takes time.
SPARC CPUs also didn't have Out-of-Order Execution for quite a long time compared to x86 CPUs and others, which also made it very reasonable to fill the pipeline with instructions from multiple threads to avoid the CPU cores from becoming idle during long stalls from cache misses and branch mispredictions and other stuff that causes huge delays.

If I remember right the SPARC T4 from as late as 2011 was the first SPARC CPU from Sun/Oracle ever to have OoOE.
mrvn
Smart Inserter
Smart Inserter
Posts: 5969
Joined: Mon Sep 05, 2016 9:10 am
Contact:

Re: Parrallel processing in games & applications

Post by mrvn »

FrodoOf9Fingers wrote:Perhaps there's been some confusion here.

I am not saying that implementing multi-threading is easy. I am not saying that we should implement multi-threading. I know it's hard. But it's not as hard as many people seem to think. Yes, race conditions need to be considered. Yes, there is overhead from threads. But this notion of "games need to be written from the ground up with parallelism in mind" is simply false and that was what I was referring to.
It's all interconnected back and forth in a way that doesn't lend itself to running each part on a different core.
Considering dependencies during threading is not a new problem, and has been solved many times. It's NOT an easy problem, but it's NOT impossible.

I think we are generally on the same page. Do you not think that:
Multi-threading factorio is hard?
It may have alot, very little, or negative impact on performance depending on implementation? (As we've been told, it's very little, but positive).

I've gotten my questions answered though. Thanks for your time! I am always open to friendly debate through PM though. :)
By the way, parallel processing on a single machine wouldn't even be my goal, specifically for multiplayer. At the moment every system in a game computes the whole world. Usualy each player is only interested in the part where he stands. Sometimes the part he zooms in on the map. So why not split the world into parts and distribute them across all players and then do some duplicate computations for the parts the player is looking at right that moment. Obviously the amount of network traffic would need to go way up to transmit updates between systems and hard to design right. But distributing the game like this and running at the speed near the sum of all systems instead of the speed of the slowest would be awesome.
orzelek
Smart Inserter
Smart Inserter
Posts: 3924
Joined: Fri Apr 03, 2015 10:20 am
Contact:

Re: Parrallel processing in games & applications

Post by orzelek »

mrvn wrote:
FrodoOf9Fingers wrote:Perhaps there's been some confusion here.

I am not saying that implementing multi-threading is easy. I am not saying that we should implement multi-threading. I know it's hard. But it's not as hard as many people seem to think. Yes, race conditions need to be considered. Yes, there is overhead from threads. But this notion of "games need to be written from the ground up with parallelism in mind" is simply false and that was what I was referring to.
It's all interconnected back and forth in a way that doesn't lend itself to running each part on a different core.
Considering dependencies during threading is not a new problem, and has been solved many times. It's NOT an easy problem, but it's NOT impossible.

I think we are generally on the same page. Do you not think that:
Multi-threading factorio is hard?
It may have alot, very little, or negative impact on performance depending on implementation? (As we've been told, it's very little, but positive).

I've gotten my questions answered though. Thanks for your time! I am always open to friendly debate through PM though. :)
By the way, parallel processing on a single machine wouldn't even be my goal, specifically for multiplayer. At the moment every system in a game computes the whole world. Usualy each player is only interested in the part where he stands. Sometimes the part he zooms in on the map. So why not split the world into parts and distribute them across all players and then do some duplicate computations for the parts the player is looking at right that moment. Obviously the amount of network traffic would need to go way up to transmit updates between systems and hard to design right. But distributing the game like this and running at the speed near the sum of all systems instead of the speed of the slowest would be awesome.
Thats a pretty interesting idea that would work for LAN play potentially.
If you go anywhere outside it you will die by latency and bandwith.
Imagine one player out on one side of map fighting biters and coming back to 10 thousand entity factory that he did not simulate needed to load state of all those objects from network.
It still keeps the main problem of object interaction and dependencies. Any cable/belt/train going from one place to another would make this approach impossible or very slow/latency based.
PunkSkeleton
Long Handed Inserter
Long Handed Inserter
Posts: 86
Joined: Sun Oct 09, 2016 2:10 pm
Contact:

Re: Parrallel processing in games & applications

Post by PunkSkeleton »

Rseding91 wrote:
  • Belts cascade update other belts in any connected sequence otherwise they would trip over each other as they try to move items
That's why I suggested to divide by belt systems (interconnected sequences as you call them).
Rseding91 wrote:
  • Belts activate inserters, loaders, mining drills, and other belts that are waiting on the given belt
IMHO such activations should be always scheduled for the next frame (belt knows when the item will arrive so it can schedule arrival in advance). In general all actions should be scheduled for the next frame as this makes the code simpler. It would probably change the game in borderline cases like:

Code: Select all

Current if I understand correctly:
[list=][*]Assembly Machine asks all inserters around to bring it 1 iron plate.
[*]Inserters ask belts if there are any available to pick (in sequence).
[*]First inserter that gets a positive response picks an iron plate from the belt and signals the Assembly machine that it has got it.
[*]The assembly machine tells all inserters around that it does no longer wait for the iron plate.[/list]

Code: Select all

"clocked system":
[list=][*]Assembly Machine asks all inserters around to bring it 1 iron plate.
[*]ADVANCE FRAME
[*]Inserters ask belts if there are any available to pick (all of them, order doesn't matter anymore).
[*]ADVANCE FRAME
[*]All belts that have an Iron plate ready for those inserters put the plates into inserters arms. In case if belts have 1 plate and more "clients" conflicts should be resolved by the belt itself by a deterministic algorithm (for example the inserter which is on the same side of the belt has a priority, short handed inserted have priority etc.).
[*]ADVANCE FRAME
[*]The inserters tell assembly machine that they've got the plate (each with a plate does it).
[*]ADVANCE FRAME
[*]The assembly machine tells all inserters around that it does no longer wait for the iron plate.
[/list]
The only difference is that it is now possible to get more plates than necessary if someone is using multiple inserters for a given material. But who cares about that when using multiple inserters?
The delay effects of the new algorithm are obviously not difficult to correct.
Rseding91 wrote:
  • Entities going active/inactive changes the place the entity sits in the list of active entities for a given chunk
  • Entities going active/inactive changes the chunks active state changing where it sits in the list of active chunks on the surface
This is exactly what I would call "state machine with infinite number of transitions". In programming I always wanted the number of transitions to be finite as this makes the code easier to debug and understand.
Rseding91 wrote: So, you can't "... just make each thread update modulo 0-<number of threads> belt network." You would need to ensure that all operations happen in a deterministic way (regardless of thread count) without race conditions. And that's if it even helps in the end since there's a non-zero cost to start a thread (or start it working on something) that you don't pay when you do all of the work asynchronously.

Not to say it's impossible and not to say it wouldn't help but it's nowhere near as simple as you're trying to make it sound.
I understand you, I might have some ideas, I might do programs for dozens of threads but I do not know how Factorio works internally. But maybe, maybe you will be able to get some good ideas from what I'm writing and that'll make me happy.

As for UltraSPARC: the data sheet of T5 processor says each processor has 4*12.8Gb/sec = 51.2 Gb/sec. Unless they're using a different units that's not an order of magnitude more than 20 Gb/sec stated here and even less than 101 Gb/sec stated for Ivy Bridge in this article: http://www.admin-magazine.com/HPC/Artic ... ith-Stream. And I'm not even talking about old processors without OOE execution, T2 is so slow in single thread that even systems 5 year older beat it without any problems in single thread performance.
So no, UltraSPARC is not as good as it seems. The shrinking market share of UltraSPARC systems confirms that. The performance of modern Xeon based server is comparable to that of T5-2 server.
mrvn
Smart Inserter
Smart Inserter
Posts: 5969
Joined: Mon Sep 05, 2016 9:10 am
Contact:

Re: Parrallel processing in games & applications

Post by mrvn »

PunkSkeleton wrote:
Rseding91 wrote:
  • Belts cascade update other belts in any connected sequence otherwise they would trip over each other as they try to move items
That's why I suggested to divide by belt systems (interconnected sequences as you call them).
Here is an idea as food for thought:

Belts are interconnected so the last tile of a belt needs to be processed first, then the second last and so till the start of the belt. Right?

Well, not always. if there is any gap in a tile of belt then the tile before that can move regardless of what the tile after it does. So you can not only split the belt systems into unconnected parts. You can also split it at any tile where the belt has gaps. Having gaps only on one side of the belt but not the other might makes things complex though. Maybe split belts by side and process sides separately.
Zavian
Smart Inserter
Smart Inserter
Posts: 1650
Joined: Thu Mar 02, 2017 2:57 am
Contact:

Re: Parrallel processing in games & applications

Post by Zavian »

If we are going to keep discussing this, then personally I would suggest dividing the map up by outpost (here your main base is considered as an outpost).

Splitting by outposts that are far enough apart that their only interactions are via rail network, power network, and circuit network seems like a more natural division that should be compatible with the belt optimizations coming in 0.16 (and probably most other performance optimizations).

If you require that outposts need to be isolated from each others belt, pipe and logistics networks, then I think you could run everything (drills + assemblers + belts + pipes + inserters + logistics bots etc) from one outpost in one hit in one thread, and be able to run multiple outposts in parallel in different threads. (Still probably need to do a little bit of combining of things like production totals at the end).

Obviously this won't benefit factories where everything is one giant factory, with no clean boundaries. But in practice most larger factories have at least mining outposts separate from the main factory. Most mega-bases seem to be split up to at least some extent. If there were known performance benefits from have multiple smaller outposts compared to one giant factory then most players with UPS problems would switch to separate outposts.

If you were going this way you would also keep track of how long each outpost takes to process on average, and schedule the slowest outposts first.
mrvn
Smart Inserter
Smart Inserter
Posts: 5969
Joined: Mon Sep 05, 2016 9:10 am
Contact:

Re: Parrallel processing in games & applications

Post by mrvn »

Zavian wrote:If we are going to keep discussing this, then personally I would suggest dividing the map up by outpost (here your main base is considered as an outpost).

Splitting by outposts that are far enough apart that their only interactions are via rail network, power network, and circuit network seems like a more natural division that should be compatible with the belt optimizations coming in 0.16 (and probably most other performance optimizations).

If you require that outposts need to be isolated from each others belt, pipe and logistics networks, then I think you could run everything (drills + assemblers + belts + pipes + inserters + logistics bots etc) from one outpost in one hit in one thread, and be able to run multiple outposts in parallel in different threads. (Still probably need to do a little bit of combining of things like production totals at the end).

Obviously this won't benefit factories where everything is one giant factory, with no clean boundaries. But in practice most larger factories have at least mining outposts separate from the main factory. Most mega-bases seem to be split up to at least some extent. If there were known performance benefits from have multiple smaller outposts compared to one giant factory then most players with UPS problems would switch to separate outposts.

If you were going this way you would also keep track of how long each outpost takes to process on average, and schedule the slowest outposts first.
What is an outpost? Is it any set of chunks that are connected by belts, logistic networks or have an entity cross their border other than rails? [I assume you won't split chunks.]

As for keeping track of how long each outpost takes... That might be problematic. The speed would depend on the players system and would fluctuate randomly depending on other processes on the system. You would have to be careful too keep things deterministic so you get the same result no matter what order the outposts are processed. Main problem I see is anything that uses a random number (random number combinator for example).
Zavian
Smart Inserter
Smart Inserter
Posts: 1650
Joined: Thu Mar 02, 2017 2:57 am
Contact:

Re: Parrallel processing in games & applications

Post by Zavian »

mrvn wrote: What is an outpost? Is it any set of chunks that are connected by belts, logistic networks or have an entity cross their border other than rails? [I assume you won't split chunks.]

As for keeping track of how long each outpost takes... That might be problematic. The speed would depend on the players system and would fluctuate randomly depending on other processes on the system. You would have to be careful too keep things deterministic so you get the same result no matter what order the outposts are processed. Main problem I see is anything that uses a random number (random number combinator for example).
Most of this idea is based on viewtopic.php?f=5&t=39893&start=60#p238247 . I'm just proposing to partition the map differently. I also think you could probably multithread most of the biter movements, (based on 9 sweeps with problem chunks being added to a list that is then processed in a single thread after all the biters in other chunks have moved) but that is a discussion for another day.

Conceptually you could split chunks if you wanted to, but it would probably be simpler to implement it based on chunks. So each outpost would consist of a set of adjacent chunks. And between each outpost there would be at least one empty chunk. If 2 outposts grow such that they touch each other then you merge them into a single outpost. In order to make sure this merging was deterministic, construction bots would probably run in a single threaded portion of the code. If you wanted to update logistics bots as part of the outpost update, then anytime 2 separate logistics networks connected, you would also need to merge their outposts. Alternatively logistics bot updates could be multithreaded separately from outposts.

(If two logistics networks don't touch then they can't affect each other, so you should be able to update 2 different networks on 2 different threads simultaneously. Edit: 2 logistics networks that don't touch could conceivably both affect an inserter that touches 2 chests, one in each logistic network. However you should still be able to update bots with an outpost, because in this case the entire outpost and it's logistic bot networks is actually processed by a single thread. If you make sure that separate logistics bot networks are really separate by a chunk, then they shouldn't be able to interact. Remember that I suggested construction bots be handled separately in the single threaded portion of the tick).

With an empty chunk between outposts underground pipes and belts can't bridge the gap between 2 outposts. Even mods introducing longer underground pipes belts/pipes or warehouses should be fine, up to a length of about 30 tiles. For trains which could conceivably bridge 2 outposts, simply track contents per wagon, set a flag on the train to indicate that it's contents have changed, and then during the single threaded portion of the tick, iterate over all its wagons and update the contents of the train. (Or simply always calculate train contents from it's wagon's contents whenever a trains contents is required).

With this architecture the order two outposts are processed shouldn't matter. They can't affect each other. (Indeed if I have missed anything they can do which could affect another outpost, then that interaction needs to handled somehow, perhaps by moving it to the single threaded portion of the tick). So it shouldn't matter if 2 different clients process 2 outposts in a different order. The reason for wanting to run the slowest outpost first is so that you don't end up with one outpost that takes 10ms starting toward the end of the update. Ideally you want all the threads to finish at about the same time.
mrvn
Smart Inserter
Smart Inserter
Posts: 5969
Joined: Mon Sep 05, 2016 9:10 am
Contact:

Re: Parrallel processing in games & applications

Post by mrvn »

For this discussion I consider the size of an inserter to include the pickup and drop off points. So an inserter at the edge of a chunk connects the next chunk too since it can affect it. So an inserter reaching across 2 logistic networks would make that one outpost in my mind.

As for train contents. I would assume whenever a train leaves one outpost and enters another it gets handed over to the other thread. That could safely happen between ticks, especially with your suggested neutral zone of one chunk between outpost. I'm not sure what you are talking about concerning train content. I don't see anything reading train content while the train is in motion. So you would read it when the train crosses an inventory sensor or reaches a station just like now.

Constrcution robots... If two outpost are far enough away for the construction network, which I ment by logistic networks, not to touch you could run them in parallel too. I think you actually would have to make that a condition of an outpost since construction robots interact with chests and roboports.

In conclusion I think this wouldn't be so helpful for most games. Most people have one large central factory and then a few mining outposts that are largely irrelevant to the cpu requirements I think. I would hate for the game performance to dictate how people design their factory even more than it already does.
Jap2.0
Smart Inserter
Smart Inserter
Posts: 2417
Joined: Tue Jun 20, 2017 12:02 am
Contact:

Re: Parrallel processing in games & applications

Post by Jap2.0 »

A few mining outposts won't affect performance that much.
There are 10 types of people: those who get this joke and those who don't.
Zavian
Smart Inserter
Smart Inserter
Posts: 1650
Joined: Thu Mar 02, 2017 2:57 am
Contact:

Re: Parrallel processing in games & applications

Post by Zavian »

mrvn wrote:As for train contents. I would assume whenever a train leaves one outpost and enters another it gets handed over to the other thread. That could safely happen between ticks, especially with your suggested neutral zone of one chunk between outpost. I'm not sure what you are talking about concerning train content. I don't see anything reading train content while the train is in motion. So you would read it when the train crosses an inventory sensor or reaches a station just like now.
I was actually referring to the possible (admittedly unlikely) situation where one end of a long train is in one outpost (with inserters interacting with a wagon) and the other end was in another outpost (again with inserters interacting with a wagon). THere are lots of ways of handling this situation, I was just suggesting an alternative that I think might perform better than what Harkonnen suggested in this post viewtopic.php?f=5&t=39893&start=60#p238247 .
mrvn wrote:Constrcution robots... If two outpost are far enough away for the construction network, which I ment by logistic networks, not to touch you could run them in parallel too. I think you actually would have to make that a condition of an outpost since construction robots interact with chests and roboports.
When construction robots place things they can cause the outpost to expand and potentially merge with adjacent outpost(s), if they place roboports they can cause two logistics networks to merge etc. The merge code needs to be deterministic across multiple machines, and the easiest way to achieve that is probably to handle that in the single-threaded portion of the tick.
mrvn wrote:In conclusion I think this wouldn't be so helpful for most games. Most people have one large central factory and then a few mining outposts that are largely irrelevant to the cpu requirements I think. I would hate for the game performance to dictate how people design their factory even more than it already does.
Whilst this won't help with every game, at least for me the longer I play and the larger my base gets the more stuff is done away from my initial base in decentralised outposts. Initially mining outposts then I'll add decentralised smelting, then decentralised science/circuit production. The larger the factory is, the more stuff I do away from the initial base. The current game already leads me to build like that because smaller isolated bot networks tend to need less bots, and hence less cpu cycles than one large integrated factory.

At least until the belt improvements drop in 0.16, beacons + modules + bots is a more cpu friendly factory architecture for large factories. Atm I can't even achieve 150 of all science packs per minute in a mainbus belt-based factory layout without ups dropping below 60. (I have an old i7 860 cpu). Yet I can get around 900 science/minute or more from a decent decentralized beaconed + moduled + bots setup. Even after the belt improvements drop in 0.16, I wouldn't want to build a belt based mainbus factory for 900 science/minute. That would be require about 180 blue belts of iron and copper, and 38 blue belts of green circuits. (Ok less belts would be needed if I used production modules, but the bus would still be impractically wide). Easier to use a decentralised outposts + trains layout. (Also note that my next factory I'll probably aim for 1800-2000 of each science per minute. At those scales even with productivity modules everywhere, and even if belt based item movement was more cpu efficient than bots, I would still want decentralised smelting and a train based network because a 200+ belt bus is simply too unwieldy to want to deal with).

I do agree that ideally game performance shouldn't dictate how people design their factories. But the reality already is that once you get over about 150 science/minute then atm you need to start paying attention to the ups considerations of your design, or your ups will start to drop. That means in multiplayer some players will be unable to keep up, and either you slow the game down for everybody or those players drop out. Also some players want to build the largest fastest factory they can. For some (many?) players that is part of the appeal of the game. (Unfortunately this also mean that every performance improvement the devs make just leads to bigger factories that are still ups limited).

So for larger base I think you already want the sort of decentralised outpost based layout that would be ideal for this type of parallelization. It's less than ideal in that it won't help every base and having performance considerations dictate factory design is also less than ideal, but I think performance/ups considerations are already critical for running large factories at decent ups, and that this isn't a regression that way.
User avatar
MrHick
Inserter
Inserter
Posts: 30
Joined: Fri Mar 17, 2017 6:35 am
Contact:

Re: Parrallel processing in games & applications

Post by MrHick »

Let me the smartest man alive explain to the devs that work on this game every day how they should design and multi-thread it.

Multi-threading is when you put X2 next to a thread and it makes x2 threads, if you put x8 next to a thread it maxes x8 threads and that is it.
Easy and the game runs x8 times faster and better and bigger and smarter and all of it.

If you need more info let me know I been told something about this internet google thing that knows it all.
mrvn
Smart Inserter
Smart Inserter
Posts: 5969
Joined: Mon Sep 05, 2016 9:10 am
Contact:

Re: Parrallel processing in games & applications

Post by mrvn »

Zavian wrote:
mrvn wrote:As for train contents. I would assume whenever a train leaves one outpost and enters another it gets handed over to the other thread. That could safely happen between ticks, especially with your suggested neutral zone of one chunk between outpost. I'm not sure what you are talking about concerning train content. I don't see anything reading train content while the train is in motion. So you would read it when the train crosses an inventory sensor or reaches a station just like now.
I was actually referring to the possible (admittedly unlikely) situation where one end of a long train is in one outpost (with inserters interacting with a wagon) and the other end was in another outpost (again with inserters interacting with a wagon). THere are lots of ways of handling this situation, I was just suggesting an alternative that I think might perform better than what Harkonnen suggested in this post viewtopic.php?f=5&t=39893&start=60#p238247 .
Oh, right. But then you would have to design a train station with a >32m gap. So in a 8 train car e.g. unload the first and last car only. That would screw things up. Maybe wagons of a train should be moved between threads individually for the purpose of interactions.
Zavian wrote:
mrvn wrote:Constrcution robots... If two outpost are far enough away for the construction network, which I ment by logistic networks, not to touch you could run them in parallel too. I think you actually would have to make that a condition of an outpost since construction robots interact with chests and roboports.
When construction robots place things they can cause the outpost to expand and potentially merge with adjacent outpost(s), if they place roboports they can cause two logistics networks to merge etc. The merge code needs to be deterministic across multiple machines, and the easiest way to achieve that is probably to handle that in the single-threaded portion of the tick.
I don't think that is necessary. They only need to set a flag that a merge happened. Then when the current tick is done you use a single thread to process all the merges, compute the new set of outposts, before starting the next tick.
mrvn wrote:In conclusion I think this wouldn't be so helpful for most games. Most people have one large central factory and then a few mining outposts that are largely irrelevant to the cpu requirements I think. I would hate for the game performance to dictate how people design their factory even more than it already does.
Whilst this won't help with every game, at least for me the longer I play and the larger my base gets the more stuff is done away from my initial base in decentralised outposts. Initially mining outposts then I'll add decentralised smelting, then decentralised science/circuit production. The larger the factory is, the more stuff I do away from the initial base. The current game already leads me to build like that because smaller isolated bot networks tend to need less bots, and hence less cpu cycles than one large integrated factory.

At least until the belt improvements drop in 0.16, beacons + modules + bots is a more cpu friendly factory architecture for large factories. Atm I can't even achieve 150 of all science packs per minute in a mainbus belt-based factory layout without ups dropping below 60. (I have an old i7 860 cpu). Yet I can get around 900 science/minute or more from a decent decentralized beaconed + moduled + bots setup. Even after the belt improvements drop in 0.16, I wouldn't want to build a belt based mainbus factory for 900 science/minute. That would be require about 180 blue belts of iron and copper, and 38 blue belts of green circuits. (Ok less belts would be needed if I used production modules, but the bus would still be impractically wide). Easier to use a decentralised outposts + trains layout. (Also note that my next factory I'll probably aim for 1800-2000 of each science per minute. At those scales even with productivity modules everywhere, and even if belt based item movement was more cpu efficient than bots, I would still want decentralised smelting and a train based network because a 200+ belt bus is simply too unwieldy to want to deal with).

[/quote]

But do you have >32m gaps between those decentralised parts? The other thing is power. Do you have a single electrical grid? Power consumption would have to be coordinated between threads.
Zavian
Smart Inserter
Smart Inserter
Posts: 1650
Joined: Thu Mar 02, 2017 2:57 am
Contact:

Re: Parrallel processing in games & applications

Post by Zavian »

mrvn wrote:
Zavian wrote: When construction robots place things they can cause the outpost to expand and potentially merge with adjacent outpost(s), if they place roboports they can cause two logistics networks to merge etc. The merge code needs to be deterministic across multiple machines, and the easiest way to achieve that is probably to handle that in the single-threaded portion of the tick.
I don't think that is necessary. They only need to set a flag that a merge happened. Then when the current tick is done you use a single thread to process all the merges, compute the new set of outposts, before starting the next tick.
You would still have edge cases. Consider for example a scenario with two outpost which have overlapping roboport construction zones. Now what happens if someone orders the robots to remove some trees in the overlap zone. On machine A outpost 1 dispatches it's robots first to remove the trees. One machine B, outpost 2 sends it's robots. Instant desync. Yes you could handle that by requiring that construction zones don't overlap, but that would double the needed space between roboports, if players want to keep the outposts separate. I'm sure you could handle it other ways as well. But given that in most large bot-based factories there are ten times as many logistics bots as construction bots, the logistics bots take more processing time, and hence the cpu time saved by multithreading them is higher. The devs can always multithread construction bots later if desired.
mrvn wrote: But do you have >32m gaps between those decentralised parts? The other thing is power. Do you have a single electrical grid? Power consumption would have to be coordinated between threads.
[/quote]
Typically most of my outposts are more than 60 tiles apart. As for power Harkonnen didn't suggest that was a problem. (I'm guessing that they already track how much power entities want. Then they work out how much power is available, and just tell every entity you got xx% of your power needs. Possibly with a bit more book-keeping to account for entities suddenly switching on and off).

But anyway, these are mostly implementation details we are discussing at this point. The devs are perfectly able to sort such things out without our help, if they choose to implement my suggested strategy.
mrvn
Smart Inserter
Smart Inserter
Posts: 5969
Joined: Mon Sep 05, 2016 9:10 am
Contact:

Re: Parrallel processing in games & applications

Post by mrvn »

Zavian wrote:
mrvn wrote:
Zavian wrote: When construction robots place things they can cause the outpost to expand and potentially merge with adjacent outpost(s), if they place roboports they can cause two logistics networks to merge etc. The merge code needs to be deterministic across multiple machines, and the easiest way to achieve that is probably to handle that in the single-threaded portion of the tick.
I don't think that is necessary. They only need to set a flag that a merge happened. Then when the current tick is done you use a single thread to process all the merges, compute the new set of outposts, before starting the next tick.
You would still have edge cases. Consider for example a scenario with two outpost which have overlapping roboport construction zones. Now what happens if someone orders the robots to remove some trees in the overlap zone. On machine A outpost 1 dispatches it's robots first to remove the trees. One machine B, outpost 2 sends it's robots. Instant desync. Yes you could handle that by requiring that construction zones don't overlap, but that would double the needed space between roboports, if players want to keep the outposts separate. I'm sure you could handle it other ways as well. But given that in most large bot-based factories there are ten times as many logistics bots as construction bots, the logistics bots take more processing time, and hence the cpu time saved by multithreading them is higher. The devs can always multithread construction bots later if desired.
That's not actually what would happen.

Remember this would be multithreaded and usually run on multiple cores. So outpost 1 dispatches one robot to remove a tree and outpost 2 in the other thread also dispatches a robot to remove the same tree. Much bigger problem. You would need some mechanism (locks? urgs) to make sure only one robot is dispatched. And then you can worry about making that deterministic.

But I don't think that would be a problem with how bots work at the moment. Every tick only one ghost is considered overall. Make the determination which logistic network it chooses to fulfill the ghost deterministic (as it already should be) and everything is fine.
TheTom
Fast Inserter
Fast Inserter
Posts: 186
Joined: Tue Feb 09, 2016 8:33 am
Contact:

Re: Parrallel processing in games & applications

Post by TheTom »

mrvn wrote:
Zavian wrote:
mrvn wrote:
Zavian wrote: When construction robots place things they can cause the outpost to expand and potentially merge with adjacent outpost(s), if they place roboports they can cause two logistics networks to merge etc. The merge code needs to be deterministic across multiple machines, and the easiest way to achieve that is probably to handle that in the single-threaded portion of the tick.
I don't think that is necessary. They only need to set a flag that a merge happened. Then when the current tick is done you use a single thread to process all the merges, compute the new set of outposts, before starting the next tick.
You would still have edge cases. Consider for example a scenario with two outpost which have overlapping roboport construction zones. Now what happens if someone orders the robots to remove some trees in the overlap zone. On machine A outpost 1 dispatches it's robots first to remove the trees. One machine B, outpost 2 sends it's robots. Instant desync. Yes you could handle that by requiring that construction zones don't overlap, but that would double the needed space between roboports, if players want to keep the outposts separate. I'm sure you could handle it other ways as well. But given that in most large bot-based factories there are ten times as many logistics bots as construction bots, the logistics bots take more processing time, and hence the cpu time saved by multithreading them is higher. The devs can always multithread construction bots later if desired.
That's not actually what would happen.

Remember this would be multithreaded and usually run on multiple cores. So outpost 1 dispatches one robot to remove a tree and outpost 2 in the other thread also dispatches a robot to remove the same tree. Much bigger problem. You would need some mechanism (locks? urgs) to make sure only one robot is dispatched. And then you can worry about making that deterministic.

But I don't think that would be a problem with how bots work at the moment. Every tick only one ghost is considered overall. Make the determination which logistic network it chooses to fulfill the ghost deterministic (as it already should be) and everything is fine.
Actually no.

You could deal with 2 dispatches. Hey, inefficient.

Or you would merge them. Single threaded again due to overlapping zones. Simple. No edge case to start with.

Same with the train ;)

* Make the merge zone around a train station large.
* Whenever needed, expand it to merge in more. Want a super long train -> super large merge zone, all one thread. Never reset.

Those are edge cases that practically do not matter. You can just fold them all into larger "thread zones".
shadetheartist
Inserter
Inserter
Posts: 23
Joined: Mon Sep 18, 2017 10:14 pm
Contact:

Re: Parrallel processing in games & applications

Post by shadetheartist »

BenSeidel wrote:As always, if anyone has any questions or comments I'm happy to discuss them.
You say that this "clock based multi-threading" design pattern is undiscovered by the the subscribers of popular programming culture. Is that true? If so, is this post a revealing of cataclysmic changes to the meta of software as we know it? Or is there a backwater wiki page somewhere you're referencing.

Your argument comparing 90's game programming where "the code's fine, your computer is crap" to a prediction of the future of development has sparked my curiosity. It seems like if one wants to get a foot in the door to development jobs in the future one might want to consider your view on parallelism.
Post Reply

Return to “General discussion”