Page 16 of 22

Re: Parrallel processing in games & applications

Posted: Fri Jan 31, 2020 1:37 pm
by Olacken
BlueTemplar wrote:
Thu Jan 30, 2020 4:32 pm
Speaking of which, any way to have multiple Factorio instances share as much "stuff" as possible ?
(I'm especially thinking of atlases in VRAM, anything else ?)
Run a server for each one of them and acces it via only one client?
It's not much sharing

Re: Parrallel processing in games & applications

Posted: Sat Feb 01, 2020 1:13 am
by planetmaker
BlueTemplar wrote:
Thu Jan 30, 2020 4:32 pm
Speaking of which, any way to have multiple Factorio instances share as much "stuff" as possible ?
(I'm especially thinking of atlases in VRAM, anything else ?)
What would be the application of such?

Re: Parrallel processing in games & applications

Posted: Sat Feb 01, 2020 3:05 am
by BlueTemplar
One would be faking dual-screen mode - one of the screens showing map view (for instance).

Re: Parrallel processing in games & applications

Posted: Mon Feb 03, 2020 10:16 am
by mrvn
BlueTemplar wrote:
Sat Feb 01, 2020 3:05 am
One would be faking dual-screen mode - one of the screens showing map view (for instance).
At the cost of halving your game speed? No thank you. You want to share more than the VRAM with two clients running the same game. Computing the game state twice is just stupid.

Re: Parrallel processing in games & applications

Posted: Mon Feb 03, 2020 12:09 pm
by BlueTemplar
It only becomes an issue once your base is so big you can't "run double speed".
And except for the fighting part, Factorio, like other similar games is still "fine" down to ~15 FPS (YMMV).
And I *did* ask if more could be shared.
In my experience, when heavily modded, VRAM is the limiting factor. (Though 0.17 might have changed that. And depends on how great you want the game to look.)

Re: Parrallel processing in games & applications

Posted: Fri Feb 07, 2020 12:26 am
by Adamo
BlueTemplar wrote:
Thu Jan 30, 2020 4:32 pm
Speaking of which, any way to have multiple Factorio instances share as much "stuff" as possible ?
(I'm especially thinking of atlases in VRAM, anything else ?)
Has clusterio been mentioned, already? We run a cluster of factorio servers, so we're taking advantage of parallel processing this way. Clusterio allows us to share items, signals, electricity, even trains between servers. Someone made a button for transporting between worlds for the big clusterio event. The devs added a function in late version 17 to change servers for exactly this reason (and perhaps others). I made a button, too, that I added to my "spaceport edition" clusterio mod fork, but it doesn't QUITE work yet (right now it just pretends to teleport -- it wasn't a huge priority to finish it off relative to the other mods I finished off before moving states last year).

There's even strong gameplay potential here, and this is a long-term goal of mine and others, to use different scenarios across servers to have different "regions" of the same world. Right now it feels more like each server is a different planet. But for example there is a "diggy" scripted scenario which is essentially an underground cavern; it's associated with, uhh, crap, Red Something. They're on Discord. The idea could be that you can transfer from the surface of a server to its underground world in some associated regions on each server, for example. But there's no way to actually gate this, so I think many people end up feeling that it's easier to just disconnect and connect to the new server directly, if they had to actually run somewhere to change servers. Something like using different surfaces could work better, but applying scenario code (to do more to the maps than just change resource settings and the like) on a per-surface basis is time-consuming to set up; I don't think anyone has done it that way, yet.

Re: Parrallel processing in games & applications

Posted: Fri Feb 07, 2020 7:59 am
by BlueTemplar
Clusterio is awesome and much more on topic but pretty much the opposite of what I asked for (minimize the resources needed to run the exact same simulation over multiple instances running on the same computer).

Re: Parrallel processing in games & applications

Posted: Fri Feb 07, 2020 2:55 pm
by bobucles
The biggest factorio chokepoint revolves around memory access performance. Running more clones of factorio.exe would only multiply the amount of memory bandwidth that your PC requires. The synchronous nature of multiplayer requires that everyone runs the exact same simulation, so you can't truly dump any load to an offsite process. Your PC still needs to run the full map regardless.

Sharing memory resources between applications sounds like a headache and a security nightmare. Many years of software design are pushing towards doing the exact opposite of that, forcing individual applications to run in tiny pocket dimensions and be completely invisible to each other. Synchronizing data between the pocket dimensions requires a carefully constructed form of data sharing, which probably has more in common with socket connections than actual memory sharing? I wouldn't count on it happening without the core game being explicitly built for it. It's probably easier to just build a second PC, for the rare handful of users that want more factorios per factorio.

Clusterio is probably the only setup that truly breaks the game world into many smaller games. Each clusterio server can built with its own memory channels and CPU, so each server can individually expand into its own megabase. The end user is only required to simulate one node in the clusterio server at a time, so they don't feel the burden of a dozen megabases running at once. The only downside is that if one clusterio map slows down, it will be forced to affect the other maps in some way. Either they get reduced throughput from the slower map, or they are forced to sync speeds with the weakest link.

Re: Parrallel processing in games & applications

Posted: Sat Feb 08, 2020 2:15 am
by Adamo
BlueTemplar wrote:
Fri Feb 07, 2020 7:59 am
Clusterio is awesome and much more on topic but pretty much the opposite of what I asked for (minimize the resources needed to run the exact same simulation over multiple instances running on the same computer).
I'm sure it's been beat to death, but I'll throw in my thoughts. My understanding is that the *only* things that can't always be done in parallel are things that have to be matched in the time coordinate. That's essentially the same as saying "all of those things that have to be completed before the next tick". Anything else can be done in parallel without simulation issues. Anything that has to be done on time can potentially be done in parallel at least sometimes, but without knowing for sure, computational and code-creation resources have to be spent to allow the simulation to decide when to parallelize and when not to. There is a diminishing returns issue there. So it often becomes worth it to identify some of the big things that can be done in parallel, offload those, and leave it at that.

One example application of this would be if we wanted to split processing between surfaces. So the factory simulation on surface 1 uses processor 1, and surface 2 goes to processor 2. This is completely fine if neither surfaces interact with each other, ever. They don't have to be matched in time. But if, say, you wanted the pollution to cross from one surface to the other (like for the example of an underground cavern that pollutes out to the surface), suddenly you can't necessarily do that, since the pollution needs to be populated at the same moment in each simulation. So you can make decisions to hybrid this, like say that the pollution surface only has to be a rough statistical representation that over time transfers the total amount of pollution, thus the one thread only has to *roughly* keep up and there is some flexibility, but now your simulation always has to at least check.

Essentially it seems to me that it becomes difficult to parallelize much beyond the major meta-data handling and other processes that aren't directly impacting the simulation, so the pains quickly outweigh the gains.

Re: Parrallel processing in games & applications

Posted: Sun Feb 09, 2020 8:23 pm
by Nemo4809
Adamo wrote:
Sat Feb 08, 2020 2:15 am
My understanding is that the *only* things that can't always be done in parallel are things that have to be matched in the time coordinate. That's essentially the same as saying "all of those things that have to be completed before the next tick".
You forgot dependencies.

If calculation B uses the results of calculation A. You cannot do calculations A and B together. You have to do A first then B - i.e. serially.

Re: Parrallel processing in games & applications

Posted: Mon Feb 10, 2020 5:33 am
by Adamo
Nemo4809 wrote:
Sun Feb 09, 2020 8:23 pm
Adamo wrote:
Sat Feb 08, 2020 2:15 am
My understanding is that the *only* things that can't always be done in parallel are things that have to be matched in the time coordinate. That's essentially the same as saying "all of those things that have to be completed before the next tick".
You forgot dependencies.

If calculation B uses the results of calculation A. You cannot do calculations A and B together. You have to do A first then B - i.e. serially.
I don't think I did forget dependencies. The only dependencies that matter in terms of whether or not they can be split off into a new thread are ones that are along the chain of only those things that must be ready by the time the next tick hits, unless I'm mistaken?

Re: Parrallel processing in games & applications

Posted: Mon Feb 10, 2020 9:17 am
by planetmaker
Adamo wrote:
Mon Feb 10, 2020 5:33 am
I don't think I did forget dependencies. The only dependencies that matter in terms of whether or not they can be split off into a new thread are ones that are along the chain of only those things that must be ready by the time the next tick hits, unless I'm mistaken?
Consider the following: there's two inserters which can grab items from the same position of a belt in that timestep. But only one can grab them. You need to have a well-defined order for them, or in one case inserter 1 will be quicker, in another the other - and for instance in one case you build gears in the other you smelt steel.
So you need to serialize deterministically also the order of actions within this very timestep to remain deterministic overall.

Re: Parrallel processing in games & applications

Posted: Mon Feb 10, 2020 4:40 pm
by Nemo4809
Adamo wrote:
Mon Feb 10, 2020 5:33 am
Nemo4809 wrote:
Sun Feb 09, 2020 8:23 pm
Adamo wrote:
Sat Feb 08, 2020 2:15 am
My understanding is that the *only* things that can't always be done in parallel are things that have to be matched in the time coordinate. That's essentially the same as saying "all of those things that have to be completed before the next tick".
You forgot dependencies.

If calculation B uses the results of calculation A. You cannot do calculations A and B together. You have to do A first then B - i.e. serially.
I don't think I did forget dependencies. The only dependencies that matter in terms of whether or not they can be split off into a new thread are ones that are along the chain of only those things that must be ready by the time the next tick hits, unless I'm mistaken?
What I'm saying is, often things can't be parallelized while having nothing to do with time or "ticks".

If calculation B needs the results of calculation A, you can't run both of them together - you can't split either off into a separate thread so both calculations run side by side.

Re: Parrallel processing in games & applications

Posted: Tue Feb 11, 2020 5:34 am
by Adamo
planetmaker wrote:
Mon Feb 10, 2020 9:17 am
Consider the following: there's two inserters which can grab items from the same position of a belt in that timestep. But only one can grab them. You need to have a well-defined order for them, or in one case inserter 1 will be quicker, in another the other - and for instance in one case you build gears in the other you smelt steel.
So you need to serialize deterministically also the order of actions within this very timestep to remain deterministic overall.
Yes, but those things are also things that have to be done by the end-of-the tick, and thus are included in that class of things... If the inserters didn't have to activate *at a certain time*, but somehow still had some sort of existence outside of game time, then sure, the internal workings of the inserter couldn't be parallelized relative to eachother, but that's a side point: the inserter as a system could be parallelized away from the primary game simulation into a separate "inserter thread".
Nemo4809 wrote:
Mon Feb 10, 2020 4:40 pm
What I'm saying is, often things can't be parallelized while having nothing to do with time or "ticks".

If calculation B needs the results of calculation A, you can't run both of them together - you can't split either off into a separate thread so both calculations run side by side.
The only time a set of calculations can't be forked from the main game simulation into a new thread is when the result needs to be reported to the game simulation in a timely fashion. The internal workings of some set of calculations isn't relevant; the point is that *as a system*, those things can be forked off the main thread.

Re: Parrallel processing in games & applications

Posted: Tue Feb 11, 2020 6:41 am
by Nemo4809
Adamo wrote:
Tue Feb 11, 2020 5:34 am
Nemo4809 wrote:
Mon Feb 10, 2020 4:40 pm
What I'm saying is, often things can't be parallelized while having nothing to do with time or "ticks".

If calculation B needs the results of calculation A, you can't run both of them together - you can't split either off into a separate thread so both calculations run side by side.
The only time a set of calculations can't be forked from the main game simulation into a new thread is when the result needs to be reported to the game simulation in a timely fashion. The internal workings of some set of calculations isn't relevant; the point is that *as a system*, those things can be forked off the main thread.
A program is just a bunch of operations/calculations that must be done.

A program parallelizes well if you can take the operations and spread it out over multiple processors - reducing the time it takes to complete the program.

You can't do that if all of the operations depend on the results of the previous one. Such a program can only be run sequentially.

Parallelization has little to do with time. Factorio is just a "program" on a loop. The "program" runs every 16.6ms (for 60fps) and must complete all its operations within the 16.6ms - if it fails to do so, the frame/update rate drops below 60fps. Those operations are what you would try to parallelize so the "program" can run faster.

Re: Parrallel processing in games & applications

Posted: Tue Feb 11, 2020 12:11 pm
by Adamo
Nemo4809 wrote:
Tue Feb 11, 2020 6:41 am
You can't do that if all of the operations depend on the results of the previous one. Such a program can only be run sequentially.

Parallelization has little to do with time. Factorio is just a "program" on a loop. The "program" runs every 16.6ms (for 60fps) and must complete all its operations within the 16.6ms - if it fails to do so, the frame/update rate drops below 60fps. Those operations are what you would try to parallelize so the "program" can run faster.
I'm sorry, but you are just wrong. Parallelization of game simulation components has *everything* to do with time. If you want to understand more-deeply why that is, you should ask, but it is just rude to try to make it my problem that you don't understand this by insisting with your incorrect logic that I'm somehow wrong. Fundamentally, everything that MATTERS in a game simulation engine matters because it has to happen at a certain time. If it doesn't have to happen at a certain time, then it doesn't matter to the simulation, and thus can be parallelized away without worry. If some subsystem does have to be done in a timely fashion, then it can only be parallelized away *under the correct conditions*, and the simulation controller must test for those conditions in order to make the correct decisions about what to fork. That is how it works. That could include slowing down the simulation to wait for a thread to finish, sure, although at that point it probably wasn't worth forking the thread in the first place; and therein lies the rub: using actionable information to predict which forks can be done on time. That's a difficult task that is central to parallelizing subsystems in a game simulation.

Re: Parrallel processing in games & applications

Posted: Tue Feb 11, 2020 12:58 pm
by Nemo4809
Adamo wrote:
Tue Feb 11, 2020 12:11 pm
Nemo4809 wrote:
Tue Feb 11, 2020 6:41 am
You can't do that if all of the operations depend on the results of the previous one. Such a program can only be run sequentially.

Parallelization has little to do with time. Factorio is just a "program" on a loop. The "program" runs every 16.6ms (for 60fps) and must complete all its operations within the 16.6ms - if it fails to do so, the frame/update rate drops below 60fps. Those operations are what you would try to parallelize so the "program" can run faster.
I'm sorry, but you are just wrong. Parallelization of game simulation components has *everything* to do with time. If you want to understand more-deeply why that is, you should ask, but it is just rude to try to make it my problem that you don't understand this by insisting with your incorrect logic that I'm somehow wrong. Fundamentally, everything that MATTERS in a game simulation engine matters because it has to happen at a certain time. If it doesn't have to happen at a certain time, then it doesn't matter to the simulation, and thus can be parallelized away without worry. If some subsystem does have to be done in a timely fashion, then it can only be parallelized away *under the correct conditions*, and the simulation controller must test for those conditions in order to make the correct decisions about what to fork. That is how it works. That could include slowing down the simulation to wait for a thread to finish, sure, although at that point it probably wasn't worth forking the thread in the first place; and therein lies the rub: using actionable information to predict which forks can be done on time. That's a difficult task that is central to parallelizing subsystems in a game simulation.
You seem really fixated on this time-based ideology. LOL

You came up with it yourself or did you read about it somewhere?

I tried to be nice. But it seems that’s not appreciated.

Dependencies are frequently the only thing determining how well an algorithm can be parallelized - if at all. This is common knowledge. An example would be instruction level parallelism in modern CPUs.
Fundamentally, everything that MATTERS in a game simulation engine matters because it has to happen at a certain time.
No. Everything has to happen within 16.6ms. When in the 16.6 ms it happens isn't particularly important.

And that's only if you are targeting a 60ups. Many calculations in games, especially on the graphics side, are done every few frames to save on processing power and bandwidth - i.e. they target an update rate lower than 60ups.

I know what a game simulation tick is. I'm just telling you it has little to no bearing on parallelization of programs. Most programs don't even run on "ticks" - e.g. a file compressor.

Re: Parrallel processing in games & applications

Posted: Tue Feb 11, 2020 2:28 pm
by planetmaker
If you only have one game state like in a single-player game, you can like perfectly parallelize most stuff in one tick - as then it doesn't matter which of two inserters grabs an item and whether you produce a green circuit or a low density structure a result.

However, factorio is a multi-player game. And the game has to do the very same thing on each of the computers of all participating players, irrespective of their hardware. As a result it needs to make sure that things which compete for the same ressources happen in a deterministic order, even when it is supposed to happen "at the same time". That fundamentally screws with parallel processing during the update phase of a tick, as order of movement matters, as does order of placement of outputs. Thus you can parallelize maybe movement and production (if you cache previous assembler storage)... but you cannot completely parallelize item movement.

You can go to length to separate the factory in chunks which don't interact - but that's anything but an easy task (doubtful that's possible at all). Fluids seemingly have been successfully separated from belts etc.

Multithreaded performance

Posted: Thu Feb 13, 2020 3:02 pm
by sushi_eater
Having done quite a bit of multi-threaded programming, the claims that Factorio is memory-bound didn't make sense to me. The memory controller of current CPUs has quite a bit of parallelism and doesn't easily saturate with a small number of threads.

I ran some benchmarks with parallel Factorio instances on two different systems. The first one is a Intel Westmere Xeon running at 4GHz, identical in performance to a Nehalem Core i7 920 overclocked to 4GHz. RAM is 24GB triple channel DDR3-1420 (overclocked DDR3-1333). The second system is a Ryzen 3900X with 32GB dual channel DDR4-3733 CL16.

Runtime for 10'000 ticks, all Factorio instances are running the same headless benchmark using Steve Trovs 10k Science belt megabase (Factorio 18.06 on Linux).

Intel Xeon:
1 Factorio instance: 181s = 55 combined UPS
2 Factorio instances: 184.5s = 108 combined UPS
3 Factorio instances: 210s = 143 combined UPS
4 Factorio instances: 243.5s = 164 combined UPS

Ryzen 3900s:
1 Factorio instance: 113.5s = 88 combined UPS
2 Factorio instances: 122s = 164 combined UPS
3 Factorio instances: 136s = 221 combined UPS
4 Factorio instances: 151.5s = 264 combined UPS
5 Factorio instances: 167s = 299 combined UPS
6 Factorio instances: 181s = 331 combined UPS

So it's very clear that Factorio is far from being memory bandwidth constrained. A decent multi-threaded implementation should be able to improve UPS by 3x.

(Ryzen 3900x boost clock is much higher with lower thread counts, so the relatively higher performance per instance low thread counts reflects the higher boost clock, not memory constraints.)

Re: Multithreaded performance

Posted: Thu Feb 13, 2020 4:09 pm
by movax20h
You are mostly testing the memory bandwidth here. Not latency. It is possible to hide latency by using more thread and utilizing multithreading or very smart software prefetching, but that would complicate the code base a lot, and still doesn't work fully with all entities easily.

It is more of an open research area. Definitively doable, but requires a lot of work and restructuring / rearchitecting.

I am doing a research on a Factorio-like clone / simulator, that is mostly focused on multithreading, interaction graph partitioning and load balancing, and other tricks (like how to use some shared state asynchronously without locks, but with deterministic outcome) and it is non trivial. I do it in Erlang, which makes it nice to prototype, but it does induce interesting problems on its own (a bit higher memory per-entity). I already discovered quite a bit of bottlenecks in the process scheduler for example, mostly due to non-locality of entities.