Parrallel processing in games & applications

Post all other topics which do not belong to any other category.
mmppolton
Long Handed Inserter
Long Handed Inserter
Posts: 58
Joined: Sun Apr 16, 2017 11:38 pm
Contact:

Re: Parrallel processing in games & applications

Post by mmppolton »

BenSeidel wrote:So I have seen quite a few posts around the place that simply state that "factorio is too complex to multi-thread" and that "Multithreaded code is hard" and other incorrect statements like that. I have even been accused of trolling because I say that Factorio can be multithreaded easily. Don't get me wrong, it may appear like multithreading is hard, and that Factorio is just not multi thread compatible as it's the dogma that the industry has been spouting since the dawn of time. However it's incorrect. I thought I would spend some time posting about not only some techniques for dealing with asynchronous code, but also a brief discussion about the current techniques of coding and the issues associated with them.

First, Why read the post(s)?
I have spent the last 6 years researching and writing a dynamic application whose structure is not known at compile time. The current implementation is capable of executing on a significant number of cores, not relying on pure CPU clock speed but more on memory transfer rates. I can't go into the exact details of the application, but I can discuss the lessons I have learnt along the way.

So, Multithreading is easy. Just ask any functional programming guru. Haskell supports as many cores as you can throw at it. There are even runtimes that will distribute your application over clusters of computers. For those that are wondering, Haskell is turing complete. If Factorio was written in Haskell then it would be using all the cores on everyone's machines. Unfortunately, functional programming is terrible, just ask any procedural programmer (Note that procedural programming is the binary opposite of functional programming, not a "step-down" or alternate to object-orientated programing). Now, don't get me wrong, procedural programming is just as bad as functional programming. While they both have their strengths, they both have their weaknesses. Inserting an item into an array is trivial in a procedural language but extremely expensive in a functional one. Multithreading comes naturally to a functional language, but is extremely difficult in a procedural one. We have two extremely different, incompatible techniques at our disposal that solve different issues but we are unable to use them both. So what do we do? We suffer.

Reactionary programming to the rescue!
Yes that's right. Reactive programming. As embodied in RX.net and the LINQ programming techniques put out by Microsoft over the last 5? or so years. With it's roots back in a paper published in 1997 entitled "Functional Reactive Programming", Reactive programming is supposed to merge the functional world with the procedural world. Unfortunately it doesn't. It's crap. Don't touch it with a 10 foot pole. You may think that this is my opinion, and you can think that. But you will eventually realise that it does not work when you try to use it. Reactionary programming leads you to believe that you get all the benefits of functional programming: multithreading in a thread-safe manner while keeping the mutability of your data. Unfortunately that's not the case. Issues with event propagation orders and mutability of data start cropping up in all but the most trivial examples. Why? Because you cannot have thread-safe processes WITH mutable data.

Oops... What am I saying. Of course you can. That is what semaphores are used for right? Well not really. Current techniques for controlling access to memory involve ensuring that only one thread can write to that memory at a time while no others can read from it or ensuring that the result of a process being run twice gives the same results. If you only have one thread accessing your data then it's not multi-threaded. Blocking operations, while not an issue on a single core machine, fail when put onto multi-cored machines. There are non-blocking algorithms for many tasks, but you usually find them hidden away somewhere within these higher-level memory access control structures, or are specific to one use-case.

So what do we do? Do I have some new technique that no one knows about? Hell no, It's been around for decades. If you want to program asynchronously, then all you have to do is ask the experts: CPU designers. Every component on the chip executes its function independently of the others. What do you think the pipeline is? So how do they keep this all in check? How are they able to keep their mutable state from destroying their concurrent execution of the various parts of the CPU? They use a clock. The clock synchronises all the executions within the CPU so that information can be moved from one area of the chip to another area of the chip. At no other time does information get moved through these "clock boundaries". In this way they can have memory holding state between executions, yet still allow each process to be executed concurrently.

Programming with a clock:
Yes, you need a clock. However, a clock is not a literal clock or a time-piece. A clock is simply some point at which you say "I am done, here is my result", except that everything running must say it at the same time. Clocks allow separate, asynchronous pieces of code the ability to synchronise. It's the time where one thread publishes its results of its previous task and takes the input it needs to do its next task. Clocks are the key to high-level asynchronous programming.

How does this apply to Factorio? and games in general? Well, they have a natural clock - the frame update cycle. This is a natural starting point to implement a clock. At the end of each frame-rendering (or in the case of Factorio, the game update cycle) each process can say "I have finished with my input, and here is my output". If we look at what occurs in a chip, we would see the output bits being copied across the clock boundary, being placed in the output lines of the chip component. As these output lines are connected to the input lines of other components, they write to all the input memory of the dependant components. As we don't want to copy all our memory we can use a tick-tock memory structure to get the same effect by redefining what is the "output line" memory the same way double-buffered video memory works.

So what is left?
To define an algorithm that takes the "output lines" memory to generate the next state of the game that can be used as the next cycle's "output lines".

That is it. Using a clock you can develop applications that can use thousands or millions of cores, in the same way a CPU has millions of components. Once these components are defined, you put them in a list, divide the lists up over your threads and iterate. It may seem like it's too good to be true, and for the most part it is. It's about as good as object orientated programming. Sure it took you some time to get your head around the ideas behind it, and sure after years of it being the industry standard you are beginning to realise that it's not a one-stop solution for all your issues, but by hell does it solve a specific subset of the issues you are facing. Clock based programming is exactly the same. It does not remove the need for semaphores and locks, they are there to solve an entirely different issue. It only helps with getting your program to run over many threads, at a reasonably high level in your code.
Edit:
The following section seems to be a major sticking point in the rest of the thread.
For those that don't know what example means: the dictionary definines it as "A thing characteristic of its kind or illustrating a general rule". As stated within the first line of the following, It is an example, not the way it should be/will be/is the best way to/is viable, etc implementing it.
Applying it to factorio specifically I will use an example of a chest with 2 inserter arms, both dropping items into the chest.
The chest is given the previous states of the 2 inserter arms. It checks if the input arms are ready to drop an item. If they are then the chest calculates the amount that the inserter arms can drop and where the items will go. If there is a tie then some predefined, deterministic tiebreaker is used to pick what inserter arm will drop the item. The chest then writes it's new contents to it's "back buffer", leaving its previous state - the state that all the other processes can see, the same. In a separate execution, possibly on another thread, the inserter arm is updated. It is given the previous state of the chest and the previous state of all inserting inserters. It does the same calculation to determine if it is able to drop any items in the chest. If it is able, it calculates its state including the amount currently being held (it may not have dropped them all), writing them to its "back buffer".

This is an extremely basic example of how you would code a clock-based factorio. It's not optimised at all and does not take into account possible low-level synchronisation that may be able to be made to increase performance. For example, it may be possible to merge the inserter component and the chest component into a more complex component that updates both the inserter arm and chest at the same time. It's also possible to use low level cross-thread synchronisation to attach and detach the inserter from the amalgamated component as the arm swings back and forth between the chest and its pickup location, or to use a CAS on some part of memory to write the drop count to cache it for the other component's execution. It may also be possible to eliminate the double-buffering of the component's state IF it's possible to ensure that that information is not required in another update process, etc.

But anyway, Clocks (and clock boundaries) are an extremely powerful technique that I have never seen applied in software. If any programmers out there spend some time thinking about it then they should be able to see how it would allow an application to compartmentalise its functionality to be executed across as many threads as needed. If you think that it can't be done or that software is just "too complex" then I implore you to learn VHDL and see what chip designers have been doing since the the 1980's.

There is one issue with this style of programming though: The industry does not support it. By this I mean that the CPU designers are focused more on micro-parallelisation such as the out-of-order instruction processing, the vector extension operations, branch prediction and all the other nifty things that make your sequential program run faster. CPU's with a large number of cores have been coming for over 15 years now, but we really only see 4. Not until mainstream games require at least 16+ cores will you see Intel making them standard. They follow where software developers lead.

If you have read all that then congratulations.
i so firm agreed i am some what burn out of making mega base because of the limit i would rater more fps and ups if they have to cut corner then cut corner
ChoMar
Long Handed Inserter
Long Handed Inserter
Posts: 96
Joined: Sun Aug 07, 2016 2:00 am
Contact:

What happened to Multithread?

Post by ChoMar »

Last year I read a lot about that topic in FFs and the Forum. Together with the fluid overhaul and such. But recently noone seems to talk about it. So, what happened to multithreading? Was it abandoned for being too complex? Was it postponed to stabilize .17?
Mytronix Entertainment
Serenity
Smart Inserter
Smart Inserter
Posts: 1017
Joined: Fri Apr 15, 2016 6:16 am
Contact:

Re: What happened to Multithread?

Post by Serenity »

The game is already multithreaded. But pretty much anyone who demands multithreading in games, thinking the game doesn't already use it, or that it will solve all performance issues doesn't know what they are talking about
Pandrosos
Inserter
Inserter
Posts: 29
Joined: Mon Aug 12, 2019 1:38 am
Contact:

Re: What happened to Multithread?

Post by Pandrosos »

Indeed. I don't know how well Factorio is multithreaded. I do know that it's very well optimised, and that's what matters. Unless you're on a very poor computer you won't suffer lag until you're firmly into megabase scale, well beyond the game's notional goal.
mrvn
Smart Inserter
Smart Inserter
Posts: 5969
Joined: Mon Sep 05, 2016 9:10 am
Contact:

Re: What happened to Multithread?

Post by mrvn »

Serenity wrote: Mon Aug 19, 2019 10:25 am The game is already multithreaded. But pretty much anyone who demands multithreading in games, thinking the game doesn't already use it, or that it will solve all performance issues doesn't know what they are talking about
Except there is only one thread doing all the game updates at 100% CPU speed and the others only wake up ocasionally to do some side business, like prepare the GPU for the next frame. So on a 16 core system factorio runs at about 120% core usage instead of 1600%.

With the pipes now being split into fluid systems of connected pipes + fluid boxes it should be possible to do all the fluid systems in parallel. Is that already being done, still to come or never going to happen?
Serenity
Smart Inserter
Smart Inserter
Posts: 1017
Joined: Fri Apr 15, 2016 6:16 am
Contact:

Re: What happened to Multithread?

Post by Serenity »

Probably still being worked on. There were some other issues with the new fluid system

The problem with multithreading is that people assume that you can easily divide the game into equally sized chunks and then run them in parallel. That's just not the case. One large main thread and a bunch of smaller side threads is how things usually work. There just aren't too many discrete systems that can run alongside each other. And a lot of the time they'll be waiting on each other to be done with something. If you try to thread stuff that isn't suited for it you also run into issues with synchronizing data between them. 16 cores will never mean 16 times the performance.

Running separate fluid networks on their own threads will help, but the main thread will still be by far the biggest
Koub
Global Moderator
Global Moderator
Posts: 7955
Joined: Fri May 30, 2014 8:54 am
Contact:

Re: Parrallel processing in games & applications

Post by Koub »

[Koub] Merged into older topic with that same discussion about how much the game is multithreaded.
The devs have and will multithread as much as they can, but closely related to how the game engine works, and the fact they need deterministic behaviour 100% of the time, not everything can be split in parallel threads.
@OP (ChoMar), I encourage you to read thoroughly the topic I have merged yours into, and especially the devs' posts.
Koub - Please consider English is not my native language.
posila
Factorio Staff
Factorio Staff
Posts: 5428
Joined: Thu Jun 11, 2015 1:35 pm
Contact:

Re: What happened to Multithread?

Post by posila »

mrvn wrote: Mon Aug 19, 2019 11:07 amWith the pipes now being split into fluid systems of connected pipes + fluid boxes it should be possible to do all the fluid systems in parallel. Is that already being done, still to come or never going to happen?
That is already done.
mrvn wrote: Mon Aug 19, 2019 11:07 amExcept there is only one thread doing all the game updates at 100% CPU speed and the others only wake up ocasionally to do some side business, like prepare the GPU for the next frame. So on a 16 core system factorio runs at about 120% core usage instead of 1600%.
Try to set "Max render threads" in Graphics settings to 1, to see how 120% core usage really looks like :)
Adamo
Filter Inserter
Filter Inserter
Posts: 481
Joined: Sat May 24, 2014 7:00 am
Contact:

Re: Parrallel processing in games & applications

Post by Adamo »

The trick to parallel processing in factorio is to use headless servers run through clusterio. Once the UPS on the first map starts to drop, you stop, hook up your clusterio buildings to interface with the cluster (loading and unloading), and move to server two to start on a fresh map with the support of your first map.

I've been considering making a companion mod for clusterio that requires you to first turn your base into a "spaceport" by launching a rocket, which then grants access to the cluster inventory and research.
Yandersen
Long Handed Inserter
Long Handed Inserter
Posts: 93
Joined: Sun Jun 30, 2019 6:54 am
Contact:

Re: Parrallel processing in games & applications

Post by Yandersen »

Jeez, seeing a topic like this surprised me as much as a cooler on my laptop which sometimes just stops and chills when I play Factorio. Honestly, the level of optimization this game has right now is just epic. Whoever wrote the code must be a magician and truly deserves a cookie. Really, the quality of the game impressed me that much. Must be an impression hardened by the contrast of Stellaris I played a lot before. Few thousand pops and ships - and that shitty game lags down with both GPU and CPU melting no matter how good your machine is. Paradox sucks cocks. Those freaks are only good in advertising. But Wube rules! So far. Hope they never turn into smg like Paradox.

What I am trying to say is IMHO there is no need to teach the devs how to write a code. Pretty obvious the fact they are truly professionals, surely a head above money-making incompetent chubakkas from bigger companies.
mrvn
Smart Inserter
Smart Inserter
Posts: 5969
Joined: Mon Sep 05, 2016 9:10 am
Contact:

Re: What happened to Multithread?

Post by mrvn »

posila wrote: Mon Aug 19, 2019 11:51 am
mrvn wrote: Mon Aug 19, 2019 11:07 amWith the pipes now being split into fluid systems of connected pipes + fluid boxes it should be possible to do all the fluid systems in parallel. Is that already being done, still to come or never going to happen?
That is already done.
mrvn wrote: Mon Aug 19, 2019 11:07 amExcept there is only one thread doing all the game updates at 100% CPU speed and the others only wake up ocasionally to do some side business, like prepare the GPU for the next frame. So on a 16 core system factorio runs at about 120% core usage instead of 1600%.
Try to set "Max render threads" in Graphics settings to 1, to see how 120% core usage really looks like :)
I have. So once per frame 15 cores wake up, initialize some data for the GPU, trigger the DMA and go back to sleep till the next frame. That's the extra 20% spread over 15 cores.
Rseding91
Factorio Staff
Factorio Staff
Posts: 14816
Joined: Wed Jun 11, 2014 5:23 am
Contact:

Re: What happened to Multithread?

Post by Rseding91 »

mrvn wrote: Mon Aug 19, 2019 3:27 pm I have. So once per frame 15 cores wake up, initialize some data for the GPU, trigger the DMA and go back to sleep till the next frame. That's the extra 20% spread over 15 cores.
What?... if you set "max render threads" to 1 then it runs the prepare step in the main thread (the 1 thread already running). There are no 15 cores waking up in that case.
If you want to get ahold of me I'm almost always on Discord.
Adamo
Filter Inserter
Filter Inserter
Posts: 481
Joined: Sat May 24, 2014 7:00 am
Contact:

Re: What happened to Multithread?

Post by Adamo »

Rseding91 wrote: Mon Aug 19, 2019 6:33 pm
I think he's saying that this is what would be spread over 15 cores when the core number is set to max.
User avatar
BlueTemplar
Smart Inserter
Smart Inserter
Posts: 3234
Joined: Fri Jun 08, 2018 2:16 pm
Contact:

Re: Parrallel processing in games & applications

Post by BlueTemplar »

Yandersen wrote: Mon Aug 19, 2019 2:12 pm Jeez, seeing a topic like this surprised me as much as a cooler on my laptop which sometimes just stops and chills when I play Factorio. Honestly, the level of optimization this game has right now is just epic. Whoever wrote the code must be a magician and truly deserves a cookie. Really, the quality of the game impressed me that much. Must be an impression hardened by the contrast of Stellaris I played a lot before. Few thousand pops and ships - and that shitty game lags down with both GPU and CPU melting no matter how good your machine is. Paradox sucks cocks. Those freaks are only good in advertising. But Wube rules! So far. Hope they never turn into smg like Paradox.

What I am trying to say is IMHO there is no need to teach the devs how to write a code. Pretty obvious the fact they are truly professionals, surely a head above money-making incompetent chubakkas from bigger companies.
Well, in "defense" of Paradox, Stellaris uses the Clausewitz Engine*, first used in Europa Universalis 3 (2007!), which has support for Windows... 2000 !
Whereas Factorio (with 0.17) has a brand-new engine, and dropped 32-bit support a few years ago...

If you want to compare with performance-(self)-conscious developers, see Stardock.

*One might wonder why Paradox would still use such an ancient engine for new games... my guess is that they didn't want to risk launching a genre-breaking game with also a brand new, untested engine ? (Especially after what happened to Sword of the Stars 2...)
BobDiggity (mod-scenario-pack)
Adamo
Filter Inserter
Filter Inserter
Posts: 481
Joined: Sat May 24, 2014 7:00 am
Contact:

Re: Parrallel processing in games & applications

Post by Adamo »

BlueTemplar wrote: Mon Aug 19, 2019 9:10 pm
Possibly. Plus I really hate to hate on Paradox because they are one of the few who take linux support seriously.

Kudos to Wube for doing the same, of course.
mrvn
Smart Inserter
Smart Inserter
Posts: 5969
Joined: Mon Sep 05, 2016 9:10 am
Contact:

Re: What happened to Multithread?

Post by mrvn »

Rseding91 wrote: Mon Aug 19, 2019 6:33 pm
mrvn wrote: Mon Aug 19, 2019 3:27 pm I have. So once per frame 15 cores wake up, initialize some data for the GPU, trigger the DMA and go back to sleep till the next frame. That's the extra 20% spread over 15 cores.
What?... if you set "max render threads" to 1 then it runs the prepare step in the main thread (the 1 thread already running). There are no 15 cores waking up in that case.
Ups, sorry. I skipped the "1" in that sentence. Somehow I read 'set "max render threads" to 15'.

I don't say that factorio will run the same with one core as with many. The extra threads certainly help. But it's far from making full use of all cores.
Adamo
Filter Inserter
Filter Inserter
Posts: 481
Joined: Sat May 24, 2014 7:00 am
Contact:

Re: What happened to Multithread?

Post by Adamo »

mrvn wrote: Thu Aug 22, 2019 9:21 am I don't say that factorio will run the same with one core as with many. The extra threads certainly help. But it's far from making full use of all cores.
It might be a difference in how you're using the phrase "full use". I think many, when they say "full use" or something similar, here, would mean that Factorio does as much parallel work as it can under the restrictions of the simulation, which is far less than completely using all available cores.
Vegemeister
Long Handed Inserter
Long Handed Inserter
Posts: 85
Joined: Sun Dec 04, 2016 9:18 pm
Contact:

Re: Parrallel processing in games & applications

Post by Vegemeister »

Rseding91 wrote: Wed Feb 15, 2017 2:26 am Entities aren't updated in the order they're allocated and additionally virtually all entities that are updatable have dynamically allocated member data or touch other entities/data sets they don't own.

So, none of that actually works in a real-world scenario where all things that do logic end up being dynamic and fragmented in memory.

When something isn't fragmented or has incredibly simple logic it's never something that is capable of doing any useful logic due to those limitations: not being able to touch memory outside of itself.
Rseding91 wrote: Wed Feb 15, 2017 6:07 pm
Zulan wrote:The question that arises: Are entities saved by geographic location...
Yes they are.
If you're not already doing it, what about allocating entities in space-filling-curve order? To keep performance from deteriorating over time, you might re-load the save on each autosave.
nafira
Fast Inserter
Fast Inserter
Posts: 107
Joined: Fri Mar 16, 2018 12:20 pm
Contact:

Multi-threading thread ;)

Post by nafira »

Hi guys,

I'm not very familiar with inside bottlenecks in Factorio, but I've kept in mind that the actual main problem is due to the event system on entities.
It's pretty strong, avoiding many bugs, but it creates a funnel of decision leading to mono-threading most of the activity.

Please correct me as soon as I'm assuming wrong. You know more than me on this matter.
I'm just here to understand how it works and maybe propose another approach. All ideas are welcome I think.
TL;DR
I'm thinking that we can transform that monster of UPC crushing machine single threaded, into smaller ones, by creating "links" between calculation threads, to exchange event info.
Read it if you have time and prepared yourself for a probable good laugh
So basically, the recent pathfinder algorithm and particle system, made me think of a potential improvement.
There's 2 types of events : entity events, and "whole map" events (like trains or bots in a certain way).
To keep it consistent across the whole map, I assume that there's a master thread doing all the work, and keeping it without flaws.

What if you consider the map as nodes with N*N size. The goal is to exchange events between adjacent nodes, making it possible to parallelize node calculation. The only work needed is to share events across nodes which is a [8][N] event object array (taking the 8 surrounding nodes).
Then two problem pops up :
  • How do you insert map-wide events (or multi-nodes, it's identical)
  • How do you deal with more than one node in distance
For the first one, I think it's safe enough to say that you can use a pointer to refer to all nodes. Just use a locking system, and it should be enough. Those are quite rare (I think ?) problems. And in either case, they can be processed very fast.
For the second, there's two options : use a N* algorithm to search for all targetable nodes (like for artillery) since it's often a range operation search. This will make the sharing array bigger, but only occasionally.


In the end, taking a 100*100 nodes, you'll have a [8][100] sharing event array, with four of them having only a size of 1 (corners). Inside the nodes, the main logic of the order of checks between local and map-wide events, can be processed, allowing multi-threaded decisions.
If I apply the logic to the actual map, wherever you are and whatever happens, it's all calculated in the same thread (mainly).
If you change node size to a 1000*1000, on a relatively small map (4000*4000), you can have 16 threads to separate calculation.


I'm surely missing something, because it's either too simplistic, or more consuming than the actual system (I've not made any calculations).
To do a parallel with some other systems, it's like going from a MDM system (Master Data Management), with logic locks at every step, whether or not it's needed.
My approach is more like having distinct API across Kubernetes nodes, communicating with an event messaging system like Webhooks, containing only info for neighbors and targets. and every Kubernete node can access the database with the same locking system, but only when needed.
It's transforming a big server optimized, into smaller and slower ones, but allowing parallelism.


For sure, if any of it make sense and is applicable, there a ton of orchestration coming, but since it's a bottleneck for many megabases starting @ 1kSPM, I think it's worth throwing ideas to make it better, even if I'm totally wrong on the actual system and making myself ridiculous.
I think mistakes helps more than victory in computing, and at worse I would have learn something :)

My idea came from imagining a 1000*1000 map filled with inserters moving items. If you reduce that to some info back and forth between smaller nodes, you can calculate inserters behavior in separate threads. An inserter can only exchange info with the interface with a neighbor. Else, it can calculate itself inside a thread, it don't need to be calculated with the one 100 meters away in another node. It's simplified, but it's the idea.



Thank you for your reading

If you have ideas, questions, we can try to gives the team ideas to get rid of this mono-threading problem. Every angle can help, even the dumbest ones, I've learned that.
Koub
Global Moderator
Global Moderator
Posts: 7955
Joined: Fri May 30, 2014 8:54 am
Contact:

Re: Parrallel processing in games & applications

Post by Koub »

[Koub] Merged into the older discussion about how the devs should multithread the game.
I suggest you read the whole thread, especially the devs' answers.
Koub - Please consider English is not my native language.
Post Reply

Return to “General discussion”