Parrallel processing in games & applications

Post all other topics which do not belong to any other category.
Post Reply
BenSeidel
Filter Inserter
Filter Inserter
Posts: 543
Joined: Tue Jun 28, 2016 1:44 am
Contact:

Parrallel processing in games & applications

Post by BenSeidel » Sun Jan 15, 2017 7:41 am

So I have seen quite a few posts around the place that simply state that "factorio is too complex to multi-thread" and that "Multithreaded code is hard" and other incorrect statements like that. I have even been accused of trolling because I say that Factorio can be multithreaded easily. Don't get me wrong, it may appear like multithreading is hard, and that Factorio is just not multi thread compatible as it's the dogma that the industry has been spouting since the dawn of time. However it's incorrect. I thought I would spend some time posting about not only some techniques for dealing with asynchronous code, but also a brief discussion about the current techniques of coding and the issues associated with them.

First, Why read the post(s)?
I have spent the last 6 years researching and writing a dynamic application whose structure is not known at compile time. The current implementation is capable of executing on a significant number of cores, not relying on pure CPU clock speed but more on memory transfer rates. I can't go into the exact details of the application, but I can discuss the lessons I have learnt along the way.

So, Multithreading is easy. Just ask any functional programming guru. Haskell supports as many cores as you can throw at it. There are even runtimes that will distribute your application over clusters of computers. For those that are wondering, Haskell is turing complete. If Factorio was written in Haskell then it would be using all the cores on everyone's machines. Unfortunately, functional programming is terrible, just ask any procedural programmer (Note that procedural programming is the binary opposite of functional programming, not a "step-down" or alternate to object-orientated programing). Now, don't get me wrong, procedural programming is just as bad as functional programming. While they both have their strengths, they both have their weaknesses. Inserting an item into an array is trivial in a procedural language but extremely expensive in a functional one. Multithreading comes naturally to a functional language, but is extremely difficult in a procedural one. We have two extremely different, incompatible techniques at our disposal that solve different issues but we are unable to use them both. So what do we do? We suffer.

Reactionary programming to the rescue!
Yes that's right. Reactive programming. As embodied in RX.net and the LINQ programming techniques put out by Microsoft over the last 5? or so years. With it's roots back in a paper published in 1997 entitled "Functional Reactive Programming", Reactive programming is supposed to merge the functional world with the procedural world. Unfortunately it doesn't. It's crap. Don't touch it with a 10 foot pole. You may think that this is my opinion, and you can think that. But you will eventually realise that it does not work when you try to use it. Reactionary programming leads you to believe that you get all the benefits of functional programming: multithreading in a thread-safe manner while keeping the mutability of your data. Unfortunately that's not the case. Issues with event propagation orders and mutability of data start cropping up in all but the most trivial examples. Why? Because you cannot have thread-safe processes WITH mutable data.

Oops... What am I saying. Of course you can. That is what semaphores are used for right? Well not really. Current techniques for controlling access to memory involve ensuring that only one thread can write to that memory at a time while no others can read from it or ensuring that the result of a process being run twice gives the same results. If you only have one thread accessing your data then it's not multi-threaded. Blocking operations, while not an issue on a single core machine, fail when put onto multi-cored machines. There are non-blocking algorithms for many tasks, but you usually find them hidden away somewhere within these higher-level memory access control structures, or are specific to one use-case.

So what do we do? Do I have some new technique that no one knows about? Hell no, It's been around for decades. If you want to program asynchronously, then all you have to do is ask the experts: CPU designers. Every component on the chip executes its function independently of the others. What do you think the pipeline is? So how do they keep this all in check? How are they able to keep their mutable state from destroying their concurrent execution of the various parts of the CPU? They use a clock. The clock synchronises all the executions within the CPU so that information can be moved from one area of the chip to another area of the chip. At no other time does information get moved through these "clock boundaries". In this way they can have memory holding state between executions, yet still allow each process to be executed concurrently.

Programming with a clock:
Yes, you need a clock. However, a clock is not a literal clock or a time-piece. A clock is simply some point at which you say "I am done, here is my result", except that everything running must say it at the same time. Clocks allow separate, asynchronous pieces of code the ability to synchronise. It's the time where one thread publishes its results of its previous task and takes the input it needs to do its next task. Clocks are the key to high-level asynchronous programming.

How does this apply to Factorio? and games in general? Well, they have a natural clock - the frame update cycle. This is a natural starting point to implement a clock. At the end of each frame-rendering (or in the case of Factorio, the game update cycle) each process can say "I have finished with my input, and here is my output". If we look at what occurs in a chip, we would see the output bits being copied across the clock boundary, being placed in the output lines of the chip component. As these output lines are connected to the input lines of other components, they write to all the input memory of the dependant components. As we don't want to copy all our memory we can use a tick-tock memory structure to get the same effect by redefining what is the "output line" memory the same way double-buffered video memory works.

So what is left?
To define an algorithm that takes the "output lines" memory to generate the next state of the game that can be used as the next cycle's "output lines".

That is it. Using a clock you can develop applications that can use thousands or millions of cores, in the same way a CPU has millions of components. Once these components are defined, you put them in a list, divide the lists up over your threads and iterate. It may seem like it's too good to be true, and for the most part it is. It's about as good as object orientated programming. Sure it took you some time to get your head around the ideas behind it, and sure after years of it being the industry standard you are beginning to realise that it's not a one-stop solution for all your issues, but by hell does it solve a specific subset of the issues you are facing. Clock based programming is exactly the same. It does not remove the need for semaphores and locks, they are there to solve an entirely different issue. It only helps with getting your program to run over many threads, at a reasonably high level in your code.
Edit:
The following section seems to be a major sticking point in the rest of the thread.
For those that don't know what example means: the dictionary definines it as "A thing characteristic of its kind or illustrating a general rule". As stated within the first line of the following, It is an example, not the way it should be/will be/is the best way to/is viable, etc implementing it.
Applying it to factorio specifically I will use an example of a chest with 2 inserter arms, both dropping items into the chest.
The chest is given the previous states of the 2 inserter arms. It checks if the input arms are ready to drop an item. If they are then the chest calculates the amount that the inserter arms can drop and where the items will go. If there is a tie then some predefined, deterministic tiebreaker is used to pick what inserter arm will drop the item. The chest then writes it's new contents to it's "back buffer", leaving its previous state - the state that all the other processes can see, the same. In a separate execution, possibly on another thread, the inserter arm is updated. It is given the previous state of the chest and the previous state of all inserting inserters. It does the same calculation to determine if it is able to drop any items in the chest. If it is able, it calculates its state including the amount currently being held (it may not have dropped them all), writing them to its "back buffer".

This is an extremely basic example of how you would code a clock-based factorio. It's not optimised at all and does not take into account possible low-level synchronisation that may be able to be made to increase performance. For example, it may be possible to merge the inserter component and the chest component into a more complex component that updates both the inserter arm and chest at the same time. It's also possible to use low level cross-thread synchronisation to attach and detach the inserter from the amalgamated component as the arm swings back and forth between the chest and its pickup location, or to use a CAS on some part of memory to write the drop count to cache it for the other component's execution. It may also be possible to eliminate the double-buffering of the component's state IF it's possible to ensure that that information is not required in another update process, etc.

But anyway, Clocks (and clock boundaries) are an extremely powerful technique that I have never seen applied in software. If any programmers out there spend some time thinking about it then they should be able to see how it would allow an application to compartmentalise its functionality to be executed across as many threads as needed. If you think that it can't be done or that software is just "too complex" then I implore you to learn VHDL and see what chip designers have been doing since the the 1980's.

There is one issue with this style of programming though: The industry does not support it. By this I mean that the CPU designers are focused more on micro-parallelisation such as the out-of-order instruction processing, the vector extension operations, branch prediction and all the other nifty things that make your sequential program run faster. CPU's with a large number of cores have been coming for over 15 years now, but we really only see 4. Not until mainstream games require at least 16+ cores will you see Intel making them standard. They follow where software developers lead.

If you have read all that then congratulations.
Last edited by BenSeidel on Tue Jan 24, 2017 2:40 am, edited 1 time in total.

User avatar
DaveMcW
Smart Inserter
Smart Inserter
Posts: 2797
Joined: Tue May 13, 2014 11:06 am
Contact:

Re: Parrallel processing in games & applications

Post by DaveMcW » Sun Jan 15, 2017 8:23 am

BenSeidel wrote:I have even been accused of trolling because I say that Factorio can be multithreaded easily.
+1

It's certainly possible to multithread Factorio, but it's not easy. The #1 priority is perfect determinism, and multithreading is not allowed to break that.

Efficient multithreading also requires compromises. For example, the brownout system is a nightmare to multithread. It would be better to eliminate it and use a simple 100% / 0% power network. But there is a lot of inertia to make such a big change.

Daid
Fast Inserter
Fast Inserter
Posts: 162
Joined: Sun Jul 03, 2016 7:42 am
Contact:

Re: Parrallel processing in games & applications

Post by Daid » Sun Jan 15, 2017 9:20 am

For the TL;DR version for people:
Making factorio multi threaded capable is easy, it just has to be rewritten in a way that 90% of the software engineers in the world are not trained in. But this happens to be my specialized field, so it is easy.

I've met you. Well, not you, but people like you. And I can only say one simple think to you. Get out into the world. You have no real software experience. You never worked with the 80% of the software engineers that are out there. You've only worked with the 10% in your specific field, which are really smart people. The rest of the people that actually build the software in this world are not as smart as you are. Feel good about that part, but remember that the rest of the world also needs software. You cannot build everything for the whole world with those 10% people.


Who am I? I'm just a simple software engineer with ~15 years of experience. Who has done everything from maintenance on horrible projects, to starting new products from scratch. I've worked with 1 man teams, to 10 man teams. People that work with me generally say that I'm smart. I'm not. I'm just practical. I know how to get from A to B in a decent way that most people will understand. I'm pretty decent a practical implementations of technology.
(My biggest visible achievement is that I did a revolution on 3D printing software as a 1 man team a few years ago. Which was a big contribution to our company growing from 15 people to 250 in 3 years)

evildogbot100
Fast Inserter
Fast Inserter
Posts: 138
Joined: Sun Dec 18, 2016 3:02 pm
Contact:

Re: Parrallel processing in games & applications

Post by evildogbot100 » Sun Jan 15, 2017 10:52 am

The rest of the people that actually build the software in this world are not as smart as you are.
I think the devs are that smart. They have Russian programming champion working on optimization.


User avatar
Deadly-Bagel
Smart Inserter
Smart Inserter
Posts: 1493
Joined: Wed Jul 13, 2016 10:12 am
Contact:

Re: Parrallel processing in games & applications

Post by Deadly-Bagel » Sun Jan 15, 2017 3:53 pm

BenSeidel wrote:So I have seen quite a few posts around the place that simply state that "factorio is too complex to multi-thread" and that "Multithreaded code is hard" and other incorrect statements like that. I have even been accused of trolling because I say that Factorio can be multithreaded easily. Don't get me wrong, it may appear like multithreading is hard, and that Factorio is just not multi thread compatible as it's the dogma that the industry has been spouting since the dawn of time. However it's incorrect. I thought I would spend some time posting about not only some techniques for dealing with asynchronous code, but also a brief discussion about the current techniques of coding and the issues associated with them.

First, Why read the post(s)?
This is as far as I got before rolling my eyes and scrolling past.

I appreciate that you've put a lot of time and effort into your post but you seem to be implying that the devs are either lazy or incompetent. Seriously, have you even looked at the game? How stupidly efficient it is? If Minecraft tried to do something on the same scale as the average Factorio factory, there's probably only a handful of computers in the world that could run it at more than a few frames per second, if any at all. And then they got it running for over 400 people in a single game. That is how stable and accurate the code is while still being more efficient than pretty much any other game in existence.

And it's not like they flat out say "no we can't multithread", they've gone into detail that some parts of the code just can't be multithreaded, or could be but would require a total rewrite of huge parts of game logic for a small benefit. Then there's cases where having to manage the multiple threads to prevent them from conflicting or desynching actually costs more time than is saved by multithreading. These sort of tasks prevent anything else from being faster, and there's no point in multithreading a task if it then needs to wait for something else.

Even forgetting all that, have you even seen the game code? How do you know how easily something can be done if you don't know what it is exactly that needs to be done, or even how it currently works. "Multithreading can be done in Factorio easily." You're probably being accused of trolling because the alternative is that you're arrogant and ignorant. The game is already multithreaded, you'd know that if you read the above linked FFF. Was it easy? I don't know, I wasn't involved, but chances are any "easy" multithreading has been done already, and AFAIK they're working on doing more.

So I'll assume you just didn't know that and got a bit high and mighty for a moment (happens to everyone) because I like giving people the benefit of the doubt. You should try it some time.
Money might be the root of all evil, but ignorance is the heart.

greep
Fast Inserter
Fast Inserter
Posts: 108
Joined: Mon Apr 25, 2016 3:12 am
Contact:

Re: Parrallel processing in games & applications

Post by greep » Sun Jan 15, 2017 4:08 pm

Well, I think the reason mainstream games don't care much or at all about parallel processing is because I can't see it mattering much in games. People can spend days/weeks working on a ~2x speedup throughout their game with multithreaded solutions that can be both buggy and difficult to debug, or they can just cut/simplify/invent a new solution for this one part of their game for a 100x speedup. It's not like a real life physics simulation that has a rigid correct answer and must be done in a very specific way. Unsurprisingly, in those scenarios you see parallel processing everywhere.

An example of factorio would be trains. Trains massively speed up their calculations for transporting materials in a way that absolutely dwarfs any multithreading calculations they could do with conveyor belts or bots.

And sure they can spend time optimizing stack inserters and bots to squeeze out another doubling in performance. Or they can make everything 10x faster by just implementing higher tier stack insterters and bots that have 10x/10x cost/stack size, add a new tint and name, and call it a day. Speaking of optimization, guess which one is more optimal by several orders of magnitude for a developers time? I'll give a hint: it's not multithreading xD

Yoyobuae
Filter Inserter
Filter Inserter
Posts: 310
Joined: Fri Nov 04, 2016 11:04 pm
Contact:

Re: Parrallel processing in games & applications

Post by Yoyobuae » Sun Jan 15, 2017 5:06 pm

How much time and effort has been spent on improving the single thread performance of Factorio? How much of that time and effort will need to be thrown away in order to get a net positive performance gain with multithreading?

Maybe going all out multithreading right now might imply throwing away years of work due to code rewrite. There's an inherent quality that comes to code which has been polished over the years, and Factorio certainly has a lot of that going.

Masterfox
Long Handed Inserter
Long Handed Inserter
Posts: 58
Joined: Wed May 18, 2016 5:24 pm
Contact:

Re: Parrallel processing in games & applications

Post by Masterfox » Sun Jan 15, 2017 5:13 pm

Deadly-Bagel wrote:
BenSeidel wrote:So I have seen quite a few posts around the place that simply state that "factorio is too complex to multi-thread" and that "Multithreaded code is hard" and other incorrect statements like that. I have even been accused of trolling because I say that Factorio can be multithreaded easily. Don't get me wrong, it may appear like multithreading is hard, and that Factorio is just not multi thread compatible as it's the dogma that the industry has been spouting since the dawn of time. However it's incorrect. I thought I would spend some time posting about not only some techniques for dealing with asynchronous code, but also a brief discussion about the current techniques of coding and the issues associated with them.

First, Why read the post(s)?
This is as far as I got before rolling my eyes and scrolling past.
[...]
If Minecraft tried to do something on the same scale as the average Factorio factory, there's probably only a handful of computers in the world that could run it at more than a few frames per second, if any at all. And then they got it running for over 400 people in a single game.
[...]
You're probably being accused of trolling because the alternative is that you're arrogant and ignorant.
[...]
Well, if you read it, you would have found a good example, but reading the beginning only and ignoring anything else is just so much easier, right?
On the other side, Minecraft does have Factories of this size, although only with mods. And even though mods are going to be less efficient then native code, it works just fine. So yeah, in SP, this is definetly false. In MP, sure. Minecraft MP is ... bad, simply put, severs require an absurd amount of RAM and Processor time to work well when modded, even with few people, because it was not thought for big servers.
So yeah, I guess it is pretty clear who is arrogant and ignorant in that post. Hint: If you read on, you see he is a developer himself.

henke37
Long Handed Inserter
Long Handed Inserter
Posts: 85
Joined: Mon Jul 18, 2016 5:43 pm
Contact:

Re: Parrallel processing in games & applications

Post by henke37 » Sun Jan 15, 2017 7:04 pm

The issues with the proposed "clock" system are as follows:

Buffers aren't free. You are basically asking for the complete game state to be duplicated. My computer is struggling as is with the current memory use of Factorio.

You think in silicon, not software. Silicon is great in that it can have dedicated circuitry for each task. And you don't need to worry about task switching. Software is the opposite, you don't get dedicated processing units and you do have to worry about task switching. With factories in Factorio having tens of thousands of entities to update, there is a very notable overhead here. And that's on top of the issue of avoiding work. Factorio is doing its best not to update entities that don't need to be updated.

Xeanoa
Fast Inserter
Fast Inserter
Posts: 188
Joined: Tue Apr 26, 2016 4:32 pm
Contact:

Re: Parrallel processing in games & applications

Post by Xeanoa » Sun Jan 15, 2017 10:01 pm

henke37 wrote:Buffers aren't free.
Memory is almost free. 4 GiB of high-performance memory (DDR4-3200 CL15) are like 20-30 Euro. Nothing should be constrained by memory these days, even phones come with 6-8 GiB nowadays.

quinor
Filter Inserter
Filter Inserter
Posts: 390
Joined: Thu Mar 07, 2013 3:07 pm
Contact:

Re: Parrallel processing in games & applications

Post by quinor » Sun Jan 15, 2017 10:35 pm

henke37 wrote:The issues with the proposed "clock" system are as follows:

Buffers aren't free. You are basically asking for the complete game state to be duplicated. My computer is struggling as is with the current memory use of Factorio.

You think in silicon, not software. Silicon is great in that it can have dedicated circuitry for each task. And you don't need to worry about task switching. Software is the opposite, you don't get dedicated processing units and you do have to worry about task switching. With factories in Factorio having tens of thousands of entities to update, there is a very notable overhead here. And that's on top of the issue of avoiding work. Factorio is doing its best not to update entities that don't need to be updated.
I might have an answer to that: it's copy on write. One of smarter tricks I've seen done by computers :). Also, the game state is already duplicated if I am correct. It's a couple of tens of megabytes only, so not such big deal.

Yoyobuae
Filter Inserter
Filter Inserter
Posts: 310
Joined: Fri Nov 04, 2016 11:04 pm
Contact:

Re: Parrallel processing in games & applications

Post by Yoyobuae » Sun Jan 15, 2017 11:18 pm

psorek wrote:I might have an answer to that: it's copy on write. One of smarter tricks I've seen done by computers :). Also, the game state is already duplicated if I am correct. It's a couple of tens of megabytes only, so not such big deal.
From what I understand Factorio is memory bandwidth limited. It also writes data a lot. Any additional copying will eat up memory bandwidth, and there will be more copying since memory is modified a lot.

BenSeidel
Filter Inserter
Filter Inserter
Posts: 543
Joined: Tue Jun 28, 2016 1:44 am
Contact:

Re: Parrallel processing in games & applications

Post by BenSeidel » Mon Jan 16, 2017 6:07 am

Well the responses are far better than I normally receive.

@David
I understand why you are saying that, and that would have been true except for the last year when we have employed a team of "regular" programmers. While there was a period of time to adjust their thinking they pretty much all got it in the end. The only ones that really didn't were the couple saying "it should be web deployed" (it's not an exaggeration). It's not some "specialised" field I am working it, it's a simple desktop application using a multithreaded UI.

It's really about a big a change from procedural programming to object orientated programming. It's not a change of what needs to be executed, but a change of where it is being executed and the appropriate constructs that need to be used to get it all to work. With a good library that automatically handles the tick-tock read & writing, the code should look virtually identical. If it had it's own language constructs then it would be like creating a "class" and annotating it's function with "public" or "private".

greep wrote:Well, I think the reason mainstream games don't care much or at all about parallel processing is because I can't see it mattering much in games. People can spend days/weeks working on a ~2x speedup throughout their game with multithreaded solutions that can be both buggy and difficult to debug, or they can just cut/simplify/invent a new solution for this one part of their game for a 100x speedup. It's not like a real life physics simulation that has a rigid correct answer and must be done in a very specific way. Unsurprisingly, in those scenarios you see parallel processing everywhere.
That is usually the response I get from people. This is like saying "games don't need a new graphics processor, a 3D rage card should be fine". If you double your processing then you can double your polygon count. I'm not saying that you don't spend time in your code getting the 100x speedup through your optimisations, I'm saying why would you not write your code so that you can say "the code is fine, it's your computer that is slow". That's the way I remember the game industry being in the 90's. Also the physical simulations are multithreaded because they have MONEY. If you had the cash and asked Wube write Factorio so that it would run on your 10,000 machine datacenter then I think they would write it for you.

I believe the real reason why the mainstream doesn't care about multithreading is because of the view that it is difficult to implement, full of bugs and hence is expensive to develop. Well I say poo to that! The issue is the mentality of the developers who have spent their years thinking "we do this, THEN this, THEN this..." instead of thinking "I have these things to do... go do them". The rest is just a decent library (admittedly the biggest sticking point). There are other "apparent" issues, such as the memory bandwidth issue, but these are actually symptoms of the underlying issue that we still don't even want to consider that multithreading is a requirement.
Yoyobuae wrote:How much time and effort has been spent on improving the single thread performance of Factorio? How much of that time and effort will need to be thrown away in order to get a net positive performance gain with multithreading?
Any efforts spent on increasing the speed of one thread has the multiplicative benefit of increasing the speed of all the threads! With a multithreaded program the incentive is even larger to make optimisations in your code! It's all in the way you look at it. A bad multithreaded implementation is still a bad implementation. The only difference is that now you can wait 6 years for the number of cores to increase by 16x. (Moore's law)
henke37 wrote:My computer is struggling as is with the current memory use of Factorio
Mine isn't and I only have 6gb of ram. Factorio for me tops out at about 2gb, so less than a third of the total (about half of the available). I think you mean that it's struggling with the memory bandwidth. Running with multiple threads increases your memory bandwidth on some (most?) chips. Each NUMA node has its own memory channel(s), so even if you use two threads you will double your memory throughput. If the pressure on Intel & AMD was to increase the amount of memory transferrable to each chip independently, that is what you would get. Instead it's all about getting maximum throughput on a single thread.

To answer all the issues with the memory bandwidth issue and the double buffering: I don't think it would make a difference. Psorek had it correct, the copy on write functionality makes the entire system relatively cost-free. The copying is not done until a write is done. I think the CPU's are even smart enough to integrate the COW functionality into the cache-writing if you are able not to call a main memory synchronising function. I'm not an x86 instruction expert, so if anyone actually knows the real cost of copying a memory page then let me know. I do know that two virtual memory addresses can actually point to the same physical memory address until one thread writes.
DaveMcW wrote:Efficient multithreading also requires compromises. For example, the brownout system is a nightmare to multithread. It would be better to eliminate it and use a simple 100% / 0% power network. But there is a lot of inertia to make such a big change
I'm not sure why the brownout system should be difficult to implement. There is already a natural ordering of processing where the amount of electricity used is calculated THEN the entities are processed. If there needs to be a two or three phase process within a frame to update the system then just have that many clock cycles, one each to break up the different phases of the frame update.

Such changes to concurrent processing should not require a massive amount of code to be changed. There is no requirement for the entire application to be re-written, only the section that is chosen. The function call to start the process should look exactly the same as should the result. Specific aspects may be made asynchronous independently of the other areas of the code. I understand that they are rewriting the fluid mechanics, so why not take this opportunity to do it asynchronously?

Anyway,
Most of the issues facing the industry at the moment are self-imposed. There are no libraries or development going into getting multithreading up-to-scratch. It's been decades now that we have known of the upper limit of CPU speeds and almost a full decade where that limit has been reached but still a multithreaded program is the exception, not the norm.

As always, if anyone has any questions or comments I'm happy to discuss them.
If anyone has a project and they think that this technique would be appropriate I am happy to help.

I also need to say that when I use the term "multithreaded" I don't mean running on 2,3 or even 4 threads, I mean that it will run on as many cores as it can, one for each inserter if you have enough of them.

hoho
Filter Inserter
Filter Inserter
Posts: 673
Joined: Sat Jan 18, 2014 11:23 am
Contact:

Re: Parrallel processing in games & applications

Post by hoho » Mon Jan 16, 2017 9:06 am

@OP, you didn't address the need of perfect determinism in your response to criticism of your original post.

Sure, you could "double buffer" the game and split the computations but you still *have* to ensure perfect determinism. E.g you can't have inserters moving stuff from the same chest be handled by different threads and when you are already building in "defenses" for such occasions, that's extra work on top of what you already have to do anyway. The final merge of the results from different threads also can't be parallelised.

E.g take a situation with a chest that holds 2 items and 3 that take away 1 item each. There must be 2 inserters that get an item to pull out of the chest and one that doesn't. If you have those inserters handled by different threads, how do you ensure that on each of the networked computers, it's always guaranteed to be the same inserter that won't get to have an item?

Daid
Fast Inserter
Fast Inserter
Posts: 162
Joined: Sun Jul 03, 2016 7:42 am
Contact:

Re: Parrallel processing in games & applications

Post by Daid » Mon Jan 16, 2017 10:47 am

I believe the real reason why the mainstream doesn't care about multithreading is because of the view that it is difficult to implement, full of bugs and hence is expensive to develop. Well I say poo to that! The issue is the mentality of the developers who have spent their years thinking "we do this, THEN this, THEN this..." instead of thinking "I have these things to do... go do them". The rest is just a decent library (admittedly the biggest sticking point). There are other "apparent" issues, such as the memory bandwidth issue, but these are actually symptoms of the underlying issue that we still don't even want to consider that multithreading is a requirement.
A) That's because most things ARE do this, do this, do this. Where every "do action" depends on the results of the previous "do action".
B) Creating multi threaded code isn't that problematic. Debugging and maintaining is.
C) 90% of your execution time is spend in 10% of your code. Making the other 90% of the code more complex and harder to debug to optimize it, is just stupid.

User avatar
Deadly-Bagel
Smart Inserter
Smart Inserter
Posts: 1493
Joined: Wed Jul 13, 2016 10:12 am
Contact:

Re: Parrallel processing in games & applications

Post by Deadly-Bagel » Mon Jan 16, 2017 1:56 pm

Masterfox wrote:Well, if you read it, you would have found a good example, but reading the beginning only and ignoring anything else is just so much easier, right?
On the other side, Minecraft does have Factories of this size, although only with mods. And even though mods are going to be less efficient then native code, it works just fine. So yeah, in SP, this is definetly false. In MP, sure. Minecraft MP is ... bad, simply put, severs require an absurd amount of RAM and Processor time to work well when modded, even with few people, because it was not thought for big servers.
So yeah, I guess it is pretty clear who is arrogant and ignorant in that post. Hint: If you read on, you see he is a developer himself.
When you have a one year old that is both intelligent and energetic, who hates sleeping to start with and then gets a cold, two page reports of technical stuff are a bit hard to get into, yeah.

I've played Minecraft with those sort of mods and it's not great. Native code isn't efficient either, it gets by because it only processes within a certain radius of the player and effectively speaking it's not that far. Enough for vanilla, but for these sort of contraptions it's laughable. With a quarry running, pumping lava from the nether, and importing ore from the moon (using keep-alive beacons), I would often get sluggish performance not to mention a lot of weird problems that probably stem from poor determinism (items popping out of pipes is one that comes to mind). That's only two outposts and a base, and not even anything in between as it's all teleported. Never had any problems with Factorio and I've had some decent factories. Anyway, confused as to what you're trying to get at here, are you arguing Factorio isn't ludicrously efficient or...?

And I know he's a developer, most of the ones I know are arrogant and ignorant, but as I said I'm giving him the benefit of the doubt.

So if you're implying I am ignorant, let's flatten that idea and have a look at his example. Yep, he's made assumptions about how Inserters are processed and suggested basically totally rewriting their logic without addressing several problems I can think of off the top of my head, and I barely classify as a developer. He's suggested merging the Inserter with the chest, even forgetting you can have, what, 15 Inserters into a single chest using mods, Inserters pull from inventories too which involves the exact same problems. Do you merge them too? Then the Inserters inserting into those inventories, and the inventories those Inserters are pulling from, then you've got Logistics bots, trains, circuit logic, power networks... It's not as simple as he is making out.

To be clear I'm not saying I know how it all works, only the Factorio devs know that so they're the only ones who can really comment on the feasibility of such projects. Oh what do you know, they've commented that they can't just split everything into multiple threads without breaking the whole game.

So are you implying that the devs are incompetent, or lazy? I'm curious.
Money might be the root of all evil, but ignorance is the heart.

BenSeidel
Filter Inserter
Filter Inserter
Posts: 543
Joined: Tue Jun 28, 2016 1:44 am
Contact:

Re: Parrallel processing in games & applications

Post by BenSeidel » Tue Jan 17, 2017 3:38 am

hoho wrote:@OP, you didn't address the need of perfect determinism in your response to criticism of your original post.
Hmm... because I don't see that there is any indeterminism. Your question about the two inserters removing things from the same chest was addressed in my first post, so I'm not sure what you are asking. The idea is to have a set of inputs that is not mutable for the duration of the asynchronous processing. This is achieved through the double buffering and the use of the clock to swap the buffers. If you have two inserters taking from a chest, buffer enough information so that they both don't take too much from the chest. Give this same buffered information to the chest update and the chest can see how much has/will been removed from it allowing it to set it's next state, just the same as the inserters.
hoho wrote:The final merge of the results from different threads also can't be parallelised.
That is based upon the only currently available "stable" style of multithreading. "I have a list, I want to add it all together", or "I have a list, I want to transform it in some way". If you do that then yes, you have to have one thread merging the results. However, this is not what I am suggesting. I am suggesting that the application be broken down into areas, each with a clock-controlled memory bound. These areas, called "Components" in VHDL are your "execution contexts" or "chunks" or "I-have-no-idea-what-to-call-thems". These are added to a list and then your threads are simply told to "sic-em boys". We are able, and have been for a long time, to have globally shared resources such as lists that can be modified by many threads at once. Use them for your "To be processed" and "Have been processed" lists. Once everything is moved over, increment your clock (flip the buffers) and start again.
Daid wrote:A) That's because most things ARE do this, do this, do this. Where every "do action" depends on the results of the previous "do action".
Yes, to a certain extent. I'm not saying that you can completely remove this from your application. This is a technique that is applicable to CERTAIN aspects of an application. There is no way to process frame 10 of the game without processing frames 0-9 first. This is evident in the CPU's as well. An instruction MUST be decoded first before it can be executed. The issue is that we are unable (don't have the tools/experience) to identify these boundaries. As I said, it's about as big a change as it was going to object orientated programming . There it was about the encapsulation of closely related data and the functions that are executed on it. Here we are talking about "execution contexts" or some other catch-phrase. All you have to do is identify the data needed as the input to the state update, package it up with the function to be executed, and we have a component that can be executed asynchronously.
Daid wrote:B) Creating multi threaded code isn't that problematic. Debugging and maintaining is.
Yes, this was also true with global variables back in the days when C ruled the earth. We invented a technique for handling it: object orientated programming. I am putting forward my experience with a technique that can be used to encapsulate the multi threadable aspects of an application so that we don't face these issues as much as we currently do. I'm not saying it's a magic bullet, but it does help.
Daid wrote:C) 90% of your execution time is spend in 10% of your code. Making the other 90% of the code more complex and harder to debug to optimize it, is just stupid.
Only multithread the hotspots? I think you see this as "Throw out your old code" where as I am saying "Look it's C++, It works with your old C code".
Deadly-Bagel wrote: And I know he's a developer, most of the ones I know are arrogant and ignorant, but as I said I'm giving him the benefit of the doubt.
I'm not a software developer. I'm a forklift driver :)
I've also had that experience with a few software developers so I can understand your point. Saying that though, I'm the only software developer that I know that will never tell you that there is only one way of doing things. We are artists.
Deadly-Bagel wrote:So if you're implying I am ignorant, let's flatten that idea and have a look at his example. Yep, he's made assumptions about how Inserters are processed and suggested basically totally rewriting their logic without addressing several problems I can think of off the top of my head, and I barely classify as a developer. He's suggested merging the Inserter with the chest, even forgetting you can have, what, 15 Inserters into a single chest using mods, Inserters pull from inventories too which involves the exact same problems. Do you merge them too? Then the Inserters inserting into those inventories, and the inventories those Inserters are pulling from, then you've got Logistics bots, trains, circuit logic, power networks... It's not as simple as he is making out.
I think the only assumption I have made about the inserters, in my example at least, is that they are trying to pick up and item in the chest. I have simply suggested a way that the execution contexts can be set up to get this behaviour. I have absolutely no idea how the current code is structured or written, I am only showing - in an absolutely un-optimised way, how you would structure this particular update phase of the game cycle IF you write it using this style. I have no idea if there are constraints that would prevent this from being written without all the current code being dumped and it all started from scratch. Maybe there are data unions all over the place or maybe there are global lists that need to be processed in a specific order, I simply don't know, nor do I claim to.

The merging of these contexts was a possible optimisation. There is no way of knowing if it would increase or decrease performance. If you choose to do this, then yes, you would put all 15 inserters into the one context and process them as a group, but only when they are waiting to drop off or pick up an item. The same could be done for belts. Merge the belts & all inserters needing to pick up or drop onto the belt into one execution context. During the swing phase of the arm, don't have it attached to another context, have it in it's own. On the clock-cycle before it will need to try to pick up an item, re-merge the contexts.

It is as simple as I am making it out, at least in terms of the technique. You seem to be getting the effort required for a complete rewrite confused with the effort to learn and use the technique for new code. At no point have I ever said that all the code in factorio should be rewritten, far from it. They are running a business, they need to make money and for that they need to get the game finished. All I have said is that IF Factorio was to be rewritten to be truly multithreaded than it's not as hard as everyone thinks that it is. It's still hard, but not climb Mt Everest hard, more learning to read hard.
Deadly-Bagel wrote:To be clear I'm not saying I know how it all works, only the Factorio devs know that so they're the only ones who can really comment on the feasibility of such projects. Oh what do you know, they've commented that they can't just split everything into multiple threads without breaking the whole game.
That is generally the feeling that most people have when they are asked to multithread their code. It is my belief that it is because they are lacking the appropriate tool to do it. I am offering my opinion that a clock-based system is the appropriate tool for this particular multithreading issue.
Deadly-Bagel wrote:So are you implying that the devs are incompetent, or lazy? I'm curious.
I don't know about Masterfox, but I certainly don't think so. Were people incompetent when they didn't know that electricity and magnetism are the same force? No, but once they did, just look at all they have achieved.


This technique is really easy to integrate into an existing discreetly update application like Factorio. As the main game loop is essentially the clock generator, we can naturally start there.
Firstly, you need to identify what sections you are going to re-write to be asynchronous. As the fluid mechanics are being redone, I'll start with that example (in hope that maybe they might make a new branch and give it a quick try).

The changes to the main loop are minimal. At the beginning of each update cycle, swap the buffers over. Initiate the multithreaded section of the code and then execute the current body of the update loop. Depending on the current code user inputs may be executed before the entity state updates. I'm not sure on this, so I'm just going to wave my hands and say "User input happens somewhere", probably in a sequential manner either before or after the entity update cycle.

In defining the updates of the fluids, I would start with an execution context of a "pipe". A pipe is a collection of all the pipe entities where fluid can flow between them, that is they are connected. The inputs to the context would need to be the current amounts of fluid in each pipe entity, flow directions and amounts being inserted or removed. So this data needs to be coded using the double buffer.

To do this we create "buffered" variables. A buffered variable is simply a variable duplicated, one for reading, on for writing. So, the current amount of fluid in the pipe becomes pipe.contents[Global.readBuffer] while the newly calculated amount of fluid in the pipe is written to pipe.contents[Global.writeBuffer]. The value can be read by the entities that are inserting or removing from the pipe segment safely in another thread independently of the update process of the pipe. So when it comes time to update the chemical factory in the current loop, it can see if there is enough liquid in the pipe, and update its state.

Now, here we come to an issue, and something that I need to point out. A rewrite of one section will have a small impact on the definitions of other areas of the application, namely that things read from your new component must also be controlled by the clock. But again, it's not a major change. In the case above the current amount of liquid in the factory as well as possibly the maximum amount required in the factory need to be swapped to be double buffered. This way the pipe can read how much the chemical factory will need to take and make appropriate adjustments to its next "contents" figure. The same occurs for entities pushing fluid into the pipe.

As we have all the information we need in buffered variables, all we do now is execute THE SAME CODE we would have if it was a single threaded application. That's it. As I said, it's really as complex as writing object oriented code.

Things to note,
The above example uses an array of 2 items as the buffer. This means that all the crossthreaded variables must be copied accross manually at each update to ensure they are correct. With an appropriate library and good optimisations the double buffered sections can be handled by a COW memory operation, but again they are optimisations and are part of a separate discussion, I am only trying to put forward the concepts. It would be like trying to explain a vtable to someone learning VB.

I hope that this example becomes a bit more clear why I am saying it's more like the change to OOP. You should be able to imagine a language that has a "crossthread" modifier that automatically handles the buffer read & write changes as well as the allocation of each "execution context" to the globally synchronised "to be processed" list whenever the "new" operator is called and other such boilerplate code.

Thanks for reading.

greep
Fast Inserter
Fast Inserter
Posts: 108
Joined: Mon Apr 25, 2016 3:12 am
Contact:

Re: Parrallel processing in games & applications

Post by greep » Tue Jan 17, 2017 7:01 am

FYI, you really should read that link smarty posted, it's pretty obvious they've considered everything you've said:

"Multithreading the update itself, which is the real bottleneck in big factories is the most needed step currently, but it is also the most complicated. Factorio needs to be fully deterministic and many things are tightly related across the whole map. This is why we won't just make the whole update run in threads as it would be huge amount of work that would break almost everything. We will just pick some specific processing tasks suitable for multithreading and implement these, while the rest of the logic will run in single thread as before"

hoho
Filter Inserter
Filter Inserter
Posts: 673
Joined: Sat Jan 18, 2014 11:23 am
Contact:

Re: Parrallel processing in games & applications

Post by hoho » Tue Jan 17, 2017 7:55 am

BenSeidel wrote:
hoho wrote:@OP, you didn't address the need of perfect determinism in your response to criticism of your original post.
Hmm... because I don't see that there is any indeterminism. Your question about the two inserters removing things from the same chest was addressed in my first post, so I'm not sure what you are asking.
It did talk a little bit about it but I was specifically interested in the scenario I described:
fewer items in the chest than inserters try to pick out. To make things more fun, make it a provider chest and have bots try to take things from it as well.

Or, another alternative, instead of chest have a transport belt with one item running on it but several inserters trying to take it off. Also add those inserters and belt to wire network that connects to machines in half the factory.


I'd be interested to hear the logic you'd propose for grouping up such vaguely connected things to be dealt with by a single thread instead of breaking them up over several threads.

Post Reply

Return to “General discussion”

Who is online

Users browsing this forum: No registered users