Parrallel processing in games & applications

Post all other topics which do not belong to any other category.
Harkonnen
Fast Inserter
Fast Inserter
Posts: 207
Joined: Fri Sep 02, 2016 9:23 am
Contact:

Re: Parrallel processing in games & applications

Post by Harkonnen »

Rows of chunks were considered at first, but this was declined because they will spread unevenly between threads. also there will be problems like having horizontal stretch of pipes going 1 thread vs. vertical getting into 8 threads, so overaly it will just make threads waiting each other for longer time. Getting bigger size of chunks - yes, I am thinking of that at times. like 64x64 tiles for threading, this is something we will probably try. Another thought - that grid does not actially need to be 3x3, it probably can be 2x2, so 4 generations instead of 9 because even when two chunks affect some common chunk from different threads - they will affect different entities on different sides of that common chunk, so still no collision. The problem here is that chunks have lists of active entities on them which would require to postpone those shared list changes. Also there is a problem with random-number generator. Single-threaded factorio uses one single RNG for the whole map. With that chunked multithreading every chunk gets personal RNG, which is fine for 3x3 grid, but still a problem for 2x2 grid.
Rseding91
Factorio Staff
Factorio Staff
Posts: 14843
Joined: Wed Jun 11, 2014 5:23 am
Contact:

Re: Parrallel processing in games & applications

Post by Rseding91 »

Threading the entity update cycle of Factorio has one *massive* problem:

The top 5 CPU consuming entities:
  • Belts (and all derivatives)
  • Biters
  • Inserters
  • Pipes
  • Robots
All of them are "slow" because they interact with other entities or other data sets outside the given entity and as such can't be threaded outright without adding data-races. Threading everything except the "external interaction" gives very minimal gains and makes the codebase shit to work with.

Right now our turn around time on most bugs is < 1 day. In most cases less than an hour after a developer starts working on a given bug. Adding hacked threading into the mix will as I said above: make the codebase shit to work with and make bug fixing and feature work take magnitudes more time while in the end giving minimal gains (<10%~).

It's just not worth it. At this stage in Factorio development adding in threading would be a colossal mistake that would slow development and cause far more harm than good.
If you want to get ahold of me I'm almost always on Discord.
BenSeidel
Filter Inserter
Filter Inserter
Posts: 591
Joined: Tue Jun 28, 2016 1:44 am
Contact:

Re: Parrallel processing in games & applications

Post by BenSeidel »

Rseding91 wrote:At this stage in Factorio development adding in threading would be a colossal mistake that would slow development and cause far more harm than good
The longer you wait the worse it will get because your single-threaded improvements will generally create causality constraints that don't work across multiple threads. But, Most importantly: Factorio doesn't NEED to be multi-threaded, the factories we can build are far larger than required. Some of us feel that would be nice because we like building big and a MMO server is fun!

It appears that the codebase is past the point of no return, so why not just admit that (at least to yourselves as developers). Once you have you can look at it objectively and decide if multi-threading is what you guys want to do with your time or if you just want to finish polishing it and get it out the door? Getting it multi-threaded will be a challenge (less so with artifacts) and must be a goal, not a hack, and should be as important as any single-threaded optimisation.

If you do decide that it is going to be a goal then you need to find a model that works for you. Ad-hoc multi-threading (what most people call multi-threading) is as bad an idea as global variables and goto's and is not a model. If you find yourself thinking "this needs to be controlled by a lock" then you have done it incorrectly. Multithreading should be done automatically by the model you have chosen. Things like functional programming, clock based double-buffering or event message based actors (each with their "own" thread) do this for you, so are good models. Find a model (or come up with something of your own) and then use it. Just imagine trying to program Factorio without classes, or even functions... it would be impossible because functions and classes are conceptual models that allow our puny human brain to cope with code. Programming with instruction level synchronisations (locks and CAS) is equivalent to multi-threaded assembly.

No matter what I say it won't change your opinion. You have the code in front of you and all you are doing is trying to see how to get that code to run over multiple threads. It's like looking at a tree and wondering how you are going to eat your picnic on it. You need to understand what it is you are trying to create and have the correct tools at your disposal before you can see how to make what you have work.
hoho
Filter Inserter
Filter Inserter
Posts: 684
Joined: Sat Jan 18, 2014 11:23 am
Contact:

Re: Parrallel processing in games & applications

Post by hoho »

BenSeidel wrote:The longer you wait the worse it will get because your single-threaded improvements will generally create causality constraints that don't work across multiple threads.
It works both ways. IIRC devs said that the way they made belt processing >10x faster isn't exactly easy to implement in multithreaded way. To get 10x speedup with parallelization, you'd need way more than 10 cores due to the game never being "embarrasingly parallel" just because it has to have 60 ticks per second when all threads have finished their processing and have to sync up.

There is also the thing that multithreading can ever only give *at best* linear scaling with core count. Optimizing algorithms is often much more than linear increase in performance, even if it means having to sacrifice ability to multithread things.
Zulan
Long Handed Inserter
Long Handed Inserter
Posts: 56
Joined: Mon Jan 25, 2016 5:55 pm
Contact:

Re: Parrallel processing in games & applications

Post by Zulan »

Rseding91 wrote:... make bug fixing and feature work take magnitudes more time while in the end giving minimal gains (<10%~). ...
Harkonnen wrote:The code of Factorio is so clean that I managed to inject threading in roughly 2 months with seeing quite big codebase for the first time :) ... usual gains are around 10-30%
Again, I do really appreciate the open communication, but I'm getting some conflicting information here.
posila
Factorio Staff
Factorio Staff
Posts: 5430
Joined: Thu Jun 11, 2015 1:35 pm
Contact:

Re: Parrallel processing in games & applications

Post by posila »

Zulan wrote:Again, I do really appreciate the open communication, but I'm getting some conflicting information here.
As every group of people our team does not have uniform opinions, and Rseding and Harkonnen are polar opposites when it comes to multithreading in Factorio.
ratchetfreak
Filter Inserter
Filter Inserter
Posts: 952
Joined: Sat May 23, 2015 12:10 pm
Contact:

Re: Parrallel processing in games & applications

Post by ratchetfreak »

Rseding91 wrote:Threading the entity update cycle of Factorio has one *massive* problem:

The top 5 CPU consuming entities:
  • Belts (and all derivatives)
  • Biters
  • Inserters
  • Pipes
  • Robots
All of them are "slow" because they interact with other entities or other data sets outside the given entity and as such can't be threaded outright without adding data-races. Threading everything except the "external interaction" gives very minimal gains and makes the codebase shit to work with.

Right now our turn around time on most bugs is < 1 day. In most cases less than an hour after a developer starts working on a given bug. Adding hacked threading into the mix will as I said above: make the codebase shit to work with and make bug fixing and feature work take magnitudes more time while in the end giving minimal gains (<10%~).

It's just not worth it. At this stage in Factorio development adding in threading would be a colossal mistake that would slow development and cause far more harm than good.
Is the robot bottleneck updating positions or looking through the jobs needing to be done?

For simply flying you can separate the path into segments and then make the robot logic go to sleep for x frames, only when it needs to render do you then need to know the exact position (which is then a lerp) or when the segment enters the detection range of an enemy (biter, worm, enemy turrets). For battery life you can pre-make the decision of when the battery runs out and the speed drops or when a diversion to a charging point needs to happen.

Then when the robot nears a destination (roboport, chest) can you insert the robot into the chunk-based threaded processing for the interaction (inventory interaction, charging or entering the roboport) using the chunk of the destination.

With that having 50k robots in the air means only processing a few hundred per frame (not counting job management&selection).
Harkonnen
Fast Inserter
Fast Inserter
Posts: 207
Joined: Fri Sep 02, 2016 9:23 am
Contact:

Re: Parrallel processing in games & applications

Post by Harkonnen »

Robots flying never were an issue, and yeah - apart from that travel prediction they can also be easily threaded when they fly for they hardly interact with anything. They just don't show up in profiler on flying even in single-threaded form.

As for polarities - well, I never was pushing multithreading :) It was my sorta "test task", and it's my baby that sorta works. And for us it was not a plan to merge into master branch right away, it was an experiment from the very start - if it's worthy for Factorio at that chunk-level threading kovarex was thinking about for a long time ago or if it's not. It appeared that it's not because other parts (like biters) are main consumers and really would benefit from threading - regarding them nothing was threaded yet.

As for codebase becoming shit - well, I'm not to judge here because codebase is still new to me :) Rseding91 is our knowledge base who knows most parts of the code and devotes himself to that code a lot, probably more than anybody else. So if he says that multithreading is making things too over-compilcated - yeah, I agree to him. Also my long post a few pages ago was opting on his side, so there is actually no polarity. I'm just talking about threading a lot that makes him nervous that I am about to enfroce merging it into master. Nah, that's not the reason of all this talking :) I just like to have a rough vision ahead for several years, and all those ideas and discussions pop up in the brain at important time in the future. Although Factorio is to be released soon, we expect ports to more platforms (that would be stupid not to do), some ipad might become too slow without threading even at average factory, there may come Factorio 2, there may be some other project centered around update loop where all those ideas will come up, within a fresh codebase. Currently - yes, it's like trying to make a picnic on a tree :)

As for data structures - yes, we try to keep vectors as much as possible, not lists. We also try to use arrays/small_vector - those things that put data directly into the class for less indirection and likeliness of automatic prefetch after accessing 'this'. Still bad things regarding data locality happen, mostly because entities are updated in some "random order", and each of them is heap-allocated with 'new' operator. So I think that our next step fighting all those caches would be to achieve as much address proximity for entities on a single chunk as possible, with reordering of their update order to resemble their memory layout (of course deterministic).
ratchetfreak
Filter Inserter
Filter Inserter
Posts: 952
Joined: Sat May 23, 2015 12:10 pm
Contact:

Re: Parrallel processing in games & applications

Post by ratchetfreak »

Harkonnen wrote: Still bad things regarding data locality happen, mostly because entities are updated in some "random order", and each of them is heap-allocated with 'new' operator. So I think that our next step fighting all those caches would be to achieve as much address proximity for entities on a single chunk as possible, with reordering of their update order to resemble their memory layout (of course deterministic).
I think having a per chunk-family memory pool for entities would be a first step. Then (if you can deterministically order allocates into that pool) you can iterate over the pool's objects for the updates.

Though that's more appropriate for entities that need constant updates. But most factorio stuff sleeps more than it doesn't I believe.

Except for the power buffer, but you could make so the common case (just do progress and consume a bit of power) is fast by pulling out all the power buffer bits into it's own array. That way you can iterate over all the power buffers and update the progress bar on the current action based on available power. When action is complete it triggers the entity proper to wake up and do it's logic and perhaps queue a next action requiring power.
Rseding91
Factorio Staff
Factorio Staff
Posts: 14843
Joined: Wed Jun 11, 2014 5:23 am
Contact:

Re: Parrallel processing in games & applications

Post by Rseding91 »

ratchetfreak wrote:Is the robot bottleneck updating positions or looking through the jobs needing to be done?
No, it's everything about the robot. Every time a line of code has to go touch something outside the immediate robot it has to go load some other piece of memory to read it. Checking if the destination is still valid, checking if it still has the items it needs to finish it's task, if not - checking with the logistic network for what to do next. Checking if it has enough energy to move else finding a roboport to recharge at. If it finished it's task it has to find a roboport to recharge and dock at. If it has been flying to a roboport and is low on energy it needs to check if there's a better one with less robots closer on the way that might have freed up in the meantime.

Meanwhile any movement has to update the actual entity position on the chunk which is a non-trivial operation to do.

Then the entire time it's moving biters/script/the player can interact with the robot and destroy it, change its inventory contents, or adjust its energy levels which changes what it would be doing the next update tick.
If you want to get ahold of me I'm almost always on Discord.
Rseding91
Factorio Staff
Factorio Staff
Posts: 14843
Joined: Wed Jun 11, 2014 5:23 am
Contact:

Re: Parrallel processing in games & applications

Post by Rseding91 »

ratchetfreak wrote:
Harkonnen wrote: Still bad things regarding data locality happen, mostly because entities are updated in some "random order", and each of them is heap-allocated with 'new' operator. So I think that our next step fighting all those caches would be to achieve as much address proximity for entities on a single chunk as possible, with reordering of their update order to resemble their memory layout (of course deterministic).
I think having a per chunk-family memory pool for entities would be a first step. Then (if you can deterministically order allocates into that pool) you can iterate over the pool's objects for the updates.

Though that's more appropriate for entities that need constant updates. But most factorio stuff sleeps more than it doesn't I believe.

Except for the power buffer, but you could make so the common case (just do progress and consume a bit of power) is fast by pulling out all the power buffer bits into it's own array. That way you can iterate over all the power buffers and update the progress bar on the current action based on available power. When action is complete it triggers the entity proper to wake up and do it's logic and perhaps queue a next action requiring power.
We don't currently make wide use of memory pools and as such we don't have memory leaks, off-by-one errors, or memory-access related problems and I want it to stay that way. Every time I watch a video about some other game company they're always talking about how they use "this" custom allocator or "that" custom allocator to detect when they leak memory or access memory outside their allocated set and we just don't have that.

The nature of every CPU heavy object is dynamic - that's what makes it slow. Ones that do little or are mostly self-contained things (mining drills, assembling machines, beacons, lamps) take almost nothing to process because they rarely reach outside their local "self" during the update cycle or have to iterate through something like a std::vector (heap allocated dynamic size memory).
If you want to get ahold of me I'm almost always on Discord.
ratchetfreak
Filter Inserter
Filter Inserter
Posts: 952
Joined: Sat May 23, 2015 12:10 pm
Contact:

Re: Parrallel processing in games & applications

Post by ratchetfreak »

Rseding91 wrote:
ratchetfreak wrote:Is the robot bottleneck updating positions or looking through the jobs needing to be done?
Non, it's everything about the robot. Every time a line of code has to go touch something outside the immediate robot it has to go load some other piece of memory to read it. Checking if the destination is still valid, checking if it still has the items it needs to finish it's task, if not - checking with the logistic network for what to do next. Checking if it has enough energy to move else finding a roboport to recharge at. If it finished it's task it has to find a roboport to recharge and dock at. If it has been flying to a roboport and is low on energy it needs to check if there's a better one with less robots closer on the way that might have freed up in the meantime.

Meanwhile any movement has to update the actual entity position on the chunk which is a non-trivial operation to do.

Then the entire time it's moving biters/script/the player can interact with the robot and destroy it, change its inventory contents, or adjust its energy levels which changes what it would be doing the next update tick.
It sounds like quite a few of those polling operations could instead go the other way as pushing operations, if an entity has a logistics task underway as the destination and becomes invalid (a rare event without circuit networks) the entity would notify the robot(s) by setting a flag. Inventory changes mid-flight only happen when scripts are involved so that can also set a flag.

All energy based stuff can be precalculated on when it takes effect (unless scripts get involved but that is what the flag is for).

My proposal also pulls robots outside the chunk-based entity system to avoid the overhead of cross-chunk operation (right up until it has to interact with entities for dropoff/pickup/recharge/dock again then it can go into the chunk again for a bit). As for biter interaction you can use the start and end points of the current bit of path as a course-grained collision detection. After that hits you double check with the actual position.
Zulan
Long Handed Inserter
Long Handed Inserter
Posts: 56
Joined: Mon Jan 25, 2016 5:55 pm
Contact:

Re: Parrallel processing in games & applications

Post by Zulan »

posila wrote:
Zulan wrote:Again, I do really appreciate the open communication, but I'm getting some conflicting information here.
As every group of people our team does not have uniform opinions, and Rseding and Harkonnen are polar opposites when it comes to multithreading in Factorio.
<10% and 10%-30% are mutually exclusive and not part of the opinion ;). Assuming both numbers are correct, there must be different contexts, I'm just curious about those.

It must be clear that the discussion by people who have not seen the source code only considers the theoretic aspects of the problem. Nevertheless that can be an interesting and useful discussion.

I do have high assumptions about the code quality, that said clean interfaces, encapsulation and a modular design are of course beneficial when it comes to implementing parallelism.

Rseding, I don't understand your remarks regarding memory pools. Custom allocators are used to detect memory issues, but they generally don't cause them. From the description of current bottlenecks, it seams that the next optimization step, regardless of parallelism, is optimizing the memory layout. With a clean understanding of ownership and lifetime as well as encapsulation, there should not be any fear of increased memory errors.

One general question that would help my understanding of what concepts can be applied: In what order are the main update methods of entities currently executed?
OkariDraconis
Inserter
Inserter
Posts: 41
Joined: Wed Mar 02, 2016 3:33 pm
Contact:

Re: Parrallel processing in games & applications

Post by OkariDraconis »

Klonan wrote: But multithreading isn't some amazing performance multiplier, and it only really has an impact if you have more cores, so its great if you have a shiny modern 8 core i7 or something, but for people on laptop or low end CPUs with maybe only a dual core, the performance improvement might not even be there.
I did want to say, that AMD gives you a lot of cores on the cheap... So sometimes a lower end / medium quality computer will have a lot of cores, and this is especial true moving forward. As 64 core processors are coming out now (on the server side granted) consumers are going to only get more and more cores at this point on the cheap. And with hyperthreading on most Intel chips, their cores can handle an extra thread. (not all the time, but i'm not going to get into it).

However it looks like you guys are pushing Multi-threading in the right areas so keep kicking ass :)
Please review this idea when you get a chance
Swarm Biters (locusts) - The Evolved Response to TurretCreep

5+ years game development experiance
Rseding91
Factorio Staff
Factorio Staff
Posts: 14843
Joined: Wed Jun 11, 2014 5:23 am
Contact:

Re: Parrallel processing in games & applications

Post by Rseding91 »

Zulan wrote:One general question that would help my understanding of what concepts can be applied: In what order are the main update methods of entities currently executed?
In the order they're marked as active on a given chunk which changes as they go offline due to not having work to do.
If you want to get ahold of me I'm almost always on Discord.
Zulan
Long Handed Inserter
Long Handed Inserter
Posts: 56
Joined: Mon Jan 25, 2016 5:55 pm
Contact:

Re: Parrallel processing in games & applications

Post by Zulan »

Rseding91 wrote:
Zulan wrote:One general question that would help my understanding of what concepts can be applied: In what order are the main update methods of entities currently executed?
In the order they're marked as active on a given chunk which changes as they go offline due to not having work to do.
So it's not even grouped by type of entity (e.g. first Factories, then drills, then belts, then inserters, ...)? Is there a particular reason for that ordering?
Rseding91
Factorio Staff
Factorio Staff
Posts: 14843
Joined: Wed Jun 11, 2014 5:23 am
Contact:

Re: Parrallel processing in games & applications

Post by Rseding91 »

Zulan wrote:
Rseding91 wrote:
Zulan wrote:One general question that would help my understanding of what concepts can be applied: In what order are the main update methods of entities currently executed?
In the order they're marked as active on a given chunk which changes as they go offline due to not having work to do.
So it's not even grouped by type of entity (e.g. first Factories, then drills, then belts, then inserters, ...)? Is there a particular reason for that ordering?
No. I tried grouping entities by type and updating them in that order and it had zero measurable impact - neither positive or negative.
If you want to get ahold of me I'm almost always on Discord.
Harkonnen
Fast Inserter
Fast Inserter
Posts: 207
Joined: Fri Sep 02, 2016 9:23 am
Contact:

Re: Parrallel processing in games & applications

Post by Harkonnen »

Zulan wrote:<10% and 10%-30% are mutually exclusive and not part of the opinion ;). Assuming both numbers are correct, there must be different contexts, I'm just curious about those.
10-30% were with belts threaded. <10% was without belts threaded. in original unmerged form of belts (i.e. before recent FFF). So everything that can be threaded with that chunk-based approach is around 10%, so that chunk-based approach hardly gets into codebase. Still we did not abandon completely idea of threading biters for example or other aspects we didn't touch yet which will require something completely different than splitting jobs at chunk-by-chunk bassis and these changes should not be so intrusive.
Zulan wrote:So it's not even grouped by type of entity (e.g. first Factories, then drills, then belts, then inserters, ...)? Is there a particular reason for that ordering?
Grouping by type of entity would help instruction cache. Data is still in disarray :)
BenSeidel
Filter Inserter
Filter Inserter
Posts: 591
Joined: Tue Jun 28, 2016 1:44 am
Contact:

Re: Parrallel processing in games & applications

Post by BenSeidel »

hoho wrote:It works both ways. IIRC devs said that the way they made belt processing >10x faster isn't exactly easy to implement in multithreaded way
Hoho, do you have a blind-spot for words such as "generally" or "typically"?
A single-threaded optimisation will not prevent all the multi-threading optimisations and a multi-threaded optimisation will not stop all single-threaded optimisations, but as an optimisation is any movement away from the conceptual model due to the performance requirements of the code, then by definition you will not be able to do all the things you would otherwise be able to do.
hoho wrote:There is also the thing that multithreading can ever only give *at best* linear scaling with core count. Optimizing algorithms is often much more than linear increase in performance, even if it means having to sacrifice ability to multithread things.
Do you understand the concept of orthogonal (or near orthogonal as the above is) concerns?

Have a read of https://en.wikipedia.org/wiki/Productio ... y_frontier and that should give you an idea of what I saying.
posila wrote:As every group of people our team does not have uniform opinions, and Rseding and Harkonnen are polar opposites when it comes to multithreading in Factorio.
So Rseding... Do you want to be convinced otherwise?
Harkonnen wrote:Currently - yes, it's like trying to make a picnic on a tree
Hahah, sooo true... But once you have your conceptual model of how to transform your tree into a park bench, you are set. Granted *some* of the tree will be discarded (bark, leaves etc) and you will have to supply *some* additional components (bolts, paint/lacquer), But if you are a good carpenter then you should be able to use most of the original tree. It's not a case (well shouldn't be anyway) of having to put the tree through a wood-chipper then having to put it back together piece-by-piece. If you think that your code will have to go through a similar mulching process then you are missing a good model.
Zulan
Long Handed Inserter
Long Handed Inserter
Posts: 56
Joined: Mon Jan 25, 2016 5:55 pm
Contact:

Re: Parrallel processing in games & applications

Post by Zulan »

Harkonnen wrote: 10-30% were with belts threaded. <10% was without belts threaded. in original unmerged form of belts (i.e. before recent FFF). So everything that can be threaded with that chunk-based approach is around 10%, so that chunk-based approach hardly gets into codebase. Still we did not abandon completely idea of threading biters for example or other aspects we didn't touch yet which will require something completely different than splitting jobs at chunk-by-chunk bassis and these changes should not be so intrusive.
Thanks for clarifying that.
Harkonnen wrote:
Zulan wrote:So it's not even grouped by type of entity (e.g. first Factories, then drills, then belts, then inserters, ...)? Is there a particular reason for that ordering?
Grouping by type of entity would help instruction cache. Data is still in disarray :)
I was thinking more in terms of grouping in order to have clearer dependencies. I would be surprised if icache is a significant issue.
hoho wrote:There is also the thing that multithreading can ever only give *at best* linear scaling with core count.
By the way, there is super-linear speedup. Allthough that is something you wouldn't count on / expect.
Post Reply

Return to “General discussion”