Parrallel processing in games & applications

Post all other topics which do not belong to any other category.
Rseding91
Factorio Staff
Factorio Staff
Posts: 9779
Joined: Wed Jun 11, 2014 5:23 am
Contact:

Re: Parrallel processing in games & applications

Post by Rseding91 » Sun Feb 19, 2017 12:00 am

GeniusIsme wrote:
Rseding91 wrote: We don't currently make wide use of memory pools and as such we don't have memory leaks, off-by-one errors, or memory-access related problems and I want it to stay that way.
Have you started, by any chance, to use smart pointers?
Rarely. Most everything interacting with network code uses unique or shared pointers. The core game logic primarily does not.
If you want to get ahold of me I'm almost always on Discord.

hoho
Filter Inserter
Filter Inserter
Posts: 673
Joined: Sat Jan 18, 2014 11:23 am
Contact:

Re: Parrallel processing in games & applications

Post by hoho » Tue Feb 21, 2017 1:00 pm

Rseding91 wrote:Rarely. Most everything interacting with network code uses unique or shared pointers. The core game logic primarily does not.
Some people define shared/unique/scoped pointers as "smart" ones too :)

User avatar
madpav3l
Long Handed Inserter
Long Handed Inserter
Posts: 71
Joined: Sat Oct 31, 2015 10:24 pm
Contact:

Re: Parrallel processing in games & applications

Post by madpav3l » Tue Feb 21, 2017 8:16 pm

NotABiter wrote:
Rseding91 wrote:
hansinator wrote:
Harkonnen wrote:In modern times memory bandwidth is the main limiter which is around 20Gb/s in hardware where factorio targets itself.
I've heard that before from a dev that memory bandwidth is the limiting factor. I believe you are just guessing, because I have numbers that show a different situation. I ran some benchmarks and used Intel's PCM tools to measure memory bandwidth untilization. PCM tools directly query performance counters of the memory controller, so it is very precise.
No guessing involved at all: https://www.reddit.com/r/factorio/comme ... ed_fpsups/ http://imgur.com/a/QJiHE
But what you're linking sure looks like a guess to me. I mean, yes, you're seeing a performance difference when you change memory speed, but you're still guessing about why. When you change memory speed you are changing both your memory latency and your memory bandwidth, so you don't know which (or what mix) of them is responsible for the change in performance.

The tool hansinator is using clearly (no guesswork) shows that you're only using a tiny fraction of total available memory bandwidth. Now it's still possible that for some small fraction of the frame time you are memory bandwidth bound, but for at least the vast majority of the frame time you're clearly not.

That is to say, it looks like your performance goes up with faster memory entirely or almost entirely due to decreased memory latency, and not due to increased memory bandwidth. That (combined with hansinator's CPU core not being maxed) indicates you are doing a lot of non-sequential memory accesses, and the resulting memory stalls are eating a lot of your time.
A little late but I need to clarify this, the test I did is no guess. All the different memory speeds (1600, 2133, 3200 MHz) are with the same timings/latency. Only in this test http://imgur.com/2FPr3su I did try different timings to see how it affects the game speed.

Zulan
Inserter
Inserter
Posts: 47
Joined: Mon Jan 25, 2016 5:55 pm
Contact:

Re: Parrallel processing in games & applications

Post by Zulan » Tue Feb 21, 2017 8:53 pm

madpav3l wrote:A little late but I need to clarify this, the test I did is no guess. All the different memory speeds (1600, 2133, 3200 MHz) are with the same timings/latency. Only in this test http://imgur.com/2FPr3su I did try different timings to see how it affects the game speed.
Unless I'm very much mistaken RAM timings are configured in clock cycles. Hence, when you change the memory clock rate (1600, 2133, 3200 MHz), you do modify the effective latencies (as in nanoseconds).

(This does not devalue your impressive set of insightful benchmarks!)

GeniusIsme
Burner Inserter
Burner Inserter
Posts: 15
Joined: Sun Sep 28, 2014 7:26 am
Contact:

Re: Parrallel processing in games & applications

Post by GeniusIsme » Thu Feb 23, 2017 5:07 pm

hoho wrote:
Rseding91 wrote:Rarely. Most everything interacting with network code uses unique or shared pointers. The core game logic primarily does not.
Some people define shared/unique/scoped pointers as "smart" ones too :)
Because they are. Or do you have some other "smart pointers" in mind?

Rseding91
Factorio Staff
Factorio Staff
Posts: 9779
Joined: Wed Jun 11, 2014 5:23 am
Contact:

Re: Parrallel processing in games & applications

Post by Rseding91 » Thu Feb 23, 2017 8:02 pm

hoho wrote:
Rseding91 wrote:Rarely. Most everything interacting with network code uses unique or shared pointers. The core game logic primarily does not.
Some people define shared/unique/scoped pointers as "smart" ones too :)
The only two things I normally think of when someone says "smart pointer" is unique_ptr and shared_ptr. If you mean any kind of class/wrapper we use for managing allocation/deallocation of an object then we have tons of those.
If you want to get ahold of me I'm almost always on Discord.

ratchetfreak
Filter Inserter
Filter Inserter
Posts: 938
Joined: Sat May 23, 2015 12:10 pm
Contact:

Re: Parrallel processing in games & applications

Post by ratchetfreak » Wed Mar 08, 2017 11:26 am

To help with cache performance profiling you may want to check out ig-cachesim a tool by Insomniac Games that lets you simulate cache use based on actual memory accesses during gameplay.

It's currently setup for the Jaguar architecture but is customizable to any cache architecture.

roidal
Inserter
Inserter
Posts: 22
Joined: Mon Mar 07, 2016 9:49 pm
Contact:

Re: Parrallel processing in games & applications

Post by roidal » Fri May 19, 2017 10:58 am

viewtopic.php?f=5&t=39893&start=60#p238247:
Harkonnen wrote: ... and it gets its first update tick to produce red circuit. And it estimates that it will finish producing in like 1000 ticks, so it puts itself into another timed-way sleep-list, and for another 999 ticks it's not updated ...
Are this timed-way sleep-lists already in 0.15.x or is this in the not merged multithreading branch?

Harkonnen
Fast Inserter
Fast Inserter
Posts: 207
Joined: Fri Sep 02, 2016 9:23 am
Contact:

Re: Parrallel processing in games & applications

Post by Harkonnen » Thu May 25, 2017 10:56 am

roidal
They are in there for things like fires and smokes. For assembly machines this was an experiment, still not in 0.15 due to electric brownout logics.

roidal
Inserter
Inserter
Posts: 22
Joined: Mon Mar 07, 2016 9:49 pm
Contact:

Re: Parrallel processing in games & applications

Post by roidal » Thu May 25, 2017 2:25 pm

Harkonnen wrote:roidal
They are in there for things like fires and smokes. For assembly machines this was an experiment, still not in 0.15 due to electric brownout logics.
Thanks for your answer.
Exactly about electric system i was wondering how you would handle that.

May i ask whats the plan about that? Waking up all assembly machines from the timed sleeplist if electricity is not 100%?

PS.: what's the status of the merge-system for the transport-belts (to make longer segments)?

Harkonnen
Fast Inserter
Fast Inserter
Posts: 207
Joined: Fri Sep 02, 2016 9:23 am
Contact:

Re: Parrallel processing in games & applications

Post by Harkonnen » Thu May 25, 2017 4:43 pm

roidal
Yes, waking up everything during brownout is the only option I guess (but that would cause UPS drop during brownouts), but on the other hand everything is slow anyway during brownouts, and big-factory players rarely have brownouts. Another option would be for electric network to control update cycle, i.e. update only entities that got electricity leading to random once-in-N-tick updates during brownout leading to natural slowdown. Not sure if we will ever implement that, just a thought.

On belts - they are coming 0.16. The reason we did not put them in 0.15 - it has many other changes already, and belts are a very massive change, so if I made some desync-prone bugs in belts, it would be terribly hard for us to decypher desync reports if they came from belts or from something else.

roidal
Inserter
Inserter
Posts: 22
Joined: Mon Mar 07, 2016 9:49 pm
Contact:

Re: Parrallel processing in games & applications

Post by roidal » Thu May 25, 2017 5:41 pm

Harkonnen wrote:roidal
Yes, waking up everything during brownout is the only option I guess (but that would cause UPS drop during brownouts), but on the other hand everything is slow anyway during brownouts, and big-factory players rarely have brownouts. Another option would be for electric network to control update cycle, i.e. update only entities that got electricity leading to random once-in-N-tick updates during brownout leading to natural slowdown. Not sure if we will ever implement that, just a thought.

On belts - they are coming 0.16. The reason we did not put them in 0.15 - it has many other changes already, and belts are a very massive change, so if I made some desync-prone bugs in belts, it would be terribly hard for us to decypher desync reports if they came from belts or from something else.
Thanks again! :D

To the belts: When belt-segments are implemented how will they count? Is it still one entity per belt-piece and a segment is a "internal"-datastructure, or is it one entity per belt-segment? And how will it work with belts visible on the screen...is there an exception for them to calculate item-position for visualisation?

Thanks in advance!

Harkonnen
Fast Inserter
Fast Inserter
Posts: 207
Joined: Fri Sep 02, 2016 9:23 am
Contact:

Re: Parrallel processing in games & applications

Post by Harkonnen » Sun May 28, 2017 4:32 pm

roidal
Belts and splitters are no longer updatable entities, it all moved to sort of "transport-line-manager" like we have for electric networks. Item position is important not just for rendering, but also for inserters to chase items and stop chasing when item goes to next belt in a sequence. There are special data structures that can tell from belt piece things like "I am part of transport-line NNN from 6.34 till 7.34 of its overall length" while transport line handles actual items movement. Belts know nothing directly about items on them.

roidal
Inserter
Inserter
Posts: 22
Joined: Mon Mar 07, 2016 9:49 pm
Contact:

Re: Parrallel processing in games & applications

Post by roidal » Tue May 30, 2017 6:12 am

Harkonnen wrote:roidal
Belts and splitters are no longer updatable entities, it all moved to sort of "transport-line-manager" like we have for electric networks. Item position is important not just for rendering, but also for inserters to chase items and stop chasing when item goes to next belt in a sequence. There are special data structures that can tell from belt piece things like "I am part of transport-line NNN from 6.34 till 7.34 of its overall length" while transport line handles actual items movement. Belts know nothing directly about items on them.
Interessting!
Do you plan something similar for pipes too?

Harkonnen
Fast Inserter
Fast Inserter
Posts: 207
Joined: Fri Sep 02, 2016 9:23 am
Contact:

Re: Parrallel processing in games & applications

Post by Harkonnen » Tue May 30, 2017 6:48 am

Yes.

FrodoOf9Fingers
Long Handed Inserter
Long Handed Inserter
Posts: 97
Joined: Sat Apr 29, 2017 11:13 pm
Contact:

What's currently preventing more parallelization?

Post by FrodoOf9Fingers » Mon Sep 04, 2017 2:50 pm

Just out of curiosity mostly.

I know that there's a push for optimizing single thread work before jumping into the glorious chaos of multi threading, but is there anything else keeping the devs from opening the floodgates? 10% is nice, but 75% (assuming work is easily parallelizeable) sounds better :p.

User avatar
DaveMcW
Smart Inserter
Smart Inserter
Posts: 2915
Joined: Tue May 13, 2014 11:06 am
Contact:

Re: What's currently preventing more parallelization?

Post by DaveMcW » Mon Sep 04, 2017 3:09 pm

Factorio was originally a single player game. It took over a year to add multiplayer and fix all the major bugs.

Factorio was originally a single threaded program...

t-lor
Long Handed Inserter
Long Handed Inserter
Posts: 61
Joined: Tue Apr 25, 2017 9:28 pm
Contact:

Re: What's currently preventing more parallelization?

Post by t-lor » Mon Sep 04, 2017 3:24 pm

actually multiplayer and the deterministic needed for that is one of the reasons its almost impossible
theres multiple posts on this. the conclusion for now is "not gonna happen any time soon"

Rseding91
Factorio Staff
Factorio Staff
Posts: 9779
Joined: Wed Jun 11, 2014 5:23 am
Contact:

Re: What's currently preventing more parallelization?

Post by Rseding91 » Mon Sep 04, 2017 3:29 pm

Largely: it doesn't help. At best you might see 10-25% improvements in hyper-specific scenarios but the general case can end up suffering a performance loss by trying to run the game update loop in multiple threads in most scenarios.

Single-core performance has been optimized to the point where it's almost always waiting for RAM. Adding more threads increases the RAM demand and as I mentioned above: can cause both threads to run slower for having run at the same time due to contention for memory than if they ran in series on a single thread.

Take that, and then still make sure everything is deterministic regardless of thread count without sacrificing the intricate entity interaction and most things simply can't be done in multiple threads: in order to update entity B entity A must first be updated as it can change what entity B will do during its update.
If you want to get ahold of me I'm almost always on Discord.

BlakeMW
Filter Inserter
Filter Inserter
Posts: 951
Joined: Thu Jan 21, 2016 9:29 am
Contact:

Re: What's currently preventing more parallelization?

Post by BlakeMW » Mon Sep 04, 2017 3:33 pm

This has been discussed to death before... but what it basically comes down to is that there are single-core optimizations that do not play nicely with multi-core. One over-simplified example, a CPU L1 cache is really fast, it's like 300x faster than a main memory lookup, a modern computer tries to predict what memory will be needed ahead of time and loads that memory into the CPU cache before it is needed, so the CPU can keep calculating like crazy and isn't sitting around twiddling its thumbs for a small eternity waiting for data to be moved from main memory into the CPU cache. There are all kinds of optimizations around making your program easy to predict and making critical code fit in the cache, this is not impossible for parallelized code but it is certainly harder.

Furthermore there is a great deal of overhead whenever two cores have to be coordinated rather than ore core blazing away at a single task.

By way of example, imagine a post hole needs to be dug. It's not very easy for two people to dig one hole simultaneously because their shovels get in the way of each other. But two people can dig two holes twice as fast as one person can. Two CPU cores can't work together to complete the same task in half the time: but they can do two separate tasks in half the time. So the trick is, how could you break each Factorio update (60 times per second) into two or four completely separate but equal tasks? That is not an easy thing because everything is so interconnected. You can't just have each core process half the belts, because what when items move from one core's belts to the other? That's a coordination problem which must be solved in a deterministic way so as to not break multiplayer. And pretty much any way you try to break up the game you're going to run into these coordination problems.

There are theoretical solutions: Instead of breaking each update into 4 tasks, break it into 10000 and get each core to do 2500 of those tasks (this involves non-thread based parallelism). But the thing is that whenever a core switches tasks (that is to say, the prediction strategy failed) there is overhead, so the more you break the program up into pieces, the less those optimizations I talked about earlier can be exploited. And the program really has to be designed from the ground up to use this kind of massive-parallelism strategy, with determinism taken into account. (by the way for this kind of massive parallelism often GPU is used instead of CPU).

Post Reply

Return to “General discussion”

Who is online

Users browsing this forum: No registered users