Page 10 of 22

Re: Parrallel processing in games & applications

Posted: Sun Feb 19, 2017 12:00 am
by Rseding91
GeniusIsme wrote:
Rseding91 wrote: We don't currently make wide use of memory pools and as such we don't have memory leaks, off-by-one errors, or memory-access related problems and I want it to stay that way.
Have you started, by any chance, to use smart pointers?
Rarely. Most everything interacting with network code uses unique or shared pointers. The core game logic primarily does not.

Re: Parrallel processing in games & applications

Posted: Tue Feb 21, 2017 1:00 pm
by hoho
Rseding91 wrote:Rarely. Most everything interacting with network code uses unique or shared pointers. The core game logic primarily does not.
Some people define shared/unique/scoped pointers as "smart" ones too :)

Re: Parrallel processing in games & applications

Posted: Tue Feb 21, 2017 8:16 pm
by madpav3l
NotABiter wrote:
Rseding91 wrote:
hansinator wrote:
Harkonnen wrote:In modern times memory bandwidth is the main limiter which is around 20Gb/s in hardware where factorio targets itself.
I've heard that before from a dev that memory bandwidth is the limiting factor. I believe you are just guessing, because I have numbers that show a different situation. I ran some benchmarks and used Intel's PCM tools to measure memory bandwidth untilization. PCM tools directly query performance counters of the memory controller, so it is very precise.
No guessing involved at all: https://www.reddit.com/r/factorio/comme ... ed_fpsups/ http://imgur.com/a/QJiHE
But what you're linking sure looks like a guess to me. I mean, yes, you're seeing a performance difference when you change memory speed, but you're still guessing about why. When you change memory speed you are changing both your memory latency and your memory bandwidth, so you don't know which (or what mix) of them is responsible for the change in performance.

The tool hansinator is using clearly (no guesswork) shows that you're only using a tiny fraction of total available memory bandwidth. Now it's still possible that for some small fraction of the frame time you are memory bandwidth bound, but for at least the vast majority of the frame time you're clearly not.

That is to say, it looks like your performance goes up with faster memory entirely or almost entirely due to decreased memory latency, and not due to increased memory bandwidth. That (combined with hansinator's CPU core not being maxed) indicates you are doing a lot of non-sequential memory accesses, and the resulting memory stalls are eating a lot of your time.
A little late but I need to clarify this, the test I did is no guess. All the different memory speeds (1600, 2133, 3200 MHz) are with the same timings/latency. Only in this test http://imgur.com/2FPr3su I did try different timings to see how it affects the game speed.

Re: Parrallel processing in games & applications

Posted: Tue Feb 21, 2017 8:53 pm
by Zulan
madpav3l wrote:A little late but I need to clarify this, the test I did is no guess. All the different memory speeds (1600, 2133, 3200 MHz) are with the same timings/latency. Only in this test http://imgur.com/2FPr3su I did try different timings to see how it affects the game speed.
Unless I'm very much mistaken RAM timings are configured in clock cycles. Hence, when you change the memory clock rate (1600, 2133, 3200 MHz), you do modify the effective latencies (as in nanoseconds).

(This does not devalue your impressive set of insightful benchmarks!)

Re: Parrallel processing in games & applications

Posted: Thu Feb 23, 2017 5:07 pm
by GeniusIsme
hoho wrote:
Rseding91 wrote:Rarely. Most everything interacting with network code uses unique or shared pointers. The core game logic primarily does not.
Some people define shared/unique/scoped pointers as "smart" ones too :)
Because they are. Or do you have some other "smart pointers" in mind?

Re: Parrallel processing in games & applications

Posted: Thu Feb 23, 2017 8:02 pm
by Rseding91
hoho wrote:
Rseding91 wrote:Rarely. Most everything interacting with network code uses unique or shared pointers. The core game logic primarily does not.
Some people define shared/unique/scoped pointers as "smart" ones too :)
The only two things I normally think of when someone says "smart pointer" is unique_ptr and shared_ptr. If you mean any kind of class/wrapper we use for managing allocation/deallocation of an object then we have tons of those.

Re: Parrallel processing in games & applications

Posted: Wed Mar 08, 2017 11:26 am
by ratchetfreak
To help with cache performance profiling you may want to check out ig-cachesim a tool by Insomniac Games that lets you simulate cache use based on actual memory accesses during gameplay.

It's currently setup for the Jaguar architecture but is customizable to any cache architecture.

Re: Parrallel processing in games & applications

Posted: Fri May 19, 2017 10:58 am
by roidal
viewtopic.php?f=5&t=39893&start=60#p238247:
Harkonnen wrote: ... and it gets its first update tick to produce red circuit. And it estimates that it will finish producing in like 1000 ticks, so it puts itself into another timed-way sleep-list, and for another 999 ticks it's not updated ...
Are this timed-way sleep-lists already in 0.15.x or is this in the not merged multithreading branch?

Re: Parrallel processing in games & applications

Posted: Thu May 25, 2017 10:56 am
by Harkonnen
roidal
They are in there for things like fires and smokes. For assembly machines this was an experiment, still not in 0.15 due to electric brownout logics.

Re: Parrallel processing in games & applications

Posted: Thu May 25, 2017 2:25 pm
by roidal
Harkonnen wrote:roidal
They are in there for things like fires and smokes. For assembly machines this was an experiment, still not in 0.15 due to electric brownout logics.
Thanks for your answer.
Exactly about electric system i was wondering how you would handle that.

May i ask whats the plan about that? Waking up all assembly machines from the timed sleeplist if electricity is not 100%?

PS.: what's the status of the merge-system for the transport-belts (to make longer segments)?

Re: Parrallel processing in games & applications

Posted: Thu May 25, 2017 4:43 pm
by Harkonnen
roidal
Yes, waking up everything during brownout is the only option I guess (but that would cause UPS drop during brownouts), but on the other hand everything is slow anyway during brownouts, and big-factory players rarely have brownouts. Another option would be for electric network to control update cycle, i.e. update only entities that got electricity leading to random once-in-N-tick updates during brownout leading to natural slowdown. Not sure if we will ever implement that, just a thought.

On belts - they are coming 0.16. The reason we did not put them in 0.15 - it has many other changes already, and belts are a very massive change, so if I made some desync-prone bugs in belts, it would be terribly hard for us to decypher desync reports if they came from belts or from something else.

Re: Parrallel processing in games & applications

Posted: Thu May 25, 2017 5:41 pm
by roidal
Harkonnen wrote:roidal
Yes, waking up everything during brownout is the only option I guess (but that would cause UPS drop during brownouts), but on the other hand everything is slow anyway during brownouts, and big-factory players rarely have brownouts. Another option would be for electric network to control update cycle, i.e. update only entities that got electricity leading to random once-in-N-tick updates during brownout leading to natural slowdown. Not sure if we will ever implement that, just a thought.

On belts - they are coming 0.16. The reason we did not put them in 0.15 - it has many other changes already, and belts are a very massive change, so if I made some desync-prone bugs in belts, it would be terribly hard for us to decypher desync reports if they came from belts or from something else.
Thanks again! :D

To the belts: When belt-segments are implemented how will they count? Is it still one entity per belt-piece and a segment is a "internal"-datastructure, or is it one entity per belt-segment? And how will it work with belts visible on the screen...is there an exception for them to calculate item-position for visualisation?

Thanks in advance!

Re: Parrallel processing in games & applications

Posted: Sun May 28, 2017 4:32 pm
by Harkonnen
roidal
Belts and splitters are no longer updatable entities, it all moved to sort of "transport-line-manager" like we have for electric networks. Item position is important not just for rendering, but also for inserters to chase items and stop chasing when item goes to next belt in a sequence. There are special data structures that can tell from belt piece things like "I am part of transport-line NNN from 6.34 till 7.34 of its overall length" while transport line handles actual items movement. Belts know nothing directly about items on them.

Re: Parrallel processing in games & applications

Posted: Tue May 30, 2017 6:12 am
by roidal
Harkonnen wrote:roidal
Belts and splitters are no longer updatable entities, it all moved to sort of "transport-line-manager" like we have for electric networks. Item position is important not just for rendering, but also for inserters to chase items and stop chasing when item goes to next belt in a sequence. There are special data structures that can tell from belt piece things like "I am part of transport-line NNN from 6.34 till 7.34 of its overall length" while transport line handles actual items movement. Belts know nothing directly about items on them.
Interessting!
Do you plan something similar for pipes too?

Re: Parrallel processing in games & applications

Posted: Tue May 30, 2017 6:48 am
by Harkonnen
Yes.

What's currently preventing more parallelization?

Posted: Mon Sep 04, 2017 2:50 pm
by FrodoOf9Fingers
Just out of curiosity mostly.

I know that there's a push for optimizing single thread work before jumping into the glorious chaos of multi threading, but is there anything else keeping the devs from opening the floodgates? 10% is nice, but 75% (assuming work is easily parallelizeable) sounds better :p.

Re: What's currently preventing more parallelization?

Posted: Mon Sep 04, 2017 3:09 pm
by DaveMcW
Factorio was originally a single player game. It took over a year to add multiplayer and fix all the major bugs.

Factorio was originally a single threaded program...

Re: What's currently preventing more parallelization?

Posted: Mon Sep 04, 2017 3:24 pm
by t-lor
actually multiplayer and the deterministic needed for that is one of the reasons its almost impossible
theres multiple posts on this. the conclusion for now is "not gonna happen any time soon"

Re: What's currently preventing more parallelization?

Posted: Mon Sep 04, 2017 3:29 pm
by Rseding91
Largely: it doesn't help. At best you might see 10-25% improvements in hyper-specific scenarios but the general case can end up suffering a performance loss by trying to run the game update loop in multiple threads in most scenarios.

Single-core performance has been optimized to the point where it's almost always waiting for RAM. Adding more threads increases the RAM demand and as I mentioned above: can cause both threads to run slower for having run at the same time due to contention for memory than if they ran in series on a single thread.

Take that, and then still make sure everything is deterministic regardless of thread count without sacrificing the intricate entity interaction and most things simply can't be done in multiple threads: in order to update entity B entity A must first be updated as it can change what entity B will do during its update.

Re: What's currently preventing more parallelization?

Posted: Mon Sep 04, 2017 3:33 pm
by BlakeMW
This has been discussed to death before... but what it basically comes down to is that there are single-core optimizations that do not play nicely with multi-core. One over-simplified example, a CPU L1 cache is really fast, it's like 300x faster than a main memory lookup, a modern computer tries to predict what memory will be needed ahead of time and loads that memory into the CPU cache before it is needed, so the CPU can keep calculating like crazy and isn't sitting around twiddling its thumbs for a small eternity waiting for data to be moved from main memory into the CPU cache. There are all kinds of optimizations around making your program easy to predict and making critical code fit in the cache, this is not impossible for parallelized code but it is certainly harder.

Furthermore there is a great deal of overhead whenever two cores have to be coordinated rather than ore core blazing away at a single task.

By way of example, imagine a post hole needs to be dug. It's not very easy for two people to dig one hole simultaneously because their shovels get in the way of each other. But two people can dig two holes twice as fast as one person can. Two CPU cores can't work together to complete the same task in half the time: but they can do two separate tasks in half the time. So the trick is, how could you break each Factorio update (60 times per second) into two or four completely separate but equal tasks? That is not an easy thing because everything is so interconnected. You can't just have each core process half the belts, because what when items move from one core's belts to the other? That's a coordination problem which must be solved in a deterministic way so as to not break multiplayer. And pretty much any way you try to break up the game you're going to run into these coordination problems.

There are theoretical solutions: Instead of breaking each update into 4 tasks, break it into 10000 and get each core to do 2500 of those tasks (this involves non-thread based parallelism). But the thing is that whenever a core switches tasks (that is to say, the prediction strategy failed) there is overhead, so the more you break the program up into pieces, the less those optimizations I talked about earlier can be exploited. And the program really has to be designed from the ground up to use this kind of massive-parallelism strategy, with determinism taken into account. (by the way for this kind of massive parallelism often GPU is used instead of CPU).