Multithreaded game update loop

Post all other topics which do not belong to any other category.
tux_mark_5
Long Handed Inserter
Long Handed Inserter
Posts: 68
Joined: Thu Jan 15, 2015 2:20 pm
Contact:

Re: Multithreaded game update loop

Post by tux_mark_5 »

You certainly have enthusiasm, but its clouding your view of the reality of multi-core computing. Even a linear increase in speed requires what we call an "embarrassingly parallel" problem that lends itself to problems that are more IO bound than CPU bound (my experience here is writing Fortran that runs on hundreds or thousands of cores at once). The game loop likely does not qualify as embarrassingly parallel and can except to see much less than linear scaling across multiple cores.

Secondly, consider your goal of bigger factories. If you have a factory that is 50x50 (just for arguments sake) and want to double its dimensions to 100x100, that is 4 times as much area and could be as much as four times more processing power to solve. Throwing cores at this problem is going to hit diminishing returns very quickly.

Lastly, parallelisation isn't a matter of just spawning more threads. It takes careful design to do right and when done wrong can actually hurt performance and create lots of hard to find bugs. Properly distributing load across threads and proper synchronization are vital and the algorithms in use must be designed with this in mind. How are you going to share data between threads? Is your solution going to impact cache locality (performance killer)? As mentioned by another poster, it is better to spend the time now squeezing all of the performance available out of the serial thread and fixing its bugs.
By doubling the factory size, I mean exactly that. Not quadrupling. Factories are essentially made of various entities. Double the size == double the entity count. How these entities are distributed in a map shouldn't matter.

As kovarex has already mentioned, game is already organized by chunks. Without knowing exact implementation details I can guess that each chunk also contains a list of entities that exist in that chunk. Basically, entire map is divided to chunks. In a hypothetical situation, where one chunk cannot affect another, parallel update is trivial - you only need a thread pool that processes a job list, where each job is meant to update a single active chunk. This is why I agreed with ssilk's idea to divide map into sectors and to update those (rather than chunks) in parallel instead - because for the most part sectors could be completely independent.

I know for a fact that game can run in parallel quite well without running into IO bottlenecks or cache locality problems: I can easily run 4 instances of factorio on my 6-core CPU at the same time. All 4 instances are running the same save that contains a fairly large factory that takes 12ms to update. When running 4 instances, this time increases to 15ms. So the game logic is obviously not IO dependent. If only 1 instance was rendering instead of all 4, I'm sure I could have increased factorio instance count to 6.

If the game had a deterministic Lua bindings API for IPC or some other form of communication between instances (like unix sockets), I could implement a crude version sector parallelism as a mod myself by running several instances of the game at the same time and teleporting items between instances. I would be more or less okay with something similar like that.

However, as kovarex mentioned in his latest post, the most intense game update parts can be easily split into separate threads. Also, he believes that per-chunk parallelism can be implemented without dividing map further into sectors.

So there you have it. The game is capable of parallel scalability. The game can be made to run parallel in two ways - either by processing chunks or sectors. Both methods gameplay-wise are suitable for me. If you don't want to run into various multithreading issues, the easy way is to handle game on sector-by-sector basis. But if the developers believe that they can implement even better version of parallelism (chunk-by-chunk), I'm definitely not going to argue.
keyboardhack
Filter Inserter
Filter Inserter
Posts: 478
Joined: Sat Aug 23, 2014 11:43 pm
Contact:

Re: Multithreaded game update loop

Post by keyboardhack »

tux_mark_5 wrote: However, as kovarex mentioned in his latest post, the most intense game update parts can be easily split into separate threads
kovarex wrote: There is no need to split the map into some parts, we have chunks already, and the most time consuming tasks can be divided into chunks, but it will have to do A LOT of special logic for proper synchronising of any action that can affect global state.

Not impossible, but hard, and yes optimising the core loop is much better now. Most of the stuff the engine is computing now can be simplified/avoided.
Hard = easily?
tux_mark_5 wrote: If the game had a deterministic Lua bindings API for IPC or some other form of communication between instances (like unix sockets), I could implement a crude version sector parallelism as a mod myself by running several instances of the game at the same time and teleporting items between instances. I would be more or less okay with something similar like that.
Say i want to run my factory on 4 cores with your mod. I now need to run 4 independent factorios with their own ram requirements. My world takes 4gb(5gb is not excessive because you want multithreading to work on big maps) of memory so i would need over 16gb of ram to run factorio on 4 cores. I highly doubt that a lot of people has enough ram for that.

Multithreading would be cool, but we shouldn't push for it to happen sooner, when it's better to optimize factorio for a single thread. If that's not what this thread is about atm then i don't know what it's for.
Last edited by keyboardhack on Sat Feb 21, 2015 2:23 am, edited 1 time in total.
Waste of bytes : P
tux_mark_5
Long Handed Inserter
Long Handed Inserter
Posts: 68
Joined: Thu Jan 15, 2015 2:20 pm
Contact:

Re: Multithreaded game update loop

Post by tux_mark_5 »

By 'easily' I referred to 'the most time consuming tasks can be divided into chunks' bit.

When I tested 4 instances of factorio, they consumed less than 8 GB of ram combined, which is a reasonable amount of RAM for any descent machine. Also, it is unclear how much of that memory is textures, sounds and other data than can be shared across instances. And in the end, if player's pc is a wooden toaster, then you shouldn't expect great performance with vast maps anyways.

And I am not pushing for it to happen sooner. Guys, please read the previous posts first before making assumptions.

It's almost like you don't want a better factorio and are arguing just for the sake of it.
User avatar
SHiRKiT
Filter Inserter
Filter Inserter
Posts: 706
Joined: Mon Jul 14, 2014 11:52 pm
Contact:

Re: Multithreaded game update loop

Post by SHiRKiT »

kovarex wrote:There is no need to split the map into some parts, we have chunks already, and the most time consuming tasks can be divided into chunks, but it will have to do A LOT of special logic for proper synchronising of any action that can affect global state.

Not impossible, but hard, and yes optimising the core loop is much better now. Most of the stuff the engine is computing now can be simplified/avoided.
The general rule of thumb is this. Normally, optimizing the core logic is better than to parallize it. And since parallel computation brings so much bugs, and the Factorio team loves spending hours and hours fixing them, I still agree that's just not worthwise.
User avatar
cpy
Filter Inserter
Filter Inserter
Posts: 839
Joined: Thu Jul 31, 2014 5:34 am
Contact:

Re: Multithreaded game update loop

Post by cpy »

I guess if you can thread everything it will take a lot of time and effort. But maybe one day we'll see it happen? :D
When big factory is not big enough! Damn this determinism.
hoho
Filter Inserter
Filter Inserter
Posts: 681
Joined: Sat Jan 18, 2014 11:23 am
Contact:

Re: Multithreaded game update loop

Post by hoho »

tux_mark_5 wrote:By doubling the factory size, I mean exactly that. Not quadrupling. Factories are essentially made of various entities. Double the size == double the entity count. How these entities are distributed in a map shouldn't matter.
No, usually doubling the size of a factory consumes more than twice the amount of computing power and especially memory bandwidth as it likely won't fit to cache any more. I'm simplifying here but increasing the amount of entities also increases the amount of entities that have to be searched to find the right one to perform actions with. With naive algorithms it means doubling the amount quadruples the amount of lookup operations made.
tux_mark_5 wrote:I know for a fact that game can run in parallel quite well without running into IO bottlenecks or cache locality problems: I can easily run 4 instances of factorio on my 6-core CPU at the same time
You aren't hitting *any* cache locality problems with that sort of "test". You had literally zero data sharing between the game instances even if they were in same multiplayer world.

In best-case scenario modern CPUs can move a couple of tens of GBs of data per second from core to core *when that data is streamed*. When you need to update a few bytes here and there and flush cache lines to get it propagated to other cores it drops down to just a tiny fraction of that and it *will* cause extra cache misses that require very expensive trips to either L3 or RAM (not much of a difference in terms of latency for AMD stuff, somewhat better on Intel). A cache miss means several hundred cycles of the CPU just idling doing nothing and with parallel programming you (well, the compiler, actually) often need to force those cache misses to be able to get your data to other threads/cores.

Then there is the other "problem" of the game running at 60Hz update time or around 17ms per tick. I'm not sure how it is with windows 8 and up but before that when one allowed the OS to manage the threads it could easily mean they'd be put to sleep for 10ms. In linux it varies from 1-10ms depending on kernel settings. If the OS decides it's a good idea to throw your computing thread off the core it'll mean the cache gets trashed by whatever other thing gets put there. Not to mention it has to copy off the state of registers too.
tux_mark_5
Long Handed Inserter
Long Handed Inserter
Posts: 68
Joined: Thu Jan 15, 2015 2:20 pm
Contact:

Re: Multithreaded game update loop

Post by tux_mark_5 »

You aren't hitting *any* cache locality problems with that sort of "test". You had literally zero data sharing between the game instances even if they were in same multiplayer world.
Since I love repeating myself, I'll state this: one of the main points I made in my previous posts that a variant of parallelism can be implemented that requires little to no data sharing across threads, thus making your argument irrelevant. Also, factories can be laid out in a map in such a way that respective chunks would be independent (different electrical grids, no roboport overlaps, etc) achieving the same effect. And my test was mostly intended to disprove the notion that factories are IO bound and implementing any kind of parallelism is futile.

Edit:
A cache miss means several hundred cycles of the CPU just idling doing nothing and with parallel programming you (well, the compiler, actually) often need to force those cache misses to be able to get your data to other threads/cores.
You are wrong there. Compiler has no notion of threads beyond atomic operations (and even those are probably implemented using inline assembly). It does nothing to force cache-misses and so on.
Then there is the other "problem" of the game running at 60Hz update time or around 17ms per tick. I'm not sure how it is with windows 8 and up but before that when one allowed the OS to manage the threads it could easily mean they'd be put to sleep for 10ms. In linux it varies from 1-10ms depending on kernel settings. If the OS decides it's a good idea to throw your computing thread off the core it'll mean the cache gets trashed by whatever other thing gets put there.
Even if every update cycle is performed on a different core compared to the previous update, I doubt it would affect overall performance much. Factorio's working set is already larger than that of even most high-end CPU caches, thus it can be safe to assume, that all or most data involved in single update is transferred from RAM every update cycle.
Not to mention it has to copy off the state of registers too.
Really? That almost feels like gripping on straws. Register preservation has such a negligible performance impact that can be easily ignored.
User avatar
Alekthefirst
Fast Inserter
Fast Inserter
Posts: 104
Joined: Sat Feb 07, 2015 7:39 pm
Contact:

Re: Multithreaded game update loop

Post by Alekthefirst »

so this ended up being the "shit-throwing-thread"....
Factorio is a game about automating everything. One day, i hope i can automate shitty signatures just like this one.
mike_smit
Burner Inserter
Burner Inserter
Posts: 19
Joined: Mon Jan 19, 2015 10:59 am
Contact:

Re: Multithreaded game update loop

Post by mike_smit »

Since kovarex has stated that he will optimize single core performance it is best that you use this graph when buying your next CPU. Many people have a multicore CPU but a cheap crappy model

Use this to buy your next
http://www.cpubenchmark.net/singleThread.html
tux_mark_5
Long Handed Inserter
Long Handed Inserter
Posts: 68
Joined: Thu Jan 15, 2015 2:20 pm
Contact:

Re: Multithreaded game update loop

Post by tux_mark_5 »

Alekthefirst wrote:so this ended up being the "shit-throwing-thread"....
Sorry :(
Post Reply

Return to “General discussion”