Parrallel processing in games & applications

Post all other topics which do not belong to any other category.
mrvn
Smart Inserter
Smart Inserter
Posts: 5696
Joined: Mon Sep 05, 2016 9:10 am
Contact:

Re: Parrallel processing in games & applications

Post by mrvn »

Rseding91 wrote:
Mon Feb 08, 2021 8:30 pm
elfstone wrote:
Mon Feb 08, 2021 6:03 pm
... it might be possible to run different surfaces on different cores, since events on one surface don't interact with things on other surfaces, which should make multi threading a lot easier.
Events are global; Lua events even more so. So they do interact with everything regardless of surfaces. Surfaces in fact are nothing special except more chunks. Very little is actually seperate per-surface. The entirety of force-related anything is global, events are global, player stuff is global.
Linked chests connect surfaces. And can't mods add circuit wires between entities on different surfaces too?

mrvn
Smart Inserter
Smart Inserter
Posts: 5696
Joined: Mon Sep 05, 2016 9:10 am
Contact:

Re: Parrallel processing in games & applications

Post by mrvn »

Trific wrote:
Tue Feb 09, 2021 8:07 pm
Programming for multithreading is a bit like designing a train network in Factorio. In both cases you have to avoid deadlocks. Except in a program, you may have millions or billions of trains each carrying a unique cargo, and you're up against the constraints that you can't have two trains carrying the same cargo, and no amount of throwing extra trains or fancy signaling at the problem is going to get a single cargo somewhere any faster than the max speed of the train.
And that is probably the biggest misconception of why most multithreading code shows no improvements.

You don't put 1000 trains on a single rail line and expect them to be faster. That just remains sequential no matter how you multithread it. And the extra signals will slow everything down so your trains will actually take far longer to arrive.

You build 1000 rail lines with one train each and no cross overs. Now all trains run in parallel at 1000 times the throughput.

The way factorio is designed everything depends on everything else and every entity has to be evaluated in the correct (or at least same on all clients) order or you desync. That makes multithreading basically impossible any more than it is. You first have to isolate groups of entities that don't interact at al. Like the pipes are split into fluid systems that do not connect anywhere. Those things that are easily split like that wube has split by now.

But that is a fundamental design problem of how entity updates work in factorio. The action of one entity affects other entities in the same tick so everything depends on everything else and thus you would need locking or clever dependency analysis to multithreading, both of which add more overhead than you gain from more cores. And with updates not being broken down into small chunks that fit in caches the memory is the true limiting factor as the devs have stated frequently.

Going back to the train analogy: Once you put down your tracks all you can do is mitigate the track design with signals to gain some parallelity. But ultimately the way you designed the tracks will limit you. Only way to improve at a certain point (which factorio seem to have reached) is to deconstruct it all and design a better track layout.

mrvn
Smart Inserter
Smart Inserter
Posts: 5696
Joined: Mon Sep 05, 2016 9:10 am
Contact:

Re: Parrallel processing in games & applications

Post by mrvn »

NotoriousPyro wrote:
Sun Feb 27, 2022 9:44 pm
Asynchronous, multithreaded: you hire two more cooks, one to cook eggs and one to cook toast. Now you have the problem of coordinating the cooks so that they do not conflict with each other in the kitchen when sharing resources. And you have to pay them.
https://stackoverflow.com/a/34681101/3812928

If you can do it so easily then why don't you apply for a job and do it? If you want it so bad you would do it for free if it's so easy.
That's not asynchronous, just plain multithreaded. Asynchronous, multithreaded would be one cook putting the eggs on the stove and the toast in the toaster and setting timers and another taking them out when the timers go off. Or Multithreaded, asynchronous would be one cook putting eggs on the stove and setting a timer for all the order and taking them off then each timer goes off. And another cook doing the same with toasts.


Where the analogy breaks down is having to pay them. The customers already bought the CPUs so they are free to use. It's like every customer brings their own cooks into the restaurant.

FuryoftheStars
Smart Inserter
Smart Inserter
Posts: 2530
Joined: Tue Apr 25, 2017 2:01 pm
Contact:

Re: Parrallel processing in games & applications

Post by FuryoftheStars »

mrvn wrote:
Tue Oct 04, 2022 3:52 pm
NotoriousPyro wrote:
Sun Feb 27, 2022 9:44 pm
Asynchronous, multithreaded: you hire two more cooks, one to cook eggs and one to cook toast. Now you have the problem of coordinating the cooks so that they do not conflict with each other in the kitchen when sharing resources. And you have to pay them.
https://stackoverflow.com/a/34681101/3812928

If you can do it so easily then why don't you apply for a job and do it? If you want it so bad you would do it for free if it's so easy.
That's not asynchronous, just plain multithreaded.
My understanding may be off, but that does still seem like asynchronous, multithreaded to me. In that example you've actually got 3 cooks: the original cook who is the one taking in the main task and then assigning sub-tasks for the order to the other two (eggs & toast). The original cook can then do other needed tasks for that order (prepping the plate or anything else needed for it), but ultimately needs to wait until both the eggs and toast are done before being able to send it out the door.
My Mods: Classic Factorio Basic Oil Processing | Sulfur Production from Oils | Wood to Oil Processing | Infinite Resources - Normal Yield | Tree Saplings (Redux) | Alien Biomes Tweaked | Restrictions on Artificial Tiles

mrvn
Smart Inserter
Smart Inserter
Posts: 5696
Joined: Mon Sep 05, 2016 9:10 am
Contact:

Re: Parrallel processing in games & applications

Post by mrvn »

FuryoftheStars wrote:
Tue Oct 04, 2022 4:27 pm
mrvn wrote:
Tue Oct 04, 2022 3:52 pm
NotoriousPyro wrote:
Sun Feb 27, 2022 9:44 pm
Asynchronous, multithreaded: you hire two more cooks, one to cook eggs and one to cook toast. Now you have the problem of coordinating the cooks so that they do not conflict with each other in the kitchen when sharing resources. And you have to pay them.
https://stackoverflow.com/a/34681101/3812928

If you can do it so easily then why don't you apply for a job and do it? If you want it so bad you would do it for free if it's so easy.
That's not asynchronous, just plain multithreaded.
My understanding may be off, but that does still seem like asynchronous, multithreaded to me. In that example you've actually got 3 cooks: the original cook who is the one taking in the main task and then assigning sub-tasks for the order to the other two (eggs & toast). The original cook can then do other needed tasks for that order (prepping the plate or anything else needed for it), but ultimately needs to wait until both the eggs and toast are done before being able to send it out the door.
Maybe. Maybe better described as the worker-threads or thread pool model. You have a bunch of cooks and one chef that orders them around and coordinates the kitchen.

Note: only the chef is working asynchronous. All the cooks work sequentially in the above, that's why I said it's not asynchronous.

FuryoftheStars
Smart Inserter
Smart Inserter
Posts: 2530
Joined: Tue Apr 25, 2017 2:01 pm
Contact:

Re: Parrallel processing in games & applications

Post by FuryoftheStars »

mrvn wrote:
Tue Oct 04, 2022 5:17 pm
Note: only the chef is working asynchronous. All the cooks work sequentially in the above, that's why I said it's not asynchronous.
Yes, but I think that was the point of the examples that were given in that post. The "synchronous", "asynchronous, single threaded", etc were supposed to be from the viewpoint of "you" (the original cook), not the additional cooks or all aspects of the given examples.
My Mods: Classic Factorio Basic Oil Processing | Sulfur Production from Oils | Wood to Oil Processing | Infinite Resources - Normal Yield | Tree Saplings (Redux) | Alien Biomes Tweaked | Restrictions on Artificial Tiles

blazespinnaker
Filter Inserter
Filter Inserter
Posts: 665
Joined: Wed Sep 16, 2020 12:45 pm
Contact:

Re: Parrallel processing in games & applications

Post by blazespinnaker »

mrvn wrote:
Tue Oct 04, 2022 3:45 pm
And that is probably the biggest misconception of why most multithreading code shows no improvements.
There's a misconception. Multithreading does exist (belts, one of the biggest mainstays of the game). The choice seemed to be to do it tactically rather than architecturally. My guess is that it was all built with a single core in mind in order to increase stability, which paid off I think for speedrunning (noticeable lack of crazy hacks you find in other speedruns). And then people built these crazy bases, and so they shoe-horned some stuff in...

In the end though, there is nothing magic about 'determinism' and there is nothing all that end-of-the-world with parallelism. It's just a series of trade offs and choices that are made. I'm skeptical it makes sense to fight mod authors on this, mods will always be flakey regardless, but it is what it is.
OptimaUPS Mod, pm for info.

cashtrevor
Manual Inserter
Manual Inserter
Posts: 2
Joined: Tue Sep 06, 2022 5:30 pm
Contact:

Re: Parrallel processing in games & applications

Post by cashtrevor »

This thread has had me curious about using GPU's for game development.

I've worked on a top-down simulation game that uses OpenCL as the primary language for ingame logic. So far it has lead to some interesting design challenges.

1. All OpenCL game logic code uses fixed-point numbers for determinism. primarily to make the addition of numbers across all threads deterministic. The result of A = B + C is the same as A = C + B.

2. All OpenCL game structures use pointer offsets integers instead of native pointers to keeping gamestate coherent for transfer across network and saving/loading to disk.

3. The Transfers TO/FROM CPU<->GPU are minimized by issuing rendering calls on the CPU and altering OpenGL VBO's on the GPU (OpenGL shared graphics objects) The only need for communication back to the CPU is for sound effects and other metadata.

4. For lockstep networking - commands are sent to the GPU and then processed there.

I have been able to simulate 35k units with simple fixedpoint physics and collision detection at < 32fps all determistically in sync across the network. With A* Pathfinding and such as well.

The Prototype can be downloaded here for anyone wishing to try it out:
https://gameprototypes.itch.io/astroid-miner
https://www.youtube.com/watch?v=8bW2YIXNxsA
Should work on NVIDIA and AMD graphics cards with at least 4gb ram. (tested on Radion RX 460, and NVIDIA RTX 2080)

FuryoftheStars
Smart Inserter
Smart Inserter
Posts: 2530
Joined: Tue Apr 25, 2017 2:01 pm
Contact:

Re: Parrallel processing in games & applications

Post by FuryoftheStars »

cashtrevor wrote:
Tue Nov 08, 2022 6:13 pm
This thread has had me curious about using GPU's for game development.

[...]

I have been able to simulate 35k units with simple fixedpoint physics and collision detection at < 32fps all determistically in sync across the network. With A* Pathfinding and such as well.
Here's one that I happened to be looking at the other day that does pathfinding through the GPU: https://store.steampowered.com/app/1468 ... mulator_2/

Course, none of that helps this game now. :P
My Mods: Classic Factorio Basic Oil Processing | Sulfur Production from Oils | Wood to Oil Processing | Infinite Resources - Normal Yield | Tree Saplings (Redux) | Alien Biomes Tweaked | Restrictions on Artificial Tiles

sthalik
Long Handed Inserter
Long Handed Inserter
Posts: 56
Joined: Tue May 01, 2018 9:32 am
Contact:

Re: Parrallel processing in games & applications

Post by sthalik »

To the amazing devs,

Have you tried running SwapBuffers in parallel, with or without multiple contexts? The basic idea is to run the event loop, window handling and SwapBuffers on the main thread, then do updates and rendering on thread #2. When it's time to submit a frame after drawing, signal the main thread to wake up. When it's time to draw again (on thread #2), make sure the previous SwapBuffers call on the main thread has already completed.

It used to be the case that most render time (up to 7 ms) was spent inside that call, not on drawing arrays or binding textures as I'd initially suspect on my 1060 and 270X's.

User avatar
TheKillerChicken
Long Handed Inserter
Long Handed Inserter
Posts: 70
Joined: Sat Mar 02, 2019 7:06 am
Contact:

Re: Parrallel processing in games & applications

Post by TheKillerChicken »

I use the NUMA setup to get the max out of my memory. My system can support 8-channel DDR3-1866 LR3DS ram. But I would like to see SMT technology integrated into this game at some point. Right now I just funnel all my bandwidth through my 4 sticks of DRAM whereas my server uses 2 sticks. My problem is I like to put yotta-scale values and 4 billion stacks with 10 million+ construction robots, so I am bound to feel the wrath of DRAM bottlenecking to the ultra-extreme. The funny thing is, my server UPS stays at 60. My gamer and my server are the exact same hardware specs CPU-wise. I have 128G on my gamer, and 64G on my server so I can catch up easily. I am not familiar with paraellisation (I can't spell it) but perhaps SMT is possible later on. Great thing about my gamer is that I have a mainframe motherboard, so I can manipulate where the bandwidth can go within my QPI. I wonder how this game will run off of DDR XD-RAM? Fastest I have seen is 5.6TB/s bandwidth on XD-RAM. Either way, this is an awesome game and Wube is an incredible company. Thanks for making such a game.

Ondrashek06
Manual Inserter
Manual Inserter
Posts: 2
Joined: Sat Apr 30, 2022 8:25 pm
Contact:

Re: Parrallel processing in games & applications

Post by Ondrashek06 »

At this point, we should just say "rewrite Factorio in ASM". C++ is already highly performant, unless the devs want to use C, ASM is the only option.

mov eax, ebx
int 0x80
push eax
add eax, ecx
pop eax
...etc. For Factorio to emerge, there would have to be hundreds of thousands of ASM instructions.

cashtrevor
Manual Inserter
Manual Inserter
Posts: 2
Joined: Tue Sep 06, 2022 5:30 pm
Contact:

Re: Parrallel processing in games & applications

Post by cashtrevor »

I wanted to follow up with my experiments with parrellism from above.

While using OpenCL does work. I came to the conclusion that the tooling is just not there yet for game development completely on GPU. (as compared with compilers etc for x86_64, arm, etc). one starts to miss C++ features. CUDA uses c++ but there is no good way to support AMD as well. (the clang compiler may get there at some point).

porting game code to CPU (while still using an arbritrary amount of threads) was not too bad, and runs comparitively fast - (faster single threaded performance). Entity updates are split evenly between all threads so 1000 entities and 10 threads would be 100 entities updating in each thread.

So I am still using a massively multithreaded approach. where a game update consists of N number of stages where threads must hit a thread barrier and synchronize after each stage). Then fitting game mechanics inside those stages.

https://gameprototypes.itch.io/astroid-miner

Premu
Fast Inserter
Fast Inserter
Posts: 100
Joined: Wed Nov 13, 2019 4:40 pm
Contact:

Re: Parrallel processing in games & applications

Post by Premu »

Ondrashek06 wrote:
Sat Sep 09, 2023 7:43 am
At this point, we should just say "rewrite Factorio in ASM". C++ is already highly performant, unless the devs want to use C, ASM is the only option.

mov eax, ebx
int 0x80
push eax
add eax, ecx
pop eax
...etc. For Factorio to emerge, there would have to be hundreds of thousands of ASM instructions.
Good luck with that. :D There's a reason more abstract programming languages were developped, and compilers take care about optimizations on machine level. In most cases, any attempt of a programmer to outdo the compiler backfire.

Besides that, the issue with parallel computing is to identify which parts of the software can be run in parallel while staying deterministic. You don't simplify that by making your code far harder to understand and maintain.

Post Reply

Return to “General discussion”