Using the graphics card to help the CPU out…

mrvn · Post by **mrvn** » Mon Sep 27, 2021 3:44 pm

ptx0 wrote: Mon Sep 27, 2021 3:27 pm
quinor wrote: Fri Sep 24, 2021 9:16 pm Yeah, that is part of the issue. To really get a lot of performance out of Factorio the engine would have to be designed from scratch in an extremely parallelizable way: kind of cellular-automaton style, where the state of each object depends only on the previous state of itself and a limited number of neighbours. That also means removing all of the annoying race conditions like two inserters trying to pick up the same object / from the same container and all other things that depend on entity update order. It is possible, but it would likely result in a game that "feels" different in a number of small ways - and it could be that some content would have to be modified or cut.

But I'm pretty sure they have thought of most of that already - as I said, I have a great amount of respect for the team and trust that they are very capable.
it'd be interesting to have the game's entity update loop work on chunk groups that behave as a dependency graph.

chunk groups would be designated during building placement time, based upon a grouping of all other entities this one must be able to interact with, and place it into a thread group.

but if your factory is all interconnected, that's not much help.

You can have all inserters pick up items first, then move all belts and run assemblers and furnaces and then have all inserters drop items or such. Then your groups would be inserters taking from the same chest or dropping into the same assembler.

The whole factory does interact, but the interactions are separated in time. Anything that only interacts on the next tick is fine. You just have to catch the things that interact within the same tick.

What you really need to solve is that all inserters behave the same no matter what order you process them in. If you can do that then it's trivial to run them in threads, be that 8 CPU cores or 8000 GPU cores.
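
One way to make the result independent of processing order can be sketched as a two-phase update: inserters first only record what they want, and each chest then resolves its requests by a fixed tie-breaker. This is a minimal Python illustration, not Factorio's actual code; `Inserter`, `tick`, `run` and the id-based tie-break are all assumptions made for the example.

```python
import random
from dataclasses import dataclass

@dataclass
class Inserter:
    ident: int      # stable id, used only as a deterministic tie-breaker
    source: str     # name of the chest this inserter takes from
    held: int = 0   # number of items currently in hand

def tick(inserters, chests):
    """One pick-up phase: collect requests, then resolve them in a fixed order."""
    # Phase 1 (parallelizable): every inserter only records what it wants.
    requests = {}
    for ins in inserters:
        if ins.held == 0:
            requests.setdefault(ins.source, []).append(ins)

    # Phase 2 (parallelizable per chest): hand out items by inserter id,
    # not by the order in which the requests happened to arrive.
    for chest, wanting in requests.items():
        for ins in sorted(wanting, key=lambda i: i.ident):
            if chests[chest] > 0:
                chests[chest] -= 1
                ins.held = 1

def run(order_seed):
    inserters = [Inserter(i, "chest_a") for i in range(3)]
    random.Random(order_seed).shuffle(inserters)  # stand-in for arbitrary thread scheduling
    chests = {"chest_a": 2}
    tick(inserters, chests)
    return sorted((i.ident, i.held) for i in inserters), chests["chest_a"]

# Three inserters compete for two items; the outcome is the same for every order.
print(run(1) == run(2) == run(99))
```

The price is that every contested chest needs a sort (or an equivalent fixed priority), which is exactly the kind of overhead discussed later in the thread.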

Post by **ssilk** » Tue Sep 28, 2021 5:03 am

mrvn wrote: Mon Sep 27, 2021 3:44 pm What you really need to solve is that all inserters behave the same no matter what order you process them in. If you can do that then it's trivial to run them in threads, be that 8 CPU cores or 8000 GPU cores.

I think that’s the crux: you have to have an order, otherwise which inserter picks up an item depends on which of them happens to come first. It would not be deterministic then.

SoShootMe · Post by **SoShootMe** » Tue Sep 28, 2021 9:57 am

mrvn wrote: Mon Sep 27, 2021 3:44 pm While the whole factory interacts that is separated by time. Anything that interacts only next tick is fine. You just have to catch those thing that interact in the same tick.

What you really need to solve is that all inserters behave the same no matter what order you process them in. If you can do that then it's trivial to run them in threads, be that 8 CPU cores or 8000 GPU cores.

Your first point implies keeping track of "groups" such as multiple inserters taking items from the same chest, which must be handled in a particular order. However, one thing that often causes many otherwise-independent things to - potentially - interact is power.

It's still not clear this would be practical to implement, would offer significant opportunity for parallelism, or that any benefit wouldn't be lost due to overhead in enabling parallelism (such as transferring data in the GPU case). But that's no criticism; the last in particular seems challenging to reliably assess without doing most of the work.

mrvn · Post by **mrvn** » Tue Sep 28, 2021 4:10 pm

ssilk wrote: Tue Sep 28, 2021 5:03 am
mrvn wrote: Mon Sep 27, 2021 3:44 pm What you really need to solve is that all inserters behave the same no matter what order you process them in. If you can do that then it's trivial to run them in threads, be that 8 CPU cores or 8000 GPU cores.
I think that’s the crux: you have to have an order, otherwise on inserter can pick an item depending, which of them comes first. It would not be deterministic then.

That's where the grouping of inserters would come into play. Instead of seeing 2 inserters picking from a chest why not internally model them as one inserter that can pick up 2 things and drop them at 2 locations? That double-inserter then has to make 2 decisions but it's deterministic again. Now with long reach and inserters all around an assembler the group of inserters you have to merge can be big. At some point the gain from running in parallel will be eaten up by having every inserter be as complex as the largest group you may have to handle. At least on the GPU where you would want to process them as a matrix and not individual inserters with different complexities.
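
Finding which inserters have to be merged can be sketched with a union-find over the entities they touch. The Python below is an illustration under assumptions: the layout format, the entity names and `group_inserters` are hypothetical, and it merges on shared source or destination, which is one possible reading of the grouping described above.

```python
from collections import defaultdict

def find(parent, x):
    # Follow parent links to the root, shortening the path on the way.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def group_inserters(inserters):
    """inserters maps inserter id -> (source, destination) entity names.
    Returns sorted lists of ids; each list must be resolved as one unit."""
    parent = {ident: ident for ident in inserters}
    first_user = {}  # entity name -> first inserter seen touching it
    for ident in sorted(inserters):
        for entity in inserters[ident]:
            if entity in first_user:
                a = find(parent, ident)
                b = find(parent, first_user[entity])
                parent[max(a, b)] = min(a, b)  # merge the two groups
            else:
                first_user[entity] = ident
    groups = defaultdict(list)
    for ident in sorted(inserters):
        groups[find(parent, ident)].append(ident)
    return sorted(groups.values())

layout = {
    0: ("chest_a", "asm_1"),
    1: ("chest_a", "asm_2"),  # shares chest_a with inserter 0
    2: ("chest_b", "asm_2"),  # shares asm_2 with inserter 1
    3: ("chest_c", "asm_3"),  # touches nothing the others touch
}
print(group_inserters(layout))
```

Groups can run in parallel with each other, while the inserters inside one group are decided together. The size of the largest group is what limits the gain.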

SoShootMe wrote: Tue Sep 28, 2021 9:57 am
mrvn wrote: Mon Sep 27, 2021 3:44 pm While the whole factory interacts that is separated by time. Anything that interacts only next tick is fine. You just have to catch those thing that interact in the same tick.

What you really need to solve is that all inserters behave the same no matter what order you process them in. If you can do that then it's trivial to run them in threads, be that 8 CPU cores or 8000 GPU cores.
Your first point implies keeping track of "groups" such as multiple inserters taking items from the same chest, which must be handled in a particular order. However, one thing that often causes many otherwise-independent things to - potentially - interact is power.

It's still not clear this would be practical to implement, would offer significant opportunity for parallelism, or that any benefit wouldn't be lost due to overhead in enabling parallelism (such as transferring data in the GPU case). But that's no criticism; the last in particular seems challenging to reliably assess without doing most of the work.

Power in Factorio is rather simple and I don't see a problem there. Factorio simply adds up all the power wanted by all the entities (in 3 priority levels) and all the power provided (again 3 priority levels). It then computes how much power is consumed from and provided for each level, and then each entity gets the same fraction of energy. Each phase of the computation is totally parallelizable.
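
The phases described here can be sketched roughly as follows. This is a simplified Python illustration and not Factorio's real electric network code: the function names are made up, supply priorities are collapsed into a single total for brevity, and the wattages are invented.

```python
def satisfaction(demand_by_level, supply_by_level):
    """Return the fraction of requested power that each demand level receives.
    Both arguments are lists of per-entity wattages, highest priority first."""
    # Phase 1: independent sums (a parallel reduction in a real engine).
    available = sum(sum(level) for level in supply_by_level)
    fractions = []
    # Phase 2: a short sequential pass over the levels only, not over entities.
    for level in demand_by_level:
        wanted = sum(level)
        granted = min(wanted, available)
        fractions.append(1.0 if wanted == 0 else granted / wanted)
        available -= granted
    return fractions

def distribute(demand_by_level, fractions):
    # Phase 3: every entity scales its own demand; none reads another entity's state.
    return [[watts * f for watts in level]
            for level, f in zip(demand_by_level, fractions)]

demand = [[100, 100], [300], []]   # hypothetical consumers in 3 priority levels
supply = [[250], [100], [0]]       # hypothetical producers in 3 priority levels
fractions = satisfaction(demand, supply)
print(fractions)
print(distribute(demand, fractions))
```

Only the middle pass is sequential, and it touches three numbers rather than every entity, which is why the power network by itself need not couple otherwise independent groups within a tick.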

One effect I think hasn't been mentioned yet: cache locality. If you process each inserter fully on its own then you have all the data for the inserter at hand in the local cache. If you process them in multiple phases, though, that means running through all inserters multiple times, looking only at parts of the data each time. This can be seriously slower, because by the time the first phase ends the data for the first inserter might have been evicted from the caches. And a cache miss loads a full cache line while you only need a fraction of it. One might have to split up the data of the inserter so you can put everything needed for one phase of the process into one big array and put the other data somewhere else.

For example you would have one huge array of electrical needs and every inserter, assembler, chemical plant, ... would just have an index / pointer into that array. Then the code to compute electricity can compute the sum of that array with far less memory bandwidth wasted. Drawback: when the inserter needs to check its own power level it has to follow the extra indirection.
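
The layout idea can be illustrated like this. Python gives no real control over cache lines, so the sketch only shows the data structure, and `PowerTable`, `Entity` and the wattages are hypothetical names and values.

```python
from array import array
from dataclasses import dataclass

class PowerTable:
    """One contiguous array holding the power demand of all entities."""
    def __init__(self):
        self.demand = array("d")  # packed doubles, no per-entity objects

    def register(self, watts):
        self.demand.append(watts)
        return len(self.demand) - 1  # slot index handed to the entity

    def total(self):
        # The electricity pass walks only this array, never the entities.
        return sum(self.demand)

@dataclass
class Entity:
    name: str
    power_slot: int  # index into PowerTable.demand instead of an inline field

table = PowerTable()
entities = [
    Entity("inserter", table.register(13.0)),
    Entity("assembler", table.register(75.0)),
    Entity("chemical plant", table.register(210.0)),
]
print(table.total())
# The drawback from the post: reading your own value costs one extra lookup.
print(table.demand[entities[1].power_slot])
```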

Overall there are a lot of things to consider. Optimizing one thing makes others worse. And most of the time the only thing to do is try. Try rewriting all the inserter code to be parallel just to see if it actually runs better. For a company that is a lot of time they have to pay their programmers.

gGeorg · Post by **gGeorg** » Sat Oct 02, 2021 3:52 pm

Here is an example of a new-generation game engine; it runs mostly on the graphics card.
Each character has independent pathing and independent AI. All of it is computed on the graphics card.
Also notice all the arrows in the air: those are computed too, with ballistic curves, and they do damage on hit.
Millions of characters and arrows, and blood, on screen. The Total War series starts to choke with a few hundred.
BTW, blood flows like water, and characters can drown in blood.

https://www.youtube.com/watch?v=m0xXdtLf2Tw

Could the smart heads around here explain how they managed the impossible?

PunkSkeleton · Post by **PunkSkeleton** » Sat Oct 02, 2021 5:21 pm

First you need to make Factorio run efficiently on an arbitrary number of CPU threads, which is not the case today. That is way, way easier than using the GPU.
Only then can you think about using the GPU to simulate parts of the factory and try to solve all the problems related to GPU computing.

UkcsAlias · Post by **UkcsAlias** » Mon Oct 04, 2021 12:21 pm

I would say that any logic that doesn't need to be accurate can be put in a separate thread (it doesn't matter whether CPU or GPU). Basically all the predictable stuff that generally requires a certain time to pass.

For example, for logistic robots, if you know their travel time is going to be 10 seconds, you can put 9.9 s worth of calculation in a different thread, as the exact location isn't going to be 'that' relevant. If the bots then reach a state in which accuracy becomes relevant (near the destination, or near enemies), you shift them to the main thread again.
You could even calculate in which frame they would run out of power, if that is going to be relevant. And again, this doesn't need to be frame accurate. It can be calculated 3 frames after sending them away and still have the information ready.

So at frame 5990 there are still 10 frames remaining when it goes to the main thread again. Even if it was off by 1 frame, this can be corrected. Whether accurately timed calculations are required can be decided per chunk. This already saves 5990 frames of work. And considering the high number of bots usually flying, I would say it can save quite a lot.

Maybe this even works for assemblers, as their production completion is something that can be predicted ahead as well (I don't know how the status is tracked for this).

Now, I don't expect clock speeds to differ that much, but it shows that not all information has to be perfectly in sync in order to work properly.

mrvn · Post by **mrvn** » Mon Oct 04, 2021 12:59 pm

UkcsAlias wrote: Mon Oct 04, 2021 12:21 pm I would say that any logic that doesnt need to be accurate can be put in a seperate thread (doesnt matter if cpu or gpu). Basicly all the predictable stuff that generaly requires a certain time to pass.

For example for logistic robots, if you know their travel time is going to be 10 seconds. You can put 9.9s worth of calculating in a diffirent thread as the exact location isnt going to be 'that' relevant. If the bots then reach a state in which accuracy becomes relevant (near destination, or near enemies), you shift those to the main thread again.
You could even calculate in which frame they would run out of power if that is something that is going to be relevant. And again, this doesnt need to be frame accurate. It can be calculated 3 frames after sending them away, and still have the information ready.

So at frame 5990 there are still 10 frames remaining, but if it goes to the main thread again. Even if it was off by 1 frame. This can be corrected. Detecting whether accurately timed calculations are required can be decided per chunk. It saves 5990 frames on this already. And considering the high number of bots usualy flying, i would say it can save quite a lot.

Maybe this even works for assemblers as their production completion is something that can be predicted ahead aswel (dont know how the status is tracked on this).

Now i dont expect clockspeeds to differ that much, but it shows that not all information has to be perfectly in sync in order to work properly.

In any case like that you don't calculate anything every tick at all. You simply record the start time and speed, and if anything needs the current state, it is calculated straight from those. And that remains frame accurate.
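
A minimal Python sketch of that approach, assuming straight-line flight at constant speed; the `Flight` class, its fields and the numbers are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Flight:
    start_tick: int
    start: tuple     # (x, y) in tiles
    target: tuple    # (x, y) in tiles
    speed: float     # tiles per tick

    def arrival_tick(self):
        # Known exactly at launch; nothing has to be updated per tick.
        distance = math.dist(self.start, self.target)
        return self.start_tick + math.ceil(distance / self.speed)

    def position(self, tick):
        # Computed on demand from the recorded launch data only.
        distance = math.dist(self.start, self.target)
        travelled = min(max(0, tick - self.start_tick) * self.speed, distance)
        t = travelled / distance if distance else 1.0
        return (self.start[0] + (self.target[0] - self.start[0]) * t,
                self.start[1] + (self.target[1] - self.start[1]) * t)

bot = Flight(start_tick=100, start=(0.0, 0.0), target=(50.0, 0.0), speed=0.5)
print(bot.arrival_tick())   # tick at which the delivery has to be handled
print(bot.position(150))    # exact position if the renderer asks mid-flight
```

Because the position is a pure function of the tick number, every caller that asks for the same tick gets the same answer, so there is nothing to correct later.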

Frame accuracy is actually rather important. Otherwise bots would jump whenever they switch back to accurate which would be horrible to the eye.

UkcsAlias · Post by **UkcsAlias** » Mon Oct 04, 2021 9:39 pm

mrvn wrote: Mon Oct 04, 2021 12:59 pm Any case like that you don't calculate at all every tick. You simply record the start time and speed and if anything needs the current state it's calculated straight from that. And that remains frame accurate.

For machines this can be done, because you generally only need the accurate state when you open the machine's window. Outside of that it can just continue and wait for the frame.

mrvn wrote: Mon Oct 04, 2021 12:59 pm Frame accuracy is actually rather important. Otherwise bots would jump whenever they switch back to accurate which would be horrible to the eye.

I am aware that frame accuracy is important, but graphics-wise it usually matters less, as players rarely have eyes good enough to notice. Sure, some still might, but it at least shouldn't look ugly to them. And as long as you know the backend handles it frame-perfectly, there is no issue anyway. It's like interpolated frames when you render a game at 120 fps while the server runs at 100 fps: a good engine will render that very reliably, and players won't complain.

We can go even further with that, because if you are zoomed out heavily, you won't notice those things anyway. So these strict calculations are only needed when zoomed in.

But anyway, if you want it smoother... in that case the main thread should communicate with the secondary thread to provide a way to synchronize (probably by communicating the frame number it's processing).

The main thread only needs to communicate the things for the secondary thread to calculate, plus, each frame, the number of the frame it is handling. This way you will be at most a single frame off (depending on when the frame number gets sent), and 1 frame is not going to be noticed easily, especially if the animation remains that one frame off until the end while the item is transported at the intended frame.

The secondary thread can also communicate back that a certain object needs accurate calculations a few frames before they matter. If a bot is about to run out of power, you know that several frames in advance. The main thread can then just read the state at whatever frame the bot was at and continue from there (either 1 or 2 frames). And if something has to change, it communicates that back to the secondary thread (the new path the bot will take and the frame at which this starts).

The threads in a CPU normally run at the same clock speed, so I'm not expecting a large gap of frames to appear. The main issues will start when the FPS starts to drop, though, as that slows down the game pace. Then the secondary threads have a much harder time synchronizing, because the frames aren't as smooth as they used to be, so predicting becomes harder. And caching such info ahead would consume quite a bit of RAM. But with the frame communication from the main thread (which should remain the heavier one anyway), you can mitigate these issues, since even at a slower framerate the secondary thread will notice it by getting the next frame call later.

That's why I think that using the secondary threads only for things that do not really require frame-perfect accuracy is the better way (to some degree they do, but that's mostly the delivery time itself, not the animation). The idea is that they aren't going to be off by something like 10 frames anyway; it's more like 1 or 2 frames at most. They will use info from the main thread to get a relatively accurate state.

And with that, the animation can just smooth itself to whatever state is needed. If it's 3 frames ahead, you compensate by making the next 9 frames slower (every 3 frames it effectively moves only 2 frames' worth, and you interpolate the locations to get a smooth animation). If you know how much FPS games use interpolation for animations, a 2D game like this isn't going to suffer a lot from that. There is a reason why in many games the models do not match the hitboxes: most players simply don't notice. And those who do usually know why and compensate in most cases (unless it's bad netcode and positions are very unreliable).

But even then, I think keeping the animation slightly behind isn't really going to be an issue if the items are still delivered in time, as that's the most important frame accuracy. However, in the end this isn't going to be suitable for GPU calculations, as it's still going to remain rather strict.

mrvn · Post by **mrvn** » Tue Oct 05, 2021 1:11 am

UkcsAlias wrote: Mon Oct 04, 2021 9:39 pm
mrvn wrote: Mon Oct 04, 2021 12:59 pm Any case like that you don't calculate at all every tick. You simply record the start time and speed and if anything needs the current state it's calculated straight from that. And that remains frame accurate.
For machines this can be done, because you generaly only need the accurate state when you open its window. Outside of that it can just continue and wait for the frame.
mrvn wrote: Mon Oct 04, 2021 12:59 pm Frame accuracy is actually rather important. Otherwise bots would jump whenever they switch back to accurate which would be horrible to the eye.
Im am aware that frame accuracy is important, but graphics wise it usualy isnt as much as players do rarely have such good eyes to notice it. And sure, some still might, but it at least shouldnt look ugly to them. And as long as you know the backend handles it frame perfectly, there is no issue anyway. Its like interpolation frames when you render a game at 120fps, while the server has 100fps, a good engine will render it at a very reliable way, and players then wont complain.

Factorio never renders more FPS than UPS. There is no interpolation between frames ever.

That said, if your bots are off by just one pixel and then jump to the right spot, always at the same distance from the goal, that becomes noticeable. Since it's totally unnecessary, the idea is bad.

UkcsAlias wrote: Mon Oct 04, 2021 9:39 pm But anyway, if you want it more smooth... in that case the main thread should communicate with the secondary thread to provide a way to synchronize (probably communicate the frame number its processing.

The main thread only needs to communicate things for the secondary thread to calculate, and the frame number each frame that it is handling. This way you will only get a single frame off (depending on when the frame number gets sent), and 1 frame is not going to be noticed as easily, especialy if the animation remains that frame off until the end, while the item was transported at the intended frame.

The secondary thread can on that also communicate back that a certain object needs accurate calculations a few frames before they matter. If a bot is about to run out of power, you know that several frames before that. The main thread can then just read the state at whatever frame the bot was, and conitinue from that (either 1 or 2 frames). And if something has to change communicate it back to the secondary thread (the new path it will take and which frame this starts).

And now you have completely killed parallelism. Communication is the death of multithreading. Communication creates a dependency between the threads that requires complex and expensive code to work.

Also the idea of "a few frames before they matter" is pointless. Factorio lives tick to tick and is deterministic. You always know exactly to the tick when something will happen or you don't know it at all.

It seems to me like you want to split calculation to multiple machines where communications then have a lag measured in ticks or close to a tick. Only then would it make sense to warn some master instance that something will happen soon and it has to take over. That's not how multithreading works. Communication is costly compared to simple calculations but the delay is measured in CPU cycles, not ticks.

UkcsAlias wrote: Mon Oct 04, 2021 9:39 pm But even then, i think keeping the animation slightly behind isnt realy going to be an issue if the items are still delivered in time. As thats the most important frame accuracy. However, in the end this isnt going to be suitable for GPU calculations, as its still going to remain rather strict.

If animations are consistently behind then you won't notice it for the running of the factory.

But when you move or click, a lag of a few frames becomes noticeable. It just starts to feel wrong. It might be an even bigger issue for combat. Dodging worms becomes harder, for example. The character simply has a slower reaction time, and gamers are very attuned to that.

UkcsAlias · Post by **UkcsAlias** » Tue Oct 05, 2021 11:46 am

mrvn wrote: Tue Oct 05, 2021 1:11 am That said, if your bots are off by just one pixel and then jump to the right spot always at the same distance from the goal that becomes noticeable. Since it's totally uneccessary the idea is bad.

No, the animation is off by 1 frame, and just delays its rendering by 1 frame. You don't see any warping here.

mrvn wrote: Tue Oct 05, 2021 1:11 am And now you completely killed parallelity. Communication is the death of multithreading. Communication creates a dependency between the threads that requires complex and expensive code to work.

That depends on the type of communication. If a thread can just carry on and only needs synchronization at the end, it might be active for the first half of the frame and inactive for the second half. But synchronization in this case ensures the thread isn't running too far ahead.

Note that at 60 UPS and 1 GHz (which is quite slow - edit: and also a useless thing to mention in this example, but oh well), you have 16.6 milliseconds of time per update (and each millisecond allows thousands of calculations). If the alternative threads take 50% of the work away from the main thread and split it into two 25% threads, the calculations are technically done in half the time. So a few ms can be used to synchronize the information again. The alternative threads can already continue to prepare the next frame while the main thread is still busy with the current one.
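
The numbers in this paragraph can be checked with a few lines of Python. The helper names are made up, and one clock cycle is used as a rough stand-in for "a calculation", which is an assumption.

```python
def tick_budget(ups, clock_hz):
    ms_per_tick = 1000.0 / ups           # wall-clock time available per update
    cycles_per_ms = clock_hz / 1000.0    # clock cycles in one millisecond
    cycles_per_tick = clock_hz / ups     # clock cycles available per update
    return ms_per_tick, cycles_per_ms, cycles_per_tick

def critical_path(shares):
    # With perfect overlap, a tick takes as long as its largest share of the work.
    return max(shares)

ms, per_ms, per_tick = tick_budget(ups=60, clock_hz=1_000_000_000)
print(f"{ms:.2f} ms per tick, {per_ms:,.0f} cycles per ms, {per_tick:,.0f} cycles per tick")

# Main thread keeps 50% of the work, two helper threads take 25% each.
print(critical_path([0.50, 0.25, 0.25]))
```

At 1 GHz a millisecond is about a million cycles, so "thousands of calculations" is a conservative figure. The 0.5 critical path is the "half the time" claim, and it only holds if the synchronization itself is cheap.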

That is synchronization between parallel systems. And it's not bad practice at all, as long as you keep it limited. Sure, for Factorio it is going to introduce some inefficiency (wait time). But as those times are predictable, it should be well manageable CPU-wise, as you can just tell a thread to sleep for a certain duration to prevent it from constantly occupying a core.

mrvn wrote: Tue Oct 05, 2021 1:11 am Also the idea of "a few frames before they matter" is pointless. Factorio lives tick to tick and is deterministic. You always know exactly to the tick when something will happen or you don't know it at all.

It seems to me like you want to split calculation to multiple machines where communications then have a lag measured in ticks or close to a tick. Only then would it make sense to warn some master instance that something will happen soon and it has to take over. That's not how multithreading works. Communication is costly compared to simple calculations but the delay is measured in CPU cycles, not ticks.

The reason I said a few frames was that a secondary thread might just continue processing, expecting the framerate to be stable. Yet in heavier games it might no longer be. But with sturdy synchronization between threads this doesn't happen anyway.

mrvn wrote: Tue Oct 05, 2021 1:11 am If animations are consistently behind then you won't notice it for the running of the factory.

But when you move or click a lag of a few frames becomes noticeable. It just starts to feel wrong. Might be an even bigger issue for combat. Dodging worms becomes harder for example. The character simply have a slower reaction time and gamers are very tuned to that.

All lag should be 1 frame. I used more frames as an example of how bad it can be while players still often don't notice it.

And also, in network games there is already lag. You can't guarantee that all players have less than 16 ms latency.

mrvn · Post by **mrvn** » Tue Oct 05, 2021 2:49 pm

UkcsAlias wrote: Tue Oct 05, 2021 11:46 am
mrvn wrote: Tue Oct 05, 2021 1:11 am That said, if your bots are off by just one pixel and then jump to the right spot always at the same distance from the goal that becomes noticeable. Since it's totally uneccessary the idea is bad.
No, the animation is off by 1 frame, and just delays its rendering by 1 frame. You dont see any warping here.

The problem is that, as you described it, at some point the main thread takes over and accurately calculates the animation. So it jumps from being off by 1 frame to being off by 0 frames. That jump is the problem.

UkcsAlias wrote: Tue Oct 05, 2021 11:46 am
mrvn wrote: Tue Oct 05, 2021 1:11 am And now you completely killed parallelity. Communication is the death of multithreading. Communication creates a dependency between the threads that requires complex and expensive code to work.
That depends on the communication type. If a thread can just go on and only needs synchronization at the end. It might be active the first half, and be inactive the second half. But synchronization in this case ensures the thread isnt going ahead too far.

Note that at 60 UPS, and 1Ghz (which is quite slow - edit, and also a useless thing to mention in this example, but oh well), you have 16.6 milliseconds of time (and each milisecond allows thousands of calculations). If that alternative threads takes away 50% of the main thread and seperates it in 2 25% threads. It technicaly means the calculations are done at half the time. So a few ms can be used to synchronize the information again. The alternative threads can already continue to prepare the next frame, while the main thread is still busy with the current one.

That is synchronization between paralel systems. And its not a bad practice to do at all, as long as you keep it limited. Sure, for factorio it is going to have some inefficient systems (wait time). But as those times are predictable, it should be well manageble CPU wise as you can just tell a thread to sleep for a certain duration in order to prevent it constantly holding a thread.

That's not how you would do it. You split the work up into way more threads than the CPU has cores. Then each thread wakes up, calculates one tick and goes back to sleep. Then the next thread does its work. By splitting it up that much you can balance the threads across cores. If inserters take less time this tick, then some other thread can run on the core after them. If they take more time, then more threads are run on the other cores. Only at the end of the tick might you run out of threads and have cores idle. So you want to put threads that will take lots of time at the front and quick ones at the back to minimize the potential loss at the end.
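
The effect of putting the long tasks first can be shown with a small scheduling simulation. This Python sketch models greedy list scheduling (each task goes to whichever core becomes free first); the task costs are invented, and nothing here reflects how Factorio actually schedules work.

```python
import heapq

def makespan(task_costs, cores):
    """Time until the last core finishes when tasks are started in the given order."""
    finish = [0] * cores          # time at which each core becomes free
    heapq.heapify(finish)
    for cost in task_costs:
        earliest = heapq.heappop(finish)        # next core to become idle
        heapq.heappush(finish, earliest + cost)
    return max(finish)

costs = [8, 1, 1, 1, 1, 2, 2, 4]  # hypothetical per-tick task costs in ms

print(makespan(sorted(costs, reverse=True), cores=2))  # long tasks first
print(makespan(sorted(costs), cores=2))                # short tasks first
```

With these costs, longest-first finishes in 10 ms, which is the best two cores can do for 20 ms of work, while shortest-first needs 12 ms because the 8 ms task starts last and leaves one core idle at the end of the tick.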

And in Factorio you then have the UPS and FPS parts. If the graphics part leaves any cores idle, you can already start calculating the next tick. But there really is no point in having one thread race ahead of the others.

UkcsAlias wrote: Tue Oct 05, 2021 11:46 am
mrvn wrote: Tue Oct 05, 2021 1:11 am Also the idea of "a few frames before they matter" is pointless. Factorio lives tick to tick and is deterministic. You always know exactly to the tick when something will happen or you don't know it at all.

It seems to me like you want to split calculation to multiple machines where communications then have a lag measured in ticks or close to a tick. Only then would it make sense to warn some master instance that something will happen soon and it has to take over. That's not how multithreading works. Communication is costly compared to simple calculations but the delay is measured in CPU cycles, not ticks.
The reason i said a few frames, was because a secondary thread might just continue processing expecting the framerate to be stable. Yet on heavier games, it no longer might be. But with sturdy synchronization between threads this doesnt happen anyway.

mrvn wrote: Tue Oct 05, 2021 1:11 am If animations are consistently behind then you won't notice it for the running of the factory.

But when you move or click a lag of a few frames becomes noticeable. It just starts to feel wrong. Might be an even bigger issue for combat. Dodging worms becomes harder for example. The character simply have a slower reaction time and gamers are very tuned to that.
All lag should be 1 frame. I used more frames as an example of how bad it can be, while players still often dont notice it.

And also, in network games there is already lag. You cant guarantee all players to have less than 16ms latency.

And players noticed. That's why the game now has a bunch of predictive and latency hiding code.

All lag should be 0 frames. Most games have 2 or more frames. That's why people say you need to play at 120fps or more in many action games. Even though the monitor probably only has 60Hz you still benefit from the reduction in lag.

UkcsAlias · Post by **UkcsAlias** » Tue Oct 05, 2021 3:14 pm

mrvn wrote: Tue Oct 05, 2021 2:49 pm And players noticed. That's why the game now has a bunch of predictive and latency hiding code.

All lag should be 0 frames. Most games have 2 or more frames. That's why people say you need to play at 120fps or more in many action games. Even though the monitor probably only has 60Hz you still benefit from the reduction in lag.

Lag and latency are different things here. Latency can't be removed; lag can be compensated. And I was talking about latency because of server/client.

And no matter what, one of them is going to see lag. Either the client lag is compensated, so the server gets the client actions 1 frame afterwards, or the server simply refuses the lag compensation, causing the client to get 2 frames of lag (sending takes 1 frame, receiving takes 1 frame).

No lag is impossible by pure physics.

mrvn · Post by **mrvn** » Tue Oct 05, 2021 6:39 pm

UkcsAlias wrote: Tue Oct 05, 2021 3:14 pm
mrvn wrote: Tue Oct 05, 2021 2:49 pm And players noticed. That's why the game now has a bunch of predictive and latency hiding code.

All lag should be 0 frames. Most games have 2 or more frames. That's why people say you need to play at 120fps or more in many action games. Even though the monitor probably only has 60Hz you still benefit from the reduction in lag.
Lag and latency are diffirent on that. Latency cant be removed, lag can be compensated. And i was talking about latency because server/client.

And no matter what, one of them is going to see lag. Either the client lag is compensated, so the server gets the client actions 1 frame afterward. Or the server simply refuses the lag compensation, causing a client to get 2 frames of lag (sending is 1 frame of time, receiving is 1 frame).

No lag is impossible by pure physics.

You can use the "as if" principle there. You behave as if there were no lag and calculate the lag or latency away. You can timestamp actions with sub-tick accuracy and factor them into the simulation even after the fact. So if the player hits "W" in the middle of the frame calculations, you register that after the frame, check whether the action was valid and, if so, work it into the simulation after the fact. Then on the next frame the character will show up as having been moving for 1.5 frames already. If you are really good, you do all the frame calculations and at the very end you update for player input that came in meanwhile. That update pass then avoids being even 1 frame out of sync.
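
The "moving for 1.5 frames already" example can be written out as a small calculation. This is a Python sketch under assumptions: time is measured in ticks, movement speed is constant, and the function names and the speed value are hypothetical.

```python
import math

def moved_subtick(press_time, render_tick, speed):
    """Distance covered when the key press is applied at its exact, fractional tick time."""
    return max(0.0, render_tick - press_time) * speed

def moved_quantised(press_time, render_tick, speed):
    """Distance covered when the press only takes effect at the start of the next whole tick."""
    return max(0.0, render_tick - math.ceil(press_time)) * speed

press = 10.5   # "W" pressed halfway through the calculation of tick 10
speed = 0.15   # tiles per tick, made up for the example

# Frame rendered for tick 12: 1.5 ticks of movement with sub-tick timestamps,
# but only 1.0 tick if the input is rounded up to tick 11.
print(moved_subtick(press, 12, speed))
print(moved_quantised(press, 12, speed))
```

The difference between the two values is the half tick of input delay that the timestamping removes.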

Unfortunately you can't compensate for the delay between the GPU compositing the display and the TFT showing the image. At least not on the first frame after an action. You can do it for follow-up frames. But then you have to predict that e.g. the player will keep pressing "W" for a while yet, so you display the character a bit ahead, where it would be by the time the frame actually displays, and so on.

Factorio does some compensation for network latency in multiplayer, but the same idea applies locally too. The delay is just orders of magnitude smaller, usually. Try playing at 1 FPS.

UkcsAlias · Post by **UkcsAlias** » Wed Oct 06, 2021 9:21 am

mrvn wrote: Tue Oct 05, 2021 6:39 pm You can use the "as if" principle there. You behave as if there were no lag and calculate the lag or latency away. You can timestamp actions with sub-tick accuracy and factor them into the simulation even after the fact. So if the player hits "W" in the middle of the frame calculations, you register that after the frame, check whether the action was valid and, if so, figure it into the simulation after the fact. Then on the next frame the character will show up as having been moving for 1.5 frames already. If you are really good, you do all the frame calculations and at the very end you update for player input that came in meanwhile. That update pass then avoids being even 1 frame out of sync.

Unfortunately you can't compensate for the delay between the GPU compositing the display and the TFT showing the image. At least not on the first frame after an action. You can do it for follow-up frames. But then you have to predict that, e.g., the player will keep pressing "W" for a while yet, and display the character a bit ahead, where it would be by the time the frame actually displays. And so on.

Factorio does some compensation for the network latency in multiplayer, but the same idea applies locally too. The delay is just orders of magnitude smaller, usually. Try playing at 1 FPS.

But a similar compensation can be done for the bots as well. Again, the visuals do not actually match the backend in that case, but the experience is as if they did. That is what I tried to point out there. Sure, the "act as if it is 1 frame ahead" solution might be nicer than being 1 frame behind. But it's a similar thing.

The alternative threads can even behave similarly by being that frame ahead, and potentially just build up commands for the main thread (a bot goes empty, a new command is sent). This is a completely different approach from what I suggested, but similar in result (1 thread controls movement, the other controls delivery, and only 1 of those aspects is in the main thread).

It's probably going to be trial and error to find out which method suits best, but I do believe that there is an option that will work in a way that is convincing to the players without disrupting anything backend-related.

Post by **ssilk** » Sat Oct 09, 2021 4:30 am

gGeorg wrote: Sat Oct 02, 2021 3:52 pm Here is an example of a new-generation game engine; it runs mostly inside the graphics card.
Each character has independent pathing and independent AI. All of it is computed on the graphics card.
Also notice all the arrows in the air; those are computed too, with ballistic curves, and do damage on hit.
Millions of characters and arrows, and blood, on screen. The Total War series starts to choke with a few hundred.
BTW blood flows like water, and characters can drown in blood.

https://www.youtube.com/watch?v=m0xXdtLf2Tw

Could the smart heads around here explain how they managed the impossible?

Thanks, this was really fascinating. There must be tens of thousands of deaths per second. The bodies build big hills.
And that’s also what I mean by Factorio using the GPU. But I think determinism is no longer ensured here. If you play that game twice, the overall result is surely the same, but the exact same deaths do not happen at the exact same time. But I’m just guessing.

gGeorg · Post by **gGeorg** » Sat Oct 09, 2021 8:22 am

ssilk wrote: Sat Oct 09, 2021 4:30 am
gGeorg wrote: Sat Oct 02, 2021 3:52 pm Here is an example of a new-generation game engine; it runs mostly inside the graphics card.
Each character has independent pathing and independent AI. All of it is computed on the graphics card.
Also notice all the arrows in the air; those are computed too, with ballistic curves, and do damage on hit.
Millions of characters and arrows, and blood, on screen. The Total War series starts to choke with a few hundred.
BTW blood flows like water, and characters can drown in blood.

https://www.youtube.com/watch?v=m0xXdtLf2Tw

Could the smart heads around here explain how they managed the impossible?
Thanks, this was really fascinating. There must be tens of thousands of deaths per second. The bodies build big hills.
And that’s also what I mean by Factorio using the GPU. But I think determinism is no longer ensured here. If you play that game twice, the overall result is surely the same, but the exact same deaths do not happen at the exact same time. But I’m just guessing.

It is a new concept; no one else has succeeded in using the graphics card for this scale of computation AND as a graphics card at the same time. It is hard to tell whether it is predictable or not. Determinism definitely helps reduce network traffic for multiplayer. Let's see, when Brilliant Game Studios releases a game based on the engine, how multiplayer works and how many players are supported.

mrvn · Post by **mrvn** » Sat Oct 09, 2021 12:55 pm

ssilk wrote: Sat Oct 09, 2021 4:30 am
gGeorg wrote: Sat Oct 02, 2021 3:52 pm Here is an example of a new-generation game engine; it runs mostly inside the graphics card.
Each character has independent pathing and independent AI. All of it is computed on the graphics card.
Also notice all the arrows in the air; those are computed too, with ballistic curves, and do damage on hit.
Millions of characters and arrows, and blood, on screen. The Total War series starts to choke with a few hundred.
BTW blood flows like water, and characters can drown in blood.

https://www.youtube.com/watch?v=m0xXdtLf2Tw

Could the smart heads around here explain how they managed the impossible?
Thanks, this was really fascinating. There must be tens of thousands of deaths per second. The bodies build big hills.
And that’s also what I mean by Factorio using the GPU. But I think determinism is no longer ensured here. If you play that game twice, the overall result is surely the same, but the exact same deaths do not happen at the exact same time. But I’m just guessing.

Why? If you start with the exact same conditions both times the GPU will make the exact same decisions every time. Unless you purposefully add randomness in there the result will always be the same.

Note: the randomness can be as simple as the order of soldiers in memory. You have to make sure that saving and loading does not scramble things around. But that's already true in Factorio with inserters: change their order and the order in which they pick up stuff changes.
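A toy Python sketch of that order dependence, with made-up names (`naive_pickup`, `stable_pickup`, inserters represented by integer ids) and no claim that Factorio resolves conflicts this way:

```python
def naive_pickup(inserters, chest_items):
    """Whoever comes first in the list (memory order) gets an item."""
    winners = []
    for inserter_id in inserters:
        if chest_items > 0:
            chest_items -= 1
            winners.append(inserter_id)
    return winners

def stable_pickup(inserters, chest_items):
    """Resolve the conflict by persistent id, so list order is irrelevant."""
    return sorted(inserters)[:chest_items]

# Two inserters compete for the single item in a chest.
print(naive_pickup([1, 2], 1), naive_pickup([2, 1], 1))    # [1] [2]
print(stable_pickup([1, 2], 1), stable_pickup([2, 1], 1))  # [1] [1]
```

The naive version changes its result when a save/load cycle reorders the list; the second one only depends on ids that survive saving, which is also the property you need before spreading the work over threads.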

quyxkh · Post by **quyxkh** » Sat Oct 09, 2021 5:28 pm

I think his point is that all the shader compilers on all the different GPUs are generally not trying to achieve or even enable bit-perfect matching of results, not even necessarily from one compile to the next. They're trying to achieve better results by some metric, where "better" literally means appreciably different, not just in time-to-render. That's not what you want when your calculations depend on every player seeing bit-identical results.

Post by **ssilk** » Sat Oct 16, 2021 8:54 am

@mrvn: it gets much more complex very fast. For example, a different number of GPU kernels can make a difference in the result if they are not somehow synchronized. This synchronization can become super complex (the number of kernels is just one factor of many), even for quite simple tasks.
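One concrete way this can happen, sketched in plain Python on the CPU (the `chunked_sum` helper and the `workers` count are invented for the illustration, no GPU is involved): floating-point addition is not associative, so splitting the same sum into a different number of partial sums can change the result.

```python
def ordered_sum(values):
    """Plain left-to-right floating-point sum."""
    total = 0.0
    for v in values:
        total += v
    return total

def chunked_sum(values, workers):
    """Sum `values` as `workers` partial sums that are combined at the
    end, mimicking how a parallel reduction splits the work."""
    size = -(-len(values) // workers)  # ceiling division
    partials = [ordered_sum(values[i:i + size])
                for i in range(0, len(values), size)]
    return ordered_sum(partials)

values = [1e16, 1.0, -1e16, 1.0]
print(chunked_sum(values, 1))  # 1.0
print(chunked_sum(values, 2))  # 0.0
```

Same input, same arithmetic, but one "worker" and two "workers" disagree. For lockstep multiplayer that would already be a desync, unless the reduction order is pinned down regardless of the hardware.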

Factorio Forums
