Desync resolver: reverse parallel execution

Plawerth
Fast Inserter
Posts: 118
Joined: Thu Mar 02, 2017 12:57 am

Desync resolver: reverse parallel execution

Post by Plawerth »

I am not a Factorio coder but I understand programming in general.

Apparently a "game save" contains the entire deterministic state needed to continue running from the moment of the save.

And apparently a desync report contains both the deterministic state of the client and server of the exact tick where a desync occurred.

If the game engine is truly deterministic then it seems it should be possible to walk backwards through the programming, turning the tick counter backwards, running the code backwards, to find where a desync occurred.

,

In order to wind the tick counter backwards it is necessary to know what choices the emulated lockstep processor made, and what it was last executing when the desync was detected. The reverse process then runs the game backwards through the code, doing the opposite of whatever was programmed, to undo game events.

To make this easier, it may be necessary to log processor subroutine call events as a sort of "COME FROM" list, to undo procedural jumps that can come from multiple sources.

It would also need to log the nondeterministic player action data coming in from the server to the client so that it can be used to undo player triggered activity.

But it is not necessary to log everything the processor did, such as math operations, if/then statements, etc, as those can be performed by merely running the actual game code in reverse. Only the significant choice events made outside the direct linear calculations need to be logged.

,

The person doing the debugging needs to decide how much memory to use to store this state tracking data, which sets a limit on how far back the code can be unwound.

The two saved states from the client and server are wound backwards together until the desynced elements reconverge back together again and are identical once more.

This then can be used to repeatedly wind the processor clock forward and backward across the problem code to find the source of the desync.

,

This would need help from the core development team, as apparently only they know the specific details of how the lockstep processor emulation works.

boskid
Factorio Staff
Posts: 2250
Joined: Thu Dec 14, 2017 6:56 pm

Re: Desync resolver: reverse parallel execution

Post by boskid »

No.

The state contained in a desync report is from some ticks later than when the desync occurred, because a client that sees a CRC difference must send a request ("hey, something is wrong, save me a map for the desync report"), and this can reach the server some ticks later.

The game being deterministic means that `f(state, input_actions) -> new_state` gives the same new_state on each player's game instance, so each player only needs the state at some point (downloading the map from the server) and then only needs to receive the successive input_actions for each tick to stay in sync with the server. It does not mean it can be reversed. That would require keeping a history of every decision taken by each entity. A copper plate is moving along a belt and there was an inserter next to it: was it put there by the inserter, or did it travel along the belt from somewhere else? Too much effort and performance penalty for what it would give.
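As a rough illustration of why a deterministic step need not be reversible, here is a minimal Python sketch. The two-number state, the `step` function, and the `place_on_belt` action are all hypothetical toy stand-ins, not anything from the actual game code.

```python
from typing import List, Tuple

# Hypothetical toy state: (plates on a belt tile, plates held by an inserter).
State = Tuple[int, int]

def step(state: State, input_actions: List[str]) -> State:
    """Deterministic toy tick: apply player actions, then the inserter
    drops whatever it holds onto the belt."""
    belt, hand = state
    for action in input_actions:
        if action == "place_on_belt":  # hypothetical player action
            belt += 1
    return (belt + hand, 0)

# Determinism: the same state and the same inputs always give the same result.
assert step((1, 0), []) == step((1, 0), [])

# Two different predecessor states collapse into one successor state, so the
# successor alone cannot say whether the plate came by belt or by inserter.
came_by_belt = step((1, 0), [])
came_by_inserter = step((0, 1), [])
same_result = came_by_belt == came_by_inserter
```

Telling the two cases apart afterwards would need extra per-entity history, which is the cost described above.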

-- edit:
And how would you guarantee that the reverting is correct? There was a desync, so something went wrong; reverting ticks would not guarantee that the recovered previous state is truly what it was. This would require saving the map every tick, and this is not acceptable.

mrvn
Smart Inserter
Posts: 5709
Joined: Mon Sep 05, 2016 9:10 am

Re: Desync resolver: reverse parallel execution

Post by mrvn »

If you enable replay wouldn't that record everything needed to go back to the last tick where client and server agreed and then single step each tick to find where they diverged?

boskid
Factorio Staff
Posts: 2250
Joined: Thu Dec 14, 2017 6:56 pm

Re: Desync resolver: reverse parallel execution

Post by boskid »

mrvn wrote:
Mon Oct 14, 2019 10:01 am
If you enable replay wouldn't that record everything needed to go back to the last tick where client and server agreed and then single step each tick to find where they diverged?
Almost yes (not undo, only replay from the start to a given tick). A replay is just the set of input_actions for each tick, so it is possible to reach the desync tick. However, this process is not reliable: the replay will be lost when changing mods, when changing version, when replay was not enabled, when replay was disabled in game, when using the editor to import a map (this disables replay), or when the game has been running so long that replaying the whole game would take ages. Also, how would you describe in terms of input_actions how to reach the player's state and how to reach the server's state to investigate? A desync happens because the same input_actions gave different results, so the tick of interest is exactly one tick later.
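A minimal Python sketch of "replay from start to a given tick", assuming a replay is just a mapping from tick number to the list of input actions for that tick. The names `replay_to_tick` and `counter_step` and the integer "state" are hypothetical and only illustrate the idea.

```python
from typing import Callable, Dict, List

def replay_to_tick(
    initial_state: int,
    actions_by_tick: Dict[int, List[str]],
    target_tick: int,
    step: Callable[[int, List[str]], int],
) -> int:
    """Re-run the deterministic step from the initial state up to target_tick.
    There is no undo: reaching an earlier tick means starting over."""
    state = initial_state
    for tick in range(target_tick):
        state = step(state, actions_by_tick.get(tick, []))
    return state

def counter_step(state: int, actions: List[str]) -> int:
    # Hypothetical toy update: one unit per tick plus one per input action.
    return state + 1 + len(actions)

recorded = {0: ["build"], 3: ["mine", "build"]}  # hypothetical replay data
state_at_5 = replay_to_tick(0, recorded, 5, counter_step)
```

The cost grows with the length of the game, which matches the point that replaying a very long session would take ages.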

Plawerth
Fast Inserter
Posts: 118
Joined: Thu Mar 02, 2017 12:57 am

Re: Desync resolver: reverse parallel execution

Post by Plawerth »

mrvn is asking a different but related question, though with the same theme of better desync debugging.

I understand that improper use of global vs local variables is often the cause of crashes, so properly replaying desync-causing actions may require logging both the input actions and the local variables used by the client vs the server, in order to capture improperly defined global/local variables.

To narrow this down, allow selective logging of:
- local variables used by the core game
- local variables used by scenario softmods defined in control.lua
- local variables used by individual "real" mods

Oxyd
Former Staff
Posts: 1428
Joined: Thu May 07, 2015 8:42 am

Re: Desync resolver: reverse parallel execution

Post by Oxyd »

What do you mean by “lockstep processor emulation”?

mrvn
Smart Inserter
Posts: 5709
Joined: Mon Sep 05, 2016 9:10 am

Re: Desync resolver: reverse parallel execution

Post by mrvn »

Plawerth wrote:
Mon Oct 14, 2019 10:48 am
mrvn is asking a different but related question, though with the same theme of better desync debugging.
Full replay data or actually replaying wouldn't even be necessary.

The game should keep the CRC for all ticks not yet synced between all clients, plus a replay/event history going back that far. Then, on a desync, it can compare the CRCs on the server and client to find exactly which tick caused the desync, and use the replay/event history to show what user input and events were executed that tick. The desync output could then, for example, show that on_entity_built() was called, making finding the desync cause that much simpler.
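The suggestion above could be sketched roughly like this in Python. Everything here is hypothetical: the `TickHistory` class, the 600-tick default, and the byte strings standing in for serialized game state are assumptions for illustration, not how the game actually stores or compares checksums.

```python
import zlib
from collections import OrderedDict
from typing import List, Optional

class TickHistory:
    """Bounded per-tick record of (state CRC, events run that tick)."""

    def __init__(self, max_ticks: int = 600) -> None:
        self.max_ticks = max_ticks
        self.entries: "OrderedDict[int, tuple]" = OrderedDict()

    def record(self, tick: int, state_bytes: bytes, events: List[str]) -> None:
        self.entries[tick] = (zlib.crc32(state_bytes), list(events))
        # Drop the oldest ticks once the memory budget is exceeded.
        while len(self.entries) > self.max_ticks:
            self.entries.popitem(last=False)

def first_divergent_tick(server: TickHistory, client: TickHistory) -> Optional[int]:
    """Return the earliest tick both sides recorded where the CRCs differ."""
    for tick in sorted(server.entries.keys() & client.entries.keys()):
        if server.entries[tick][0] != client.entries[tick][0]:
            return tick
    return None

server, client = TickHistory(), TickHistory()
for tick in range(5):
    events = ["on_entity_built"] if tick == 3 else []
    server_state = f"tick={tick} plates=3".encode()
    # From tick 3 onward the client's toy state differs by one byte.
    client_state = server_state if tick < 3 else f"tick={tick} plates=4".encode()
    server.record(tick, server_state, events)
    client.record(tick, client_state, events)

bad_tick = first_divergent_tick(server, client)
events_at_bad_tick = server.entries[bad_tick][1]
```

In this toy run the first mismatch is at the tick where the event fired, so a report built this way could list that tick's events next to the CRC difference.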
