[1.1.107] Desync when using request_to_generate_chunks

Posted: Thu May 23, 2024 8:33 am
by Atraps003
Steps to reproduce-

1- Load the map on a Linux machine running a headless Factorio server
2- Connect to the server from a Windows 11 machine
3- Run /c game.surfaces[1].clear(true)

It's likely to desync instantly. Sometimes the surface needs to be cleared multiple times. In some situations, a desync is more reliable if the seed is changed before clearing the surface:
/c local surface = game.surfaces[1] local mgs = surface.map_gen_settings mgs.seed = math.random(1111, 99999999) surface.map_gen_settings = mgs

Steps 1 and 2 are important. I couldn't get a desync running multiple instances on the same machine. As far as I can tell, toggle-heavy-mode doesn't detect anything either.

freeplay.lua is modified: the following code is added to it. Removing the request_to_generate_chunks call seems to fix the issue.

script.on_event(defines.events.on_surface_cleared,
function(event)
	-- Request every chunk within a radius of 6 chunks around the origin, then generate them immediately
	game.surfaces[1].request_to_generate_chunks({0, 0}, 6)
	game.surfaces[1].force_generate_chunk_requests()
end)

script.on_event(defines.events.on_chunk_generated,
function(event)
	-- Replace all water and deep water in the newly generated chunk with shallow water
	-- (note: this always acts on surface 1, regardless of event.surface)
	global.chunk_area = event.area
	global.set_water_shallow = {}
	global.water_count = 0
	for k, tile in pairs(game.surfaces[1].find_tiles_filtered{name = {"water", "deepwater"}, area = global.chunk_area}) do
		global.water_count = global.water_count + 1
		global.set_water_shallow[global.water_count] = {name = "water-shallow", position = tile.position}
	end
	game.surfaces[1].set_tiles(global.set_water_shallow)
end)

Re: [1.1.107] Desync when using request_to_generate_chunks

Posted: Fri May 24, 2024 12:26 pm
by Rseding91
Are you able to reproduce this without modifying freeplay.lua? That is, by doing the scripting in a mod that both computers have installed?

I'm wondering if maybe our detection of base game modifications missed it, and the linux end is running a vanilla freeplay.lua instead of the modified one.

Re: [1.1.107] Desync when using request_to_generate_chunks

Posted: Tue May 28, 2024 6:12 am
by Atraps003
Thanks for looking. I packaged it as a mod to test. It still desyncs.

https://mods.factorio.com/mod/desync-testing

Reproduction steps-
1- Install the mod on the win11 machine
2- New game > Mod scenarios > desync-testing
3- Save the map, transfer the map and the mod to the Linux machine running the headless Factorio server, and start the server with the transferred map.
4- Connect the win11 machine to the server
5- Run /c game.surfaces[1].clear(true)

If there is no desync, try changing to this seed:

/c local surface = game.surfaces[1] local mgs = surface.map_gen_settings mgs.seed = 66181655 surface.map_gen_settings = mgs
/c game.surfaces[1].clear(true)


I have been testing with three machines: win11, win7, and linux.

win11 connected to itself doesn't desync
win7 connected to linux doesn't desync
win7 connected to win11 desyncs
win11 connected to linux desyncs

Re: [1.1.107] Desync when using request_to_generate_chunks

Posted: Tue May 28, 2024 12:18 pm
by robot256
No idea what might be the cause, but can I suggest you attach a factorio.log from each of the machines? It includes some information on the computer hardware. Unless you already tried running different operating systems on the same machine, it's possible the difference is related to the hardware rather than, or in addition to, the OS.

Re: [1.1.107] Desync when using request_to_generate_chunks

Posted: Thu May 30, 2024 5:35 am
by Atraps003
Logs attached

Re: [1.1.107] Desync when using request_to_generate_chunks

Posted: Fri May 31, 2024 6:52 pm
by Rseding91
So far I have been unsuccessful at reproducing it. Could you try something, just to eliminate a possible cause?

Could you boot into linux on the win11 machine's hardware and try connecting to the other linux machine and see if it desyncs?

Re: [1.1.107] Desync when using request_to_generate_chunks

Posted: Fri Jun 07, 2024 7:19 am
by Atraps003
I booted linux instead of win11 on the i9-12900k machine and it still desyncs when connected to the E3-1271v3 linux server.

Re: [1.1.107] Desync when using request_to_generate_chunks

Posted: Fri Jun 07, 2024 2:43 pm
by Rseding91
That makes me think the hardware is the issue: as if the CPU itself is executing instructions wrong, but in the same way every time, since it doesn't desync with itself. If the issue was software, it *should* reproduce when hosting on the same machine connected to itself.

Is the i9 machine overclocked? Or is the RAM overclocked on it? If so, what happens if you disable that overclocking?

Re: [1.1.107] Desync when using request_to_generate_chunks

Posted: Sat Jun 08, 2024 9:23 am
by Atraps003
XMP was enabled, but it still desyncs after disabling it. I updated the BIOS, which "introduces the Intel Baseline Profile option, allowing users to revert to Intel factory default settings for basic functionality, lower power limits, and improving stability in certain games".

With intel baseline profile loaded it still desyncs.

It could be that my 12900k machine is faulty, but what led me to believe otherwise is the unusual number of desyncs among random players connected to the server running this scenario, all occurring at the moment the surface is cleared.

Would desync reports from these players be useful?

Re: [1.1.107] Desync when using request_to_generate_chunks

Posted: Tue Jun 11, 2024 2:13 pm
by Rseding91
I don't think the other desync reports will be of much use. The one you already provided points at decoratives and tiles having been generated differently on the server and on the peers, though I have no idea how that could happen.

The main blocker for me is being able to reproduce it on 1 machine. When a desync requires 2 machines, it is virtually impossible to hunt down unless it's a "do X action and it desyncs every time" case.

Re: [1.1.107] Desync when using request_to_generate_chunks

Posted: Tue Jun 11, 2024 3:21 pm
by boskid
I am trying to reproduce this issue using a Windows+Linux pair. I can see the script running, as it places shallow water in place of regular water, however I see no desyncs happening: force_crc() does not report any desyncs, and changing the map gen settings does nothing.

Re: [1.1.107] Desync when using request_to_generate_chunks

Posted: Thu Jun 13, 2024 4:28 am
by Atraps003
My reproduction step of connecting Windows to Linux is wrong. It looks like it's not the OS that is the issue but the CPU generation.

Multiplayer with a group of new CPUs will not desync. Multiplayer with a group of old CPUs will not desync. Multiplayer with a mix of new and old CPUs results in a desync.

Here is the list of CPUs I have been testing with.

Old CPU group-
Intel E3-1271v3
Intel i5-2500k
Intel i5-6260U
AMD 5600x

New CPU group-
Intel i9-12900k
Intel i5-12600kf
AMD 7800x3d

Re: [1.1.107] Desync when using request_to_generate_chunks

Posted: Thu Jun 13, 2024 8:56 am
by boskid
Interesting: using 1.1.109 on two machines, an i9-14900k and an i7-5820k, a desync does happen. With the reproduction steps working, it should be possible to investigate what causes this.

Re: [1.1.107] Desync when using request_to_generate_chunks

Posted: Thu Jun 13, 2024 3:57 pm
by boskid
Ok, so the investigation revealed that this is indeed a game bug. There are effectively 4 moving pieces involved here: requesting generation of multiple chunks, forcing them to generate "now", changing tiles inside the on_chunk_generated event, and the value reported by `std::thread::hardware_concurrency()`, which depends on the CPU used.

The core of the issue is that when multiple chunk generation requests are forced to complete now, special code tries to keep a certain number of map gen tasks running and applies their results; however, those tasks raise the on_chunk_generated event, and the map gen tasks that create entities use tile data at some point. When a CPU returns a smaller hardware_concurrency value, fewer tasks can capture tile data before on_chunk_generated is raised. Because the script changes some tiles in that event, subsequent tasks see different tile data than they would on a CPU with a larger hardware_concurrency value. Because the map gen tasks saw different tiles, they produced a different game state, and a desync happened.
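To make the ordering problem concrete, here is a toy Lua model of the mechanism described above. It is not engine code and only illustrates the idea: the in-flight cap `window` stands in for whatever limit the engine derives from `hardware_concurrency()`, and the rule that a task captures the tile of the chunk `lookback` positions earlier is an illustrative assumption. `simulate`, `window`, and `lookback` are hypothetical names.

```lua
-- Toy model of the map gen ordering race (NOT engine code).
-- Hypothetical assumptions, for illustration only:
--   * chunks are generated in order 1..n;
--   * at most `window` map gen tasks are in flight (stand-in for a limit
--     derived from hardware_concurrency);
--   * a task captures tile data when it starts, reading the tile of the
--     chunk `lookback` positions earlier;
--   * finishing a task raises "on_chunk_generated", where the script turns
--     that chunk's water into shallow water.
local function simulate(n, window, lookback)
  local tiles = {}
  for i = 1, n do tiles[i] = "water" end

  local seen_by = {}    -- chunk index -> tile data its task captured
  local in_flight = {}  -- FIFO of started but unfinished tasks
  local next_chunk = 1

  while next_chunk <= n or #in_flight > 0 do
    if next_chunk <= n and #in_flight < window then
      -- Start a task: it snapshots the tile data it will use later.
      local src = tiles[next_chunk - lookback] or "none"
      in_flight[#in_flight + 1] = {chunk = next_chunk, seen = src}
      next_chunk = next_chunk + 1
    else
      -- Window full (or nothing left to start): finish the oldest task.
      local task = table.remove(in_flight, 1)
      seen_by[task.chunk] = task.seen
      tiles[task.chunk] = "water-shallow"  -- script edits tiles in the event
    end
  end
  return seen_by
end

local small = simulate(8, 2, 4)  -- e.g. a CPU reporting fewer threads
local large = simulate(8, 8, 4)  -- e.g. a CPU reporting more threads
-- small[5] == "water-shallow", large[5] == "water"
print(small[5], large[5])
```

With a cap of 2, chunk 5's task starts only after chunk 1's on_chunk_generated edit and captures shallow water; with a cap of 8, every task has captured the original water before any edit runs. The two runs disagree on the same input, which is the kind of divergence that shows up as a desync between peers.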

In this specific case, a workaround would be to not touch tiles inside on_chunk_generated, but to delay that code so the tiles are changed only when no map gen tasks are running anymore. This, however, does not guarantee there won't be any map gen related desyncs anymore, since this revealed a quite significant hole in the map gen design. This definitely needs a fix on the engine side.
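A minimal sketch of that workaround for a 1.1 control.lua, assuming the same radius-6 request as the original script. `WaterFix`, `global.pending_water`, and the helper names are hypothetical. The on_chunk_generated handler only records the chunk; the water is rewritten right after force_generate_chunk_requests() returns, and on the next on_tick for chunks generated outside that path. It assumes no map gen task is still in flight at those points, which this thread does not confirm for background generation, and as noted above it does not close the underlying engine hole. The helpers take the surface (or a surface lookup) as a parameter, and event registration is skipped when `script` is absent, so the table-building logic can also be exercised in plain Lua.

```lua
-- Hypothetical workaround sketch for a Factorio 1.1 control.lua (not an official fix).
-- on_chunk_generated only records chunk areas; tiles are edited later, at points
-- where the forced map gen is assumed to have finished.
local WaterFix = {}

-- Build the set_tiles payload that turns all (deep) water in `area` into shallow water.
function WaterFix.shallow_tiles(surface, area)
  local tiles = {}
  local found = surface.find_tiles_filtered{name = {"water", "deepwater"}, area = area}
  for _, tile in ipairs(found) do
    tiles[#tiles + 1] = {name = "water-shallow", position = tile.position}
  end
  return tiles
end

-- Apply every queued area, then empty the queue.
-- get_surface(index) must return a surface or nil.
function WaterFix.drain(queue, get_surface)
  for i = 1, #queue do
    local entry = queue[i]
    local surface = get_surface(entry.surface_index)
    if surface and surface.valid then
      surface.set_tiles(WaterFix.shallow_tiles(surface, entry.area))
    end
    queue[i] = nil
  end
end

if script then -- `script` only exists when Factorio loads this file
  local function get_surface(index) return game.get_surface(index) end
  local function init() global.pending_water = global.pending_water or {} end
  script.on_init(init)
  script.on_configuration_changed(init)

  script.on_event(defines.events.on_chunk_generated, function(event)
    -- Do NOT touch tiles here: just remember which chunk needs fixing.
    local pending = global.pending_water
    pending[#pending + 1] = {surface_index = event.surface.index, area = event.area}
  end)

  script.on_event(defines.events.on_surface_cleared, function(event)
    local surface = game.get_surface(event.surface_index)
    if not surface then return end
    surface.request_to_generate_chunks({0, 0}, 6)
    surface.force_generate_chunk_requests()
    -- The forced generation has returned here, so apply the queued edits now.
    WaterFix.drain(global.pending_water, get_surface)
  end)

  script.on_event(defines.events.on_tick, function()
    -- Chunks generated outside the forced path (e.g. by exploration) end up here.
    if #global.pending_water > 0 then
      WaterFix.drain(global.pending_water, get_surface)
    end
  end)
end
```

Keeping the queue in `global` means a save made between generation and the next tick still carries the pending edits, so every peer applies them at the same point.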

Re: [1.1.107] Desync when using request_to_generate_chunks

Posted: Fri Jun 14, 2024 4:51 pm
by coderpatsy
For reference, this bug got a writeup in FFF #415.

Re: [1.1.107] Desync when using request_to_generate_chunks

Posted: Fri Jun 14, 2024 7:08 pm
by Atraps003
Thanks for the information and bug fix.