Page 1 of 1

[1.1.57] Headless server can have 2 saves (one non-blocking) running at once, hangs

Posted: Sun Apr 24, 2022 9:36 pm
by DeHackEd
I know "non-blocking saves" are experimental and all that, but this one is pretty easy to reproduce.

0. Have a very large map. Mine has a savegame size nearly 250 MB.

1. Run a multiplayer headless server with non-blocking save enabled and join the game.

2. Have the last (only?) player quit the server. With the last player gone the server will save the game automatically as a non-blocking operation.

3. While the above automatic save is still in progress, run "/quit" on the server console

Server begins saving the map (again) and hangs during shutdown. The original background save process is still running (hung) as well as the main game process which has released most of its memory during shutdown. Now both processes hang around, stuck. Even a double CTRL+C ("terminating immediately") doesn't kill it. (Never mind this bit, I forgot to detach the debugger)

Stack trace of main server process:

Code: Select all

(gdb) thread apply all bt

Thread 18 (Thread 0x7f10b1ae5700 (LWP 6530) "factorio"):
#0  0x00007f10bba3b8b3 in __GI___select (nfds=11, readfds=0x7f10b1ae42b0, writefds=0x0, exceptfds=0x0, timeout=0x7f10b1ae4260) at ../sysdeps/unix/sysv/linux/select.c:41
#1  0x00000000011cc327 in UDPSocket::recvfrom (this=0x62e9920, packet=..., address=..., blocking=true) at /tmp/factorio-build-k3xmrB/src/Net/UDPSocket.cpp:433
#2  0x00000000011d0e10 in TransmissionControlHelper::recvFromSockets () at /tmp/factorio-build-k3xmrB/src/Net/TransmissionControlHelper.cpp:245
#3  TransmissionControlHelper::receive (this=0xb, message=...) at /tmp/factorio-build-k3xmrB/src/Net/TransmissionControlHelper.cpp:109
#4  0x000000000119c072 in RouterBase::readPacketsLoop (this=0x32e8f30) at /tmp/factorio-build-k3xmrB/src/Net/RouterBase.cpp:65
#5  0x0000000001fba820 in std::execute_native_thread_routine (__p=0x8558100) at ../../../../../libstdc++-v3/src/c++11/thread.cc:82
#6  0x00007f10bbb13ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#7  0x00007f10bba43def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

# Many "WorkerThread" threads skipped here

Thread 2 (Thread 0x7f10bb92c700 (LWP 6293) "TaskManager"):
#0  futex_wait_cancelable (private=0, expected=0, futex_word=0x2ad0634) at ../sysdeps/nptl/futex-internal.h:186
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x2ad05e0, cond=0x2ad0608) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=0x2ad0608, mutex=0x2ad05e0) at pthread_cond_wait.c:638
#3  0x0000000001f4f11c in __gthread_cond_wait (__mutex=<optimized out>, __cond=<optimized out>) at /home/build/gcc-11.2-source/gcc-11.2.0/build/x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu/bits/gthr-default.h:865
#4  std::__condvar::wait (__m=..., this=<optimized out>) at /home/build/gcc-11.2-source/gcc-11.2.0/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/std_mutex.h:155
#5  std::condition_variable::wait (this=<optimized out>, __lock=...) at ../../../../../libstdc++-v3/src/c++11/condition_variable.cc:41
#6  0x00000000017e13aa in TaskManager::run () at /tmp/factorio-build-k3xmrB/src/Util/TaskManager.cpp:65
#7  operator() () at /tmp/factorio-build-k3xmrB/src/Util/TaskManager.cpp:9
#8  __invoke_impl<void, TaskManager::TaskManager()::<lambda()> > () at /home/build/gcc-11.2/include/c++/11.2.0/bits/invoke.h:61
#9  __invoke<TaskManager::TaskManager()::<lambda()> > () at /home/build/gcc-11.2/include/c++/11.2.0/bits/invoke.h:96
#10 _M_invoke<0> () at /home/build/gcc-11.2/include/c++/11.2.0/bits/std_thread.h:253
#11 operator() () at /home/build/gcc-11.2/include/c++/11.2.0/bits/std_thread.h:260
#12 _M_run (this=0x2ad05e0) at /home/build/gcc-11.2/include/c++/11.2.0/bits/std_thread.h:211
#13 0x0000000001fba820 in std::execute_native_thread_routine (__p=0x2ae2d10) at ../../../../../libstdc++-v3/src/c++11/thread.cc:82
#14 0x00007f10bbb13ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#15 0x00007f10bba43def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f10bb941c40 (LWP 6290) "factorio"):
#0  ChildProcessAgent::waitForAll () at /tmp/factorio-build-k3xmrB/src/ChildProcessAgent.cpp:87
#1  GlobalContext::clearResources () at /tmp/factorio-build-k3xmrB/src/GlobalContext.cpp:707
#2  GlobalContext::~GlobalContext () at /tmp/factorio-build-k3xmrB/src/GlobalContext.cpp:872
#3  0x0000000001a9f7be in CommandLineMultiplayer::hostCommandLineMultiplayerGame(cxxopts::ParseResult const&, Filesystem::Path const&, Filesystem::Path const&, bool)::{lambda()#1}::operator()() const [clone .constprop.0] () at /tmp/factorio-build-k3xmrB/src/CommandLineMultiplayer.cpp:351
#4  0x00000000006b6612 in ~ScopedCallback () at /tmp/factorio-build-k3xmrB/src/Util/ScopedCallback.hpp:60
#5  CommandLineMultiplayer::hostCommandLineMultiplayerGame () at /tmp/factorio-build-k3xmrB/src/CommandLineMultiplayer.cpp:365
#6  main (argc=125767456, argv=0x7) at /tmp/factorio-build-k3xmrB/src/Main.cpp:654
Hung background save process:

Code: Select all

Thread 1 (Thread 0x7f10bb941c40 (LWP 9174) "factorio"):
#0  __libc_write (nbytes=4, buf=0x7ffd38335bdc, fd=17) at ../sysdeps/unix/sysv/linux/write.c:26
#1  __libc_write (fd=17, buf=0x7ffd38335bdc, nbytes=4) at ../sysdeps/unix/sysv/linux/write.c:24
#2  0x0000000001208ed1 in AsyncScenarioSaver::doSave(std::function<void ()>)::{lambda(int)#1}::operator()(int) const::Observer::onProgressUpdate(float) [clone .lto_priv.0] () at /tmp/factorio-build-k3xmrB/src/ChildProcessAgent.cpp:169
#3  0x00000000013da72c in Surface::save (this=0x11, output=..., chunksFinished=0x4, chunkCount=3148996591, savingObserver=0x0) at /tmp/factorio-build-k3xmrB/src/Surface/Surface.cpp:949
#4  0x0000000001104687 in Map::save (this=0x11, output=..., savingObserver=0x4) at /tmp/factorio-build-k3xmrB/src/Map/Map.cpp:1470
#5  0x0000000001246e5d in Scenario::saveMap (this=0x11, saveType=(unknown: 0x38335bdc), packageWriter=..., targetSerialiser=..., savingObserver=0x0, gameStateCorruptionDetected=<error reading variable>) at /tmp/factorio-build-k3xmrB/src/Scenario/Scenario.cpp:1058
#6  0x0000000001248a1e in Scenario::saveAs (this=0x11, savePath=..., saveName=..., previewFullPath=..., savingObserver=0x0, saveType=(unknown: 0x1)) at /tmp/factorio-build-k3xmrB/src/Scenario/Scenario.cpp:941
#7  0x0000000001596a68 in operator() () at /tmp/factorio-build-k3xmrB/src/Scenario/AsyncScenarioSaver.cpp:122
#8  __invoke_impl<void, AsyncScenarioSaver::doSave(std::function<void()>)::<lambda(int)>&, int> () at /home/build/gcc-11.2/include/c++/11.2.0/bits/invoke.h:61
#9  __invoke_r<void, AsyncScenarioSaver::doSave(std::function<void()>)::<lambda(int)>&, int> () at /home/build/gcc-11.2/include/c++/11.2.0/bits/invoke.h:111
#10 _M_invoke (__functor=..., __args#0=@0x7ffd38335bdc: 5) at /home/build/gcc-11.2/include/c++/11.2.0/bits/std_function.h:291
#11 0x000000000079e0fa in std::function<void (int)>::operator()(int) const () at /home/build/gcc-11.2/include/c++/11.2.0/bits/std_function.h:560
#12 ChildProcessAgent::fork(std::function<void (int)> const&, std::function<void (int, ChildProcessAgent::ProcessStatus)> const&) (this=0x7ffd383369d0, runInChild=..., onFinished=...) at /tmp/factorio-build-k3xmrB/src/ChildProcessAgent.cpp:120
#13 0x0000000001230499 in AsyncScenarioSaver::doSave(std::function<void ()>) () at /tmp/factorio-build-k3xmrB/src/Scenario/AsyncScenarioSaver.cpp:107
#14 AsyncScenarioSaver::save(std::function<void ()>, Scenario*, Filesystem::Path const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, InfoBoxManager*, SaveType, std::function<void ()>) (finishedCallback=..., scenario=0x7ffd38335bdc, savePath=..., saveName=..., savePreview=false, infoBoxManager=0x1, saveType=(unknown: 0xfffffe00), doneCallback=...) at /tmp/factorio-build-k3xmrB/src/Scenario/AsyncScenarioSaver.cpp:52
#15 0x000000000078c2e9 in AppManager::saveGameOnServerReal (this=0x7ffd38336bd0) at /tmp/factorio-build-k3xmrB/src/AppManager.cpp:423
#16 0x000000000078df68 in AppManager::process (this=0x50f3a10) at /tmp/factorio-build-k3xmrB/src/AppManager.cpp:313
#17 0x000000000085c0fa in MainLoop::prePrepare () at /tmp/factorio-build-k3xmrB/src/MainLoop.cpp:561
#18 0x0000000001aa06ce in MainLoop::mainLoopStepHeadless () at /tmp/factorio-build-k3xmrB/src/MainLoop.cpp:600
#19 MainLoop::run(Filesystem::Path const&, Filesystem::Path const&, bool, bool, std::function<void ()>, Filesystem::Path const&, MainLoop::HeavyMode) [clone .constprop.0] () at /tmp/factorio-build-k3xmrB/src/MainLoop.cpp:389
#20 0x00000000006b6500 in hostMultiplayerGameInternal () at /tmp/factorio-build-k3xmrB/src/CommandLineMultiplayer.cpp:299
#21 CommandLineMultiplayer::hostCommandLineMultiplayerGame () at /tmp/factorio-build-k3xmrB/src/CommandLineMultiplayer.cpp:355
#22 main (argc=17, argv=0x7ffd38335bdc) at /tmp/factorio-build-k3xmrB/src/Main.cpp:654

Re: [1.1.57] Headless server can have 2 saves (one non-blocking) running at once, hangs

Posted: Sun Apr 24, 2022 9:44 pm
by Loewchen
Non-blocking saving is not supported.

Re: [1.1.57] Headless server can have 2 saves (one non-blocking) running at once, hangs

Posted: Sun Apr 24, 2022 11:11 pm
by DeHackEd
I respectfully disagree and present evidence to the contrary.

Experimental and at my own risk? Yes. But I also have a consistent reproducer of the problem.

Re: [1.1.57] Headless server can have 2 saves (one non-blocking) running at once, hangs

Posted: Sun Apr 24, 2022 11:35 pm
by FuryoftheStars
DeHackEd wrote:
Sun Apr 24, 2022 11:11 pm
I respectfully disagree and present evidence to the contrary.

Experimental and at my own risk? Yes. But I also have a consistent reproducer of the problem.
And yet: viewtopic.php?f=47&t=95511 (notice it's in the "Won't fix" category)

Probably all depends on what the actual issue is.

Re: [1.1.57] Headless server can have 2 saves (one non-blocking) running at once, hangs

Posted: Mon Apr 25, 2022 3:11 am
by DeHackEd
In that particular example the user ran out of RAM really hard and Linux killed the saving process to free RAM. So that's a basic server administrative issue, not a Factorio issue. Should have been filed as "not a bug" (user's fault) rather than "Won't Fix" I'd say.

Re: [1.1.57] Headless server can have 2 saves (one non-blocking) running at once, hangs

Posted: Mon Apr 25, 2022 4:04 am
by ptx0
wrong category. it's still a bug, even if you won't fix it. overzealous mods strike again, making Factorio just a little less good to satisfy their own ego.

Re: [1.1.57] Headless server can have 2 saves (one non-blocking) running at once, hangs

Posted: Mon Apr 25, 2022 6:10 am
by ickputzdirwech
Keep it civil! Arguing on topic can be done without personal attacks.

Re: [1.1.57] Headless server can have 2 saves (one non-blocking) running at once, hangs

Posted: Tue Apr 26, 2022 12:27 pm
by DeHackEd
Right, well, with that out of the way, please consider this bug report and not just toss it aside.

Re: [1.1.57] Headless server can have 2 saves (one non-blocking) running at once, hangs

Posted: Tue Apr 26, 2022 1:54 pm
by robot256
From what I understand, the non-blocking save forks the thread using a funky Limux OS feature that allows the memory contents to diverge without copying everything all at once. You'd have to be able to abort the autosave thread and purge its memory allocation when the quit command is received. No idea how difficult that would be in this particular case or if it would mess up other things in the OS.

Re: [1.1.57] Headless server can have 2 saves (one non-blocking) running at once, hangs

Posted: Tue Apr 26, 2022 6:12 pm
by DeHackEd
That's one option. I think an acceptable alternative would be to wait until an existing non-blocking save finishes before trying another save, regardless of type. Perhaps if you want to be really smart, if the point in time being saved is the same in both situations (ie. server is paused) then let the non-blocking save finish and call it good.

My best bet is that the non-blocking save is stuck trying to report its save progress to the main server process and can't because said process is waiting for the saving process to exit rather than monitoring its progress, deadlocking both in a staring contest. This is the problem.

Re: [1.1.57] Headless server can have 2 saves (one non-blocking) running at once, hangs

Posted: Tue Oct 11, 2022 6:49 pm
by DeHackEd
I'd like to formally challenge this ruling that this isn't a bug. Non-blocking save bugs are being fixed today, invalidating the reason this was rejected.

Re: [1.1.57] Headless server can have 2 saves (one non-blocking) running at once, hangs

Posted: Tue Oct 11, 2022 6:50 pm
by Rseding91
DeHackEd wrote:
Tue Oct 11, 2022 6:49 pm
I'd like to formally challenge this ruling that this isn't a bug. Non-blocking save bugs are being fixed today, invalidating the reason this was rejected.
Still won't be worked on by anyone I know of.

Re: [1.1.57] Headless server can have 2 saves (one non-blocking) running at once, hangs

Posted: Tue Oct 11, 2022 7:37 pm
by DeHackEd
If it's a low priority, okay. Issues shutting down a headless server under specific circumstances are fairly minor. But file it under some other category besides "Not a bug".

Re: [1.1.57] Headless server can have 2 saves (one non-blocking) running at once, hangs

Posted: Sun Oct 23, 2022 4:52 pm
by ptx0
DeHackEd wrote:
Tue Oct 11, 2022 7:37 pm
If it's a low priority, okay. Issues shutting down a headless server under specific circumstances are fairly minor. But file it under some other category besides "Not a bug".
now it's in "Won't fix", because they still have disdain for their users