[1.1.57] Headless server can have 2 saves (one non-blocking) running at once, hangs

Bugs that are actually features.
Post Reply
DeHackEd
Inserter
Inserter
Posts: 29
Joined: Sun Dec 13, 2020 12:31 am
Contact:

[1.1.57] Headless server can have 2 saves (one non-blocking) running at once, hangs

Post by DeHackEd »

I know "non-blocking saves" are experimental and all that, but this one is pretty easy to reproduce.

0. Have a very large map. Mine has a savegame size nearly 250 MB.

1. Run a multiplayer headless server with non-blocking save enabled and join the game.

2. Have the last (only?) player quit the server. With the last player gone the server will save the game automatically as a non-blocking operation.

3. While the above automatic save is still in progress, run "/quit" on the server console

Server begins saving the map (again) and hangs during shutdown. The original background save process is still running (hung) as well as the main game process which has released most of its memory during shutdown. Now both processes hang around, stuck. Even a double CTRL+C ("terminating immediately") doesn't kill it. (Never mind this bit, I forgot to detach the debugger)

Stack trace of main server process:

Code: Select all

(gdb) thread apply all bt

Thread 18 (Thread 0x7f10b1ae5700 (LWP 6530) "factorio"):
#0  0x00007f10bba3b8b3 in __GI___select (nfds=11, readfds=0x7f10b1ae42b0, writefds=0x0, exceptfds=0x0, timeout=0x7f10b1ae4260) at ../sysdeps/unix/sysv/linux/select.c:41
#1  0x00000000011cc327 in UDPSocket::recvfrom (this=0x62e9920, packet=..., address=..., blocking=true) at /tmp/factorio-build-k3xmrB/src/Net/UDPSocket.cpp:433
#2  0x00000000011d0e10 in TransmissionControlHelper::recvFromSockets () at /tmp/factorio-build-k3xmrB/src/Net/TransmissionControlHelper.cpp:245
#3  TransmissionControlHelper::receive (this=0xb, message=...) at /tmp/factorio-build-k3xmrB/src/Net/TransmissionControlHelper.cpp:109
#4  0x000000000119c072 in RouterBase::readPacketsLoop (this=0x32e8f30) at /tmp/factorio-build-k3xmrB/src/Net/RouterBase.cpp:65
#5  0x0000000001fba820 in std::execute_native_thread_routine (__p=0x8558100) at ../../../../../libstdc++-v3/src/c++11/thread.cc:82
#6  0x00007f10bbb13ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#7  0x00007f10bba43def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

# Many "WorkerThread" threads skipped here

Thread 2 (Thread 0x7f10bb92c700 (LWP 6293) "TaskManager"):
#0  futex_wait_cancelable (private=0, expected=0, futex_word=0x2ad0634) at ../sysdeps/nptl/futex-internal.h:186
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x2ad05e0, cond=0x2ad0608) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=0x2ad0608, mutex=0x2ad05e0) at pthread_cond_wait.c:638
#3  0x0000000001f4f11c in __gthread_cond_wait (__mutex=<optimized out>, __cond=<optimized out>) at /home/build/gcc-11.2-source/gcc-11.2.0/build/x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu/bits/gthr-default.h:865
#4  std::__condvar::wait (__m=..., this=<optimized out>) at /home/build/gcc-11.2-source/gcc-11.2.0/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/std_mutex.h:155
#5  std::condition_variable::wait (this=<optimized out>, __lock=...) at ../../../../../libstdc++-v3/src/c++11/condition_variable.cc:41
#6  0x00000000017e13aa in TaskManager::run () at /tmp/factorio-build-k3xmrB/src/Util/TaskManager.cpp:65
#7  operator() () at /tmp/factorio-build-k3xmrB/src/Util/TaskManager.cpp:9
#8  __invoke_impl<void, TaskManager::TaskManager()::<lambda()> > () at /home/build/gcc-11.2/include/c++/11.2.0/bits/invoke.h:61
#9  __invoke<TaskManager::TaskManager()::<lambda()> > () at /home/build/gcc-11.2/include/c++/11.2.0/bits/invoke.h:96
#10 _M_invoke<0> () at /home/build/gcc-11.2/include/c++/11.2.0/bits/std_thread.h:253
#11 operator() () at /home/build/gcc-11.2/include/c++/11.2.0/bits/std_thread.h:260
#12 _M_run (this=0x2ad05e0) at /home/build/gcc-11.2/include/c++/11.2.0/bits/std_thread.h:211
#13 0x0000000001fba820 in std::execute_native_thread_routine (__p=0x2ae2d10) at ../../../../../libstdc++-v3/src/c++11/thread.cc:82
#14 0x00007f10bbb13ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#15 0x00007f10bba43def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f10bb941c40 (LWP 6290) "factorio"):
#0  ChildProcessAgent::waitForAll () at /tmp/factorio-build-k3xmrB/src/ChildProcessAgent.cpp:87
#1  GlobalContext::clearResources () at /tmp/factorio-build-k3xmrB/src/GlobalContext.cpp:707
#2  GlobalContext::~GlobalContext () at /tmp/factorio-build-k3xmrB/src/GlobalContext.cpp:872
#3  0x0000000001a9f7be in CommandLineMultiplayer::hostCommandLineMultiplayerGame(cxxopts::ParseResult const&, Filesystem::Path const&, Filesystem::Path const&, bool)::{lambda()#1}::operator()() const [clone .constprop.0] () at /tmp/factorio-build-k3xmrB/src/CommandLineMultiplayer.cpp:351
#4  0x00000000006b6612 in ~ScopedCallback () at /tmp/factorio-build-k3xmrB/src/Util/ScopedCallback.hpp:60
#5  CommandLineMultiplayer::hostCommandLineMultiplayerGame () at /tmp/factorio-build-k3xmrB/src/CommandLineMultiplayer.cpp:365
#6  main (argc=125767456, argv=0x7) at /tmp/factorio-build-k3xmrB/src/Main.cpp:654
Hung background save process:

Code: Select all

Thread 1 (Thread 0x7f10bb941c40 (LWP 9174) "factorio"):
#0  __libc_write (nbytes=4, buf=0x7ffd38335bdc, fd=17) at ../sysdeps/unix/sysv/linux/write.c:26
#1  __libc_write (fd=17, buf=0x7ffd38335bdc, nbytes=4) at ../sysdeps/unix/sysv/linux/write.c:24
#2  0x0000000001208ed1 in AsyncScenarioSaver::doSave(std::function<void ()>)::{lambda(int)#1}::operator()(int) const::Observer::onProgressUpdate(float) [clone .lto_priv.0] () at /tmp/factorio-build-k3xmrB/src/ChildProcessAgent.cpp:169
#3  0x00000000013da72c in Surface::save (this=0x11, output=..., chunksFinished=0x4, chunkCount=3148996591, savingObserver=0x0) at /tmp/factorio-build-k3xmrB/src/Surface/Surface.cpp:949
#4  0x0000000001104687 in Map::save (this=0x11, output=..., savingObserver=0x4) at /tmp/factorio-build-k3xmrB/src/Map/Map.cpp:1470
#5  0x0000000001246e5d in Scenario::saveMap (this=0x11, saveType=(unknown: 0x38335bdc), packageWriter=..., targetSerialiser=..., savingObserver=0x0, gameStateCorruptionDetected=<error reading variable>) at /tmp/factorio-build-k3xmrB/src/Scenario/Scenario.cpp:1058
#6  0x0000000001248a1e in Scenario::saveAs (this=0x11, savePath=..., saveName=..., previewFullPath=..., savingObserver=0x0, saveType=(unknown: 0x1)) at /tmp/factorio-build-k3xmrB/src/Scenario/Scenario.cpp:941
#7  0x0000000001596a68 in operator() () at /tmp/factorio-build-k3xmrB/src/Scenario/AsyncScenarioSaver.cpp:122
#8  __invoke_impl<void, AsyncScenarioSaver::doSave(std::function<void()>)::<lambda(int)>&, int> () at /home/build/gcc-11.2/include/c++/11.2.0/bits/invoke.h:61
#9  __invoke_r<void, AsyncScenarioSaver::doSave(std::function<void()>)::<lambda(int)>&, int> () at /home/build/gcc-11.2/include/c++/11.2.0/bits/invoke.h:111
#10 _M_invoke (__functor=..., __args#0=@0x7ffd38335bdc: 5) at /home/build/gcc-11.2/include/c++/11.2.0/bits/std_function.h:291
#11 0x000000000079e0fa in std::function<void (int)>::operator()(int) const () at /home/build/gcc-11.2/include/c++/11.2.0/bits/std_function.h:560
#12 ChildProcessAgent::fork(std::function<void (int)> const&, std::function<void (int, ChildProcessAgent::ProcessStatus)> const&) (this=0x7ffd383369d0, runInChild=..., onFinished=...) at /tmp/factorio-build-k3xmrB/src/ChildProcessAgent.cpp:120
#13 0x0000000001230499 in AsyncScenarioSaver::doSave(std::function<void ()>) () at /tmp/factorio-build-k3xmrB/src/Scenario/AsyncScenarioSaver.cpp:107
#14 AsyncScenarioSaver::save(std::function<void ()>, Scenario*, Filesystem::Path const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, InfoBoxManager*, SaveType, std::function<void ()>) (finishedCallback=..., scenario=0x7ffd38335bdc, savePath=..., saveName=..., savePreview=false, infoBoxManager=0x1, saveType=(unknown: 0xfffffe00), doneCallback=...) at /tmp/factorio-build-k3xmrB/src/Scenario/AsyncScenarioSaver.cpp:52
#15 0x000000000078c2e9 in AppManager::saveGameOnServerReal (this=0x7ffd38336bd0) at /tmp/factorio-build-k3xmrB/src/AppManager.cpp:423
#16 0x000000000078df68 in AppManager::process (this=0x50f3a10) at /tmp/factorio-build-k3xmrB/src/AppManager.cpp:313
#17 0x000000000085c0fa in MainLoop::prePrepare () at /tmp/factorio-build-k3xmrB/src/MainLoop.cpp:561
#18 0x0000000001aa06ce in MainLoop::mainLoopStepHeadless () at /tmp/factorio-build-k3xmrB/src/MainLoop.cpp:600
#19 MainLoop::run(Filesystem::Path const&, Filesystem::Path const&, bool, bool, std::function<void ()>, Filesystem::Path const&, MainLoop::HeavyMode) [clone .constprop.0] () at /tmp/factorio-build-k3xmrB/src/MainLoop.cpp:389
#20 0x00000000006b6500 in hostMultiplayerGameInternal () at /tmp/factorio-build-k3xmrB/src/CommandLineMultiplayer.cpp:299
#21 CommandLineMultiplayer::hostCommandLineMultiplayerGame () at /tmp/factorio-build-k3xmrB/src/CommandLineMultiplayer.cpp:355
#22 main (argc=17, argv=0x7ffd38335bdc) at /tmp/factorio-build-k3xmrB/src/Main.cpp:654
Attachments
factorio-current.log
(7.6 KiB) Downloaded 12 times


DeHackEd
Inserter
Inserter
Posts: 29
Joined: Sun Dec 13, 2020 12:31 am
Contact:

Re: [1.1.57] Headless server can have 2 saves (one non-blocking) running at once, hangs

Post by DeHackEd »

I respectfully disagree and present evidence to the contrary.

Experimental and at my own risk? Yes. But I also have a consistent reproducer of the problem.

FuryoftheStars
Smart Inserter
Smart Inserter
Posts: 1198
Joined: Tue Apr 25, 2017 2:01 pm
Contact:

Re: [1.1.57] Headless server can have 2 saves (one non-blocking) running at once, hangs

Post by FuryoftheStars »

DeHackEd wrote:
Sun Apr 24, 2022 11:11 pm
I respectfully disagree and present evidence to the contrary.

Experimental and at my own risk? Yes. But I also have a consistent reproducer of the problem.
And yet: viewtopic.php?f=47&t=95511 (notice it's in the "Won't fix" category)

Probably all depends on what the actual issue is.

DeHackEd
Inserter
Inserter
Posts: 29
Joined: Sun Dec 13, 2020 12:31 am
Contact:

Re: [1.1.57] Headless server can have 2 saves (one non-blocking) running at once, hangs

Post by DeHackEd »

In that particular example the user ran out of RAM really hard and Linux killed the saving process to free RAM. So that's a basic server administrative issue, not a Factorio issue. Should have been filed as "not a bug" (user's fault) rather than "Won't Fix" I'd say.

User avatar
ptx0
Smart Inserter
Smart Inserter
Posts: 1358
Joined: Wed Jan 01, 2020 7:16 pm
Contact:

Re: [1.1.57] Headless server can have 2 saves (one non-blocking) running at once, hangs

Post by ptx0 »

wrong category. it's still a bug, even if you won't fix it. overzealous mods strike again, making Factorio just a little less good to satisfy their own ego.

User avatar
ickputzdirwech
Filter Inserter
Filter Inserter
Posts: 702
Joined: Sun May 07, 2017 10:16 am
Contact:

Re: [1.1.57] Headless server can have 2 saves (one non-blocking) running at once, hangs

Post by ickputzdirwech »

Keep it civil! Arguing on topic can be done without personal attacks.
Mods: Shortcuts for 1.1, ick's Sea Block, ick's vanilla tweaks
Tools: Atom language pack
Text quickly seems cold and unfriendly. Be careful how you write and interpret what others have written.
- A reminder for me and all who read what I write

DeHackEd
Inserter
Inserter
Posts: 29
Joined: Sun Dec 13, 2020 12:31 am
Contact:

Re: [1.1.57] Headless server can have 2 saves (one non-blocking) running at once, hangs

Post by DeHackEd »

Right, well, with that out of the way, please consider this bug report and not just toss it aside.

robot256
Filter Inserter
Filter Inserter
Posts: 428
Joined: Sun Mar 17, 2019 1:52 am
Contact:

Re: [1.1.57] Headless server can have 2 saves (one non-blocking) running at once, hangs

Post by robot256 »

From what I understand, the non-blocking save forks the thread using a funky Limux OS feature that allows the memory contents to diverge without copying everything all at once. You'd have to be able to abort the autosave thread and purge its memory allocation when the quit command is received. No idea how difficult that would be in this particular case or if it would mess up other things in the OS.

DeHackEd
Inserter
Inserter
Posts: 29
Joined: Sun Dec 13, 2020 12:31 am
Contact:

Re: [1.1.57] Headless server can have 2 saves (one non-blocking) running at once, hangs

Post by DeHackEd »

That's one option. I think an acceptable alternative would be to wait until an existing non-blocking save finishes before trying another save, regardless of type. Perhaps if you want to be really smart, if the point in time being saved is the same in both situations (ie. server is paused) then let the non-blocking save finish and call it good.

My best bet is that the non-blocking save is stuck trying to report its save progress to the main server process and can't because said process is waiting for the saving process to exit rather than monitoring its progress, deadlocking both in a staring contest. This is the problem.

Post Reply

Return to “Not a bug”