[0.18.6] Warning ChildProcessAgent.cpp:79: We were not waiting for that child

This subforum contains all the issues which we already resolved.
Post Reply
movax20h
Fast Inserter
Fast Inserter
Posts: 164
Joined: Fri Mar 08, 2019 7:07 pm
Contact:

[0.18.6] Warning ChildProcessAgent.cpp:79: We were not waiting for that child

Post by movax20h »

Linux amd64, headless server.

No mods.

Code: Select all

 591.562 Info ServerMultiplayerManager.cpp:1189: Auto saving map as /home/factorio1/factorio/saves/test1.zip
 591.562 Info AppManager.cpp:395: Saving game as /home/factorio1/factorio/saves/test1.zip
 591.590 Info AsyncScenarioSaver.cpp:144: Saving process PID: 14812
 591.973 Info ChildProcessAgent.cpp:60: Child 14812 exited with return value 0
 591.973 Info AppManager.cpp:398: Saving finished
2390.665 Info ChildProcessAgent.cpp:60: Child 15665 exited with return value 0
2390.665 Warning ChildProcessAgent.cpp:79: We were not waiting for that child
I checked the entire log, and there is no reference to pid 15665 anywhere.

There was no clients connected when this thing happened. Before that only one client connected few times, and then quit. So the game is essentially paused since time 590.571.

movax20h
Fast Inserter
Fast Inserter
Posts: 164
Joined: Fri Mar 08, 2019 7:07 pm
Contact:

Re: [0.18.6] Warning ChildProcessAgent.cpp:79: We were not waiting for that child

Post by movax20h »

I was really curious about it, and started monitoring using forkstat, with filtering out all irrelevant other processes.

Even with no clients connected, I see factorio does do a fork() exactly every 30 seconds. It looks like the forked process is calling `sh -c /usr/bin/lsb_release -ri`. Every 30 seconds!

Code: Select all

Time     Event   PID Info   Duration Process
19:22:57 fork   9987 parent          ../factorio/bin/x64/factorio --server-settings server-settings-movax.json --server-adminlist server-adminlist.json --console-log server-console.log --start-server /home/factorio1/factorio/saves/test1.zip
19:22:57 fork  11355 child           ../factorio/bin/x64/factorio --server-settings server-settings-movax.json --server-adminlist server-adminlist.json --console-log server-console.log --start-server /home/factorio1/factorio/saves/test1.zip
19:22:57 exec  11355                 sh -c /usr/bin/lsb_release -ri
19:22:57 fork  11355 parent          sh -c /usr/bin/lsb_release -ri
19:22:57 fork  11356 child           sh -c /usr/bin/lsb_release -ri
19:22:57 exec  11356                 /usr/bin/python3 -Es /usr/bin/lsb_release -ri
19:22:57 exit  11356      0   0.054s /usr/bin/python3 -Es /usr/bin/lsb_release -ri
19:22:57 exit  11355      0   0.069s sh -c /usr/bin/lsb_release -ri
Probably related. And also wasteful. Please cache this information once? I am not sure why would factorio want to even know my distro in the first place (maybe for the crash reporter?). This is headless server running factorio. No steam involved.

BTW.

Code: Select all

$ /usr/bin/lsb_release -ri
Distributor ID:	Debian
Release:	10
$

User avatar
ptx0
Smart Inserter
Smart Inserter
Posts: 1507
Joined: Wed Jan 01, 2020 7:16 pm
Contact:

Re: [0.18.6] Warning ChildProcessAgent.cpp:79: We were not waiting for that child

Post by ptx0 »

just out of curiosity are you using the non blocking save feature here?

movax20h
Fast Inserter
Fast Inserter
Posts: 164
Joined: Fri Mar 08, 2019 7:07 pm
Contact:

Re: [0.18.6] Warning ChildProcessAgent.cpp:79: We were not waiting for that child

Post by movax20h »

ptx0 wrote:
Fri Feb 14, 2020 7:06 pm
just out of curiosity are you using the non blocking save feature here?
Yes. But as I said game was paused and no clients were connected. Looking at logs, the all async save child processed exited without issues. It could be related tho of course.

Rseding91
Factorio Staff
Factorio Staff
Posts: 13209
Joined: Wed Jun 11, 2014 5:23 am
Contact:

Re: [0.18.6] Warning ChildProcessAgent.cpp:79: We were not waiting for that child

Post by Rseding91 »

Well, the only way that line ever shows up is if you're using the non-blocking saving feature. Otherwise that code just never runs.
If you want to get ahold of me I'm almost always on Discord.

movax20h
Fast Inserter
Fast Inserter
Posts: 164
Joined: Fri Mar 08, 2019 7:07 pm
Contact:

Re: [0.18.6] Warning ChildProcessAgent.cpp:79: We were not waiting for that child

Post by movax20h »

Rseding91 wrote:
Fri Feb 14, 2020 7:30 pm
Well, the only way that line ever shows up is if you're using the non-blocking saving feature. Otherwise that code just never runs.
Ok. Makes sense. But why would there be some dangling child that only finished 40 minutes after the game was already paused / last async autosave happened?

Rseding91
Factorio Staff
Factorio Staff
Posts: 13209
Joined: Wed Jun 11, 2014 5:23 am
Contact:

Re: [0.18.6] Warning ChildProcessAgent.cpp:79: We were not waiting for that child

Post by Rseding91 »

movax20h wrote:
Fri Feb 14, 2020 7:33 pm
Rseding91 wrote:
Fri Feb 14, 2020 7:30 pm
Well, the only way that line ever shows up is if you're using the non-blocking saving feature. Otherwise that code just never runs.
Ok. Makes sense. But why would there be some dangling child that only finished 40 minutes after the game was already paused / last async autosave happened?
Dunno. It's an experimental feature that nobody really maintains. It has potential unknown issues.

Unless it's causing some crash I don't consider it worth looking into. If one of the other devs wants to they can.
If you want to get ahold of me I'm almost always on Discord.

movax20h
Fast Inserter
Fast Inserter
Posts: 164
Joined: Fri Mar 08, 2019 7:07 pm
Contact:

Re: [0.18.6] Warning ChildProcessAgent.cpp:79: We were not waiting for that child

Post by movax20h »

Rseding91 wrote:
Fri Feb 14, 2020 7:34 pm
movax20h wrote:
Fri Feb 14, 2020 7:33 pm
Rseding91 wrote:
Fri Feb 14, 2020 7:30 pm
Well, the only way that line ever shows up is if you're using the non-blocking saving feature. Otherwise that code just never runs.
Ok. Makes sense. But why would there be some dangling child that only finished 40 minutes after the game was already paused / last async autosave happened?
Dunno. It's an experimental feature that nobody really maintains. It has potential unknown issues.

Unless it's causing some crash I don't consider it worth looking into. If one of the other devs wants to they can.
Sure, absolutely. It is super minor issue, if issue at all. It was just something that shouldn't happen, and could have been an indication of some deeper bug, thus I reported it.

Thanks.

movax20h
Fast Inserter
Fast Inserter
Posts: 164
Joined: Fri Mar 08, 2019 7:07 pm
Contact:

Re: [0.18.6] Warning ChildProcessAgent.cpp:79: We were not waiting for that child

Post by movax20h »

It just happened again:

Code: Select all

2020-02-14 19:55:16 [JOIN] movax20h joined the game
3086.449 Info ServerMultiplayerManager.cpp:1039: Disconnect notification for peer (3)
3086.449 Info ServerMultiplayerManager.cpp:921: updateTick(360551) received stateChanged peerID(3) oldState(InGame) newState(DisconnectScheduled)
2020-02-14 19:55:18 [LEAVE] movax20h left the game
3086.467 Info ServerSynchronizer.cpp:618: nextHeartbeatSequenceNumber(91568) removing peer(3).
3087.459 Info ServerMultiplayerManager.cpp:1189: Auto saving map as /home/factorio1/factorio/saves/test1.zip
3087.460 Info AppManager.cpp:395: Saving game as /home/factorio1/factorio/saves/test1.zip
3087.488 Info AsyncScenarioSaver.cpp:144: Saving process PID: 12475
3087.873 Info ChildProcessAgent.cpp:60: Child 12475 exited with return value 0
3087.873 Info AppManager.cpp:398: Saving finished

4826.560 Info ChildProcessAgent.cpp:60: Child 14098 exited with return value 0
4826.561 Warning ChildProcessAgent.cpp:79: We were not waiting for that child

Again, with no connected clients. Unfortunately I didn't have forkstat running, but I will keep looking at it, but probably just ignore.

movax20h
Fast Inserter
Fast Inserter
Posts: 164
Joined: Fri Mar 08, 2019 7:07 pm
Contact:

Re: [0.18.6] Warning ChildProcessAgent.cpp:79: We were not waiting for that child

Post by movax20h »

I just got it on forkstat!

Code: Select all

5966.560 Info ChildProcessAgent.cpp:60: Child 14616 exited with return value 0
5966.561 Warning ChildProcessAgent.cpp:79: We were not waiting for that child
it is in fact this lsb_release thingy

Code: Select all

Time     Event   PID Info   Duration Process
20:43:18 fork   9987 parent          ../factorio/bin/x64/factorio --server-settings server-settings-movax.json --server-adminlist server-adminlist.json --console-log server-console.log --start-server /home/factorio1/factorio/saves/test1.zip
20:43:18 fork  14616 child           ../factorio/bin/x64/factorio --server-settings server-settings-movax.json --server-adminlist server-adminlist.json --console-log server-console.log --start-server /home/factorio1/factorio/saves/test1.zip
20:43:18 exec  14616                 sh -c /usr/bin/lsb_release -ri
20:43:18 fork  14616 parent          sh -c /usr/bin/lsb_release -ri
20:43:18 fork  14617 child           sh -c /usr/bin/lsb_release -ri
20:43:18 exec  14617                 /usr/bin/python3 -Es /usr/bin/lsb_release -ri
20:43:18 exit  14617      0   0.053s /usr/bin/python3 -Es /usr/bin/lsb_release -ri
20:43:18 exit  14616      0   0.067s sh -c /usr/bin/lsb_release -ri
Time     Event   PID Info   Duration Proc
I will restart the server without non-blocking / async saving enable to confirm.

movax20h
Fast Inserter
Fast Inserter
Posts: 164
Joined: Fri Mar 08, 2019 7:07 pm
Contact:

Re: [0.18.6] Warning ChildProcessAgent.cpp:79: We were not waiting for that child

Post by movax20h »

Same happens with non-blocking save feature disabled.

movax20h
Fast Inserter
Fast Inserter
Posts: 164
Joined: Fri Mar 08, 2019 7:07 pm
Contact:

Re: [0.18.6] Warning ChildProcessAgent.cpp:79: We were not waiting for that child

Post by movax20h »

According to my checks with proc/tasks/X/wchan, it is the main thread doing it, and somewhere in mainLoopStepHeadless, just after the sleep where it sleeps for 30 seconds.

Code: Select all

(gdb) 
#0  0x00007fb3f6791720 in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fb3f67bc874 in usleep () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x000000000117c1c8 in MainLoop::mainLoopStepHeadless () at /tmp/factorio-build-CVxc7t/src/MainLoop.cpp:582
#3  0x000000000117cc93 in MainLoop::run(Filesystem::Path const&, Filesystem::Path const&, bool, bool, std::function<void ()>, Filesystem::Path const&, MainLoop::HeavyMode) ()
    at /tmp/factorio-build-CVxc7t/src/MainLoop.cpp:372
#4  0x0000000000577f43 in main () at /tmp/factorio-build-CVxc7t/src/Main.cpp:936
Really hard to pin point exactly, but it looks the mainLoopStepheadless is not using epool/select for waiting, nor it is waiting on any mutex / conditional variable, it is just sleeping unconditionally (modulo possibly receiving a signal).

Could it be that the child process finishes earlier than some global array of childs / forks is updated, and by the time the signal handler or wait on child is called, it is not yet updated?

I thought it might be possible to make it atomic is to use `clone(..., CLONE_STOPPED)`, update child arrays, then unstop it (signal SIGCONT), but it only works on very old kernels, and is now removed (and bits are used for other things now) completely.


I think the only solution is to wait for some signal from the parent after fork, but before the execv, indicating that it is ok to continue (and child table was updated). Otherwise there might be a race with a delivery of SIGCHILD to the parent after the process finishes.

Other option is to process SIGCHILD asynchronously in the main event loop, and only use the SIGCHILD handler to register the signals in some queue / array, but process them in the even loop. It can be tricky to get locking correct tho, especially when using non-recurvise locks.

posila
Factorio Staff
Factorio Staff
Posts: 5202
Joined: Thu Jun 11, 2015 1:35 pm
Contact:

Re: [0.18.6] Warning ChildProcessAgent.cpp:79: We were not waiting for that child

Post by posila »

Interesting find, thank you. Apparently, the distro is used to compose UserAgent string for HTTP requests. I don't know if distro is useful to have as part of user agent or not, or whether we need to se user agent at all ... but I changed it to not do this call repeatedly.

Post Reply

Return to “Resolved Problems and Bugs”