[Oxyd] [0.17.59] Linux Headless, non-blocking save fails

Place for things which are bugs but we have no idea how to solve them. Things related to hardware, libraries, strange setups, etc.
AreYouScared
Long Handed Inserter
Posts: 87
Joined: Thu Mar 23, 2017 3:32 am

[Oxyd] [0.17.59] Linux Headless, non-blocking save fails

Post by AreYouScared »

Seems to happen at random, not sure how to exactly reproduce it.


I'm part of the gridlock cluster. We're using the "experimental non-blocking save" feature. We're aware it's experimental, but figured it was worth posting here.



One of our servers was not accessible, but the process still seemed to be running, as if the game were paused, even though none of our games pause when there are no players. On investigating, I noticed that the last few lines of the Factorio log were from an attempted save that apparently never finished.

Code:

148355.380 Info ServerSynchronizer.cpp:618: nextHeartbeatSequenceNumber(4560950) removing peer(92).
148456.754 Info AppManager.cpp:267: Saving to _autosave1 (non-blocking).
148456.834 Info AsyncScenarioSaver.cpp:144: Saving process PID: 11309
153089.590 Received SIGINT, shutting down

Someone disconnected, and about 100 seconds later the server started an autosave, as seen above. Then it just stopped. After we noticed, I sent Ctrl-C to kill it; it hung on shutting down, and that was it.

On a normal non-blocking save we see:

Code:

159451.012 Info AppManager.cpp:267: Saving to _autosave4 (non-blocking).
159451.030 Info AsyncScenarioSaver.cpp:144: Saving process PID: 12071
159451.993 Info ChildProcessAgent.cpp:60: Child 12071 exited with return value 0
159451.993 Info AppManager.cpp:268: Saving finished

I can post the full log if requested; I haven't attached it because of its size and because it lists multiple player IPs. Also, @Oxyd and @V453000 were pinged in our internal Discord about it.
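
The difference between the two excerpts above is that the hung save has a "Saving process PID" line with no matching "Child ... exited" line. A minimal sketch of how a log could be scanned for that pattern follows. The regular expressions are based only on the lines quoted in this thread, the function name is hypothetical, and the sample text stitches the two excerpts together for illustration; other Factorio versions may word these lines differently.

```python
import re

# Patterns follow the log lines quoted in this thread (assumption: other
# Factorio versions may format these lines differently).
SAVE_START = re.compile(
    r"^\s*(\d+\.\d+) Info AsyncScenarioSaver\.cpp:\d+: Saving process PID: (\d+)"
)
CHILD_EXIT = re.compile(
    r"^\s*(\d+\.\d+) Info ChildProcessAgent\.cpp:\d+: "
    r"Child (\d+) exited with return value (-?\d+)"
)


def find_unfinished_saves(log_text):
    """Return sorted (timestamp, pid) pairs for saves with no exit line."""
    pending = {}  # pid -> timestamp at which the saving process was started
    for line in log_text.splitlines():
        started = SAVE_START.match(line)
        if started:
            pending[int(started.group(2))] = float(started.group(1))
            continue
        exited = CHILD_EXIT.match(line)
        if exited:
            # The child reported back, so this save is no longer pending.
            pending.pop(int(exited.group(2)), None)
    return sorted((ts, pid) for pid, ts in pending.items())


if __name__ == "__main__":
    # Both excerpts from this post, joined into one text for illustration.
    sample = """\
148456.754 Info AppManager.cpp:267: Saving to _autosave1 (non-blocking).
148456.834 Info AsyncScenarioSaver.cpp:144: Saving process PID: 11309
153089.590 Received SIGINT, shutting down
159451.012 Info AppManager.cpp:267: Saving to _autosave4 (non-blocking).
159451.030 Info AsyncScenarioSaver.cpp:144: Saving process PID: 12071
159451.993 Info ChildProcessAgent.cpp:60: Child 12071 exited with return value 0
159451.993 Info AppManager.cpp:268: Saving finished
"""
    for timestamp, pid in find_unfinished_saves(sample):
        print(f"save started at {timestamp} by PID {pid} never reported an exit")
```

A save that is simply still in progress would also show up here, so the result is only meaningful some time after the last autosave started.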


Thanks,
AreYouScared / Cjmwid
Last edited by AreYouScared on Mon Jul 29, 2019 8:40 am, edited 1 time in total.

tehfreek
Filter Inserter
Posts: 391
Joined: Thu Mar 17, 2016 7:34 am

Re: [0.17.59] Linux Headless, non-blocking save fails

Post by tehfreek »

Was the autosave recoverable? I had a non-blocking save lock up a few days ago but the autosave could be loaded. I didn't report it since I figured it was a one-time, unique thing; I have not had it happen since.

AreYouScared
Long Handed Inserter
Posts: 87
Joined: Thu Mar 23, 2017 3:32 am

Re: [0.17.59] Linux Headless, non-blocking save fails

Post by AreYouScared »

Didn't look, and it's been overwritten :(

Oxyd
Former Staff
Posts: 1428
Joined: Thu May 07, 2015 8:42 am

Re: [0.17.59] Linux Headless, non-blocking save fails

Post by Oxyd »

If any of you gets this hang again, could you attach GDB to the saving process (the one whose PID is in the log), run `thread apply all bt`, and post the output here?
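
The request above can be scripted so the backtrace is captured quickly when a hang is noticed. The sketch below pulls the most recent saving PID out of the log text and builds a non-interactive GDB command line using GDB's `-p`, `-batch`, and `-ex` options. The helper names are hypothetical, the PID pattern is taken from the log lines quoted earlier in this thread, and attaching to a process usually requires sufficient privileges on the host.

```python
import re
import shlex

# Matches the "Saving process PID" line quoted earlier in this thread.
PID_LINE = re.compile(r"Saving process PID: (\d+)")


def last_saving_pid(log_text):
    """Return the PID from the last 'Saving process PID' line, or None."""
    pids = PID_LINE.findall(log_text)
    return int(pids[-1]) if pids else None


def gdb_backtrace_command(pid):
    """Build a GDB invocation that attaches, dumps all threads, and exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


if __name__ == "__main__":
    log = "148456.834 Info AsyncScenarioSaver.cpp:144: Saving process PID: 11309\n"
    pid = last_saving_pid(log)
    if pid is not None:
        # Print the command rather than running it, so it can be reviewed first.
        print(shlex.join(gdb_backtrace_command(pid)))
```

The command is only printed here; redirecting its output to a file when running it by hand makes the result easy to attach to a post.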

AreYouScared
Long Handed Inserter
Posts: 87
Joined: Thu Mar 23, 2017 3:32 am

Re: [0.17.59] Linux Headless, non-blocking save fails

Post by AreYouScared »

I will make a note of this. If it happens, I'll be sure to try and get hold of you in the gridlock discord. So far it hasn't happened again. Does running in a Docker container change what you'd be looking for?

AreYouScared
Long Handed Inserter
Posts: 87
Joined: Thu Mar 23, 2017 3:32 am

Re: [0.17.59] Linux Headless, non-blocking save fails

Post by AreYouScared »

https://ptero.co/uxekutazof.shell - From Godmave, for an unknown server
The next three all seem to have happened at around the same time; at least, I noticed them all at the same time:
https://ptero.co/gaqodiqoho.shell - From me, for Caldonia
https://ptero.co/zamumonufi.shell - From me, for Caladan
https://ptero.co/enedenubul.shell - From me, for Dakara

From our discord, figured I should post it here to keep everyone informed...
If more information is required, let us know here or in the internal channels...

Here's another from 8/1:
https://ptero.co/eboguxanoc.shell - From Me for Frontier

Klonan
Factorio Staff
Posts: 5150
Joined: Sun Jan 11, 2015 2:09 pm

Re: [0.17.59] Linux Headless, non-blocking save fails

Post by Klonan »

How much RAM do the servers have and how much RAM is in use running Factorio?

Non-blocking save can as much as double RAM usage, and if you don't have enough, you can have a bad time.

AreYouScared
Long Handed Inserter
Posts: 87
Joined: Thu Mar 23, 2017 3:32 am

Re: [0.17.59] Linux Headless, non-blocking save fails

Post by AreYouScared »

The machines are fitted with 16 GB of RAM, with 4 servers per machine; each one uses about 2 to 3 GB of RAM.

Klonan
Factorio Staff
Posts: 5150
Joined: Sun Jan 11, 2015 2:09 pm

Re: [0.17.59] Linux Headless, non-blocking save fails

Post by Klonan »

AreYouScared wrote:
Mon Aug 05, 2019 4:18 pm
Machine are fitted with 16gb of ram, 4 servers per machine, each one uses about 2 to 3gb of ram

Yeah, that is not a lot of headroom. If two saves go off at the same time, it could easily eat all the RAM.

Does the machine have paging enabled?
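
The headroom concern can be put into rough numbers. This is a minimal sketch that uses only the figures quoted in this thread (16 GB per machine, 4 servers, 2 to 3 GB each) and treats "can up to double" as a worst-case factor of 2 for each server that is saving. The function name is hypothetical, and the estimate ignores the operating system and any other processes on the machine.

```python
def peak_ram_gb(servers, per_server_gb, concurrent_saves, save_factor=2.0):
    """Worst-case RAM when `concurrent_saves` servers save at the same moment.

    save_factor=2.0 models the "up to double" upper bound from this thread;
    typical usage during a save may well be lower (assumption).
    """
    saving = min(concurrent_saves, servers)
    idle = servers - saving
    return idle * per_server_gb + saving * per_server_gb * save_factor


if __name__ == "__main__":
    total_gb = 16
    for per_server in (2, 3):
        for saves in range(5):
            peak = peak_ram_gb(4, per_server, saves)
            verdict = "fits" if peak <= total_gb else "exceeds RAM"
            print(f"{per_server} GB/server, {saves} saving: {peak:.0f} GB ({verdict})")
```

At 3 GB per server the baseline is 12 GB, so under this upper bound two simultaneous saves would need 18 GB, which is more than the 16 GB installed.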

Oxyd
Former Staff
Posts: 1428
Joined: Thu May 07, 2015 8:42 am

Re: [0.17.59] Linux Headless, non-blocking save fails

Post by Oxyd »

AreYouScared wrote:
Thu Aug 01, 2019 6:59 am
https://ptero.co/uxekutazof.shell - From Godmave for an unknown server?
These last three all kind of happened at the same time? well, I noticed them all at the same time.
https://ptero.co/gaqodiqoho.shell - From Me for Caldonia
https://ptero.co/zamumonufi.shell- From Me for Caladan
https://ptero.co/enedenubul.shell- From Me for Dakara

From our discord, figured I should post it here to keep everyone informed...
If more information is required, let us know here or in the internal channels...

Heres another from 8/1
https://ptero.co/eboguxanoc.shell - From Me for Frontier

Very interesting. One of those stack traces shows a process in the middle of crashing. And just to make sure, you attached GDB to the child whose PID is listed in the log file, right? Not the parent process.

Klonan wrote:
Tue Aug 06, 2019 8:34 am
AreYouScared wrote:
Mon Aug 05, 2019 4:18 pm
Machine are fitted with 16gb of ram, 4 servers per machine, each one uses about 2 to 3gb of ram
Yea that is not a lot of headroom, if two saves go off at the same time it could easily eat all the RAM.

Does the machine have paging enabled?

Even if the machines do run out of memory, the game shouldn't just hang. I suppose it might start swapping like crazy if swap is enabled.
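
Whether swap is enabled, and how much of it is in use, can be read from `/proc/meminfo` on Linux via its `SwapTotal` and `SwapFree` fields. The sketch below parses that file; the function names are hypothetical, and inside a container the file may reflect the host rather than the container's own limits.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (such as HugePages_Total) are
    plain counts and are stored as-is.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def swap_summary(text):
    """Report whether swap is configured and how many kB of it are in use."""
    info = parse_meminfo(text)
    total = info.get("SwapTotal", 0)
    free = info.get("SwapFree", 0)
    return {"enabled": total > 0, "used_kb": total - free}


if __name__ == "__main__":
    # Linux only: /proc/meminfo does not exist on other platforms.
    with open("/proc/meminfo") as meminfo:
        print(swap_summary(meminfo.read()))
```

Sampling this around the time of an autosave would show whether swap usage jumps during a save.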

AreYouScared
Long Handed Inserter
Posts: 87
Joined: Thu Mar 23, 2017 3:32 am

Re: [Oxyd] [0.17.59] Linux Headless, non-blocking save fails

Post by AreYouScared »

The sad part is there was no child process to attach to, so I had to attach to the parent process.

AreYouScared
Long Handed Inserter
Posts: 87
Joined: Thu Mar 23, 2017 3:32 am

Re: [Oxyd] [0.17.59] Linux Headless, non-blocking save fails

Post by AreYouScared »

I've had a save on a non-blocking autosave spree since the start of this issue, and it has yet to happen again, so I guess this could be closed?

posila
Factorio Staff
Posts: 5201
Joined: Thu Jun 11, 2015 1:35 pm

Re: [Oxyd] [0.17.59] Linux Headless, non-blocking save fails

Post by posila »

Thanks.
I assume there is not enough information to figure out what was happening, and it's not possible to get more information since it doesn't happen anymore, so I'll move it to 1/0 magic.
