[Oxyd] [Linux/Mac] non-blocking save crashes

Bugs that we were not able to reproduce, and/or are waiting for more detailed info.
sorahn
Long Handed Inserter
Posts: 79
Joined: Tue Mar 17, 2020 11:35 pm

Re: [1.0.0] Non blocking save hangs

Post by sorahn »

kovarex wrote: Thu Sep 24, 2020 1:02 pm We will fix it by removing the non-blocking save feature, it is just trouble that works only on linux and is not worth it.
That is super unfortunate to hear. I would wager that most dedicated servers running Factorio are on Linux, and as saves get bigger the save process takes longer. Mix in mods and there is a chance of a mod crashing the server, so you want to save frequently; but with blocking saves you spend as much time saving as you do playing.

So if you save, say, once an hour instead, you risk a huge setback from a crash.

Certainly sad news if it gets pulled.

Is there anything we (the players) can do to help you guys keep this feature?
ssilk
Global Moderator
Posts: 12889
Joined: Tue Apr 16, 2013 10:35 pm

Re: [1.0.0] Non blocking save hangs

Post by ssilk »

kovarex wrote: Thu Sep 24, 2020 1:02 pm We will fix it by removing the non-blocking save feature, it is just trouble that works only on linux and is not worth it.
Please don’t!

This feature is a relief for people who like to play with mega- and gigabases. In my current world, a game save takes nearly a minute. It would be totally unplayable without this.

It’s even a relief for those who play normal bases: the seconds spent waiting for a save are an interruption in gameplay. How often have I lost a life because the game saved while I was in the middle of a biter nest?

It’s the other way around: Factorio needs this feature on Windows too (it also works on Mac, not only Linux). :) It’s so important and has such high gameplay value!

Rseding91
Factorio Staff
Posts: 16223
Joined: Wed Jun 11, 2014 5:23 am

Re: [1.0.0] Non blocking save hangs

Post by Rseding91 »

ssilk wrote: Fri Oct 16, 2020 7:14 am It’s the other way around: Factorio needs this feature on Windows too (it also works on Mac, not only Linux). :) It’s so important and has such high gameplay value!
It's not possible to implement on Windows. It's not a matter of "difficult" or "time consuming": it simply cannot be done.
If you want to get ahold of me I'm almost always on Discord.
sthalik
Long Handed Inserter
Posts: 56
Joined: Tue May 01, 2018 9:32 am

Re: [Oxyd] [Linux/Mac] non-blocking save crashes

Post by sthalik »

What about ZwCreateProcess?
Rseding91
Factorio Staff
Posts: 16223
Joined: Wed Jun 11, 2014 5:23 am

Re: [Oxyd] [Linux/Mac] non-blocking save crashes

Post by Rseding91 »

sthalik wrote: Fri Oct 16, 2020 8:19 pm What about ZwCreateProcess?
Tried it; it doesn't work. Nothing works: the new process just sits at 0% CPU, never executing code, and/or crashes immediately on trying to do anything.
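For context, the thread does not spell out how the non-blocking save works. Assuming it relies on a POSIX fork()-style clone of the game process (which the ZwCreateProcess question hints at), the general technique looks roughly like the Python sketch below. The function name save_in_background and the JSON "game state" are purely illustrative and are not Factorio code. The point is that the child writes a frozen copy of memory while the parent keeps running, and that the Win32 API offers no direct fork() counterpart.

```python
import json
import os
import tempfile


def save_in_background(state, path):
    """Fork, let the child write a snapshot of `state`, and return the child's PID."""
    pid = os.fork()
    if pid == 0:
        # Child: works on a copy-on-write snapshot of the parent's memory
        # as it was at the moment of the fork.
        status = 1
        try:
            with open(path, "w") as f:
                json.dump(state, f)
            status = 0
        finally:
            os._exit(status)  # never fall through into the parent's code path
    return pid


if __name__ == "__main__":
    state = {"tick": 100}
    out = os.path.join(tempfile.mkdtemp(), "snapshot.json")
    child = save_in_background(state, out)
    state["tick"] = 101       # the parent keeps simulating; the child does not see this
    os.waitpid(child, 0)      # reap the child once it has finished writing
    with open(out) as f:
        print(json.load(f))   # the snapshot holds the state as of the fork
```

The os.waitpid call matters in this sketch: a child that exits without being reaped lingers as a defunct (zombie) process.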
Squelch
Filter Inserter
Posts: 346
Joined: Sat Apr 23, 2016 5:31 pm

Re: [Oxyd] [Linux/Mac] non-blocking save crashes

Post by Squelch »

I can understand the Windows problem, but why should that dictate whether the feature is removed from 'nix OS's?

WSL (Windows Subsystem for Linux) and/or Docker allow us to run a headless server instance in the background as a workaround if not actually running on a Linux Xserver for example.

Please don't remove non-blocking saves?

PS. For the record, I have never encountered the stuck/defunct/zombie save process problem after many many hours. I do run mods, one major (Py suite) and a few smaller QoL. Should I ever run into the problem, I would be all over it to find the cause or a solid repro.

This problem does not seem that common at all, so removing non-blocking saves would be like throwing the proverbial baby out with the bathwater.
Rseding91
Factorio Staff
Posts: 16223
Joined: Wed Jun 11, 2014 5:23 am

Re: [Oxyd] [Linux/Mac] non-blocking save crashes

Post by Rseding91 »

Squelch wrote: Mon Oct 19, 2020 4:57 pm I can understand the Windows problem, but why should that dictate whether the feature is removed from 'nix OS's?

WSL (Windows Subsystem for Linux) and/or Docker allow us to run a headless server instance in the background as a workaround if not actually running on a Linux Xserver for example.

Please don't remove non-blocking saves?

PS. For the record, I have never encountered the stuck/defunct/zombie save process problem after many many hours. I do run mods, one major (Py suite) and a few smaller QoL. Should I ever run into the problem, I would be all over it to find the cause or a solid repro.

This problem does not seem that common at all, so removing non-blocking saves would be like throwing the proverbial baby out with the bathwater.
The problem is people reporting issues to us and taking developer time when there are no reproduction steps or ability for us to debug/fix the issues. It's just a time sink.
Squelch
Filter Inserter
Posts: 346
Joined: Sat Apr 23, 2016 5:31 pm

Re: [Oxyd] [Linux/Mac] non-blocking save crashes

Post by Squelch »

Rseding91 wrote: Mon Oct 19, 2020 7:52 pm The problem is people reporting issues to us and taking developer time when there are no reproduction steps or ability for us to debug/fix the issues. It's just a time sink.
Then please crowdsource the problem? The Factorio community, as a whole, is pretty competent at problem solving (it is the nature of the game after all).

The feature is clearly identified as "Experimental", and as such does not come with any guarantees or support. However, I would hazard a guess that there are many games out there that have the option enabled and have not encountered these issues. This would suggest that something in the environment of the subset of systems that do hit the problem is responsible, and that it could be identified by gathering more data. Do you have those metrics available to you, i.e. how many games have the option enabled, and how many crash reports are attributed to the feature?

I am more than happy to volunteer my time in attempting to identify and collate that information to come up with a reliable reproduction. Other areas of the game have already benefited from user investigation on behalf of, and for, the development team, so as a result, valuable development time investigating the problem can be spent elsewhere until such a time that a more complete picture is available.

Some current examples:
[Oxyd] [0.18.28] Stuck on waiting to save map - Directly pertinent to this issue.
Factorio flickers heavily all of the sudden

What I, and I hope some other users, are asking is to allow us to continue with this experimental feature for a while longer, without the expectation of developer time being spent on it, until we can identify a solid reproduction or cause.
ferromagus
Manual Inserter
Posts: 3
Joined: Thu Sep 24, 2020 6:55 am

Re: [Oxyd] [Linux/Mac] non-blocking save crashes

Post by ferromagus »

Rseding91 wrote: Mon Oct 19, 2020 7:52 pm The problem is people reporting issues to us and taking developer time when there are no reproduction steps or ability for us to debug/fix the issues. It's just a time sink.
Now that I think about it, in my case it is probably just a collision between the game's auto-save mechanic and a systemd timer that periodically sends a /server-save command to the server (running in a screen session) right before taking a btrfs snapshot.

I also still have the coredump, which should help with debugging the problem and give insight into the state of the server when it was hanging. I would happily provide it, through Discord or email maybe, if desired. At least I was able to get the stack traces of the threads out of it with gdb. I don't want to be a burden to anyone; I just thought it might be insightful and useful feedback on an experimental feature, and so offered to provide the coredump. The road to hell is paved with good intentions, I guess.
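For anyone else holding a core dump from a hung server, per-thread stack traces like the ones mentioned above can be pulled out with gdb in batch mode. A minimal Python sketch follows; the helper names and the example paths are hypothetical, while `--batch`, `-ex` and `thread apply all bt` are standard gdb options and commands.

```python
import subprocess


def gdb_backtrace_command(binary, core):
    """Build a gdb invocation that prints a backtrace for every thread in a core dump."""
    return [
        "gdb",
        "--batch",                     # run the given commands, then exit
        "-ex", "thread apply all bt",  # backtrace of every thread
        binary,
        core,
    ]


def dump_backtraces(binary, core):
    """Run gdb and return its output as text (requires gdb to be installed)."""
    result = subprocess.run(
        gdb_backtrace_command(binary, core),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical paths: substitute the real server binary and core file.
    print(" ".join(gdb_backtrace_command("bin/x64/factorio", "core.12345")))
```

The text output is usually small enough to paste into a bug report even when the core file itself is too large to attach.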
rafasc
Manual Inserter
Posts: 4
Joined: Tue Aug 25, 2020 4:47 pm

Re: [Oxyd] [Linux/Mac] non-blocking save crashes

Post by rafasc »

Rseding91 wrote: Mon Oct 19, 2020 7:52 pm The problem is people reporting issues to us and taking developer time when there are no reproduction steps or ability for us to debug/fix the issues. It's just a time sink.
Isn't that true for all bugs?
I can understand if Wube as a company has decided that spending time to fix this experimental, *nix-exclusive feature is not worth the cost; but please don't put the blame on your users.

You guys have gained an excellent reputation for caring about and "sinking time" into fixing esoteric bugs that the majority of people would never run into. The reasoning of "We are removing the experimental feature because users file bug reports about it" feels peculiar.

To reproduce: start a multiplayer game using the Steam version with Steam Cloud and Blueprint sync enabled.
My crashes went away once I disabled those features, and they come back when I re-enable them.

I can reproduce it, not deterministically, but I've never seen it take more than 15 save attempts to crash.
ssilk
Global Moderator
Posts: 12889
Joined: Tue Apr 16, 2013 10:35 pm

Re: [Oxyd] [Linux/Mac] non-blocking save crashes

Post by ssilk »

Rseding91 wrote: Fri Oct 16, 2020 8:25 pm
sthalik wrote: Fri Oct 16, 2020 8:19 pm What about ZwCreateProcess?
Tried it; it doesn't work. Nothing works: the new process just sits at 0% CPU, never executing code, and/or crashes immediately on trying to do anything.
I read between the lines that it’s scratching at your programmer’s honor. But nobody can be perfect at everything. 8-)

So I would go so far as to say: then Wube needs to hire a specialist. OK, that is me going out on a limb. Sorry for that.

Because the point is: This feature is really a game-changer. When it is working. ;)

And so if Wube is willing to invest in that implementation (and I think this could become very expensive), there are many things that could increase the chance of success:
- Explain the problem, for example in the FFF. It has at minimum a political and a technical aspect; I would not mix them. Yes, there will be hundreds of posts, and everyone will know better how to implement this, but that is part of the investment.
- As said: crowdsource the problem. That means making this option easier to turn on (but with a fat warning) and collecting more logs when it is turned on. More errors mean a better chance of finding the problem. If the feature stays hidden, it will be used only by experienced people who are happy that it just works. ;)
- Ask actively for help (FFF).
- Search actively for people who have deep knowledge of this or have already developed it for other software.
- Search for examples where something like this is already working. For example: Reason, my beloved digital audio workstation, is able to do something similar (saving gigabytes of samples while working/playing with them). I’m sure there are many more examples.
- And there is surely more that can be done.

But even if Wube won’t invest in this: please don’t remove it. It’s much better to have it running with some bugs than not to have it at all. I normally say the opposite, but this case is different!

Sorry to those who cannot use it, but it’s experimental.
ptx0
Smart Inserter
Posts: 1507
Joined: Wed Jan 01, 2020 7:16 pm

Re: [Oxyd] [Linux/Mac] non-blocking save crashes

Post by ptx0 »

just ignore the bug and leave the async save feature intact, or hire one of the devs like myself who have submitted their resume and are competent and capable of fixing Linux issues.
kovarex
Factorio Staff
Posts: 8298
Joined: Wed Feb 06, 2013 12:00 am

Re: [Oxyd] [Linux/Mac] non-blocking save crashes

Post by kovarex »

Moving to pending, as there is no one able and willing to fix it at the moment.
Squelch
Filter Inserter
Posts: 346
Joined: Sat Apr 23, 2016 5:31 pm

Re: [Oxyd] [Linux/Mac] non-blocking save crashes

Post by Squelch »

kovarex wrote: Mon Nov 02, 2020 12:37 pm Moving to pending, as there is no one able and willing to fix it at the moment.
Thank you for giving it a stay of execution and not performing a coup de grâce.

I think identifying the issue properly would help immensely. I play on both a native Linux client and server on a desktop machine, as well as on a Win10 client using a local server via WSL and Docker on my laptop, all with non-blocking saves enabled. I haven't encountered this problem at all after quite some time. This is all non-Steam and LAN-only, however, which may be factors.

That said, there does seem to be a problem with some setups that is triggering these problems, so I am currently collating as many of the reports and suggested causes and trying to recreate the crashes as I'm able. So far nothing sticks out as a common denominator, but there must be one.
ptx0
Smart Inserter
Posts: 1507
Joined: Wed Jan 01, 2020 7:16 pm

Re: [Oxyd] [Linux/Mac] non-blocking save crashes

Post by ptx0 »

kovarex wrote: Mon Nov 02, 2020 12:37 pm Moving to pending, as there is no one able and willing to fix it at the moment.
awaiting response from your PM.
movax20h
Fast Inserter
Posts: 164
Joined: Fri Mar 08, 2019 7:07 pm

Re: [1.0.0] Non blocking save hangs

Post by movax20h »

kovarex wrote: Thu Sep 24, 2020 1:02 pm We will fix it by removing the non-blocking save feature, it is just trouble that works only on linux and is not worth it.
:(

Please no.

I love this feature, and it has worked fine for me for a very long time on my Linux machine. It really makes working with big bases and big saves far more pleasant. I autosave every 5 minutes, and a save takes about 20 seconds (I have a very fast machine), but on some other people's machines it could take a minute. Non-blocking save really solves this issue. I save every 5 minutes not just to avoid losing progress, but also to keep snapshots of my Factorio maps (I have a script that automatically archives every autosave with a timestamp, so I have hundreds of autosaves now).

If any of the people experiencing crashes with non-blocking save could share all the details (log, save, mods, and how often it happens), I can test it on my machine.
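The archiving script mentioned above was not posted, so the following is only a guess at what such a script might look like, written in Python. The function name archive_autosaves, the directory arguments and the `_autosave*.zip` file pattern are all assumptions; adjust them to the actual saves folder.

```python
import shutil
import time
from pathlib import Path


def archive_autosaves(saves_dir, archive_dir, pattern="_autosave*.zip"):
    """Copy each autosave into archive_dir under a name carrying its modification time."""
    saves_dir, archive_dir = Path(saves_dir), Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for save in sorted(saves_dir.glob(pattern)):
        # Timestamp taken from the file's last modification, not from "now",
        # so re-running the script does not duplicate an unchanged autosave.
        stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(save.stat().st_mtime))
        target = archive_dir / f"{save.stem}-{stamp}{save.suffix}"
        if not target.exists():
            shutil.copy2(save, target)
            copied.append(target)
    return copied
```

Run from a timer or cron job, this keeps one copy per distinct autosave. It does not check whether the game is still writing a file, so a real script would want to skip files that were modified within the last few seconds.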