[Oxyd] [1.1.1] Asynchronous save process stuck, crash when quitting game
[Oxyd] [1.1.1] Asynchronous save process stuck, crash when quitting game
Save files: https://drive.google.com/drive/folders/ ... sp=sharing
Included are the original save (suffering..zip), the autosaves as of right after the crash, and both of the .tmp.zip files that were present.
Log file, as of right after the crash (also attached to this post): https://pastebin.com/RiiRQcVD
System info: Arch Linux 5.9.9-zen 64-bit, Factorio 1.1.1 on the native Steam client installed from multilib/steam, with multilib/steam-native-runtime installed as well.
What happened:
- I played on a different modded save for a bit (the only present mod was Editor Extensions)
- I manually saved the game, then selected the main save ("suffering.") and synced the mods, disabling all except "base"
- The game restarted and loaded the save as normal. I played for a couple hours, with async autosaves happening without issues
- At some point, an autosave stopped with an in-game message saying "Saving process crashed.", which I confirmed and just continued playing (can't remember if I manually saved right after it - hopefully the log can clarify that)
- After a bit of playing (with more autosaves happening as normal), a save started (I literally can't remember if it was an autosave or a manual save - I'm sorry) and the game immediately took up 100% of the CPU and an abnormal amount of memory, nearly crashing. Both the CPU and RAM returned to normal after several seconds
- The save message in the top left remained there for several minutes, being stuck at 100% (screenshot attached)
- When I decided it wasn't going to go away, I tried to quit the game as normal (press Esc, click "Quit Game")
- The "Game is not saved. Quit anyway?" box showed up, and as soon as I clicked "Quit", the game hung up for several seconds, followed by an error message (screenshot attached)
- The game process exited when I clicked "Yes" (didn't have to kill it with a task manager)
After the crash:
- The game opens normally
- The main menu "Continue" button shows "suffering..tmp" as the save name
- When I clicked "Load game" (not "Continue"), I got the following message: "Opening zip /home/selplacei/.factorio/saves/suffering..tmp.zip failed: Bad zip file"
- I could load _autosave2 (the most recent save) normally, and manually saved it without problem
Not sure if this is a game bug or caused by my own system - posting here just in case.
			
							Included are the original save (suffering..zip), the autosaves as of right after the crash, and both of the .tmp.zip files that were present.
Log file, as of right after the crash (also attached to this post): https://pastebin.com/RiiRQcVD
System info: Arch Linux 5.9.9-zen 64-bit, Factorio 1.1.1 on the native Steam client installed from multilib/steam, with multilib/steam-native-runtime installed as well.
What happened:
- I played on a different modded save for a bit (the only present mod was Editor Extensions)
- I manually saved the game, then selected the main save ("suffering.") and synced the mods, disabling all except "base"
- The game restarted and loaded the save as normal. I played for a couple hours, with async autosaves happening without issues
- At some point, an autosave stopped with an in-game message saying "Saving process crashed.", which I confirmed and just continued playing (can't remember if I manually saved right after it - hopefully the log can clarify that)
- After a bit of playing (with more autosaves happening as normal), a save started (I literally can't remember if it was an autosave or a manual save - I'm sorry) and the game immediately took up 100% of the CPU and an abnormal amount of memory, nearly crashing. Both the CPU and RAM returned to normal after several seconds
- The save message in the top left remained there for several minutes, being stuck at 100% (screenshot attached)
- When I decided it wasn't going to go away, I tried to quit the game as normal (press Esc, click "Quit Game")
- The "Game is not saved. Quit anyway?" box showed up, and as soon as I clicked "Quit", the game hung up for several seconds, followed by an error message (screenshot attached)
- The game process exited when I clicked "Yes" (didn't have to kill it with a task manager)
After the crash:
- The game opens normally
- The main menu "Continue" button shows "suffering..tmp" as the save name
- When I clicked "Load game" (not "Continue"), I got the following message: "Opening zip /home/selplacei/.factorio/saves/suffering..tmp.zip failed: Bad zip file"
- I could load _autosave2 (the most recent save) normally, and manually saved it without problem
Not sure if this is a game bug or caused by my own system - posting here just in case.
- Attachments
- 
			
		
				- screenshot-factorio-stuck-autosave.png (45.55 KiB) Viewed 8768 times
 
- 
			
		
				- screenshot-factorio-error-message.png (2.33 MiB) Viewed 8768 times
 
- 
			
		
		
				- factorio-current.log
- (33.92 KiB) Downloaded 229 times
 
Re: [1.1.1] Asynchronous save process stuck, crash when quitting game
I thought the devs removed the non-blocking save. Is it still available in the settings?
			
			
									
									
						Re: [1.1.1] Asynchronous save process stuck, crash when quitting game
It is available via the config file on Linux. Based on the comment ("Does nothing on Windows") it appears to be supported. It has worked perfectly fine until 1.1 (I got the "Saving process crashed" message a total of 3 times since upgrading).valneq wrote: Thu Nov 26, 2020 7:36 am I thought the devs removed the non-blocking save. Is it still available in the settings?
Re: [1.1.1] Asynchronous save process stuck, crash when quitting game
It was kept after several requests so we could attempt to identify the cause.valneq wrote: Thu Nov 26, 2020 7:36 am I thought the devs removed the non-blocking save. Is it still available in the settings?
see [Oxyd] [Linux/Mac] non-blocking save crashes
From the log, and what seems to have initiated the crash is the generic error (return value 1) for autosave1 followed by a successful save to autosave2, but failed to save the blueprint storage:
Code: Select all
8363.920 Info AppManager.cpp:285: Saving to _autosave1 (non-blocking).
8364.072 Info AsyncScenarioSaver.cpp:144: Saving process PID: 1724870
8366.131 [1724870] Info BlueprintShelf.cpp:720: Saving blueprint storage.
8366.671 Info ChildProcessAgent.cpp:60: Child 1724870 exited with return value 1
8544.637 Info AppManager.cpp:285: Saving to _autosave2 (non-blocking).
8544.777 Info AsyncScenarioSaver.cpp:144: Saving process PID: 1726490
8547.438 [1726490] Info BlueprintShelf.cpp:720: Saving blueprint storage.
8547.530 [1726490] Error BlueprintShelf.cpp:897: Failed to save blueprint storage to cloud. Size: 467699, Cloud Storage: Total=0, Available=0
8547.789 Info ChildProcessAgent.cpp:60: Child 1726490 exited with return value 0
8547.789 Info AppManager.cpp:286: Saving finished
8567.214 Info AppManagerStates.cpp:1829: Saving game as /home/selplacei/.factorio/saves/suffering. (non-blocking)
8567.347 Info AsyncScenarioSaver.cpp:144: Saving process PID: 1726684
8569.386 [1726684] Info BlueprintShelf.cpp:720: Saving blueprint storage.
8740.021 Warning AppManager.cpp:301: Not saving _autosave3: Previous saving process hasn't finished yet
8922.703 Warning AppManager.cpp:301: Not saving _autosave1: Previous saving process hasn't finished yetI still haven't been able to reproduce this, and wish I could so I could try and chase it down.
strace might reveal something if a reliable repro can be found. So far, I haven't been able to identify anything savefile related from previous reports, and I can only assume it is something particular to some system environments.
Re: [1.1.1] Asynchronous save process stuck, crash when quitting game
In the OPs stacktrace, last line that makes sense is 
from there up it is nonsense, so it's probably inside Steam API call and the address symbolization returned nonsense for those calls.
I am not even sure if Steam API is supposed to work from forked process.
			
			
									
									
						Code: Select all
#32 0x00007ffddf8ac1f8 in SteamContext::getStorageQuota() const at /tmp/factorio-build-7k9uW8/src/SteamIntegration/SteamContext.cpp:182I am not even sure if Steam API is supposed to work from forked process.
Re: [1.1.1] Asynchronous save process stuck, crash when quitting game
Steam doesn't appear to be a common denominator in the reported crashes although I suspect you are correct with your observation, and it may be a contributing factor, or just another vector to the defect.posila wrote: Thu Nov 26, 2020 12:42 pm In the OPs stacktrace, last line that makes sense isfrom there up it is nonsense, so it's probably inside Steam API call and the address symbolization returned nonsense for those calls.Code: Select all
#32 0x00007ffddf8ac1f8 in SteamContext::getStorageQuota() const at /tmp/factorio-build-7k9uW8/src/SteamIntegration/SteamContext.cpp:182
I am not even sure if Steam API is supposed to work from forked process.
Code: Select all
8366.671 Info ChildProcessAgent.cpp:60: Child 1724870 exited with return value 1Re: [1.1.1] Asynchronous save process stuck, crash when quitting game
selplacei wrote: Thu Nov 26, 2020 6:39 am System info: Arch Linux 5.9.9-zen 64-bit, Factorio 1.1.1 on the native Steam client installed from multilib/steam, with multilib/steam-native-runtime installed as well.
Out of interest @selplacei, what swap space do you have, and how is it allocated?System info: [CPU: Intel(R) Core(TM) i5-4460 CPU @ 3.20GHz, 4 cores, RAM: 7896 MB]
Re: [1.1.1] Asynchronous save process stuck, crash when quitting game
I have a 16 GB swap partition on an SSD with swappiness set to 60. Everything other than that I haven't touched.Squelch wrote: Thu Nov 26, 2020 1:10 pmselplacei wrote: Thu Nov 26, 2020 6:39 am System info: Arch Linux 5.9.9-zen 64-bit, Factorio 1.1.1 on the native Steam client installed from multilib/steam, with multilib/steam-native-runtime installed as well.Out of interest @selplacei, what swap space do you have, and how is it allocated?System info: [CPU: Intel(R) Core(TM) i5-4460 CPU @ 3.20GHz, 4 cores, RAM: 7896 MB]
Re: [1.1.1] Asynchronous save process stuck, crash when quitting game
Thanks, that helps to rule out that aspect.selplacei wrote: Thu Nov 26, 2020 1:39 pm I have a 16 GB swap partition on an SSD with swappiness set to 60. Everything other than that I haven't touched.
Re: [1.1.1] Asynchronous save process stuck, crash when quitting game
I'm currently running  on Factorio on a clean save, hoping that at least the "Save process crashed" thing happens again. Will post if anything comes up. The useful info should be easy to separate since the saving process has its own PID
			
			
									
									
						Code: Select all
strace -fRe: [1.1.1] Asynchronous save process stuck, crash when quitting game
Good call, and thanks for your efforts.selplacei wrote: Thu Nov 26, 2020 2:03 pm I'm currently runningon Factorio on a clean save, hoping that at least the "Save process crashed" thing happens again. Will post if anything comes up. The useful info should be easy to separate since the saving process has its own PIDCode: Select all
strace -f
The following might also be useful
Code: Select all
strace -ff --output ~/.factorio/AsychMonitor /path/to/factorioIf only I was able to recreate the crashes on any of my systems, I'd be doing this myself. I look forward to seeing what you find.
Re: [1.1.1] Asynchronous save process stuck, crash when quitting game
posila wrote: Thu Nov 26, 2020 12:42 pm In the OPs stacktrace, last line that makes sense isfrom there up it is nonsense, so it's probably inside Steam API call and the address symbolization returned nonsense for those calls.Code: Select all
#32 0x00007ffddf8ac1f8 in SteamContext::getStorageQuota() const at /tmp/factorio-build-7k9uW8/src/SteamIntegration/SteamContext.cpp:182
I am not even sure if Steam API is supposed to work from forked process.
I'll retract the first part of my assertion. Steam does indeed appear to be a common factor. I've just gone over my collated reports again, and the crashes that at first appeared to be attributed to the child process hanging do in fact appear to have other causes/exit values. This might explain why myself and some others haven't been able to reproduce the problem. In my case, I've tried everything else except running via Steam.Squelch wrote: Thu Nov 26, 2020 1:03 pm Steam doesn't appear to be a common denominator in the reported crashes although I suspect you are correct with your observation, and it may be a contributing factor, or just another vector to the defect.Does appear regularly in the log files for previous reports, but it's as descriptive as saying "I haz a bug" unfortunately.Code: Select all
8366.671 Info ChildProcessAgent.cpp:60: Child 1724870 exited with return value 1
The second half of my assertion still stands, with exit value 1 for an autosave attempt. The pattern seems to go:
- autosave attempt PID[1]
- Blueprint storage (cloud) success
- PID[1] Failed with exit value 1
- Another autosave attempt PID[2]
- Blueprint storage (cloud) fail
- PID[2] Success with exit value 0
- Next save attempt
- PID[1], PID[2]... "Previous saving process hasn't finished yet"
- Crash
- Garbage stacktrace
[Conjecture]
Steam cloud might be seeing two negotiations for storage from each half of the fork. It accepts and acknowledges the original request, but because the second request comes from the same host it never responds, or it's assumed to be the same, therefore leaving it hanging and a possible zombie. The next request to sync storage fails because the fork is still busy. Previous successful saves might just be luck in timing.
[/conjecture]
Re: [1.1.1] Asynchronous save process stuck, crash when quitting game
This being connected to Steam storage would make sense to me. I also haven't really tried the Steam version, so I'll see if I can reproduce it this way.
			
			
									
									
						Re: [Oxyd] [1.1.1] Asynchronous save process stuck, crash when quitting game
I haven't been able to reproduce neither the original issue nor the "Save process crashed" message so far. It is interesting that saving on that one save still takes much more CPU and RAM than normal (noticeably more than on 1.1.0) on my machine. I'll play as normal with strace enabled and will post if a crash happens again, but don't expect it.
Also, it might be obvious by now, but just before the crash I was doing a lot of putting new blueprints into the global blueprint library as part of normal gameplay.
			
			
									
									
						Also, it might be obvious by now, but just before the crash I was doing a lot of putting new blueprints into the global blueprint library as part of normal gameplay.
Re: [Oxyd] [1.1.1] Asynchronous save process stuck, crash when quitting game
I'm still poking at it, and nothing yet.selplacei wrote: Thu Nov 26, 2020 8:05 pm I haven't been able to reproduce neither the original issue nor the "Save process crashed" message so far. It is interesting that saving on that one save still takes much more CPU and RAM than normal (noticeably more than on 1.1.0) on my machine. I'll play as normal with strace enabled and will post if a crash happens again, but don't expect it.
I have set strace with individual files for each process to a folder named session. This is cleared on every restart (strace generates a lot of verbage) Matching up the PIDs from Factorio log should be easier than one flat file.
Yes, that is the line of thought I was taking, and have enabled "Blueprint library cloud sync" along with creating test blueprints as I go in a new game.Also, it might be obvious by now, but just before the crash I was doing a lot of putting new blueprints into the global blueprint library as part of normal gameplay.
Thanks for participating in this. I'm sure something will come of this.
Re: [Oxyd] [1.1.1] Asynchronous save process stuck, crash when quitting game
I've managed to get the bug again, this time with strace enabled! Also, I wasn't doing anything blueprint-related this time (at least since 30 minutes before the crash), and there weren't any failed autosaves before the crash. Like last time, the progress bar was stuck at 100%, and when I tried to quit the game, it crashed.
The log is attached. As for the strace, it probably contains sensitive information, so I'll kindly ask a developer to DM me and I'll provide a link to it - the whole thing weighs 8 GiB since I've been playing for a while, but I'll also provide the last 10,000 and 1,000,000 lines as separate files. All PIDs are in the same file.
			
							The log is attached. As for the strace, it probably contains sensitive information, so I'll kindly ask a developer to DM me and I'll provide a link to it - the whole thing weighs 8 GiB since I've been playing for a while, but I'll also provide the last 10,000 and 1,000,000 lines as separate files. All PIDs are in the same file.
- Attachments
- 
			
		
		
				- factorio-current.log
- (72.47 KiB) Downloaded 202 times
 
Re: [Oxyd] [1.1.1] Asynchronous save process stuck, crash when quitting game
So, with Steam enabled I've seen the hang and subsequent crash myself, but also couldn't reproduce it reliably. Nevertheless, when I tried saving a file to Steam Cloud from both the parent and child processes, both processes hung immediately. So whilst I can't be 100% sure, I think the cause of this was that the child tried to push files to Steam.
In 1.1.4, only the parent process will talk to Steam, which should hopefully fix this issue.
			
			
									
									
						In 1.1.4, only the parent process will talk to Steam, which should hopefully fix this issue.
Re: [Oxyd] [1.1.1] Asynchronous save process stuck, crash when quitting game
That seems to align with my conjecture, and I'm so pleased that you may have nailed this @Oxyd.Oxyd wrote: Tue Dec 01, 2020 4:51 pm So, with Steam enabled I've seen the hang and subsequent crash myself, but also couldn't reproduce it reliably. Nevertheless, when I tried saving a file to Steam Cloud from both the parent and child processes, both processes hung immediately. So whilst I can't be 100% sure, I think the cause of this was that the child tried to push files to Steam.
In 1.1.4, only the parent process will talk to Steam, which should hopefully fix this issue.
My testing time has been rather limited of late, so I wasn't able to reproduce the problem either. Let's hope that is finally resolved.




