[0.17.41] Server (headless) takes multiple minutes to save the map after not having played a few days

Things that we don't consider worth fixing at this moment.
luc
Fast Inserter
Posts: 218
Joined: Sun Jul 17, 2016 9:53 pm

[0.17.41] Server (headless) takes multiple minutes to save the map after not having played a few days

Post by luc »

When connecting to the headless server after not playing for a few days, the client hangs 'waiting for server to save map' for a few minutes. I suspect it's due to swapping, as I see swap usage going down until the point where the map saves. My swap space is hard drive-based, which probably exacerbates the problem.

There is a syscall that can tell the server not to swap out a process. Swapping is good in general, even in a game server, but maybe there are certain parts of the memory (like the data needed to save the map, though that is probably the bulk of the memory) which you can tell it to keep in RAM? Or perhaps allow calling this syscall with a startup flag or config setting like --disable-swap? I would call that syscall myself, but it seems impossible to call it on another program. I'd have to create and maintain a binary patch, i.e. it's much easier to call it from inside factorio. It should be just one syscall if I understand the man page correctly, and it can be marked as an unstable/experimental config setting that admins should not rely on.

wheybags
Former Staff
Posts: 328
Joined: Fri Jun 02, 2017 1:50 pm

Re: [0.17.41] Server (headless) takes multiple minutes to save the map after not having played a few days

Post by wheybags »

Sorry, but it is very unlikely we will implement this. It is quite a niche problem, and really the solution is to get more ram so you never swap out.
It seems the syscall you listed might work, but there are potential complications regarding process privileges and limits on memory locking by unprivileged processes.
However, since you're talking about syscalls, you're probably someone who can wrangle a c compiler. Indeed, it seems that you can't use mlock on a separate process, but what if you wrote a small wrapper program that would call mlockall(), then use exec to start factorio without forking? I'm not sure if it would work, but if you're really dedicated, it could be worth a try. You'll probably need to set the limits for unprivileged processes as well, or run the game as root (lol pls don't).
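
For reference, a minimal sketch of what the call itself looks like, written in Python via ctypes rather than C. The helper names `memlock_limit` and `try_mlockall` are made up for illustration, and the `MCL_*` values are the Linux ones from `<sys/mman.h>`. One caveat from the mlockall(2) man page: memory locks are removed on execve(2), so a wrapper that locks and then execs would most likely not carry the lock over, and the call would have to be made inside the target process. The sketch only shows the call and the limit that applies to unprivileged processes.

```python
import ctypes
import resource

# Linux flag values from <sys/mman.h> (assumed; check your platform's headers).
MCL_CURRENT = 1  # lock pages that are currently mapped
MCL_FUTURE = 2   # also lock pages mapped later

# Handle to the C library already loaded into this process.
libc = ctypes.CDLL(None, use_errno=True)


def memlock_limit():
    """Return the (soft, hard) RLIMIT_MEMLOCK for this process, in bytes."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


def try_mlockall(flags=MCL_CURRENT | MCL_FUTURE):
    """Ask the kernel to keep this process's memory in RAM.

    Returns (True, 0) on success or (False, errno) on failure. An
    unprivileged process typically fails here when its memory use is
    larger than the soft RLIMIT_MEMLOCK.
    """
    ctypes.set_errno(0)
    if libc.mlockall(flags) == 0:
        return True, 0
    return False, ctypes.get_errno()
```

Calling `memlock_limit()` first shows how much an unprivileged process is allowed to lock; if `try_mlockall()` returns an error, raising that limit (or running with the right capability) is the part that needs admin involvement.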

Re: [0.17.41] Server (headless) takes multiple minutes to save the map after not having played a few days

Post by luc »

wheybags wrote:
Tue May 21, 2019 11:58 am
the solution is to get more ram so you never swap out.
That's not quite right. I could add terabytes of RAM, but as long as my server keeps reading data from hard drives that are larger than the amount of RAM (most people have more disk space than RAM), it will sooner or later swap out unused programs to make space for buffers. Try reading stuff from disk, or simply read your whole disk: `sudo cat /dev/sda > /dev/null`. Unless you never read anything from disk other than the data that is already buffered in RAM, the buffers will fill up as much memory as your disk is large, except for the memory of any programs that are in use, of course (so if you used something within the last few, idk, days, then it won't swap it out).
you're probably someone who can wrangle a c compiler
Thanks for the compliment :) But in truth, while I know the theory and can do some basics, I'm not good with low-level stuff. I don't really understand the steps you described. What I do understand is that an implementation, even if hidden under an experimental flag, is a little more involved than it seemed at first (I didn't know of the potential complications).

Do you know of a way that one might work around the problem? I'm thinking of something like a fake client in 5 lines of python that would send the right bytes to trigger a map save, then disconnect. I could have that run every few hours, that would reload it often enough.
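
A rough sketch of that idea, with heavy assumptions: it supposes the server was started with RCON enabled (the `--rcon-port` and `--rcon-password` options) and that the `/server-save` console command is accepted over RCON. Neither is confirmed anywhere in this thread, and whether a save triggered this way actually keeps the memory out of swap is untested. The packet layout is the Source RCON one; the function names are made up for illustration.

```python
import socket
import struct

# Packet types from the Source RCON protocol.
SERVERDATA_AUTH = 3
SERVERDATA_AUTH_RESPONSE = 2
SERVERDATA_EXECCOMMAND = 2


def rcon_packet(request_id, packet_type, body):
    """Build one RCON packet: size, id, type, body, two NUL terminators."""
    payload = (struct.pack("<ii", request_id, packet_type)
               + body.encode("utf-8") + b"\x00\x00")
    return struct.pack("<i", len(payload)) + payload


def recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes the connection."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed")
        buf += chunk
    return buf


def read_packet(sock):
    """Read one packet and return (request_id, packet_type, body)."""
    size = struct.unpack("<i", recv_exact(sock, 4))[0]
    data = recv_exact(sock, size)
    request_id, packet_type = struct.unpack("<ii", data[:8])
    return request_id, packet_type, data[8:-2].decode("utf-8", "replace")


def trigger_save(host, port, password):
    """Authenticate, send the (assumed) /server-save command, return the reply."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(rcon_packet(1, SERVERDATA_AUTH, password))
        # Some servers send an empty response first; wait for the auth reply.
        while True:
            request_id, packet_type, _ = read_packet(sock)
            if packet_type == SERVERDATA_AUTH_RESPONSE:
                break
        if request_id == -1:
            raise PermissionError("RCON authentication failed")
        sock.sendall(rcon_packet(2, SERVERDATA_EXECCOMMAND, "/server-save"))
        return read_packet(sock)[2]
```

Something like `trigger_save("127.0.0.1", 27015, "secret")` could then be run from cron every few hours; the host, port, and password here are placeholders.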
wheybags wrote:
Tue May 21, 2019 11:58 am
or run the game as root (lol pls don't).
# ps aux | grep -e factorio -e ^USER
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 22569 2.8 2.3 1168420 187908 pts/9 Sl+ May06 654:45 bin/x64/factorio

:D

It's in a container though, but good point, I didn't think to create a user when I was setting up a server for some friends!

--------

You know what just occurred to me? Why does it take multiple minutes? It's <200MB and my disk doesn't take minutes to read that amount of data.

Over the past week (that's why it's taking so long for me to get back to this) I've been running an experiment, leaving a Python process unused for a while, while reading data from disk (so that the kernel would hopefully buffer the disk data and swap out unused processes). I allocated 200MiB of byte array in Python. At first (that evening and the next evening) it didn't want to swap out: I saw that reading files filled up the buffers, but it didn't swap out my process that I used <24h ago. Then I left it for a bit, and by now it has been over a week. Guess what? It finally swapped out! "195412 kB" was reported[1] as being swapped away for the Python process. For comparison, I also didn't play Factorio over this time (I've been busy) and that process reports "135408 kB" as being swapped out.

So now the big test: will reading that data in Python be as slow as when Factorio reads its map data? If Factorio takes multiple minutes for 135MB, then surely 195MB will take even longer. Here's what I did in Python over the past week, with some 'comments' added:

>>> a = bytes([0 for _ in range(int(1024*1024*200))]) # allocate a byte array of 200MiB
>>> len(a) # this was last week
209715200 # seems to work fine
# Checking htop, the memory allocated sounds about right.
# Checking how much is swapped out[1], we get some 2MB, even after I read the whole hard drive to /dev/null and my server has been annoyingly unresponsive (because of that) for hours. I give up for now and let this process sit idle in a screen session
# ... a week goes by ...
# I run [1] again and ~195MB was reported as swapped out. Now I continue the test:
>>> import time # I loaded the module
>>> t=time.time(); print(time.time()-t) # and did a test run
0.0008966922760009766 # doing nothing is super fast, as expected =)
# Checking [1] again, a few kilobytes were loaded back into main memory (probably the REPL part of the process), but nothing significant. The main chunk of memory is still in swap.
>>> t=time.time(); b=sum(a); print(time.time()-t) # "sum()" the byte array, just so that every byte is read again
12.118383646011353 # 12 seconds
# Checking [1] again, I now see 2304kB: we're back to last week's situation: the "a" bytearray is back in main memory.

So it takes 12 seconds to load ~195MB out of swap, a bunch more than the memory my factorio process swaps away. But this is roughly where my knowledge ends, I don't know how to investigate why this is. Maybe Factorio has some weird memory access patterns? Even if I would do random access (loading random pages of 4KB each), 135MB shouldn't take that long... or maybe? I don't know.

My conclusions so far:
  • Loading 195MB from swap takes longer than reading a normal 195MB file, but in Factorio, loading 135MB from swap takes forever.
  • Buying more RAM will not help, unless you have more RAM than SSD/HDD space (and with 4TB of HDD space, I'd have to find awfully large SODIMM RAM modules).
  • This is seen as a niche problem, but I wonder if there aren't a lot of people that run their servers just for themselves and some friends, and don't play for a few days, and run into the same issue. It seems easy to measure how long "saving the map" takes, or rather, how long it takes between 0% and 1% (because that's what takes a few minutes: that first percent), and anonymously report to the factorio servers something like "this super small operation, doing 1% of map saving, just took >60 seconds, and the next percentage point was only 0.2 seconds".
Do you have any idea why reading swap data might be so much slower in Factorio than in other software?

Oh, and could you let me know if this was read? I've put quite a bit of time into this, I'd at least like to make sure someone reads this forum :)

[1] for file in /proc/*/status ; do awk '/Tgid|VmSwap|Name/{printf $2 " " $3}END{ print ""}' $file; done | grep kB | sort -k 3 -n | grep processname (either 'python' or 'factorio' in my case)
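
For anyone who finds the one-liner in [1] hard to read, here is a rough Python equivalent. It is a sketch that assumes the usual Linux `/proc/<pid>/status` layout with `Name`, `Tgid`, and `VmSwap` fields; the function names are made up for illustration.

```python
import glob


def parse_status(text):
    """Turn the 'Key:\tvalue' lines of a /proc/<pid>/status file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.strip()
    return fields


def swap_kb(text):
    """Return the VmSwap value in kB, or None if the field is absent."""
    value = parse_status(text).get("VmSwap")
    if value is None:
        return None  # kernel threads have no VmSwap line
    return int(value.split()[0])


def swapped_processes(name_filter=""):
    """List (name, tgid, swapped_kB) for matching processes, smallest first."""
    rows = []
    for path in glob.glob("/proc/[0-9]*/status"):
        try:
            with open(path) as fh:
                text = fh.read()
        except OSError:
            continue  # the process exited while we were scanning
        fields = parse_status(text)
        kb = swap_kb(text)
        if kb and name_filter in fields.get("Name", ""):
            rows.append((fields["Name"], int(fields.get("Tgid", 0)), kb))
    return sorted(rows, key=lambda row: row[2])
```

`swapped_processes("factorio")` or `swapped_processes("python")` then gives the same kind of numbers as the shell version.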

Re: [0.17.41] Server (headless) takes multiple minutes to save the map after not having played a few days

Post by wheybags »

So, I looked into it and as far as I can tell, Linux will not swap out process memory for disk cache. There is a kernel parameter called swappiness that determines whether a failed allocation will steal space by swapping out process memory or I/O cache, but that seems to be the only relation I can find between the two.
This also matches my experience, which is that I generally see 0 swap usage on all my machines all the time (I normally only have swap at all so I can use hibernation).
Maybe you could install something to monitor your ram usage? If you don't expect to be actually exhausting your physical ram, maybe you have a runaway cron job or something?

As for why it takes so much longer - I would guess access patterns could affect it. The python snippet you posted will read sequentially, whereas factorio will read all over the place in that chunk of ram. The difference does seem quite large though.
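
One way to test the access-pattern guess would be to repeat the Python experiment, but touch one byte per page in shuffled order instead of summing the buffer front to back. A minimal sketch, assuming 4 KiB pages; the function names are made up for illustration, and this has not been run against a swapped-out buffer here.

```python
import random
import time

PAGE = 4096  # assumed page size in bytes


def page_offsets(size, shuffled=False, seed=0):
    """Return the offset of the first byte of every page in a buffer."""
    offsets = list(range(0, size, PAGE))
    if shuffled:
        # Fixed seed so the same "random" order can be repeated.
        random.Random(seed).shuffle(offsets)
    return offsets


def touch(buf, offsets):
    """Read one byte at each offset so every listed page must be resident."""
    total = 0
    for off in offsets:
        total += buf[off]
    return total


def timed_touch(buf, shuffled):
    """Return (seconds, checksum) for touching every page of buf once."""
    offsets = page_offsets(len(buf), shuffled)
    start = time.perf_counter()
    checksum = touch(buf, offsets)
    return time.perf_counter() - start, checksum
```

The idea would be to allocate two 200MiB buffers, wait until both are swapped out, and then compare `timed_touch(a, shuffled=False)` against `timed_touch(b, shuffled=True)`. If the shuffled run is much slower, that points at seek-heavy page-ins from the hard drive.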

Re: [0.17.41] Server (headless) takes multiple minutes to save the map after not having played a few days

Post by luc »

wheybags wrote:
Fri May 31, 2019 8:30 am
So, I looked into it and as far as I can tell, Linux will not swap out process memory for disk cache. There is a kernel parameter called swappiness that determines whether a failed allocation will steal space by swapping out process memory or I/O cache, but that seems to be the only relation I can find between the two.
This also matches my experience, which is that I generally see 0 swap usage on all my machines all the time (I normally only have swap at all so I can use hibernation).
Oh, that's odd. I do see swap usage on my laptop even when I have, say, only slightly more than 50% of RAM in use. The times that I run out I definitely notice (the most distinct sign is the mouse jumping instead of moving smoothly for a moment), so I'm pretty sure that on my laptop, I don't run out without knowing about it. On the server, that could be, maybe I should indeed look into that.

Thanks for checking things out, I guess it's just a minor nuisance we can live with for the people that experience this issue :)
