[1.1.46] Non-blocking-save leaks memory

Post by **gallomimia** » Fri Nov 19, 2021 11:09 pm

Very happy to be back on linux playing this game after a dark cloud of Windows gaming. The amazing not-Windows-only feature of non-blocking saves is now available to me! My factory has grown rather large, over 1k SPM, with several mods including Swarmageddon. (The save file is 70 MB.)

So I imagine the game state in memory is quite a large block, and the simulation even freezes for maybe 1/4 second when it begins writing the save file. I assume it duplicates the memory pages, continues the game on one fork, and executes the save function on the other, now frozen, fork.
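For readers unfamiliar with the technique being guessed at here, the following is a minimal sketch of a fork-based snapshot save. It is not Factorio's implementation; the function names, the JSON format and the `autosave.json` path are all made up for illustration. It only shows the general idea: the child process keeps the state as it was at fork time and writes it out, while the parent carries on changing its own copy.

```python
import json
import os
import tempfile


def save_in_background(state, path):
    """Fork; the child writes a snapshot of `state` to `path` and exits.

    Returns the child's PID so the caller can reap it later.
    (Hypothetical helper, POSIX only.)
    """
    pid = os.fork()
    if pid == 0:
        # Child: sees the state exactly as it was at the moment of the fork.
        status = 1
        try:
            with open(path, "w") as f:
                json.dump(state, f)
            status = 0
        finally:
            # Exit immediately, skipping normal interpreter shutdown, so
            # nothing inherited from the parent gets cleaned up twice.
            os._exit(status)
    return pid


def demo():
    state = {"tick": 100, "science_per_minute": 1000}
    path = os.path.join(tempfile.mkdtemp(), "autosave.json")

    pid = save_in_background(state, path)
    state["tick"] += 1  # the "game" keeps running in the parent

    _, wait_status = os.waitpid(pid, 0)  # reap the child so it is not left a zombie
    with open(path) as f:
        saved = json.load(f)
    return state, saved, os.WEXITSTATUS(wait_status)
```

Calling `state, saved, exit_code = demo()` leaves the parent's `state` one tick ahead of what was written to disk. With copy-on-write, pages are only physically duplicated once one side writes to them, so memory use during the save depends on how much the running game touches before the child exits.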

Seems there is a very large quantity of memory leaked after this procedure.

To reproduce, all you have to do is play a very large factory with autosave set to 5 minutes or less for several hours. Open top and notice the bloat. It goes from 1 GB resident size to nearly 8 GB, eating a lot of virtual memory... I used to run this game on even bigger factories with NO virtual memory. Had to turn on a page file because of this. (And other programs leaking memory like mad fools. I suspect this bug might extend into the operating system's ability to free memory)
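If watching top by hand for hours is impractical, the same numbers can be logged from `/proc`. The sketch below assumes a Linux system; the helper names are hypothetical. It reads `VmRSS` (roughly what top shows as RES) and `VmSwap` from `/proc/<pid>/status`.

```python
import re


def parse_status_kb(status_text, field):
    """Return the value of a kB field such as VmRSS from /proc/<pid>/status text.

    Returns None when the field is not present.
    """
    pattern = rf"^{re.escape(field)}:\s+(\d+)\s+kB$"
    match = re.search(pattern, status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_memory_kb(pid="self"):
    """Read resident and swapped-out sizes for a process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        text = f.read()
    return {name: parse_status_kb(text, name) for name in ("VmRSS", "VmSwap")}
```

Calling `read_memory_kb(pid)` with the game's PID once per autosave interval and writing the result to a file would show whether resident size really climbs with each save or jumps once and stays flat.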

Post by **Loewchen** » Fri Nov 19, 2021 11:10 pm

See known issues.

Post by **ptx0** » Sat Nov 20, 2021 3:22 pm

gallomimia wrote: Fri Nov 19, 2021 11:09 pm Very happy to be back on linux [...] Had to turn on a page file because of this. (And other programs leaking memory like mad fools. I suspect this bug might extend into the operating system's ability to free memory)

ugh it's just that you don't get how memory management works. you were running without a "page file"? on Linux, it's known as swap, and it's mandatory to have some so the kernel can work correctly.

source: Linux kernel dev for a decade.

Post by **mrvn** » Sat Nov 20, 2021 4:37 pm

ptx0 wrote: Sat Nov 20, 2021 3:22 pm
gallomimia wrote: Fri Nov 19, 2021 11:09 pm Very happy to be back on linux [...] Had to turn on a page file because of this. (And other programs leaking memory like mad fools. I suspect this bug might extend into the operating system's ability to free memory)
ugh it's just that you don't get how memory management works. you were running without a "page file"? on Linux, it's known as swap, and it's mandatory to have some so the kernel can work correctly.

source: Linux kernel dev for a decade.

The requirement to have twice the ram as swap is a total myth and linux works perfectly without any swap. Linux (and many other unixes) has been used on disk-less systems using netboot that have no place to put swap at all for decades. Using swap over the network is possible but actually causes more problems than it solves as there are too many ways for it to deadlock. So I'm not sure what kernel dev you are talking about, they should know better.

PS: There are features that require swap, like suspend to swap, for obvious reasons. But nothing the kernel needs.

Post by **ptx0** » Sat Nov 20, 2021 4:48 pm

mrvn wrote: Sat Nov 20, 2021 4:37 pm
source: Linux kernel dev for a decade.
The requirement to have twice the ram as swap is a total myth and linux works perfectly without any swap. Linux (and many other unixes) has been used on disk-less systems using netboot that have no place to put swap at all for decades.

I'm the kernel developer. where'd I say you need twice the RAM? putting a bunch of words into my mouth. in fact, I'm just going to forever ignore you on this forum because of the amount of noise you contribute. thanks.

edit: to add more info for anyone who actually wants to know why running without swap "works", it's because you're not running mission-critical systems and you've taken stock of the downsides of not having swap (e.g. behaving poorly under memory pressure) and decided that you don't require support from anyone else, because you know better than the Best Practices. this is similar to any precaution anyone takes - like having verified backups or wearing a seatbelt.

gallomimia wrote: [...] eating a lot of virtual memory [...]

good thing it's only eating up virtual memory.

Post by **mrvn** » Sat Nov 20, 2021 7:03 pm

ptx0 wrote: Sat Nov 20, 2021 4:48 pm
mrvn wrote: Sat Nov 20, 2021 4:37 pm
source: Linux kernel dev for a decade.
The requirement to have twice the ram as swap is a total myth and linux works perfectly without any swap. Linux (and many other unixes) has been used on disk-less systems using netboot that have no place to put swap at all for decades.
I'm the kernel developer. where'd I say you need twice the RAM? putting a bunch of words into my mouth. in fact, I'm just going to forever ignore you on this forum because of the amount of noise you contribute. thanks.

Sorry, didn't mean to imply that you said anything about twice the ram. It's just the common myth that is repeated whenever the discussion comes to "How much swap do I need?".

ptx0 wrote: Sat Nov 20, 2021 4:48 pm edit: to add more info for anyone who actually wants to know why running without swap "works", it's because you're not running mission-critical systems and you've taken stock of the downsides of not having swap (e.g. behaving poorly under memory pressure) and decided that you don't require support from anyone else, because you know better than the Best Practices. this is similar to any precaution anyone takes - like having verified backups or wearing a seatbelt.

The better comparison would be driving around with a spare tire. It's good to have one. But driving without a spare tire does not make the car drive any worse, nor does it endanger your life.

Anyway, the presence or lack of swap in linux has nothing to do with factorio leaking memory. That's all on factorio.

Post by **DarkShadow44** » Sat Nov 20, 2021 8:16 pm

Does the issue go away if you use blocking save?

I tried reproducing the problem with vanilla factorio, and it doesn't happen. Non-blocking save at 1min intervals, game.speed=100. It constantly saves, but no memory is leaked.
If you want, you can share your save and we can help investigate.

Post by **asheiduk** » Sat Nov 20, 2021 10:29 pm

Loewchen wrote: Fri Nov 19, 2021 11:10 pm See known issues.

Hmmm. I don't find anything related. Can you provide a direct pointer?

Post by **mrvn** » Sun Nov 21, 2021 12:24 am

I tried running factorio in valgrind. Even with a very small factory I only manage 10 UPS. I ran it with and without non-blocking saving, with a 1 minute autosave interval, doing 1 save each time. This is the summary I got:

blocking saving

==28055== HEAP SUMMARY:
==28055== in use at exit: 6,909,950 bytes in 3,454 blocks
==28055== total heap usage: 4,592,592 allocs, 4,589,138 frees, 2,037,177,631 bytes allocated
==28055==
==28055== LEAK SUMMARY:
==28055== definitely lost: 186,009 bytes in 236 blocks
==28055== indirectly lost: 6,219,842 bytes in 617 blocks
==28055== possibly lost: 32,768 bytes in 1 blocks
==28055== still reachable: 471,331 bytes in 2,600 blocks
==28055== suppressed: 0 bytes in 0 blocks
==28055== Rerun with --leak-check=full to see details of leaked memory
==28055==
==28055== For counts of detected and suppressed errors, rerun with: -v
==28055== Use --track-origins=yes to see where uninitialised values come from
==28055== ERROR SUMMARY: 656400 errors from 19 contexts (suppressed: 2 from 2)

async saving

==1965== HEAP SUMMARY:
==1965== in use at exit: 392,750,652 bytes in 909,016 blocks
==1965== total heap usage: 4,182,156 allocs, 3,273,140 frees, 2,023,279,175 bytes allocated
==1965==
==1965== LEAK SUMMARY:
==1965== definitely lost: 306,241 bytes in 2,221 blocks
==1965== indirectly lost: 5,650,947 bytes in 799 blocks
==1965== possibly lost: 4,765,235 bytes in 13,971 blocks
==1965== still reachable: 382,028,229 bytes in 892,025 blocks
==1965== of which reachable via heuristic:
==1965== newarray : 690,976 bytes in 2,061 blocks
==1965== multipleinheritance: 79,368 bytes in 316 blocks
==1965== suppressed: 0 bytes in 0 blocks
==1965== Rerun with --leak-check=full to see details of leaked memory
==1965==
==1965== For counts of detected and suppressed errors, rerun with: -v
==1965== Use --track-origins=yes to see where uninitialised values come from
==1965== ERROR SUMMARY: 649112 errors from 20 contexts (suppressed: 2 from 2)

That's the forked process that does the saving and then exits. The kernel will have freed anything leaked in there.

==32243== HEAP SUMMARY:
==32243== in use at exit: 6,909,494 bytes in 3,450 blocks
==32243== total heap usage: 4,508,162 allocs, 4,504,712 frees, 1,929,692,502 bytes allocated
==32243==
==32243== LEAK SUMMARY:
==32243== definitely lost: 184,753 bytes in 235 blocks
==32243== indirectly lost: 6,158,803 bytes in 611 blocks
==32243== possibly lost: 95,063 bytes in 8 blocks
==32243== still reachable: 470,875 bytes in 2,596 blocks
==32243== suppressed: 0 bytes in 0 blocks
==32243== Rerun with --leak-check=full to see details of leaked memory
==32243==
==32243== For counts of detected and suppressed errors, rerun with: -v
==32243== Use --track-origins=yes to see where uninitialised values come from
==32243== ERROR SUMMARY: 188953 errors from 22 contexts (suppressed: 2 from 2)

The leak summary for the main processes is close enough to call it identical. So there is no big fat "memory leaks HERE" arrow. The "possibly lost" rises from 32 KB to about 93 KB (32,768 to 95,063 bytes), so that might include the leak. One would have to run valgrind with --leak-check=full, a larger save game and probably a number of autosaves to find any pattern in that mess.
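As a quick sanity check on the three summaries above, the number of blocks still allocated at exit can be recomputed as allocs minus frees from each "total heap usage" line. The snippet below is just that arithmetic; the labels and the helper name are illustrative, and the input strings are copied from the valgrind output in this post.

```python
import re


def unfreed_blocks(heap_summary):
    """Return allocs - frees from valgrind's 'total heap usage' line."""
    m = re.search(r"total heap usage: ([\d,]+) allocs, ([\d,]+) frees", heap_summary)
    allocs, frees = (int(g.replace(",", "")) for g in m.groups())
    return allocs - frees


runs = {
    "blocking save": "total heap usage: 4,592,592 allocs, 4,589,138 frees, 2,037,177,631 bytes allocated",
    "forked saver": "total heap usage: 4,182,156 allocs, 3,273,140 frees, 2,023,279,175 bytes allocated",
    "main process (async)": "total heap usage: 4,508,162 allocs, 4,504,712 frees, 1,929,692,502 bytes allocated",
}

for name, line in runs.items():
    print(f"{name}: {unfreed_blocks(line)} blocks never freed")
```

The results (3,454, 909,016 and 3,450 blocks) match the "in use at exit" block counts valgrind reports. The two main-process runs differ by only 4 blocks, while the forked saver exits with most of the copied game state still allocated, which is harmless because the kernel reclaims it when that process ends.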

I don't expect to have 0 leaks in any sizeable application anymore, modern code quality plain sucks, but the above is a lot. There are also a lot of invalid memory accesses, most of them in the saving code it seems (the error count in the main process drops by about two-thirds with non-blocking save). If Wube can reduce those 235 definitely lost blocks, then maybe it would become obvious where the leak in non-blocking saves is.

Post by **DarkShadow44** » Sun Nov 21, 2021 12:46 am

asheiduk wrote: Sat Nov 20, 2021 10:29 pm Hmmm. I don't find anything related. Can you provide a direct pointer?

Known Issues Thread wrote: Various issues such as freezing, long saving times or high memory usage when using experimental "Non-blocking saving". Please disable this feature before reporting a bug related to saving.

As said, we can't reproduce. Maybe share a save.

Post by **Rseding91** » Sun Nov 21, 2021 10:20 pm

I know of no memory leaks in the game anywhere so I'd be interested in the full valgrind output if you can produce it.

EDIT: that's not true... I know of one in the standard library locale logic that has been reported regularly but there's nothing we can do about it and it leaks something like 40 bytes for the lifetime of the process and it's windows-only so valgrind would not see it.

Post by **Rseding91** » Mon Nov 22, 2021 12:34 am

We have valgrind running nightly and I just checked the latest run. It looks like basically every single thing it thinks is "leaked" is simply wrong. An example: a std::string stored in a static variable. Another example: a heap-allocated object immediately put into a std::unique_ptr: "leaked" according to valgrind.

So I would be interested to see what it reports for you. It seems at least something is quite wrong with the version we are running thinking random stuff is leaked when I can say for 100% certain that they aren't being leaked.

Post by **mrvn** » Mon Nov 22, 2021 9:35 pm

Looks like you reimplement your own malloc and that accesses memory outside the allocated blocks. You probably need to define exceptions for valgrind for this if it's metadata for the memory handling.

The memory leaks seem to be in SDL (just a touch) and your PNG loader. Attached is the valgrind output from just starting factorio and quitting right away. Not even loading a game at all.

PS: I tried running a save game as a benchmark for a bit and that gave me this:

==15128== HEAP SUMMARY:
==15128== in use at exit: 157,513 bytes in 32 blocks
==15128== total heap usage: 2,516,887 allocs, 2,516,855 frees, 536,868,140 bytes allocated
==15128==
==15128== LEAK SUMMARY:
==15128== definitely lost: 0 bytes in 0 blocks
==15128== indirectly lost: 0 bytes in 0 blocks
==15128== possibly lost: 0 bytes in 0 blocks
==15128== still reachable: 157,513 bytes in 32 blocks
==15128== suppressed: 0 bytes in 0 blocks

That's how I like my applications.

I can't reproduce the leaks on save the original post described. Might be my map is too small, has the wrong entities or it only happens with modded entities.

Post by **Rseding91** » Tue Nov 23, 2021 12:13 am

mrvn wrote: Mon Nov 22, 2021 9:35 pm Looks like you reimplement your own malloc and that accesses memory outside the allocated blocks. You probably need to define exceptions for

But we didn't/don't. We just call "new" (or std::make_unique in most cases) and the standard library implementation gets used (malloc by default).

Post by **Rseding91** » Tue Nov 23, 2021 12:15 am

mrvn wrote: Mon Nov 22, 2021 9:35 pm The memory leaks seem to be in SDL (just a touch) and your PNG loader. Attached the valgrind output from just starting factorio and quitting right away. Not even loading a game at all.

Ah... SDL isn't our code and the PNG loader for linux is not used on Windows. Windows uses GDI+.

I did find and fix 1 memory leak with SDL in the past so it's not too surprising that there are others. It is C after all and there are no destructors, so it's all up to whoever wrote the code to make sure everything malloc-ed is free-d, and it seems that isn't the case.

Post by **mrvn** » Tue Nov 23, 2021 12:22 am

If you didn't define your own malloc stuff then it might be real invalid reads and out of range memory accesses. Something to look into.

Post by **Rseding91** » Wed Nov 24, 2021 7:58 pm

The PNG loader one I fixed in this latest release (I got it compiling on Windows, tested it, and fixed it). It just didn't get mentioned in the changelog. It was not an ongoing leak; it just didn't release memory it allocated during startup, so the whole process would use more than required at runtime.

Factorio Forums