better compression algorithm for saves and mods

Posted: Thu Mar 12, 2020 7:05 pm
by ptx0
using XZ compression and default settings from GNU tar, recompressing all elements in a desync report cut the size in half which is the difference between being able to share it with the developers, or being unable to.

my LTE connection limits me to 64KiB/s after I hit a bandwidth limit and after this time it's a pain to connect to the server due to the save file size which can grow to ~74MiB (the largest I have done, personally) and I've seen other players with even larger, 115MiB save files - this is without replay.dat enabled, too.

using XZ compression can reduce a 60MiB save file down to just 12MiB.
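
For anyone who wants to try this kind of comparison without a real save at hand, here is a minimal sketch using only the Python standard library. The sample data is synthetic (the record layout and entity names are made up for illustration), so the ratios it prints say nothing about actual Factorio saves; it only shows how to measure Deflate (the method ZIP normally uses) against xz on the same bytes.

```python
import lzma
import random
import zlib


def make_sample(n_records: int = 20000, seed: int = 0) -> bytes:
    """Build repetitive, record-like bytes as a stand-in for save data (hypothetical layout)."""
    rng = random.Random(seed)
    names = [b"transport-belt", b"inserter", b"assembling-machine", b"pipe"]
    out = bytearray()
    for _ in range(n_records):
        out += rng.choice(names)
        out += rng.randrange(0, 4096).to_bytes(2, "little")  # x coordinate
        out += rng.randrange(0, 4096).to_bytes(2, "little")  # y coordinate
    return bytes(out)


def compare(data: bytes) -> dict:
    """Compress the same bytes with Deflate and with xz, and report the sizes."""
    deflated = zlib.compress(data, 6)  # Deflate at level 6
    xz = lzma.compress(data)           # .xz container at the library's default preset
    # Both must round-trip back to the original bytes.
    assert zlib.decompress(deflated) == data
    assert lzma.decompress(xz) == data
    return {"raw": len(data), "deflate": len(deflated), "xz": len(xz)}


if __name__ == "__main__":
    print(compare(make_sample()))
```

To test a real file instead, read it with `open(path, "rb").read()` and pass the bytes to `compare`.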

there are better compression algorithms out there. for example, zstd supports custom dictionaries: we could certainly train a dictionary tuned for factorio data, and this should help reduce save file size even further.
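
To illustrate the shared-dictionary idea, here is a rough sketch. It deliberately swaps in zlib's preset-dictionary feature (`zdict`) from the Python standard library rather than zstd itself, so it runs without extra packages; zstd's trained dictionaries work on the same principle but are a different format and API. The dictionary contents and payload below are hypothetical.

```python
import zlib

# Hypothetical dictionary: byte strings expected to recur in every payload.
SHARED_DICT = b"transport-belt" b"inserter" b"assembling-machine" b"logistic-chest"


def compress_with_dict(data: bytes, zdict: bytes) -> bytes:
    """Deflate `data`, letting the compressor reference bytes from `zdict`."""
    comp = zlib.compressobj(level=6, zdict=zdict)
    return comp.compress(data) + comp.flush()


def decompress_with_dict(blob: bytes, zdict: bytes) -> bytes:
    """Inverse of compress_with_dict; the receiver must hold the same dictionary."""
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


if __name__ == "__main__":
    payload = b"inserter" b"transport-belt" b"assembling-machine"
    plain = zlib.compress(payload, 6)
    primed = compress_with_dict(payload, SHARED_DICT)
    assert decompress_with_dict(primed, SHARED_DICT) == payload
    print(len(payload), len(plain), len(primed))
```

The catch with any shared dictionary is that both sides have to ship exactly the same one, so it would need to be versioned along with the game. The gain is also largest on small payloads; on a save of tens of MiB the compressor builds up its own history quickly.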

Re: better compression algorithm for saves and mods

Posted: Thu Mar 12, 2020 8:14 pm
by Nemo4809
So just re-compress it?

I won’t be surprised if the default compression algorithm and strength is chosen more for its speed (when saving) than its effectiveness and it’s really more of a way to package multiple files into a single file than about saving space.

Re: better compression algorithm for saves and mods

Posted: Thu Mar 12, 2020 8:25 pm
by posila
Nemo4809 wrote:
Thu Mar 12, 2020 8:14 pm
So just re-compress it?
To be fair ... when server is sending save to client or client sending desync report to the server, recompression is not an option.

Re: better compression algorithm for saves and mods

Posted: Thu Mar 12, 2020 8:38 pm
by ptx0
Nemo4809 wrote:
Thu Mar 12, 2020 8:14 pm
So just re-compress it?

I won’t be surprised if the default compression algorithm and strength is chosen more for its speed (when saving) than its effectiveness and it’s really more of a way to package multiple files into a single file than about saving space.
well, have you ever looked inside a save file or desync report? there's some dat files that are probably 3-4x the size of the ZIP file and they do compress nicely, but we can do better.

the referenced compression algorithms are competitive for speed and multi-core capabilities. much more so than zip.

Re: better compression algorithm for saves and mods

Posted: Fri Mar 13, 2020 12:33 am
by Nemo4809
posila wrote:
Thu Mar 12, 2020 8:25 pm
Nemo4809 wrote:
Thu Mar 12, 2020 8:14 pm
So just re-compress it?
To be fair ... when server is sending save to client or client sending desync report to the server, recompression is not an option.
Oh. Thought he was manually sending stuff.
ptx0 wrote:
Thu Mar 12, 2020 8:38 pm
Nemo4809 wrote:
Thu Mar 12, 2020 8:14 pm
So just re-compress it?

I won’t be surprised if the default compression algorithm and strength is chosen more for its speed (when saving) than its effectiveness and it’s really more of a way to package multiple files into a single file than about saving space.
well, have you ever looked inside a save file or desync report? there's some dat files that are probably 3-4x the size of the ZIP file and they do compress nicely, but we can do better.

the referenced compression algorithms are competitive for speed and multi-core capabilities. much more so than zip.
Well, I'm guessing then that the developers went with zip for speed - I looked up xz; it uses LZMA compression, which is slower than zip - and figured the compression ratio was good enough vs. the processing time required. Maybe they can add a separate compressor for things that need to be transferred over the internet as an option - for people with really tight bandwidth limitations/data caps.

Re: better compression algorithm for saves and mods

Posted: Fri Mar 13, 2020 1:02 am
by Zanthra
ptx0 wrote:
Thu Mar 12, 2020 7:05 pm
using XZ compression and default settings from GNU tar, recompressing all elements in a desync report cut the size in half which is the difference between being able to share it with the developers, or being unable to.
Do keep in mind that tar xz makes a solid archive. The individual files contained within are not compressed separately, and to read any individual file, all preceding files may need to be decompressed (or at least the decompressor must process the data for them to be in the appropriate state). It would be interesting to know what size the data would be if you put it in a tar without compression then in a zip file.

That is to say it may not be the algorithm as much as how it is applied.
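
The experiment suggested above can be sketched with the Python standard library. This is only an illustration under stated assumptions: the file names and contents are made up (many small, near-identical files, which is the case where solid archives help most), so the numbers are not representative of a real desync report. It compares a ZIP that compresses each file separately, an uncompressed tar placed inside a ZIP, and a tar.xz.

```python
import io
import lzma
import tarfile
import zipfile


def make_files(count: int = 50) -> dict:
    """Hypothetical stand-in for the small, similar files inside a report."""
    body = b"entity=inserter;direction=north;health=100;" * 40
    return {f"chunk-{i:03d}.dat": body + str(i).encode() for i in range(count)}


def zip_per_file(files: dict) -> int:
    """Size of a ZIP where every member is deflated on its own."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return len(buf.getvalue())


def tar_bytes(files: dict) -> bytes:
    """Pack all files into one uncompressed tar stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def tar_then_zip(files: dict) -> int:
    """Size of a ZIP holding a single member: the uncompressed tar (a 'solid' Deflate)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("report.tar", tar_bytes(files))
    return len(buf.getvalue())


def tar_xz(files: dict) -> int:
    """Size of the same tar stream compressed as one xz stream."""
    return len(lzma.compress(tar_bytes(files)))


if __name__ == "__main__":
    files = make_files()
    print("zip, per file:", zip_per_file(files))
    print("tar inside zip:", tar_then_zip(files))
    print("tar.xz:", tar_xz(files))
```

If the tar-inside-zip figure lands close to the tar.xz figure on real data, most of the gain comes from the solid layout rather than from LZMA itself.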

Re: better compression algorithm for saves and mods

Posted: Thu Apr 16, 2020 4:45 am
by ptx0
so i was playing space exploration and the 300MiB map really makes multiplayer difficult, but XZ compression - though slower - reduces the save file and makes it playable. but sharing singleplayer maps with friends for remote play is getting old, can we have an option to use a different algorithm? I suppose it doesn't even need to be the ones I've proposed. and it needn't be default option, much like the unsupported "non-blocking save", if the performance is such a concern - but it's not a major performance difference, in my testing.

Re: better compression algorithm for saves and mods

Posted: Sun May 03, 2020 3:54 pm
by ptx0
can we get some kind of official feedback on this idea? i could even submit the PR myself.

Re: better compression algorithm for saves and mods

Posted: Sun May 03, 2020 6:17 pm
by Jap2.0
Just out of curiosity, what/are there any compressed file formats other than zip Windows/MacOS can open natively? (Not saying that everyone shouldn't have 7-zip or something installed....) It's nice to be able to just open them natively on the OS level (although granted the people who will be manually opening Factorio saves are probably the same group of people who know how to use 7-zip). Probably not a good idea to use a different format for local saving vs. multiplayer and desyncs.

Re: better compression algorithm for saves and mods

Posted: Sun May 03, 2020 11:30 pm
by ptx0
Jap2.0 wrote:
Sun May 03, 2020 6:17 pm
Just out of curiosity, what/are there any compressed file formats other than zip Windows/MacOS can open natively? (Not saying that everyone shouldn't have 7-zip or something installed....) It's nice to be able to just open them natively on the OS level (although granted the people who will be manually opening Factorio saves are probably the same group of people who know how to use 7-zip). Probably not a good idea to use a different format for local saving vs. multiplayer and desyncs.
the native zip/cab browser/archiver isn't even recommended for production use due to numerous issues with it: https://superuser.com/questions/476740/ ... 366#481366

https://superuser.com/questions/566123/ ... 207#566207

Re: better compression algorithm for saves and mods

Posted: Mon May 04, 2020 1:48 am
by quyxkh
I just did a quick check on a 200+MB save, a .tar.xz wound up only 25% smaller than the .zip. A desync report is basically just two saves and some notes, right? Doesn't look like the astoundingly-slow compression will save enough to get many reports across any particular feasibility barrier. xz produces <1MB/s of compressed data here, multiplayer compression is counterproductive if you can't keep the pipes full. I'd call it "not competitive here".

Looks like Factorio maps are one of the cases where zstd is only marginally better at compressing: it'll do somewhat worse a lot faster, or about the same a little faster. So it's better on this workload by pretty much any metric, but not a whole lot better. Going on the quick checks, I wouldn't see much value in switching.
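
The "keep the pipes full" point can be put into rough numbers. The sketch below is a simplified model, not a measurement: it assumes compression streams straight into the upload, so the slower of the two stages sets the pace, and it treats the thread's MB and MiB figures as MiB. The 64 KiB/s link and the 60 MiB to 12 MiB reduction come from the first post, and the roughly 1 MiB/s xz output rate from the check above; the 10 MiB/s link and the 100 MiB/s zip output rate are hypothetical.

```python
def transfer_seconds(size_mib: float, link_kib_per_s: float) -> float:
    """Time to push `size_mib` through a link of `link_kib_per_s`."""
    return size_mib * 1024 / link_kib_per_s


def join_seconds(compressed_mib: float, link_kib_per_s: float,
                 compress_mib_per_s: float) -> float:
    """Time to deliver the compressed save when compression feeds the upload directly.

    `compress_mib_per_s` is the rate at which the compressor emits compressed
    output; whichever of compressor and link is slower limits the transfer.
    """
    effective_kib_per_s = min(link_kib_per_s, compress_mib_per_s * 1024)
    return compressed_mib * 1024 / effective_kib_per_s


if __name__ == "__main__":
    # Capped LTE link at 64 KiB/s: the link is the bottleneck either way.
    print(join_seconds(60, 64, 100.0))     # zip: 960.0 s
    print(join_seconds(12, 64, 1.0))       # xz:  192.0 s
    # Hypothetical 10 MiB/s link: xz at ~1 MiB/s cannot keep it full.
    print(join_seconds(60, 10240, 100.0))  # zip: 6.0 s
    print(join_seconds(12, 10240, 1.0))    # xz:  12.0 s
```

Under these assumptions the slower compressor wins clearly on the capped link and loses on the fast one, which is roughly the disagreement in this thread.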

Re: better compression algorithm for saves and mods

Posted: Fri May 08, 2020 5:34 pm
by ptx0
quyxkh wrote:
Mon May 04, 2020 1:48 am
I just did a quick check on a 200+MB save, a .tar.xz wound up only 25% smaller than the .zip. A desync report is basically just two saves and some notes, right? Doesn't look like the astoundingly-slow compression will save enough to get many reports across any particular feasibility barrier. xz produces <1MB/s of compressed data here, multiplayer compression is counterproductive if you can't keep the pipes full. I'd call it "not competitive here".
saving can be set up to only occur on the server, it can even be non-blocking save. throughput is a non-issue.

Re: better compression algorithm for saves and mods

Posted: Fri May 08, 2020 5:50 pm
by prdfndr
+1.
I can definitely see a benefit of better compression for multiplayer in heavily modded games (the question though is "what is better?"). Space Exploration games easily go over 100 MB, and sometimes reach even 300MB. It is very painful to connect to them, and I saw many people stopping playing only because they either could not connect or it was taking too much time.

Re: better compression algorithm for saves and mods

Posted: Fri May 08, 2020 6:33 pm
by Ragu
i like this idea
i have a friend with bad internet and he can't play big maps with me and other friends (he left our SE game when the save was about 70+ MB; by the end we had about 150 MB), so if i could set a server option for better compression, it would be nice

Re: better compression algorithm for saves and mods

Posted: Fri May 08, 2020 6:43 pm
by steinio
I vote for compressing saves only for multiplayer games.

In my single player games just store the uncompressed data to accelerate saving.

Re: better compression algorithm for saves and mods

Posted: Fri May 08, 2020 7:16 pm
by ptx0
steinio wrote:
Fri May 08, 2020 6:43 pm
I vote for compressing saves only for multiplayer games.

In my single player games just store the uncompressed data to accelerate saving.
writing hundreds of MiB/s to disk can be a big problem. compression means writing less to disk, so it can be faster. serializing is a b i g issue with save time, it's the huge pause before the save progress bar begins to fill. of course an option to disable compression would be useful, though.

Re: better compression algorithm for saves and mods

Posted: Fri May 08, 2020 10:25 pm
by Rseding91
ptx0 wrote:
Fri May 08, 2020 7:16 pm
steinio wrote:
Fri May 08, 2020 6:43 pm
I vote for compressing saves only for multiplayer games.

In my single player games just store the uncompressed data to accelerate saving.
writing hundreds of MiB/s to disk can be a big problem. compression means writing less to disk, so it can be faster. serializing is a b i g issue with save time, it's the huge pause before the save progress bar begins to fill. of course an option to disable compression would be useful, though.
The pause before the bar begins to fill is saving force data, logistic network data, path-finder data, and active entity order on chunks. It's all memory/latency bound and not related to compression or disk access speeds.

Re: better compression algorithm for saves and mods

Posted: Fri May 08, 2020 11:20 pm
by Jap2.0
ptx0 wrote:
Fri May 08, 2020 5:34 pm
saving can be set up to only occur on the server, it can even be non-blocking save. throughput is a non-issue.
* No non-blocking saving on Windows last I checked
* Throughput might not be an issue for non-blocking autosaves, but potentially multiple minutes is unacceptable if it's blocking.
* How long the saving takes directly impacts how long it will take to join a server - both for the initial wait, and the amount of catching up necessary.

Re: better compression algorithm for saves and mods

Posted: Sat May 09, 2020 1:22 am
by ptx0
Jap2.0 wrote:
Fri May 08, 2020 11:20 pm
ptx0 wrote:
Fri May 08, 2020 5:34 pm
saving can be set up to only occur on the server, it can even be non-blocking save. throughput is a non-issue.
* No non-blocking saving on Windows last I checked
* Throughput might not be an issue for non-blocking autosaves, but potentially multiple minutes is unacceptable if it's blocking.
* How long the saving takes directly impacts how long it will take to join a server - both for the initial wait, and the amount of catching up necessary.
well it doesn't exist on windows and yet the option exists for people who need it, just like compression could be. and in several situations, CPU time and wall clock time are cheaper than network bandwidth. I have data caps.

Re: better compression algorithm for saves and mods

Posted: Sat May 09, 2020 1:32 am
by Jap2.0
ptx0 wrote:
Sat May 09, 2020 1:22 am
Jap2.0 wrote:
Fri May 08, 2020 11:20 pm
* No non-blocking saving on Windows last I checked
* Throughput might not be an issue for non-blocking autosaves, but potentially multiple minutes is unacceptable if it's blocking.
* How long the saving takes directly impacts how long it will take to join a server - both for the initial wait, and the amount of catching up necessary.
well it doesn't exist on windows and yet the option exists for people who need it, just like compression could be.
So can you tell me exactly when this is going to be default, optional, and not available? (Obviously decompression should be available everywhere.)

and in several situations, CPU time and wall clock time are cheaper than network bandwidth. I have data caps.
It's tradeoffs all the way down. How much time is it worth? A minute? Five? Ten? An hour? There's no way to say that one setting will be better or worse for everyone. Keep in mind that the longer you wait to load the save, the more catch-up data that has to be sent over the network.