better compression algorithm for saves and mods

Post your ideas and suggestions how to improve the game.

Moderator: ickputzdirwech

ptx0
Smart Inserter
Posts: 1112
Joined: Wed Jan 01, 2020 7:16 pm

better compression algorithm for saves and mods

Post by ptx0 »

using XZ compression with the default settings from GNU tar, recompressing everything in a desync report cut its size in half, which is the difference between being able to share it with the developers and not being able to.

my LTE connection is limited to 64KiB/s once I hit my bandwidth cap, and from then on it's a pain to connect to the server because of the save file size, which can grow to ~74MiB (the largest I've had, personally). I've seen other players with even larger 115MiB save files, and that's without replay.dat enabled, too.

using XZ compression can reduce a 60MiB save file down to just 12MiB.
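To put those sizes in terms of waiting time, here is a rough back-of-the-envelope sketch. It assumes an ideal 64KiB/s link with no protocol overhead; the function name and labels are made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB

def transfer_seconds(size_bytes: int, rate_bytes_per_s: int) -> float:
    """Ideal transfer time, ignoring protocol overhead and latency."""
    return size_bytes / rate_bytes_per_s

LINK = 64 * KIB  # throttled LTE rate quoted in the post

# Sizes quoted in the post, in MiB.
for label, size_mib in [("60 MiB zip", 60), ("12 MiB xz", 12),
                        ("74 MiB zip", 74), ("115 MiB zip", 115)]:
    minutes = transfer_seconds(size_mib * MIB, LINK) / 60
    print(f"{label}: {minutes:.1f} min")
```

Under those assumptions the 60MiB save takes 16 minutes to download and the 12MiB one a bit over 3.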

there are better compression algorithms out there. for example, zstd supports pre-trained dictionaries. we could certainly build a custom dictionary for zstd that is tuned for factorio, and this would help reduce save file size even further.
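The custom-dictionary idea can be sketched with Python's standard library. This is not zstd: it uses zlib's preset-dictionary support (`zdict`) as a stand-in for the same concept, and the dictionary contents and sample record are hypothetical. A real dictionary would be trained on actual save data.

```python
import zlib

# Hypothetical dictionary: byte strings expected to recur in the data.
DICTIONARY = b"transport-belt" b"inserter" b"assembling-machine" b"position" b"direction"

def compress_with_dict(data: bytes, zdict: bytes) -> bytes:
    """Deflate `data` with the compressor primed by a preset dictionary."""
    c = zlib.compressobj(level=6, zdict=zdict)
    return c.compress(data) + c.flush()

def decompress_with_dict(blob: bytes, zdict: bytes) -> bytes:
    """Inverse of compress_with_dict; the same dictionary must be supplied."""
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()

# Made-up record, not the real save format.
sample = b'{"name":"transport-belt","position":[1,2],"direction":4}'
plain = zlib.compress(sample, 6)
primed = compress_with_dict(sample, DICTIONARY)

assert decompress_with_dict(primed, DICTIONARY) == sample
print("raw:", len(sample), "deflate:", len(plain), "deflate + dictionary:", len(primed))
```

A shared dictionary mostly pays off on small inputs. On a save of tens of MiB the compressor builds up its own history quickly, so the gain may be modest, and both sides have to ship the same dictionary.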
My Mods - Fish Per Minute base size metric - Use your crashed spaceship as a belt balancer?
• • •
Base: Bob's @ 1 Million SPM
• • •
Linear search and overflows are indicative of sloppy coding practices.

Nemo4809
Long Handed Inserter
Posts: 94
Joined: Thu Jan 16, 2020 10:49 am

Re: better compression algorithm for saves and mods

Post by Nemo4809 »

So just re-compress it?

I wouldn’t be surprised if the default compression algorithm and strength were chosen more for speed (when saving) than for effectiveness, and if it’s really more a way to package multiple files into a single file than a way to save space.

posila
Factorio Staff
Posts: 5111
Joined: Thu Jun 11, 2015 1:35 pm

Re: better compression algorithm for saves and mods

Post by posila »

Nemo4809 wrote:
Thu Mar 12, 2020 8:14 pm
So just re-compress it?
To be fair ... when the server is sending a save to a client, or a client is sending a desync report to the server, recompression is not an option.

ptx0
Smart Inserter
Posts: 1112
Joined: Wed Jan 01, 2020 7:16 pm

Re: better compression algorithm for saves and mods

Post by ptx0 »

Nemo4809 wrote:
Thu Mar 12, 2020 8:14 pm
So just re-compress it?

I won’t be surprised if the default compression algorithm and strength is chosen more for its speed (when saving) than its effectiveness and it’s really more of a way to package multiple files into a single file than about saving space.
well, have you ever looked inside a save file or desync report? there are some dat files that are probably 3-4x the size of the ZIP file, and they do compress nicely, but we can do better.

the referenced compression algorithms are competitive for speed and multi-core capabilities. much more so than zip.

Nemo4809
Long Handed Inserter
Posts: 94
Joined: Thu Jan 16, 2020 10:49 am

Re: better compression algorithm for saves and mods

Post by Nemo4809 »

posila wrote:
Thu Mar 12, 2020 8:25 pm
Nemo4809 wrote:
Thu Mar 12, 2020 8:14 pm
So just re-compress it?
To be fair ... when server is sending save to client or client sending desync report to the server, recompression is not an option.
Oh. Thought he was manually sending stuff.
ptx0 wrote:
Thu Mar 12, 2020 8:38 pm
Nemo4809 wrote:
Thu Mar 12, 2020 8:14 pm
So just re-compress it?

I won’t be surprised if the default compression algorithm and strength is chosen more for its speed (when saving) than its effectiveness and it’s really more of a way to package multiple files into a single file than about saving space.
well, have you ever looked inside a save file or desync report? there's some dat files that are probably 3-4x the size of the ZIP file and they do compress nicely, but we can do better.

the referenced compression algorithms are competitive for speed and multi-core capabilities. much more so than zip.
Well, I'm guessing the developers went with zip for speed (I looked up xz; it uses LZMA compression, which is slower than zip) and figured the compression ratio was good enough for the processing time required. Maybe they could add a separate compressor, as an option, for things that need to be transferred over the internet, for people with really tight bandwidth limitations or data caps.

Zanthra
Fast Inserter
Posts: 206
Joined: Fri Mar 25, 2016 8:18 am

Re: better compression algorithm for saves and mods

Post by Zanthra »

ptx0 wrote:
Thu Mar 12, 2020 7:05 pm
using XZ compression and default settings from GNU tar, recompressing all elements in a desync report cut the size in half which is the difference between being able to share it with the developers, or being unable to.
Do keep in mind that tar xz makes a solid archive. The individual files contained within are not compressed separately, and to read any individual file, all preceding files may need to be decompressed (or at least the decompressor must process the data for them to be in the appropriate state). It would be interesting to know what size the data would be if you put it in a tar without compression then in a zip file.

That is to say it may not be the algorithm as much as how it is applied.
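The experiment suggested above could be sketched like this. The member files are synthetic stand-ins, so the numbers say nothing about real saves; the helper names are made up for illustration.

```python
import io
import lzma
import tarfile
import zipfile

# Synthetic stand-ins for the files inside a save; real saves will behave differently.
files = {f"level.dat{i}": (b"entity=inserter;x=%d;" % i) * 2000 for i in range(8)}

def zip_size(members):
    """Size of a zip in which every member is deflated separately."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return len(buf.getvalue())

def tar_bytes(members):
    """Uncompressed tar holding all members back to back."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        for name, data in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()

solid = tar_bytes(files)
print("zip, one deflate stream per file:", zip_size(files))
print("zip around a single tar (solid): ", zip_size({"save.tar": solid}))
print("tar + xz (solid):                ", len(lzma.compress(solid)))
```

Running the same three measurements on the contents of a real save would separate the effect of the algorithm from the effect of solid archiving. One caveat: Deflate only looks back 32 KiB, so a solid zip can exploit far less cross-file redundancy than xz with its much larger dictionary.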

ptx0
Smart Inserter
Posts: 1112
Joined: Wed Jan 01, 2020 7:16 pm

Re: better compression algorithm for saves and mods

Post by ptx0 »

so i was playing space exploration and the 300MiB map really makes multiplayer difficult, but XZ compression - though slower - reduces the save file and makes it playable. sharing singleplayer maps with friends for remote play is getting old, though. can we have an option to use a different algorithm? I suppose it doesn't even need to be one of those I've proposed, and it needn't be the default option, much like the unsupported "non-blocking save", if performance is such a concern. in my testing, though, it's not a major performance difference.

ptx0
Smart Inserter
Posts: 1112
Joined: Wed Jan 01, 2020 7:16 pm

Re: better compression algorithm for saves and mods

Post by ptx0 »

can we get some kind of official feedback on this idea? i could even submit the PR myself.

Jap2.0
Smart Inserter
Posts: 2333
Joined: Tue Jun 20, 2017 12:02 am

Re: better compression algorithm for saves and mods

Post by Jap2.0 »

Just out of curiosity, are there any compressed file formats other than zip that Windows/macOS can open natively, and if so, which? (Not saying that everyone shouldn't have 7-zip or something installed....) It's nice to be able to just open them natively at the OS level (although granted, the people who will be manually opening Factorio saves are probably the same group of people who know how to use 7-zip). It's probably not a good idea to use a different format for local saving vs. multiplayer and desyncs.
There are 10 types of people: those who get this joke and those who don't.

ptx0
Smart Inserter
Posts: 1112
Joined: Wed Jan 01, 2020 7:16 pm

Re: better compression algorithm for saves and mods

Post by ptx0 »

Jap2.0 wrote:
Sun May 03, 2020 6:17 pm
Just out of curiosity, what/are there any compressed file formats other than zip Windows/MacOS can open natively? (Not saying that everyone shouldn't have 7-zip or something installed....) It's nice to be able to just open them natively on the OS level (although granted the people who will be manually opening Factorio saves are probably the same group of people who know how to use 7-zip). Probably not a good idea to use a different format for local saving vs. multiplayer and desyncs.
the native zip/cab browser/archiver isn't even recommended for production use due to numerous issues with it: https://superuser.com/questions/476740/ ... 366#481366

https://superuser.com/questions/566123/ ... 207#566207

quyxkh
Filter Inserter
Posts: 926
Joined: Sun May 08, 2016 9:01 am

Re: better compression algorithm for saves and mods

Post by quyxkh »

I just did a quick check on a 200+MB save: a .tar.xz wound up only 25% smaller than the .zip. A desync report is basically just two saves and some notes, right? It doesn't look like the astoundingly slow compression will save enough to get many reports across any particular feasibility barrier. xz produces <1MB/s of compressed data here, and multiplayer compression is counterproductive if you can't keep the pipes full. I'd call it "not competitive here".

Looks like Factorio maps are one of the cases where zstd is only marginally better at compressing: it'll compress somewhat worse a lot faster, or about the same a little faster. So it's better on this workload by pretty much any metric, but not a whole lot better. Going on these quick checks, I don't see much value in switching.
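The "keep the pipes full" point can be made concrete with a toy model. It assumes compression streams straight into the network, so delivery runs at the slower of the compressor's output rate and the link rate. The 200MB zip, the 25% saving and the 1MB/s xz rate come from the post above; the 20MB/s zip rate is an assumption, not a measurement.

```python
def delivery_seconds(compressed_mb: float, compressor_mb_s: float, link_mb_s: float) -> float:
    """Time to deliver a save when the compressor streams into the link.

    The slower of the two stages sets the pace.
    """
    return compressed_mb / min(compressor_mb_s, link_mb_s)

ZIP_MB, XZ_MB = 200.0, 150.0  # from the post: xz about 25% smaller
XZ_RATE = 1.0                 # MB/s of compressed output, upper bound from the post
ZIP_RATE = 20.0               # MB/s, assumed for illustration only

for link in (0.0625, 1.0, 10.0):  # link speeds in MB/s
    zip_t = delivery_seconds(ZIP_MB, ZIP_RATE, link)
    xz_t = delivery_seconds(XZ_MB, XZ_RATE, link)
    print(f"link {link} MB/s: zip {zip_t:.0f} s, xz {xz_t:.0f} s")
```

In this model xz wins on a throttled link, where the network is the bottleneck, and loses once the link outruns the compressor. That is roughly the disagreement in this thread.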

ptx0
Smart Inserter
Posts: 1112
Joined: Wed Jan 01, 2020 7:16 pm

Re: better compression algorithm for saves and mods

Post by ptx0 »

quyxkh wrote:
Mon May 04, 2020 1:48 am
I just did a quick check on a 200+MB save, a .tar.xz wound up only 25% smaller than the .zip. A desync report is basically just two saves and some notes, right? Doesn't look like the astoundingly-slow compression will save enough to get many reports across any particular feasibility barrier. xz produces <1MB/s of compressed data here, multiplayer compression is counterproductive if you can't keep the pipes full. I'd call it "not competitive here".
saving can be set up to only occur on the server, and it can even be a non-blocking save. throughput is a non-issue.

prdfndr
Inserter
Posts: 45
Joined: Sat Mar 21, 2020 6:23 pm

Re: better compression algorithm for saves and mods

Post by prdfndr »

+1.
I can definitely see a benefit of better compression for multiplayer in heavily modded games (the question, though, is "what is better?"). Space Exploration games easily go over 100 MB, and sometimes even reach 300 MB. It is very painful to connect to them, and I've seen many people stop playing only because they either could not connect or it took too long.

Ragu
Burner Inserter
Posts: 5
Joined: Sat Nov 24, 2018 10:40 am

Re: better compression algorithm for saves and mods

Post by Ragu »

i like this idea
i have a friend with bad internet and he can't play big maps with me and our other friends (he left our SE game when the save was about 70+ MB; by the end it was about 150 MB), so if i could set a server option for better compression, that would be nice

steinio
Smart Inserter
Posts: 2595
Joined: Sat Mar 12, 2016 4:19 pm

Re: better compression algorithm for saves and mods

Post by steinio »

I vote for compressing saves only for multiplayer games.

In my single player games just store the uncompressed data to accelerate saving.
Transport Belt Repair Man


ptx0
Smart Inserter
Posts: 1112
Joined: Wed Jan 01, 2020 7:16 pm

Re: better compression algorithm for saves and mods

Post by ptx0 »

steinio wrote:
Fri May 08, 2020 6:43 pm
I vote for compressing saves only for multiplayer games.

In my single player games just store the uncompressed data to accelerate saving.
writing hundreds of MiB/s to disk can be a big problem. compression means writing less to disk, so it can be faster. serializing is a b i g issue with save time, it's the huge pause before the save progress bar begins to fill. of course an option to disable compression would be useful, though.

Rseding91
Factorio Staff
Posts: 11972
Joined: Wed Jun 11, 2014 5:23 am

Re: better compression algorithm for saves and mods

Post by Rseding91 »

ptx0 wrote:
Fri May 08, 2020 7:16 pm
steinio wrote:
Fri May 08, 2020 6:43 pm
I vote for compressing saves only for multiplayer games.

In my single player games just store the uncompressed data to accelerate saving.
writing hundreds of MiB/s to disk can be a big problem. compression means writing less to disk, so it can be faster. serializing is a b i g issue with save time, it's the huge pause before the save progress bar begins to fill. of course an option to disable compression would be useful, though.
The pause before the bar begins to fill is saving force data, logistic network data, path-finder data, and active entity order on chunks. It's all memory/latency bound and not related to compression or disk access speeds.
If you want to get ahold of me I'm almost always on Discord.

Jap2.0
Smart Inserter
Posts: 2333
Joined: Tue Jun 20, 2017 12:02 am

Re: better compression algorithm for saves and mods

Post by Jap2.0 »

ptx0 wrote:
Fri May 08, 2020 5:34 pm
saving can be set up to only occur on the server, it can even be non-blocking save. throughput is a non-issue.
* No non-blocking saving on Windows last I checked
* Throughput might not be an issue for non-blocking autosaves, but potentially multiple minutes is unacceptable if it's blocking.
* How long the saving takes directly impacts how long it will take to join a server - both for the initial wait, and the amount of catching up necessary.

ptx0
Smart Inserter
Posts: 1112
Joined: Wed Jan 01, 2020 7:16 pm

Re: better compression algorithm for saves and mods

Post by ptx0 »

Jap2.0 wrote:
Fri May 08, 2020 11:20 pm
ptx0 wrote:
Fri May 08, 2020 5:34 pm
saving can be set up to only occur on the server, it can even be non-blocking save. throughput is a non-issue.
* No non-blocking saving on Windows last I checked
* Throughput might not be an issue for non-blocking autosaves, but potentially multiple minutes is unacceptable if it's blocking.
* How long the saving takes directly impacts how long it will take to join a server - both for the initial wait, and the amount of catching up necessary.
well it doesn't exist on windows and yet the option exists for people who need it, just like compression could be. and in several situations, CPU time and wall clock time are cheaper than network bandwidth. I have data caps.

Jap2.0
Smart Inserter
Posts: 2333
Joined: Tue Jun 20, 2017 12:02 am

Re: better compression algorithm for saves and mods

Post by Jap2.0 »

ptx0 wrote:
Sat May 09, 2020 1:22 am
Jap2.0 wrote:
Fri May 08, 2020 11:20 pm
* No non-blocking saving on Windows last I checked
* Throughput might not be an issue for non-blocking autosaves, but potentially multiple minutes is unacceptable if it's blocking.
* How long the saving takes directly impacts how long it will take to join a server - both for the initial wait, and the amount of catching up necessary.
well it doesn't exist on windows and yet the option exists for people who need it, just like compression could be.
So can you tell me exactly when this is going to be default, optional, and not available? (Obviously decompression should be available everywhere.)

ptx0 wrote:
and in several situations, CPU time and wall clock time are cheaper than network bandwidth. I have data caps.
It's tradeoffs all the way down. How much time is it worth? A minute? Five? Ten? An hour? There's no way to say that one setting will be better or worse for everyone. Keep in mind that the longer you wait to load the save, the more catch-up data has to be sent over the network.
