Caching for multiplayer map download

DarkShadow44
Filter Inserter
Posts: 358
Joined: Thu Jun 01, 2017 12:05 pm
Contact:

Re: Caching for multiplayer map download

Post by DarkShadow44 »

ssilk wrote: ↑Sun Nov 07, 2021 6:18 am More than 300 MB. And now when I think about it, it was just so slow when I updated the mods or a new version came out, so that the game needed to filter every single tile. In the normal case load took about 20 seconds, save perhaps 60. When updating it could take 2-10 minutes. And because I thought save cannot be faster than load I added this 2/3 ratio to the save. I should be more careful with those numbers. But what I want to say is that multiplayer is not the only problem when we talk about load/save.
Sure, but apart from not saving stuff, I don't think there's a lot to be done here. Although I admit I don't know how optimized save/load is. Maybe it could be made faster by using faster but less efficient compression. zstd for example has more options for very fast compression than zlib. For other code parts, I can't tell about optimization.
ssilk wrote: ↑Sun Nov 07, 2021 6:18 am About speculation what the best method is: useless discussion. :)
I repeat: it is much too complex to say method x is the best. To be sure we need to program a simulator which uses the internal routines, simulation of network bandwidth/errors, simulation of some hardware and need to think about how to divide the behavior of the simulation from the results. Fun for more than a year in a team of minimum 4 full-time developers. 8-)
Eh, for someone with source access it probably takes only a few weeks to whip something up and get some ballpark numbers. Unfortunately, I'm not part of that group, so I have to continue speculating :D
Tertius
Filter Inserter
Posts: 947
Joined: Fri Mar 19, 2021 5:58 pm
Contact:

Re: Caching for multiplayer map download

Post by Tertius »

How about incremental saves? If it's possible for the engine to mark a chunk as "modified" if there is a change in it, and the game saves the state of chunks independently, it might be possible to collect only changed chunks into a save.
The first save contains everything.
Subsequent saves contain the chunks modified since the previous save.
This results in a chain of one full save, then a number of incremental saves.
A new client downloads the whole chain and caches it.
If it disconnects and reconnects later, it needs to download only the incremental saves since his last download.
A save is generated as full, if more than 50% of the chunks are modified or if the added size of the incrementals of the current chain is bigger than the last full save, otherwise as incremental.

Depending on the complexity of a savefile, it might be possible to merge a chain to a new full save offline, so the server can host the current game state as full savegame (for new clients) as well as an endless chain of incrementals where the client only downloads the incrementals he is missing.
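To make the rules above concrete, here is a minimal sketch of the bookkeeping such a chain would need. Everything in it is an assumption for illustration: `SaveChain`, the save ids and the byte sizes are made up, and nothing here reflects how the game actually stores saves.

```python
class SaveChain:
    """One full save followed by incremental saves (hypothetical model)."""

    def __init__(self, total_chunks):
        self.total_chunks = total_chunks
        self.chain = []    # (save_id, kind, size_bytes); chain[0] is always full
        self.next_id = 0

    def next_kind(self, modified_chunks):
        if not self.chain:
            return "full"                      # the first save contains everything
        if modified_chunks > self.total_chunks / 2:
            return "full"                      # more than 50% of the chunks changed
        full_size = self.chain[0][2]
        incremental_size = sum(size for _, _, size in self.chain[1:])
        if incremental_size > full_size:
            return "full"                      # incrementals outgrew the full save
        return "incremental"

    def add_save(self, modified_chunks, size_bytes):
        kind = self.next_kind(modified_chunks)
        if kind == "full":
            self.chain = []                    # a full save starts a new chain
        self.chain.append((self.next_id, kind, size_bytes))
        self.next_id += 1
        return kind

    def missing_for(self, last_seen_id):
        """Saves a client must download, given the last save id it has cached."""
        if not self.chain:
            return []
        if last_seen_id is None or last_seen_id < self.chain[0][0]:
            return list(self.chain)            # new client, or its chain was replaced
        return [save for save in self.chain if save[0] > last_seen_id]


chain = SaveChain(total_chunks=100)
chain.add_save(modified_chunks=100, size_bytes=1000)   # id 0, full
chain.add_save(modified_chunks=10, size_bytes=100)     # id 1, incremental
chain.add_save(modified_chunks=5, size_bytes=50)       # id 2, incremental
```

A client that cached everything up to save 1 would call `chain.missing_for(1)` and only fetch save 2. A client whose cached chain has since been replaced by a new full save falls back to downloading the whole current chain.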
mrvn
Smart Inserter
Posts: 5925
Joined: Mon Sep 05, 2016 9:10 am
Contact:

Re: Caching for multiplayer map download

Post by mrvn »

Tertius wrote: ↑Sun Nov 07, 2021 3:20 pm How about incremental saves? If it's possible for the engine to mark a chunk as "modified" if there is a change in it, and the game saves the state of chunks independently, it might be possible to collect only changed chunks into a save.
The first save contains everything.
Subsequent saves contain the chunks modified since the previous save.
This results in a chain of one full save, then a number of incremental saves.
A new client downloads the whole chain and caches it.
If it disconnects and reconnects later, it needs to download only the incremental saves since his last download.
A save is generated as full, if more than 50% of the chunks are modified or if the added size of the incrementals of the current chain is bigger than the last full save, otherwise as incremental.

Depending on the complexity of a savefile, it might be possible to merge a chain to a new full save offline, so the server can host the current game state as full savegame (for new clients) as well as an endless chain of incrementals where the client only downloads the incrementals he is missing.
Already better covered by saving the last modified at <tick>.
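A rough sketch of that idea, assuming each chunk carries the tick of its last modification and the client remembers the tick of the state it has cached. The function name and the data layout are hypothetical, not anything from the game:

```python
def chunks_to_send(last_modified, client_tick):
    """Return positions of chunks changed after the tick the client last saw.

    last_modified maps a chunk position (x, y) to the tick of its last change.
    """
    return sorted(pos for pos, tick in last_modified.items() if tick > client_tick)


last_modified = {(0, 0): 120, (0, 1): 4500, (1, 0): 9000}
stale = chunks_to_send(last_modified, client_tick=4500)   # [(1, 0)]
```

The server keeps no chain of saves here. It only compares one number per chunk against the tick the client reports.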
ssilk
Global Moderator
Posts: 12889
Joined: Tue Apr 16, 2013 10:35 pm
Contact:

Re: Caching for multiplayer map download

Post by ssilk »

DarkShadow44 wrote: ↑Sun Nov 07, 2021 12:21 pm Maybe it could be made faster by using faster but less efficient compression. zstd for example has more options for very fast compression than zlib. For other code parts, I can't tell about optimization.
We have discussed that already. viewtopic.php?f=6&t=34273
Short answer: no.
I have to continue speculating :D
You can do that as long as this forum exists and has space left to store it. 8-)


BTW I found some old article viewtopic.php?p=169194#p169194 , where I suggested to use delta to store the diffs between saves.
ssilk wrote: ↑Wed Jun 15, 2016 5:12 pm I see here the usage of a repository and a version control system.

I try to explain. :)

You can diff two saves. There are programs like xdelta (see viewtopic.php?f=6&t=23014 ) that enable saving the difference between two saves, which is much smaller than the full save but depends logically on a previous full save.

The dependencies of the save look then like so:

Code: Select all

1st Full Save                                      |
   |                                               | time
   v                                               v
2nd Full Save  -
   |            |
   |            v
   |         2nd Full Save XDelta version #1
   |            |
   |            v
   |         2nd Full Save XDelta version #2
   |
   v
3rd Full Save
Now the cool thing (and side-idea): the xdeltas can be used to synchronize multiplayer games: the new player loads a full save while the others are playing. And then the xdeltas...

^-^
Can be used as Tertius suggested.

That post points also to another thread
viewtopic.php?f=66&t=23014 More efficient multiplayer map download
DarkShadow44
Filter Inserter
Posts: 358
Joined: Thu Jun 01, 2017 12:05 pm
Contact:

Re: Caching for multiplayer map download

Post by DarkShadow44 »

ssilk wrote: ↑Mon Nov 08, 2021 5:09 am BTW I found some old article viewtopic.php?p=169194#p169194 , where I suggested to use delta to store the diffs between saves.
This would help keep save sizes low, but wouldn't make saving/loading any faster. If anything, it would add overhead.
ssilk wrote: ↑Mon Nov 08, 2021 5:09 am
Now the cool thing (and side-idea): the xdeltas can be used to synchronize multiplayer games: the new player loads a full save while the others are playing. And then the xdeltas...
That's already covered with the checksum/tick diff idea, which would probably work a lot better. No need to make sure all players have the exact same deltas.
ssilk wrote: ↑Mon Nov 08, 2021 5:09 am That post points also to another thread
viewtopic.php?f=66&t=23014 More efficient multiplayer map download
That idea is not really related to my idea here, since it doesn't help the player get faster into game - it just allows others to continue playing. Which we have anyways, during the download players on the server can keep playing, no? Afterwards the new player catches up.
SoShootMe
Filter Inserter
Posts: 517
Joined: Mon Aug 03, 2020 4:16 pm
Contact:

Re: Caching for multiplayer map download

Post by SoShootMe »

ssilk wrote: ↑Mon Nov 08, 2021 5:09 am BTW I found some old article viewtopic.php?p=169194#p169194 , where I suggested to use delta to store the diffs between saves.
...
That post points also to another thread
viewtopic.php?f=66&t=23014 More efficient multiplayer map download
Prompted by the current thread, I tried something similar to your second link, using two saves of the same game about an hour apart, where I did very little (no exploration, basically just bots repairing/building a few hundred entities) but of course there was mining and production going on - around 200k/m each of iron and copper ore.

For each save, I unzipped the file, replaced the level.datN files with decompressed versions and created an uncompressed tar of the result with the level.datN files in order (the best, simple thing I could think of to maximise similarity). Then I used xdelta3 to generate a binary diff of the two tar files, compressed with gzip -9.
The result in my case was about half the size of each of the original saves but it didn't have a particularly high proportion of explored but unused chunks, a map like that would probably do better.

In reality however I think it's more practical to work more like rsync, so that it's not necessary for the server to have the exact client's state as xdelta would require to generate the diff. But using rsync --only-write-batch against the tar files above, most of the data was literal. The size of the batch should roughly reflect the amount of data that would be transferred between the Factorio server and client to bring the client up-to-date but even heavily compressed, it was around 75% of the original save size.
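To illustrate what "literal" means here, below is a heavily simplified sketch of rsync-style block matching. It is not rsync's actual algorithm: real rsync pairs a rolling weak checksum with a strong one, while this version simply hashes every window with SHA-256, which is slow but easy to follow. The function name and the tiny block size are assumptions for the example.

```python
import hashlib


def literal_fraction(old: bytes, new: bytes, block: int = 4) -> float:
    """Fraction of `new` that must be sent literally, given the receiver has `old`."""
    # Hashes of the fixed-size blocks the receiver already holds.
    known = {
        hashlib.sha256(old[i:i + block]).digest()
        for i in range(0, len(old) - block + 1, block)
    }
    literal = 0
    i = 0
    while i < len(new):
        window = new[i:i + block]
        if len(window) == block and hashlib.sha256(window).digest() in known:
            i += block       # matched: only a block reference is sent
        else:
            literal += 1     # unmatched byte: sent as literal data
            i += 1
    return literal / len(new) if new else 0.0


old = b"AAAABBBBCCCCDDDD"
new = b"AAAAxBBBBCCCCDDDD"                # one byte inserted after the first block
shifted = literal_fraction(old, new)      # 1 literal byte out of 17
```

An insertion that shifts everything behind it still matches, because the window slides byte by byte. What defeats the matching is a small change inside nearly every block, which would be consistent with most of the batch being literal.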

As a result I think a worthwhile benefit may require either the data format to change to enable greater efficiency with a generic tool like rsync or (as was suggested previously) a Factorio-specific implementation using a similar concept. I think the former would have performance implications, at least, and the latter probably a lot of work, so I wouldn't hold my breath for either. Even then it may not be a huge benefit; apart from any computational overhead, it depends on relative amount of changed vs unchanged data - tile types and locations of placed entities don't change much, but pretty much all other state (by size) is unlikely to be the same between a client disconnecting and rejoining a game on a server.
DarkShadow44
Filter Inserter
Posts: 358
Joined: Thu Jun 01, 2017 12:05 pm
Contact:

Re: Caching for multiplayer map download

Post by DarkShadow44 »

SoShootMe wrote: ↑Mon Nov 08, 2021 1:24 pm As a result I think a worthwhile benefit may require either the data format to change to enable greater efficiency with a generic tool like rsync or (as was suggested previously) a Factorio-specific implementation using a similar concept. I think the former would have performance implications, at least, and the latter probably a lot of work, so I wouldn't hold my breath for either.
I don't think a generic solution could be as good as a specific implementation - there's a lot of details we know about chunks that rsync can't know about. Why do you think it would be a lot of work? I don't think it should be too much effort. The implementation I mean, there'd be a need for rigorous testing in both cases, I think.
SoShootMe wrote: ↑Mon Nov 08, 2021 1:24 pm Even then it may not be a huge benefit; apart from any computational overhead, it depends on relative amount of changed vs unchanged data - tile types and locations of placed entities don't change much, but pretty much all other state (by size) is unlikely to be the same between a client disconnecting and rejoining a game on a server.
Entities change properties though. That's why I proposed in the OP to only send non-factory data.
mrvn
Smart Inserter
Posts: 5925
Joined: Mon Sep 05, 2016 9:10 am
Contact:

Re: Caching for multiplayer map download

Post by mrvn »

Here is another thing to consider: Chunks with entities change a lot. Basically every tick for most of them. But that doesn't mean everything in the chunk changes.

There are a lot of things in entities that generally remain constant. Like the inserter has a pickup and dropoff location. Belts have a direction. Assemblers have a recipe. Chests have a reserved space. Cargo wagons have item filters. Combinators have settings. All of those can change but generally once set up they don't. But if they are mixed in with all the other data, like the hand content of an inserter, they will have to be transmitted. Checksumming or last changed tick won't help there.

So data should be split in a few categories. Something like:

- landscape data - changed by on_chunk_generated event, landfill, waterfill
- decorations, tiles, cliffs, entity structures, resources
- entity configuration and settings
- entity state, resource counts

The landscape mostly doesn't change at all while the entity state would change tick to tick. If they are compared separately then more unchanged data can be detected. So maybe chunks shouldn't have one checksum but 4 cheap ones with a final secure checksum for all data to catch collisions on the cheap checksums.
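As a sketch of how four cheap checksums plus one secure checksum per chunk could work: CRC32 stands in for the cheap checksum and SHA-256 for the secure one. Both choices, the category names and the byte strings are assumptions for illustration only.

```python
import hashlib
import zlib

# Hypothetical names for the four categories listed above.
CATEGORIES = ("landscape", "structures", "configuration", "state")


def chunk_digest(chunk: dict) -> dict:
    """Cheap per-category checksums plus one secure checksum over all data."""
    cheap = {name: zlib.crc32(chunk[name]) for name in CATEGORIES}
    secure = hashlib.sha256()
    for name in CATEGORIES:
        data = chunk[name]
        secure.update(len(data).to_bytes(8, "big"))   # length prefix keeps boundaries unambiguous
        secure.update(data)
    return {"cheap": cheap, "secure": secure.hexdigest()}


def categories_to_send(server_chunk: dict, client_chunk: dict) -> list:
    server, client = chunk_digest(server_chunk), chunk_digest(client_chunk)
    changed = [n for n in CATEGORIES if server["cheap"][n] != client["cheap"][n]]
    if not changed and server["secure"] != client["secure"]:
        return list(CATEGORIES)    # a cheap checksum collided: resend everything
    return changed


server_chunk = {
    "landscape": b"grass,grass,water",
    "structures": b"tree@3,4;iron-ore@5,5",
    "configuration": b"assembler:recipe=gear",
    "state": b"inserter:hand=iron-plate",
}
client_chunk = dict(server_chunk, state=b"inserter:hand=empty")
to_send = categories_to_send(server_chunk, client_chunk)   # ['state']
```

In a real exchange the client would send its digests rather than its data. Both sides are computed locally here only to keep the example self-contained.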
DarkShadow44
Filter Inserter
Posts: 358
Joined: Thu Jun 01, 2017 12:05 pm
Contact:

Re: Caching for multiplayer map download

Post by DarkShadow44 »

mrvn wrote: ↑Mon Nov 08, 2021 3:19 pm There are a lot of things in entities that generally remain constant. Like the inserter has a pickup and dropoff location. Belts have a direction. Assemblers have a recipe. Chests have a reserved space. Cargo wagons have item filters. Combinators have settings. All of those can change but generally once set up they don't. But if they are mixed in with all the other data, like the hand content of an inserter, they will have to be transmitted. Checksumming or last changed tick won't help there.
Possibly, though I didn't want to go into so much detail.
I did two short tests with a megabase, measuring the difference between the normal map and having all entities deleted: In short, factory entities don't seem too big, although we could still save a bit. But you need to keep in mind the overhead from the checksums. And that, when you remove data from that save, it could compress worse so you get different results there.
Though I don't know how easy it would be to separate different details from different entities... Taking out trees/tiles should be comparatively easy, while each factory entity would need special handling. To me it doesn't seem worth the effort, maybe as a second step after initial diffs.
mrvn
Smart Inserter
Posts: 5925
Joined: Mon Sep 05, 2016 9:10 am
Contact:

Re: Caching for multiplayer map download

Post by mrvn »

Separating entity main data and state might not yield that much. But separating them from the landscape and tiles should yield quite a lot.

But let's say you reconnect after a desync, which is probably the most annoying for all. You've got all the landscape and tiles already. So you only need the entities and items. If we can halve the remaining data for the entities that would be worth it.

As for your test base: That seems like a bad test. The map is all blank and should compress really well. So I'm not even going to ask how much was left after removing those 10/30MB entities.
DarkShadow44
Filter Inserter
Posts: 358
Joined: Thu Jun 01, 2017 12:05 pm
Contact:

Re: Caching for multiplayer map download

Post by DarkShadow44 »

mrvn wrote: ↑Tue Nov 09, 2021 1:13 am Separating entity main data and state might not yield that much. But separating them from the landscape and tiles should yield quite a lot.
Yes, but wasn't that my idea from the start?
mrvn wrote: ↑Tue Nov 09, 2021 1:13 am As for your test base: That seems like a bad test. The map is all blank and should compress really well. So I'm not even going to ask how much was left after removing those 10/30MB entities.
Why is that a bad test? Of course landscape data doesn't use much data in those maps, that's the whole point. The variation due to landscape data should be minimal.
SoShootMe
Filter Inserter
Posts: 517
Joined: Mon Aug 03, 2020 4:16 pm
Contact:

Re: Caching for multiplayer map download

Post by SoShootMe »

DarkShadow44 wrote: ↑Mon Nov 08, 2021 2:10 pm I don't think a generic solution could be as good as a specific implementation - there's a lot of details we know about chunks that rsync can't know about. Why do you think it would be a lot of work? I don't think it should be too much effort. The implementation I mean, there'd be a need for rigorous testing in both cases, I think.
Yes, a specific implementation would be better but short of encoding a chunk as a diff from when it was first generated (probably slower than sending the data), I think if you get the serialisation "right", rsync might be quite close.

You may be right on the amount of work; I was including testing but either way that ought to be the majority. But by that I mean I might have underestimated the work for changing serialisation to better suit eg rsync, rather than overestimating the work for a specific implementation.
DarkShadow44 wrote: ↑Mon Nov 08, 2021 2:10 pm Entities change properties though. That's why I proposed in the OP to only send non-factory data.
The fact that some things change frequently was essentially my point: the potential benefit is limited to the proportion of total size that is due to data that haven't changed (ie typically most of the rarely changing data, and little/none of the frequently changing data). The lower that is (also the less it is reduced and the faster a client's download speed), the less the benefit is.

The proportion will obviously vary but after a bit more experimentation I think it is likely to be at least 60% in a real game, usually more and especially so for a sparsely populated map, so I reckon it is worthwhile.
mrvn wrote: ↑Mon Nov 08, 2021 3:19 pm There are a lot of things in entities that generally remain constant. Like the inserter has a pickup and dropoff location. Belts have a direction. Assemblers have a recipe. Chests have a reserved space. Cargo wagons have item filters. Combinators have settings. All of those can change but generally once set up they don't. But if they are mixed in with all the other data, like the hand content of an inserter, they will have to be transmitted.
Yeah, my hunch is the frequently and rarely changing data of entities (perhaps also things like tile type and resource amount) are mixed together when serialised, and with rsync that means no matches. Two passes, one to serialise the rarely changing properties, and another for the frequently changing ones, might work well (basically an extension of the original idea) so that data likely to be the same and likely to be different are grouped together. That requires deciding which properties should be serialised in each pass (possibly none in one of them).
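A toy sketch of the two-pass idea, using JSON as a stand-in for the real (unknown) save format. Which properties go in which pass is purely an assumption here, as are the entity fields themselves.

```python
import json

# Hypothetical split of entity properties between the two passes.
RARELY_CHANGING = ("name", "position", "direction", "recipe")
FREQUENTLY_CHANGING = ("hand", "progress")


def serialise_mixed(entities):
    """Single pass: every entity's properties are written together."""
    return b"".join(json.dumps(e, sort_keys=True).encode() for e in entities)


def serialise_two_pass(entities):
    """Two passes: rarely changing data first, frequently changing data second."""
    def one_pass(keys):
        return b"".join(
            json.dumps({k: e[k] for k in keys if k in e}, sort_keys=True).encode()
            for e in entities
        )
    return one_pass(RARELY_CHANGING), one_pass(FREQUENTLY_CHANGING)


before = [
    {"name": "inserter", "position": [1, 2], "direction": 4, "hand": "iron-plate"},
    {"name": "assembler", "position": [3, 4], "recipe": "gear", "progress": 0.25},
]
after = [
    {"name": "inserter", "position": [1, 2], "direction": 4, "hand": ""},
    {"name": "assembler", "position": [3, 4], "recipe": "gear", "progress": 0.75},
]

static_before, dynamic_before = serialise_two_pass(before)
static_after, dynamic_after = serialise_two_pass(after)
static_unchanged = static_before == static_after   # True: this part could be skipped
```

With the mixed layout every entity record differs between the two snapshots. With two passes the first stream is byte-identical, so a block-matching tool (or a plain checksum) can skip it entirely.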
DarkShadow44
Filter Inserter
Posts: 358
Joined: Thu Jun 01, 2017 12:05 pm
Contact:

Re: Caching for multiplayer map download

Post by DarkShadow44 »

SoShootMe wrote: ↑Tue Nov 09, 2021 12:36 pm Yes, a specific implementation would be better but short of encoding a chunk as a diff from when it was first generated (probably slower than sending the data), I think if you get the serialisation "right", rsync might be quite close.
Not sure, I guess I'll run a few tests on the current save format myself first ;)
Small caveat: You'd need another dependency, while the other approach would be self-contained.
SoShootMe wrote: ↑Tue Nov 09, 2021 12:36 pm The fact that some things change frequently was essentially my point: the potential benefit is limited to the proportion of total size that is due to data that haven't changed (ie typically most of the rarely changing data, and little/none of the frequently changing data). The lower that is (also the less it is reduced and the faster a client's download speed), the less the benefit is.

The proportion will obviously vary but after a bit more experimentation I think it is likely to be at least 60% in a real game, usually more and especially so for a sparsely populated map, so I reckon it is worthwhile.
How did you come to that conclusion? For me, entity data is always a small part. Tested by checking save size normally vs all entities deleted. Although I won't exclude the possibility of having done something wrong here.
SoShootMe wrote: ↑Tue Nov 09, 2021 12:36 pm Yeah, my hunch is the frequently and rarely changing data of entities (perhaps also things like tile type and resource amount) are mixed together when serialised, and with rsync that means no matches. Two passes, one to serialise the rarely changing properties, and another for the frequently changing ones, might work well (basically an extension of the original idea) so that data likely to be the same and likely to be different are grouped together. That requires deciding which properties should be serialised in each pass (possibly none in one of them).
That's probably right. Hard to tell how well it would work though. Would love to play around with that idea using the real code. Anyone made a working save editor yet? :lol:
SoShootMe
Filter Inserter
Posts: 517
Joined: Mon Aug 03, 2020 4:16 pm
Contact:

Re: Caching for multiplayer map download

Post by SoShootMe »

DarkShadow44 wrote: ↑Tue Nov 09, 2021 2:35 pm
SoShootMe wrote: ↑Tue Nov 09, 2021 12:36 pm The proportion [of total size that is due to data that haven't changed] will obviously vary but after a bit more experimentation I think it is likely to be at least 60% in a real game, usually more and especially so for a sparsely populated map, so I reckon it is worthwhile.
How did you come to that conclusion? For me, entity data is always a small part. Tested by checking save size normally vs all entities deleted. Although I won't exclude the possibility of having done something wrong here.
I wrote "at least 60%" :). That's a bit less than the lowest I found, by doing exactly what you describe (I can't see a better way except for reverse engineering the file format).
mrvn
Smart Inserter
Posts: 5925
Joined: Mon Sep 05, 2016 9:10 am
Contact:

Re: Caching for multiplayer map download

Post by mrvn »

DarkShadow44 wrote: ↑Tue Nov 09, 2021 10:26 am
mrvn wrote: ↑Tue Nov 09, 2021 1:13 am Separating entity main data and state might not yield that much. But separating them from the landscape and tiles should yield quite a lot.
Yes, but wasn't that my idea from the start?
mrvn wrote: ↑Tue Nov 09, 2021 1:13 am As for your test base: That seems like a bad test. The map is all blank and should compress really well. So I'm not even going to ask how much was left after removing those 10/30MB entities.
Why is that a bad test? Of course landscape data doesn't use much data in those maps, that's the whole point. The variation due to landscape data should be minimal.
Because it doesn't allow comparing the size of the entity data to the rest of the save. Is the landscape below those entities 1MB or 100MB? That makes a big difference.
DarkShadow44
Filter Inserter
Posts: 358
Joined: Thu Jun 01, 2017 12:05 pm
Contact:

Re: Caching for multiplayer map download

Post by DarkShadow44 »

mrvn wrote: ↑Tue Nov 09, 2021 9:18 pm Because it doesn't allow comparing the size of the entity data to the rest of the save. Is the landscape below those entities 1MB or 100MB? That makes a big difference.
You mean because of the uncertainties of compression?
Apart from that, it allows comparing the "save with entities" to the "save without entities", and the difference should be the size of the entity data.
mrvn
Smart Inserter
Posts: 5925
Joined: Mon Sep 05, 2016 9:10 am
Contact:

Re: Caching for multiplayer map download

Post by mrvn »

DarkShadow44 wrote: ↑Tue Nov 09, 2021 9:24 pm
mrvn wrote: ↑Tue Nov 09, 2021 9:18 pm Because it doesn't allow comparing the size of the entity data to the rest of the save. Is the landscape below those entities 1MB or 100MB? That makes a big difference.
You mean because of the uncertainties of compression?
Apart from that, it allows comparing the "save with entities" to the "save without entities", and the difference should be the size of the entity data.
Which is a useless value without something to compare it to. 10MB in a 20MB save is a lot. 10MB in a 300MB save is not.
DarkShadow44
Filter Inserter
Posts: 358
Joined: Thu Jun 01, 2017 12:05 pm
Contact:

Re: Caching for multiplayer map download

Post by DarkShadow44 »

mrvn wrote: ↑Tue Nov 09, 2021 9:50 pm Which is a useless value without something to compare it to. 10MB in a 20MB save is a lot. 10MB in a 300MB save is not.
When the entities of a huge base barely exceed 30MB, you can assume that a small factory occupies proportionally less space. It follows that the rest of the save data is occupied by non-factory parts.
I.e. take a 300MB size save with a factory half as big as from the map I linked, and about 285MB will be non-factory data. How's that useless?
Since we only talk about big saves here (at least I do), the ratio of factory data to other data is small.
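Spelled out as arithmetic, under the assumption made above that entity data scales roughly with factory size (the 30MB and 300MB figures are the ones from this thread, the rest is just subtraction):

```python
megabase_entities_mb = 30                       # entity data of the tested megabase
half_size_factory_mb = megabase_entities_mb / 2

save_mb = 300
non_factory_mb = save_mb - half_size_factory_mb     # 285.0
factory_share = half_size_factory_mb / save_mb      # 0.05, i.e. 5% of the save
```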
mrvn
Smart Inserter
Posts: 5925
Joined: Mon Sep 05, 2016 9:10 am
Contact:

Re: Caching for multiplayer map download

Post by mrvn »

DarkShadow44 wrote: ↑Tue Nov 09, 2021 10:32 pm
mrvn wrote: ↑Tue Nov 09, 2021 9:50 pm Which is a useless value without something to compare it to. 10MB in a 20MB save is a lot. 10MB in a 300MB save is not.
When the entities of a huge base barely exceed 30MB, you can assume that a small factory occupies proportionally less space. It follows that the rest of the save data is occupied by non-factory parts.
I.e. take a 300MB size save with a factory half as big as from the map I linked, and about 285MB will be non-factory data. How's that useless?
Since we only talk about big saves here (at least I do), the ratio of factory data to other data is small.
I have no idea how big a save with actual terrain features would be for the factory you tested. You didn't even mention how big the save is, only that entities are 10/30 MB.
DarkShadow44
Filter Inserter
Posts: 358
Joined: Thu Jun 01, 2017 12:05 pm
Contact:

Re: Caching for multiplayer map download

Post by DarkShadow44 »

mrvn wrote: ↑Wed Nov 10, 2021 9:54 am I have no idea how big a save with actual terrain features would be for the factory you tested. You didn't even mention how big the save is, only that entities are 10/30 MB.
As I noted, it doesn't matter how big the save is. Didn't I just explain that?