`zstd -1` on atlas caches could double even SSD read speed

Posted: Sat Sep 29, 2018 5:19 pm
by quyxkh
It occurred to me to check how zstd would do on the atlas caches; the vanilla one's about 3GB. `zstd -1` compresses at ~350MB/s on my 5-year-old midrange box to 22% of the original size, ~3GB to ~0.66GB, which is enough to help even SSDs some when writing. But the decompress performance is ridiculous: I get ~1GB/s decompression, so it takes less than three seconds to decompress it with hot caches and less than five seconds straight off my 150MB/s HDD. That'd cut cold-start time noticeably even on SSDs; on HDDs the difference would be really gratifying.
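
A rough sketch of that arithmetic in Python, in case anyone wants to plug in their own numbers. It assumes reading and decompression overlap perfectly (so the slower stage sets the total) and that the decompression rate is counted in uncompressed output bytes. The 500MB/s SSD figure is a hypothetical SATA-class number, not something measured here; the function and variable names are made up for illustration.

```python
def cold_load_seconds(raw_mb, stored_fraction, disk_mb_s, decompress_mb_s=None):
    """Estimate cold-start load time for the atlas cache, in seconds.

    Assumes disk reads and decompression are pipelined, so the slower of
    the two stages determines the total. decompress_mb_s is in MB of
    uncompressed output per second; None means the cache is stored raw.
    """
    read_s = raw_mb * stored_fraction / disk_mb_s
    if decompress_mb_s is None:
        return read_s
    decompress_s = raw_mb / decompress_mb_s
    return max(read_s, decompress_s)


RAW_MB = 3000        # ~3GB vanilla atlas cache (from the post)
FRACTION = 0.22      # zstd -1 output is ~22% of the original size (from the post)
DECOMP_MB_S = 1000   # ~1GB/s decompression (from the post)
HDD_MB_S = 150       # HDD read speed (from the post)
SSD_MB_S = 500       # hypothetical SATA SSD, an assumption

hdd_raw = cold_load_seconds(RAW_MB, 1.0, HDD_MB_S)
hdd_zstd = cold_load_seconds(RAW_MB, FRACTION, HDD_MB_S, DECOMP_MB_S)
ssd_raw = cold_load_seconds(RAW_MB, 1.0, SSD_MB_S)
ssd_zstd = cold_load_seconds(RAW_MB, FRACTION, SSD_MB_S, DECOMP_MB_S)

print(f"HDD: {hdd_raw:.1f}s raw vs {hdd_zstd:.1f}s compressed")
print(f"SSD: {ssd_raw:.1f}s raw vs {ssd_zstd:.1f}s compressed")
```

Under those assumptions the HDD case is disk-bound (about 4.4s instead of 20s), and the assumed SSD case becomes decompression-bound at about 3s instead of 6s, which is where the "double" in the title comes from.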

Re: `zstd -1` on atlas caches could double even SSD read speed

Posted: Wed Jul 10, 2019 11:40 am
by posila
In 0.17.56, there will be a config.ini setting "compress-sprite-atlas-cache". At the moment it uses compression level 1; we may change it to -1 in the future. Also, it doesn't utilize parallel compression/decompression, so I am not sure if it'll reach 350 MB/s compression and 1 GB/s decompression on your PC, but it seemed pretty fast on mine.

Re: `zstd -1` on atlas caches could double even SSD read speed

Posted: Wed Jul 10, 2019 5:26 pm
by quyxkh
The `zstd -1` was meant to be a command quote; that gets the zstd level 1 compression you implemented. _Thanks_! (And also, btw, thanks for whatever magic you worked that made loading the atlas from a hot OS buffer cache so close to instantaneous. That still gets a little internal smile of gratification when it hits.)

Re: `zstd -1` on atlas caches could double even SSD read speed

Posted: Fri Jul 12, 2019 7:17 pm
by dimm
The game now loads very quickly. Thanks!

Re: `zstd -1` on atlas caches could double even SSD read speed

Posted: Fri Jul 26, 2019 2:04 pm
by quyxkh
Lol just checked out the data rates on the new PCIe SSDs, they're stupid fast. Thanks for indulging us older-hw users on this.

Re: `zstd -1` on atlas caches could double even SSD read speed

Posted: Wed Jul 31, 2019 11:09 pm
by slippycheeze
quyxkh wrote:
Fri Jul 26, 2019 2:04 pm
Lol just checked out the data rates on the new PCIe SSDs, they're stupid fast. Thanks for indulging us older-hw users on this.
You are not wrong, but... to quote from Latency at a human scale -- which scales one CPU cycle to one second, and reflects modern system latency:

L3 cache is about one minute away from the CPU, main memory is about four minutes, and the super-fast Optane stuff? 15 minutes for persistent memory, 7 hours for the DC SSD version. 17-ish hours for the average NVMe SSD.

CPU power is vastly, vastly cheaper than any sort of I/O, to the point that as long as you don't need a huge dictionary, compressed data is almost always a performance win even in main memory, so long as you don't need random access inside pages.

Re: `zstd -1` on atlas caches could double even SSD read speed

Posted: Thu Aug 01, 2019 1:15 pm
by Darinth
slippycheeze wrote:
Wed Jul 31, 2019 11:09 pm
quyxkh wrote:
Fri Jul 26, 2019 2:04 pm
Lol just checked out the data rates on the new PCIe SSDs, they're stupid fast. Thanks for indulging us older-hw users on this.
You are not wrong, but... to quote from Latency at a human scale -- which scales one CPU cycle to one second, and reflects modern system latency:

L3 cache is about one minute away from the CPU, main memory is about four minutes, and the super-fast Optane stuff? 15 minutes for persistent memory, 7 hours for the DC SSD version. 17-ish hours for the average NVMe SSD.

CPU power is vastly, vastly cheaper than any sort of I/O, to the point that as long as you don't need a huge dictionary, compressed data is almost always a performance win even in main memory, so long as you don't need random access inside pages.
That's actually a really cool and useful way to try and explain the gap between the different components in a human-readable way. Thank you for providing that.

Re: `zstd -1` on atlas caches could double even SSD read speed

Posted: Wed Sep 04, 2019 4:16 am
by Vegemeister
posila wrote:
Wed Jul 10, 2019 11:40 am
In 0.17.56, there will be a config.ini setting "compress-sprite-atlas-cache". At the moment it uses compression level 1; we may change it to -1 in the future. Also, it doesn't utilize parallel compression/decompression, so I am not sure if it'll reach 350 MB/s compression and 1 GB/s decompression on your PC, but it seemed pretty fast on mine.
A key property of zstd is that decompression speed is practically independent of compression level. With single-threaded zstd on my machine (Haswell, 4.2 GHz), that looks like:

$ zstd -b1 -e10 atlas-cache.dat
 1#atlas-cache.dat   : 992084233 -> 276101141 (3.593), 579.7 MB/s ,1467.1 MB/s 
 2#atlas-cache.dat   : 992084233 -> 271624627 (3.652), 454.6 MB/s ,1433.9 MB/s 
 3#atlas-cache.dat   : 992084233 -> 267384342 (3.710), 300.7 MB/s ,1381.9 MB/s 
 4#atlas-cache.dat   : 992084233 -> 265075857 (3.743), 228.0 MB/s ,1352.3 MB/s 
 5#atlas-cache.dat   : 992084233 -> 260563757 (3.807),  88.0 MB/s ,1327.7 MB/s 
 6#atlas-cache.dat   : 992084233 -> 259328406 (3.826),  70.0 MB/s ,1299.0 MB/s 
 7#atlas-cache.dat   : 992084233 -> 256378987 (3.870),  55.4 MB/s ,1435.8 MB/s 
 8#atlas-cache.dat   : 992084233 -> 254566280 (3.897),  54.9 MB/s ,1411.8 MB/s 
 9#atlas-cache.dat   : 992084233 -> 253687259 (3.911),  41.4 MB/s ,1442.4 MB/s 
10#atlas-cache.dat   : 992084233 -> 249708698 (3.973),  34.7 MB/s ,1396.7 MB/s 
Dividing the decompression speed by the compression ratio gives the disk read speed needed to keep the CPU fed. For levels 1 through 10, in MB/s, I get:

 1: 408.32
 2: 392.63
 3: 372.48
 4: 361.29
 5: 348.75
 6: 339.52
 7: 371.01
 8: 362.28
 9: 368.81
10: 351.55

With the usual assumption that the cache is read much more often than it is written, it seems like you could go to an arbitrarily high compression level and still benefit HDD users. It probably shouldn't go higher than 3 or 4, though, to avoid irritating people with fast SSDs too much at cache creation time.
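
For anyone who wants to redo the division on their own benchmark run, here is a minimal Python sketch. The ratios and decompression speeds are copied from the `zstd -b1 -e10` output above; the 150 MB/s disk is the HDD figure quoted earlier in the thread, and the names `BENCH` and `required_read_speed` are just illustrative.

```python
# (level, compression ratio, decompression MB/s) from the benchmark output above
BENCH = [
    (1, 3.593, 1467.1),
    (2, 3.652, 1433.9),
    (3, 3.710, 1381.9),
    (4, 3.743, 1352.3),
    (5, 3.807, 1327.7),
    (6, 3.826, 1299.0),
    (7, 3.870, 1435.8),
    (8, 3.897, 1411.8),
    (9, 3.911, 1442.4),
    (10, 3.973, 1396.7),
]

HDD_MB_S = 150  # HDD read speed mentioned earlier in the thread


def required_read_speed(ratio, decompress_mb_s):
    """Compressed-data read rate (MB/s) needed to keep the decompressor busy.

    zstd's benchmark reports decompression speed in uncompressed bytes,
    so the compressed input rate is that speed divided by the ratio.
    """
    return decompress_mb_s / ratio


needs = [required_read_speed(ratio, decomp) for _, ratio, decomp in BENCH]

for (level, _, _), need in zip(BENCH, needs):
    bound = "disk-bound" if HDD_MB_S < need else "CPU-bound"
    print(f"{level:2d}: {need:7.2f} MB/s ({bound} at {HDD_MB_S} MB/s)")
```

Every level needs well over 150 MB/s of input, so on a disk like that the load stays disk-bound and a higher ratio just means fewer bytes to read.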