Since 0.17, I have been experiencing long application startup times. By long I mean a minute spent in "Loading sprites..." phase. I have worked it around by using the atlas cache (graphics/cache-sprite-atlas=true) but now I decided to come back to this issue to check what is going on.
I have profiled factorio using perf and it seems like most of the time is being spent in the kernel, in the page_fault routine. If perf backtraces are to be believed (and I am reading them correctly) these page faults come from the calculateSpriteCropRect function (inlined in AtlasSystem::loadSprites function). I have then generated another perf profile, this time to look for page faults explicitly.
As I am not sure if my perf profile contains sensitive data, I will just copy parts of it here (I can PM both perf reports, if desired):
CPU cycles perf report:
Code: Select all
Children Self Command Shared Object Symbol
- 94.06% 0.76% factorio factorio [.] AtlasSystem::loadSprites
- 93.30% AtlasSystem::loadSprites
- 91.38% calculateSpriteCropRect (inlined)
91.23% page_fault
+ 1.49% AtlasSystem::loadSprite
+ 0.76% 0xffffffffffffffff
+ 94.06% 0.00% factorio [unknown] [k] 0xffffffffffffffff
+ 94.06% 0.00% factorio factorio [.] GlobalContext::init
+ 94.06% 0.00% factorio factorio [.] AtlasSystem::build
+ 94.06% 0.00% factorio factorio [.] AtlasSystem::tryLoadSpritesWithFallbackToMinimalMode
- 92.13% 0.00% factorio factorio [.] calculateSpriteCropRect (inlined)
- calculateSpriteCropRect (inlined)
91.23% page_fault
+ 91.49% 90.98% factorio [kernel.kallsyms] [k] page_fault
1.70% 1.70% PreloadWorker factorio [.] preloadPng
Code: Select all
Children Self Command Shared Object Symbol
+ 88.29% 0.00% factorio [unknown] [.] 0xffffffffffffffff
+ 88.29% 0.00% factorio factorio [.] MainLoop::run
+ 88.29% 0.00% factorio factorio [.] GlobalContext::init
+ 88.29% 0.00% factorio factorio [.] AtlasSystem::build
+ 88.29% 76.13% factorio factorio [.] AtlasSystem::loadSprites
+ 88.29% 0.00% factorio factorio [.] AtlasSystem::tryLoadSpritesWithFallbackToMinimalMode
+ 75.85% 0.00% factorio factorio [.] calculateSpriteCropRect (inlined)
+ 9.13% 0.00% factorio factorio [.] AtlasSystem::loadSprite
+ 8.67% 8.67% factorio libc-2.29.so [.] __memmove_avx_unaligned_erms
+ 8.26% 0.00% factorio factorio [.] BitmapUtil::copyWithDuplicatedBorder (inlined)
+ 8.20% 0.00% factorio factorio [.] memcpy (inlined)
+ 8.15% 0.00% factorio factorio [.] operator() (inlined)
+ 7.82% 0.00% factorio factorio [.] BitmapUtil::copyWithDuplicatedBorderNoConversion
7.10% 7.10% PreloadWorker libc-2.29.so [.] __memmove_avx_unaligned_erms
3.41% 3.41% PreloadWorker factorio [.] preloadPng
Does MemoryBitmap by any chance hold huge amount of data? If so, you might want to enable transparent huge pages (THP), using the madvise() syscall [1]. I have performed an experiment and enabled THP on the whole system and it cut the factorio startup time in half (to half a minute). You can experiment with it yourself by writing (as the root) to the /sys/kernel/mm/transparent_hugepage/enabled file. By default (at least on my distro) THP is only enabled when the application asks for it using madvise(), by writing "always" to the sys file you can enable it on all of the memory.
(huge pages are also useful for reducing TLB misses)
Still, it is strange there are so many page faults involved. It might happen if there are a lot of allocations/deallocations and the memory is being returned to the kernel in the meantime.
[1] http://man7.org/linux/man-pages/man2/madvise.2.html