Update, some clarification: Linux calls them huge pages; Windows mostly seems to call them large pages.
Hi,
After reading some benchmarks, I assume that Factorio's speed depends largely on memory performance. So I wanted to ask whether you could implement support for large pages. I assume it could improve Factorio's performance (maybe by about 10%), mostly because much more of the memory Factorio uses could then be covered by the TLB. There are some downsides, though:
- it is quite unusual to have large page support enabled under Windows
- the implementation might be operating system dependent and thus require more work
- if a system has already been running for some time, it might be impossible to allocate enough large pages
Regards,
Mimos
Large pages support
Moderator: ickputzdirwech
Last edited by Mimos on Mon Jul 17, 2017 10:06 am, edited 1 time in total.
- bobingabout
Re: Large pages support
You mean you want it to use more CPU cache?
Re: Large pages support
No, but something similar: I'd like it to make more efficient use of the TLB, which is a kind of cache that is already in use.
A short explanation: The TLB caches the mappings from virtual to physical addresses of memory pages. Usually the page size is 4 KiB; multiplied by a TLB size of (for example) 1536 entries, this covers 6 MiB of RAM. With large pages you get a 2 MiB page size -> 3 GiB covered by the TLB. Thus fewer lookups in the page tables (the address mapping tables) in RAM are needed. For each random memory access that can now be covered by the TLB, this saves roughly 50 ns = 200 clock cycles at 4 GHz (this estimate may be off by a factor of about 2 or even more, but a lot of instructions can still be executed in the time that is saved). Of course this depends on the hardware used, and the gain is also influenced by the number of random accesses Factorio needs. If Factorio already has good "cache locality" (I don't know a better word), the speedup will be smaller.
Some further reading:
https://msdn.microsoft.com/de-de/librar ... s.85).aspx
https://en.wikipedia.org/wiki/Translati ... ide_buffer
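The arithmetic above can be checked with a few lines of Python. This is only a sketch of the estimate in the post: the 1536-entry TLB, the 50 ns page-walk cost and the 4 GHz clock are the example figures from the text, not measurements, and the function names are made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of address space the TLB can map at once (entries * page size)."""
    return entries * page_size


def cycles_lost(latency_ns: float, clock_ghz: float) -> float:
    """Clock cycles that pass during a stall of latency_ns at clock_ghz."""
    return latency_ns * clock_ghz


ENTRIES = 1536  # example TLB size from the post, not a specific CPU

print(tlb_reach(ENTRIES, 4 * KIB) // MIB, "MiB covered with 4 KiB pages")
print(tlb_reach(ENTRIES, 2 * MIB) // GIB, "GiB covered with 2 MiB pages")
print(cycles_lost(50, 4.0), "cycles per avoided page-table lookup")
```

Note that this assumes the TLB holds the same number of entries for both page sizes, which is not true on every CPU.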
Re: Large pages support
Hopefully OP can clarify for himself, but most likely he's talking about Linux "hugepage" support, as described at https://www.kernel.org/doc/Documentatio ... lbpage.txt; afaict these are mostly consumed by databases, emulators, and other applications which like to gobble up large chunks of contiguous memory and do their own internal memory management.
Personally, I find this particular feature of Linux to be pretty confusing and don't see a ton of clear "for dummies"-style guides on how to use it properly. A couple of years ago I noticed my system would sometimes become extremely choppy and unresponsive under certain workloads. I eventually tracked this down to having the "magically performance enhancing" transparent hugepage kernel CONFIG_ frob enabled (I think this feature automagically allocates hugepages to applications which did not explicitly ask for them). Since then I've just left it off, but my system does still have ("non-transparent") hugepage support enabled, and apparently it is being used, as there are over 2000 2048k hugepages live in my system, despite the fact that my user account apparently does not have permission to mmap them.
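For anyone who wants to check the same thing on their own machine: the hugepage counters are exposed in /proc/meminfo. Below is a minimal Python sketch that pulls out the relevant fields. HugePages_Total, HugePages_Free, Hugepagesize and AnonHugePages are the field names Linux uses (the last one reports memory backed by transparent hugepages); the function names and the sample values are made up for illustration.

```python
WANTED = {"HugePages_Total", "HugePages_Free", "Hugepagesize", "AnonHugePages"}


def parse_hugepage_info(meminfo_text: str) -> dict:
    """Extract the huge page counters from /proc/meminfo-style text."""
    info = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and key.strip() in WANTED and fields:
            info[key.strip()] = int(fields[0])  # drop the trailing "kB" unit
    return info


def read_hugepage_info(path: str = "/proc/meminfo") -> dict:
    """Read the counters from the running system (Linux only)."""
    with open(path) as f:
        return parse_hugepage_info(f.read())


# Invented sample, roughly matching "over 2000 2048k hugepages".
SAMPLE = """\
MemTotal:       16308508 kB
AnonHugePages:         0 kB
HugePages_Total:    2048
HugePages_Free:     2048
Hugepagesize:       2048 kB
"""

info = parse_hugepage_info(SAMPLE)
reserved_mib = info["HugePages_Total"] * info["Hugepagesize"] // 1024
print(info)
print(reserved_mib, "MiB reserved for explicit huge pages")
```

On a real system, call read_hugepage_info() instead of parsing the sample. If HugePages_Free equals HugePages_Total, the pages are reserved but nothing has mapped them.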
Re: Large pages support
Hmm, I was aware that Linux mostly calls them huge pages, but I still went with large pages, because I thought that was their real name. After some more reading I think I was wrong, though. Thanks for the hint, I'll update my first post.
I've never tried transparent huge pages because I read about some issues with them. So I was hoping the devs could implement native support for huge pages to hopefully avoid these issues.
Re: Large pages support
Note that on x86_64 there is support for 4k, 2M and 1G pages. Allocating memory in chunks of 2MB seems totally reasonable for factorio. 1G pages might be a bit on the large side still.
Note that the TLB design differs from CPU to CPU. Some have dedicated entries per page size and some share entries. Some have entries for 4k pages and shared entries for 2M/1G pages.
The thing to remember is that entries are limited, and if you need more entries than there are, you get a TLB miss with the huge time penalty that involves. The goal is to minimize the number of misses. Given the usually limited number of 2M entries, a mixture of 2M and 4k pages is best. You want 2M pages when you have linear access or when the working set fits into the TLB. If you exceed the TLB size with random access then 4k pages are far better, simply because there are so many more entries for them that a hit becomes more likely.
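A rough way to play with this trade-off is a toy model in which every access hits a uniformly random address within the working set, so the chance of a TLB hit is simply the fraction of the working set the TLB can map. The Python sketch below is that model and nothing more: the entry counts are hypothetical, not taken from any particular CPU, and the function name is made up.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def expected_hit_rate(working_set: int, entries: int, page_size: int) -> float:
    """TLB hit probability for uniformly random accesses over working_set bytes."""
    reach = entries * page_size  # bytes the TLB can map at once
    return min(1.0, reach / working_set)


# Hypothetical TLB: many 4k entries, few 2M entries.
ENTRIES_4K = 1024
ENTRIES_2M = 32

for working_set in (32 * MIB, 1 * GIB):
    rate_4k = expected_hit_rate(working_set, ENTRIES_4K, 4 * KIB)
    rate_2m = expected_hit_rate(working_set, ENTRIES_2M, 2 * MIB)
    print(f"{working_set // MIB} MiB working set: 4k={rate_4k:.4f} 2M={rate_2m:.4f}")
```

In this simplified model only the total reach (entries times page size) matters, so it is worth plugging in the real entry counts of a given CPU. It does not capture skewed access patterns, where a few hot 4k pages take most of the accesses, nor CPUs that share entries between page sizes.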
Re: Large pages support
I wouldn't implement this.
On Windows, allocating large pages requires the SeLockMemoryPrivilege privilege (MSDN). In practice that means installing Factorio with an administrator account, creating a user account on the system, and adding that user account to the "Lock pages in memory" privilege in group policy (by default no one, not even administrators, has that privilege - and for good reason). Factorio would then need to run under that user account; but since save games are stored in per-user %appdata%, that account won't have read/write access to them, so NTFS permissions would need to be modified to make that happen.
Raymond Chen (MSFT) has more to say about what large pages mean for physical memory.
Also, on the Haswell architecture, there are 64, 32 and 4 TLB entries respectively for 4KB, 2MB and 1GB pages; so there's zero benefit there as well.