Costly `FixedPointNumberTemplate<int, 8u>::getDouble() const`
-
- Fast Inserter
- Posts: 195
- Joined: Sun Oct 05, 2014 6:12 am
- Contact:
Costly `FixedPointNumberTemplate<int, 8u>::getDouble() const`
Greets, been having some UPS issues in a bug-heavy game (with mods like rampant, but script times are remaining sub-1.2ms consistently), the in-game timings show that the Entity Update times are (depending on what the bugs are doing) between 15 to 35ms. I took a profile of the game while it was running and nicely enough a dwarf debug section exists in the executable so I got a nice mapping for the report. What I found was that the bugs are indeed doing a lot of work, over 47% of the total average frametime by themselves across both the bugs and their spawners (both are quite heavy in surprisingly roughly equal measures). However, among a lot of the calculations there is a single function called from multiple callstacks among the profile and this function by itself across all the calls is taking up ~16.2% of the total frametime, and that is `FixedPointNumberTemplate<int, 8u>::getDouble() const`. Now among my normal programming tasks I'd first look into seeing if anything is easily optimizable within it, perhaps a different algorithm (if my hunch about what it does based on its type and name is), and secondly might look into seeing how well it could be SIMD'd or so across what calls it (if it needs to be called so often to begin with), the latter would involve looking in a lot more places and the source code isn't open, so would it instead be possible to at least get the source for the `FixedPointNumberTemplate<int, 8u>::getDouble() const` (and whatever other potentially related functions if any) to see if I can work on optimizing it any as I get time? If there are tests for it I would like to see those as well to confirm any changes else I can write some.
Also, I'm curious, do the template arguments imply it's a 4-byte/32-bit `int` sized fixed point number with 8 after-fixed-point binary digits, and does its use really help over just using a 32-bit floating point type (especially since it will lack a lot of the most useful SIMD optimizations especially after it's more easily inlined as well), or is it primarily to use integer values so as to make the simulation more easily consistent (which floats can do as well, though are indeed more difficult to do right, especially on older hardware)?
Also, I'm curious, do the template arguments imply it's a 4-byte/32-bit `int` sized fixed point number with 8 after-fixed-point binary digits, and does its use really help over just using a 32-bit floating point type (especially since it will lack a lot of the most useful SIMD optimizations especially after it's more easily inlined as well), or is it primarily to use integer values so as to make the simulation more easily consistent (which floats can do as well, though are indeed more difficult to do right, especially on older hardware)?
Re: Costly `FixedPointNumberTemplate<int, 8u>::getDouble() const`
Code: Select all
template<class T, uint32_t PrecisionBits>
class FixedPointNumberTemplate
{
// ...
static constexpr T ValuesPerOne = T(1) << PrecisionBits;
// ...
constexpr double getDouble() const { constexpr double ValuePerIncrement = 1.0 / ValuesPerOne; return double(this->value) * ValuePerIncrement; }
// ...
};
EDIT: As for second part of the question ... fixed point number is used for position coordinates in order to have the same precision everywhere on the map.
-
- Fast Inserter
- Posts: 195
- Joined: Sun Oct 05, 2014 6:12 am
- Contact:
Re: Costly `FixedPointNumberTemplate<int, 8u>::getDouble() const`
Even if it is inlined the DWARF section in the file (on linux) should still attribute it correctly as long as the code exists and didn't get split up, but it is indeed quite possible.posila wrote: ↑Fri Jan 14, 2022 9:14 am On Windows (so compiled by MSVC), it ends up being movd + cvtdq2pd + mulsd and is inlined everywhere I looked. Which makes me think your profiler is incorrectly attributing instructions to getDouble() as it can't figure out what belongs to what after inlining and link time optimizations.
If that's the case then the same still holds for
Code: Select all
MapPosition::distanceSquared(MapPosition const&)
Code: Select all
FixedPointNumberTemplate<int, 8u>::getDouble() const
Code: Select all
FixedPointNumberTemplate<int, 8u>::getDouble() const
Code: Select all
Surface::listFriendlyEntities(MapPosition const&, ForceID, double, std::vector<EntityWithForce*>&, bool)
Code: Select all
Surface::listFriendlyUnitsMapPosition const&, ForceID, double, std::vector<EntityWithForce*>&)
Hmm, actually I have a few minutes here, let me ssh home and run it remotely right quick so I can disassemble it properly...
Okay, so yes the
Code: Select all
MapPosition::distanceSquared(MapPosition const&)
Code: Select all
FixedPointNumberTemplate<int, 8u>::getDouble() const
Code: Select all
Surface::listFriendlyEntities(MapPosition const&, ForceID, double, std::vector<EntityWithForce*>&, bool)
Code: Select all
Surface::listFriendlyEntities(MapPosition const&, ForceID, double, std::vector<EntityWithForce*>&, bool)
I've uploaded the 2.4gigs of 30 seconds of `perf` dump (as a highly compressed 21 meg file for ease of downloading) if you want to look at it (if you aren't familiar with perf dumps then there are a few useful GUI's for viewing its data dumps, like hotspot or so, though most IDE's are capable of viewing the data directly in very useful forms): https://overminddl1.com/Factorio/perf/p ... io.data.xz
Although you no doubt can get it yourself, I also uploaded the disassembly of that function (just from gdb, haven't loaded it up into IDA or anything like that yet) in text form to: https://overminddl1.com/Factorio/perf/S ... tities.txt
As you can see in the `perf` dump the
Code: Select all
Surface::listFriendlyEntities(MapPosition const&, ForceID, double, std::vector<EntityWithForce*>&, bool)
Code: Select all
MapPosition::distanceSquared(MapPosition const&)
Code: Select all
FixedPointNumberTemplate<int, 8u>::getDouble() const
Code: Select all
distanceSquared
As an aside, the LUA GC is surprisingly heavy, and not doing much cleanup, I'm guessing it's just going over all the script objects from the mods but it's not finding much to clean up as not a whole lot of garbage is being made (but a lot of things are being cached and held purposefully in those mods). It's taking consistently well over 3ms of runtime per frame compared to the scripts execution taking 0.7ms to 1.2ms, but unsure how much that could be helped without turning down how much is incrementally GC'd each frame without more fundamental structural changes.
-
- Fast Inserter
- Posts: 195
- Joined: Sun Oct 05, 2014 6:12 am
- Contact:
Re: Costly `FixedPointNumberTemplate<int, 8u>::getDouble() const`
Interestingly GCC 11.2 compiles it to:
Code: Select all
pxor xmm0, xmm0
cvtsi2sd xmm0, edi
mulsd xmm0, QWORD PTR .LC0[rip]
.LC0:
.long 0
.long 1064304640
Code: Select all
.LCPI0_0:
.quad 0x3f70000000000000 # double 0.00390625
cvtsi2sd xmm0, edi
mulsd xmm0, qword ptr [rip + .LCPI0_0]
Re: Costly `FixedPointNumberTemplate<int, 8u>::getDouble() const`
well, GCC, Clang, and MSVC all have very different compiler optimisations they're applied.OvermindDL1 wrote: ↑Fri Jan 14, 2022 5:04 pm EDIT: I wonder why MSVC is generating the packed 64-bit instructions instead of using the scalar 32-bit instructions like gcc and clang are, it could get rid of that useless movd if so, that seems like a bug in MSVC... The gcc version that factorio was compiled with for linux appears to use the scalar instruction properly so it doesn't have to do the useless movd, however it also seems like it's doing a useless pxor instead, seems like only clang gets it right... Lol.
which mtune, O-level, or other CXXFLAGS are you using?
Re: Costly `FixedPointNumberTemplate<int, 8u>::getDouble() const`
I can't download https://overminddl1.com/Factorio/perf/p ... io.data.xz, I get 403 Forbidden.
But, save would be more useful, so we can profile ourselves with tools that we are used to using and can experiment how changes affect the performance.
Anyway, listFriendlyEntities iterates over linked list of entities with force on a chunk, so distanceSquared() is working with values that are unlikely to be in CPU cache. But traversal of the linked list has the same problem, unless the CPU sees the pattern of reading position and then next pointer and prefetches the next pointer while waiting to get the position too.
Posting save to 17501 would be helpful, because in 1.1.51 there is a refactoring, that added here one more layer of indirection, so it is potentially lot slower.
-
- Fast Inserter
- Posts: 195
- Joined: Sun Oct 05, 2014 6:12 am
- Contact:
Re: Costly `FixedPointNumberTemplate<int, 8u>::getDouble() const`
Purely just -O3 for this test, nothing fancy.
Oh bugger all, didn't have xz files in the allow list, fixed.posila wrote: ↑Fri Jan 14, 2022 6:50 pmI can't download https://overminddl1.com/Factorio/perf/p ... io.data.xz, I get 403 Forbidden.
That it would, uploaded the latest multiplayer save (thankfully no where near as big as I was worried that save was going to be, only 69.3 megs) to: https://overminddl1.com/Factorio/perf/OverSE-E1-22.zip
(Edit: Also finished posting it to the requested thread at viewtopic.php?p=559892#p559892 ).
I did notice prefetches happening, but yeah it was pretty memory unfriendly, looked like an intrusive list, not something I see often anymore considering how fairly ubiquitous type chunked storage is nowadays.posila wrote: ↑Fri Jan 14, 2022 6:50 pm Anyway, listFriendlyEntities iterates over linked list of entities with force on a chunk, so distanceSquared() is working with values that are unlikely to be in CPU cache. But traversal of the linked list has the same problem, unless the CPU sees the pattern of reading position and then next pointer and prefetches the next pointer while waiting to get the position too.
Slower? Oh if only I could get a hold of the code, it would be a large change but a form of type chunked storage would so very much speed things over repeated pointer indirections. ^.^;
Re: Costly `FixedPointNumberTemplate<int, 8u>::getDouble() const`
Looking at that save specifically around the things you mentioned:
during a given update 4 unique types of entities are found while iterating listFriendlyEntities on the chunks. Of which there are an average of 60,000 unique entities touched each tick through listFriendlyEntities an average of 120,000 times each tick (multiple are touched twice by different spawners near each other).
Virtually all of the time is spent loading in the memory for the MapPositions from the 60,000 unique entities. That's 8 bytes * 60,000 or 480,000 bytes just to touch each of the positions.
Since there are 4 unique entity types the logic does not know what type of entity it might get when iterating the chunk lists and since we need to keep addition and removal of an entity from these lists O(1) they can't be pooled into any kind of memory pool.
during a given update 4 unique types of entities are found while iterating listFriendlyEntities on the chunks. Of which there are an average of 60,000 unique entities touched each tick through listFriendlyEntities an average of 120,000 times each tick (multiple are touched twice by different spawners near each other).
Virtually all of the time is spent loading in the memory for the MapPositions from the 60,000 unique entities. That's 8 bytes * 60,000 or 480,000 bytes just to touch each of the positions.
Since there are 4 unique entity types the logic does not know what type of entity it might get when iterating the chunk lists and since we need to keep addition and removal of an entity from these lists O(1) they can't be pooled into any kind of memory pool.
If you want to get ahold of me I'm almost always on Discord.
Re: Costly `FixedPointNumberTemplate<int, 8u>::getDouble() const`
you could just publish the portion of source for these methods like you did for the rail pathfinder and other bits.
Re: Costly `FixedPointNumberTemplate<int, 8u>::getDouble() const`
This function? https://gist.github.com/Rseding91/04a11 ... 925221ff47
If you want to get ahold of me I'm almost always on Discord.
-
- Fast Inserter
- Posts: 195
- Joined: Sun Oct 05, 2014 6:12 am
- Contact:
Re: Costly `FixedPointNumberTemplate<int, 8u>::getDouble() const`
This is precisely the kind of issue typed chunk storage is supposed to solve (and there are known methods of encoding runtime data into the chunk 'types' to be able to perform things like game-related positional range checks faster as well for mostly static things, like nests/spawners), though again that would necessitate a bit larger of a structural change than I'd be comfortable attempting in factorio without a substantive amount of time and a lot of tests first (though if curious in such data structures for some light/heavy reading then might look at the flecs or similar libraries for C/C++ that encode the full type chunk storage patterns for as much speed as you can churn out of the CPU, especially with something like factorio and it's restricted types of its game objects).Rseding91 wrote: ↑Sat Jan 15, 2022 1:30 am Since there are 4 unique entity types the logic does not know what type of entity it might get when iterating the chunk lists and since we need to keep addition and removal of an entity from these lists O(1) they can't be pooled into any kind of memory pool.
A few possible hacks could potentially help though, depending on things like how readily you need to be able to insert 'inside' the list compared to just append and swap removing (removing one in the middle by popping the one on the end and moving it into the removed ones position), if that's allowed for that (still easy to keep consistent in a deadmans lockstep simulation) then could break them up into individual bag arrays for each type in the chunks and you'll have O(1) append and O(1) removal anywhere, and a potentially huge speed boost especially since code can finally be generated that knows the type that it will be operating over (much much nicer on memory especially with the linear access and known prefetching of pointers in that array and you get a lot of SIMD opportunities for a fairly minimal amount of code change, in comparison anyway). Maybe some other options as well.
Re: Costly `FixedPointNumberTemplate<int, 8u>::getDouble() const`
I measured the save in 1.1.51 and it is pretty much the same as in 1.1.50, so that's not bad
-
- Fast Inserter
- Posts: 195
- Joined: Sun Oct 05, 2014 6:12 am
- Contact: