SoShootMe wrote: Sun Sep 03, 2023 9:20 pm
FuryoftheStars wrote: Sun Sep 03, 2023 8:06 pm
I believe it's the fact that for some bases you could potentially have these extra calculations being performed for thousands, if not 10s or 100s of thousands of bots that makes it UPS intensifying. And yes, I know the calculations you're talking about are not done every tick, but when you have several 100s of thousands of bots all flying around, you're going to get a large count of them per tick that'll need it.
The extra calculation that BicycleEater didn't mention above is determining whether a charge will be necessary. I think this is the part that is actually relevant, because it isn't done now (or with the new behaviour) and I think a bit more than "like 4 multiply operations" (which are only necessary when the plan is to charge en route, and in any case minute compared to finding somewhere to charge). This calculation is not needed every tick so "several 100s of thousands of bots all flying around" perhaps means something like several thousand calculations per tick. That is not nothing and every little helps, but given the calculation uses data already "at hand" it doesn't sound significant compared to updating other entities (crafting machines, inserters, ...) for a factory with so many active robots.
There are many reasons that conclusion could be wrong, but I'm still not seeing one that leads me to think it probably is, except that it seems a relatively obvious improvement and therefore, if it were viable, would likely have been tried and judged to be too detrimental to performance.
The operations to find whether the charge distance is required only need:
The bot's location.
The target's location.
The bot's charge level.
The bot's energy usage per second (/frame).
The bot's energy usage per tile travelled.
The bot's speed (per second/frame?).
(I'm not sure that it doesn't already have energy per second, per meter, and speed combined into the net energy per second while moving, but it must calculate that to move the bot every frame, so we could just assume they're already combined).
It then takes:
Code: Select all
tileRange = charge / ((energyPerFrame)/(tilePerFrame)+energyPerTile)
tileRangeSquared = tileRange * tileRange
distanceToTravelSquared = (botLocation.x - targetLocation.x)^2 + (botLocation.y - targetLocation.y)^2
Then compare "distanceToTravelSquared" with "tileRangeSquared".
That's 2 divisions, 3 multplications, 4 additions/subtraction, no extra conditionals.
If you want to optimise it (depending on system and how well optimised reciprocal is), and were doing this alongside the normal movement code, you could do the following:
Code: Select all
deltaX = botLocation.x - targetLocation.x
deltaY = botLocation.y - targetLocation.y
(both of these would be needed in normal movement code anyway, presuming the robot doesn't then recharge, so this has near-zero cost)
Code: Select all
speedInv = 1/tilePerFrame
energyPerTileTotalInv = 1/(energyPerFrame*speedInv+energyPerTile)
tileRange = charge * energyPerTileTotalInv
tileRangeSquared = tileRange * tileRange
distanceToTravelSquared = deltaX * deltaX + deltaY * deltaY
Which is now 2 non-shared additions, 2 reciprocals (relatively quick, though I can't find any direct sources for cycle counts), and 5 multiplications.
I think my estimate was close enough, particularly since I'm working off the two types of energy usage individually, even though the game may cache a more useful form, which would either save a multiplication and an addition, or just an addition.
Counting approximate cycles, on double precision, this algorithm gives a count of ~72 cycles for the divisions, ~24 cycles for the multiplications, and ~20 cycles for the additions.
Rounding this to 120 cycles, it will take, at 4GHz, 30ns to complete these computations.
Now, if something to do with reciprocals was used instead, like my second example, the numbers would look more like:
40 cycles for the reciprocals, 40 cycles for multiplications, 10 cycles for the additions, rounding to 100 cycles. At 4GHz, this looks like 25ns per frame.
A drone seems to use ~140-190 ns/frame in the air anyway, so this adds about 1/8-1/4 of a frame's of computation to the cost of the drones. That's honestly more than I expected, but still isn't much overall. If each drone flies for at least 10 frames, (5 tiles at level 15 drone speed), this would be a 1/60 drop in performance.
At ranges of 10 tiles, even across speed, the savings to recharge time amount (depending on setup, but generally) to ~1/60th, by which point the drop in performance is down to 1/120th (that is, if your drones travel more than ~7 tiles on average, you'll gain more throughput than you lose UPS).
To more specifically address the 100s of thousands of bots. If they take ~200ns per bot per frame, that's like 20ms/frame, so maybe my numbers are off. Alternatively, if that's your limit on performance, and it's like 150ns, then that's 15ms/frame, so maybe?
Anyway, if your bots travel at least 7 tiles on average, the savings on recharge will be greater than the losses
Note that my throughput improvement figure I quoted previously does depend (linearly) on distance. At 10 tiles, its a 1/60th improvement, at 5 its a 1/120th.
For bot expenses I'm using
viewtopic.php?f=204&t=108144
and for cycle counts I'm using
http://www.phys.ufl.edu/~coldwell/Multi ... ntmult.htm
EDIT: I also realised that speed is a float, not a double, so one of the divisions could be way shorter - like 23 cycles, not 36, making a reasonable optimisation (not including the reciprocals) take ~100 cycles. Faster with some careful float usage, as precision isn't 100% necessary - if the drones slightly underestimate how far they can go (by less than a tile), it really doesn't matter, provided it is deterministic, which the errors would be.