Floating point and why it is used in Factorio

huhn · Post by **huhn** » Tue Sep 19, 2017 10:48 am

So I have noticed that numbers are quite often shown wrong in this game.

The game runs at 60 ticks per second (an integer), and with some fairly complicated tricks you would never need to end up with a floating-point number. I know every powered entity can run slower when not enough energy is available, and if you have an uneven number of energy-using entities, not every boiler can carry the same load unless you use a floating-point number. But this could be done with a remainder distribution or a kind of "dithering", so that the difference is too small for anybody to notice and it evens out in the end.
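
The remainder distribution described here can be sketched in a few lines of C. This is only an illustration of the idea, not how Factorio distributes energy; the function name `distribute_units` and the rotating `start` index are assumptions made up for the example.

```c
#include <stdint.h>
#include <stddef.h>

/* Split `total` units across `n` consumers using only integers.
 * Every consumer gets total / n; exactly total % n consumers get one extra
 * unit, so the shares always add up to `total`.
 * `start` rotates which consumers receive the extra unit (the "dithering"),
 * so over many ticks nobody is systematically favoured. */
static void distribute_units(uint64_t total, size_t n, size_t start, uint64_t *shares)
{
    if (n == 0) return;
    uint64_t base = total / n;
    uint64_t extra = total % n;
    for (size_t i = 0; i < n; i++) {
        size_t pos = (i + n - start % n) % n;  /* position relative to start */
        shares[i] = base + (pos < extra ? 1 : 0);
    }
}
```

With `total = 100` and three consumers the shares are 34, 33 and 33. Passing the tick number as `start` moves the extra unit from consumer to consumer, so the long-run average evens out.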

I'm just wondering how many desync bugs come from floating-point operations.
There is no question that most, if not all, developers would have used floating-point math in such a game, and it is hardly possible to change that now, if it is possible at all.
So I'm more interested in whether the developers of this game have ever thought about whether it would be possible to run the game without any floating-point operations at all, with the same or better performance.

mrvn · Post by **mrvn** » Tue Sep 19, 2017 11:06 am

I'm not sure the game internally uses floats but what you see in mods is this:

https://www.lua.org/pil/2.3.html

There simply is no int type in Lua. So every mod gets to work with floats.
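
For context, the linked page describes Lua numbers as double-precision floats, and an IEEE 754 double stores every integer up to 2^53 exactly. A small C sketch of that property follows; the helper names are made up for this example, and it assumes the platform's `double` is an IEEE 754 binary64.

```c
#include <stdint.h>
#include <stdbool.h>

/* Up to this magnitude every integer fits exactly in an IEEE 754 double,
 * which has a 53-bit significand. */
#define DOUBLE_EXACT_INT_LIMIT (INT64_C(1) << 53)

/* True if converting v to double and back returns v unchanged.
 * Only handles |v| <= 2^62; larger inputs are rejected so that the cast
 * back to int64_t can never overflow. */
static bool survives_double_round_trip(int64_t v)
{
    if (v > (INT64_C(1) << 62) || v < -(INT64_C(1) << 62)) return false;
    return (int64_t)(double)v == v;
}

/* Count ticks by adding 1.0 to a double, one increment per tick. */
static double count_ticks(int64_t ticks)
{
    double counter = 0.0;
    for (int64_t i = 0; i < ticks; i++) counter += 1.0;
    return counter;
}
```

An hour of game time at 60 ticks per second is 216000 increments, far below 2^53, so a counter like this stays exact.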

huhn · Post by **huhn** » Tue Sep 19, 2017 11:13 am

I'm not sold on this:

Lua has no integer type, as it does not need it. There is a widespread misconception about floating-point arithmetic errors and some people fear that even a simple increment can go weird with floating-point numbers.

viewtopic.php?f=48&t=50894

Adil · Post by **Adil** » Tue Sep 19, 2017 11:24 am

https://www.reddit.com/r/factorio/comme ... r/dbo81o8/

Rseding91 · Post by **Rseding91** » Tue Sep 19, 2017 2:20 pm

There's nothing wrong with floating point numbers. The only limitations they have are the epsilon - the minimum difference between two floating point numbers and that you can't store a non-rational number precisely.

The linked solar-panel bug report has nothing to do with floating point "errors" and everything to do with converting a floating point value into an integer for display. It's not useful to see that a solar panel produces 9.9998 KW/tick of power - so it's converted to just show "10 KW".
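
The difference between the two obvious ways of turning such a value into a whole number for display can be sketched as follows. This is not Factorio's GUI code; the function names are hypothetical, and the rounding helper assumes non-negative values.

```c
#include <stdio.h>

/* Truncation simply drops the fraction: 9.9998 becomes 9. */
static long display_truncated(double kw)
{
    return (long)kw;
}

/* Round to nearest (valid for non-negative values): 9.9998 becomes 10. */
static long display_rounded(double kw)
{
    return (long)(kw + 0.5);
}

/* Format a power value for display, e.g. "10 kW". */
static int format_power(char *buf, size_t size, double kw)
{
    return snprintf(buf, size, "%ld kW", display_rounded(kw));
}
```

Either way the stored value is untouched; only the text shown to the player changes.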

Rseding91 · Post by **Rseding91** » Tue Sep 19, 2017 2:25 pm

huhn wrote:I'm just wondering how many desync bugs come from floating-point operations.

As far as I know there has only ever been 1 desync problem related to floating points and it had nothing to do with what people would call "floating point errors" (since those don't actually exist) - it had to do with the code one of our compilers generated for 32 bit vs 64 bit: on one compiler float * float was calculated as 'return double(double * double)' and on another it was (correctly) calculated as 'return float(float * float)' resulting in slightly different calculations due to the double type having more precision.
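
The effect of the two code paths can be reproduced with a minimal sketch. The function names are made up, and the `volatile` store is only there to force rounding to single precision on every compiler; this is not the actual Factorio code.

```c
/* Multiply in single precision: the product is rounded to float. */
static float mul_in_float(float a, float b)
{
    volatile float r = a * b;  /* volatile forces the rounding to float */
    return r;
}

/* Multiply in double precision: the product keeps the extra bits. */
static double mul_in_double(float a, float b)
{
    return (double)a * (double)b;
}
```

For inputs such as `1.1f * 1.1f` the two results differ in the low bits, while a product that is exactly representable as a float (for example `2.0f * 4.0f`) comes out identical. Two builds that disagree on which path to take will drift apart bit by bit.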

huhn wrote:So I'm more interested in whether the developers of this game have ever thought about whether it would be possible to run the game without any floating-point operations at all, with the same or better performance.

Maybe, but you wouldn't see any performance gain. Floating point operations have long since been equal to integer operations except possibly in some extreme cases where you're just crunching the same numbers over and over (which games don't do).

mrvn · Post by **mrvn** » Tue Sep 19, 2017 2:51 pm

Rseding91 wrote:
huhn wrote:I'm just wondering how many desync bugs come from floating-point operations.
As far as I know there has only ever been 1 desync problem related to floating points and it had nothing to do with what people would call "floating point errors" (since those don't actually exist) - it had to do with the code one of our compilers generated for 32 bit vs 64 bit: on one compiler float * float was calculated as 'return double(double * double)' and on another it was (correctly) calculated as 'return float(float * float)' resulting in slightly different calculations due to the double type having more precision.

huhn wrote:So I'm more interested in whether the developers of this game have ever thought about whether it would be possible to run the game without any floating-point operations at all, with the same or better performance.
Maybe, but you wouldn't see any performance gain. Floating point operations have long since been equal to integer operations except possibly in some extreme cases where you're just crunching the same numbers over and over (which games don't do).

Floating points have the problem that loading them from memory or storing them to memory is costly. Also is it float or double? Because float is slower than double.

huhn · Post by **huhn** » Tue Sep 19, 2017 2:59 pm

Thanks for your answers.

Rseding91 wrote:The linked solar-panel bug report has nothing to do with floating point "errors" and everything to do with converting a floating point value into an integer for display. It's not useful to see that a solar panel produces 9.9998 KW/tick of power - so it's converted to just show "10 KW".

It was big enough to produce a "wrong" number in the conversion. Of course it doesn't help to show a user a number like 9.9998, and 10 makes more sense, but the issue is big enough to show a 0.59 % difference in that case, even though it comes only from the conversion.
I wouldn't be shocked if I started a 10 red science research in 1 lab, took the science packs out in the middle of the research, put them back in again, and then couldn't finish the research.

Rseding91 wrote:Maybe, but you wouldn't see any performance gain. Floating point operations have long since been equal to integer operations except possibly in some extreme cases where you're just crunching the same numbers over and over (which games don't do).

This pretty much answers everything I wanted to know, thank you. I simply had to ask after seeing the solar panel and nuclear power plant "issue".

ratchetfreak · Post by **ratchetfreak** » Tue Sep 19, 2017 3:04 pm

mrvn wrote:
Floating points have the problem that loading them from memory or storing them to memory is costly. Also is it float or double? Because float is slower than double.

Do your research, please.

A (single precision) float is 4 bytes, exactly as expensive to store as an int32. And modern processors have a lot of SIMD hardware dedicated to operating on it. Each operation is about as expensive as the equivalent int operation.

A double (precision float) is 8 bytes, which is twice the memory to store and load, and there is less hardware dedicated to it in the functional units.

ledow · Post by **ledow** » Tue Sep 19, 2017 5:58 pm

And so long as the float library used is consistent across platforms, it won't be the cause of any kinds of miscalculations (to be honest, doesn't the server do it all for everyone anyway?).

If handling floats was as unreliable as you suggest, almost everything, from browser div tags to any online game to even things like basic 3D rotations, would be screwed up.

There's nothing wrong with float used where it's needed, and to be honest, as soon as you try to avoid float you introduce at least as much supporting code to handle things the way you need as you would have by just using a float.

E.g. 60 fps = 16.6666666666666666666666666666.... ms per frame. Try to delay that long consistently using integers - the code to do so will be much more horrible than just "= 1.0/60.0". Sure, you might lose a billionth of a second somewhere to rounding error, but that's completely within error margins.

The problem with float is not its use. It's NOT REALISING that you're using it and/or how not to use it. I still see people do stupid things like "if(float_a == float_b)" and similar.
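
A short sketch of why `==` on computed floats is fragile, and the usual tolerance-based alternative. The helper names and the tolerance are assumptions for this example; a suitable `eps` depends on the magnitude of the values being compared.

```c
#include <stdbool.h>

/* Absolute-tolerance comparison; `eps` must suit the magnitude of the values. */
static bool nearly_equal(double a, double b, double eps)
{
    double diff = a - b;
    if (diff < 0.0) diff = -diff;
    return diff <= eps;
}

/* Add 0.1 ten times. 0.1 has no exact binary representation, so the rounding
 * at each step can leave the sum a hair away from 1.0. */
static double sum_of_tenths(void)
{
    volatile double sum = 0.0;  /* volatile: round to double at every step */
    for (int i = 0; i < 10; i++) sum += 0.1;
    return sum;
}
```

Comparing `sum_of_tenths()` with `1.0` through `nearly_equal(..., 1e-9)` succeeds regardless of how the last bit was rounded, which is the property an exact `==` cannot promise.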

ratchetfreak · Post by **ratchetfreak** » Tue Sep 19, 2017 7:43 pm

ledow wrote:And so long as the float library used is consistent across platforms, it won't be the cause of any kinds of miscalculations (to be honest, doesn't the server do it all for everyone anyway?).

No, it doesn't. Every client does exactly the same calculations, and everyone needs to agree on the result down to the least-significant bit.
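
"Agreeing down to the last bit" means comparing raw bit patterns, not printed values. The sketch below is a hypothetical illustration, not Factorio's desync detection: it hashes the bit patterns of a state array with FNV-1a, and both function names are made up. It assumes `double` is a 64-bit IEEE 754 value.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Reinterpret a double as its raw 64-bit pattern (memcpy avoids aliasing UB). */
static uint64_t double_bits(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* FNV-1a over the raw bit patterns of a state array. Two clients whose
 * checksums differ have diverged, even if the values look equal when printed
 * with a few decimals. */
static uint64_t state_checksum(const double *state, size_t count)
{
    uint64_t hash = UINT64_C(0xcbf29ce484222325);      /* FNV offset basis */
    for (size_t i = 0; i < count; i++) {
        uint64_t bits = double_bits(state[i]);
        for (int byte = 0; byte < 8; byte++) {
            hash ^= (bits >> (8 * byte)) & 0xFF;
            hash *= UINT64_C(0x100000001b3);           /* FNV prime */
        }
    }
    return hash;
}
```

A state containing `0.1 + 0.2` and one containing `0.3` print the same at two decimals but produce different checksums, which is exactly the kind of divergence a lockstep simulation cannot tolerate.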

TheTom · Post by **TheTom** » Wed Sep 20, 2017 7:36 am

ratchetfreak wrote:
A (single precision) float is 4 bytes, exactly as expensive to store as an int32. And modern processors have a lot of SIMD hardware dedicated to operating on it. Each operation is about as expensive as the equivalent int operation.

And that is simply not true. Floating point operations are SIGNIFICANTLY slower than integer operations. They also are more exact, because there is no hardware support for, e.g., decimal floats (which have been an IEEE spec for quite some time). But I generally try to keep away from floats.

Disclaimer: I only work on non time critical and not business critical software, such as high performance financial trading applications, risk management applications etc.

ratchetfreak · Post by **ratchetfreak** » Wed Sep 20, 2017 8:19 am

TheTom wrote:
ratchetfreak wrote:
A (single precision) float is 4 bytes, exactly as expensive to store as an int32. And modern processors have a lot of SIMD hardware dedicated to operating on it. Each operation is about as expensive as the equivalent int operation.
And that is simply not true. Floating point operations are SIGNIFICANTLY slower than integer operations. They also are more exact, because there is no hardware support for, e.g., decimal floats (which have been an IEEE spec for quite some time). But I generally try to keep away from floats.

Disclaimer: I only work on non time critical and not business critical software, such as high performance financial trading applications, risk management applications etc.

Your information is severely outdated. The CPUs you will find in current computers have very fast floating point operations, on the same order of speed (1 or 2 cycles) as integer operations. An integer divide is one of the slowest instructions you can make it do, after fetching data from cold memory.

MeduSalem · Post by **MeduSalem** » Thu Sep 21, 2017 3:21 am

ratchetfreak wrote:Your information is severely outdated. The CPUs you will find in current computers have very fast floating point operations, on the same order of speed (1 or 2 cycles) as integer operations. An integer divide is one of the slowest instructions you can make it do, after fetching data from cold memory.

Even modern day x86 CPUs from Intel/AMD aren't capable of single-/two-cycle floating point operations:

http://www.agner.org/optimize/instruction_tables.pdf

What matters in the tables is the actual latency per instruction. Because that's how long the Instruction would take to complete if something coming directly afterwards depends on the outcome. And if one compares integer operations to floating point operations then the floating point ones have huge latencies no matter what.

What modern day CPUs actually do is try to hide the latency with excessive pipelining, buffering, scheduling and lots of other tricks like superscalar execution, OoO, SIMD, etc. to drive the IPS up, so it appears that they are able to spit out a floating point operation in 1 or 2 cycles. But in the end the operation still takes the same number of cycles of latency to complete as if all the pipelining tricks weren't there. The basic logic to compute an operation can't be faster than the number of stable intermediate steps it takes to calculate it (mostly because the actual path length through all the logic becomes an issue, requiring the operation to be split up over several cycles).

In some specific applications this WILL ultimately be a problem, especially in very dependency-driven applications where, in a very long code segment, each operation depends on the outcome of the operation coming before it... because it would matter a lot if the integer ALU is able to spit out 10 integer operations in the same time it took the FPU to complete only 1 or 2 of the equivalent floating point ones.

So latency matters.

Integer is better than Floating Point when it comes to latency in a dependency chain... with some exceptions like Divisions etc... Integer Divisions may be as costly as FP divisions in some cases.

But when it comes to Factorio, Rseding91 is probably right that it doesn't matter, because it's a game and the data being worked on is probably not as extremely dependency-driven as I described... So basically it means the game is largely bound by memory latency... because the FPU will already be finished with a set of data looong before the next set of data arrives in the cache from main memory.

That said, I would still use integers wherever something doesn't necessarily need to be floating point... just because of the huge latency discrepancy between most integer and floating point calculations. The main reason: why should the CPU spend more cycles on something than actually needed? Why spend, for example, 15 cycles on the FPU if it could be done in 1 on the integer ALU? That frees the pipeline for something else and consumes less power.

Also there are 4 Integer ALUs in modern x86 CPUs (like for example Ryzen) that can do mostly everything... compared to only 2 dedicated ADD and 2 dedicated MUL FPUs... So it may help in certain situations as well not to clog the FPU with unnecessary stuff and keep it free for the stuff that really needs the FPU.

But in the end it's a balancing act anyway to keep all the computing resources in a core equally busy. It would be a waste not to use the FPU at all. The same goes for SIMD/vector extensions... if the resources are there, developers should consider using them when possible (not implying that this is the case with Factorio).
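
The latency-versus-throughput distinction can be made concrete with two ways of summing the same array. This is a generic sketch with made-up function names; actual timings depend on the CPU and on what the compiler does with the loops.

```c
#include <stddef.h>

/* One accumulator: every add must wait for the previous one, so the loop is
 * bound by the latency of a floating point add. */
static double sum_single_chain(const double *v, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) acc += v[i];
    return acc;
}

/* Four independent accumulators: adds from different chains can overlap in
 * the pipeline, so the loop gets closer to being throughput-bound.
 * Note: this changes the order of the additions, so for general inputs the
 * rounded result may differ from sum_single_chain. */
static double sum_four_chains(const double *v, size_t n)
{
    double a0 = 0.0, a1 = 0.0, a2 = 0.0, a3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += v[i];
        a1 += v[i + 1];
        a2 += v[i + 2];
        a3 += v[i + 3];
    }
    for (; i < n; i++) a0 += v[i];  /* leftover elements */
    return (a0 + a1) + (a2 + a3);
}
```

Because the reordering can change the last bits of the result, a compiler will not do this to floating point code on its own without relaxed math flags, and a game that needs bit-identical results on every client would have to apply such a rewrite identically everywhere.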

mrvn · Post by **mrvn** » Thu Sep 21, 2017 9:16 am

ratchetfreak wrote:
mrvn wrote:
Floating points have the problem that loading them from memory or storing them to memory is costly. Also is it float or double? Because float is slower than double.
Do your research please,

A (single precision) float is 4 bytes, exactly as expensive to store as a int32. And in modern processors it has a lot of SIMD hardware dedicated to operating on it. Each operation is about as expensive as the equivalent int operation.

A double (precision float) is 8 bytes which is twice the memory to store and load from memory, there is less hardware dedicated to its functional units

Loading a single float into the FPU means loading it from memory and then pushing it to the FPU, which then converts it to long double internally. That extra conversion step takes a cycle longer than loading a double. Or at least it used to, before all the SIMD hardware. Now loading 8 floats into an AVX register is the same as loading 4 doubles. But I'm not sure how many vector operations Factorio does.

ratchetfreak · Post by **ratchetfreak** » Thu Sep 21, 2017 10:05 am

mrvn wrote:
Loading a single float into the FPU means loading it from memory and then pushing it to the FPU, which then converts it to long double internally. That extra conversion step takes a cycle longer than loading a double. Or at least it used to, before all the SIMD hardware. Now loading 8 floats into an AVX register is the same as loading 4 doubles. But I'm not sure how many vector operations Factorio does.

And now every single CPU that's worth anything has SIMD hardware.

evopwr · Post by **evopwr** » Thu Sep 21, 2017 10:09 am

mrvn wrote: Also is it float or double? Because float is slower than double.

I'm not saying you're wrong, but as a senior developer I would be interested to learn more about your last comment, so I can make sure my own understanding and development approach are as efficient as possible.
Why would a float (or single in C#) be less efficient than a double? Surely a double (with its larger storage size) would be less efficient (but more accurate)?
I would appreciate your wisdom on this.
Thanks,

***EDIT: Ignore this - I now see several other people saying the same thing (or similar).

mrvn · Post by **mrvn** » Thu Sep 21, 2017 12:02 pm

ratchetfreak wrote:
mrvn wrote:
Loading a single float into the FPU means loading it from memory and then pushing it to the FPU, which then converts it to long double internally. That extra conversion step takes a cycle longer than loading a double. Or at least it used to, before all the SIMD hardware. Now loading 8 floats into an AVX register is the same as loading 4 doubles. But I'm not sure how many vector operations Factorio does.
And now every single CPU that's worth anything has SIMD hardware.

But did that change the difference in cycle times for the FPU opcodes or only for the new SIMD opcodes? Doing a quick test I see that on 64bit the FPU isn't used for float/double but xmm registers are.

double foo() { return *(volatile double *)0; }
float bar() { return *(volatile float *)0; }

elf32-i386:
00000000 <foo>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   b8 00 00 00 00          mov    $0x0,%eax
   8:   dd 00                   fldl   (%eax)
   a:   5d                      pop    %ebp
   b:   c3                      ret    

0000000c <bar>:
   c:   55                      push   %ebp
   d:   89 e5                   mov    %esp,%ebp
   f:   b8 00 00 00 00          mov    $0x0,%eax
  14:   d9 00                   flds   (%eax)
  16:   5d                      pop    %ebp
  17:   c3                      ret    

elf64-x86-64
0000000000000000 <foo>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   b8 00 00 00 00          mov    $0x0,%eax
   9:   f2 0f 10 00             movsd  (%rax),%xmm0
   d:   5d                      pop    %rbp
   e:   c3                      retq   

000000000000000f <bar>:
   f:   55                      push   %rbp
  10:   48 89 e5                mov    %rsp,%rbp
  13:   b8 00 00 00 00          mov    $0x0,%eax
  18:   f3 0f 10 00             movss  (%rax),%xmm0
  1c:   5d                      pop    %rbp
  1d:   c3                      retq

So what is the cycle time for fldl/flds and movsd/movss on modern CPUs? I assume fldl/flds is still horrible. But Factorio doesn't need to support 486 CPUs, so it should never use those, even in the (now dead) 32-bit version.
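
Whether a build evaluates float expressions in a wider type is something the compiler reports through the C99 macro `FLT_EVAL_METHOD` in `<float.h>`. Typically x87 code generation reports 2 and SSE2 code generation reports 0, which matches the disassembly above. A small sketch with made-up helper names:

```c
#include <float.h>

/* Describe a C99 FLT_EVAL_METHOD value. */
static const char *describe_eval_method(int method)
{
    switch (method) {
    case 0:  return "operations are evaluated in the precision of their own type";
    case 1:  return "float and double operations are evaluated as double";
    case 2:  return "all operations are evaluated as long double";
    default: return "indeterminate or implementation-defined";
    }
}

/* The method used by the current compiler and flags. */
static const char *current_eval_method(void)
{
#ifdef FLT_EVAL_METHOD
    return describe_eval_method(FLT_EVAL_METHOD);
#else
    return describe_eval_method(-1);  /* macro missing: pre-C99 toolchain */
#endif
}
```

Two builds that report different values here can produce different low-order bits for the same source code, which is the kind of 32-bit versus 64-bit mismatch described earlier in the thread.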

ratchetfreak · Post by **ratchetfreak** » Thu Sep 21, 2017 12:32 pm

MeduSalem wrote:
ratchetfreak wrote:Your information is severely outdated. The CPUs you will find in current computers have very fast floating point operations, on the same order of speed (1 or 2 cycles) as integer operations. An integer divide is one of the slowest instructions you can make it do, after fetching data from cold memory.
Even modern day x86 CPUs from Intel/AMD aren't capable of single-/two-cycle floating point operations:

http://www.agner.org/optimize/instruction_tables.pdf

What matters in the tables is the actual latency per instruction. Because that's how long the Instruction would take to complete if something coming directly afterwards depends on the outcome. And if one compares integer operations to floating point operations then the floating point ones have huge latencies no matter what.

What modern day CPUs actually do is try to hide the latency with excessive pipelining, buffering, scheduling and lots of other tricks like superscalar execution, OoO, SIMD, etc. to drive the IPS up, so it appears that they are able to spit out a floating point operation in 1 or 2 cycles. But in the end the operation still takes the same number of cycles of latency to complete as if all the pipelining tricks weren't there. The basic logic to compute an operation can't be faster than the number of stable intermediate steps it takes to calculate it (mostly because the actual path length through all the logic becomes an issue, requiring the operation to be split up over several cycles).

In some specific applications this WILL ultimately be a problem, especially in very dependency-driven applications where, in a very long code segment, each operation depends on the outcome of the operation coming before it... because it would matter a lot if the integer ALU is able to spit out 10 integer operations in the same time it took the FPU to complete only 1 or 2 of the equivalent floating point ones.

So latency matters.

Integer is better than Floating Point when it comes to latency in a dependency chain... with some exceptions like Divisions etc... Integer Divisions may be as costly as FP divisions in some cases.

But when it comes to Factorio, Rseding91 is probably right that it doesn't matter, because it's a game and the data being worked on is probably not as extremely dependency-driven as I described... So basically it means the game is largely bound by memory latency... because the FPU will already be finished with a set of data looong before the next set of data arrives in the cache from main memory.

That said, I would still use integers wherever something doesn't necessarily need to be floating point... just because of the huge latency discrepancy between most integer and floating point calculations. The main reason: why should the CPU spend more cycles on something than actually needed? Why spend, for example, 15 cycles on the FPU if it could be done in 1 on the integer ALU? That frees the pipeline for something else and consumes less power.

Also there are 4 Integer ALUs in modern x86 CPUs (like for example Ryzen) that can do mostly everything... compared to only 2 dedicated ADD and 2 dedicated MUL FPUs... So it may help in certain situations as well not to clog the FPU with unnecessary stuff and keep it free for the stuff that really needs the FPU.

But in the end it's a balancing act anyway to keep all the computing resources in a core equally busy. It would be a waste not to use the FPU at all. The same goes for SIMD/vector extensions... if the resources are there, developers should consider using them when possible (not implying that this is the case with Factorio).

Let's take Skylake:
page 237: fmul = 1 cycle, 5 latency and 1 throughput. Meaning there are at least 5 fmul-capable FPUs working in parallel (the older Haswell architecture has the same numbers here, page 205)
page 245: mulps = 1 cycle, 4 latency and 0.5 throughput. Meaning 8 FPUs capable of multiplying a 4-wide single-precision vector (here Haswell has 1, 5 and 0.5 resp., needing 10 FPUs, page 213)

I made the assumption that fmul is not pipelined; if it is, then there is no need for that many FPUs.

MeduSalem · Post by **MeduSalem** » Thu Sep 21, 2017 4:50 pm

ratchetfreak wrote:Let's take Skylake:
page 237: fmul = 1 cycle, 5 latency and 1 throughput. Meaning there are at least 5 fmul-capable FPUs working in parallel (the older Haswell architecture has the same numbers here, page 205)
page 245: mulps = 1 cycle, 4 latency and 0.5 throughput. Meaning 8 FPUs capable of multiplying a 4-wide single-precision vector (here Haswell has 1, 5 and 0.5 resp., needing 10 FPUs, page 213)

I made the assumption that fmul is not pipelined; if it is, then there is no need for that many FPUs.

Somehow there aren't that many resources out there to get some detailed information about the current Skylake Microarchitecture... Hell I almost couldn't find a microarchitecture block diagram using Google (seems like Intel is still pretty closed up about it), but here I found something:

Skylake Architecture

Seems like each core has 2 FP ADD/MUL/FMA capable vector units and one additional classic FP x87 compliant ADD one...

There's also a paragraph saying:

The latency and throughput of floating point ADD, MUL, and FMA were made uniformed at 4 cycles with a throughput of 2 µOPs/clock.

If I had to take a wild guess, then each FP unit can take 1 ADD instruction and 1 MUL instruction in parallel (= 2 µOPs/clock... or combine them to do 1 FMA), and they have a 4-stage/4-cycle pipeline with 4 ADD and 4 MUL OPs at different stages of intermediate calculation.

So as long as the OPs don't directly depend on each other they are single-cycle. If they do you have to wait 4 cycles... and pray that the Out-of-Order logic finds something else that is independent to fill the pipeline in the meantime... but if you have to wait for slow memory all the time then it doesn't really matter anymore anyways. xD
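
The arithmetic behind this is simple enough to write down. With the quoted figures (4 cycles of latency, 2 operations issued per clock), a pipelined unit needs latency times issue rate independent operations in flight to stay busy. The functions below are a back-of-the-envelope sketch with made-up names; they ignore pipeline fill, memory stalls and everything else a real core does.

```c
/* Independent operations that must be in flight to keep a pipelined unit
 * busy: latency (cycles) times issue rate (operations per cycle). */
static unsigned ops_in_flight(unsigned latency_cycles, unsigned ops_per_cycle)
{
    return latency_cycles * ops_per_cycle;
}

/* Cycles for `count` operations that each depend on the previous result. */
static unsigned long dependent_chain_cycles(unsigned long count, unsigned latency_cycles)
{
    return count * latency_cycles;
}

/* Lower bound on cycles for `count` fully independent operations. */
static unsigned long independent_cycles(unsigned long count, unsigned ops_per_cycle)
{
    return (count + ops_per_cycle - 1) / ops_per_cycle;  /* ceiling division */
}
```

For 1000 multiplications this gives about 4000 cycles as one dependent chain versus about 500 cycles when they are all independent, with 8 operations needed in flight to reach the latter. A latency of 4 with a throughput of 2 per clock therefore points to a pipelined unit rather than to many separate FPUs.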
