
[Rseding91] Desync on game.print of small number

Posted: Tue Sep 25, 2018 9:38 pm
by grilledham
Server headless Factorio on Linux
Client Windows 10
The following command causes a desync:

Code:

/c
local root2 = math.sqrt(2)
local x = -209.5 /root2
local y = 735.5/ root2
local angle = math.rad(45)
local qx = math.cos(angle)
local qy = math.sin(angle)
local rot_x =  - qy * y + qx * x
game.print(rot_x + 472.5)
For the server it prints: 5.6843418860808e-14
For client it prints: 0

Re: Arithmetic causes desync

Posted: Tue Sep 25, 2018 10:01 pm
by Sergeant_Steve
On a "Headless" version of Factorio running on Windows Server 2016 with a Windows 10 Client the command does not cause a desync.

Re: Arithmetic causes desync

Posted: Tue Sep 25, 2018 10:07 pm
by Valansch
I found it to be game.print causing the desync as opposed to the actual arithmetic. Floats seem to behave nicely: if you force a CRC after the calculation you don't get a desync. However, if you print the variable it outputs different values on Windows and Linux.

Windows outputs 0
Linux outputs 5.6843418860808e-14 (exactly 2^-44)

Some calculations:
floor (log2(472.5)) = 8
==> exponent is 8
==> mantissa bits values are shifted by 8
smallest number that can be added to 2^8 is 2^(-52 + 8) = 2^-44 = 5.6843419e-14

This is exactly the number Linux outputs, meaning the Linux output is the expected one, while Windows for some reason rounds 472.5 + 2^-44 down to 472.5.

So the Linux output is the correct one, as float_sub(2^-44 + 472.5, 472.5) == 2^-44 (no rounding errors).

Code:

/c
local root2 = math.sqrt(2)
local x = -209.5 /root2
local y = 735.5/ root2
local angle = math.rad(45)
local qx = math.cos(angle)
local qy = math.sin(angle)
local rot_x =  - qy * y + qx * x
res = rot_x + 472.5
game.force_crc()
--NO DESYNC UNTIL HERE
game.print(res)

EDIT: I have updated my faulty math; I hope there are no errors now. It was late last night.
EDIT2: Fixed: the smallest bit is 2^-52, not 2^-53.
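
A minimal sketch of the spacing argument above, in plain Lua outside Factorio. The helper name ulp is made up for illustration, and it assumes IEEE 754 doubles, which is what standard Lua builds use for numbers.

```lua
-- Hypothetical helper (not part of Lua or Factorio): spacing between
-- adjacent doubles near a positive, normal x, assuming IEEE 754 binary64.
-- The log-based exponent can be off by one right at powers of two,
-- which does not matter for 472.5.
local function ulp(x)
  local exponent = math.floor(math.log(x) / math.log(2)) -- 8 for 472.5
  return 2 ^ (exponent - 52)
end

local step = ulp(472.5)

-- Adding one step to 472.5 and subtracting 472.5 again loses nothing,
-- because 472.5 + step is itself exactly representable.
local recovered = (472.5 + step) - 472.5

print(string.format("%.14g", step), recovered == step)
```

Formatted with 14 significant digits, which is Lua's default for numbers, the step should come out as the value the server printed.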

Re: Arithmetic causes desync

Posted: Tue Sep 25, 2018 10:08 pm
by Valansch
Sergeant_Steve wrote: Tue Sep 25, 2018 10:01 pm On a "Headless" version of Factorio running on Windows Server 2016 with a Windows 10 Client the command does not cause a desync.
While you will desync with a Linux client on a Windows server...

Re: Desync on game.print of small number

Posted: Wed Sep 26, 2018 9:36 am
by Valansch
We did some additional testing and I just want to share my observation.

If you first serialize the resulting variable, you also desync when printing the string.
So that would negate my statement that it's an issue with game.print doing something different on Linux vs Windows.
But this means resStr is already different and game.force_crc doesn't notice it for some reason: game.force_crc still does not desync, it only desyncs after calling game.print.

Modified code:

Code:

/c
local root2 = math.sqrt(2)
local x = -209.5 /root2
local y = 735.5/ root2
local angle = math.rad(45)
local qx = math.cos(angle)
local qy = math.sin(angle)
local rot_x =  - qy * y + qx * x
res = rot_x + 472.5
resStr = tostring(res)
game.force_crc()
--Still NO DESYNC UNTIL HERE
game.print(resStr)

I hope that helped

Re: Desync on game.print of small number

Posted: Wed Sep 26, 2018 10:32 am
by eradicator
Valansch wrote: Wed Sep 26, 2018 9:36 am

Code:

/c
--[stuff]
game.force_crc()
--Still NO DESYNC UNTIL HERE
game.print(resStr)
Just to clarify your misconception about desyncs: You can do whatever you want inside the lua vm and it will never cause any desyncs. A desync can only be caused by using that data to affect the game state (i.e. by printing the data).

TL;DR: It's not a fault of game.print, but rather the two machines diverge somewhere during the calculations.

Re: Desync on game.print of small number

Posted: Wed Sep 26, 2018 1:25 pm
by Valansch
eradicator wrote: Wed Sep 26, 2018 10:32 am Just to clarify your misconception about desyncs: You can do whatever you want inside the lua vm and it will never cause any desyncs.
This is not true. game.force_crc is specifically there to check the state of the lua vm.

Edit: You are right. I don't understand what game.force_crc does. However, the calculations should still not cause any desyncs...

Re: Desync on game.print of small number

Posted: Wed Sep 26, 2018 1:31 pm
by eradicator
Valansch wrote: Wed Sep 26, 2018 1:25 pm This is not true. game.force_crc is specifically there to check the state of the lua vm.
Huh. The doc only says "map crc", which usually doesn't include lua (or so i thought). But if you say so, i'll believe you as i can't test right now. :(

Re: Desync on game.print of small number

Posted: Wed Sep 26, 2018 1:32 pm
by Valansch
eradicator wrote: Wed Sep 26, 2018 1:31 pm Huh. The doc only says "map crc", which usually doesn't include lua. But if you say so, i'll believe you as i can't test right now. :(
No, no, please don't believe me, I was wrong. I updated my comment.

Re: Desync on game.print of small number

Posted: Wed Sep 26, 2018 1:34 pm
by eradicator
Valansch wrote: Wed Sep 26, 2018 1:25 pm Edit: You are right. I don't understand what game.force_crc does. However, the calculations should still not cause any desyncs...
Well, it forces the crc to happen right then, instead of at the start of the next tick i guess, which might be useful for debugging. As for the OP issue, dunno... maybe if the lua backend on windows/linux has different rounding errors for such very small numbers?

Re: Desync on game.print of small number

Posted: Wed Sep 26, 2018 1:40 pm
by Valansch
eradicator wrote: Wed Sep 26, 2018 1:34 pm maybe if the lua backend on windows/linux has different rounding errors for such very small numbers?
Exactly (see my original comment).
472.5 + 2^-44 incorrectly rounds down to 472.5 on Windows. You can even repeat this in single player and get 0 on Windows and 5.68e-14 on Linux....

Re: Desync on game.print of small number

Posted: Wed Sep 26, 2018 3:46 pm
by Valansch
After further testing I narrowed it down to math.sin giving different answers.

Code:

game.print(math.sin(0.7853981) % 0.1)


Result on windows: 0.0071067363577805
Result on linux:   0.0071067363577804
The difference only shows up after the 15th significant digit, hence the % 0.1.


Showing more digits:

Code:

game.print(math.sin(0.7853981) * math.pow(10,10) % 1)

Result on windows: 0.577805519104
Result on linux:   0.57780456542969
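
To see why the % 0.1 and * 10^10 % 1 tricks expose the difference, here is a small sketch in plain Lua (print instead of game.print). It assumes numbers are shown with about 14 significant digits, which is Lua's default %.14g; whether game.print formats exactly the same way is an assumption. Stripping the leading digits moves that 14-digit window further to the right.

```lua
local s = math.sin(0.7853981)

-- 14 significant digits of the full value: the platform-dependent
-- trailing digits fall outside the window.
local whole = string.format("%.14g", s)

-- Dropping the leading 0.7 shifts the window two digits to the right.
local shifted = string.format("%.14g", s % 0.1)

-- Dropping ten leading digits shifts it much further.
local tail = string.format("%.14g", s * 10^10 % 1)

print(whole, shifted, tail)
```

The last digits of shifted and tail are the ones that depend on the platform's sin implementation, so they are the ones worth comparing between machines.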

Re: Desync on game.print of small number

Posted: Wed Sep 26, 2018 4:34 pm
by Rseding91
Interesting. I was wondering if this would ever show up.

We internally use a custom implementation of most math functions because different compilers/operating systems implement them slightly differently (as you're seeing with the desyncs).

We'll have to hijack Lua's calls to those math operations and direct them to the custom implementations.
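
For illustration only, here is roughly what routing math.sin to a custom implementation could look like in plain Lua. This is not Factorio's implementation: det_sin and sandbox_math are made-up names, and the Taylor series is just one simple way to build a sine from +, -, * and /, whose results IEEE 754 fully specifies (assuming the build does not use extended precision or fused multiply-add).

```lua
-- Hypothetical stand-in for a platform-independent sine. It only uses
-- basic arithmetic and math.floor, never the C library's sin().
local TWO_PI = 6.283185307179586

local function det_sin(x)
  -- Reduce the argument to roughly [-pi, pi].
  local k = math.floor(x / TWO_PI + 0.5)
  x = x - k * TWO_PI

  -- Taylor series: x - x^3/3! + x^5/5! - ...
  local term, sum = x, x
  local x2 = x * x
  for n = 1, 15 do
    term = -term * x2 / ((2 * n) * (2 * n + 1))
    sum = sum + term
  end
  return sum
end

-- A math table for scripts where sin is replaced and every other
-- function falls through to the standard library.
local sandbox_math = setmetatable({ sin = det_sin }, { __index = math })

print(sandbox_math.sin(0.7853981), sandbox_math.sin(18))
```

The point of the sketch is the redirection, not the accuracy: a real replacement would need careful range reduction for large arguments and the same treatment for cos, tan and friends.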

Re: Desync on game.print of small number

Posted: Wed Sep 26, 2018 4:35 pm
by Valansch
Other angles that cause this (which may be more common):
math.sin(18)
math.sin(89)

Re: Desync on game.print of small number

Posted: Wed Sep 26, 2018 4:39 pm
by quyxkh
It's that 472.5+2^-44 == 472.5 on windows part that scares me most. DP addition can't be needing a mathlib, right? wtf's up with dropping bits in simple addition? 472's 9 bits, so the addition's only preserving 52 of the 53 bits it's supposed to have. Is there some integer-unit manipulation going on that forgot about the implicit high bit?

Re: Desync on game.print of small number

Posted: Wed Sep 26, 2018 4:53 pm
by Valansch
quyxkh wrote: Wed Sep 26, 2018 4:39 pm It's that 472.5+2^-44 == 472.5 on windows part that scares me most. DP addition can't be needing a mathlib, right? wtf's up with dropping bits in simple addition? 472's 9 bits, so the addition's only preserving 52 of the 53 bits it's supposed to have. Is there some integer-unit manipulation going on that forgot about the implicit high bit?
Ignore that part. Floats work perfectly fine. I was just rambling while trying to get behind the real issue, which is math.sin (and math.cos etc).
As Rseding91 also confirmed.

Sorry for the confusion.

game.print and `+` are innocent!

Edit: But btw 472 has an exponent of 8, so you will be dropping any digits lower than 2^(-52+8) = 2^-44. This is normal and according to the IEEE spec.

Re: Desync on game.print of small number

Posted: Wed Sep 26, 2018 5:40 pm
by quyxkh
Valansch wrote: Wed Sep 26, 2018 4:53 pm Edit: But btw 472 has an exponent of 8, so you will be dropping any digits lower than 2^(-52+8) = 2^-44. This is normal and according to the IEEE spec.
It's not. DP has _53_ bits of precision. It only stores 52 because the high bit is ~always 1 and doesn't need to be stored.

But lua's `print` and apparently factorio's `game.print` don't insist on roundtrippable formatting. Try `game.print(string.format('%.17g',472.5+2^-44))` on your windows client and see what that gets you. Also `game.print(serpent.line(472.5+2^-44))`, since serpent's formatting is intended for serialization.
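
A quick sketch of that suggestion in plain Lua (outside Factorio, so print stands in for game.print). It assumes IEEE 754 doubles and a C library whose printf rounds correctly.

```lua
local a = 472.5 + 2^(-44)

-- 14 significant digits (Lua's default number format) cannot show the
-- last bit, so a and 472.5 look identical when printed this way.
local short = string.format("%.14g", a)

-- 17 significant digits are enough to round-trip any double.
local full = string.format("%.17g", a)

print(short, full, a == 472.5, tonumber(full) == a)
```

If the short form matches on both machines but the full form does not, the values really differ and the default formatting was only hiding it.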

Re: [Rseding91] Desync on game.print of small number

Posted: Thu Sep 27, 2018 12:41 am
by Valansch
You fundamentally misunderstood how floats work. Floating point means the position of the point floats according to your exponent.
Your machine precision is 2^-52, yes. However, this neither means that this is the smallest number nor does it mean that this is the smallest number that can be added to any other number.
2^-52 is the smallest number that can be added to 1.
Likewise 2^-53 cannot. However, you can add 2^-53 to 1/2 because its exponent is -1 (-1-52=-53).

In our example: 472.5 is bigger than 2^8 and smaller than 2^9. Therefore the exponent is going to be 8. So the smallest number that can be added to it is 2^(8-52)=2^-44. So this works. But 472.5 + 2^-45 == 472.5, always.

This has nothing to do with lua or print.
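
These cases can be checked directly in plain Lua. A small sketch, assuming IEEE 754 doubles with the default round-to-nearest, ties-to-even mode:

```lua
-- Each entry: base value, increment, and whether the sum is expected
-- to differ from the base according to the explanation above.
local cases = {
  { 1.0,   2^(-52), true  },
  { 1.0,   2^(-53), false }, -- exact tie, rounds back to 1
  { 0.5,   2^(-53), true  },
  { 472.5, 2^(-44), true  },
  { 472.5, 2^(-45), false }, -- exact tie, rounds back to 472.5
}

local all_match = true
for _, c in ipairs(cases) do
  local base, inc, expect_change = c[1], c[2], c[3]
  local changed = (base + inc) ~= base
  if changed ~= expect_change then
    all_match = false
  end
  print(base, inc, changed)
end
```

The two "false" rows are exact halfway cases, so they only round back to the base because its last mantissa bit is even.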

Re: [Rseding91] Desync on game.print of small number

Posted: Thu Sep 27, 2018 2:00 am
by quyxkh
Well, that or I misunderstood what you said. We were talking about behavior for 472+2^-44. Why bring up behavior for 472+2^-45 at all, let alone without even writing it?

Re: [Rseding91] Desync on game.print of small number

Posted: Thu Sep 27, 2018 12:33 pm
by Valansch
If you read the thread backwards it doesn't make sense^^.
But at the time I mentioned it we (or I) didn't know it was math.sin(). I was just noticing that the calculation rounded it down on Windows. But this was false, as the input for the addition was already different on both OS's, so the floating point operation worked fine. I guess I should have looked at the code from beginning to end and not end to beginning. Hope that makes sense.