[0.17.37] Platform differences in integral mathematics

H8UL
Fast Inserter
Posts: 114
Joined: Mon May 15, 2017 4:02 pm

Post by H8UL »

The following console command gives different results on Linux and Windows:

/c game.player.print(bit32.band(1664525*2031137496+1013904223, 0xffffffff))

The example might seem a little strange -- and the multiplication should really be 32-bit with overflow -- but it does isolate the problem.

Related issue where numerical calculations caused a desync: viewtopic.php?t=62674

Whether a fix is possible is not clear to me, but I would hope for at least a workaround. I think it likely that there are floating point differences in Lua 5.2 on Linux and Windows that cannot be reconciled. But unlike Lua 5.3, there is no integral subtype, so this leaves a big gap in basic mathematical operations. I'd be more than happy if the fix was to add in some integer maths support into Factorio's mod API. Even just 32-bit multiply, add, subtract, divide, and modulo would be huge -- they would supplement the existing bit32 functions to give a fairly comprehensive set of integral operations.
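To make the request for a 32-bit multiply concrete, here is a minimal sketch in C++ of a wrapping 32-bit multiply built only from double arithmetic, the single number type Lua 5.2 offers. The names `mul32_double` and `lcg_step_double` are hypothetical and belong to no Factorio or Lua API. The sketch assumes IEEE 754 doubles and whole-number inputs in [0, 2^32). Splitting one operand into 16-bit halves keeps every intermediate value below 2^53, so no step rounds.

```cpp
#include <cmath>

// Hypothetical helper: (a * b) mod 2^32 using only double arithmetic.
// Assumes a and b are whole numbers in [0, 2^32).
double mul32_double(double a, double b) {
    const double two16 = 65536.0;
    const double two32 = 4294967296.0;
    double a_hi = std::floor(a / two16);  // upper 16 bits of a
    double a_lo = a - a_hi * two16;       // lower 16 bits of a
    // a*b = a_hi * 2^16 * b + a_lo * b.
    // Only the low 16 bits of a_hi*b survive the reduction mod 2^32.
    double hi_part = std::fmod(a_hi * b, two16) * two16;  // below 2^32
    double lo_part = a_lo * b;                            // below 2^48
    return std::fmod(hi_part + lo_part, two32);
}

// One LCG step using the constants from the console command above.
double lcg_step_double(double x) {
    return std::fmod(mul32_double(1664525.0, x) + 1013904223.0, 4294967296.0);
}
```

Under these assumptions, `lcg_step_double(2031137496.0)` yields 2158106711, the value the thread reports for Linux and for plain C. The same steps should carry over to Lua 5.2 using `math.floor` and `math.fmod`, although that translation is not shown here.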

I wanted to approach this as a bug rather than a mod interface request however, since my players have all been in agreement that platform differences on the console like this are a bug.

The impact for me has been huge. Desyncs and inconsistencies in mathematical operations are a major problem for mods with a substantial procedural generation element; without reliable integer mathematics we can't even write pure Lua functions to work around these problems when we identify them. It's a shame. In spite of going to great lengths to avoid all the usual desync pitfalls and provide reliable procedural generation, content creators who were going to run a community map have been unable to do so, and have moved on.

(Edited, I put bit32.rshift instead of bit32.band -- both show there is a platform difference, but bit32.band makes more sense as an example).
Last edited by H8UL on Wed May 08, 2019 6:25 pm, edited 1 time in total.
Shameless mod plugging: Ribbon Maze
orzelek
Smart Inserter
Posts: 3924
Joined: Fri Apr 03, 2015 10:20 am

Post by orzelek »

Would this work for you:

Code:

local function normalize(n) -- keep numbers at 32 bits
	return floor(n) % 0xffffffff
end
It has been in the code for a very long time and has never caused any desyncs as far as I can tell. It's used to normalize seeds for the Lua random generator.
My guess is that its main drawback is that it might be slower than your method.
Merssedes
Fast Inserter
Posts: 147
Joined: Sun Oct 29, 2017 7:05 pm

Post by Merssedes »

Just out of curiosity: what results do you get in each case?

Post by H8UL »

orzelek wrote: Wed May 08, 2019 5:23 pm Would this work for you:

Code:

local function normalize(n) -- keep numbers at 32 bits
	return floor(n) % 0xffffffff
end
It has been in the code for a very long time and has never caused any desyncs as far as I can tell. It's used to normalize seeds for the Lua random generator.
My guess is that its main drawback is that it might be slower than your method.
There's a lot of that in my code already, and that ain't the half of it. Take a look at https://github.com/h8ul-modder/factorio ... b/cmwc.lua and c.f. my new post here about why I even need to do this viewtopic.php?f=34&t=70588

Post by orzelek »

H8UL wrote: Wed May 08, 2019 6:03 pm
There's a lot of that in my code already, and that ain't the half of it. Take a look at https://github.com/h8ul-modder/factorio ... b/cmwc.lua and c.f. my new post here about why I even need to do this viewtopic.php?f=34&t=70588
I've read that but I admit I haven't noticed any problems caused by that rng behaviour.
If you multiply your seed through high enough values you will rarely end up with seeds in problematic range.

Unless it's that way in any range?

RSO creates seeds from x/y coordinates pretty frequently and I haven't noticed any problems with layout duplication on map.

Post by H8UL »

orzelek wrote: Wed May 08, 2019 6:11 pm
I've read that but I admit I haven't noticed any problems caused by that rng behaviour.
If you multiply your seed through high enough values you will rarely end up with seeds in problematic range.

Unless it's that way in any range?

RSO creates seeds from x/y coordinates pretty frequently and I haven't noticed any problems with layout duplication on map.
Perhaps, but an even better way to mutate the seed into something that isn't directly related to neighbouring seeds is to put it through an LCG, and that's what has this problem. If it helps to avoid the debate of using LuaRandomGenerator: I originally wrote my RNG to be used at the data stage. I now use it in Ribbon Maze but intend to use it in the data stage in future. In such a situation I cannot rely on the provided RNG systems anyway. You'll see if you look at Serendipity that they've written out a pure Lua RNG implementation. Whether that runs into OS differences in basic maths, I am unsure.

But even if LuaRandomGenerator was amazing, I should be able to do basic maths without fear that if numbers are "too big" in some unspecified way, then it can cause a desync.

Post by H8UL »

Merssedes wrote: Wed May 08, 2019 5:41 pm Just out of curiosity: what results do you get in each case?
When executing:

Code:

/c game.player.print(bit32.band(1664525*2031137496+1013904223, 0xffffffff))
Windows: 1079053356

Linux: 2158106711

And when executed in C (you can try it in an online repl https://repl.it/languages/c):

Code:

#include <stdio.h>

int main(void) {
  long a = 1664525L*2031137496+1013904223;
  long b = a & 0xFFFFFFFFL;
  int c = (int)b;
  printf("%lu\n", b);
  printf("%u\n", c);
  return 0;
}
Output:

Code:

2158106711
2158106711
So they all agree, except Windows.
Rseding91
Factorio Staff
Posts: 14798
Joined: Wed Jun 11, 2014 5:23 am

Post by Rseding91 »

"long" is 64 bit on mac and linux. "long" is 32 bit on windows.
"long long" is 64 bit on mac and linux. "long long" is 64 bit on windows.

Most likely it's that garbage again.
If you want to get ahold of me I'm almost always on Discord.
TruePikachu
Filter Inserter
Posts: 978
Joined: Sat Apr 09, 2016 8:39 pm

Post by TruePikachu »

Rseding91 wrote: Wed May 08, 2019 10:13 pm "long" is 64 bit on mac and linux. "long" is 32 bit on windows.
"long long" is 64 bit on mac and linux. "long long" is 64 bit on windows.

Most likely it's that garbage again.
The described math above doesn't depend on any bits past #31 -- the result is masked to just the low 32 bits, and neither addition nor multiplication carries rightwards.

I'm currently investigating the internal math behind this, I'll have an answer in a few hours.

-----
EDIT:
Going by a copy of vanilla Lua 5.2.1 sources I have lying around, there are four methods that Lua will use to convert a Lua-side number (`lua_Number` -- `double` by default) to a 32-bit unsigned integer (`lua_Unsigned` -- `unsigned int` by default, intended to be something like `uint32_t`). It will either use MASM `fld`/`fistp` (MS-only), type-pun the `lua_Number` to `lua_Unsigned` via a `union` (I don't really want to do the long math right now to check if that implementation is correct), run a modulo via well-defined floating point operations, or just cast. The actual method lies in the macro `lua_number2unsigned` in `llimits.h`.

It sounds like (I haven't checked the Factorio binary to verify) something like the type-punning method is being used on Windows, and there's an off-by-one error in there somewhere -- I noticed that the reported Windows output is half the Linux output, rounded up (or to nearest even, I'm not sure).

----
EDIT 2:
Just wrote up a quick test of that type-pun.

Code:

#include <cstdint>
#include <iostream>

union luai_Cast {
	double l_d;
	std::uint32_t l_p[2];
};

int main() {
	constexpr double foo = 3380880154433623.0; // Number that gets passed into bit32.band
	volatile union luai_Cast u;
	u.l_d = foo + 6755399441055744.0; // 10136279595489368.0
	// Note: foo is outside the uint32_t range, so this cast is formally undefined in C++.
	std::cout << "Casting:      " << static_cast<std::uint32_t>(foo) << std::endl;
	// l_p[0] is the low word of the double on a little-endian target.
	std::cout << "Type punning: " << u.l_p[0] << std::endl;
	return 0;
}
The above code, compiled for Windows x86-64 (not that it should matter), outputs:

Code:

Casting:      2158106711
Type punning: 1079053356
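The halved result follows from how the magic-constant trick works, which the sketch below tries to show. It uses `std::memcpy` instead of a union so the bit read is well-defined, and it takes the low 32 bits of the 64-bit pattern so byte order does not matter. The name `magic_to_u32` is hypothetical; the sketch assumes IEEE 754 binary64 doubles with round-to-nearest and does not claim to reproduce what any particular Factorio build compiled to.

```cpp
#include <cstdint>
#include <cstring>

// Sketch of the "add a magic constant, then read the low bits" conversion.
// Assumes IEEE 754 binary64 doubles and round-to-nearest.
std::uint32_t magic_to_u32(double n) {
    const double magic = 6755399441055744.0;    // 2^52 + 2^51
    double shifted = n + magic;
    std::uint64_t bits;
    std::memcpy(&bits, &shifted, sizeof bits);  // copy out the raw bit pattern
    return static_cast<std::uint32_t>(bits);    // keep the low 32 bits
}
```

While the sum stays below 2^53, one unit in the last place of the double equals 1 and the low fraction bits hold the integer directly, so `magic_to_u32(2158106711.0)` returns 2158106711. For 3380880154433623.0 the sum reaches 2^53 or more, the unit in the last place becomes 2, and the low bits hold half the rounded value: 1079053356, matching the Windows output reported in this thread.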
DaleStan
Filter Inserter
Posts: 379
Joined: Mon Jul 09, 2018 2:40 am

Post by DaleStan »

H8UL wrote: Wed May 08, 2019 6:49 pm When executing:

Code:

/c game.player.print(bit32.band(1664525*2031137496+1013904223, 0xffffffff))
Windows: 1079053356

Linux: 2158106711
According to the Lua 5.2 reference, "all functions accept numeric arguments in the range (-2⁵¹,+2⁵¹)." The number you supplied (1664525*2031137496+1013904223) is outside that range. I'd file this under "Undefined behavior causes undefined behavior".
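The range limit quoted from the reference can be checked before a value is handed to a bit32 function. The sketch below is only an illustration: `in_bit32_documented_range` is a hypothetical helper, not part of Lua or Factorio.

```cpp
#include <cmath>

// Hypothetical helper: true when n lies strictly inside (-2^51, +2^51),
// the argument range quoted from the Lua 5.2 reference.
bool in_bit32_documented_range(double n) {
    const double limit = 2251799813685248.0;  // 2^51
    return std::fabs(n) < limit;
}
```

For the value in the console command, 3380880154433623, the check returns false, while the masked result 2158106711 is comfortably inside the range.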


TruePikachu wrote: Thu May 09, 2019 12:34 am
Rseding91 wrote: Wed May 08, 2019 10:13 pm "long" is 64 bit on mac and linux. "long" is 32 bit on windows.
"long long" is 64 bit on mac and linux. "long long" is 64 bit on windows.

Most likely it's that garbage again.
The described math above doesn't depend on any bits past #31
It absolutely depends on bits past 31. On Windows, the first line of main is equivalent to

Code:

int32_t a = 0xC02E480A21857L; /* high bits 0xC02E4, low 32 bits 0x80A21857 */
That's 52 bits. C being C, that most likely invokes undefined behavior.
Even if that is defined, the printf is definitely undefined behavior, since %lu is a promise to pass a 64-bit integer, but a is a 32-bit integer.

Post by Rseding91 »

TruePikachu wrote: Thu May 09, 2019 12:34 am I'm currently investigating the internal math behind this, I'll have an answer in a few hours.

-----
EDIT:
Going by a copy of vanilla Lua 5.2.1 sources I have lying around, there are four methods that Lua will use to convert a Lua-side number (`lua_Number` -- `double` by default) to a 32-bit unsigned integer (`lua_Unsigned` -- `unsigned int` by default, intended to be something like `uint32_t`). It will either use MASM `fld`/`fistp` (MS-only), type-pun the `lua_Number` to `lua_Unsigned` via a `union` (I don't really want to do the long math right now to check if that implementation is correct), run a modulo via well-defined floating point operations, or just cast. The actual method lies in the macro `lua_number2unsigned` in `llimits.h`.

It sounds like (I haven't checked the Factorio binary to verify) something like the type-punning method is being used on Windows, and there's an off-by-one error in there somewhere
I was looking into that logic some time ago and thought specifically that it shouldn't be using the type punning logic, because it would be a problem area. My IDE was telling me it wasn't being used. But C being C, and the Lua library being written in C, it makes *heavy* use of macros, which obfuscated the logic enough that I didn't see it was in fact using type punning on the Windows build but standard casting on the other platforms.

I just deleted the "special" cast logic and forced every platform to use simple casts + made a test for it. So, this is now fixed for the next version of 0.17.
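The replacement code is not shown in this thread, so the following is only a sketch of one portable conversion, modelled on the "modulo via well-defined floating point operations" option described earlier. The name `number_to_u32` is hypothetical. The sketch assumes a finite input; reducing into [0, 2^32) first means the final cast never sees an out-of-range value.

```cpp
#include <cmath>
#include <cstdint>

// Sketch: reduce a double modulo 2^32 with exact floating-point steps,
// then cast. After the reduction the value lies in [0, 2^32).
std::uint32_t number_to_u32(double n) {
    const double two32 = 4294967296.0;
    double reduced = n - std::floor(n / two32) * two32;
    return static_cast<std::uint32_t>(reduced);
}
```

With this approach `number_to_u32(3380880154433623.0)` gives 2158106711 regardless of the width of `long`, and negative inputs wrap the way unsigned arithmetic does, so -1 maps to 4294967295.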

Post by TruePikachu »

DaleStan wrote: Thu May 09, 2019 3:18 am I'd file this under "Undefined behavior causes undefined behavior".
The problem is, undefined behaviour doesn't mix well with determinism. The behaviour might be undefined, yes, but it should still be identical regardless of platform.
DaleStan wrote: Thu May 09, 2019 3:18 am That's 52 bits. C being C, that most likely invokes undefined behavior.
I was speaking in terms of variables being unsigned (where, in C++ at least, overflow and underflow are well-defined). Even if it's signed math, however, assuming that signed integers wrap around the same way, you get the same result (but it's a negative number since the high bit is set). In Lua, the math is done with double-precision floats, which have full integer precision within 53 bits.
DaleStan wrote: Thu May 09, 2019 3:18 am Even if that is defined, the printf is definitely undefined behavior, since %lu is a promise to pass a 64-bit integer, but a is a 32-bit integer.
printf(3) states that `l` is a long and `ll` is a long long. There are no issues with the length modifier in the printf (only with the fact that `u` is used with a signed argument).
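The remark about doubles holding integers exactly within 53 bits can be checked with a small round-trip test. This is a sketch under the assumption of IEEE 754 binary64 doubles; `roundtrips_through_double` is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical helper: does v survive a round trip through double unchanged?
bool roundtrips_through_double(std::uint64_t v) {
    double d = static_cast<double>(v);
    // Guard: a value that rounds up to 2^64 cannot be converted back safely.
    if (d >= 18446744073709551616.0) return false;
    return static_cast<std::uint64_t>(d) == v;
}
```

The intermediate value from this thread, 3380880154433623, survives the round trip, as does 2^53 itself (9007199254740992). The next integer, 9007199254740993, does not, because it has no exact double representation.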

Post by H8UL »

Rseding91 wrote: Thu May 09, 2019 3:27 am
TruePikachu wrote: Thu May 09, 2019 12:34 am I'm currently investigating the internal math behind this, I'll have an answer in a few hours.

-----
EDIT:
Going by a copy of vanilla Lua 5.2.1 sources I have...
I was looking into that logic some time ago and thought specifically that it shouldn't be using the type punning logic, because it would be a problem area. My IDE was telling me it wasn't being used. But C being C, and the Lua library being written in C, it makes *heavy* use of macros, which obfuscated the logic enough that I didn't see it was in fact using type punning on the Windows build but standard casting on the other platforms.

I just deleted the "special" cast logic and forced every platform to use simple casts + made a test for it. So, this is now fixed for the next version of 0.17.
That's amazing! Thank you so much!

Edit: also thanks to everyone else for their input/investigation, amazing community as always!