[1.1.42] Lamp color is wrong when the sum of red+green circuit wires overflows

quale · Post by **quale** » Tue Sep 14, 2021 11:47 am

: Lamp 2-wire overflow.png (661.51 KiB) Viewed 5031 times

One constant combinator supplies 2 billion red. The other supplies 2 billion green and negative 2 billion blue. When passed through both red and green wires, the sum of those wires overflows at the input. This isn’t normally a concern, as demonstrated by letting an arithmetic combinator do the summing. However, the lamp itself behaves erratically when the overflow occurs at its input. How does overflowed/now-negative red become yellow? How does a lamp radiate black?

I do like the black lamp. It would be nice if you could set that color by normal means, and as the off state for improved contrast on displays that don’t need a combinator per pixel.

Edit again: confirmed in 1.1.42.

SoShootMe · Post by **SoShootMe** » Tue Sep 14, 2021 12:16 pm

Hmm... when I import and place your blueprint, I get two white lamps (left) and two blue lamps (right), which seems to be the expected result: the highest priority of the colours that are positive, after summing the wires, determines the colour.

The only obvious difference is Factorio 1.1.39 (build 58937, mac, steam) (you) vs Factorio 1.1.39 (build 58937, win64, steam) (me)...

quale · Post by **quale** » Tue Sep 14, 2021 1:01 pm

Then I smell undefined behavior. People forget that C and C++ don’t allow signed overflow among other things. It will often behave like the underlying hardware, until it comes to bite you. Maybe the Mac compiler took advantage in this case. A compiler flag like -fwrapv should stop that, if not a source code change to make it overflow in a defined way.

SoShootMe · Post by **SoShootMe** » Tue Sep 14, 2021 3:46 pm

quale wrote: ↑Tue Sep 14, 2021 1:01 pm Then I smell undefined behavior.

Agreed, but I'd be surprised if it's the undefined behaviour (in the C or C++ standard sense) of signed integer overflow as that is invariably defined for a given CPU architecture and virtually universal anyway.

quale · Post by **quale** » Tue Sep 14, 2021 4:13 pm

That doesn’t matter. A compiler can do what it wants, and can have legitimate reasons to optimize. Examples include using wider types to store values, and eliminating conditions that are redundant under these assumptions. I can observe the latter on x86-64 in practice. It very much resembles this bug.

Code: Select all

bool redundant(int a, int b)
{
    return a > 0 && b > 0 && a + b > 0;
}

redundant(int, int):                         # @redundant(int, int)
        test    edi, edi
        setg    cl
        test    esi, esi
        setg    al
        and     al, cl
        ret

SoShootMe · Post by **SoShootMe** » Wed Sep 15, 2021 12:49 am

quale wrote: ↑Tue Sep 14, 2021 4:13 pm That doesn’t matter. A compiler can do what it wants, and can have legitimate reasons to optimize. Examples include using wider types to store values, and eliminating conditions that are redundant under these assumptions. I can observe the latter on x86-64 in practice. It very much resembles this bug.

Sorry, I misread your previous post and missed the point. But AIUI, the optimisation in your example is valid because, given a > 0 and b > 0, "a + b > 0" is true and therefore can be optimised away in all cases where the behaviour is defined, ie where a + b does not overflow. "The usual arithmetic conversions" prohibit "using wider types to store values" (assuming you mean eg the result of a + b in your example being wider than int) but that is part of what makes the optimisation valid - it's important to specify when undefined behaviour occurs

.

quale · Post by **quale** » Wed Sep 15, 2021 7:20 am

I dug into the assembly because this was killing me. It was indeed as I suspected, in addition to being an unspeakable mess I’ll spare you. Distilled down to the bare minimum that should help you understand the bugged coloring:

Code: Select all

color_channels = {
	{"red", {1, 0, 0, 1}},
	{"green", {0, 1, 0, 1}},
	{"blue", {0, 0, 1, 1}},
	{"yellow", {1, 1, 0, 1}},
	{"pink", {1, 0, 1, 1}},
	{"cyan", {0, 1, 1, 1}},
	{"white", {1, 1, 1, 1}}
};

result = {1, 1, 1, -1};
for (channel : color_channels) {
	signal = red_network[channel.id] + green_network[channel.id];
	
	// Using the flags of the add instruction, as if we have 33 result bits.
	// Undefined behavior allows this to happen.
	if (signal > 0) {
		result.b = channel.color.b;
		result.a = channel.color.a;
	}
	
	// Using the flags of a subsequent test instruction,
	// based on the wrapped 32-bit result.
	if (wrap(signal) > 0) {
		result.r = channel.color.r;
		result.g = channel.color.g;
		break;
	}
}
return result;

An alpha value of -1 means the color is ignored, making it white.

SoShootMe · Post by **SoShootMe** » Thu Sep 16, 2021 12:00 pm

quale wrote: ↑Wed Sep 15, 2021 7:20 am

Code: Select all

...
	signal = red_network[channel.id] + green_network[channel.id];
	
	// Using the flags of the add instruction, as if we have 33 result bits.
	// Undefined behavior allows this to happen.
	if (signal > 0) {
...

Given your "as if we have 33 result bits" comment, the flags state tested to effectively evaluate "signal > 0" is not obvious to me (at least). Care to elaborate, to satisfy my curiousity?

Not that I see reason to doubt there is a bug due to reliance on a particular outcome of undefined behaviour, as you suggested.

quale · Post by **quale** » Thu Sep 16, 2021 6:00 pm

It uses a JLE instruction, which jumps when ZF=1 or SF≠OF. Like all instructions named after inequalities, it uses a flag based on the carry bit. That’s very useful after a comparison (which is just a subtraction that doesn’t store its result). The optimization opportunity exists in the limited set of conditional instructions: there is no single instruction for > or ≤ that doesn’t use the carry bit. If you were subtracting, you could swap the operands and use SF for < or ≥ instead. But after an addition, all you can do is test the 32-bit result with an extra instruction, or check ZF and SF separately in two conditional instructions.

This is where the compiler decides it’s not having any of it, and invokes undefined behavior to save a whole instruction. Not that it helps when it goes split-brain on the two halves of the result, with the overhead of both. There’s more weird stuff accounting for way more than one instruction, while other sequences are conspicuously optimal. It must be hard to identify which optimization to apply at which time, and when to end. It’s honestly amazing that optimizers do a pretty good job in a reasonable amount of time.

Anachrony · Post by **Anachrony** » Mon Oct 25, 2021 10:38 pm

quale wrote: ↑Tue Sep 14, 2021 4:13 pm That doesn’t matter. A compiler can do what it wants, and can have legitimate reasons to optimize. Examples include using wider types to store values, and eliminating conditions that are redundant under these assumptions.

An area where this can potentially make a huge difference is with stuff like loop iteration variables. A lot of existing code is written with a 32-bit int as the iteration variable, which is pretty straightforward to handle on a 32-bit target. But on a 64-bit target, if you can't evaluate the iteration condition to a known constant range, then you could face having to perform an extra truncation operation on the iteration variable after every increment to make sure it still fits in a 32-bit range. It makes the loop a lot harder to unroll and optimize efficiently. You can exploit the undefined behavior to ignore the truncation and generate efficient code for the normal case where the iteration variable is not expected to overflow. But in some bizarro cases where the programmer was expecting that to be defined and somehow relying on it, it can lead to surprises.

mrvn · Post by **mrvn** » Mon Oct 25, 2021 11:54 pm

Anachrony wrote: ↑Mon Oct 25, 2021 10:38 pm
quale wrote: ↑Tue Sep 14, 2021 4:13 pm That doesn’t matter. A compiler can do what it wants, and can have legitimate reasons to optimize. Examples include using wider types to store values, and eliminating conditions that are redundant under these assumptions.
An area where this can potentially make a huge difference is with stuff like loop iteration variables. A lot of existing code is written with a 32-bit int as the iteration variable, which is pretty straightforward to handle on a 32-bit target. But on a 64-bit target, if you can't evaluate the iteration condition to a known constant range, then you could face having to perform an extra truncation operation on the iteration variable after every increment to make sure it still fits in a 32-bit range. It makes the loop a lot harder to unroll and optimize efficiently. You can exploit the undefined behavior to ignore the truncation and generate efficient code for the normal case where the iteration variable is not expected to overflow. But in some bizarro cases where the programmer was expecting that to be defined and somehow relying on it, it can lead to surprises.

Code: Select all

    char c = 0;
    while (c++ >= 0) { ... };

Compiler then generates this code:

Code: Select all

    1: b 1b

it just jumps back to the label 1 every tick. An endless loop. On some systems char is signed and the loop goes through all the ASCII characters and then stops. On other systems char is unsigned and unsigned can never be negative. So the loop never terminates. No need to compute anything inside because it can never be accessed. It's real fun tracking those kind of bugs down.

Post by **Klonan** » Wed Nov 10, 2021 10:35 am

We made some changes, can anybody confirm if it is fixed in the latest 1.1 release?

SoShootMe · Post by **SoShootMe** » Wed Nov 10, 2021 12:14 pm

Klonan wrote: ↑Wed Nov 10, 2021 10:35 am We made some changes, can anybody confirm if it is fixed in the latest 1.1 release?

It worked as expected on Windows with 1.1.39, and still does with 1.1.46. Need somebody to test on Mac OS.

(But from what quale described, this is the sort of bug you should know is fixed, otherwise it isn't, eg it may "randomly" resurface.)

quale · Post by **quale** » Thu Nov 11, 2021 2:27 am

Klonan wrote: ↑Wed Nov 10, 2021 10:35 am We made some changes, can anybody confirm if it is fixed in the latest 1.1 release?

I gave 1.1.46 a go on my Mac. It seems to use the full, pre-overflow result now. A cursory glance at the machine code suggests the same: addl followed by jg, without split-brain handling of the two result parts. It’s at least consistent with itself, but not with the usual circuit network behavior and the way it used to work (and still works according to SoShootMe) on other compilers. My blueprint is now a proper gaming build with RGB lights.

: Lamp overflow going RGB.png (447.67 KiB) Viewed 4225 times

You can get the expected behavior by casting to unsigned or a wider signed type before the addition, and then casting back to 32-bit signed for the comparison. Conversion to unsigned and unsigned operations are defined to wrap by the language. Conversion to signed is implementation-defined in case of overflow: the implementation has to pick a definite result or raise a signal, which comes down to wrapping on contemporary systems. The undefinedness only happens during signed arithmetic, which is avoided here. Both Clang and GCC cut through it and generate efficient code. That would fix it without reducing optimization potential elsewhere, unlike boilerplate -fwrapv. I’m just worried that there is more code like this that’s broken by such optimizations, waiting to be discovered or to become a problem in the future.

Factorio Forums

[1.1.42] Lamp color is wrong when the sum of red+green circuit wires overflows

[1.1.42] Lamp color is wrong when the sum of red+green circuit wires overflows

Re: [1.1.39] Lamp color is wrong when the sum of red+green circuit wires overflows

Re: [1.1.39] Lamp color is wrong when the sum of red+green circuit wires overflows

Re: [1.1.39] Lamp color is wrong when the sum of red+green circuit wires overflows

Re: [1.1.39] Lamp color is wrong when the sum of red+green circuit wires overflows

Re: [1.1.39] Lamp color is wrong when the sum of red+green circuit wires overflows

Re: [1.1.39] Lamp color is wrong when the sum of red+green circuit wires overflows

Re: [1.1.39] Lamp color is wrong when the sum of red+green circuit wires overflows

Re: [1.1.39] Lamp color is wrong when the sum of red+green circuit wires overflows

Re: [1.1.39] Lamp color is wrong when the sum of red+green circuit wires overflows

Re: [1.1.39] Lamp color is wrong when the sum of red+green circuit wires overflows

Re: [1.1.42] Lamp color is wrong when the sum of red+green circuit wires overflows

Re: [1.1.42] Lamp color is wrong when the sum of red+green circuit wires overflows

Re: [1.1.42] Lamp color is wrong when the sum of red+green circuit wires overflows