Map download never finishes [14.5] headless windows

gurka
Burner Inserter
Posts: 5
Joined: Sat Jan 28, 2017 10:57 am

Post by gurka »

AyrA wrote: To generally mitigate this problem in the future, add a 16-bit counter to the UDP packets your applications send. This way, you automatically change the checksum for each packet that is transmitted again.
That sounds like a sane solution to this problem.
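
A minimal sketch of the counter idea quoted above. The `CountingFramer` helper is hypothetical and is not Factorio's actual code; it only assumes that a 16-bit counter is prepended to each datagram payload. Since the rest of the datagram is identical between retries, a changed payload word changes the UDP checksum as well.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement 16-bit checksum (the algorithm UDP uses)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


class CountingFramer:
    """Hypothetical helper: prefixes each payload with a 16-bit counter."""

    def __init__(self) -> None:
        self._counter = 0

    def frame(self, payload: bytes) -> bytes:
        framed = struct.pack("!H", self._counter) + payload
        # Wrap at 0xFFFF rather than 0x10000: the words 0x0000 and 0xFFFF
        # are both "zero" in one's-complement arithmetic, so they would
        # produce the same checksum.
        self._counter = (self._counter + 1) % 0xFFFF
        return framed


framer = CountingFramer()
first = framer.frame(b"map chunk 17")
retry = framer.frame(b"map chunk 17")  # same payload, sent again
print(hex(internet_checksum(first)), hex(internet_checksum(retry)))
```

The two printed values differ even though the payload is the same, so a retransmission no longer hits the same checksum as the packet that was lost.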

d3phoenix
Inserter
Posts: 46
Joined: Wed Dec 17, 2014 2:24 am

Post by d3phoenix »

Awesome find! This is going in the 'weird edge case' notes for work for sure. The counter does sound like a pretty reasonable workaround.

admalledd
Burner Inserter
Posts: 8
Joined: Thu Oct 06, 2016 4:05 am

Post by admalledd »

Results from our new network testing. First the process: we were still using our respective desktops, but we also had my server in the middle for other low-level tests (extremely useful later!). With this base setup on our normal networks, we were only able to complete the connection about one time in twenty. Note that our map is now almost 50MB. So, to be confident about whether a given setup was working, we tested each one four times in each direction.

1. Testing again and verifying that our "normal networks" were still having problems: Could not download map.
2. Testing with every feature on our routers turned off except basic NAT and IPv4 (so no ALG, no SPI, etc.): Could not download map.
3. Flipped a coin; BattleshipBrotemkin lost, disconnected his router and connected directly. Map downloaded just fine.
4. My turn: Brotemkin reconnected his router and I disconnected mine. Map downloaded fine. (Things started to confuse us at this point.)
5. Direct connect to both our modems: Map downloaded fine.
6. With this data we reconnected both our routers, and to our great surprise we could no longer reproduce the problem. Map downloads fine!

At this point we break for the night because we are very confused. With Wireshark in the background capturing everything, and after trying filters and looking at raw data, we were at a complete loss. Sorry guys, but those captures were of our full connections without any filters, and were a couple of gigabytes each by the end. Neither of us saved them in the end...

At this point it is late and I break for sleep. However I have a restless night because I can't get over this now suddenly working. At around 3am I decide to get up and check the thread, see if any other ideas were cropping up.

Twinsen wrote:
Voltara wrote:
gurka wrote: I re-calculated the checksum both for the first large packet and the small packet. They both (should) have checksum 0x99F2. I don't think that's a coincidence...
Edit: I re-calculated the second large packet also and surprise: the checksum is also 0x99F2.


And if you take 0x99F2 and adjust it for the NAT, i.e. change 192.168.0.12 (0xC0A8000C) to the public IP 24.21.66.146 (0x18154292), the checksum becomes 0xFFFF:

Code:

99F2 + (C0A8 + 000C) - (1815 + 4292)
FFFF
Absolutely solid find guys!
Excuse my language but holy shit!
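
The arithmetic in the quoted post can be checked in a few lines. This is only a sketch: it assumes the NAT rewrites nothing but the source address, and the function names are made up for illustration. It compares the plain integer arithmetic from the quote with an incremental update in the style of RFC 1624, then applies the RFC 768 rule that a computed checksum of zero is transmitted as all ones.

```python
def ones_add(a: int, b: int) -> int:
    """16-bit one's-complement addition with end-around carry."""
    total = a + b
    return (total & 0xFFFF) + (total >> 16)


def nat_adjust(checksum: int, old_words, new_words) -> int:
    """Incremental update, RFC 1624 style: HC' = ~(~HC + ~m + m')."""
    acc = ~checksum & 0xFFFF
    for word in old_words:
        acc = ones_add(acc, ~word & 0xFFFF)
    for word in new_words:
        acc = ones_add(acc, word)
    return ~acc & 0xFFFF


def udp_wire_value(computed: int) -> int:
    """RFC 768: a computed checksum of zero is transmitted as all ones."""
    return computed or 0xFFFF


# Plain integer arithmetic, exactly as written in the quoted post.
plain = 0x99F2 + (0xC0A8 + 0x000C) - (0x1815 + 0x4292)

# 192.168.0.12 -> 24.21.66.146, expressed as 16-bit words.
computed = nat_adjust(0x99F2, [0xC0A8, 0x000C], [0x1815, 0x4292])

print(hex(plain), hex(computed), hex(udp_wire_value(computed)))
# prints: 0xffff 0x0 0xffff
```

Both 0x0000 and 0xFFFF represent zero in one's-complement arithmetic, which is why the two routes land on different bit patterns for the same value. A sender that follows RFC 768 puts 0xFFFF on the wire, because 0x0000 in the UDP checksum field over IPv4 means "no checksum was computed".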

Wrote some quick scapy code in my ipython terminal, with an open UDP echo server running on my remote server. Sending custom UDP packets with scapy, I find that if I force the checksum to end up as either 0x0000 or 0xFFFF at any point, my packets never reach my UDP echo server!

With this thought, I now look at our WAN IPs and they have changed! So now it seems we don't get 0xFFFF as a UDP checksum (both before and after NAT both ways) for any packets in our test map download flow!

Now, a few hours later, I have had enough sleep to write semi-coherently, and here we are!

Can we upgrade this from an edge case to a full WTF corner case?

I honestly hope this information is enough to fix it for everyone else: some small change to make the checksums rotate through possibilities (e.g. the random byte!) instead of always being the same. A yay for everyone all around!

... At least I can't possibly think of any *other* reason for anyone else. This at least solves it for us and also helps answer why we never encounter the problem when playing on the local LAN.

ikiris
Inserter
Posts: 26
Joined: Sun Jul 19, 2015 5:57 pm

Post by ikiris »

I figured it was something along these lines, which is why I asked in IRC for the real packet headers.

TLDR: this likely isn't any magic firewalling in the path, but much more likely a simple ASIC bug in how one of the standards is interpreted. +1 to the suggestion of counters.

One of the things you eventually learn as a neteng is that everyone builds routers/switches terribly, and then fixes only the breakage that the big clients are willing to yell loudly enough about. It's honestly a miracle the internet works as it is.

Twinsen
Factorio Staff
Posts: 1331
Joined: Tue Sep 23, 2014 7:10 am

Post by Twinsen »

There is no doubt this can be fixed by adding random data or a counter. Now I'm just very curious what router is faulty, or maybe if ISP hardware is faulty.

With the WAN IPs changed, it's possible that the checksum will never be 0xFFFF with Factorio packets, or it's possible the routing has changed. But you could craft a custom 2-byte payload for the packet that will make the checksum 0xFFFF again.
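
One way such a 2-byte payload could be computed, as a rough sketch using only the Python standard library. The destination address (a documentation-range IP), the ports and the probe payload are all made up for illustration, and the helper names are hypothetical. The checksum is calculated over the IPv4 pseudo-header, the UDP header and the payload, and the pad word is then solved for in one's-complement arithmetic.

```python
import socket
import struct


def ones_sum(data: bytes) -> int:
    """Folded 16-bit one's-complement sum (input zero-padded if odd)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def datagram_sum(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> int:
    """Sum over IPv4 pseudo-header, UDP header (checksum = 0) and payload."""
    length = 8 + len(payload)
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, socket.IPPROTO_UDP, length))
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    return ones_sum(pseudo + header + payload)


def udp_checksum(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> int:
    """Checksum field as sent on the wire (zero is sent as 0xFFFF)."""
    computed = ~datagram_sum(src_ip, dst_ip, src_port, dst_port, payload) & 0xFFFF
    return computed or 0xFFFF


def pad_for_checksum(src_ip, dst_ip, src_port, dst_port, payload: bytes,
                     target: int) -> bytes:
    """Return two bytes that, appended to payload, give the target checksum."""
    if len(payload) % 2:
        raise ValueError("payload length must be even so the pad is word-aligned")
    if not 0x0001 <= target <= 0xFFFF:
        raise ValueError("target must be a nonzero 16-bit value")
    # The pad changes the UDP length, so count it (as zeros) in the base sum.
    base = datagram_sum(src_ip, dst_ip, src_port, dst_port, payload + b"\x00\x00")
    wanted = (~target & 0xFFFF) or 0xFFFF      # sum that yields the target
    word = wanted + (~base & 0xFFFF)           # wanted - base, one's complement
    word = (word & 0xFFFF) + (word >> 16)
    return struct.pack("!H", word)


# Illustrative values only: public source IP from the thread, made-up peer.
flow = ("24.21.66.146", "203.0.113.10", 34197, 34197)
payload = b"checksum probe"
pad = pad_for_checksum(*flow, payload, target=0xFFFF)
print(hex(udp_checksum(*flow, payload + pad)))
```

The result only holds for the addresses and ports used in the calculation. To hit a given value after NAT, the public address has to be used as the source, as in the example values.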

mexmer
Filter Inserter
Posts: 870
Joined: Wed Aug 03, 2016 2:00 pm

Post by mexmer »

This problem reminds me of an issue with certain Intel network adapters "eating" audio packets.
I can't give you exact keywords, since it was 4 or more years ago, but the problem was rather simple and it was an issue in the adapter firmware. Under specific conditions, packets were treated as magic packets due to an error in the card firmware (usually you don't use card firmware for packet processing, but on servers you often have chimney offload and other features enabled to get the most out of your hardware, which gives your network adapter higher throughput). The funniest part was that it affected only certain services that used voice communication, and even then not always.

Later Dell issued a firmware update for those adapters, and Intel did too (why Dell? because those adapters were used mainly as OEM parts in Dell PowerEdge servers, customized for their management software, but in fact the bug affected the whole family, so every OEM that used those adapters had the issue).

And no, it was not me who found the solution; I just had the issue, and incidentally found people discussing it on Stack Exchange, along with the solution ;)

Paul17041993
Inserter
Posts: 36
Joined: Fri Nov 25, 2016 4:26 am

Post by Paul17041993 »

Twinsen wrote:There is no doubt this can be fixed by adding random data or a counter. Now I'm just very curious what router is faulty, or maybe if ISP hardware is faulty.

With the WAN IPs changed, it's possible that the checksum will never be 0xFFFF with Factorio packets, or it's possible the routing has changed. But you could craft a custom 2-byte payload for the packet that will make the checksum 0xFFFF again.
If you're generating the checksum based on the entire packet generated just before sending it off, there's no surprise that you'll end up with inconsistencies. If you're generating the checksum from the actual payload data, then packaging and sending it then that should be consistent.

If you're sending packets that contain purely checksum data for previous packets and files, I suggest you encrypt them for various reasons as well as this one.

That's about as far as I can help without looking at code examples though...
Please be sure you've googled your question before asking me about code... :T

ikiris
Inserter
Posts: 26
Joined: Sun Jul 19, 2015 5:57 pm

Post by ikiris »

Twinsen wrote:There is no doubt this can be fixed by adding random data or a counter. Now I'm just very curious what router is faulty, or maybe if ISP hardware is faulty.

With the WAN IPs changed, it's possible that the checksum will never be 0xFFFF with Factorio packets, or it's possible the routing has changed. But you could craft a custom 2-byte payload for the packet that will make the checksum 0xFFFF again.
It could be anything in the path. If you have a raw socket, you can try to do a traceroute with the packet by walking the TTL up and looking for ICMP destination unreachable / TTL exceeded replies, but that will only give you a rough idea of where the problem is, because of most networks' security policies and how the forwarding planes work. It also requires you to have a known broken test case, and that's easier said than done.

Seriously though, this is pretty normal. You should just include the counter and move on. The nature of the internet basically means that something somewhere is ALWAYS broken, and you just have to deal with it by doing basic things like this to protect yourself. It's why it's usually a bad idea to write your own protocols instead of using standard ones as well =) There's a reason the deeper you guys go with this, the closer and closer it starts to look like everything else you could have used to begin with. *chuckles*

sillyfly
Smart Inserter
Posts: 1099
Joined: Sun May 04, 2014 11:29 am

Post by sillyfly »

Paul17041993 wrote: If you're generating the checksum based on the entire packet generated just before sending it off, there's no surprise that you'll end up with inconsistencies. If you're generating the checksum from the actual payload data, then packaging and sending it then that should be consistent.

If you're sending packets that contain purely checksum data for previous packets and files, I suggest you encrypt them for various reasons as well as this one.

That's about as far as I can help without looking at code examples though...
*sigh*
Factorio doesn't* calculate the UDP checksum, that's way outside of the scope of the game. The UDP checksum is inevitably recalculated at least once, as the host isn't aware it's behind a NAT (that's one of the key features of a NAT). Encryption is a massive overhead, and IMHO completely redundant for something like Factorio.
ikiris wrote: It could be anything in the path. If you have a raw socket, you can try to do a traceroute with the packet by walking the TTL up and looking for ICMP destination unreachable / TTL exceeded replies, but that will only give you a rough idea of where the problem is, because of most networks' security policies and how the forwarding planes work. It also requires you to have a known broken test case, and that's easier said than done.

Seriously though, this is pretty normal. You should just include the counter and move on. The nature of the internet basically means that something somewhere is ALWAYS broken, and you just have to deal with it by doing basic things like this to protect yourself. It's why it's usually a bad idea to write your own protocols instead of using standard ones as well =) There's a reason the deeper you guys go with this, the closer and closer it starts to look like everything else you could have used to begin with. *chuckles*


*double sigh*
Traceroute uses the ICMP protocol, not UDP, and although the checksum algorithm is almost identical, ICMP doesn't allow omission of the checksum and thus doesn't consider 0x0000 to be special. The chance of reproducing this problem at the same host is therefore next to nothing.
But you're right, there will always be problems and bugs in network equipment, and the Factorio devs can do nothing about them, which is exactly why their decision to roll their own protocol is inconsequential to this (and similar) problems.

I'm still almost certain my initial guess is accurate, that is: it's a combination of two bugs that interfere destructively. The first is almost definitely in the firmware of the home routers, namely one that neglects to replace a 0x0000 UDP checksum with 0xFFFF; the second is most probably in one or two of the ISP routers, which drop UDP packets with checksum 0x0000 instead of re-calculating the checksum.


* This is admittedly a guess on my part, but I would be extremely surprised if it's wrong, and at any rate, because both computers were behind NAT, it is completely irrelevant.

AyrA
Inserter
Posts: 37
Joined: Mon Aug 31, 2015 8:00 pm

Post by AyrA »

sillyfly wrote:*double sigh*
Traceroute uses the ICMP protocol, not UDP
Traceroute works by setting the TTL too small on purpose and gradually increasing it. This works with any packet type that uses IPv4 or IPv6. We just use ICMP because it is convenient, but there is no need to do so. In fact, if the target system doesn't respond to ping or a device in between filters ICMP, using TCP or UDP can yield much better results.
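
A rough sketch of a UDP-based TTL walk, using only the Python standard library. The function names are made up for illustration, and several things are assumed: the raw ICMP socket normally needs root or administrator rights, replies are not matched against the probe that triggered them, and the payload has to be one that is already known to get dropped. The TTL lives in the IP header and is not part of the UDP pseudo-header, so every probe sent from the same socket carries the same UDP checksum.

```python
import socket


def set_ttl(sock: socket.socket, ttl: int) -> None:
    """Set the IPv4 time-to-live for datagrams sent on this socket."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)


def udp_trace(dest_ip, dest_port, payload, src_port=0, max_hops=30, timeout=2.0):
    """Yield (ttl, responder_ip_or_None) while walking the TTL upwards.

    Sketch only: any ICMP message arriving in the window is taken as the
    answer, and a silent hop simply shows up as None.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_RAW,
                       socket.IPPROTO_ICMP) as icmp, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as probe:
        icmp.settimeout(timeout)
        # One socket for all probes keeps the source port, and therefore
        # the UDP checksum, identical from hop to hop.
        probe.bind(("", src_port))
        for ttl in range(1, max_hops + 1):
            set_ttl(probe, ttl)
            probe.sendto(payload, (dest_ip, dest_port))
            try:
                _, (hop, _) = icmp.recvfrom(1024)
            except socket.timeout:
                hop = None
            yield ttl, hop
            if hop == dest_ip:
                break
```

Iterating over `udp_trace("203.0.113.10", 34197, payload)` (address and port are placeholders) would list the hops that answered. The last hop that answers before the replies stop gives only a rough idea of where the packet is being eaten.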

gurka
Burner Inserter
Posts: 5
Joined: Sat Jan 28, 2017 10:57 am

Post by gurka »

mexmer wrote:This problem reminds me of an issue with certain Intel network adapters "eating" audio packets.
http://blog.krisk.org/2013/02/packets-of-death.html ?

Vxsote
Inserter
Posts: 38
Joined: Sat Oct 01, 2016 12:51 am

Post by Vxsote »

I think it's great that you've been able to demonstrate that the problem is definitely related to the checksum, and I think that the workaround that many have suggested (a retry counter or other changing data in retransmission attempts) would indeed mask this particular problem effectively. And if the Factorio devs want to help out their customers with this problem by implementing a workaround, that's cool.

But really, I'm left wanting more. A workaround is just a workaround, not a proper solution. As a software developer myself, I expect the underlying APIs, frameworks, and yes - networks, to behave the way they are supposed to behave. So I would like to ask that you keep investigating your network and try to determine whether one or the other of your routers is the culprit. That information will be most helpful, whether this turns out to be a widespread problem (as you often see with other flaws in router firmware based on the same code) or just one particular brand or model.

pieppiep
Fast Inserter
Posts: 170
Joined: Mon Mar 14, 2016 8:52 am

Post by pieppiep »

Vxsote wrote:As a software developer myself, I expect the underlying APIs, frameworks, and yes - networks, to behave the way they are supposed to behave.
Really?
In my years as a software developer I've learned that most, but not all, APIs, frameworks and networks behave the way they are supposed to behave. That's why they need bug fixes.

BattleshipBrotemkin
Manual Inserter
Posts: 3
Joined: Fri Jan 27, 2017 2:52 am

Post by BattleshipBrotemkin »

I handled the reddit thread followup on Saturday, but forgot to post my thanks to everyone involved here, too. Thank you all for your assistance and suggestions. And good luck on the fix to the devs.

Vxsote
Inserter
Posts: 38
Joined: Sat Oct 01, 2016 12:51 am

Post by Vxsote »

pieppiep wrote:
Vxsote wrote:As a software developer myself, I expect the underlying APIs, frameworks, and yes - networks, to behave the way they are supposed to behave.
Really?
In my years as a software developer I've learned that most, but not all, APIs, frameworks and networks behave the way they are supposed to behave. That's why they need bug fixes.
Well, yes, really. That doesn't mean I'm blind to the reality that sometimes there are bugs in them (and sometimes more often than 'sometimes'). But the point I was trying to make is that when such a bug appears, it should be isolated and fixed if possible; don't just apply a band-aid and ignore the underlying issue.

hansinator
Fast Inserter
Posts: 160
Joined: Sat Sep 10, 2016 10:42 pm

Post by hansinator »

Vxsote wrote:But really, I'm left wanting more. A workaround is just a workaround, not a proper solution. As a software developer myself, I expect the underlying APIs, frameworks, and yes - networks, to behave the way they are supposed to behave. So I would like to ask that you keep investigating your network and try to determine whether one or the other of your routers is the culprit. That information will be most helpful, whether this turns out to be a widespread problem (as you often see with other flaws in router firmware based on the same code) or just one particular brand or model.
I have to ask: which planet do you come from? I want to be a coder there as well!
Do you know why the underlying stuff often works as you expect? Because they include workarounds for lots of faulty hardware that will never be fixed. Please, search the Linux kernel source code for the word "quirk".

Here's an example, the header comment from the 4000+ line ./drivers/pci/quirks.c:

Code:

This file contains work-arounds for many known PCI hardware
bugs.
Edit:
Vxsote wrote:Well, yes, really. That doesn't mean I'm blind to the reality that sometimes there are bugs in them (and sometimes more often than 'sometimes'). But the point I was trying to make is that when such a bug appears, it should be isolated and fixed if possible; don't just apply a band-aid and ignore the underlying issue.
It depends. In this case a workaround that always works (tm) to fix the problem is the way to go. I would still try to contact Comcast about the issue, though, and include traceroutes for server connections in the debug logs to pinpoint the source of the error.

Voltara
Burner Inserter
Posts: 9
Joined: Wed Apr 13, 2016 6:57 pm

Post by Voltara »

Vxsote wrote:
pieppiep wrote:
Vxsote wrote:As a software developer myself, I expect the underlying APIs, frameworks, and yes - networks, to behave the way they are supposed to behave.
Really?
In my years as a software developer I've learned that most, but not all, APIs, frameworks and networks behave the way they are supposed to behave. That's why they need bug fixes.
Well, yes, really. That doesn't mean I'm blind to the reality that sometimes there are bugs in them (and sometimes more often than 'sometimes'). But the point I was trying to make is that when such a bug appears, it should be isolated and fixed if possible; don't just apply a band-aid and ignore the underlying issue.
I think you may have missed one key aspect: the bug is in the end user's equipment (or the end user's ISP's equipment), and thus completely outside of the Factorio devs' control. The proposed workaround isn't a band-aid; it's an important step toward making the multiplayer network protocol more robust.

If you're suggesting the affected users should investigate further, I wholeheartedly agree. I can just imagine the intermittent DNS, DHCP, and other issues they must be experiencing.

mexmer
Filter Inserter
Posts: 870
Joined: Wed Aug 03, 2016 2:00 pm

Post by mexmer »

gurka wrote:
mexmer wrote:This problem reminds me of an issue with certain Intel network adapters "eating" audio packets.
http://blog.krisk.org/2013/02/packets-of-death.html ?
Oh, this one looks like even more fun, since it was able to completely shut down the NIC :lol:
The one I had in mind is a few years older, but I guess Intel never learns :roll:

admalledd
Burner Inserter
Posts: 8
Joined: Thu Oct 06, 2016 4:05 am

Post by admalledd »

I actually mentioned earlier that I tested with scapy, crafting custom UDP packets with length <64 and checksums of either 0xFFFF or 0x0000 on either the DST or SRC side of things, sent to a UDP echo server (that just prints to console) on my main remote server (http://www.admalledd.com).

Either with my router or plugged directly into the cable modem here I never saw those packets make it. However for any other checksum things worked fine (yay python+scapy+for-loops...)
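
For anyone who wants to repeat the test, a UDP echo server of the kind described above can be sketched with the Python standard library alone. This is not the script actually used here; the function names are made up for illustration.

```python
import socket
import threading


def serve_udp_echo(sock: socket.socket, stop: threading.Event) -> None:
    """Print each datagram to the console and send it back to its sender."""
    sock.settimeout(0.2)  # wake up regularly to check the stop flag
    while not stop.is_set():
        try:
            data, addr = sock.recvfrom(65535)
        except socket.timeout:
            continue
        print(f"{addr[0]}:{addr[1]} sent {len(data)} bytes: {data.hex()}")
        sock.sendto(data, addr)


def start_udp_echo(host: str = "0.0.0.0", port: int = 0):
    """Bind a socket, run the echo loop in a background thread.

    Returns (socket, stop_event, thread); port 0 lets the OS pick a port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    stop = threading.Event()
    thread = threading.Thread(target=serve_udp_echo, args=(sock, stop),
                              daemon=True)
    thread.start()
    return sock, stop, thread
```

Calling `start_udp_echo("0.0.0.0", 34197)` (the port is a placeholder) on the remote machine is enough to see which test packets arrive. A server like this never sees the UDP checksum itself, so a packet capture on both ends is still needed to know which checksum a missing packet carried.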

So it was not my home router. It may be Brotemkin's, considering its age, but I can at least confirm that with no consumer router in the path, routing to my server, I was losing these packets.

No DHCP/DNS issues that I have noticed here though, although I have a local DNS resolver server (dns.home.admalledd.com)...

Shoutout to /r/homelab :P

If anyone has ideas on how to narrow down which hop is eating packets I would love to know.

PS: Twinsen, is there any ETA on a small patch for 0.14? Or too much potential impact and going to wait for 0.15? (although as I and Brotemkin mention, we don't encounter this anymore with our IP changes between us).

Twinsen
Factorio Staff
Posts: 1331
Joined: Tue Sep 23, 2014 7:10 am

Post by Twinsen »

admalledd wrote: Either with my router or plugged directly into the cable modem here I never saw those packets make it. However for any other checksum things worked fine (yay python+scapy+for-loops...)
Nice. I think you have enough information to contact your ISP. Start by mentioning that this serious issue was investigated by many technical people around the world and that you know what you're talking about. Write all the technical details as clearly as possible and ask them to forward your message to their network engineers; maybe you can even get a reply back with some interesting details :). Customer support can be tricky, you just need to get past the "please restart your modem and wait for 30 minutes".
admalledd wrote: PS: Twinsen, is there any ETA on a small patch for 0.14? Or too much potential impact and going to wait for 0.15? (although as I and Brotemkin mention, we don't encounter this anymore with our IP changes between us).
We already added our fix to the game, but it will be on 0.15, mainly because we don't want to do any more 0.14 releases unless there are serious issues.
The fix should be there in 0.14.22, just released
